Hi There 👋

I'm Ben, a Data Scientist with strong skills in Python and R, fluent in Russian and English, with a dash of French. My interest lies in turning data into actionable insights. Using statistics, machine learning, and NLP, I can create solutions that make a real-world impact. Whether solo or in a team, I focus on delivering practical and meaningful results. Off-duty, I'm an outdoor enthusiast: think snowboarding 🏂 and trekking 🚶🏽. But water sports 🏊? Not my thing.

Bakwenye Benjamin

Projects

Customer Behavior Modelling

This project utilizes free customer data from Kaggle to explore, predict, and visualize customer behavior.

machine learning · llm · nlp

Natural Language Processing

This project uses NLP methods to enhance document ingestion and data acquisition.

machine learning · llm · nlp

Forecasting

A short description here

machine learning · llm · nlp

Object Oriented Programming

A short description here

machine learning · llm · nlp

Experience

Data Scientist | Project Officer

  • International Monetary Fund · Contract
  • Feb 2026 - Present · 4 mos
  • Barcelona, Catalonia, Spain · Remote

Develop NLP pipelines to extract policy information from IMF reports using paragraph-level text classification.

Construct a large-scale dataset of macroeconomic policy actions across countries and time.

Implement econometric models to analyze the impact of policy interventions on conflict onset risk.

Contribute to research at the International Monetary Fund on the role of macroeconomic policy in preventing political instability and conflict.

Python / Machine Learning · Natural Language Processing (NLP) · Econometrics · Text Mining · Policy Analysis

Junior Data Scientist

  • smartheart · Contract
  • Nov 2025 - Jan 2026 · 3 mos
  • Zurich, Switzerland · Remote

Responsible for enhancing the customer company’s data and analytics capabilities by developing business intelligence solutions, performing data analysis, and optimizing internal data systems. The role also includes supporting and improving the Salesforce environment.

Key Responsibilities:

Business Intelligence & Analytics

  • Develop and maintain dashboards, reports, and automated data pipelines in Python and SQL.
  • Conduct analyses to generate insights supporting business decisions.

Cloud & Data Infrastructure

  • Work with Azure cloud services to ensure scalable, efficient, and secure data solutions.
  • Maintain data reliability and integrity across systems.

Salesforce Support

  • Configure, integrate, and optimize Salesforce with internal tools.
  • Automate workflows, reporting, and data synchronization.
  • Analyze and model large-scale ECG and healthcare datasets.

Extract, Transform, Load (ETL) · Azure SQL · Dashboards · Python (Programming Language) · Time Series Forecasting · SQLite · SQLAlchemy

Data Scientist

  • Universitat Pompeu Fabra · Contract
  • Nov 2024 - Sep 2025 · 11 mos
  • Barcelona, Catalonia, Spain · Remote

Analyze large structured and unstructured datasets to identify trends and anomalies, providing actionable insights through custom ETL pipelines and data cleaning processes. Develop and implement topic modelling solutions, FastText pretraining, and an LSTM-based gender prediction library achieving 94% accuracy on Catalan names. Collaborate weekly with cross-functional teams to ensure best practices and high-quality results.

Extract, Transform, Load (ETL) · Exploratory Data Analysis · Data Scraping · Long Short-term Memory (LSTM) · LDA · A/B Testing

Data Scientist

  • Universitat Pompeu Fabra · Contract
  • Sep 2023 - Nov 2024 · 1 yr 3 mos
  • Barcelona, Catalonia, Spain · Remote

Data cleaning, data scraping, data warehousing, validation of unstructured data, and data pipeline automation.

Generative AI · Data Science · Exploratory Data Analysis · Unsupervised Learning · Python (Programming Language) · RAG · Scikit-Learn · Machine Learning · Visualization · Natural Language Processing (NLP) · Forecasting · Teamwork

Data Scientist

  • IESE Business School · Contract
  • Oct 2023 - Sep 2024 · 1 yr
  • Barcelona, Catalonia, Spain · Remote

Conducted time series analysis, data cleaning, and machine learning model development for regression and classification tasks. Regularly presented findings and collaborated in biweekly meetings to ensure research alignment. Developed a hybrid pipeline combining TabText and LSTM, improving forecast accuracy by 10% by transforming tabular data into natural language. Created DeepQuery, a web-based tool that aggregates and semantically retrieves policy data for economists, leveraging LangChain, OpenAI embeddings, and FAISS for rapid analysis.

Time Series Forecasting · Generative AI · A/B Testing · Data Science · Keras · RAG · Predictive Modeling · Supervised Learning · TensorFlow · Machine Learning · Visualization · Natural Language Processing (NLP) · Forecasting · Web Scraping · Time Series Analysis · Generative AI Tools · Amazon Web Services (AWS) · Machine Learning Algorithms · Teamwork · Knowledge Graph-Based Question Answering · Knowledge Graph-Based Natural Language Processing · Remote Teamwork · Communication

Data Scientist Consultant

  • Oxera Consulting LLP · Contract
  • Mar 2023 - Jun 2023 · 4 mos
  • Barcelona, Catalonia, Spain · Remote

We developed an LLM-powered pipeline to automate the scraping, labeling, and analysis of legal documents, integrating text prediction capabilities to enhance efficiency and accuracy. Additionally, we built a knowledge-based chatbot that enables interactive document exploration, allowing users to efficiently query and extract relevant insights from legal texts. This system streamlined data extraction, improved document categorization, and optimized legal text processing. The challenge was proposed by Oxera to the Barcelona School of Economics.

Retrieval-Augmented Generation (RAG) · Natural Language Processing (NLP) · BERT (Language Model) · Large Language Models (LLM)

Contact

Let’s turn data into decisions. Drop me a message—I’m just one click away.