JensBender/README.md

Profile-banner

Hi there! I'm Jens

👋 About Me

PhD researcher with 8+ years of experience in advanced statistical modeling, now applying these skills to complex data science problems. Experienced in building robust, automated end-to-end ML solutions that turn raw data into actionable insights through deployed web apps and APIs.

🛠️ Skills

Category: Skills
Programming: Python · MySQL
Data Manipulation: NumPy · Pandas
Data Visualization: Matplotlib · Seaborn · Plotly · Power BI
AI & Machine Learning: scikit-learn · TensorFlow · Hugging Face
Big Data: Spark
Web Development: FastAPI · Flask · Gradio · Pydantic
Version Control: Git · GitHub
DevOps: Docker · Airflow · GitHub Actions
Testing: pytest · Selenium
Cloud: AWS
Development Environments: Jupyter Notebook · VS Code · PyCharm · Cursor

💻 Portfolio

Python · Pandas · scikit-learn · FastAPI · pytest · Docker · Hugging Face
Built an end-to-end machine learning solution for predicting loan defaults using customer application data, enabling financial institutions to make data-driven lending decisions and better manage credit risk. The project includes:

  • Data Preprocessing: Engineered new features (e.g., job stability, city tier, state default rate), handled duplicates, data types, missing values, and outliers, scaled numerical features, and encoded categorical features.
  • Exploratory Data Analysis: Analyzed distributions and relationships using descriptive statistics, correlations, and visualizations.
  • Modeling: Trained and evaluated eight baseline models (e.g., Logistic Regression, Random Forest, XGBoost) and tuned hyperparameters. Selected a Random Forest Classifier with an optimized decision threshold, achieving an AUC-PR of 0.59, recall of 0.79, and precision of 0.51 for the default class on the hold-out test set. Visualized feature importances and showed model prediction examples.
  • Deployment: Served the full machine learning pipeline (preprocessing and model) as a web app using a FastAPI backend and Gradio frontend within a single Docker container, hosted on Hugging Face Spaces. Automated deployment via GitHub Actions to sync web app files to Hugging Face Spaces on every push.
  • Testing: Implemented comprehensive unit, integration, and end-to-end tests to validate individual components, their interactions, and entire user journeys.
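
The "optimized decision threshold" mentioned under Modeling can be illustrated with a small sketch. This is not the project's code: the labels, scores, and the `min_recall` target are made up for illustration, and the selection rule (highest precision among thresholds that still reach a recall target) is only one common way to choose a threshold.

```python
def precision_recall_at(y_true, scores, threshold):
    """Precision and recall for the positive (default) class at a threshold."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(y_true, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(y_true, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, preds) if y == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def pick_threshold(y_true, scores, min_recall):
    """Return (threshold, precision, recall) with the best precision among
    candidate thresholds whose recall is at least min_recall, or None."""
    best = None
    for t in sorted(set(scores)):
        precision, recall = precision_recall_at(y_true, scores, t)
        if recall >= min_recall and (best is None or precision > best[1]):
            best = (t, precision, recall)
    return best


# Hypothetical labels (1 = default) and predicted default probabilities.
y_true = [0, 0, 1, 0, 1, 1, 0, 1]
scores = [0.10, 0.30, 0.35, 0.40, 0.60, 0.70, 0.80, 0.90]

result = pick_threshold(y_true, scores, min_recall=0.75)
print(result)
```

Lowering the threshold below the default of 0.5 typically trades precision for recall, which is often the preferred direction when a missed default costs more than a false alarm.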

Model Pipeline: Hugging Face Hub
Web App: Hugging Face Spaces

Model Performance: Precision-Recall Curves
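
For readers unfamiliar with AUC-PR, the following sketch shows one common way to summarize a precision-recall curve as a single number: the step-wise sum of precision values at each rank where a positive example is found (often called average precision). The data is hypothetical, and the function assumes no tied scores and at least one positive label; it is not taken from the project.

```python
def average_precision(y_true, scores):
    """Step-wise area under the precision-recall curve.

    Assumes no tied scores and at least one positive label.
    """
    # Rank examples from the highest score to the lowest.
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    n_pos = sum(y_true)
    tp = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            tp += 1
            total += tp / rank  # precision at this recall step
    return total / n_pos


# Hypothetical labels (1 = default) and predicted default probabilities.
y_true = [0, 0, 1, 0, 1, 1, 0, 1]
scores = [0.10, 0.30, 0.35, 0.40, 0.60, 0.70, 0.80, 0.90]

ap = average_precision(y_true, scores)
print(round(ap, 4))
```

A useful reference point when reading such a number: a classifier that ranks examples at random scores roughly the share of positive examples in the data, so AUC-PR is best judged against the class balance.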

Python · MySQL · Airflow · Docker · AWS · Hugging Face · Power BI
Developed an ETL pipeline and an interactive Power BI report that give YouTube content creators and marketers insight into their channel's performance, especially in comparison to related channels. The project involved:

  • Data Extraction: Used the YouTube API to collect video and comment data from three selected channels.
  • Data Transformation: Performed sentiment analysis on video comments via API requests to a RoBERTa sentiment analysis model, which I deployed using Gradio on a private Hugging Face Space.
  • Data Loading: Stored the transformed data in a MySQL database hosted on AWS.
  • Automation: Managed the ETL workflow using Apache Airflow, Docker, and AWS.
  • Data Visualization: Designed an interactive Power BI report to deliver insights into channel performance, featuring key metrics and comparative analysis.

This project enables YouTube content creators to easily monitor and evaluate their channel's performance relative to their peers, allowing for more informed decision-making and strategic planning.
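
The extract, transform, and load steps described above can be sketched in a few lines. Everything here is a stand-in chosen so that the example runs on its own: `extract` returns hard-coded sample comments instead of calling the YouTube API, `classify_sentiment` is a toy keyword rule instead of the RoBERTa model, and an in-memory SQLite database replaces the MySQL database on AWS. The function, table, and column names are hypothetical.

```python
import sqlite3


def extract():
    """Stand-in for the YouTube API calls; returns hard-coded sample comments."""
    return [
        {"video_id": "v1", "text": "Great video, very helpful!"},
        {"video_id": "v1", "text": "This was a terrible explanation."},
        {"video_id": "v2", "text": "Uploaded on Tuesday."},
    ]


def classify_sentiment(text):
    """Toy keyword rule standing in for the sentiment model behind the API."""
    lowered = text.lower()
    if any(word in lowered for word in ("great", "helpful", "love")):
        return "positive"
    if any(word in lowered for word in ("terrible", "bad", "hate")):
        return "negative"
    return "neutral"


def transform(comments):
    """Attach a sentiment label to every comment."""
    return [(c["video_id"], c["text"], classify_sentiment(c["text"])) for c in comments]


def load(rows, conn):
    """Write the transformed rows to the database."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS comments (video_id TEXT, text TEXT, sentiment TEXT)"
    )
    conn.executemany("INSERT INTO comments VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")  # SQLite used here in place of MySQL
load(transform(extract()), conn)

counts = dict(conn.execute("SELECT sentiment, COUNT(*) FROM comments GROUP BY sentiment"))
print(counts)
```

Keeping the three steps as separate functions mirrors how an orchestrator such as Airflow would run them as individual tasks, although the task layout of the actual project is not shown here.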

Figures: Power BI report · Comments

Python · NumPy · Pandas · Matplotlib · scikit-learn · Flask · Docker

  • Motivation: Simplify the process of finding rental properties in Singapore's expensive real estate market by using machine learning to estimate rental prices.
  • Data Collection: Scraped 1,680 property listings from an online property portal, including price, size, address, number of bedrooms and bathrooms, and more.
  • Exploratory Data Analysis: Visualized property locations on an interactive map, generated a word cloud to extract insights from property agent descriptions, and examined descriptive statistics, distributions, and correlations.
  • Data Preprocessing: Handled missing address data and engineered location-related features using the Google Maps API, extracted property features from agent descriptions, and systematically evaluated multiple outlier handling methods.
  • Model Training: Trained five machine learning models with baseline configurations and selected an XGBoost regression model with optimized hyperparameters, which achieved an RMSE of 995, a MAPE of 0.13, and an R² of 0.90 on the test set.
  • Model Deployment: Built a Flask web application to serve the XGBoost model, containerized it with Docker, and deployed the container on render.com.
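
As a reminder of what the three reported metrics measure, here is a minimal sketch of RMSE, MAPE, and R² in plain Python. The rental prices below are invented for illustration and have no connection to the project's data or results; MAPE is computed as a fraction, so 0.13 would correspond to an average error of 13% of the true price.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (RMSE, MAPE, R²) for paired true and predicted values.

    Assumes no true value is zero, since MAPE divides by the true value.
    """
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)  # residual sum of squares
    rmse = math.sqrt(ss_res / n)
    mape = sum(abs(e) / abs(t) for e, t in zip(errors, y_true)) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    r2 = 1 - ss_res / ss_tot
    return rmse, mape, r2


# Hypothetical monthly rents and model predictions.
y_true = [2000, 3000, 4000, 5000]
y_pred = [2200, 2700, 4000, 5500]

rmse, mape, r2 = regression_metrics(y_true, y_pred)
print(round(rmse, 2), round(mape, 3), round(r2, 3))
```

RMSE is expressed in the unit of the target and penalizes large misses, MAPE is scale-free, and R² compares the model against always predicting the mean, which is why reporting the three together gives a fuller picture.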

Python · TensorFlow · scikit-learn · NumPy · Pandas · Matplotlib · Flask

  • Motivation: Develop a hate speech detector for social media comments.
  • Data: Utilized the ETHOS Hate Speech Detection Dataset.
  • Models: Trained and evaluated three deep learning models using TensorFlow and scikit-learn. The fine-tuned BERT model performed best (78.0% accuracy), ahead of the LSTM (70.7%) and SimpleRNN (66.3%) models.
  • Deployment: Prepared the fine-tuned BERT model for production by integrating it into a web application and an API endpoint using the Flask web framework.
Figures: Fine-tuned BERT confusion matrix · Model deployment
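
To show how an accuracy figure relates to a confusion matrix, the sketch below counts the four cells for a binary classifier and derives accuracy from them. The labels and predictions are invented for illustration and are not the project's results.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred):
    """Count the four outcomes for binary labels (1 = hate speech, 0 = not)."""
    counts = Counter(zip(y_true, y_pred))  # keys are (actual, predicted)
    return {
        "tn": counts[(0, 0)],
        "fp": counts[(0, 1)],
        "fn": counts[(1, 0)],
        "tp": counts[(1, 1)],
    }


def accuracy(cm):
    """Share of all predictions that match the true label."""
    return (cm["tp"] + cm["tn"]) / sum(cm.values())


# Hypothetical true labels and model predictions.
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]

cm = confusion_matrix(y_true, y_pred)
print(cm, accuracy(cm))
```

Accuracy alone hides which kind of error dominates, so the individual cells are worth reading alongside it: false negatives are missed hate speech, false positives are wrongly flagged comments.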

More Projects

  • Medical Cost Planner: Currently developing an AI-powered application to predict out-of-pocket healthcare costs for personalized financial planning.
  • Machine Learning Template: A versatile, ready-to-use machine learning template for tabular data. Streamlines EDA, data preprocessing, and modeling for regression and classification.
  • ChatGPT Cover Letter Generator: LLM-driven cover letter generator. Transforms job postings into tailored applications using the OpenAI API and Web Scraping.

🏅 Certifications & Courses

ML in Production | DeepLearning.AI | May 2025
Skills: MLOps · Data-Centric ML Lifecycle · Error Analysis

Hugging Face | DeepLearning.AI | July 2024
Skills: Transformers · Multimodal AI · Gradio · HF Hub & Spaces

Data Engineering Foundations | DeepLearning.AI | June 2024
Skills: Requirements Gathering · Tech Specs

IBM Data Engineering | IBM | May 2024
Skills: ETL · Bash · Apache Airflow (DAGs)

Advanced SQL: MySQL for Ecommerce & Web Analytics | Udemy | February 2024 | 🔗 Certificate
Skills: MySQL · Business Intelligence · Data Analysis · Subqueries · Temporary Tables

A/B Testing | Google | February 2024
Skills: Experimental Design · Metric Selection · Power Analysis · Sanity Checks

AWS Certified Cloud Practitioner | AWS | January 2024 | 🔗 Certificate
Skills: Amazon Web Services (AWS) · Cloud Concepts · Security & Compliance · Billing & Pricing

Ultimate AWS Certified Cloud Practitioner CLF-C02 | Udemy | January 2024 | 🔗 Certificate
Skills: Amazon Web Services (AWS) · Identity and Access Management (IAM) · Elastic Compute Cloud (EC2) · Simple Storage Service (S3) · Virtual Private Cloud (VPC) · CloudWatch · Database & Analytics

Spark and Python for Big Data with PySpark | Udemy | January 2024 | 🔗 Certificate
Skills: Apache Spark · PySpark · Spark DataFrames · MLlib · Amazon Web Services (AWS) · Databricks

Microsoft Power BI Data Analyst | Udemy | November 2023 | 🔗 Certificate
Skills: Power BI · Power Query · Data Analysis Expressions (DAX) · Data Modeling · Interactive Dashboards

Product Management for AI | 365 Data Science | November 2023
Skills: AI Strategy · UX · Stakeholder Management

LLM App Development | OpenAI & DeepLearning.AI | June 2023
Skills: OpenAI API · Prompt Engineering

Deep Learning | alfatraining Bildungszentrum GmbH | April 2023
Skills: TensorFlow · Neural Networks · Convolutional Neural Networks (CNN) · Computer Vision · Recurrent Neural Networks (RNN) · Long Short-Term Memory (LSTM) · Natural Language Processing (NLP) · Time Series Analysis

Machine Learning by Stanford University & DeepLearning.AI | Coursera | April 2023 | 🔗 Certificate
Skills: Linear & Logistic Regression · Neural Networks · Recommender Systems · Reinforcement Learning

Python for Machine Learning & Data Science Masterclass | Udemy | March 2023 | 🔗 Certificate
Skills: scikit-learn · Pandas · NumPy · Matplotlib · Seaborn · Random Forest · Gradient Boosting · Support Vector Machines (SVM) · DBSCAN

Machine Learning | alfatraining Bildungszentrum GmbH | February 2023
Skills: Regression · K-Nearest Neighbors (KNN) · Decision Trees · Random Forest · Support Vector Machines (SVM) · Clustering · Principal Component Analysis (PCA) · Feature Engineering · Model Evaluation

The Ultimate MySQL Bootcamp: Go from SQL Beginner to Expert | Udemy | December 2022 | 🔗 Certificate
Skills: MySQL · Database Schemas · SQL Joins · Aggregate Functions · Window Functions

The Git & GitHub Bootcamp | Udemy | September 2022
Skills: Git · GitHub · Version Control · Branching · Merging · Pull Requests

100 Days of Code: The Complete Python Pro Bootcamp | Udemy | April 2022 – November 2022
Skills: Python · OOP · Flask · Web Scraping (Beautiful Soup, Selenium) · APIs (Requests) · Automation

👨‍💻 GitHub Statistics

Top Languages

©️ Credits

Profile banner GIF based on the video by RDNE Stock project from Pexels

Pinned

  1. medical-cost-prediction (Public)

    🏥 An AI-powered application to predict out-of-pocket healthcare costs for personalized financial planning.

    Jupyter Notebook

  2. loan-default-prediction (Public)

    🏦 End-to-end machine learning solution for predicting loan defaults from application data.

    Python

  3. youtube-channel-analytics (Public)

    📊 ETL pipeline for YouTube competitor analytics. Orchestrated with Airflow, Docker, and AWS. Features sentiment analysis and a Power BI dashboard.

    Jupyter Notebook 9

  4. rental-price-prediction (Public)

    🏠 Predicting Singapore rental prices using XGBoost. Includes web scraping of properties, feature engineering via Google Maps API, and a Dockerized Flask application for deployment.

    Jupyter Notebook 7 1

  5. hate-speech-detection (Public)

    🛡️ BERT-based hate speech detector for social media. Fine-tuned on the ETHOS dataset, outperforming RNN/LSTM. Deployment via Web App and API.

    PureBasic 16 10

  6. machine-learning-template (Public)

    🏗️ A versatile, ready-to-use machine learning template for tabular data. Streamlines EDA, data preprocessing, and modeling for regression and classification.

    Jupyter Notebook 5