Goal: Train a supervised learning model to predict customer churn for the fictional company Fabapalooza, and demonstrate how the model can guide targeted retention efforts.
The model will help the company decide which customers to prioritize in a retention plan where each intervention costs €20,000.
Your team must:
- Predict which customers are likely to churn.
- Estimate expected lost revenue for each customer.
- Identify a profitable subset of customers to target with retention efforts.
- Communicate findings through a recorded presentation and reproducible R code.
Project deliverables:
- A reproducible R Markdown file implementing the full analysis.
- A reproducible Jupyter notebook implementing the full analysis using Python.
The project consists of data understanding, model training and tuning, model assessment, and a demonstration of how predictions inform retention decisions.
Fabapalooza provides 3D printer hardware for business clients. Its customer base consists largely of small companies and startups, leading to high churn rates. Despite the volatility, long-term customers provide significant value through referrals and stable revenue.
The company plans to:
- Predict each customer’s probability of churn.
- Estimate expected revenue lost if the customer churns.
- Multiply these to calculate expected loss: expected loss = P(churn) × expected revenue lost if the customer churns.
- Prioritize customers with the highest expected loss, as long as the expected benefit exceeds €20,000.
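The plan above can be sketched in a few lines of Python. This is a minimal illustration, not the required implementation: the `Customer` class, its field names, and the example values are hypothetical, and it assumes the expected benefit of an intervention equals the customer's expected loss (that is, the intervention fully prevents churn).

```python
from dataclasses import dataclass

INTERVENTION_COST = 20_000  # euros per intervention, from the brief


@dataclass
class Customer:
    customer_id: str          # hypothetical identifier
    churn_probability: float  # model's soft prediction, in [0, 1]
    revenue_if_churn: float   # estimated revenue lost if the customer churns (EUR)

    @property
    def expected_loss(self) -> float:
        # Expected loss = P(churn) x revenue lost if the customer churns
        return self.churn_probability * self.revenue_if_churn


def prioritize(customers):
    """Return customers whose expected loss exceeds the intervention cost, largest first."""
    worth_targeting = [c for c in customers if c.expected_loss > INTERVENTION_COST]
    return sorted(worth_targeting, key=lambda c: c.expected_loss, reverse=True)


# Made-up illustrative values, not taken from customers.Rdata
example = [
    Customer("A", 0.50, 100_000),  # expected loss 50,000
    Customer("B", 0.10, 150_000),  # expected loss 15,000
    Customer("C", 0.80, 30_000),   # expected loss 24,000
]
targets = prioritize(example)
print([c.customer_id for c in targets])  # ['A', 'C']
```

Customer B has the largest potential revenue loss but a low churn probability, so its expected loss falls below the €20,000 intervention cost and it is not targeted.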
Your task is to build the predictive model supporting this plan.
Provided files:
- customers.Rdata: the customer-year dataset.
- analysis_template.R: Parker’s partially completed analysis file.
Each row represents a customer at the end of a specific year. Simplifying assumptions include fixed customer attributes across years, churn only at year-end, and no returning customers.
Candidate models
- LASSO logistic regression
- Random forest classifier
Use tidymodels for:
- Preprocessing
- Model tuning
- Cross-validation (5 folds, 4 repeats provided)
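The resampling folds are provided, so there is no need to generate them. The sketch below only illustrates what "5 folds, 4 repeats" means structurally: 20 resamples in total. The function `repeated_kfold` is a hypothetical helper written in plain Python for illustration and is not how tidymodels builds its folds internally.

```python
import random


def repeated_kfold(n_rows, n_folds=5, n_repeats=4, seed=0):
    """Yield (repeat, fold, analysis_idx, assessment_idx) for repeated k-fold CV."""
    rng = random.Random(seed)
    for repeat in range(n_repeats):
        indices = list(range(n_rows))
        rng.shuffle(indices)  # a fresh shuffle for every repeat
        for fold in range(n_folds):
            held_out = indices[fold::n_folds]  # every n_folds-th shuffled row
            held_out_set = set(held_out)
            kept = [i for i in indices if i not in held_out_set]
            yield repeat, fold, kept, held_out


resamples = list(repeated_kfold(n_rows=100))
print(len(resamples))  # 5 folds x 4 repeats = 20 resamples
```

Within each repeat, every row is held out exactly once. Repeating the procedure with different shuffles reduces the variance of the performance estimate used to compare tuning candidates.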
Final model demonstration
- Retrain the selected model on the full analysis set (pre-2024)
- Generate soft predictions (churn probabilities) on the assessment set (2024 customers)
- Estimate expected lost revenue for each customer
- Compute expected loss using predicted churn probabilities
- Determine and recommend how many customers should be targeted to maximize expected net value
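One way to approach the last step is to rank customers by expected loss and track the cumulative expected net value as more of them are targeted. The sketch below is illustrative only: `best_target_count`, the `SUCCESS_RATE` parameter, and the example losses are all hypothetical. In particular, the brief does not say how effective an intervention is, so the 50% success rate is an assumption made purely for the example.

```python
from itertools import accumulate

INTERVENTION_COST = 20_000  # euros per intervention, from the brief
SUCCESS_RATE = 0.5          # hypothetical: share of interventions that prevent churn


def best_target_count(expected_losses, cost=INTERVENTION_COST, success_rate=SUCCESS_RATE):
    """Return (k, net_value): how many top-ranked customers to target and the resulting expected net value."""
    ranked = sorted(expected_losses, reverse=True)
    # Expected net value of each intervention, in priority order
    marginal = [success_rate * loss - cost for loss in ranked]
    cumulative = list(accumulate(marginal))
    best_k, best_value = 0, 0.0  # targeting nobody has a net value of zero
    for k, value in enumerate(cumulative, start=1):
        if value > best_value:
            best_k, best_value = k, value
    return best_k, best_value


# Made-up expected losses in euros
losses = [120_000, 30_000, 90_000, 50_000, 10_000]
k, net = best_target_count(losses)
print(k, net)  # 3 70000.0
```

Because customers are sorted by expected loss, the marginal net value of each additional intervention is non-increasing, so the cumulative curve rises and then falls. The recommended number of customers is the point where it peaks.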