vicng201/Customer-Churn-Prediction---Personal-Project

Customer-Churn-Prediction-Personal-Project

Goal: Train a supervised learning model to predict customer churn for the fictional company Fabapalooza, and demonstrate how the model can guide targeted retention efforts.

1. Project Overview

This project focuses on developing a supervised learning model to predict customer churn for a fictional company called Fabapalooza. The goal is to help the company identify which customers to prioritize in a retention plan where each intervention costs €20,000.

Your team must deliver ML models to:

- Predict which customers are likely to churn.

- Estimate expected lost revenue for each customer.

- Identify a profitable subset of customers to target with retention efforts.

- Communicate findings through a recorded presentation and reproducible R code.

Project deliverables:

- A reproducible R Markdown file implementing the full analysis.

- A reproducible Jupyter notebook implementing the full analysis in Python.

The project consists of data understanding, model training and tuning, model assessment, and demonstrating how predictions inform retention decisions.

2. Business Case Description

Fabapalooza provides 3D printer hardware for business clients. Its customer base consists largely of small companies and startups, leading to high churn rates. Despite the volatility, long-term customers provide significant value through referrals and stable revenue.

The company plans to:

- Predict each customer’s probability of churn.

- Estimate expected revenue lost if the customer churns.

- Multiply these to calculate expected loss: $\text{Expected Loss} = E[\text{Lost Revenue} \mid \text{Churn}] \times \Pr(\text{Churn})$

- Prioritize customers with the highest expected loss, as long as the expected benefit exceeds €20,000.

Your task is to build the predictive model supporting this plan.
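The expected-loss rule above can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual code: the `CustomerEstimate` class, its field names, and the three example customers are hypothetical, and the sketch assumes the expected benefit of an intervention equals the customer's expected loss (that is, the intervention fully prevents the churn).

```python
from dataclasses import dataclass

INTERVENTION_COST = 20_000  # euros per intervention, from the business case


@dataclass
class CustomerEstimate:
    """Hypothetical container for the two model outputs per customer."""
    customer_id: str
    churn_probability: float  # Pr(Churn), from the churn model
    lost_revenue: float       # E[Lost Revenue | Churn], in euros


def expected_loss(c: CustomerEstimate) -> float:
    """Expected Loss = E[Lost Revenue | Churn] x Pr(Churn)."""
    return c.lost_revenue * c.churn_probability


def prioritize(customers, cost=INTERVENTION_COST):
    """Return customers whose expected loss exceeds the cost, highest first.

    Assumption: the expected benefit of intervening equals the expected loss.
    """
    worthwhile = [c for c in customers if expected_loss(c) > cost]
    return sorted(worthwhile, key=expected_loss, reverse=True)


# Made-up example values, for illustration only.
customers = [
    CustomerEstimate("A", 0.80, 50_000),   # expected loss about 40,000
    CustomerEstimate("B", 0.10, 150_000),  # expected loss about 15,000
    CustomerEstimate("C", 0.50, 90_000),   # expected loss 45,000
]
ranked = prioritize(customers)
print([c.customer_id for c in ranked])  # ['C', 'A']
```

Customer B has the largest revenue at stake but a low churn probability, so its expected loss falls below the €20,000 intervention cost and it is left out.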

3. Data Description

- customers.Rdata: the customer-year dataset.

- analysis_template.R: Parker’s partially completed analysis file.

Each row represents a customer at the end of a specific year. Simplifying assumptions include fixed customer attributes across years, churn only at year-end, and no returning customers.

Candidate models

  1. LASSO logistic regression

  2. Random forest classifier

Use tidymodels for:

- Preprocessing

- Model tuning

- Cross-validation (5 folds, 4 repeats provided)
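To make the resampling scheme concrete, here is a rough stdlib-only Python sketch of what "5 folds, 4 repeats" means structurally. The project supplies its folds already, so this is only an illustration; the function name `repeated_kfold`, the seed, and the row count of 23 are assumptions chosen for the example.

```python
import random


def repeated_kfold(n_rows, n_folds=5, n_repeats=4, seed=42):
    """Yield (repeat, fold, analysis_idx, assessment_idx) for each resample."""
    rng = random.Random(seed)
    for repeat in range(1, n_repeats + 1):
        indices = list(range(n_rows))
        rng.shuffle(indices)  # a fresh shuffle for every repeat
        # Deal the shuffled rows into folds of near-equal size.
        folds = [indices[i::n_folds] for i in range(n_folds)]
        for fold_no, held_out in enumerate(folds, start=1):
            held = set(held_out)
            analysis = [i for i in indices if i not in held]
            yield repeat, fold_no, analysis, sorted(held_out)


resamples = list(repeated_kfold(n_rows=23))
print(len(resamples))  # 20 resamples: 5 folds x 4 repeats
```

Each tuning candidate would be fit on the analysis rows and scored on the held-out rows of all 20 resamples, with the metric averaged across them.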

Final Model Demonstration

- Retrain the selected model on the full analysis set (pre-2024)

- Generate soft predictions (churn probabilities) on the assessment set (2024 customers)

- Estimate expected lost revenue for each customer

- Compute expected loss using predicted churn probabilities

- Determine and recommend how many customers should be targeted to maximize expected net value
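The last step, choosing how many customers to target, can be sketched as a cumulative net-value curve over customers ranked by expected loss. This is a hedged illustration rather than the project's method: the function `best_target_count`, the `success_rate` parameter (the assumed fraction of expected loss an intervention recovers), and the example losses are all hypothetical.

```python
from itertools import accumulate

INTERVENTION_COST = 20_000  # euros per intervention, from the business case


def best_target_count(expected_losses, cost=INTERVENTION_COST, success_rate=1.0):
    """Return (best_k, best_net_value, cumulative_curve).

    Assumption: intervening recovers success_rate * expected loss and
    costs a flat `cost` per customer.
    """
    ranked = sorted(expected_losses, reverse=True)
    # Net value added by each customer, taken in priority order.
    marginal = [success_rate * loss - cost for loss in ranked]
    cumulative = list(accumulate(marginal))
    best_k, best_value = 0, 0.0  # targeting nobody has net value 0
    for k, value in enumerate(cumulative, start=1):
        if value > best_value:
            best_k, best_value = k, value
    return best_k, best_value, cumulative


# Made-up expected losses (euros) for five assessment-set customers.
losses = [45_000, 12_000, 40_000, 26_000, 18_000]
k, value, curve = best_target_count(losses)
print(k, value)  # 3 51000.0
```

Under these assumptions the curve peaks at the last customer whose recovered loss still exceeds the intervention cost; plotting `curve` against the number of customers targeted is one way to present the recommendation.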
