🚀 NovaPay Fraud Detection System

An end-to-end machine learning project designed to detect fraudulent transactions in a highly imbalanced financial dataset, with a strong focus on performance, interpretability, and real-world usability.

📌 Project Overview

Fraud detection is a critical challenge in fintech, where fraudulent transactions are rare but highly impactful. In this project, we built a robust and interpretable fraud detection system for NovaPay that:

- Accurately identifies fraudulent transactions
- Handles severe class imbalance (<10% fraud rate)
- Provides transparent, audit-ready explanations for each prediction

⚙️ Key Features

✅ Behavioural risk scoring system
✅ Advanced feature engineering (velocity, anomaly, interaction features)
✅ Time-based model validation (prevents data leakage)
✅ LightGBM & XGBoost model comparison
✅ Imbalance handling (SMOTE, class weights, undersampling)
✅ SHAP-based explainability
✅ Fraud review template with reason codes
✅ Deployment-ready (FastAPI + Docker)

📊 Model Performance

| Metric | Score |
|---|---|
| Accuracy | ~98% |
| ROC-AUC | ~0.97–0.98 |
| F1-score | ~0.95 |

👉 Final model: LightGBM (Baseline)

- Best balance between precision and recall
- Stable without heavy rebalancing

🧠 Approach

1. Exploratory Data Analysis

Identified key fraud patterns:

- High transaction velocity
- Risky IP/device behaviour
- New accounts

2. Risk Scoring

- Built a behavioural risk score from domain-driven rules
- Normalized the score into interpretable risk levels (Low / Medium / High)
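
A minimal sketch of how a rule-based behavioural risk score might be built and bucketed into levels. The rule names, weights, and cut-offs below are hypothetical and chosen only for illustration; the project's actual rules are not documented here.

```python
# Hypothetical rule weights (assumptions, not the project's real values).
RULE_WEIGHTS = {
    "high_velocity": 40,       # many transactions in a short window
    "risky_ip_or_device": 35,  # IP or device flagged as risky
    "new_account": 25,         # account opened recently
}


def behavioural_risk_score(flags: dict) -> float:
    """Sum the weights of the triggered rules and normalise to the range 0-1."""
    total = sum(RULE_WEIGHTS.values())
    triggered = sum(w for rule, w in RULE_WEIGHTS.items() if flags.get(rule, False))
    return triggered / total


def risk_level(score: float) -> str:
    """Map a normalised score to a level; the cut-offs are illustrative."""
    if score >= 0.7:
        return "High"
    if score >= 0.3:
        return "Medium"
    return "Low"


score = behavioural_risk_score({"high_velocity": True, "new_account": True})
print(score, risk_level(score))
```

Keeping the weights in one dictionary means an analyst can review or adjust the rules without touching the scoring logic.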

3. Feature Engineering

- Customer behaviour: transaction frequency, averages, totals
- Anomaly detection: velocity ratios, deviations
- Interaction features: IP × device risk, velocity × amount
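
The three feature groups above could be derived along these lines. This is a sketch in plain Python, and every field name (`amount`, `txns_last_hour`, `ip_risk`, `device_risk`) is an assumption made for the example rather than the project's real schema.

```python
from statistics import mean, pstdev


def engineer_features(txn: dict, history: list) -> dict:
    """Derive behaviour, anomaly, and interaction features for one transaction.

    `history` should contain only the customer's transactions that happened
    before `txn`, otherwise the features leak future information.
    """
    amounts = [h["amount"] for h in history]
    avg_amount = mean(amounts) if amounts else 0.0
    std_amount = pstdev(amounts) if len(amounts) > 1 else 0.0
    avg_velocity = mean(h["txns_last_hour"] for h in history) if history else 0.0

    # Anomaly features: how far this transaction departs from the customer's norm.
    velocity_ratio = txn["txns_last_hour"] / avg_velocity if avg_velocity else 0.0
    amount_deviation = (txn["amount"] - avg_amount) / std_amount if std_amount else 0.0

    return {
        # Customer behaviour
        "txn_count": len(history),
        "avg_amount": avg_amount,
        "total_amount": sum(amounts),
        # Anomaly detection
        "velocity_ratio": velocity_ratio,
        "amount_deviation": amount_deviation,
        # Interaction features
        "ip_device_risk": txn["ip_risk"] * txn["device_risk"],
        "velocity_x_amount": txn["txns_last_hour"] * txn["amount"],
    }


history = [
    {"amount": 100, "txns_last_hour": 1},
    {"amount": 200, "txns_last_hour": 3},
]
txn = {"amount": 450, "txns_last_hour": 6, "ip_risk": 0.5, "device_risk": 0.5}
features = engineer_features(txn, history)
print(features)
```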

4. Model Training

- Models used: LightGBM, XGBoost
- Techniques tested: SMOTE, class weighting, undersampling
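
To illustrate two of the ideas used in training, the sketch below shows a time-based split (train on earlier transactions, validate on later ones) and a simple class-weight calculation. The `timestamp` and `is_fraud` field names and the 80/20 split are assumptions. The resulting ratio is a common starting value for the `scale_pos_weight` parameter that both LightGBM and XGBoost expose; it is not necessarily the setting used in this project.

```python
def time_based_split(rows: list, time_key: str = "timestamp", train_frac: float = 0.8):
    """Train on the earliest rows and validate on the latest ones,
    so that no future information leaks into training."""
    ordered = sorted(rows, key=lambda r: r[time_key])
    cut = int(len(ordered) * train_frac)
    return ordered[:cut], ordered[cut:]


def positive_class_weight(labels: list) -> float:
    """Ratio of legitimate to fraud examples in the training window."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0:
        raise ValueError("no fraud examples in the training window")
    return n_neg / n_pos


# Made-up rows: ten transactions, two of them fraudulent.
fraud_flags = [0, 0, 0, 1, 0, 0, 0, 0, 1, 0]
rows = [{"timestamp": t, "is_fraud": y} for t, y in enumerate(fraud_flags)]

train, valid = time_based_split(rows)
weight = positive_class_weight([r["is_fraud"] for r in train])
print(len(train), len(valid), weight)
```

Computing the weight from the training window only keeps the validation labels out of the training configuration.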

5. Explainability (SHAP)

Implemented SHAP for:

- Global feature importance
- Transaction-level explanations

Each prediction includes:

- Prediction (Fraud / Legitimate)
- Confidence score
- Risk level (Low / Medium / High)
- Top risk drivers
- Protective factors
- Business-friendly reason codes
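
The fields listed above could be assembled from a fraud probability and one row of signed feature contributions (for example, SHAP values for a single transaction). The sketch below assumes the contributions are already computed; the feature names, the reason-code mapping, and the thresholds are hypothetical.

```python
# Hypothetical mapping from feature names to analyst-facing reason codes.
REASON_CODES = {
    "risk_score": "High behavioural risk score",
    "account_age_days": "New account",
    "velocity_ratio": "High transaction velocity",
    "ip_device_risk": "Suspicious IP/device interaction",
}


def build_review(probability: float, contributions: dict,
                 threshold: float = 0.5, top_n: int = 3) -> dict:
    """Turn a fraud probability and per-feature contributions into a review record.

    `contributions` maps feature name -> signed contribution; positive values
    push the prediction towards fraud, negative values away from it.
    """
    ranked = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
    drivers = [name for name, value in ranked if value > 0][:top_n]
    protective = [name for name, value in ranked if value < 0][:top_n]
    is_fraud = probability >= threshold
    level = "High" if probability >= 0.7 else "Medium" if probability >= 0.3 else "Low"
    return {
        "prediction": "FRAUD" if is_fraud else "LEGITIMATE",
        "confidence": probability if is_fraud else 1 - probability,
        "risk_level": level,
        "top_risk_drivers": drivers,
        "protective_factors": protective,
        "reason_codes": [REASON_CODES.get(name, name) for name in drivers],
    }


review = build_review(
    0.92,
    {"risk_score": 1.4, "velocity_ratio": 0.9, "account_age_days": 0.6, "avg_amount": -0.3},
)
print(review)
```

Features without an entry in the mapping fall back to their raw name, so an unmapped driver still appears in the review instead of being silently dropped.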

🔍 Example Output

Prediction: FRAUD
Confidence: 100%

Top Risk Drivers:

- High behavioural risk score
- New account
- High transaction velocity
- Suspicious IP/device interaction

🖥️ Interactive Dashboard (Streamlit)

A user-friendly interface allows stakeholders to:

- Input transaction details
- View the fraud prediction instantly
- See:
  - Risk level (colour-coded)
  - Reason codes
  - Full fraud review summary

👉 Transforms model output into actionable insights

🏗️ Deployment

The model is production-ready and deployed using:

- FastAPI → REST API for predictions
- Docker → Containerized deployment

Project Structure

nova-fraud-api/
│
├── app/
│   ├── main.py
│   ├── utils.py
│
├── artifacts/
│   ├── lgb_model.pkl
│   ├── scaler.pkl
│   ├── encoders.pkl
│   ├── features.pkl
│   ├── shap_explainer.pkl
├── notebooks/
├── requirements.txt
├── Dockerfile

💡 Key Insights

- Feature engineering had the biggest impact on performance
- LightGBM handled imbalance effectively without heavy resampling
- Fraud detection requires balancing:
  - Recall (catch fraud)
  - Precision (avoid false alarms)
- Explainability is critical for:
  - Trust
  - Regulatory compliance
  - Analyst decision-making
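
The recall/precision trade-off can be made concrete with a small calculation. The confusion-matrix counts below are made up for illustration and are not results from this project; they show why accuracy alone says little on an imbalanced dataset.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Made-up counts: 1,000 transactions, 50 of them fraudulent.
always_legit = classification_metrics(tp=0, fp=0, fn=50, tn=950)  # never flags fraud
model = classification_metrics(tp=40, fp=10, fn=10, tn=940)       # flags 50, 40 correctly

print(always_legit)
print(model)
```

In this example a classifier that never flags fraud still scores 95% accuracy while catching nothing, which is why recall, precision, and F1 are the more informative metrics here.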

🚀 Future Improvements

- Real-time streaming fraud detection
- Threshold optimisation for business strategies
- Hybrid system (rules + ML)
- Model monitoring & drift detection
- Integration with transaction systems

👩‍💻 Author

Priscillia Ejiro

Data Scientist | Machine Learning | Fraud Analytics
