- Overview
- Features
- Architecture
- Pipeline Flow
- Tech Stack
- Getting Started
- Training
- MLflow UI
- API Server
- Dashboard
- API Usage
- Model Performance
- Docker
- Testing
- Future Roadmap
- Disclaimer
- Contributing
Loan Prediction System is a production-style, end-to-end machine learning application that predicts whether a loan application should be Approved or Rejected based on financial and demographic data.
The project demonstrates the complete ML lifecycle, from raw data ingestion and feature engineering, through model training and experiment tracking with MLflow, to a live REST API and an interactive Streamlit dashboard. It is built with MLOps best practices from the ground up.
⚠️ For educational purposes only. Not intended for real financial decision-making.
| Feature | Description |
|---|---|
| Full ML Pipeline | Data ingestion → preprocessing → training → evaluation → serving |
| Model Comparison | Random Forest vs XGBoost with automatic best-model selection |
| MLflow Tracking | Log parameters, metrics, artifacts; full experiment history & model versioning |
| FastAPI Backend | RESTful prediction endpoint with Pydantic schema validation |
| Streamlit Dashboard | Upload applicant data and get instant predictions via browser UI |
| Dockerized | Full Docker + Docker Compose setup for one-command production deployment |
| Test-Ready | pytest-compatible structure for unit and integration tests |
    loan-prediction-system/
    │
    ├── app/
    │   └── streamlit_app.py          # Streamlit prediction dashboard
    │
    ├── data/
    │   ├── raw/                      # Raw CSV dataset
    │   ├── data_loader.py            # Load & split data
    │   └── preprocess.py             # Feature engineering & encoding
    │
    ├── mlruns/                       # MLflow experiment tracking data
    ├── mlflow.db                     # MLflow SQLite backend
    │
    ├── models/
    │   ├── train_model.py            # RF & XGBoost training logic
    │   ├── predict.py                # Load model + run inference
    │   ├── preprocessor.pkl          # Saved preprocessing pipeline
    │   └── train_model.pkl           # Saved best model artifact
    │
    ├── pipelines/
    │   └── training_pipeline.py      # Orchestrates full training flow
    │
    ├── notebooks/
    │   └── eda.ipynb                 # Exploratory Data Analysis
    │
    ├── src/                          # Core API backend
    │   ├── api/                      # FastAPI route handlers
    │   ├── config/                   # App configuration & constants
    │   ├── schema/                   # Pydantic request/response models
    │   ├── utils/                    # Helper functions & logging
    │   └── main.py                   # API entrypoint
    │
    ├── tests/
    │   └── sent_data.py              # Test payload & assertions
    │
    ├── artifacts/                    # Additional saved outputs
    ├── Dockerfile
    ├── compose.yaml
    ├── requirements.txt
    ├── .gitignore
    └── README.md
    ┌──────────────────────────────────────────────┐
    │              RAW DATA INGESTION              │
    │   CSV → data_loader.py → Train/Test Split    │
    └─────────────────────┬────────────────────────┘
                          │
    ┌─────────────────────▼────────────────────────┐
    │             FEATURE ENGINEERING              │
    │ Encoding · Scaling · Imputation · Selection  │
    └─────────────────────┬────────────────────────┘
                          │
    ┌─────────────────────▼────────────────────────┐
    │         MODEL TRAINING & COMPARISON          │
    │           Random Forest vs XGBoost           │
    └──────────┬─────────────────────┬─────────────┘
               │                     │
    ┌──────────▼──────────┐ ┌────────▼─────────────┐
    │ Model Evaluation    │ │ MLflow Experiment    │
    │ Accuracy · F1 · AUC │ │ Params · Metrics ·   │
    │ Best Model Selected │ │ Artifacts · Versions │
    └──────────┬──────────┘ └──────────────────────┘
               │
    ┌──────────▼───────────────────────────────────┐
    │              SAVED MODEL (.pkl)              │
    │            models/train_model.pkl            │
    └──────────┬─────────────────────┬─────────────┘
               │                     │
    ┌──────────▼──────────┐ ┌────────▼─────────────┐
    │ FastAPI REST API    │ │ Streamlit Dashboard  │
    │ POST /predict       │ │ Form → Predict → Show│
    └─────────────────────┘ └──────────────────────┘
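The first stage of the flow above (CSV, then loading, then a train/test split) can be sketched with the standard library alone. This is a minimal illustration and not the project's `data_loader.py`: the function names `load_rows` and `train_test_split`, the 80/20 ratio, the seed, and the sample rows are all assumptions made for the example.

```python
import csv
import io
import random


def load_rows(csv_text):
    """Parse CSV text into a list of dicts, one per applicant row."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def train_test_split(rows, test_ratio=0.2, seed=42):
    """Shuffle a copy of the rows with a fixed seed, then cut off the test share."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_ratio))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical sample rows; the real dataset is expected in data/raw/.
SAMPLE_CSV = "\n".join([
    "income,loan_amount,credit_score,loan_status",
    "50000,200000,750,Approved",
    "32000,250000,580,Rejected",
    "78000,150000,690,Approved",
    "41000,300000,610,Rejected",
    "95000,120000,800,Approved",
])

rows = load_rows(SAMPLE_CSV)
train_rows, test_rows = train_test_split(rows)
print(len(train_rows), "training rows,", len(test_rows), "test rows")
```

Fixing the seed keeps the split reproducible between runs, which matters when comparing models on the same held-out rows.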
| Layer | Technology |
|---|---|
| ML Models | Random Forest · XGBoost |
| Data Processing | Pandas · NumPy · Scikit-learn |
| Experiment Tracking | MLflow (SQLite backend) |
| Backend API | FastAPI · Uvicorn · Pydantic |
| Frontend | Streamlit |
| Containerization | Docker · Docker Compose |
| Testing | pytest |
- Python 3.10+
- pip / conda
- Docker (optional, for containerized deployment)
    git clone https://github.com/Darshit02/loan-prediction-system.git
    cd loan-prediction-system

    # macOS / Linux
    python -m venv venv
    source venv/bin/activate

    # Windows
    python -m venv venv
    venv\Scripts\activate

    pip install -r requirements.txt

Place your raw dataset in the data/raw/ directory:

    data/
    └── raw/
        └── loan_data.csv    # Your training dataset

Recommended: Kaggle Loan Prediction Dataset
    python pipelines/training_pipeline.py

This will:
- Load and clean data from data/raw/
- Run feature engineering and preprocessing
- Train Random Forest and XGBoost models
- Compare performance and select the best model
- Log all experiments, metrics, and artifacts to MLflow
- Save the winning model to models/train_model.pkl
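
The "compare performance and select the best model" step could look roughly like the sketch below. The README does not say which metric drives the selection, so ranking by F1 with AUC as a tie-breaker is an assumption, as are the function name `select_best_model` and the metric keys. The numbers are placeholders that loosely mirror the approximate figures reported in this README.

```python
def select_best_model(results, primary="f1", tie_breaker="auc"):
    """Return the name of the model with the highest primary metric.

    Ties on the primary metric are broken by the tie-breaker metric.
    """
    if not results:
        raise ValueError("no candidate models to compare")
    return max(
        results,
        key=lambda name: (results[name][primary], results[name][tie_breaker]),
    )


# Placeholder metrics for illustration only, not measured results.
candidate_metrics = {
    "RandomForest": {"f1": 0.83, "auc": 0.91},
    "XGBoost": {"f1": 0.86, "auc": 0.93},
}

best = select_best_model(candidate_metrics)
print("Selected model:", best)
```

Keeping the selection rule in one small function makes it easy to log the chosen metric alongside the run, so the choice can be traced later.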
    mlflow ui

Open your browser at http://127.0.0.1:5000

You'll see:
- All experiment runs with parameters and metrics
- Side-by-side model comparison charts
- Saved model artifacts per run
- Model versioning history
    uvicorn src.main:app --reload --host 0.0.0.0 --port 8000

Interactive API docs are available at:
- Swagger UI: http://localhost:8000/docs
- ReDoc: http://localhost:8000/redoc
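
Once the server is running, the prediction endpoint can also be called from Python. The sketch below uses only the standard library; the payload and response field names come from the example request and response in this README, while the helper names `build_predict_request` and `parse_prediction` are hypothetical. The network call itself is left commented out because it needs a live server.

```python
import json
import urllib.request


def build_predict_request(payload, base_url="http://localhost:8000"):
    """Build (but do not send) a JSON POST request for the /predict endpoint."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_prediction(raw_body):
    """Decode the JSON response and return (loan_status, confidence)."""
    result = json.loads(raw_body)
    return result["loan_status"], float(result["confidence"])


payload = {
    "income": 50000,
    "loan_amount": 200000,
    "credit_score": 750,
    "employment_status": "Salaried",
}
request = build_predict_request(payload)

# Sending requires the API server to be running, so it is left commented out:
# with urllib.request.urlopen(request, timeout=10) as response:
#     status, confidence = parse_prediction(response.read())
#     print(status, confidence)
```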
    streamlit run app/streamlit_app.py

Open your browser at http://localhost:8501

Features:
- Fill in applicant details via form inputs
- Instant prediction with confidence score
- Feature importance visualization
- Model performance summary panel
POST /predict

Request body:

    {
      "income": 50000,
      "loan_amount": 200000,
      "credit_score": 750,
      "employment_status": "Salaried"
    }

Response:

    {
      "loan_status": "Approved",
      "confidence": 0.87,
      "model_used": "XGBoost"
    }

Example with curl:

    curl -X POST "http://localhost:8000/predict" \
      -H "Content-Type: application/json" \
      -d '{"income": 50000, "loan_amount": 200000, "credit_score": 750, "employment_status": "Salaried"}'

Models are evaluated on a held-out test set. The best-performing model is automatically selected by the training pipeline.
| Metric | Random Forest | XGBoost |
|---|---|---|
| Accuracy | ~85% | ~88% |
| Precision | ~84% | ~87% |
| Recall | ~83% | ~86% |
| F1-Score | ~83% | ~86% |
| AUC-ROC | ~0.91 | ~0.93 |
Metrics vary based on dataset, hyperparameters, and train/test split. All runs are logged in MLflow for full reproducibility.
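
For readers who want to see how the threshold-based metrics in the table are defined, here is a small standard-library sketch that derives accuracy, precision, recall, and F1 from true and predicted labels. The function name `classification_metrics`, the choice of "Approved" as the positive class, and the sample labels are assumptions for illustration; AUC-ROC is omitted because it needs predicted scores rather than hard labels.

```python
def classification_metrics(y_true, y_pred, positive="Approved"):
    """Compute accuracy, precision, recall, and F1 for a binary label set."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    # Guard every ratio against an empty denominator.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical labels, not taken from the project's test set.
y_true = ["Approved", "Approved", "Approved", "Rejected",
          "Rejected", "Approved", "Rejected", "Approved"]
y_pred = ["Approved", "Approved", "Rejected", "Rejected",
          "Approved", "Approved", "Rejected", "Approved"]

metrics = classification_metrics(y_true, y_pred)
print(metrics)
```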
Build and run the image:

    docker build -t loan-prediction .
    docker run -p 8000:8000 loan-prediction

Or start the full stack with Docker Compose:

    docker-compose up --build

This spins up:
- api service: FastAPI on port 8000
- dashboard service: Streamlit on port 8501
- mlflow service: MLflow UI on port 5000
    # Run all tests
    pytest tests/

    # With coverage report (requires the pytest-cov plugin)
    pytest tests/ --cov=src --cov-report=term-missing

Test coverage includes:
- tests/sent_data.py: API payload validation and response assertions
- Preprocessing pipeline correctness
- Model loading and inference checks
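
As a rough idea of what a payload validation test might look like, the sketch below pairs a small validator with two pytest-style test functions. It is not the content of `tests/sent_data.py`: the helper `validate_payload`, the `REQUIRED_FIELDS` mapping, and the rule that numeric fields must be non-negative are assumptions, and the plain-Python validator merely stands in for the project's Pydantic schema.

```python
# Expected fields and their accepted Python types (assumed from the README's example request).
REQUIRED_FIELDS = {
    "income": (int, float),
    "loan_amount": (int, float),
    "credit_score": (int, float),
    "employment_status": (str,),
}


def validate_payload(payload):
    """Return a list of problems; an empty list means the payload looks valid."""
    errors = []
    for field, types in REQUIRED_FIELDS.items():
        if field not in payload:
            errors.append(f"missing field: {field}")
            continue
        value = payload[field]
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, types):
            errors.append(f"wrong type for {field}")
        elif not isinstance(value, str) and value < 0:
            errors.append(f"negative value for {field}")
    return errors


def test_valid_payload_passes():
    payload = {
        "income": 50000,
        "loan_amount": 200000,
        "credit_score": 750,
        "employment_status": "Salaried",
    }
    assert validate_payload(payload) == []


def test_bad_payload_is_reported():
    errors = validate_payload({"income": -1, "loan_amount": "a lot"})
    assert "negative value for income" in errors
    assert "wrong type for loan_amount" in errors
    assert "missing field: credit_score" in errors
```

By default pytest only collects files named `test_*.py` or `*_test.py`, so a file like this would need a matching name (for example a hypothetical `tests/test_payload.py`) to be picked up by `pytest tests/`.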
- Feature importance visualization: SHAP values in Streamlit
- AutoML integration: Optuna / FLAML hyperparameter search
- Feature store: Feast or Hopsworks integration
- Data validation: Great Expectations for schema & drift checks
- CI/CD pipeline: GitHub Actions for automated test + deploy
- Automated retraining: scheduled model refresh on new data
- MLflow Model Registry: staging → production promotion workflow
- Drift monitoring: Evidently AI or Whylogs integration
- Cloud deployment: AWS SageMaker / GCP Vertex AI
This project is developed strictly for educational and research purposes.
It is not validated for real financial decision-making and must not be used to approve or deny actual loan applications.
Always consult a licensed financial professional for credit-related decisions.
Contributions are welcome! Here's how:
    # 1. Fork the repository
    # 2. Create a feature branch
    git checkout -b feature/your-feature-name
    # 3. Commit your changes
    git commit -m "feat: add your feature"
    # 4. Push to your fork
    git push origin feature/your-feature-name
    # 5. Open a Pull Request

Please follow Conventional Commits for commit messages.
This project is licensed under the MIT License.
See the LICENSE file for details.
Made with ❤️ for the ML & MLOps community
⭐ Star this repo if you found it useful!