This project demonstrates an end-to-end MLOps pipeline for deploying a machine learning model, using the Iris dataset for flower species classification. It covers data preparation, experiment tracking, model serving, containerization, CI/CD, and monitoring.
The pipeline implements a comprehensive MLOps workflow with the following components:
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   Data Layer    │    │   ML Pipeline   │    │   Deployment    │
│                 │    │                 │    │                 │
│ • Iris Dataset  │───▶│ • Data Prep     │───▶│ • FastAPI       │
│ • Preprocessing │    │ • Model Training│    │ • Docker        │
│ • Validation    │    │ • MLflow Track  │    │ • CI/CD         │
└─────────────────┘    └─────────────────┘    └─────────────────┘
                                                       │
                                              ┌─────────────────┐
                                              │   Monitoring    │
                                              │                 │
                                              │ • SQLite Logs   │
                                              │ • Metrics API   │
                                              │ • Health Checks │
                                              └─────────────────┘
- Dataset: Iris flower classification (150 samples, 4 features, 3 classes)
- Preprocessing: Feature scaling, train/test split with stratification
- Validation: Data quality checks and reproducible preprocessing
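The preprocessing steps listed above could look roughly like the following sketch. The function name `prepare_iris`, the 80/20 split, and the fixed seed are assumptions for illustration, not values taken from the project code.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


def prepare_iris(test_size=0.2, random_state=42):
    """Load Iris, split with stratification, and scale the features.

    test_size and random_state are assumed values, not the project's settings.
    """
    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, stratify=y, random_state=random_state
    )
    # Fit the scaler on the training split only so that no test-set
    # statistics leak into preprocessing.
    scaler = StandardScaler().fit(X_train)
    return scaler.transform(X_train), scaler.transform(X_test), y_train, y_test, scaler


X_train, X_test, y_train, y_test, scaler = prepare_iris()
print(X_train.shape, X_test.shape)
```

Fixing the seed keeps the split reproducible, and returning the fitted scaler lets the same transformation be applied to incoming requests at prediction time.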
- Models Trained:
  - Logistic Regression (Accuracy: 93.33%)
  - Random Forest (Accuracy: 90.00%)
- MLflow Integration: Complete experiment tracking with metrics, parameters, and model registry
- Model Selection: Automated selection of best-performing model
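Automated model selection can be as simple as comparing one metric across candidates. The sketch below assumes accuracy is the deciding metric; the helper `select_best_model` and the dictionary layout are hypothetical.

```python
def select_best_model(results, metric="accuracy"):
    """Return the name of the model with the highest value for `metric`."""
    if not results:
        raise ValueError("no model results to compare")
    return max(results, key=lambda name: results[name][metric])


# Accuracies as reported in this README.
results = {
    "logistic_regression": {"accuracy": 0.9333},
    "random_forest": {"accuracy": 0.9000},
}
best = select_best_model(results)
print(best)  # logistic_regression
```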
- Framework: FastAPI with automatic OpenAPI documentation
- Endpoints:
  - /predict - Single prediction
  - /predict/batch - Batch predictions
  - /health - Health monitoring
  - /metrics - Performance metrics
  - /docs - Interactive API documentation
- Validation: Pydantic models for request/response validation
- Error Handling: Comprehensive error handling and logging
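As an illustration of request validation, a Pydantic model for a single prediction might look like the sketch below. The field names match the request example in this README; the class name `IrisFeatures` and the positive-value constraint are assumptions.

```python
from pydantic import BaseModel, Field, ValidationError


class IrisFeatures(BaseModel):
    """Request body for one prediction; measurements are in centimetres."""

    # The gt=0 constraint is an assumed rule, not taken from the project.
    sepal_length: float = Field(..., gt=0)
    sepal_width: float = Field(..., gt=0)
    petal_length: float = Field(..., gt=0)
    petal_width: float = Field(..., gt=0)


valid = IrisFeatures(sepal_length=5.1, sepal_width=3.5, petal_length=1.4, petal_width=0.2)

try:
    # A negative measurement violates the gt=0 constraint.
    IrisFeatures(sepal_length=-1, sepal_width=3.5, petal_length=1.4, petal_width=0.2)
except ValidationError as exc:
    print(f"rejected with {len(exc.errors())} validation error(s)")
```

When such a model is used as a FastAPI endpoint parameter, invalid payloads are rejected with a 422 response before the handler runs.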
- Docker: Multi-stage build for optimal image size
- Security: Non-root user, minimal base image
- Health Checks: Built-in container health monitoring
- Deployment Script: Automated deployment with health verification
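Health verification after deployment usually means polling the health endpoint until it responds. The project does this in `deploy.sh`; the Python sketch below shows the same idea, with hypothetical helper names and assumed retry settings.

```python
import time
import urllib.error
import urllib.request


def is_healthy(url, timeout=2.0):
    """Return True if a GET request to `url` answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx status all count as unhealthy.
        return False


def wait_until_healthy(check, retries=10, delay=3.0, sleep=time.sleep):
    """Call `check()` up to `retries` times, pausing `delay` seconds between tries.

    Returns the attempt number that succeeded, or None if every attempt failed.
    """
    for attempt in range(1, retries + 1):
        if check():
            return attempt
        if attempt < retries:
            sleep(delay)
    return None
```

A deployment step could call `wait_until_healthy(lambda: is_healthy("http://localhost:8000/health"))` and fail the rollout when it returns `None`.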
- GitHub Actions: Automated testing, building, and deployment
- Testing: Unit tests, integration tests, and API tests
- Code Quality: Linting with flake8, formatting with black
- Security: Container vulnerability scanning with Trivy
- Deployment: Multi-environment deployment (staging/production)
- Logging: SQLite-based prediction logging
- Metrics: Request count, processing time, model confidence
- Health Monitoring: API health checks and database connectivity
- Audit Trail: Complete request/response logging for compliance
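A minimal sketch of SQLite-based prediction logging and the metrics derived from it follows. The table name, columns, and function names are assumptions; the project's actual schema may differ.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical schema for the prediction log.
SCHEMA = """
CREATE TABLE IF NOT EXISTS predictions (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    timestamp TEXT NOT NULL,
    prediction TEXT NOT NULL,
    confidence REAL NOT NULL,
    processing_time_ms REAL NOT NULL
)
"""


def open_log(path=":memory:"):
    """Open (or create) the prediction log database."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def log_prediction(conn, prediction, confidence, processing_time_ms):
    """Append one prediction record to the log."""
    conn.execute(
        "INSERT INTO predictions (timestamp, prediction, confidence, processing_time_ms) "
        "VALUES (?, ?, ?, ?)",
        (datetime.now(timezone.utc).isoformat(), prediction, confidence, processing_time_ms),
    )
    conn.commit()


def summarize(conn):
    """Return request count, mean processing time, and mean confidence."""
    count, avg_ms, avg_conf = conn.execute(
        "SELECT COUNT(*), AVG(processing_time_ms), AVG(confidence) FROM predictions"
    ).fetchone()
    return {
        "request_count": count,
        "avg_processing_time_ms": avg_ms,
        "avg_confidence": avg_conf,
    }


conn = open_log()
log_prediction(conn, "setosa", 0.98, 1.35)
log_prediction(conn, "virginica", 0.90, 1.65)
print(summarize(conn))
```

Parameterized queries keep logged values from being interpreted as SQL, and the averages are `None` when the table is empty.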
| Model | Accuracy | Precision | Recall | F1-Score |
|---|---|---|---|---|
| Logistic Regression | 93.33% | 93.33% | 93.33% | 93.33% |
| Random Forest | 90.00% | 90.24% | 90.00% | 89.97% |
Best model: Logistic Regression, selected because it scored higher on every reported metric (for example, 93.33% vs. 90.00% accuracy and 93.33% vs. 89.97% F1-score).
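For readers unfamiliar with the columns in the table above, the sketch below computes the four metrics from true and predicted labels. It assumes macro averaging, which this README does not state, and uses toy labels rather than the project's test set.

```python
def macro_metrics(y_true, y_pred):
    """Compute accuracy and macro-averaged precision, recall, and F1."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    return {
        "accuracy": sum(t == p for t, p in pairs) / len(pairs),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Toy labels for illustration only.
y_true = ["setosa", "setosa", "versicolor", "versicolor", "virginica", "virginica"]
y_pred = ["setosa", "setosa", "versicolor", "virginica", "virginica", "virginica"]
print(macro_metrics(y_true, y_pred))
```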
| Component | Technology | Purpose |
|---|---|---|
| ML Framework | scikit-learn | Model training and prediction |
| Experiment Tracking | MLflow | Model versioning and metrics |
| API Framework | FastAPI | REST API development |
| Validation | Pydantic | Data validation and serialization |
| Database | SQLite | Prediction logging and monitoring |
| Containerization | Docker | Application packaging |
| CI/CD | GitHub Actions | Automated deployment pipeline |
| Testing | pytest | Unit and integration testing |
| Code Quality | flake8, black | Linting and formatting |
# Clone the repository
git clone <repository-url>
cd MLOps-project
# Create virtual environment
python -m venv mlops_env
source mlops_env/bin/activate # Linux/Mac
# mlops_env\Scripts\activate # Windows
# Install dependencies
pip install -r requirements.txt
# Train models
python src/models/train.py
# Start API
uvicorn src.api.main:app --reload
# View MLflow UI
mlflow ui
# Build and deploy using the deployment script
./deploy.sh
# Or manually
docker build -t iris-mlops-pipeline .
docker run -d -p 8000:8000 --name iris-api iris-mlops-pipeline
# Run all tests
pytest tests/ -v
# Test API endpoint
curl -X POST "http://localhost:8000/predict" \
-H "Content-Type: application/json" \
-d '{
"sepal_length": 5.1,
"sepal_width": 3.5,
"petal_length": 1.4,
"petal_width": 0.2
}'
MLOps-project/
├── src/
│ ├── data/ # Data processing modules
│ ├── models/ # Model training modules
│ ├── api/ # FastAPI application
│ └── monitoring/ # Logging and monitoring
├── tests/ # Unit and integration tests
├── .github/workflows/ # CI/CD pipeline
├── data/ # Processed data storage
├── mlruns/ # MLflow experiment tracking
├── monitoring/ # Database and logs
├── Dockerfile # Container configuration
├── requirements.txt # Python dependencies
├── deploy.sh # Deployment script
└── README.md # Project documentation
- GET / - API information
- GET /health - Health check
- GET /docs - Interactive API documentation
- GET /metrics - System metrics
- GET /model/info - Model information
- POST /predict - Single flower prediction
- POST /predict/batch - Batch prediction
- GET /predictions/history - Prediction history
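The POST /predict endpoint listed above can also be called from Python using only the standard library. This is a sketch that assumes the API runs on `http://localhost:8000`; the helper names are hypothetical.

```python
import json
import urllib.request


def build_predict_request(features, base_url="http://localhost:8000"):
    """Build a POST request for /predict with a JSON body."""
    return urllib.request.Request(
        f"{base_url}/predict",
        data=json.dumps(features).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(features, base_url="http://localhost:8000", timeout=5.0):
    """Send the request and return the decoded JSON response."""
    request = build_predict_request(features, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


sample = {"sepal_length": 5.1, "sepal_width": 3.5, "petal_length": 1.4, "petal_width": 0.2}
request = build_predict_request(sample)
print(request.get_method(), request.full_url)  # POST http://localhost:8000/predict
```

With the API running, `predict(sample)` returns the response body as a dictionary.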
// Request
{
"sepal_length": 5.1,
"sepal_width": 3.5,
"petal_length": 1.4,
"petal_width": 0.2
}
// Response
{
"predictions": ["setosa"],
"probabilities": [[0.98, 0.02, 0.0]],
"model_name": "iris_logistic_regression",
"model_version": "1",
"processing_time_ms": 1.35,
"timestamp": "2024-01-15T10:30:00.123456"
}
The system provides comprehensive monitoring through:
- Health Monitoring: API health, database connectivity, model status
- Performance Metrics: Processing time, throughput, error rates
- Business Metrics: Prediction confidence, class distribution
- Audit Logging: Complete request/response tracking
Access monitoring:
- Health: GET /health
- Metrics: GET /metrics
- History: GET /predictions/history
The GitHub Actions pipeline includes:
- Code Quality: Linting, formatting, import sorting
- Testing: Unit tests, integration tests, API tests
- Security: Dependency scanning, container vulnerability scanning
- Build: Docker image creation and registry push
- Deploy: Multi-environment deployment with health checks
- Input validation with Pydantic
- Container security best practices
- Vulnerability scanning with Trivy
- Non-root container execution
- Secure secrets management
- Request/response logging for audit
- Data Versioning: Reproducible preprocessing and data splits
- Model Versioning: MLflow model registry and tracking
- Code Versioning: Git with proper branching strategy
- Automated Testing: Comprehensive test suite
- Continuous Integration: Automated build and test pipeline
- Continuous Deployment: Automated deployment pipeline
- Monitoring: Request logging and performance metrics
- Documentation: API docs and project documentation
- Containerization: Docker for consistent deployments
- Infrastructure as Code: Docker and deployment scripts
- Production-Ready API: Scalable FastAPI service with comprehensive documentation
- Robust CI/CD: Complete automation from code commit to deployment
- Comprehensive Monitoring: Full observability stack with metrics and logging
- Security First: Multiple security layers and vulnerability scanning
- High Code Quality: 100% test coverage with automated quality checks
- Docker Optimization: Multi-stage builds for efficient containers
- MLflow Integration: Complete experiment tracking and model management
- Model Retraining: Automated retraining on new data
- A/B Testing: Model comparison in production
- Advanced Monitoring: Prometheus/Grafana integration
- Kubernetes: Container orchestration for scaling
- Feature Store: Centralized feature management
- Data Drift Detection: Automated data quality monitoring