MLOps Pipeline for Iris Classification - Project Summary

📋 Overview

This project demonstrates an end-to-end MLOps pipeline for training, deploying, and monitoring a machine learning model that classifies Iris flower species. It covers data preparation, experiment tracking, a REST API, containerization, CI/CD, and prediction monitoring, following common industry practices for production ML systems.

🏗️ Architecture

The pipeline is organized into four components:

┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   Data Layer    │    │   ML Pipeline   │    │   Deployment    │
│                 │    │                 │    │                 │
│ • Iris Dataset  │───▶│ • Data Prep     │───▶│ • FastAPI       │
│ • Preprocessing │    │ • Model Training│    │ • Docker        │
│ • Validation    │    │ • MLflow Track  │    │ • CI/CD         │
└─────────────────┘    └─────────────────┘    └─────────────────┘
                                 │
                       ┌─────────────────┐
                       │   Monitoring    │
                       │                 │
                       │ • SQLite Logs   │
                       │ • Metrics API   │
                       │ • Health Checks │
                       └─────────────────┘

🎯 Key Features

1. Data Management

Dataset: Iris flower classification (150 samples, 4 features, 3 classes)
Preprocessing: Feature scaling, train/test split with stratification
Validation: Data quality checks and reproducible preprocessing
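The stratified split and feature scaling described above can be sketched in plain Python. This is a minimal illustration, not the code in `src/data/`. The function names are hypothetical, features are assumed to be lists of floats, and the 20% test share and seed are assumed values. In scikit-learn the same steps are `train_test_split(X, y, test_size=0.2, stratify=y, random_state=42)` followed by `StandardScaler`.

```python
import random
from collections import defaultdict


def stratified_split(X, y, test_size=0.2, seed=42):
    """Hypothetical helper: split rows so each class keeps the same share in train and test."""
    rng = random.Random(seed)  # fixed seed makes the split reproducible
    by_class = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)

    train_idx, test_idx = [], []
    for label in sorted(by_class):
        idx = list(by_class[label])
        rng.shuffle(idx)
        n_test = round(len(idx) * test_size)  # per-class test count
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])

    X_train = [X[i] for i in train_idx]
    X_test = [X[i] for i in test_idx]
    y_train = [y[i] for i in train_idx]
    y_test = [y[i] for i in test_idx]
    return X_train, X_test, y_train, y_test


def fit_standard_scaler(rows):
    """Per-feature mean and population std, computed on training rows only."""
    n = len(rows)
    columns = list(zip(*rows))
    means = [sum(col) / n for col in columns]
    # Fall back to 1.0 for constant features to avoid dividing by zero.
    stds = [(sum((v - m) ** 2 for v in col) / n) ** 0.5 or 1.0
            for col, m in zip(columns, means)]
    return means, stds


def transform(rows, means, stds):
    """Standardize rows with statistics learned from the training split."""
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]
```

The scaler is fitted on the training split only. Its statistics are then reused for the test split and for API requests, which keeps test data from leaking into preprocessing.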

2. Model Development & Tracking

Models Trained:
- Logistic Regression (Accuracy: 93.33%)
- Random Forest (Accuracy: 90.00%)
MLflow Integration: Complete experiment tracking with metrics, parameters, and model registry
Model Selection: Automated selection of the best-performing model
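Automated selection can be as simple as ranking candidate runs by a metric. The sketch below assumes accuracy is the primary criterion, with F1 as a tie-breaker. The summary does not say which metric the pipeline actually uses, and `RunResult` and `select_best` are hypothetical names. With MLflow, the candidate runs would typically come from `mlflow.search_runs(order_by=["metrics.accuracy DESC"])`, assuming the runs log a metric named `accuracy`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunResult:
    """Hypothetical record of one training run's evaluation metrics."""
    model_name: str
    accuracy: float
    f1: float


def select_best(runs):
    """Return the run with the highest accuracy, breaking ties on F1."""
    if not runs:
        raise ValueError("no candidate runs to select from")
    return max(runs, key=lambda r: (r.accuracy, r.f1))


# Figures taken from the Model Performance table in this summary.
runs = [
    RunResult("logistic_regression", accuracy=0.9333, f1=0.9333),
    RunResult("random_forest", accuracy=0.9000, f1=0.8997),
]
best = select_best(runs)
print(best.model_name)  # logistic_regression
```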

3. Production API

Framework: FastAPI with automatic OpenAPI documentation
Endpoints:
- /predict - Single prediction
- /predict/batch - Batch predictions
- /health - Health monitoring
- /metrics - Performance metrics
- /docs - Interactive API documentation
Validation: Pydantic models for request/response validation
Error Handling: Comprehensive error handling and logging
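The `/predict` request schema can be illustrated with a dependency-free stand-in for the Pydantic model. This sketch uses a standard-library `dataclass` in place of Pydantic. Two of its rules are assumptions, not documented behavior of this API: the positivity constraint, and the rejection of unknown fields. In Pydantic, the positivity rule would usually be expressed as `float` fields with `Field(gt=0)`.

```python
import math
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class IrisFeatures:
    """Hypothetical /predict request body; measurements in centimeters."""
    sepal_length: float
    sepal_width: float
    petal_length: float
    petal_width: float

    def __post_init__(self):
        for f in fields(self):
            value = getattr(self, f.name)
            # bool is a subclass of int, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                raise TypeError(f"{f.name} must be a number, got {type(value).__name__}")
            # Assumed constraint: measurements must be positive and finite.
            if not math.isfinite(value) or value <= 0:
                raise ValueError(f"{f.name} must be a positive finite number, got {value}")


def parse_request(payload: dict) -> IrisFeatures:
    """Validate a decoded JSON body and reject missing or unexpected fields."""
    expected = {f.name for f in fields(IrisFeatures)}
    provided = set(payload)
    missing, extra = expected - provided, provided - expected
    if missing or extra:
        raise ValueError(f"missing fields: {sorted(missing)}, unexpected fields: {sorted(extra)}")
    return IrisFeatures(**payload)
```

Unlike Pydantic's default lax mode, this sketch does not coerce numeric strings such as `"5.1"` into floats. It rejects them instead.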

4. Containerization

Docker: Multi-stage build to keep the final image small
Security: Non-root user, minimal base image
Health Checks: Built-in container health monitoring
Deployment Script: Automated deployment with health verification
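The health-verification step in a deployment script amounts to polling `/health` until the container answers. Below is a standard-library sketch in Python. The function name, timeout, and polling interval are assumptions, and `deploy.sh` may implement this differently, for example with `curl` in a loop.

```python
import http.client
import time
import urllib.request


def wait_for_healthy(url: str, timeout_s: float = 30.0, interval_s: float = 1.0) -> bool:
    """Poll `url` until it returns HTTP 200 or `timeout_s` seconds elapse."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval_s) as resp:
                if resp.status == 200:
                    return True
        except (OSError, http.client.HTTPException):
            # URLError and HTTPError subclass OSError: refused connections,
            # timeouts, and 4xx/5xx responses all mean "not healthy yet".
            pass
        time.sleep(interval_s)
    return False
```

After `docker run`, a script could call `wait_for_healthy("http://localhost:8000/health")` and exit with a non-zero status if it returns `False`, so a failed deployment stops the pipeline.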

5. CI/CD Pipeline

GitHub Actions: Automated testing, building, and deployment
Testing: Unit tests, integration tests, and API tests
Code Quality: Linting with flake8, formatting with black
Security: Container vulnerability scanning with Trivy
Deployment: Multi-environment deployment (staging/production)

6. Monitoring & Observability

Logging: SQLite-based prediction logging
Metrics: Request count, processing time, model confidence
Health Monitoring: API health checks and database connectivity
Audit Trail: Complete request/response logging for compliance
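SQLite-based prediction logging of this kind can be sketched with Python's built-in `sqlite3` module. The table layout, column names, and helper functions below are hypothetical, not the schema in `src/monitoring/`. The summary query computes four figures: request count, average processing time, average confidence, and class distribution. Confidence is taken here as the top class probability, which is an assumption.

```python
import json
import sqlite3
from datetime import datetime, timezone

# Hypothetical schema for the prediction audit log.
SCHEMA = """
CREATE TABLE IF NOT EXISTS predictions (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    timestamp TEXT NOT NULL,
    model_version TEXT NOT NULL,
    features TEXT NOT NULL,          -- JSON-encoded request body
    prediction TEXT NOT NULL,
    confidence REAL NOT NULL,        -- highest class probability
    processing_time_ms REAL NOT NULL
)
"""


def connect(path: str) -> sqlite3.Connection:
    """Open the log database and create the table if it does not exist."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def log_prediction(conn, features, prediction, confidence, processing_time_ms, model_version="1"):
    """Append one request/response pair; placeholders keep input out of the SQL text."""
    conn.execute(
        "INSERT INTO predictions (timestamp, model_version, features, prediction,"
        " confidence, processing_time_ms) VALUES (?, ?, ?, ?, ?, ?)",
        (datetime.now(timezone.utc).isoformat(), model_version, json.dumps(features),
         prediction, confidence, processing_time_ms),
    )
    conn.commit()


def summary_metrics(conn) -> dict:
    """Aggregate the log into the kind of figures a /metrics endpoint could return."""
    count, avg_ms, avg_conf = conn.execute(
        "SELECT COUNT(*), AVG(processing_time_ms), AVG(confidence) FROM predictions"
    ).fetchone()
    distribution = dict(conn.execute(
        "SELECT prediction, COUNT(*) FROM predictions GROUP BY prediction"
    ).fetchall())
    return {
        "request_count": count,
        "avg_processing_time_ms": avg_ms,
        "avg_confidence": avg_conf,
        "class_distribution": distribution,
    }
```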

📊 Model Performance

Model                 Accuracy   Precision   Recall   F1-Score
Logistic Regression   93.33%     93.33%      93.33%   93.33%
Random Forest         90.00%     90.24%      90.00%   89.97%

Best model: Logistic Regression, selected because it scores higher than Random Forest on all four metrics
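For reference, the four metrics in the table can be computed as follows. The summary does not say how per-class scores were averaged. This sketch assumes support-weighted averaging, the behavior of scikit-learn's `average="weighted"`, under which recall always equals accuracy. `classification_metrics` is a hypothetical helper; in the project these figures would normally come from `sklearn.metrics`.

```python
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Accuracy plus support-weighted precision, recall and F1 (zero when undefined)."""
    labels = sorted(set(y_true) | set(y_pred))
    n = len(y_true)
    support = Counter(y_true)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n

    precision = recall = f1 = 0.0
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        predicted = sum(p == label for p in y_pred)
        actual = support[label]
        prec = tp / predicted if predicted else 0.0
        rec = tp / actual if actual else 0.0
        f = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        weight = actual / n  # weight each class by its share of true labels
        precision += weight * prec
        recall += weight * rec
        f1 += weight * f
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```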

🛠️ Technical Stack

Component             Technology       Purpose
ML Framework          scikit-learn     Model training and prediction
Experiment Tracking   MLflow           Model versioning and metrics
API Framework         FastAPI          REST API development
Validation            Pydantic         Data validation and serialization
Database              SQLite           Prediction logging and monitoring
Containerization      Docker           Application packaging
CI/CD                 GitHub Actions   Automated deployment pipeline
Testing               pytest           Unit and integration testing
Code Quality          flake8, black    Linting and formatting

🚀 Quick Start

Local Development

# Clone the repository
git clone <repository-url>
cd MLOps-project

# Create virtual environment
python -m venv mlops_env
source mlops_env/bin/activate  # Linux/Mac
# mlops_env\Scripts\activate  # Windows

# Install dependencies
pip install -r requirements.txt

# Train models
python src/models/train.py

# Start API
uvicorn src.api.main:app --reload

# View MLflow UI (served at http://127.0.0.1:5000 by default)
mlflow ui

Docker Deployment

# Build and deploy using the deployment script
./deploy.sh

# Or manually
docker build -t iris-mlops-pipeline .
docker run -d -p 8000:8000 --name iris-api iris-mlops-pipeline

Testing

# Run all tests
pytest tests/ -v

# Test API endpoint
curl -X POST "http://localhost:8000/predict" \
  -H "Content-Type: application/json" \
  -d '{
    "sepal_length": 5.1,
    "sepal_width": 3.5,
    "petal_length": 1.4,
    "petal_width": 0.2
  }'
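The same `/predict` call can be made from Python using only the standard library, which is handy in integration tests or scripts. The helper names are assumptions. The default base URL matches the `docker run -p 8000:8000` command above.

```python
import json
import urllib.request


def build_predict_request(features: dict, base_url: str = "http://localhost:8000") -> urllib.request.Request:
    """Build the same POST /predict call as the curl example above."""
    body = json.dumps(features).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(features: dict, base_url: str = "http://localhost:8000", timeout: float = 5.0) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(build_predict_request(features, base_url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

`predict({...})` returns the decoded JSON response. If the API rejects the body (FastAPI answers invalid input with HTTP 422), `urlopen` raises `urllib.error.HTTPError`.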

📁 Project Structure

MLOps-project/
├── src/
│   ├── data/                 # Data processing modules
│   ├── models/               # Model training modules
│   ├── api/                  # FastAPI application
│   └── monitoring/           # Logging and monitoring
├── tests/                    # Unit and integration tests
├── .github/workflows/        # CI/CD pipeline
├── data/                     # Processed data storage
├── mlruns/                   # MLflow experiment tracking
├── monitoring/               # Database and logs
├── Dockerfile                # Container configuration
├── requirements.txt          # Python dependencies
├── deploy.sh                 # Deployment script
└── README.md                 # Project documentation

🔍 API Endpoints

Core Endpoints

GET / - API information
GET /health - Health check
GET /docs - Interactive API documentation
GET /metrics - System metrics
GET /model/info - Model information

Prediction Endpoints

POST /predict - Single flower prediction
POST /predict/batch - Batch prediction
GET /predictions/history - Prediction history

Example Request/Response

// Request
{
  "sepal_length": 5.1,
  "sepal_width": 3.5,
  "petal_length": 1.4,
  "petal_width": 0.2
}

// Response
{
  "predictions": ["setosa"],
  "probabilities": [[0.98, 0.02, 0.0]],
  "model_name": "iris_logistic_regression",
  "model_version": "1",
  "processing_time_ms": 1.35,
  "timestamp": "2024-01-15T10:30:00.123456"
}
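The response fields can be assembled from the model's class-probability rows, taking the class with the highest probability in each row as the prediction. This is a sketch. The class order `setosa`, `versicolor`, `virginica` matches scikit-learn's `load_iris().target_names`, but `build_response` and its parameters are hypothetical.

```python
import time
from datetime import datetime

# Label order used by scikit-learn's bundled Iris dataset (targets 0, 1, 2).
CLASS_NAMES = ["setosa", "versicolor", "virginica"]


def build_response(probabilities, model_name, model_version, started_at):
    """Turn probability rows into the /predict response shape shown above.

    `started_at` is a time.perf_counter() reading taken when the request arrived.
    """
    predictions = [
        CLASS_NAMES[max(range(len(row)), key=row.__getitem__)]  # argmax per row
        for row in probabilities
    ]
    return {
        "predictions": predictions,
        "probabilities": probabilities,
        "model_name": model_name,
        "model_version": model_version,
        "processing_time_ms": (time.perf_counter() - started_at) * 1000,
        "timestamp": datetime.now().isoformat(),
    }


start = time.perf_counter()
response = build_response([[0.98, 0.02, 0.0]], "iris_logistic_regression", "1", start)
print(response["predictions"])  # ['setosa']
```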

📈 Monitoring & Metrics

The system provides monitoring at four levels:

Health Monitoring: API health, database connectivity, model status
Performance Metrics: Processing time, throughput, error rates
Business Metrics: Prediction confidence, class distribution
Audit Logging: Complete request/response tracking

Monitoring endpoints:

Health: GET /health
Metrics: GET /metrics
History: GET /predictions/history
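A `/health` handler that reports database connectivity and model status could build its payload along these lines. The payload keys and the `health_status` helper are hypothetical. Opening the database with SQLite's `mode=rw` URI parameter is a deliberate choice: the check fails on a missing file instead of silently creating an empty one.

```python
import sqlite3


def health_status(db_path: str, model) -> dict:
    """Hypothetical /health payload: database connectivity plus model status."""
    checks = {}
    try:
        # mode=rw opens an existing file only; a missing database is an error.
        conn = sqlite3.connect(f"file:{db_path}?mode=rw", uri=True, timeout=1.0)
        try:
            conn.execute("SELECT 1").fetchone()
        finally:
            conn.close()
        checks["database"] = "ok"
    except sqlite3.Error as exc:
        checks["database"] = f"error: {exc}"
    checks["model"] = "loaded" if model is not None else "not loaded"
    healthy = checks["database"] == "ok" and model is not None
    return {"status": "healthy" if healthy else "unhealthy", "checks": checks}
```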

🚦 CI/CD Pipeline

The GitHub Actions pipeline includes:

Code Quality: Linting, formatting, import sorting
Testing: Unit tests, integration tests, API tests
Security: Dependency scanning, container vulnerability scanning
Build: Docker image creation and registry push
Deploy: Multi-environment deployment with health checks

🔐 Security Features

Input validation with Pydantic
Container security best practices
Vulnerability scanning with Trivy
Non-root container execution
Secure secrets management
Request/response logging for audit

🎯 MLOps Best Practices Implemented

Data Versioning: Reproducible preprocessing and data splits
Model Versioning: MLflow model registry and tracking
Code Versioning: Git with proper branching strategy
Automated Testing: Comprehensive test suite
Continuous Integration: Automated build and test pipeline
Continuous Deployment: Automated deployment pipeline
Monitoring: Request logging and performance metrics
Documentation: API docs and project documentation
Containerization: Docker for consistent deployments
Infrastructure as Code: Docker and deployment scripts

🏆 Key Achievements

Production-Ready API: FastAPI service with automatically generated OpenAPI documentation
Robust CI/CD: Complete automation from code commit to deployment
Built-in Monitoring: Prediction logging, performance metrics, and health checks
Security First: Multiple security layers and vulnerability scanning
High Code Quality: 100% test coverage with automated quality checks
Docker Optimization: Multi-stage builds for efficient containers
MLflow Integration: Complete experiment tracking and model management

🔮 Future Enhancements

Model Retraining: Automated retraining on new data
A/B Testing: Model comparison in production
Advanced Monitoring: Prometheus/Grafana integration
Kubernetes: Container orchestration for scaling
Feature Store: Centralized feature management
Data Drift Detection: Automated data quality monitoring
