| 1 | +--- |
| 2 | +title: "CI/CD/CT: Automated Pipelines for ML" |
| 3 | +sidebar_label: CI/CD for ML |
| 4 | +description: "Exploring Continuous Integration, Continuous Delivery, and Continuous Training in MLOps." |
| 5 | +tags: [mlops, cicd, continuous-training, automation, jenkins, github-actions] |
| 6 | +--- |
| 7 | + |
| 8 | +In traditional software, we have **CI** (Continuous Integration) and **CD** (Continuous Delivery). Machine Learning adds another moving part: **data**. Because data changes over time, automating code alone is not enough, and we need a third pillar: **CT** (Continuous Training). |
| 9 | + |
| 10 | +## 1. The Three Pillars of MLOps Automation |
| 11 | + |
| 12 | +To build a robust ML system, we must automate three distinct cycles: |
| 13 | + |
| 14 | +### Continuous Integration (CI) |
| 15 | +Beyond testing code, ML CI involves testing **data schemas** and **models**. |
| 16 | +* **Code Testing:** Unit tests for feature engineering logic. |
| 17 | +* **Data Testing:** Validating that incoming data matches the expected schema and value distributions. |
| 18 | +* **Model Validation:** Ensuring the model builds and a short training run completes without errors (for example, no NaN losses or out-of-memory failures). |
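The data-testing step above can be sketched in plain Python. This is a minimal illustration, not a replacement for a dedicated validation library: the schema, the column names (`age`, `income`), and the tolerance-based mean check are all assumptions made up for the example.

```python
from statistics import mean

# Hypothetical schema: column name -> expected Python type.
EXPECTED_SCHEMA = {"age": int, "income": float}


def validate_schema(rows, schema):
    """Return a list of human-readable schema violations (empty if clean)."""
    errors = []
    for i, row in enumerate(rows):
        missing = schema.keys() - row.keys()
        if missing:
            errors.append(f"row {i}: missing columns {sorted(missing)}")
            continue
        for column, expected_type in schema.items():
            if not isinstance(row[column], expected_type):
                errors.append(f"row {i}: {column} is not {expected_type.__name__}")
    return errors


def mean_within_tolerance(values, reference_mean, tolerance):
    """Crude distribution check: is the batch mean close to the training mean?"""
    return abs(mean(values) - reference_mean) <= tolerance
```

In a CI job, a non-empty error list or a failed distribution check would fail the build before any training starts.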
| 19 | + |
| 20 | +### Continuous Delivery (CD) |
| 21 | +This is the automation of deploying the model as a service. |
| 22 | +* **Artifact Packaging:** Wrapping the model in a [Docker container](./model-deployment#2-the-containerization-standard-docker). |
| 23 | +* **Integration Testing:** Ensuring the API endpoint responds correctly to requests. |
| 24 | +* **Deployment:** Moving the model to a staging or production environment using [Canary or Blue-Green strategies](./model-deployment#3-deployment-strategies). |
| 25 | + |
| 26 | +### Continuous Training (CT) |
| 27 | +This is unique to ML. CT is the property of an ML system that automatically retrains the model and serves the new version when new data arrives or when monitoring detects [Model Drift](./monitoring#1-why-models-decay). |
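One way to picture the CT trigger is as a small policy function that a scheduler or monitoring job calls. The sketch below is only an assumption about how such a policy might look: `RetrainPolicy`, its thresholds, and the idea of a single numeric drift score are hypothetical and not taken from any specific tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetrainPolicy:
    min_new_rows: int = 10_000     # hypothetical: retrain once this much new data exists
    max_drift_score: float = 0.2   # hypothetical: retrain when drift exceeds this value


def retrain_reasons(new_rows, drift_score, policy=RetrainPolicy()):
    """Return the reasons to retrain; an empty list means no retraining is needed."""
    reasons = []
    if new_rows >= policy.min_new_rows:
        reasons.append("new data")
    if drift_score > policy.max_drift_score:
        reasons.append("drift")
    return reasons
```

Returning the reasons rather than a bare boolean makes it easier to log why a given training run was started.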
| 28 | + |
| 29 | +## 2. The MLOps Maturity Levels |
| 30 | + |
| 31 | +Google defines the evolution of CI/CD in ML through three levels of maturity: |
| 32 | + |
| 33 | +1. **Level 0 (Manual):** Every step (data prep, training, deployment) is done manually in notebooks. |
| 34 | +2. **Level 1 (ML Pipeline Automation):** The training pipeline is automated. Whenever new data arrives, training and validation run automatically (CT). |
| 35 | +3. **Level 2 (CI/CD Pipeline Automation):** The pipeline itself is built, tested, and deployed automatically, so the whole workflow, from code commit to model monitoring, runs without manual steps. |
| 36 | + |
| 37 | +## 3. The Automated Workflow |
| 38 | + |
| 39 | +The following diagram illustrates how a code change or a "Drift" alert triggers a sequence of automated events. |
| 40 | + |
| 41 | +```mermaid |
| 42 | +graph TD |
| 43 | + Code[Code Commit / Data Drift Alert] --> CI[CI: Build & Test] |
| 44 | + |
| 45 | + subgraph Pipeline [Automated ML Pipeline] |
| 46 | + CI --> Train[Continuous Training] |
| 47 | + Train --> Eval[Model Evaluation] |
| 48 | + Eval --> Validate{Meets Threshold?} |
| 49 | + end |
| 50 | + |
| 51 | + Validate -- No --> Fail[Alert Developer] |
| 52 | + Validate -- Yes --> Register[Model Registry] |
| 53 | + |
| 54 | + Register --> CD[CD: Deploy to Prod] |
| 55 | + CD --> Monitor[Monitoring & Observability] |
| 56 | + Monitor -- Drift Detected --> Code |
| 57 | + |
| 58 | + style Pipeline fill:#f0f4ff,stroke:#5c7aff,stroke-width:2px,color:#333 |
| 59 | + style Validate fill:#fff3e0,stroke:#ef6c00,color:#333 |
| 60 | + style Register fill:#c8e6c9,stroke:#2e7d32,color:#333 |
| 61 | + |
| 62 | +``` |
| 63 | + |
| 64 | +## 4. Key Components of the Pipeline |
| 65 | + |
| 66 | +* **Feature Store:** A centralized repository where features are stored and shared, ensuring that the same feature logic is used in both training and serving. |
| 67 | +* **Model Registry:** A "version control" for models. It stores trained models, their metadata (hyperparameters, accuracy), and their environment dependencies. |
| 68 | +* **Metadata Store:** Records every execution of the pipeline, allowing you to trace a specific model version back to the exact dataset and code used to create it. |
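To make the registry and metadata ideas concrete, here is a toy in-memory registry. It is a sketch for illustration only and does not mirror the API of MLflow or any other product; `InMemoryRegistry`, `ModelVersion`, and the field names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelVersion:
    name: str
    version: int
    metrics: dict
    dataset_hash: str   # identifies the exact training data
    code_commit: str    # identifies the exact training code


class InMemoryRegistry:
    """Toy registry: versions are numbered from 1 per model name."""

    def __init__(self):
        self._versions = {}

    def register(self, name, metrics, dataset_hash, code_commit):
        versions = self._versions.setdefault(name, [])
        entry = ModelVersion(name, len(versions) + 1, dict(metrics),
                             dataset_hash, code_commit)
        versions.append(entry)
        return entry

    def latest(self, name):
        return self._versions[name][-1]

    def lineage(self, name, version):
        """Trace a model version back to the data and code that produced it."""
        entry = self._versions[name][version - 1]
        return {"dataset_hash": entry.dataset_hash, "code_commit": entry.code_commit}
```

A real registry would persist these records and store the model artifact itself; the point here is only that each version carries enough metadata to be traced back to its inputs.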
| 69 | + |
| 70 | +## 5. Tools of the Trade |
| 71 | + |
| 72 | +Depending on your cloud provider, the tools for CI/CD/CT vary: |
| 73 | + |
| 74 | +| Component | Open Source | AWS | Google Cloud | |
| 75 | +| --- | --- | --- | --- | |
| 76 | +| **Orchestration** | Kubeflow / Airflow | Step Functions | Vertex AI Pipelines | |
| 77 | +| **CI/CD** | GitHub Actions / GitLab | CodePipeline | Cloud Build | |
| 78 | +| **Tracking** | MLflow | SageMaker Experiments | Vertex AI Metadata | |
| 79 | +| **Storage** | DVC (Data Version Control) | S3 | GCS | |
| 80 | + |
| 81 | +## 6. Implementation: A GitHub Actions Snippet |
| 82 | + |
| 83 | +The workflow below is a simple CI job that trains the model and fails the build if accuracy falls below a threshold, which blocks the change from being promoted to production. |
| 84 | + |
| 85 | +```yaml |
| 86 | +name: Model Training CI |
| 87 | +on: [push] |
| 88 | + |
| 89 | +jobs: |
| 90 | +  train-and-validate: |
| 91 | +    runs-on: ubuntu-latest |
| 92 | +    steps: |
| 93 | +      - name: Checkout code |
| 94 | +        uses: actions/checkout@v4 |
| 95 | + |
| 96 | +      - name: Set up Python |
| 97 | +        uses: actions/setup-python@v5 |
| 98 | + |
| 99 | +      - name: Install dependencies |
| 100 | +        run: pip install -r requirements.txt |
| 101 | + |
| 102 | +      - name: Run Training & Evaluation |
| 103 | +        run: python train.py # Script generates 'metrics.json' |
| 104 | + |
| 105 | +      - name: Check Accuracy Threshold |
| 106 | +        run: | |
| 107 | +          ACCURACY=$(jq '.accuracy' metrics.json) |
| 108 | +          if (( $(echo "$ACCURACY < 0.85" | bc -l) )); then |
| 109 | +            echo "Accuracy too low ($ACCURACY). Deployment failed." |
| 110 | +            exit 1 |
| 111 | +          fi |
| 112 | + |
| 113 | +``` |
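The shell check relies on `jq` and `bc` being available on the runner. If you would rather keep the gate in Python, a sketch along these lines could replace it. It assumes, as the workflow does, that `metrics.json` has a top-level `accuracy` number; the function names and the script layout are hypothetical.

```python
import json
from pathlib import Path


def meets_threshold(metrics_path, threshold=0.85):
    """Return True when the recorded accuracy is at or above the threshold."""
    metrics = json.loads(Path(metrics_path).read_text())
    return float(metrics["accuracy"]) >= threshold


def main(argv):
    """Return a process exit code: 0 to allow deployment, 1 to block it."""
    path = argv[1] if len(argv) > 1 else "metrics.json"
    if not meets_threshold(path):
        print(f"Accuracy below threshold in {path}. Deployment blocked.")
        return 1
    return 0
```

Saved as a script, it would end with `sys.exit(main(sys.argv))` (after `import sys`) so that a non-zero exit code fails the CI step, just as `exit 1` does in the shell version.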
| 114 | + |
| 115 | +## References |
| 116 | + |
| 117 | +* **Google Cloud:** [MLOps: Continuous delivery and automation pipelines](https://cloud.google.com/architecture/mlops-continuous-delivery-and-automation-pipelines-in-machine-learning) |
| 118 | +* **ThoughtWorks:** [Continuous Delivery for Machine Learning (CD4ML)](https://martinfowler.com/articles/cd4ml.html) |
| 119 | +* **MLflow:** [Introduction to Model Registry](https://www.mlflow.org/docs/latest/model-registry.html) |
| 120 | + |
| 121 | +--- |
| 122 | + |
| 123 | +**With CI/CD/CT, your model is now a living, breathing part of your infrastructure. But how do we ensure it remains ethical and unbiased throughout these cycles?** |