
Commit f89a957

Update README
1 parent 8df3462 commit f89a957

1 file changed

Lines changed: 58 additions & 2 deletions


README.md

@@ -2,6 +2,7 @@
 
 # 🛠️ GitHub Anomaly Detection Pipeline
 
+<span id="motivation"></span>
 ## 💡 Motivation & Use Case
 
 GitHub hosts an enormous amount of user activity, including pull requests, issues, forks, and stars. Monitoring this activity in real-time is essential for identifying unusual or malicious behavior — such as bots, misuse, or suspicious spikes in contributions.
@@ -33,6 +34,10 @@ A production-grade anomaly detection system for GitHub user behavior using:
 
 ---
 
+Evaluators can use this quick [guide](#evaluator-guide) to verify all requirements and navigate the implementation easily.
+
+---
+
 ## 🤖 Too lazy for copy-pasting commands?
 
 If you're like me and hate typing out commands... good news!
@@ -44,6 +49,7 @@ make help
 
 See full Makefile usage [here](#makefile-usage) — from setup to linting, testing, API, Airflow, and Terraform infra!
 
+<span id="project-structure"></span>
 ## 📦 Project Structure
 
 ```java
@@ -72,7 +78,7 @@ See full Makefile usage [here](#makefile-usage) — from setup to linting, testi
 ```
 
 ---
-
+<span id="setup-instructions"></span>
 ## ⚙️ Setup Instructions
 
 ### 1. Clone and install dependencies
@@ -89,6 +95,7 @@ pipenv shell
 pip install -r requirements.txt
 ```
 
+<span id="env-config"></span>
 ### 📄 .env Configuration (Required)
 
 Before running Airflow, you must create a `.env` file in the project root with at least the following content:
@@ -163,12 +170,14 @@ Once up, access:
 - Airflow UI: http://localhost:8080 (Login: airflow / airflow)
 - MLflow UI: http://localhost:5000
 
+<span id="airflow-dag"></span>
 #### ⏱️ 2. Airflow DAGs Overview
 
 - daily_github_inference: Download → Feature Engineering → Inference
 - daily_monitoring_dag: Drift checks, cleanup, alerting
 - retraining_dag: Triggers model training weekly and logs it to MLflow
 
+<span id="mlflow"></span>
 #### 📈 3. MLflow Experiment Tracking
 
 Model training is handled by:
@@ -215,6 +224,7 @@ python github_pipeline/train_model.py
 ```
 The latest parquet file is used automatically. Model and scaler are saved to models/.
 
+<span id="fastapi"></span>
 ### 4. 🚀 FastAPI Inference
 
 #### Build & Run
@@ -232,6 +242,7 @@ curl -X POST http://localhost:8000/predict \
 -d '{"features": [12, 0, 1, 0, 4]}'
 ```
 
+<span id="alerts"></span>
 ### 5. 📣 Alerts: Email & Slack
 
 This project includes automated alerting mechanisms for anomaly spikes and data drift, integrated into the daily_monitoring_dag DAG.
@@ -273,6 +284,7 @@ alerts/alerting.py
 
 These generate alert messages and send them through email and Slack if thresholds are breached.
 
+<span id="ci_cd"></span>
 ### 6. ✅ CI/CD with GitHub Actions
 
 The .github/workflows/ci.yml file runs on push:
@@ -296,6 +308,7 @@ Configured via:
 - .pre-commit-config.yaml
 - .flake8 (ignore = E501)
 
+<span id="testing"></span>
 ### 8. 🧪 Testing
 
 This project includes both unit tests and a full integration test to ensure end-to-end pipeline functionality.
@@ -453,6 +466,7 @@ make create-env # Create .env file with AIRFLOW_UID, alert placeholders, and S
 make clean # Remove all __pycache__ folders and .pyc files
 ```
 
+<span id="code-quality"></span>
 #### 🧪 Code Quality & Testing
 
 ```bash
@@ -511,4 +525,46 @@ make help # Prints a summary of all available targets and their descriptions.
 Built by Rajat Gupta as part of an MLOps portfolio.
 Inspired by real-time event pipelines and anomaly detection architectures used in production.
 
-### 14. 📝 License
+### 15. 📝 License
+
+<span id="evaluator-guide"></span>
+### ✅ Evaluation Criteria for MLOps Zoomcamp
+
+Each criterion below links to the relevant section of this README to help evaluators verify the implementation easily.
+
+#### 🧠 Problem Description — 2 points
+
+✅ The project clearly defines the problem of detecting anomalous GitHub activity using real-time machine learning. See [here](#motivation)
+
+#### ☁️ Cloud — 4 points
+
+✅ The project runs in GitHub Codespaces and supports AWS S3 with a `USE_S3` toggle. See [here](#env-config)
+
+#### 📈 Experiment Tracking & Model Registry — 4 points
+
+✅ MLflow is fully integrated to track experiments and register models. See [here](#mlflow)
+
+#### 🛠️ Workflow Orchestration — 4 points
+
+✅ Uses Apache Airflow with 3 deployed DAGs for inference, monitoring, and retraining. See [here](#airflow-dag)
+
+#### 🚀 Model Deployment — 4 points
+
+✅ The model is served via FastAPI and fully containerized for deployment. See [here](#fastapi)
+
+#### 📊 Model Monitoring — 4 points
+
+✅ Implements drift detection, anomaly thresholding, and sends alerts via Slack and email. See [here](#alerts)
+
+#### ♻️ Reproducibility — 4 points
+
+✅ The project is fully reproducible with clear instructions, dependency locking, and data structure. See [here](#setup-instructions)
+
+#### ✅ Best Practices — 7 points
+
+- **Unit tests**: Pytest-based unit tests on core components. See [here](#testing)
+- **Integration test**: Full integration test to validate the entire pipeline. See [here](#testing)
+- **Linter & Code formatter**: Uses Black and Flake8 with Makefile targets and pre-commit hooks. See [here](#code-quality)
+- **Makefile**: Includes targets for install, lint, test, format, build, and airflow. See [here](#makefile-usage)
+- **Pre-commit hooks**: Automatically formats and checks code before commits. See [here](#code-quality)
+- **CI/CD pipeline**: GitHub Actions run tests, lint, and build containers on push. See [here](#ci_cd)
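
The anchors introduced in this commit are hand-written `<span id="...">` tags. GitHub gives no warning when a `[text](#id)` link points at an id that does not exist. Below is a minimal Python sketch of a check. It assumes every anchor is written exactly as `<span id="..."></span>`. The `find_broken_anchor_links` and `check_readme` helpers are hypothetical and are not part of the repository.

```python
import re

# Hypothetical helper: explicit anchors are assumed to be written exactly as
# <span id="motivation"></span>, the pattern used throughout this README.
ANCHOR_RE = re.compile(r'<span id="([^"]+)"></span>')
# In-page links such as [here](#motivation); captures the id after '#'.
LINK_RE = re.compile(r"\]\(#([^)\s]+)\)")


def find_broken_anchor_links(markdown: str) -> list[str]:
    """Return in-page link targets (without '#') that have no matching <span id>.

    Each target is reported once, in the order it first appears.
    GitHub's auto-generated heading anchors are NOT taken into account.
    """
    defined = set(ANCHOR_RE.findall(markdown))
    broken: list[str] = []
    for target in LINK_RE.findall(markdown):
        if target not in defined and target not in broken:
            broken.append(target)
    return broken


def check_readme(path: str = "README.md") -> list[str]:
    """Read a Markdown file and return its broken in-page anchor links."""
    with open(path, encoding="utf-8") as fh:
        return find_broken_anchor_links(fh.read())
```

For example, `check_readme("README.md")` returns the link targets that have no matching span. It could be wired into a pre-commit hook or a CI step, though that is an assumption and not something the repository currently does. Links that rely on GitHub's auto-generated heading anchors, such as `#makefile-usage` if a heading alone defines it, would also be reported. Treat the output as a list to review rather than a hard failure.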
