
Commit fb253e4

Updated Readme
1 parent 0f7d1a0 commit fb253e4

3 files changed

Lines changed: 72 additions & 23 deletions


Makefile

Lines changed: 16 additions & 0 deletions
@@ -41,13 +41,29 @@ help:
 	@echo ""
 	@echo "Setup & Cleanup:"
 	@echo " install Install all dependencies via Pipenv"
+	@echo " create-env Create .env file with required AIRFLOW_UID and alert config placeholders"
 	@echo " clean Remove __pycache__ and .pyc files"
 
 # --------- Setup ---------
 
 install:
 	pipenv install --dev
 
+create-env:
+	@if [ -f .env ]; then \
+	echo "✅ .env file already exists. Skipping creation."; \
+	else \
+	echo "🔧 Creating .env template file..."; \
+	echo "AIRFLOW_UID=50000" > .env; \
+	echo "# SLACK_WEBHOOK_URL=https://hooks.slack.com/services/XXX/YYY/ZZZ" >> .env; \
+	echo "# ALERT_EMAIL_FROM=your_email@example.com" >> .env; \
+	echo "# ALERT_EMAIL_TO=recipient@example.com" >> .env; \
+	echo "# ALERT_EMAIL_PASSWORD=your_app_password" >> .env; \
+	echo "# ALERT_EMAIL_SMTP=smtp.gmail.com" >> .env; \
+	echo "# ALERT_EMAIL_PORT=587" >> .env; \
+	echo "✅ .env file created."; \
+	fi
+
 # --------- Code Quality ---------
 
 format:
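For contributors without `make`, the `create-env` target above can be approximated in Python. This is a minimal sketch, assuming it is called with the project root as its argument; `create_env_file` and `ENV_TEMPLATE_LINES` are hypothetical names, not part of this repository. It writes the same template lines as the Makefile recipe and, like the recipe, never overwrites an existing `.env`.

```python
from pathlib import Path

# Hypothetical constant: the lines written by the `create-env` Makefile target,
# in the same order. Alert settings stay commented out, as in the recipe.
ENV_TEMPLATE_LINES = [
    "AIRFLOW_UID=50000",
    "# SLACK_WEBHOOK_URL=https://hooks.slack.com/services/XXX/YYY/ZZZ",
    "# ALERT_EMAIL_FROM=your_email@example.com",
    "# ALERT_EMAIL_TO=recipient@example.com",
    "# ALERT_EMAIL_PASSWORD=your_app_password",
    "# ALERT_EMAIL_SMTP=smtp.gmail.com",
    "# ALERT_EMAIL_PORT=587",
]


def create_env_file(project_root: Path = Path(".")) -> bool:
    """Hypothetical helper: write a .env template unless one already exists.

    Returns True if a new file was written, False if an existing .env was kept.
    """
    env_path = project_root / ".env"
    if env_path.exists():
        # Same behavior as the Makefile recipe: skip creation, keep the file.
        return False
    # Each `echo ... >> .env` in the recipe ends with a newline, so do the same.
    env_path.write_text("\n".join(ENV_TEMPLATE_LINES) + "\n", encoding="utf-8")
    return True
```

Calling `create_env_file(Path("."))` from the project root returns `True` on the first run and `False` afterwards, mirroring the two branches of the shell `if`.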

README.md

Lines changed: 56 additions & 23 deletions
@@ -1,35 +1,49 @@
 # 🛠️ GitHub Anomaly Detection Pipeline
 
+## 💡 Motivation & Use Case
+
+GitHub hosts an enormous amount of user activity, including pull requests, issues, forks, and stars. Monitoring this activity in real-time is essential for identifying unusual or malicious behavior — such as bots, misuse, or suspicious spikes in contributions.
+
+This project aims to build a **production-grade anomaly detection system** to:
+
+- Detect abnormal GitHub user behavior (e.g., excessive PRs, bot-like stars)
+- Alert maintainers and admins in real time via Slack or email
+- Serve anomaly scores via API and support continuous retraining
+- Visualize trends, drift, and recent activity using an interactive dashboard
+
+---
+
 A production-grade anomaly detection system for GitHub user behavior using:
 
 - **Apache Airflow** for orchestration
 - **Pandas + Scikit-learn (Isolation Forest)** for modeling and anomaly detection
 - **Alerts: Email & Slack** alerting mechanisms for anomaly spikes and data drift
 - **FastAPI** for real-time inference
 - **Pytest, Black, Flake8** for testing and linting
-- **Pre-commit + GitHub Actions** for CI/CD and code quality
-- **Streamlit UI** for visualization
+- **Pre-commit + GitHub Actions** for CI/CD and code quality
+- **Streamlit UI** for visualization
+- **Terraform** for infrastructure-as-code provisioning (MLflow)
 
 ---
 
-## 📦 Project Structure
-
-# To Do
+## 🤖 Too lazy for copy-pasting commands?
 
+If you're like me and hate typing out commands... good news!
+Just use the **Makefile** to do all the boring stuff for you:
 
----
+```bash
+make help
+```
 
-## 📈 Use Case
+See full Makefile usage [here](#makefile-usage) — from setup to linting, testing, API, Airflow, and Terraform infra!
 
-The pipeline detects anomalies in GitHub user behavior on an hourly basis and can:
+## 📦 Project Structure
 
-- Alert on suspicious activity (e.g., bot-like behavior)
-- Serve anomaly scores via API
-- Continuously retrain and monitor model health
+# Coming Soon
 
 ---
 
-## ⚙️ Setup
+## ⚙️ Setup Instructions
 
 ### 1. Clone and install dependencies
 
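The motivation section added in this hunk describes flagging users with excessive PRs or bot-like starring. Before a model such as Isolation Forest can score users, raw events have to become per-user numeric features. The sketch below assumes GH Archive style records with `type` and `actor.login` fields (GitHub records starring a repository as a `WatchEvent`); the tracked event types and the `build_user_features` name are illustrative assumptions, not this project's actual feature set.

```python
from collections import Counter, defaultdict
from typing import Iterable, Mapping

# Hypothetical choice of event types to count per user in one time window.
TRACKED_EVENT_TYPES = ("PullRequestEvent", "IssuesEvent", "ForkEvent", "WatchEvent", "PushEvent")


def build_user_features(events: Iterable[Mapping]) -> dict[str, dict[str, int]]:
    """Hypothetical helper: count tracked event types per actor login."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for event in events:
        login = (event.get("actor") or {}).get("login")
        event_type = event.get("type")
        if login is None or event_type not in TRACKED_EVENT_TYPES:
            continue  # skip malformed records and untracked event types
        counts[login][event_type] += 1
    # One fixed-width row per user: every tracked type present, missing ones as 0.
    return {
        login: {event_type: per_user[event_type] for event_type in TRACKED_EVENT_TYPES}
        for login, per_user in counts.items()
    }
```

Two `WatchEvent` records from the same login produce a row with `"WatchEvent": 2` and zeros elsewhere. Rows in this shape could then be stacked into a matrix for a model such as scikit-learn's `IsolationForest`, which the README names, though how the project builds its real features is not shown in this commit.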
@@ -45,6 +59,34 @@ pipenv shell
 pip install -r requirements.txt
 ```
 
+### 📄 .env Configuration (Required)
+
+Before running Airflow, you must create a `.env` file in the project root with at least this line:
+
+```env
+AIRFLOW_UID=50000
+```
+
+This is required for Docker to set correct permissions inside the Airflow containers.
+
+#### Optional (For Email & Slack Alerts)
+
+If you'd like to enable alerts, you can also include the following variables:
+
+```env
+# Slack Alerts
+SLACK_API_TOKEN=xoxb-...
+SLACK_CHANNEL=#your-channel
+
+# Email Alerts
+EMAIL_SENDER=your_email@example.com
+EMAIL_PASSWORD=your_email_app_password
+EMAIL_RECEIVER=receiver@example.com
+EMAIL_SMTP=smtp.gmail.com
+EMAIL_PORT=587
+```
+---
+
 ### 2. ⚙️ Airflow + 📈 MLflow Integration
 
 This project uses Apache Airflow to orchestrate a real-time ML pipeline and MLflow to track model training, metrics, and artifacts.
@@ -317,17 +359,7 @@ This removes the MLflow container provisioned by Terraform.
 
 ### 7. 🧭 Architecture
 
-To Do
-
-[GitHub Archive Logs]
-
-[Airflow DAG]
-
-[Feature Engineering]
-
-[Isolation Forest Model]
-↓ ↘
-[API: FastAPI] [Alerts / Drift Monitor]
+![Architecture](assets/architecture.png)
 
 ### 8. 🧹 Clean Code
 
@@ -351,6 +383,7 @@ make lint
 
 ```bash
 make install # Install all dependencies via Pipenv (both runtime and dev)
+make create-env # Create .env file with required AIRFLOW_UID and alert config placeholders
 make clean # Remove all __pycache__ folders and .pyc files
 ```
 
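The `.env Configuration (Required)` section added in this commit makes `AIRFLOW_UID` mandatory for the Airflow containers, so a small pre-flight check can catch a missing or malformed file before the containers start. This is a minimal sketch, assuming plain `KEY=VALUE` lines and whole-line `#` comments; it does not handle quoting or `export` prefixes the way full dotenv parsers do, and only `AIRFLOW_UID` is enforced because the alert variables are optional. `parse_env_file` and `check_env` are hypothetical names.

```python
from pathlib import Path


def parse_env_file(path: Path) -> dict[str, str]:
    """Hypothetical helper: parse KEY=VALUE lines, skipping blanks and # comments."""
    values: dict[str, str] = {}
    for raw_line in path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first '=' only, so values like "#your-channel" survive intact.
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def check_env(path: Path) -> list[str]:
    """Hypothetical helper: return a list of problems; empty means the file looks usable."""
    if not path.exists():
        return [f"{path} not found (run `make create-env`)"]
    uid = parse_env_file(path).get("AIRFLOW_UID")
    if uid is None:
        return ["AIRFLOW_UID is missing"]
    if not uid.isdigit():
        return [f"AIRFLOW_UID must be a numeric user id, got {uid!r}"]
    return []
```

Running `check_env(Path(".env"))` from the project root and stopping on a non-empty result is one way to wire this into a startup script; the Makefile in this commit does not include such a step.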
assets/architecture.png

2.83 MB
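The README in this commit lists email as an alert channel, and its new `.env` section supplies sender, receiver, password, SMTP host, and port settings. A minimal sketch of turning those settings into an alert with Python's standard `email` and `smtplib` modules follows. It assumes the `EMAIL_*` names from the README's `.env` example (the Makefile template uses `ALERT_EMAIL_*` names instead), and `build_alert_email` and `send_alert_email` are hypothetical helpers, not code from this repository.

```python
import smtplib
from email.message import EmailMessage
from typing import Mapping


def build_alert_email(env: Mapping[str, str], subject: str, body: str) -> EmailMessage:
    """Hypothetical helper: assemble an alert email from .env-style settings."""
    msg = EmailMessage()
    msg["From"] = env["EMAIL_SENDER"]
    msg["To"] = env["EMAIL_RECEIVER"]
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def send_alert_email(env: Mapping[str, str], msg: EmailMessage) -> None:
    """Hypothetical helper: send via SMTP, upgrading to TLS with STARTTLS (usual on port 587)."""
    port = int(env.get("EMAIL_PORT", "587"))
    with smtplib.SMTP(env["EMAIL_SMTP"], port) as server:
        server.starttls()
        server.login(env["EMAIL_SENDER"], env["EMAIL_PASSWORD"])
        server.send_message(msg)
```

Only `build_alert_email` works offline; `send_alert_email` needs a reachable SMTP server and, for Gmail, an app password as the README's placeholder suggests.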
