Commit 375ecf9

formatting README
1 parent 2ed9811 commit 375ecf9

2 files changed

Lines changed: 68 additions & 366 deletions

1_README.md

Lines changed: 68 additions & 56 deletions
@@ -42,9 +42,11 @@ This project predicts **French departmental crime rates** by category using offi
4242
---
4343

4444
## 📁 Project Structure
45+
46+
4547
oasis-security/
4648
├── .github/
47-
│ └── workflows/ # GitHub Act ions CI/CD
49+
│ └── workflows/ # GitHub Actions CI/CD
4850
├── data/ # Cleaned datasets (.parquet)
4951
├── docs/
5052
│ └── crime_predictor/ # Technical documentation
@@ -66,7 +68,7 @@ oasis-security/
6668
├── app.py # Main Streamlit dashboard
6769
├── script_crimes_et_delits.py # Data collection & cleaning
6870
├── Dockerfile # Multi-stage build (train → production)
69-
├── docker-compose.yml # Full stack (MLflow + Postgres + API)
71+
├── docker-compose.yml # Full stack (MLflow + Postgres + API)
7072
├── requirements.txt
7173
└── README.md
7274

@@ -84,27 +86,28 @@ pip install -r requirements.txt
8486

8587
### 2. Download and Clean Data
8688
python script_crimes_et_delits.py
87-
# → generates data/crimes_clean.parquet
89+
→ generates data/crimes_clean.parquet
8890

89-
### 3. Train the Model
91+
### 3. Train the Model
9092
python models/crime_predictor/src/train.py
91-
# → compares 4 models, logs to MLflow, saves best model
92-
# → generates models/crime_predictor/models/crime_predictor.pkl
93-
# → generates models/crime_predictor/models/metrics.json
93+
→ compares 4 models, logs to MLflow, saves best model
94+
→ generates models/crime_predictor/models/crime_predictor.pkl
95+
→ generates models/crime_predictor/models/metrics.json
9496

95-
### 4. Launch the Dashboard
97+
#### 4. Launch the Dashboard
9698
streamlit run app.py
97-
# http://localhost:8501
99+
http://localhost:8501
98100

99101
### 5. Launch the API
100102
uvicorn models.crime_predictor.src.predict:app --reload --port 8000
101-
# http://localhost:8000/docs
103+
http://localhost:8000/docs
102104

103105
---
104-
🔄 Data Pipeline
105-
data.gouv.fr (SSMSI)
106+
107+
## 🔄 Data Pipeline
108+
data.gouv.fr (SSMSI)
106109
107-
script_crimes_et_delits.py
110+
script_crimes_et_delits.py
108111
├── Download CSV (requests)
109112
├── Normalize column names (snake_case)
110113
├── Remove duplicates
@@ -114,43 +117,46 @@ script_crimes_et_delits.py
114117
│ ├── annual_rate_change (pct_change by dep × category)
115118
│ └── year_norm (normalized [0, 1])
116119
└── Save Parquet (Snappy)
117-
120+
118121
data/crimes_clean.parquet
119-
Raw data: 8 columns, ~50,000 rows
120-
After cleaning: 10 columns, ~49,000 rows (<2% loss)
122+
123+
Raw data: 8 columns, ~50,000 rows
124+
After cleaning: 10 columns, ~49,000 rows (<2% loss)
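
As a rough illustration of the cleaning steps visible in the pipeline tree above, the following plain-Python sketch normalizes column names, drops duplicates, and derives `annual_rate_change` and `year_norm`. It is not the code in `script_crimes_et_delits.py`: the sample rows, the `annee`/`dep`/`category`/`rate` column names, and both helper functions are hypothetical, and the CSV download and Parquet export are left out.

```python
import re

def to_snake_case(name):
    """Lower-case a column name and replace runs of non-alphanumerics with '_'."""
    return re.sub(r"[^0-9a-zA-Z]+", "_", name.strip()).strip("_").lower()

def clean(rows):
    """Normalize keys, drop exact duplicate rows, add the two derived columns."""
    # 1. Normalize column names to snake_case.
    rows = [{to_snake_case(k): v for k, v in row.items()} for row in rows]

    # 2. Remove exact duplicates while keeping first-seen order.
    seen, unique = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)

    # 3. Observed year range, used to scale years into [0, 1].
    years = [r["annee"] for r in unique]
    lo, hi = min(years), max(years)
    span = (hi - lo) or 1  # avoid division by zero for a single-year dataset

    # 4. Year-over-year relative change within each (department, category) group.
    unique.sort(key=lambda r: (r["dep"], r["category"], r["annee"]))
    previous = {}
    for r in unique:
        group = (r["dep"], r["category"])
        prev = previous.get(group)
        # None for the first year of a group, or when the previous rate is 0.
        r["annual_rate_change"] = None if not prev else (r["rate"] - prev) / prev
        r["year_norm"] = (r["annee"] - lo) / span
        previous[group] = r["rate"]
    return unique

# Hypothetical sample rows, for illustration only.
sample = [
    {"Annee": 2020, "Dep": "01", "Category": "theft", "Rate": 100.0},
    {"Annee": 2021, "Dep": "01", "Category": "theft", "Rate": 110.0},
    {"Annee": 2021, "Dep": "01", "Category": "theft", "Rate": 110.0},  # duplicate
]
cleaned = clean(sample)
```

The first year of each group has no predecessor, so its change is left as `None` rather than set to zero, which keeps missing history distinguishable from a flat rate.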
121125

122126
---
123-
🤖 Modeling & Results
127+
128+
## 🤖 Modeling & Results
124129
Features
125-
| Feature | Description |
126-
| ------------- | ----------------------------- |
127-
| `annee` | Year (int) |
128-
| `dep_encoded` | Department (LabelEncoded) |
129-
| `cat_encoded` | Crime category (LabelEncoded) |
130-
| `annee_norm` | Normalized year [0,1] |
130+
Feature Description
131+
annee Year (int)
132+
dep_encoded Department (LabelEncoded)
133+
cat_encoded Crime category (LabelEncoded)
134+
annee_norm Normalized year [0,1]
131135

132136
Target: tauxpour100000hab (crime rate per 100,000 inhabitants)
133137
Split: 80% train / 20% test — random seed 42
134138
Validation: 5-fold cross-validation on training set
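
A minimal sketch of the evaluation protocol stated above (80/20 split with seed 42, then 5-fold cross-validation on the training portion), written in plain Python over row indices. The project may well use a library for this; the function names are hypothetical, and this shuffle will not reproduce the project's exact split.

```python
import random

def train_test_indices(n_rows, test_fraction=0.2, seed=42):
    """Shuffle row indices with a fixed seed and split them into train/test."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    n_test = round(n_rows * test_fraction)
    return indices[n_test:], indices[:n_test]

def k_fold(train_indices, k=5):
    """Yield (fit, validate) index lists; each fold is validated on exactly once."""
    folds = [train_indices[i::k] for i in range(k)]
    for i, validate in enumerate(folds):
        fit = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield fit, validate

train, test = train_test_indices(1000)
splits = list(k_fold(train))
```

The test indices are set aside before any fold is built, so the cross-validation scores never see the rows used for the final test metrics.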
135139

136-
Model Comparison (Test Set)
137140
| Model | R² test | RMSE | MAE | CV R² (±std) |
138141
| ----------------- | -------- | -------- | -------- | --------------- |
139142
| Ridge | 0.71 | 87.4 | 62.1 | 0.69 ± 0.03 |
140143
| Random Forest | 0.89 | 54.2 | 38.7 | 0.87 ± 0.02 |
141144
| Gradient Boosting | 0.88 | 56.1 | 40.2 | 0.86 ± 0.02 |
142145
| **XGBoost**| **0.91** | **49.8** | **35.3** | **0.90 ± 0.01** |
143146

147+
144148
Best model: XGBoost — R²=0.91 on test set
145149
Low train/test gap → no significant overfitting
146150
Low CV variance → confirmed robustness
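
For reference, the three test-set metrics in the comparison table follow their standard definitions. The sketch below computes them in plain Python on made-up numbers; the function name is hypothetical and the values do not reproduce the figures in the table.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return R², RMSE and MAE for two equal-length sequences of numbers."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(r * r for r in residuals)              # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    return {
        "r2": 1 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(r) for r in residuals) / n,
    }

# Hypothetical rates per 100,000 inhabitants, for illustration only.
metrics = regression_metrics(
    [100.0, 200.0, 300.0, 400.0],
    [110.0, 190.0, 310.0, 390.0],
)
```

RMSE and MAE are expressed in the unit of the target (incidents per 100,000 inhabitants), while R² is unitless.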
147151

148152
MLflow Tracking
149153
mlflow ui --backend-store-uri models/crime_predictor/mlruns
150-
# http://localhost:5000
154+
http://localhost:5000
151155

152156
---
153-
📊 Streamlit Dashboard
157+
158+
## 📊 Streamlit Dashboard
159+
154160
5 interactive pages:
155161
| Page | Content |
156162
| -------------------- | ----------------------------------------------------- |
@@ -161,7 +167,8 @@ mlflow ui --backend-store-uri models/crime_predictor/mlruns
161167
| Ethics & Limitations | Biases and usage limits |
162168

163169
---
164-
🌐 FastAPI Endpoints
170+
171+
## 🌐 FastAPI Endpoints
165172
Available Endpoints
166173
| Method | Endpoint | Description |
167174
| ------ | ---------- | --------------------------------- |
@@ -173,7 +180,6 @@ Example Request
173180
curl -X POST http://localhost:8000/predict \
174181
-H "Content-Type: application/json" \
175182
-d '{"annee": 2025, "dep_encoded": 5, "cat_encoded": 0, "annee_norm": 1.0}'
176-
177183
{
178184
"predicted_rate": 312.47,
179185
"unit": "incidents per 100,000 inhabitants",
@@ -182,7 +188,8 @@ curl -X POST http://localhost:8000/predict \
182188
}
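
The same request as the curl example above can be prepared from Python with the standard library. This sketch only builds the request object; sending it assumes the API from step 5 is running locally on port 8000. The helper name `build_predict_request` is hypothetical.

```python
import json
import urllib.request

def build_predict_request(features, base_url="http://localhost:8000"):
    """Build a JSON POST request for the /predict endpoint without sending it."""
    body = json.dumps(features).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

request = build_predict_request(
    {"annee": 2025, "dep_encoded": 5, "cat_encoded": 0, "annee_norm": 1.0}
)

# To actually call the API (requires the server to be running):
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```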
183189

184190
---
185-
🧪 Tests
191+
192+
## 🧪 Tests
186193
# Run all tests
187194
pytest models/crime_predictor/tests/ -v
188195

@@ -196,61 +203,66 @@ Test coverage:
196203
| `TestModel` | Shape, type, positivity, R², determinism (7 assertions) |
197204
| `TestSerialization` | Joblib serialization, metrics.json structure (2 assertions) |
198205

199-
---
200-
🐳 Docker & CI/CD
201-
Multi-stage Docker
202-
# Build (training → production)
206+
---
207+
208+
## 🐳 Docker & CI/CD
209+
# Multi-stage Docker
210+
Build (training → production)
203211
docker build -t oasis-security:latest .
204212

205213
# Run API
206214
docker run -p 8000:8000 oasis-security:latest
207215

208-
Full Stack (MLflow + Postgres + API)
216+
# Full Stack (MLflow + Postgres + API)
209217
docker-compose up -d
210-
# MLflow UI → http://localhost:5000
211-
# API → http://localhost:8000/docs
218+
MLflow UI → http://localhost:5000
219+
API → http://localhost:8000/docs
220+
221+
# GitHub Actions CI/CD
212222

213-
GitHub Actions CI/CD
214223
.github/workflows/ci-cd.yml triggers on each push:
215224

216-
Linting (flake8)
217-
Unit tests (pytest)
218-
Docker build
219-
Push Docker image to GHCR
225+
1. Linting (flake8)
226+
2. Unit tests (pytest)
227+
3. Docker build
228+
4. Push Docker image to GHCR
220229

221-
---
222-
⚠️ Ethics & Limitations
230+
---
231+
232+
## ⚠️ Ethics & Limitations
223233

224234
This model is a statistical exploration tool, not an operational decision system.
225235

226236
Data limitations:
227237

228-
Covers only recorded crimes (dark figure estimated 50–80%)
229-
Recording practices vary by department
230-
No infra-departmental data
238+
- Covers only recorded crimes (dark figure estimated 50–80%)
239+
- Recording practices vary by department
240+
- No infra-departmental data
231241

232242
Model biases:
233243

234-
Reflects reporting biases
235-
Correlation ≠ causation
236-
Not suitable for external shocks (COVID, economic crises)
244+
- Reflects reporting biases
245+
- Correlation ≠ causation
246+
- Not suitable for external shocks (COVID, economic crises)
237247

238248
Prohibited use:
239249

240-
Predictive targeting of individuals or geographic areas
241-
Judicial or penal decision-making
250+
- Predictive targeting of individuals or geographic areas
251+
- Judicial or penal decision-making
242252

243253
Compliance: aggregated anonymized open data — no personal data used.
244254

245255
---
256+
246257
📜 License
247-
MIT — see LICENSE
258+
MIT — see LICENSE
248259

249260
---
261+
250262
👤 Author
251-
Frédéric Tellier — Data Scientist
252-
LinkedIN: https://www.linkedin.com/in/fr%C3%A9d%C3%A9ric-tellier-8a9170283/ ; Portfolio: https://github.com/Dreipfelt/
253263

254-
Project developed as part of CDSD certification — 2025
264+
Frédéric Tellier — Data Scientist
265+
[LinkedIn](**url**https://www.linkedin.com/in/fr%C3%A9d%C3%A9ric-tellier-8a9170283/)
266+
| [Portfolio]([url](https://github.com/Dreipfelt/))
255267

256-
---
268+
Project developed as part of CDSD certification — 2026
