An end-to-end machine learning pipeline for detecting anomalies in multivariate vehicle sensor data using Isolation Forest and Airflow orchestration.
This project simulates a predictive maintenance system using:
- NASA CMAPSS turbofan engine dataset
- Sliding window time-series feature generation
- Unsupervised anomaly detection
- Airflow DAGs for repeatable ML workflows
- Jupyter-based visual dashboards
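As a rough illustration of the sliding-window feature step, the sketch below computes per-engine rolling means and standard deviations with pandas. The column names (`unit`, `cycle`, `sensor_1`, `sensor_2`), the window size, and the helper `add_rolling_features` are assumptions made for this example and are not taken from the repository's `features.py`.

```python
import numpy as np
import pandas as pd


def add_rolling_features(df, sensor_cols, window=5):
    """Append per-engine rolling mean and std for each sensor column.

    Hypothetical helper: assumes `unit` (engine id) and `cycle` columns.
    """
    df = df.sort_values(["unit", "cycle"]).reset_index(drop=True)
    grouped = df.groupby("unit")[sensor_cols]
    # min_periods=window leaves NaN until a full window is available,
    # so windows never mix readings from two different engines.
    roll_mean = grouped.rolling(window, min_periods=window).mean()
    roll_std = grouped.rolling(window, min_periods=window).std()
    # Drop the group level so the index lines up with `df` again.
    roll_mean = roll_mean.reset_index(level=0, drop=True).add_suffix("_mean")
    roll_std = roll_std.reset_index(level=0, drop=True).add_suffix("_std")
    out = df.join(roll_mean).join(roll_std)
    # Rows without a full window carry NaN features and are discarded.
    return out.dropna().reset_index(drop=True)


# Toy data standing in for CMAPSS: 2 engines, 8 cycles each.
rng = np.random.default_rng(0)
raw = pd.DataFrame({
    "unit": np.repeat([1, 2], 8),
    "cycle": np.tile(np.arange(1, 9), 2),
    "sensor_1": rng.normal(size=16),
    "sensor_2": rng.normal(size=16),
})
features = add_rolling_features(raw, ["sensor_1", "sensor_2"], window=5)
print(features.head())
```

Each engine loses its first `window - 1` cycles, since those rows have no complete window behind them.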
Tech stack:
- Python, Pandas, NumPy, Scikit-learn
- Airflow (via Docker)
- Matplotlib, Seaborn
- Isolation Forest (unsupervised ML)
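For readers unfamiliar with Isolation Forest, here is a minimal sketch using scikit-learn on synthetic data. The data, the `contamination` value, and the other hyperparameters are illustrative assumptions, not the settings used to produce `models/isolation_forest.pkl`.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Synthetic stand-in for window features: 200 typical rows plus
# 5 rows shifted far from the bulk of the data.
rng = np.random.default_rng(0)
normal = rng.normal(loc=0.0, scale=1.0, size=(200, 4))
shifted = rng.normal(loc=8.0, scale=1.0, size=(5, 4))
X = np.vstack([normal, shifted])

# contamination is the assumed share of anomalies; it sets the
# threshold that predict() uses to separate -1 from 1.
model = IsolationForest(n_estimators=100, contamination=0.05, random_state=0)
model.fit(X)

labels = model.predict(X)            # -1 = anomaly, 1 = normal
scores = model.decision_function(X)  # lower = more anomalous

print("flagged rows:", int((labels == -1).sum()))
print("mean score, typical rows:", scores[:200].mean())
print("mean score, shifted rows:", scores[200:].mean())
```

No labels are needed for training, which is why the method suits sensor data where failures are rare or unannotated.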
Getting started:
- Clone the repo into a directory matching the project name and enter it:
    git clone https://github.com/privaelo/engine-sensor-anomaly-detection.git engine-sensor-anomaly-pipeline
    cd engine-sensor-anomaly-pipeline
- Set up Airflow with Docker:
    cd docker-compose-airflow
    docker-compose up airflow-init
    docker-compose up
- Add your DAG to dags/ and run it via the Airflow UI.
Planned extensions:
- FastAPI scoring microservice
- Slack/email anomaly alerts
- Retraining DAG
- Streamlit dashboard
Project structure:
engine-sensor-anomaly-pipeline/
├── README.md
├── requirements.txt
├── docker-compose-airflow/
│   └── docker-compose.yaml
├── dags/
│   ├── score_pipeline.py
│   └── features.py
├── data/
│   └── processed/
├── models/
│   └── isolation_forest.pkl
├── outputs/
│   ├── anomaly_alerts.csv
│   ├── sensor_trend_anomalies.png
│   ├── anomaly_counts_per_engine.png
│   └── sensor_correlation_heatmap.png
├── notebooks/
│   └── anomaly_dashboard.ipynb