An end-to-end decision system for logistics planning that combines demand forecasting, stochastic simulation, and network optimization.
The project is designed to showcase production-oriented applied science and engineering skills at the intersection of:
- Operations Research
- Machine Learning
- Data Engineering
- MLOps
- API-based deployment
The objective of this project is to build a scalable logistics decision engine that can:
- generate or ingest historical shipment and demand data,
- forecast future demand,
- simulate uncertain logistics scenarios,
- optimize origin-destination flows under capacity and cost constraints,
- expose the full pipeline through an API.
This repository is meant to reflect how real-world planning systems are built: not only with mathematical models, but also with robust data pipelines, modular software design, and deployable services.
- Synthetic or open logistics data generation
- Data processing with Polars
- Analytical queries with DuckDB
- Efficient storage in Parquet format
- Demand prediction using baseline models (Naive, Seasonal Lag, Rolling Window)
- Automatic model evaluation with MAE, MSE, MAPE, WAPE metrics
- Model selection: automatic ranking and best-model selection by configurable metric
- Forecast extraction into a standardized schema and demand aggregation per destination
- Feature engineering with lag and rolling statistics
- Experiment tracking with MLflow
- Event-driven simulation of shipment arrivals, delays, and processing
- Stochastic demand generation
- Scenario analysis under uncertainty
- Minimum-cost transportation LP using OR-Tools (GLOP / CBC solvers)
- Capacity-constrained origin-to-destination flow assignment
- Input validation with pre-solve feasibility checks (unreachable destinations, insufficient capacity)
- Integration of forecast-derived demand into downstream optimization
- FastAPI endpoints for simulation, forecasting, and optimization
- Reproducible configuration and modular architecture
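To give a flavor of the baseline forecasters and the WAPE metric listed above, here is a minimal sketch; the demand series and function names are illustrative, not the repository's actual implementations:

```python
import numpy as np

# Hypothetical daily demand history for one destination (two weeks of data).
history = np.array([100, 120, 90, 110, 105, 130, 95,
                    102, 118, 93, 112, 108, 128, 97], dtype=float)

def naive_forecast(y: np.ndarray, horizon: int) -> np.ndarray:
    """Repeat the last observed value."""
    return np.full(horizon, y[-1])

def seasonal_lag_forecast(y: np.ndarray, horizon: int, season: int = 7) -> np.ndarray:
    """Repeat the values observed one season (here: one week) earlier."""
    return np.array([y[-season + (h % season)] for h in range(horizon)])

def rolling_window_forecast(y: np.ndarray, horizon: int, window: int = 7) -> np.ndarray:
    """Repeat the mean of the last `window` observations."""
    return np.full(horizon, y[-window:].mean())

def wape(actual: np.ndarray, forecast: np.ndarray) -> float:
    """Weighted absolute percentage error: sum(|error|) / sum(|actual|)."""
    return float(np.abs(actual - forecast).sum() / np.abs(actual).sum())
```

A model selector can then rank these baselines by a configurable metric (WAPE by default in this pipeline) and keep the best one.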
- Python 3.11+
- Polars — high-performance DataFrames
- OR-Tools — linear programming solvers (GLOP, CBC)
- Scikit-learn — forecasting evaluation metrics
- NumPy — numerical operations
- Matplotlib — visualization
- PyYAML — configuration management
- pytest / Hypothesis — testing and property-based testing
- DuckDB (planned)
- FastAPI (planned)
- MLflow (planned)
- SimPy (planned, for simulation extensions)
decision-intelligence-logistics-engine/
│
├── data/
│ ├── synthetic/ # parquet files (demand_history, origins, destinations, lanes)
│ └── output/ # metrics summaries, plots
├── notebooks/ # exploratory analysis and prototyping
├── src/
│ ├── data/
│ │ ├── ingestion.py # Reader: parquet file loading
│ │ ├── input_data.py # InputData dataclass
│ │ └── processing/ # per-dataset processors (demand, origins, lanes, destinations)
│ ├── forecasting/
│ │ ├── models/ # BaseForecaster, NaiveForecaster, SeasonalForecaster, RollingWindowForecaster
│ │ ├── pipeline.py # ForecastingPipeline: runs models sequentially
│ │ ├── evaluator.py # Evaluator: MAE, MSE, MAPE, WAPE
│ │ ├── model_selector.py # ModelSelector: best-model selection and ranking
│ │ └── forecast_extractor.py # ForecastExtractor: extraction and demand aggregation
│ ├── optimization/
│ │ └── optimizer.py # Optimizer: min-cost transportation LP (OR-Tools)
│ ├── postprocessing/
│ │ ├── metrics_summary.py # MetricsSummary: collect and export evaluation results
│ │ └── visualization.py # VisualizationEngine: time series plots
│ ├── simulation/ # (planned) stochastic and event-driven simulation
│ ├── api/ # (planned) FastAPI endpoints
│ └── utils/ # config loading, system paths
│
├── tests/ # unit and integration tests
├── configs/ # YAML configuration files
├── scripts/ # runnable end-to-end pipeline scripts
├── pyproject.toml
└── README.md
The end-to-end pipeline (scripts/example_end_to_end_pipeline.py) executes the following stages:
Reader → DataProcessor → ForecastingPipeline → Evaluator → MetricsSummary
→ ModelSelector → ForecastExtractor → Optimizer → Flow decisions
- Data Ingestion — reads parquet files (demand history, origins, destinations, lanes)
- Data Processing — validates, deduplicates, and sorts each dataset
- Forecasting — runs Naive, Seasonal, and Rolling Window models
- Evaluation — computes MAE, MSE, MAPE, WAPE per model
- Model Selection — picks the best model by WAPE
- Demand Aggregation — extracts forecasts and computes average daily demand per destination
- Optimization — solves a min-cost transportation LP to allocate supply to destinations
- Output — prints optimal flow allocation and total shipping cost
- Synthetic logistics network generator
- Demand generation pipeline
- Baseline forecasting models
- Event-driven simulator
- Network optimization model
- API endpoints for end-to-end execution
- MLflow experiment tracking
- Performance benchmarking with Polars vs Pandas
- Docker support
This project is a portfolio piece built to demonstrate the ability to design and implement decision systems that go beyond isolated models.
It emphasizes:
- scalable data handling,
- integration between ML and optimization,
- software engineering discipline,
- reproducibility and deployability.
Core pipeline implemented and functional: data ingestion → forecasting → model selection → optimization.
- Add FastAPI endpoints for end-to-end execution
- Integrate MLflow experiment tracking
- Add Docker support
- Implement stochastic simulation layer
- (Optional) integrate an AI-powered layer to answer business-related questions about the data
Author
Christian Piermarini
Applied Scientist / Operations Research / Machine Learning