Demand Forecasting + Inventory Optimization System

A production-style retail analytics project that turns historical sales into demand forecasts and reorder recommendations.

Validated on 956,500 daily sales rows across 500 store-item series. The best backtest RMSSE is 0.690, and the latest run generated 311 reorder-now recommendations.

This repository is designed to look and behave like a real internal decision-support tool:

a reproducible data pipeline built on DuckDB and SQL
a 3-model forecast benchmark using a shared rolling backtest
an API layer for serving metrics, forecasts, and recommendations
a Streamlit dashboard for planners and managers

Important project note: the M5 dataset includes retail sales, prices, and calendar signals, but it does not include real on-hand inventory or purchase-order tables. For that reason, the inventory layer uses documented, reproducible policy inputs to convert forecasts into realistic reorder decisions.

The goal is not just model accuracy. The goal is to answer a business question:

What should a retail manager order next, and why?

What The System Does

Ingests raw M5 retail data from data/raw/m5
Filters to the top 50 items by historical demand
Builds a store-item-day warehouse table in DuckDB
Engineers lag, rolling, calendar, SNAP, event, and price features
Benchmarks 3 forecast approaches:
- seasonal naive baseline
- Holt-Winters exponential smoothing
- HistGradientBoostingRegressor
Compares models with:
- MAE
- RMSE
- RMSSE
- WAPE as a supplemental business-facing metric
Selects one production model family for v1
Generates inventory recommendations with:
- lead time demand
- safety stock
- reorder point
- days of cover
- case-pack rounded recommended order quantity
Serves results through FastAPI and visualizes them in Streamlit
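The lag and rolling features in the list above must only use information available before the day being predicted. The snippet below is a minimal sketch of that idea for a single series in plain Python. The function names, the window length, and the sample data are illustrative assumptions, not the project's actual feature code.

```python
from statistics import mean


def lag_feature(series, lag):
    """Value observed `lag` days earlier; None where history is too short."""
    return [series[t - lag] if t >= lag else None for t in range(len(series))]


def rolling_mean_feature(series, window, shift=1):
    """Mean of the `window` values that end `shift` days before day t.

    With shift >= 1 the value for day t never includes day t itself,
    which is what keeps the feature leakage-safe.
    """
    out = []
    for t in range(len(series)):
        end = t - shift + 1      # exclusive end index of the window
        start = end - window
        out.append(mean(series[start:end]) if start >= 0 else None)
    return out


# Hypothetical daily unit sales for one store-item series
sales = [3, 0, 2, 5, 4, 1, 6]
lag_1 = lag_feature(sales, 1)
roll_3 = rolling_mean_feature(sales, 3)
```

Here `roll_3[3]` is the mean of days 0 to 2 only, so the target for day 3 never leaks into its own feature. In a multi-series table the same logic would be applied per store-item group.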

Why RMSSE Instead Of MAPE

MAPE can behave badly when actual demand is zero or close to zero, which happens often in retail item-store series.

RMSSE is a better fit here because it:

normalizes error by each series' historical scale
supports comparison across fast-moving and slow-moving items
aligns with M5-style retail forecasting conventions

WAPE is still included because it is easier to explain to business stakeholders.
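To make the comparison concrete, here is a small sketch of both metrics for one series, written in plain Python. It follows the usual RMSSE definition (forecast MSE divided by the in-sample MSE of a one-step naive forecast); M5-style implementations typically also drop the leading zeros before a product's first sale, which this sketch omits. The function names and sample numbers are illustrative and are not taken from the project's code.

```python
import math


def rmsse(train, actual, forecast):
    """Root mean squared scaled error for a single series."""
    # Scale: mean squared error of a one-step naive forecast on the history
    naive_sq = [(train[t] - train[t - 1]) ** 2 for t in range(1, len(train))]
    scale = sum(naive_sq) / len(naive_sq)
    if scale == 0:
        return float("nan")  # constant history: the scale is undefined
    mse = sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)
    return math.sqrt(mse / scale)


def wape(actual, forecast):
    """Weighted absolute percentage error: total |error| over total |actual|."""
    total = sum(abs(a) for a in actual)
    if total == 0:
        return float("nan")
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / total


# Hypothetical intermittent series: note the zero in `actual`
train = [0, 2, 0, 3, 1]
actual = [2, 0, 1]
forecast = [1, 1, 1]

series_rmsse = rmsse(train, actual, forecast)
series_wape = wape(actual, forecast)
```

MAPE would divide by the zero on the second day of `actual` and be undefined, while both metrics above stay finite because they normalize by an aggregate rather than by each day's demand.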

Architecture

flowchart LR
    A["M5 CSV Files"] --> B["Ingestion Pipeline"]
    B --> C["DuckDB Warehouse"]
    C --> D["Feature Engineering"]
    D --> E["Rolling Backtest"]
    E --> F["Model Summary"]
    E --> G["Production Forecast"]
    G --> H["Inventory Recommendation Engine"]
    C --> I["SQL Views"]
    F --> J["FastAPI"]
    G --> J
    H --> J
    I --> J
    J --> K["Streamlit Dashboard"]
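The Rolling Backtest stage evaluates every model on the same train/test windows. The sketch below shows one common way to generate such windows (rolling origin with an expanding training set). The number of folds and the function name are assumptions for illustration; the 28-day horizon and the 1,913-day history match the figures reported in this README.

```python
def rolling_origin_splits(n_days, horizon, n_folds):
    """Yield (train_end, test_end) index pairs, oldest fold first.

    Training uses days [0, train_end); testing uses [train_end, test_end).
    """
    for k in range(n_folds, 0, -1):
        test_end = n_days - (k - 1) * horizon
        train_end = test_end - horizon
        if train_end <= 0:
            raise ValueError("not enough history for the requested folds")
        yield train_end, test_end


# Assumed setup: 3 folds of 28 days over a 1,913-day history
splits = list(rolling_origin_splits(n_days=1913, horizon=28, n_folds=3))
for train_end, test_end in splits:
    print(f"train on days [0, {train_end}), test on [{train_end}, {test_end})")
```

Because each model sees identical folds, differences in MAE, RMSE, RMSSE, and WAPE reflect the models rather than the evaluation windows.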

Validated V1 Results

The current v1 build has been run end to end on the M5 subset used by this project.

Historical window: 2011-01-29 to 2016-04-24
Forecast horizon: 28 days from 2016-04-25 to 2016-05-22
Warehouse size: 956,500 daily sales rows across 500 store-item series
Production forecast output: 14,000 forecast rows
Selected production model: HistGradientBoostingRegressor
Best validation metrics:
- MAE = 6.10
- RMSE = 7.51
- RMSSE = 0.690
- WAPE = 2.04 (supplemental business-facing metric)
Inventory actionability: 311 series were flagged for reorder in the latest recommendation run
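The row counts above can be sanity-checked from the dates alone. This short check assumes every one of the 500 series has exactly one row per day in the historical window and one forecast row per day in the horizon.

```python
from datetime import date

n_series = 500

# Both date ranges are inclusive, hence the + 1
history_days = (date(2016, 4, 24) - date(2011, 1, 29)).days + 1
horizon_days = (date(2016, 5, 22) - date(2016, 4, 25)).days + 1

warehouse_rows = history_days * n_series
forecast_rows = horizon_days * n_series

print(history_days, warehouse_rows)  # 1913 956500
print(horizon_days, forecast_rows)   # 28 14000
```

The 500 series are also consistent with 50 items sold in each of the 10 stores in the M5 data, although the exact store selection is an inference rather than something stated here.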

Project Structure

.
|-- data/raw/m5
|-- docs/images/
|-- sql/
|-- src/retail_forecasting/
|   |-- api/
|   |-- dashboard/
|   |-- forecasting/
|   `-- pipeline/
|-- tests/
|-- Dockerfile.api
|-- Dockerfile.dashboard
`-- docker-compose.yml

Expected Raw Data

Place these files inside data/raw/m5:

calendar.csv
sell_prices.csv
sales_train_validation.csv

Quick Start

1. Create a virtual environment

python3 -m venv .venv
source .venv/bin/activate

2. Install dependencies

pip install '.[dev]'

3. Review the example configuration

The repository includes a public .env.example file that documents the expected paths and local service ports.

It contains example values only. The real .env file is intentionally ignored by git and should stay local if you create one for your own setup.

4. Run the full pipeline

retail-forecast run-all

This creates:

DuckDB warehouse tables in outputs/warehouse/m5.duckdb
model metadata in outputs/artifacts/model_metadata.json
production forecasts and inventory recommendations in DuckDB tables

5. Start the API

retail-forecast serve-api

6. Start the dashboard

streamlit run src/retail_forecasting/dashboard/app.py

CLI Commands

retail-forecast ingest
retail-forecast train
retail-forecast recommend
retail-forecast run-all
retail-forecast serve-api

SQL Visibility

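SQL is intentionally part of the project story, not hidden behind Python. The queries live in the sql/ directory, and the API reads from SQL views defined on the DuckDB warehouse.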

API Endpoints

GET /health
GET /catalog
GET /metrics
GET /forecast?store_id=&item_id=&horizon=
GET /recommendations?store_id=
GET /series/{store_id}/{item_id}
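As a rough illustration of how a client might call these endpoints, the sketch below builds request URLs with the standard library. The base URL and port are assumptions (check .env.example for the real values), and the store and item IDs only follow the M5 naming pattern; they may not be part of the filtered subset. Response shapes are not documented here, so the helper simply returns the parsed JSON.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

BASE_URL = "http://localhost:8000"  # assumption: local host and port


def forecast_url(store_id, item_id, horizon):
    """URL for GET /forecast with its three query parameters."""
    query = urlencode({"store_id": store_id, "item_id": item_id, "horizon": horizon})
    return f"{BASE_URL}/forecast?{query}"


def series_url(store_id, item_id):
    """URL for GET /series/{store_id}/{item_id} with path segments escaped."""
    return f"{BASE_URL}/series/{quote(store_id, safe='')}/{quote(item_id, safe='')}"


def fetch_json(url):
    """Fetch a URL and parse the body as JSON (requires the API to be running)."""
    with urlopen(url) as resp:
        return json.load(resp)


# Hypothetical identifiers in M5 style
url = forecast_url("CA_1", "FOODS_3_090", 28)
print(url)
```

With the API started via `retail-forecast serve-api`, calling `fetch_json(url)` would return whatever payload the endpoint serves.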

Dashboard Views

Executive overview with KPI cards
Forecast explorer for a store-item series
Model comparison for MAE, RMSE, RMSSE, and WAPE
Inventory planner with reorder recommendations

Screenshots

Executive Overview

Forecast Explorer

Inventory Logic In V1

This version intentionally keeps the inventory math simple and easy to explain:

lead_time_demand = sum of forecast demand over lead time
safety_stock = z(service_level) * validation_rmse * sqrt(lead_time_days)
reorder_point = lead_time_demand + safety_stock
days_of_cover = current_on_hand / mean_daily_forecast
recommended_order_qty = max(reorder_point - current_on_hand, 0) rounded up to case pack

This is enough to demonstrate business decision-making without drifting into unrealistic cost optimization.
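The formulas above translate almost line for line into code. The following is a minimal sketch using only the standard library. The function name, the argument names, and the example inputs (on-hand stock, lead time, service level, case-pack size) are hypothetical policy inputs chosen for illustration; only the RMSE value is borrowed from the validation results in this README.

```python
import math
from statistics import NormalDist


def recommend(forecast, on_hand, lead_time_days, service_level,
              validation_rmse, case_pack):
    """Apply the v1 inventory formulas to one store-item series."""
    lead_time_demand = sum(forecast[:lead_time_days])
    z = NormalDist().inv_cdf(service_level)      # z-score for the service level
    safety_stock = z * validation_rmse * math.sqrt(lead_time_days)
    reorder_point = lead_time_demand + safety_stock
    mean_daily = sum(forecast) / len(forecast)
    days_of_cover = on_hand / mean_daily if mean_daily > 0 else math.inf
    shortfall = max(reorder_point - on_hand, 0)
    # Round the shortfall up to a whole number of case packs
    order_qty = math.ceil(shortfall / case_pack) * case_pack
    return {
        "lead_time_demand": lead_time_demand,
        "safety_stock": safety_stock,
        "reorder_point": reorder_point,
        "days_of_cover": days_of_cover,
        "recommended_order_qty": order_qty,
    }


# Hypothetical inputs: flat forecast of 10 units/day over the 28-day horizon
rec = recommend(
    forecast=[10] * 28,
    on_hand=40,
    lead_time_days=7,
    service_level=0.95,
    validation_rmse=7.51,
    case_pack=12,
)
print(rec)
```

With these inputs the lead time demand is 70 units, the safety stock is roughly 32.7 units, and the shortfall of about 62.7 units rounds up to 6 case packs, or 72 units.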

Testing

Run the test suite with:

pytest

The test suite covers:

metric calculations
leakage-safe feature engineering
inventory formulas and case-pack rounding
API response shapes
an end-to-end pipeline smoke test on a synthetic M5-like dataset

Docker

Run the API and dashboard with Docker Compose:

docker compose up --build

Development

Contributor setup and local workflow notes live in CONTRIBUTING.md
The project targets Python 3.11+
Run pytest before opening a pull request or pushing a larger refactor
