LineLab is an NYC transit planning sandbox for drafting proposed subway corridors, simulating demand impact, and comparing proposed travel time/cost against current service.
It includes:
- FastAPI backend for simulation and ML-backed inference
- React + TypeScript + Vite frontend with an interactive map
- Data ingestion/parsing scripts for subway, bus, and census inputs
- Model training scripts for ridership, operating cost, and route travel time
If you only want to run the app quickly (without rebuilding all data/models):
# 1) install dependencies
make setup
# 2) run backend (terminal A)
make backend
# 3) run frontend (terminal B)
make frontend
Open:
- frontend: http://localhost:5173
- backend health: http://localhost:8000/health
Quick sanity checks:
# backend syntax
source .venv/bin/activate && python -m py_compile src/api.py
# frontend build
cd frontend && npm run build
When to skip this and use the full guide:
- if you need fresh pulls from NY Open Data
- if you changed training code or want to retrain models
- if you need to regenerate data/processed artifacts
- Python backend: FastAPI, NetworkX, XGBoost, pandas/geopandas
- Frontend: React 19, TypeScript, Vite, Leaflet
- Data sources: GTFS, NY Open Data (Socrata), Census API, OSM
- src: backend API + path utilities
- frontend: web app
- data: data pull/parsing/training scripts and artifacts
- data/raw: raw ingested datasets
- data/processed: cleaned/engineered tables and graphs
- data/models: trained model artifacts used at runtime
- Makefile: common setup/run helpers
- Python 3.11+
- Node.js 18+
- npm
- make
- curl + unzip
Linux note:
- XGBoost needs OpenMP runtime (typically available via libgomp on Linux).
macOS note:
- install libomp (for XGBoost):
brew install libomp
From repo root:
make setup
This creates:
- Python virtual environment at .venv
- Python dependencies from requirements.txt
- Frontend dependencies from frontend/package.json
- base directories and .env template
Optional: activate the venv manually for direct script runs:
source .venv/bin/activate
Use two terminals.
Terminal 1 (backend):
make backend
Terminal 2 (frontend):
make frontend
Endpoints:
- Backend health: http://localhost:8000/health
- Frontend: http://localhost:5173
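When scripting the startup, it can help to wait for the backend before opening the frontend. The sketch below polls the health URL listed above using only the standard library. It assumes the endpoint answers with HTTP 200 once the server is ready; the function name `wait_for_health` and its parameters are illustrative and not part of the repository.

```python
import time
import urllib.error
import urllib.request


def wait_for_health(url="http://localhost:8000/health", timeout_s=30.0,
                    interval_s=1.0, opener=urllib.request.urlopen):
    """Poll `url` until it returns HTTP 200 or `timeout_s` elapses.

    `opener` is injectable so the loop can be exercised without a server.
    Assumption: a 200 response means the backend is ready.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with opener(url, timeout=5) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # backend not reachable yet; retry until the deadline
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


if __name__ == "__main__":
    ok = wait_for_health()
    print("backend healthy" if ok else "backend did not respond in time")
```

Run it in a third terminal after `make backend`; a `False` result usually means the server is still loading or failed to start.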
A template .env is generated by make setup.
Common variables:
NYC_OPEN_DATA_APP_TOKEN=optional Socrata token for higher limits
SOCRATA_APP_TOKEN=optional token for pull scripts
MTA_BUS_RESOURCE_ID=optional default bus dataset id
CENSUS_API_KEY=required only for census fetch in data/fetch_features.py
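Before running the pull scripts, a quick check of which variables are actually set can save a failed request. The following is a minimal sketch that inspects the process environment only; it does not parse the `.env` file, so it assumes the values have already been exported into the shell. The helper name `unset_vars` is hypothetical.

```python
import os

# Names are taken from the variable list above. Which script reads which
# variable is only partly documented, so treat the grouping as an assumption.
OPTIONAL_VARS = ["NYC_OPEN_DATA_APP_TOKEN", "SOCRATA_APP_TOKEN", "MTA_BUS_RESOURCE_ID"]
CENSUS_VARS = ["CENSUS_API_KEY"]


def unset_vars(names, env=None):
    """Return the names that are missing or blank in `env` (default: os.environ)."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name, "").strip()]


if __name__ == "__main__":
    for name in unset_vars(OPTIONAL_VARS):
        print(f"note: {name} is not set (optional)")
    for name in unset_vars(CENSUS_VARS):
        print(f"warning: {name} is not set; data/fetch_features.py needs it")
```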
If your repo already contains populated data/processed and data/models, you can skip this section.
make data
This downloads into data/raw:
- gtfs_subway.zip + extracted gtfs_subway/
- mta_ridership.csv
From repo root:
source .venv/bin/activate
python src/build_graph.py
Output:
source .venv/bin/activate
python data/mta_parser.py \
--input data/raw/MTA_Subway_Hourly_Ridership__2020-2024_20260426.csv \
--output-dir data/processed
Primary outputs:
- data/processed/stations.json
- data/processed/ridership_daily.csv
- data/processed/ridership_hourly.csv
- data/processed/line_summary.csv
- data/processed/hourly_patterns.json
- data/processed/network_graph.json
source .venv/bin/activate
python data/ridership_pull.py --app-token "$SOCRATA_APP_TOKEN"
python data/cost_pull.py --app-token "$SOCRATA_APP_TOKEN"
Outputs:
source .venv/bin/activate
python data/fetch_features.py --census-key "$CENSUS_API_KEY"
Output used by model training/inference:
Pull:
source .venv/bin/activate
python data/fetch_bus_data.py --resource-id "$MTA_BUS_RESOURCE_ID" --app-token "$SOCRATA_APP_TOKEN"
Parse:
source .venv/bin/activate
python data/bus_parser.py \
--input data/raw/MTA_Bus_Hourly_Ridership.csv \
--output-dir data/processed
source .venv/bin/activate
python data/train_model.py \
--features data/raw/station_features.csv \
--hourly-csv data/raw/MTA_Subway_Hourly_Ridership__2020-2024_20260426.csv \
--out-dir data/models
Artifacts:
- data/models/ridership_model.json
- data/models/peak_factor_model.json (if hourly CSV supplied)
- data/models/feature_columns.json
- data/models/training_report.json
source .venv/bin/activate
python data/price_train.py \
--labels-csv data/raw/monthly_operating_cost.csv \
--ridership-monthly-csv data/processed/ridership_monthly.csv \
--line-summary-csv data/processed/line_summary.csv \
--out-dir data/models
Artifacts:
- data/models/cost_model.json
- data/models/cost_feature_columns.json
- data/models/cost_training_report.json
Create training set from GTFS graph:
source .venv/bin/activate
python data/timegraph_parser.py
Train and save model:
source .venv/bin/activate
python -c "from data.time_predict import train_model, save_model; m=train_model(); save_model(m)"
Artifacts:
There is no formal pytest suite currently. Use the following checks.
source .venv/bin/activate
python -m py_compile src/api.py data/time_predict.py src/predict.py
cd frontend
npm run build
cd frontend
npm run lint
Start backend, then run:
curl -X POST http://localhost:8000/api/simulate \
-H "Content-Type: application/json" \
-d '{
"train_service": "local",
"stations": [
{"id":"127","name":"Times Sq-42 St","lat":40.75529,"lon":-73.987495,"is_new":false},
{"id":"124","name":"34 St-Penn Station","lat":40.750373,"lon":-73.991057,"is_new":false}
]
}'
You should receive JSON with fields such as:
- new_line_ridership
- operational_cost_monthly
- route_comparison.is_walking_only
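The same smoke test can be scripted in Python so the response fields are checked automatically. This sketch reuses the payload from the curl example and the three field names listed above. It assumes `route_comparison.is_walking_only` denotes a key nested under `route_comparison`; the helper names (`has_path`, `missing_fields`, `simulate`) are illustrative, not part of the backend.

```python
import json
import urllib.request

# Payload copied from the curl example above.
PAYLOAD = {
    "train_service": "local",
    "stations": [
        {"id": "127", "name": "Times Sq-42 St", "lat": 40.75529,
         "lon": -73.987495, "is_new": False},
        {"id": "124", "name": "34 St-Penn Station", "lat": 40.750373,
         "lon": -73.991057, "is_new": False},
    ],
}

# Assumption: a dotted name means a nested JSON object.
EXPECTED_FIELDS = [
    "new_line_ridership",
    "operational_cost_monthly",
    "route_comparison.is_walking_only",
]


def has_path(obj, dotted):
    """True if every key of the dotted path exists in nested dicts."""
    cur = obj
    for key in dotted.split("."):
        if not isinstance(cur, dict) or key not in cur:
            return False
        cur = cur[key]
    return True


def missing_fields(response, paths=EXPECTED_FIELDS):
    """Return the expected paths that are absent from `response`."""
    return [p for p in paths if not has_path(response, p)]


def simulate(base_url="http://localhost:8000", payload=PAYLOAD):
    """POST the payload to /api/simulate and return the decoded JSON body."""
    req = urllib.request.Request(
        f"{base_url}/api/simulate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


if __name__ == "__main__":
    result = simulate()
    print("missing fields:", missing_fields(result) or "none")
```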
Typical local loop:
# one-time
make setup
# each session
make backend
make frontend
# when changing backend logic
source .venv/bin/activate && python -m py_compile src/api.py
# when changing frontend
cd frontend && npm run build
The current Makefile includes preprocess and train targets that call src/preprocess.py and src/train.py, but those files are not present in this repository right now.
For now, use the explicit script commands documented above in sections 7 and 8.
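Until the Makefile targets work, the explicit commands can be chained from a small runner. The sketch below strings together three of the script invocations documented earlier (graph build, subway parse, ridership training) with their arguments copied verbatim. The step order, the `run_steps` helper, and the `--run` flag are assumptions made for illustration; by default it only prints what it would execute.

```python
import shlex
import subprocess
import sys

HOURLY_CSV = "data/raw/MTA_Subway_Hourly_Ridership__2020-2024_20260426.csv"

# Script paths and flags are copied from the commands documented above.
# The ordering is an assumption of this sketch.
STEPS = [
    ["src/build_graph.py"],
    ["data/mta_parser.py", "--input", HOURLY_CSV, "--output-dir", "data/processed"],
    ["data/train_model.py", "--features", "data/raw/station_features.csv",
     "--hourly-csv", HOURLY_CSV, "--out-dir", "data/models"],
]


def run_steps(steps=STEPS, dry_run=True):
    """Print each command and, unless dry_run, run it with the current interpreter.

    Stops at the first failing step (subprocess raises CalledProcessError).
    Returns the command lines that were printed.
    """
    lines = []
    for args in steps:
        cmd = [sys.executable, *args]
        line = shlex.join(cmd)
        print("+", line)
        lines.append(line)
        if not dry_run:
            subprocess.run(cmd, check=True)
    return lines


if __name__ == "__main__":
    # Hypothetical flag: pass --run to execute instead of only printing.
    run_steps(dry_run="--run" not in sys.argv)
```

Run it from the repo root with the virtual environment active so that `sys.executable` points at the `.venv` interpreter.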
- make backend exits with code 130: usually a manual Ctrl+C stop, not a crash.
- libomp errors on macOS: run brew install libomp.
- Missing model artifacts at runtime: ensure files exist under data/models.
- Missing graph or station data: regenerate data/processed/mta_time_graph.json and data/processed/stations.json.
- Socrata requests fail/rate-limit: set SOCRATA_APP_TOKEN or NYC_OPEN_DATA_APP_TOKEN.
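For the missing-artifact cases above, a short existence check narrows down which file to regenerate. The paths below are taken from the artifact lists in this document. Whether the backend needs every one of them at startup is an assumption, and `missing_artifacts` is a hypothetical helper rather than something the repository ships.

```python
from pathlib import Path

# Paths as named in this document. Assumption: the backend expects all of them.
EXPECTED = [
    "data/models/ridership_model.json",
    "data/models/feature_columns.json",
    "data/models/cost_model.json",
    "data/models/cost_feature_columns.json",
    "data/processed/stations.json",
    "data/processed/mta_time_graph.json",
]


def missing_artifacts(root=".", expected=EXPECTED):
    """Return the expected relative paths that are not regular files under `root`."""
    base = Path(root)
    return [rel for rel in expected if not (base / rel).is_file()]


if __name__ == "__main__":
    missing = missing_artifacts()
    if missing:
        print("missing artifacts:")
        for rel in missing:
            print("  -", rel)
    else:
        print("all expected artifacts are present")
```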
See LICENSE.
This project does not yet include container/IaC deployment files, so use this section as the baseline production runbook.
Use an environment-specific host/port and run behind a reverse proxy.
source .venv/bin/activate
uvicorn src.api:app --host 0.0.0.0 --port 8000 --workers 2
Recommended production setup:
- reverse proxy with TLS (for example Nginx/Caddy)
- process supervisor (systemd, supervisord, or equivalent)
- separate runtime user (non-root)
- pre-provisioned artifacts under data/models and data/processed
Minimal backend preflight before deploy:
source .venv/bin/activate
python -m py_compile src/api.py data/time_predict.py src/predict.py
curl http://127.0.0.1:8000/health
Build static assets:
cd frontend
npm ci
npm run build
Then serve frontend/dist from static hosting or a reverse proxy.
Important:
- ensure frontend can reach backend API URL (same-origin proxy or CORS-allowed host)
- rebuild frontend when API payload contracts change