LineLab

LineLab is an NYC transit planning sandbox for drafting proposed subway corridors, simulating demand impact, and comparing proposed travel time/cost against current service.

It includes:

FastAPI backend for simulation and ML-backed inference
React + TypeScript + Vite frontend with an interactive map
Data ingestion/parsing scripts for subway, bus, and census inputs
Model training scripts for ridership, operating cost, and route travel time

0. 5-Minute Quickstart

If you only want to run the app quickly (without rebuilding all data/models):

# 1) install dependencies
make setup

# 2) run backend (terminal A)
make backend

# 3) run frontend (terminal B)
make frontend

Open:

frontend: http://localhost:5173
backend health: http://localhost:8000/health

Quick sanity checks:

# backend syntax
source .venv/bin/activate && python -m py_compile src/api.py

# frontend build
cd frontend && npm run build

When to skip this and use the full guide:

if you need fresh pulls from NY Open Data
if you changed training code or want to retrain models
if you need to regenerate data/processed artifacts

1. Tech Stack

Python backend: FastAPI, NetworkX, XGBoost, pandas/geopandas
Frontend: React 19, TypeScript, Vite, Leaflet
Data sources: GTFS, NY Open Data (Socrata), Census API, OSM

2. Repository Structure

src: backend API + path utilities
frontend: web app
data: data pull/parsing/training scripts and artifacts
data/raw: raw ingested datasets
data/processed: cleaned/engineered tables and graphs
data/models: trained model artifacts used at runtime
Makefile: common setup/run helpers

3. Prerequisites

Python 3.11+
Node.js 18+
npm
make
curl + unzip

Linux note:

XGBoost needs an OpenMP runtime (typically provided by libgomp on Linux).

macOS note:

install libomp (for XGBoost): brew install libomp
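
The prerequisites above can be checked with a short script before running make setup. This is a minimal sketch, not part of the repository: the executable name node for Node.js is an assumption, and the script only checks that tools are on PATH, not that Node.js is version 18 or newer.

```python
import shutil
import sys

# Tool names taken from the prerequisites list; "node" is assumed to be
# the executable name for Node.js on this machine.
REQUIRED_TOOLS = ["node", "npm", "make", "curl", "unzip"]
MIN_PYTHON = (3, 11)


def python_ok(version=None):
    """Return True when the interpreter is at least MIN_PYTHON."""
    version = sys.version_info if version is None else version
    return tuple(version[:2]) >= MIN_PYTHON


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    if not python_ok():
        print("Python 3.11+ is required")
    for tool in missing_tools():
        print(f"missing: {tool}")
```

The script prints nothing when every prerequisite is found.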

4. First-Time Setup

From repo root:

make setup

This sets up:

a Python virtual environment at .venv
Python dependencies installed from requirements.txt
frontend dependencies installed from frontend/package.json
base directories and a .env template

Optional: activate the venv manually for direct script runs:

source .venv/bin/activate

5. Running the App Locally

Use two terminals.

Terminal 1 (backend):

make backend

Terminal 2 (frontend):

make frontend

Endpoints:

Backend health: http://localhost:8000/health
Frontend: http://localhost:5173
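
When scripting the startup, it can help to wait for the backend before opening the frontend. The sketch below polls the health URL listed above. It assumes that a healthy backend answers with HTTP 200; the function names are hypothetical and not part of the repository.

```python
import time
import urllib.error
import urllib.request

HEALTH_URL = "http://localhost:8000/health"


def is_healthy(url=HEALTH_URL, opener=urllib.request.urlopen, timeout=2):
    """Return True when the URL answers with HTTP 200 (assumed healthy)."""
    try:
        with opener(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx status.
        return False


def wait_until_healthy(check=is_healthy, attempts=10, delay=1.0, sleep=time.sleep):
    """Poll check() up to `attempts` times; return True on first success."""
    for attempt in range(attempts):
        if check():
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

Calling wait_until_healthy() right after make backend returns True once the server responds, or False after roughly ten seconds with the defaults.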

6. Environment Variables

A template .env is generated by make setup.

Common variables:

NYC_OPEN_DATA_APP_TOKEN= optional Socrata token for higher limits
SOCRATA_APP_TOKEN= optional token for pull scripts
MTA_BUS_RESOURCE_ID= optional default bus dataset id
CENSUS_API_KEY= required only for census fetch in data/fetch_features.py
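
As an illustration of how the two Socrata token variables might be consumed, the sketch below picks the first one that is set and turns it into a request header. The precedence order and both function names are assumptions, not the repository's code, and the variables are assumed to be exported into the process environment. X-App-Token is the header Socrata uses for app tokens.

```python
import os


def resolve_socrata_token(env=None):
    """Return the first non-empty Socrata token, or None if neither is set."""
    env = os.environ if env is None else env
    # Assumed precedence: SOCRATA_APP_TOKEN first, then NYC_OPEN_DATA_APP_TOKEN.
    for name in ("SOCRATA_APP_TOKEN", "NYC_OPEN_DATA_APP_TOKEN"):
        value = env.get(name, "").strip()
        if value:
            return value
    return None


def socrata_headers(env=None):
    """Build request headers; an empty dict means an unauthenticated request."""
    token = resolve_socrata_token(env)
    return {"X-App-Token": token} if token else {}
```

Because both tokens are optional, an empty header dict is a valid result; requests then run at the lower unauthenticated rate limit.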

7. Data Pull and Processing

7.1 Minimal Runtime Data (if you need to rebuild artifacts)

If your repo already contains populated data/processed and data/models, you can skip this section.

7.2 Subway GTFS + ridership pulls via Makefile

make data

This downloads into data/raw:

gtfs_subway.zip + extracted gtfs_subway/
mta_ridership.csv
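
After the download, a quick check that the GTFS archive is intact can save time before building the graph. The sketch below lists which of the core files of a conventional GTFS Schedule feed are absent from the zip. The function name is hypothetical, and the default path assumes the archive location described above.

```python
import zipfile

# Core files that a conventional GTFS Schedule feed carries.
CORE_GTFS_FILES = ["agency.txt", "stops.txt", "routes.txt", "trips.txt", "stop_times.txt"]


def missing_gtfs_files(zip_path="data/raw/gtfs_subway.zip", required=CORE_GTFS_FILES):
    """Return core GTFS files that are absent from the archive."""
    with zipfile.ZipFile(zip_path) as archive:
        # Compare on base names so archives with a top-level folder also pass.
        present = {name.rsplit("/", 1)[-1] for name in archive.namelist()}
    return [name for name in required if name not in present]
```

An empty list means the core files are all present; zipfile.BadZipFile is raised if the download was truncated or is not a zip at all.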

7.3 Build GTFS time graph used by routing

From repo root:

source .venv/bin/activate
python src/build_graph.py

Output:

data/processed/mta_time_graph.json
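
To confirm that the graph file was written and parses as JSON, a small summary helper is enough. The sketch below makes no assumption about the graph's schema; it only reports the top-level type and entry count. The function name is hypothetical.

```python
import json
from pathlib import Path


def summarize_json_artifact(path):
    """Load a JSON artifact and report its top-level type and size."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    # Only containers have a meaningful entry count.
    size = len(data) if isinstance(data, (dict, list)) else None
    return {"type": type(data).__name__, "entries": size}
```

For example, summarize_json_artifact("data/processed/mta_time_graph.json") raises FileNotFoundError if the build step did not run and json.JSONDecodeError if the file is corrupt.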

7.4 Parse subway hourly ridership into model-ready files

source .venv/bin/activate
python data/mta_parser.py \
  --input data/raw/MTA_Subway_Hourly_Ridership__2020-2024_20260426.csv \
  --output-dir data/processed

Primary outputs:

7.5 Pull monthly ridership and monthly operating cost labels

source .venv/bin/activate
python data/ridership_pull.py --app-token "$SOCRATA_APP_TOKEN"
python data/cost_pull.py --app-token "$SOCRATA_APP_TOKEN"

Outputs:

7.6 Optional: fetch census + OSM features and station feature table

source .venv/bin/activate
python data/fetch_features.py --census-key "$CENSUS_API_KEY"

Output used by model training/inference:

data/raw/station_features.csv

7.7 Optional: bus data pipeline

Pull:

source .venv/bin/activate
python data/fetch_bus_data.py --resource-id "$MTA_BUS_RESOURCE_ID" --app-token "$SOCRATA_APP_TOKEN"

Parse:

source .venv/bin/activate
python data/bus_parser.py \
  --input data/raw/MTA_Bus_Hourly_Ridership.csv \
  --output-dir data/processed

8. Model Training

8.1 Ridership + peak-factor models

source .venv/bin/activate
python data/train_model.py \
  --features data/raw/station_features.csv \
  --hourly-csv data/raw/MTA_Subway_Hourly_Ridership__2020-2024_20260426.csv \
  --out-dir data/models

Artifacts:

data/models/ridership_model.json
data/models/peak_factor_model.json (if hourly CSV supplied)
data/models/feature_columns.json
data/models/training_report.json
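
The artifact list above can be verified after training, or when the backend reports missing models at runtime. This is a minimal sketch with a hypothetical function name; the file names are the ones listed in this section.

```python
from pathlib import Path

MODEL_ARTIFACTS = [
    "ridership_model.json",
    "peak_factor_model.json",  # only written when an hourly CSV is supplied
    "feature_columns.json",
    "training_report.json",
]


def missing_artifacts(model_dir="data/models", names=MODEL_ARTIFACTS):
    """Return the artifact file names that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in names if not (root / name).is_file()]
```

If training ran without the hourly CSV, peak_factor_model.json is expected to appear in the result.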

8.2 Monthly operating cost model

source .venv/bin/activate
python data/price_train.py \
  --labels-csv data/raw/monthly_operating_cost.csv \
  --ridership-monthly-csv data/processed/ridership_monthly.csv \
  --line-summary-csv data/processed/line_summary.csv \
  --out-dir data/models

Artifacts:

8.3 Time model for proposed-corridor travel time

Create training set from GTFS graph:

source .venv/bin/activate
python data/timegraph_parser.py

Train and save model:

source .venv/bin/activate
python -c "from data.time_predict import train_model, save_model; m=train_model(); save_model(m)"

Artifacts:

9. Testing and Validation

There is no formal pytest suite currently. Use the following checks.

9.1 Backend syntax check

source .venv/bin/activate
python -m py_compile src/api.py data/time_predict.py src/predict.py

9.2 Frontend build check

cd frontend
npm run build

9.3 Frontend lint

cd frontend
npm run lint

9.4 API smoke test

Start backend, then run:

curl -X POST http://localhost:8000/api/simulate \
  -H "Content-Type: application/json" \
  -d '{
    "train_service": "local",
    "stations": [
      {"id":"127","name":"Times Sq-42 St","lat":40.75529,"lon":-73.987495,"is_new":false},
      {"id":"124","name":"34 St-Penn Station","lat":40.750373,"lon":-73.991057,"is_new":false}
    ]
  }'

You should receive JSON with fields such as:

new_line_ridership
operational_cost_monthly
route_comparison.is_walking_only
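
The same smoke test can be run from Python, which makes it easier to assert on the response. The sketch below mirrors the curl payload above and checks for the three fields listed; the function names are hypothetical, and the response may contain other fields that are not checked here.

```python
import json
import urllib.request

SIMULATE_URL = "http://localhost:8000/api/simulate"

# Same payload as the curl example.
PAYLOAD = {
    "train_service": "local",
    "stations": [
        {"id": "127", "name": "Times Sq-42 St", "lat": 40.75529,
         "lon": -73.987495, "is_new": False},
        {"id": "124", "name": "34 St-Penn Station", "lat": 40.750373,
         "lon": -73.991057, "is_new": False},
    ],
}

EXPECTED_FIELDS = [
    "new_line_ridership",
    "operational_cost_monthly",
    "route_comparison.is_walking_only",
]


def missing_fields(response, expected=EXPECTED_FIELDS):
    """Return dotted field paths that are absent from the response dict."""
    missing = []
    for path in expected:
        node = response
        for key in path.split("."):
            if not isinstance(node, dict) or key not in node:
                missing.append(path)
                break
            node = node[key]
    return missing


def simulate(payload=PAYLOAD, url=SIMULATE_URL, timeout=30):
    """POST the payload as JSON and return the decoded response body."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the backend running, missing_fields(simulate()) should return an empty list.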

10. Dev Workflow

Typical local loop:

# one-time
make setup

# each session (separate terminals)
make backend
make frontend

# when changing backend logic
source .venv/bin/activate && python -m py_compile src/api.py

# when changing frontend
cd frontend && npm run build

11. Notes About Make Targets

The current Makefile includes preprocess and train targets that call src/preprocess.py and src/train.py, but those files are not present in this repository, so both targets will fail.

For now, use the explicit script commands documented above in sections 7 and 8.

12. Troubleshooting

make backend exits with code 130: usually a manual Ctrl+C stop, not a crash.
libomp errors on macOS: run brew install libomp.
Missing model artifacts at runtime: ensure files exist under data/models.
Missing graph or station data: regenerate data/processed/mta_time_graph.json and data/processed/stations.json.
Socrata requests fail or are rate-limited: set SOCRATA_APP_TOKEN or NYC_OPEN_DATA_APP_TOKEN.

13. License

See LICENSE.

14. Deployment / Production Run

This project does not yet include container/IaC deployment files, so use this section as the baseline production runbook.

14.1 Backend (FastAPI)

Use an environment-specific host/port and run behind a reverse proxy.

source .venv/bin/activate
uvicorn src.api:app --host 0.0.0.0 --port 8000 --workers 2

Recommended production setup:

reverse proxy with TLS (for example Nginx/Caddy)
process supervisor (systemd, supervisord, or equivalent)
separate runtime user (non-root)
pre-provisioned artifacts under data/models and data/processed

Minimal backend preflight before deploy:

source .venv/bin/activate
python -m py_compile src/api.py data/time_predict.py src/predict.py
# with the backend running:
curl http://127.0.0.1:8000/health

14.2 Frontend (Vite build)

Build static assets:

cd frontend
npm ci
npm run build

Then serve frontend/dist from static hosting or a reverse proxy.

Important:

ensure the frontend can reach the backend API URL (same-origin proxy or a CORS-allowed host)
rebuild the frontend when API payload contracts change
