Tandonites/LineLab

LineLab

LineLab is an NYC transit planning sandbox for drafting proposed subway corridors, simulating demand impact, and comparing proposed travel time/cost against current service.

It includes:

  • FastAPI backend for simulation and ML-backed inference
  • React + TypeScript + Vite frontend with an interactive map
  • Data ingestion/parsing scripts for subway, bus, and census inputs
  • Model training scripts for ridership, operating cost, and route travel time

0. 5-Minute Quickstart

If you only want to run the app quickly (without rebuilding all data/models):

# 1) install dependencies
make setup

# 2) run backend (terminal A)
make backend

# 3) run frontend (terminal B)
make frontend

Open:

  • frontend: http://localhost:5173
  • backend health: http://localhost:8000/health
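As a quick way to confirm both processes came up, the two URLs above can be probed from Python. This is a minimal sketch using only the standard library; the helper name `is_up` is hypothetical, and it assumes that any 2xx response counts as "up" (the exact body returned by /health is not documented here).

```python
"""Probe the two local LineLab URLs and report which ones respond."""
import urllib.error
import urllib.request

# URLs taken from the quickstart; adjust them if you changed ports.
ENDPOINTS = {
    "frontend": "http://localhost:5173",
    "backend health": "http://localhost:8000/health",
}


def is_up(url: str, timeout: float = 2.0) -> bool:
    """Return True if the URL answers with an HTTP 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or a 4xx/5xx response.
        return False


if __name__ == "__main__":
    for name, url in ENDPOINTS.items():
        state = "up" if is_up(url) else "DOWN"
        print(f"{name:15s} {url:35s} {state}")
```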

Quick sanity checks:

# backend syntax
source .venv/bin/activate && python -m py_compile src/api.py

# frontend build
cd frontend && npm run build

When to skip this and use the full guide:

  • if you need fresh pulls from NY Open Data
  • if you changed training code or want to retrain models
  • if you need to regenerate data/processed artifacts

1. Tech Stack

  • Python backend: FastAPI, NetworkX, XGBoost, pandas/geopandas
  • Frontend: React 19, TypeScript, Vite, Leaflet
  • Data sources: GTFS, NY Open Data (Socrata), Census API, OSM

2. Repository Structure

  • src: backend API + path utilities
  • frontend: web app
  • data: data pull/parsing/training scripts and artifacts
  • data/raw: raw ingested datasets
  • data/processed: cleaned/engineered tables and graphs
  • data/models: trained model artifacts used at runtime
  • Makefile: common setup/run helpers

3. Prerequisites

  • Python 3.11+
  • Node.js 18+
  • npm
  • make
  • curl + unzip

Linux note:

  • XGBoost needs the OpenMP runtime, which is typically provided by libgomp on Linux.

macOS note:

  • XGBoost needs libomp; install it with: brew install libomp
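The prerequisite list can be checked before running make setup. The sketch below is illustrative only: the helper names (`missing_tools`, `python_ok`, `node_major`) are hypothetical, and it assumes the tools are expected on PATH under the names `node`, `npm`, `make`, `curl`, and `unzip`. It does not check for the OpenMP runtime.

```python
"""Report which LineLab prerequisites appear to be missing on this machine."""
import re
import shutil
import subprocess
import sys

# Tool names from the prerequisites list; Python itself is checked separately.
REQUIRED_TOOLS = ("node", "npm", "make", "curl", "unzip")
MIN_PYTHON = (3, 11)
MIN_NODE_MAJOR = 18


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def python_ok(version=None, minimum=MIN_PYTHON):
    """True if the (major, minor) version meets the minimum."""
    version = sys.version_info if version is None else version
    return tuple(version[:2]) >= tuple(minimum)


def node_major(version_text):
    """Extract the major version from `node --version` output such as 'v18.17.0'."""
    match = re.match(r"v?(\d+)\.", version_text.strip())
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    print("python >= 3.11:", python_ok())
    print("missing tools:", missing_tools() or "none")
    if shutil.which("node"):
        out = subprocess.run(
            ["node", "--version"], capture_output=True, text=True
        ).stdout
        major = node_major(out)
        print("node >= 18:", major is not None and major >= MIN_NODE_MAJOR)
```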

4. First-Time Setup

From repo root:

make setup

This creates, among other things, the .venv virtual environment and the template .env described in section 6.

Optional: activate the venv manually for direct script runs:

source .venv/bin/activate

5. Running the App Locally

Use two terminals.

Terminal 1 (backend):

make backend

Terminal 2 (frontend):

make frontend

Endpoints:

  • Backend health: http://localhost:8000/health
  • Frontend: http://localhost:5173

6. Environment Variables

A template .env is generated by make setup.

Common variables:

  • NYC_OPEN_DATA_APP_TOKEN: optional Socrata app token for higher rate limits
  • SOCRATA_APP_TOKEN: optional Socrata app token for the pull scripts
  • MTA_BUS_RESOURCE_ID: optional default bus dataset id
  • CENSUS_API_KEY: required only for the census fetch in data/fetch_features.py
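To illustrate how these variables might be consumed, here is a small sketch that parses .env-style text and picks a Socrata token. The helper names and the precedence (SOCRATA_APP_TOKEN first, then NYC_OPEN_DATA_APP_TOKEN) are assumptions made for this example, not the project's documented behavior. `X-App-Token` is the request header Socrata accepts for app tokens.

```python
"""Parse .env-style text and choose a Socrata app token from the environment."""
import os

# Assumed precedence for this sketch; the project's scripts may differ.
TOKEN_VARS = ("SOCRATA_APP_TOKEN", "NYC_OPEN_DATA_APP_TOKEN")


def parse_env_text(text):
    """Turn KEY=VALUE lines into a dict, skipping blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def socrata_token(env=None):
    """Return the first non-empty token variable, or None if none is set."""
    env = os.environ if env is None else env
    for name in TOKEN_VARS:
        value = env.get(name, "").strip()
        if value:
            return value
    return None


def socrata_headers(env=None):
    """Build request headers; empty when no token is configured."""
    token = socrata_token(env)
    return {"X-App-Token": token} if token else {}


if __name__ == "__main__":
    print("token configured:", socrata_token() is not None)
```

Requests without a token still work against Socrata, but are throttled more aggressively, which is why the token variables are optional.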

7. Data Pull and Processing

7.1 Minimal Runtime Data (if you need to rebuild artifacts)

If your repo already contains populated data/processed and data/models directories, you can skip the rest of section 7.
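One way to decide whether the directories count as populated is sketched below. The helper names are hypothetical, and "populated" is assumed here to mean "contains at least one non-hidden file"; it does not verify that the specific artifacts the backend loads are present.

```python
"""Check whether the runtime artifact directories contain any files."""
from pathlib import Path

# Directory names from this README, relative to the repo root.
RUNTIME_DIRS = ("data/processed", "data/models")


def is_populated(directory):
    """True if the directory exists and holds at least one non-hidden file."""
    path = Path(directory)
    if not path.is_dir():
        return False
    return any(
        p.is_file() and not p.name.startswith(".") for p in path.rglob("*")
    )


def unpopulated(root=".", dirs=RUNTIME_DIRS):
    """Return the runtime directories that are missing or empty."""
    return [d for d in dirs if not is_populated(Path(root) / d)]


if __name__ == "__main__":
    gaps = unpopulated()
    if gaps:
        print("rebuild needed for:", ", ".join(gaps))
    else:
        print("runtime artifacts present; data rebuild can be skipped")
```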

7.2 Subway GTFS + ridership pulls via Makefile

make data

This downloads into data/raw:

  • gtfs_subway.zip + extracted gtfs_subway/
  • mta_ridership.csv

7.3 Build GTFS time graph used by routing

From repo root:

source .venv/bin/activate
python src/build_graph.py

Output:

7.4 Parse subway hourly ridership into model-ready files

source .venv/bin/activate
python data/mta_parser.py \
  --input data/raw/MTA_Subway_Hourly_Ridership__2020-2024_20260426.csv \
  --output-dir data/processed

Primary outputs:

7.5 Pull monthly ridership and monthly operating cost labels

source .venv/bin/activate
python data/ridership_pull.py --app-token "$SOCRATA_APP_TOKEN"
python data/cost_pull.py --app-token "$SOCRATA_APP_TOKEN"

Outputs:

7.6 Optional: fetch census + OSM features and station feature table

source .venv/bin/activate
python data/fetch_features.py --census-key "$CENSUS_API_KEY"

Output used by model training/inference:

7.7 Optional: bus data pipeline

Pull:

source .venv/bin/activate
python data/fetch_bus_data.py --resource-id "$MTA_BUS_RESOURCE_ID" --app-token "$SOCRATA_APP_TOKEN"

Parse:

source .venv/bin/activate
python data/bus_parser.py \
  --input data/raw/MTA_Bus_Hourly_Ridership.csv \
  --output-dir data/processed

8. Model Training

8.1 Ridership + peak-factor models

source .venv/bin/activate
python data/train_model.py \
  --features data/raw/station_features.csv \
  --hourly-csv data/raw/MTA_Subway_Hourly_Ridership__2020-2024_20260426.csv \
  --out-dir data/models

Artifacts:

8.2 Monthly operating cost model

source .venv/bin/activate
python data/price_train.py \
  --labels-csv data/raw/monthly_operating_cost.csv \
  --ridership-monthly-csv data/processed/ridership_monthly.csv \
  --line-summary-csv data/processed/line_summary.csv \
  --out-dir data/models

Artifacts:

8.3 Time model for proposed-corridor travel time

Create training set from GTFS graph:

source .venv/bin/activate
python data/timegraph_parser.py

Train and save model:

source .venv/bin/activate
python -c "from data.time_predict import train_model, save_model; m=train_model(); save_model(m)"

Artifacts:

9. Testing and Validation

There is currently no formal pytest suite; use the following checks instead.

9.1 Backend syntax check

source .venv/bin/activate
python -m py_compile src/api.py data/time_predict.py src/predict.py

9.2 Frontend build check

cd frontend
npm run build

9.3 Frontend lint

cd frontend
npm run lint

9.4 API smoke test

Start backend, then run:

curl -X POST http://localhost:8000/api/simulate \
  -H "Content-Type: application/json" \
  -d '{
    "train_service": "local",
    "stations": [
      {"id":"127","name":"Times Sq-42 St","lat":40.75529,"lon":-73.987495,"is_new":false},
      {"id":"124","name":"34 St-Penn Station","lat":40.750373,"lon":-73.991057,"is_new":false}
    ]
  }'

You should receive JSON with fields such as:

  • new_line_ridership
  • operational_cost_monthly
  • route_comparison.is_walking_only
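The same smoke test can be scripted so the field check is automatic. This sketch reuses the request body from the curl command above and uses only the standard library. The helper names are hypothetical, and it assumes that the dotted name route_comparison.is_walking_only denotes a key nested inside a route_comparison object.

```python
"""POST the sample corridor to /api/simulate and check for expected fields."""
import json
import urllib.request

SIMULATE_URL = "http://localhost:8000/api/simulate"

# Same payload as the curl smoke test.
PAYLOAD = {
    "train_service": "local",
    "stations": [
        {"id": "127", "name": "Times Sq-42 St", "lat": 40.75529,
         "lon": -73.987495, "is_new": False},
        {"id": "124", "name": "34 St-Penn Station", "lat": 40.750373,
         "lon": -73.991057, "is_new": False},
    ],
}

EXPECTED_FIELDS = (
    "new_line_ridership",
    "operational_cost_monthly",
    "route_comparison.is_walking_only",
)


def has_field(data, dotted):
    """Follow a dotted key path through nested dicts."""
    node = data
    for key in dotted.split("."):
        if not isinstance(node, dict) or key not in node:
            return False
        node = node[key]
    return True


def missing_fields(data, fields=EXPECTED_FIELDS):
    """Return the expected fields that are absent from the response."""
    return [f for f in fields if not has_field(data, f)]


def simulate(url=SIMULATE_URL, payload=PAYLOAD, timeout=30.0):
    """Send the payload as JSON and return the decoded response."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    try:
        result = simulate()
    except OSError as exc:  # URLError and HTTPError both subclass OSError
        print(f"request failed: {exc}")
    else:
        gaps = missing_fields(result)
        print("ok" if not gaps else f"missing fields: {gaps}")
```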

10. Dev Workflow

Typical local loop:

# one-time
make setup

# each session (run in separate terminals)
make backend
make frontend

# when changing backend logic
source .venv/bin/activate && python -m py_compile src/api.py

# when changing frontend
cd frontend && npm run build

11. Notes About Make Targets

The current Makefile includes preprocess and train targets that call src/preprocess.py and src/train.py, but those files are not present in this repository, so the targets will fail.

Until they are added, use the explicit script commands documented in sections 7 and 8.

12. Troubleshooting

  • make backend exits with code 130: usually a manual Ctrl+C stop, not a crash.
  • libomp errors on macOS: run brew install libomp.
  • Missing model artifacts at runtime: ensure files exist under data/models.
  • Missing graph or station data: regenerate data/processed/mta_time_graph.json and data/processed/stations.json.
  • Socrata requests fail or are rate-limited: set SOCRATA_APP_TOKEN or NYC_OPEN_DATA_APP_TOKEN.

13. License

See LICENSE.

14. Deployment / Production Run

This project does not yet include container/IaC deployment files, so use this section as the baseline production runbook.

14.1 Backend (FastAPI)

Use an environment-specific host/port and run behind a reverse proxy.

source .venv/bin/activate
uvicorn src.api:app --host 0.0.0.0 --port 8000 --workers 2

Recommended production setup:

  • reverse proxy with TLS (for example Nginx/Caddy)
  • process supervisor (systemd, supervisord, or equivalent)
  • separate runtime user (non-root)
  • pre-provisioned artifacts under data/models and data/processed

Minimal backend preflight before deploy:

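source .venv/bin/activate
# syntax-check the backend modules
python -m py_compile src/api.py data/time_predict.py src/predict.py
# with the backend running, confirm the health endpoint responds
curl http://127.0.0.1:8000/health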
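Under a process supervisor the backend may take a moment to start, so a single curl can fail spuriously. The sketch below polls the health URL a few times before giving up. The function names, attempt count, and delay are illustrative assumptions, and only an HTTP 200 is treated as healthy.

```python
"""Poll the backend health endpoint until it responds or attempts run out."""
import time
import urllib.request

HEALTH_URL = "http://127.0.0.1:8000/health"


def probe(url=HEALTH_URL, timeout=2.0):
    """Return True if the health endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # covers URLError, HTTPError, timeouts, refused connections
        return False


def wait_until_healthy(check=probe, attempts=5, delay=1.0, sleep=time.sleep):
    """Return the 1-based attempt that succeeded, or None if all failed."""
    for attempt in range(1, attempts + 1):
        if check():
            return attempt
        if attempt < attempts:
            sleep(delay)  # pause only between attempts, not after the last
    return None
```

Calling wait_until_healthy() in a deploy script and treating a None result as a failed preflight gives a slightly more forgiving check than a one-shot request.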

14.2 Frontend (Vite build)

Build static assets:

cd frontend
npm ci
npm run build

Then serve frontend/dist from static hosting or a reverse proxy.

Important:

  • ensure the frontend can reach the backend API URL (via a same-origin proxy or a CORS-allowed host)
  • rebuild the frontend whenever API payload contracts change
