Commit 9dac5cd

Refresh README for current platform state
1 parent a26c068 commit 9dac5cd

1 file changed

Lines changed: 137 additions & 99 deletions

README.md

@@ -1,87 +1,61 @@
11
# worldmodel-gym
22

3-
WorldModel Gym is a production-ready benchmark platform for long-horizon planning agents. It combines reproducible environments, planner/world-model baselines, a FastAPI submission service, and a Next.js leaderboard dashboard into one deployable monorepo.
3+
WorldModel Gym is an end-to-end benchmark platform for long-horizon planning agents. It combines reproducible benchmark environments, planner and world-model baselines, a FastAPI submission service, and a polished Next.js leaderboard into one deployable monorepo.
44

5-
## Screenshots
5+
## Why This Repo Stands Out
66

7-
![Homepage dashboard](docs/images/homepage.png)
7+
- Reproducible benchmark tasks designed around sparse rewards, partial observability, and procedural generalization
8+
- Modular research stack spanning environments, agents, planners, and world models
9+
- Production-minded backend with Alembic migrations, scoped API keys, rate limiting, readiness checks, structured logging, and Prometheus metrics
10+
- Modern frontend with a custom editorial product UI, same-origin proxying, SEO metadata, and Playwright smoke coverage
11+
- Full-stack delivery workflow with GitHub Actions, Render deployment support, and Vercel deployment support
812

9-
![Leaderboard dashboard](docs/images/leaderboard.png)
13+
## Stack
1014

11-
## What makes this repo strong
12-
13-
- Reproducible benchmark tasks for sparse rewards, partial observability, and procedural generalization
14-
- FastAPI backend with Alembic migrations, scoped API keys, rate limiting, readiness checks, and structured logging
15-
- Pluggable artifact storage with local and S3-compatible backends
16-
- Next.js dashboard with proxy-based API access, seeded demo data support, metadata/SEO, and Playwright smoke tests
17-
- CI coverage for Ruff, pytest, Next.js production builds, and browser E2E verification
18-
19-
## Quickstart
20-
21-
```bash
22-
make setup
23-
make demo
24-
```
25-
26-
Local development uses built-in defaults. If you need overrides, export environment variables in your shell instead of committing env files to the repo.
27-
28-
`make demo` will:
29-
30-
- start the API + web stack with Docker when available
31-
- fall back to local API execution when Docker is unavailable
32-
- create a benchmark run
33-
- upload artifacts through the API
34-
- populate the leaderboard
35-
36-
Open:
37-
38-
- [http://localhost:3000](http://localhost:3000)
39-
- [http://localhost:8000/docs](http://localhost:8000/docs)
15+
- `core/`: benchmark environments, traces, and evaluation harness
16+
- `agents/`: baseline agents and agent registry
17+
- `planners/`: planning algorithms such as MCTS and MPC-CEM
18+
- `worldmodels/`: deterministic, stochastic, and ensemble world model baselines
19+
- `server/`: FastAPI API, auth, migrations, storage, CLI, and demo-data seeding
20+
- `web/`: Next.js App Router dashboard and API proxy
21+
- `mobile/`: Expo-based mobile viewer
22+
- `paper/`: manuscript sources and generated PDF artifacts
4023

4124
## Architecture
4225

4326
```mermaid
4427
flowchart LR
45-
subgraph Benchmark
46-
E["Environments"]
47-
A["Agents"]
48-
P["Planners"]
49-
W["World Models"]
50-
H["Evaluation Harness"]
28+
subgraph Research["Research Layer"]
29+
ENV["Benchmark Environments"]
30+
AGENT["Agents"]
31+
PLAN["Planners"]
32+
MODEL["World Models"]
33+
HARNESS["Evaluation Harness"]
5134
end
5235
53-
subgraph Platform
54-
API["FastAPI API"]
55-
DB[("Postgres / SQLite")]
56-
STORE[("Local or S3 Artifacts")]
57-
WEB["Next.js Dashboard"]
58-
PROXY["Next.js API Proxy"]
59-
MOB["Expo Mobile"]
36+
subgraph Platform["Platform Layer"]
37+
API["FastAPI API"]
38+
DB[("Postgres / SQLite")]
39+
STORE[("Local / S3 Artifact Store")]
40+
WEB["Next.js Dashboard"]
41+
PROXY["Next.js API Proxy"]
42+
MOBILE["Expo Viewer"]
6043
end
6144
62-
A --> E
63-
A --> P
64-
A --> W
65-
P --> W
66-
H --> A
67-
H --> E
68-
H --> API
45+
AGENT --> ENV
46+
AGENT --> PLAN
47+
AGENT --> MODEL
48+
PLAN --> MODEL
49+
HARNESS --> AGENT
50+
HARNESS --> ENV
51+
HARNESS --> API
6952
API --> DB
7053
API --> STORE
7154
WEB --> PROXY
7255
PROXY --> API
73-
MOB --> API
56+
MOBILE --> API
7457
```
7558

76-
## Production Features
77-
78-
- Alembic migrations replace implicit `create_all()` table creation
79-
- API writes are protected by scoped API keys with a legacy upload-token compatibility path
80-
- Public API traffic is rate limited and logged with request IDs
81-
- Prometheus metrics are exposed from the server at `/metrics`
82-
- Browser clients use the Next.js proxy route instead of direct cross-origin calls to the API
83-
- Demo leaderboard data can be seeded with `WMG_SEED_DEMO_DATA=true`
84-
8559
## Deployment Topology
8660

8761
```mermaid
@@ -90,8 +64,8 @@ flowchart LR
9064
CI["GitHub Actions"]
9165
V["Vercel Web"]
9266
R["Render API"]
93-
PG[("Render Postgres")]
94-
S3[("S3-Compatible Bucket")]
67+
PG[("Managed Postgres")]
68+
S3[("S3-Compatible Storage")]
9569
9670
GH --> CI
9771
GH --> V
@@ -101,9 +75,73 @@ flowchart LR
10175
V --> R
10276
```
10377

104-
Full deployment instructions live in [docs/DEPLOYMENT.md](docs/DEPLOYMENT.md).
78+
The default production shape is:
79+
80+
- FastAPI API on Render
81+
- Next.js dashboard on Vercel
82+
- managed Postgres for run metadata
83+
- local or S3-compatible storage for trace and metrics artifacts
84+
- browser requests routed through the Next.js proxy instead of direct cross-origin API calls
85+
86+
## Core Product Capabilities
87+
88+
- create benchmark runs through the API
89+
- upload metrics, traces, and config artifacts
90+
- inspect public leaderboard data by track
91+
- browse tasks and benchmark context from the web dashboard
92+
- verify public deployment health with readiness, liveness, and smoke checks
93+
- seed demo leaderboard data for first-run and demo environments
94+
95+
## Local Quickstart
96+
97+
```bash
98+
make setup
99+
make demo
100+
```
101+
102+
`make demo` will:
103+
104+
- start the API and web stack with Docker when available
105+
- fall back to a local API process if Docker is unavailable
106+
- create a benchmark run
107+
- upload artifacts through the API
108+
- populate the leaderboard flow end to end
109+
110+
Open:
111+
112+
- [http://localhost:3000](http://localhost:3000)
113+
- [http://localhost:8000/docs](http://localhost:8000/docs)
114+
115+
Local development uses built-in defaults. If you need overrides, export environment variables in your shell or configure them in your deployment provider. Do not commit env files to the repository.
116+
117+
## Developer Commands
118+
119+
```bash
120+
make lint
121+
make test
122+
make demo
123+
make seed-demo
124+
make create-api-key NAME=local-writer SCOPE=runs:write
125+
make verify-deployment
126+
make deploy
127+
make stop
128+
make deploy-public
129+
make stop-public
130+
make deploy-vercel
131+
```
132+
133+
## Production Features
134+
135+
- Alembic migrations replace implicit schema creation
136+
- scoped API keys support `runs:write` and admin-style access control
137+
- legacy upload-token support exists only as a compatibility path and can be disabled
138+
- authenticated writes and public reads are rate limited separately
139+
- structured logs include request IDs, durations, and startup/readiness events
140+
- `/healthz`, `/readyz`, and `/metrics` expose runtime health and monitoring hooks
141+
- the frontend uses a same-origin proxy route for safer browser-to-API access
142+
- demo data seeding and demo-run upload tooling are built into the repo
105143

106-
## Auth and Operations
144+
## Auth, Data, and Operations
107145

108146
Create a scoped API key:
109147

@@ -119,57 +157,57 @@ Seed demo data:
119157
.venv/bin/python -m worldmodel_server.cli seed-demo-data --force
120158
```
121159

122-
Upload a demo run against a live or local API:
160+
Upload a demo run against a local or hosted API:
123161

124162
```bash
125-
.venv/bin/python scripts/demo_run.py --api-base http://localhost:8000
163+
.venv/bin/python scripts/demo_run.py \
164+
--api-base http://localhost:8000
126165
```
127166

128-
Verify the public deployment end to end:
167+
Verify the full public deployment:
129168

130169
```bash
131170
.venv/bin/python scripts/verify_deployment.py \
132171
--api-base https://worldmodel-gym-api.onrender.com \
133172
--web-base https://world-model-gym.vercel.app
134173
```
135174

136-
Operational runbook:
175+
Useful runtime endpoints:
137176

138-
- [docs/OPERATIONS.md](docs/OPERATIONS.md)
139-
- [SECURITY.md](SECURITY.md)
177+
- API liveness: `/healthz`
178+
- API readiness: `/readyz`
179+
- API metrics: `/metrics`
180+
- web smoke path: `/api/proxy/api/leaderboard?track=test`
140181

141-
## Monorepo Layout
182+
## Deployment Notes
142183

143-
- `core/`: environments, traces, evaluation harness
144-
- `planners/`: MCTS, MPC-CEM, and trajectory-sampling planners
145-
- `worldmodels/`: deterministic, stochastic, and ensemble world models
146-
- `agents/`: baseline agents and registry
147-
- `server/`: FastAPI API, auth, migrations, storage, and seeding
148-
- `web/`: Next.js dashboard and proxy routes
149-
- `mobile/`: Expo mobile viewer
150-
- `paper/`: manuscript sources and generated PDF
184+
- Deploy the API from [render.yaml](render.yaml)
185+
- Deploy the web app from the `web/` root directory in Vercel
186+
- Store production secrets in Render and Vercel, not in repo files
187+
- Switch artifact storage to S3-compatible storage for durable production uploads
188+
- Remove `WMG_BOOTSTRAP_API_KEY` after the first durable writer key is created
151189

152-
## Developer Commands
190+
Full deployment and operations references:
153191

154-
```bash
155-
make lint
156-
make test
157-
make demo
158-
make seed-demo
159-
make create-api-key NAME=local-writer SCOPE=runs:write
160-
make verify-deployment
161-
make deploy
162-
make stop
163-
make deploy-public
164-
make stop-public
165-
make deploy-vercel
166-
```
192+
- [docs/DEPLOYMENT.md](docs/DEPLOYMENT.md)
193+
- [docs/OPERATIONS.md](docs/OPERATIONS.md)
194+
- [SECURITY.md](SECURITY.md)
195+
- [ROADMAP.md](ROADMAP.md)
196+
197+
## Quality Gates
198+
199+
- Ruff lint and formatting checks
200+
- Pytest coverage for backend behavior
201+
- Next.js production build verification
202+
- Playwright smoke tests for the web flow
203+
- scheduled production smoke checks against public deployment surfaces
167204

168205
## Resume-Friendly Highlights
169206

170-
- Shipped an end-to-end benchmark platform spanning environments, planners, model baselines, backend APIs, and frontend dashboards
171-
- Hardened the service with migrations, auth, rate limiting, structured logging, and cloud deployment support
172-
- Added automated browser verification and production smoke checks on top of standard lint/test/build CI
207+
- Built and shipped a full-stack benchmark product spanning environments, planners, model baselines, backend APIs, and frontend dashboards
208+
- Hardened the backend with migrations, auth, rate limiting, structured logging, and cloud deployment support
209+
- Added deployment verification, browser E2E coverage, and production smoke automation on top of standard CI
210+
- Designed a custom frontend UI system rather than relying on a boilerplate template
173211

174212
## License
175213
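The runtime endpoints named in the updated README (`/healthz`, `/readyz`, `/metrics`, and the web smoke path `/api/proxy/api/leaderboard?track=test`) can be probed with a short script. The following is a minimal sketch using only the Python standard library. The helpers `build_check_urls` and `probe` are hypothetical names introduced for illustration; they are not taken from the repository and do not reproduce `scripts/verify_deployment.py`. The base URLs are the local defaults the README lists, and treating HTTP 200 as "healthy" is an assumption.

```python
from urllib.request import urlopen

# Paths copied from the README's "Useful runtime endpoints" list.
API_PATHS = {"liveness": "/healthz", "readiness": "/readyz", "metrics": "/metrics"}
WEB_SMOKE_PATH = "/api/proxy/api/leaderboard?track=test"


def build_check_urls(api_base: str, web_base: str) -> dict[str, str]:
    """Join each base URL with its documented path (hypothetical helper)."""
    api = api_base.rstrip("/")
    urls = {name: api + path for name, path in API_PATHS.items()}
    urls["web_smoke"] = web_base.rstrip("/") + WEB_SMOKE_PATH
    return urls


def probe(url: str, timeout: float = 5.0) -> bool:
    """Return True if the URL answers with HTTP 200, False on any error.

    Assumption: a 200 response means the endpoint is healthy.
    """
    try:
        with urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (OSError, ValueError):
        # OSError covers URLError, HTTPError, and timeouts;
        # ValueError covers malformed URLs.
        return False


if __name__ == "__main__":
    # Local defaults from the README's quickstart section.
    checks = build_check_urls("http://localhost:8000", "http://localhost:3000")
    for name, url in checks.items():
        status = "ok" if probe(url) else "FAILED"
        print(f"{name}: {status} ({url})")
```

Keeping URL construction separate from the network call lets the path list be checked without a running stack, while `probe` can be pointed at either a local or a hosted deployment.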
