
Commit b9caaee

feat: AI policy analyst demo with Modal Sandbox (#24)
* feat: API hierarchy design with consistent naming

  Establishes three-level API architecture:
  - Level 0: Simulations (single-world snapshots)
  - Level 1: Analyses (operations on simulations)
  - Level 2: Reports (AI-generated documents, future)

  Renames Modal functions to match hierarchy:
  - calculate_household_* → simulate_household_*
  - run_report_* → economy_comparison_*

  Adds docs/DESIGN.md with full endpoint mapping to policyengine package functions.

* feat: Add AI policy analyst demo with Modal Sandbox

  Adds an interactive demo where users can ask natural language questions about tax/benefit policy and get AI-generated analysis reports.

  Components:
  - demo_modal.py: Modal function running Claude agent with PE API tools
  - demo_agent.py: Standalone agent script for local testing
  - api/demo.py: FastAPI endpoint (/demo/ask, /demo/status)
  - policy-chat.tsx: React chat component for docs
  - docs/demo/page.tsx: Demo page in docs site

  The agent uses Claude Sonnet with tool calling to:
  1. Search for relevant parameters
  2. Create policy reforms
  3. Run economic impact analyses
  4. Generate markdown reports

  Requires Modal secret 'anthropic-api-key' to be configured.

* ci: Deploy Modal functions on merge to main

* feat: Add env configuration for demo agent

* feat: use Claude Code CLI in Modal sandbox with streaming SSE
  - Replace custom tool-calling agent with actual Claude Code CLI
  - Run in Modal sandbox with bun for package management
  - Stream output via SSE to frontend chat UI
  - Configure MCP server for PolicyEngine API access
  - Add pytest-asyncio and 10 tests for Claude invocation

* fix: default demo_use_modal to True (container lacks Claude CLI)

* feat: add Claude Code CLI to Docker image

* refactor: copy bun from docs stage instead of apt-get

* fix: symlink bun as node for Claude Code CLI

* test: add integration test that actually runs Claude CLI

* fix: use -p flag for claude prompt to avoid arg parsing issues

* fix: pass ANTHROPIC_API_KEY to docker container

* feat: connect MCP server via SSE transport and improve frontend parsing
  - Switch from mount_http to mount_sse for Claude Code compatibility
  - Mount SSE endpoint at /mcp for better UX
  - Add POLICYENGINE_API_URL to docker-compose for local testing
  - Update policy-chat.tsx to parse stream-json format with:
    - MCP connection status indicator
    - Live tool call tracking with expandable details
    - Proper assistant message rendering
    - Final result display

* feat: add local compute for household calculations

  When DEMO_USE_MODAL=false, run PolicyEngine calculations locally instead of spawning Modal functions. Enables full local development without Modal deployment.

* feat: improve chat UI with code font, typing animation, and markdown
  - Add font-mono for code-style text rendering
  - Add typing animation with green cursor
  - Add react-markdown for proper markdown rendering in responses

* fix: add line breaks to chat output with remark-breaks

* feat: add local compute for UK economy comparison
1 parent 52ee4bc commit b9caaee

27 files changed

Lines changed: 2336 additions & 81 deletions

.env.example

Lines changed: 5 additions & 0 deletions
@@ -24,3 +24,8 @@ LOGFIRE_ENVIRONMENT=local
 # Get tokens from modal.com dashboard or via `modal token new`
 MODAL_TOKEN_ID=ak-...
 MODAL_TOKEN_SECRET=as-...
+
+# Demo agent
+ANTHROPIC_API_KEY=sk-ant-...
+DEMO_USE_MODAL=false
+POLICYENGINE_API_URL=http://localhost:8000

.github/workflows/deploy.yml

Lines changed: 3 additions & 1 deletion
@@ -117,4 +117,6 @@ jobs:
         env:
           MODAL_TOKEN_ID: ${{ secrets.MODAL_TOKEN_ID }}
           MODAL_TOKEN_SECRET: ${{ secrets.MODAL_TOKEN_SECRET }}
-        run: uv run modal deploy src/policyengine_api/modal_app.py
+        run: |
+          uv run modal deploy src/policyengine_api/modal_app.py
+          uv run modal deploy src/policyengine_api/demo_sandbox.py

.gitignore

Lines changed: 3 additions & 1 deletion
@@ -32,6 +32,9 @@ htmlcov/
 # Environment
 .env
 .env.local
+.env.prod
+docs/.env
+docs/.env.local
 
 # IDE
 .vscode/
@@ -49,4 +52,3 @@ htmlcov/
 data/
 *.h5
 *.db
-.env.prod

Dockerfile

Lines changed: 6 additions & 0 deletions
@@ -11,6 +11,12 @@ FROM ghcr.io/astral-sh/uv:python3.13-bookworm-slim
 
 WORKDIR /app
 
+# Install bun and Claude Code CLI (symlink bun as node for CLI compatibility)
+COPY --from=docs-builder /usr/local/bin/bun /usr/local/bin/bun
+RUN ln -s /usr/local/bin/bun /usr/local/bin/node && \
+    bun install -g @anthropic-ai/claude-code
+ENV PATH="/root/.bun/bin:$PATH"
+
 # Copy project files
 COPY pyproject.toml README.md ./
 COPY src/ ./src/

docker-compose.yml

Lines changed: 3 additions & 0 deletions
@@ -12,6 +12,9 @@ services:
       LOGFIRE_TOKEN: ${LOGFIRE_TOKEN}
       DEBUG: "false"
       API_PORT: ${API_PORT:-8000}
+      ANTHROPIC_API_KEY: ${ANTHROPIC_API_KEY}
+      POLICYENGINE_API_URL: http://localhost:${API_PORT:-8000}
+      DEMO_USE_MODAL: "false"
     volumes:
       - ./src:/app/src
       - ./docs/out:/app/docs/out

docs/.env.example

Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
+# Docs site environment variables
+NEXT_PUBLIC_API_URL=http://localhost:8000

docs/.gitignore

Lines changed: 1 addition & 0 deletions
@@ -32,6 +32,7 @@ yarn-error.log*
 
 # env files (can opt-in for committing if needed)
 .env*
+!.env.example
 
 # vercel
 .vercel

docs/DESIGN.md

Lines changed: 331 additions & 0 deletions
@@ -0,0 +1,331 @@
+# API hierarchy design
+
+## Levels of analysis
+
+```
+┌─────────────────────────────────────────────────────────────┐
+│  Level 2: Reports                                           │
+│  AI-generated documents, orchestrating multiple jobs        │
+│  /reports/*                                                 │
+├─────────────────────────────────────────────────────────────┤
+│  Level 1: Analyses                                          │
+│  Operations on simulations - thin wrappers around           │
+│  policyengine package functions                             │
+│                                                             │
+│  Common (baked-in, trivial to call):                        │
+│    /analysis/decile-impact/*                                │
+│    /analysis/budget-impact/*                                │
+│    /analysis/winners-losers/*                               │
+│                                                             │
+│  Flexible (configurable):                                   │
+│    /analysis/compare/*                                      │
+├─────────────────────────────────────────────────────────────┤
+│  Level 0: Simulations                                       │
+│  Single world-state calculations                            │
+│    /simulate/household                                      │
+│    /simulate/economy                                        │
+└─────────────────────────────────────────────────────────────┘
+```
+
+All operations are **async** (Modal compute). The API is a thin orchestration layer - all analysis logic lives in the `policyengine` package.
+
+## Mapping to policyengine package
+
+| API endpoint | policyengine function |
+|--------------|----------------------|
+| `/simulate/household` | `calculate_household_impact()` |
+| `/simulate/economy` | `Simulation.run()` |
+| `/analysis/decile-impact/*` | `calculate_decile_impacts()` |
+| `/analysis/budget-impact/*` | `ProgrammeStatistics` |
+| `/analysis/winners-losers/*` | `ChangeAggregate` with filters |
+| `/analysis/compare/*` | `economic_impact_analysis()` or custom |
+
+## Level 0: Simulations
+
+### `/simulate/household`
+
+Single household calculation. Wraps `policyengine.tax_benefit_models.uk.analysis.calculate_household_impact()`.
+
+```
+POST /simulate/household
+{
+  "model": "policyengine_uk",
+  "household": {
+    "people": [{"age": 30, "employment_income": 50000}],
+    "benunit": {},
+    "household": {}
+  },
+  "year": 2026,
+  "policy_id": null
+}
+
+→ Returns job_id, poll for results
+```
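The submit-then-poll flow for `/simulate/household` could be sketched in Python roughly as follows. This is a minimal sketch rather than the project's client: `poll_job`, `fake_fetch`, and the `pending` and `failed` status values are assumptions introduced for illustration (only `completed` appears in the design doc), and the fetch function is injected so the example runs without a live API.

```python
import time
from typing import Any, Callable


def poll_job(
    fetch_status: Callable[[str], dict[str, Any]],
    job_id: str,
    interval_s: float = 1.0,
    max_attempts: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> dict[str, Any]:
    """Poll until the job reports a terminal status, then return its payload."""
    for _ in range(max_attempts):
        payload = fetch_status(job_id)
        # "completed" comes from the design doc; "failed" is an assumed status.
        if payload.get("status") in ("completed", "failed"):
            return payload
        sleep(interval_s)
    raise TimeoutError(f"job {job_id} did not finish after {max_attempts} polls")


# Stand-in for GET /simulate/household/{job_id}: pending twice, then completed.
_responses = iter(
    [
        {"status": "pending"},
        {"status": "pending"},
        {"status": "completed", "result": {}},
    ]
)


def fake_fetch(job_id: str) -> dict[str, Any]:
    """Hypothetical fetcher; a real client would issue an HTTP GET here."""
    return next(_responses)


final = poll_job(fake_fetch, "job-123", sleep=lambda _seconds: None)
print(final["status"])
```

Injecting `fetch_status` and `sleep` keeps the polling loop testable; a real caller would pass a function that performs the HTTP request.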
+
+### `/simulate/economy`
+
+Population simulation. Creates and runs a `policyengine.core.Simulation`.
+
+```
+POST /simulate/economy
+{
+  "model": "policyengine_uk",
+  "dataset_id": "...",
+  "policy_id": null,
+  "dynamic_id": null
+}
+
+→ Returns simulation_id, poll for results
+```
+
+Economy simulations are **deterministic and cached** by (dataset_id, model_version, policy_id, dynamic_id).
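One way to derive a cache key from that tuple is to hash a canonical encoding of its four fields. The sketch below is an assumption about how such a key might be built, not the API's actual caching code; `simulation_cache_key` and the sample identifiers are hypothetical.

```python
import hashlib
import json
from typing import Optional


def simulation_cache_key(
    dataset_id: str,
    model_version: str,
    policy_id: Optional[str] = None,
    dynamic_id: Optional[str] = None,
) -> str:
    """Hash the four identifying fields into a stable hex digest."""
    # JSON encodes None as null, so a missing policy differs from any string id.
    parts = [dataset_id, model_version, policy_id, dynamic_id]
    encoded = json.dumps(parts).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


baseline_key = simulation_cache_key("dataset-a", "1.0.0")
repeat_key = simulation_cache_key("dataset-a", "1.0.0", None, None)
reform_key = simulation_cache_key("dataset-a", "1.0.0", policy_id="policy-1")

print(baseline_key == repeat_key)   # same inputs map to the same key
print(baseline_key == reform_key)   # a different policy yields a different key
```

Because the simulation is deterministic, two requests with the same key can share one stored result.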
+
+## Level 1: Analyses - Common (baked-in)
+
+These are the bread-and-butter analyses. Trivial to call, no configuration needed.
+
+### `/analysis/decile-impact/economy`
+
+Income decile breakdown. Wraps `calculate_decile_impacts()`.
+
+```
+POST /analysis/decile-impact/economy
+{
+  "model": "policyengine_uk",
+  "dataset_id": "...",
+  "baseline_policy_id": null,
+  "reform_policy_id": "..."
+}
+
+→ Returns job_id
+
+GET /analysis/decile-impact/economy/{job_id}
+→ Returns:
+{
+  "status": "completed",
+  "deciles": [
+    {"decile": 1, "baseline_mean": 15000, "reform_mean": 15500, "change": 500, "pct_change": 3.3, ...},
+    {"decile": 2, ...},
+    ...
+    {"decile": 10, ...}
+  ]
+}
+```
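The sample decile row is consistent with `change` being the difference of the two means and `pct_change` being that difference as a percentage of the baseline, rounded to one decimal place. The sketch below assumes exactly that; `decile_row` is a hypothetical helper, and the real `calculate_decile_impacts()` may compute these fields differently (for example, with household weights).

```python
from typing import Optional, TypedDict


class DecileRow(TypedDict):
    decile: int
    baseline_mean: float
    reform_mean: float
    change: float
    pct_change: Optional[float]


def decile_row(decile: int, baseline_mean: float, reform_mean: float) -> DecileRow:
    """Build one response row from a decile's baseline and reform means."""
    change = reform_mean - baseline_mean
    # Avoid dividing by zero when the baseline mean is zero.
    pct_change = round(100 * change / baseline_mean, 1) if baseline_mean else None
    return {
        "decile": decile,
        "baseline_mean": baseline_mean,
        "reform_mean": reform_mean,
        "change": change,
        "pct_change": pct_change,
    }


row = decile_row(1, 15000, 15500)
print(row["change"], row["pct_change"])  # prints "500 3.3"
```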
+
+### `/analysis/budget-impact/economy`
+
+Tax and benefit programme totals. Wraps `ProgrammeStatistics`.
+
+```
+POST /analysis/budget-impact/economy
+{
+  "model": "policyengine_uk",
+  "dataset_id": "...",
+  "baseline_policy_id": null,
+  "reform_policy_id": "..."
+}
+
+→ Returns job_id
+
+GET /analysis/budget-impact/economy/{job_id}
+→ Returns:
+{
+  "status": "completed",
+  "net_budget_impact": -20000000000,
+  "programmes": [
+    {"name": "income_tax", "is_tax": true, "baseline_total": 200e9, "reform_total": 180e9, "change": -20e9},
+    {"name": "universal_credit", "is_tax": false, "baseline_total": 50e9, "reform_total": 52e9, "change": 2e9},
+    ...
+  ]
+}
+```
+
+### `/analysis/winners-losers/economy`
+
+Who gains and loses. Wraps `ChangeAggregate` with change filters.
+
+```
+POST /analysis/winners-losers/economy
+{
+  "model": "policyengine_uk",
+  "dataset_id": "...",
+  "baseline_policy_id": null,
+  "reform_policy_id": "...",
+  "threshold": 0  // Change threshold (default: any change)
+}
+
+→ Returns job_id
+
+GET /analysis/winners-losers/economy/{job_id}
+→ Returns:
+{
+  "status": "completed",
+  "winners": {"count": 15000000, "mean_gain": 500},
+  "losers": {"count": 5000000, "mean_loss": -200},
+  "unchanged": {"count": 30000000}
+}
+```
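To make the `threshold` semantics concrete, here is a minimal unweighted sketch of a winners/losers split. It assumes a household counts as a winner when its income change exceeds `threshold` and as a loser when the change is below `-threshold`; `winners_losers` is a hypothetical function, and the real `ChangeAggregate` presumably applies survey weights that this sketch omits.

```python
from typing import Optional


def _mean(values: list[float]) -> Optional[float]:
    """Arithmetic mean, or None for an empty group."""
    return sum(values) / len(values) if values else None


def winners_losers(changes: list[float], threshold: float = 0.0) -> dict:
    """Split per-household income changes into winners, losers, and unchanged."""
    winners = [c for c in changes if c > threshold]
    losers = [c for c in changes if c < -threshold]
    unchanged = len(changes) - len(winners) - len(losers)
    return {
        "winners": {"count": len(winners), "mean_gain": _mean(winners)},
        "losers": {"count": len(losers), "mean_loss": _mean(losers)},
        "unchanged": {"count": unchanged},
    }


# Illustrative data only: five households with made-up income changes.
summary = winners_losers([500, 700, -200, 0, 0])
print(summary["winners"]["count"], summary["losers"]["count"], summary["unchanged"]["count"])
```

With the default threshold of 0, any non-zero change moves a household out of the unchanged group, matching the "any change" default noted in the request body.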
+
+### `/analysis/decile-impact/household`
+
+Compare household across scenarios by artificial decile assignment.
+
+```
+POST /analysis/decile-impact/household
+{
+  "model": "policyengine_uk",
+  "household": {"people": [{"employment_income": 50000}]},
+  "year": 2026,
+  "baseline_policy_id": null,
+  "reform_policy_id": "..."
+}
+
+→ Returns which decile this household falls into and their change
+```
+
+## Level 1: Analyses - Flexible
+
+### `/analysis/compare/economy`
+
+Full comparison with all outputs. Wraps `economic_impact_analysis()`.
+
+```
+POST /analysis/compare/economy
+{
+  "model": "policyengine_uk",
+  "dataset_id": "...",
+  "scenarios": [
+    {"label": "baseline"},
+    {"label": "reform", "policy_id": "..."},
+    {"label": "reform_dynamic", "policy_id": "...", "dynamic_id": "..."}
+  ]
+}
+
+→ Returns job_id
+
+GET /analysis/compare/economy/{job_id}
+→ Returns:
+{
+  "status": "completed",
+  "scenarios": {...simulation results...},
+  "comparisons": {
+    "reform": {
+      "relative_to": "baseline",
+      "decile_impacts": [...],
+      "budget_impact": {...},
+      "winners_losers": {...}
+    },
+    "reform_dynamic": {...}
+  }
+}
+```
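The response shape suggests that every non-baseline scenario is compared against the scenario labelled `baseline`. A small sketch of that pairing step follows; `plan_comparisons` is a hypothetical helper, and the rule that the baseline is identified by its label is an assumption drawn from the sample payload.

```python
def plan_comparisons(scenarios: list[dict], baseline_label: str = "baseline") -> dict:
    """Map each non-baseline scenario label to the scenario it is compared against."""
    labels = [scenario["label"] for scenario in scenarios]
    if baseline_label not in labels:
        raise ValueError(f"no scenario labelled {baseline_label!r}")
    return {
        label: {"relative_to": baseline_label}
        for label in labels
        if label != baseline_label
    }


# Placeholder ids stand in for the elided values in the sample request.
request_scenarios = [
    {"label": "baseline"},
    {"label": "reform", "policy_id": "policy-1"},
    {"label": "reform_dynamic", "policy_id": "policy-1", "dynamic_id": "dynamic-1"},
]

comparisons = plan_comparisons(request_scenarios)
print(sorted(comparisons))  # prints "['reform', 'reform_dynamic']"
```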
+
+### `/analysis/compare/household`
+
+Compare multiple scenarios for a household.
+
+```
+POST /analysis/compare/household
+{
+  "model": "policyengine_uk",
+  "household": {...},
+  "year": 2026,
+  "scenarios": [
+    {"label": "baseline"},
+    {"label": "reform", "policy_id": "..."}
+  ]
+}
+
+→ Returns all scenario results + computed differences
+```
+
+### `/analysis/aggregate/economy` (power user)
+
+Custom aggregation with full filter control. Directly exposes `Aggregate` / `ChangeAggregate`.
+
+```
+POST /analysis/aggregate/economy
+{
+  "model": "policyengine_uk",
+  "dataset_id": "...",
+  "simulation_id": "...",  // or policy_id to create
+  "variable": "household_net_income",
+  "aggregate_type": "mean",
+  "entity": "household",
+  "filters": {
+    "quantile": {"variable": "household_net_income", "n": 10, "eq": 1}
+  }
+}
+
+→ Returns single aggregate value
+```
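The request above reads as "mean household net income within the first of ten income groups". The sketch below shows one plausible interpretation of the `quantile` filter: rank rows by the filter variable, split them into `n` equal-count groups, and keep the group numbered `eq`. The function names and the unweighted, rank-based grouping are assumptions; the real `Aggregate` class may define quantiles differently, for instance by using survey weights.

```python
from statistics import fmean
from typing import Optional


def quantile_groups(values: list[float], n: int) -> list[int]:
    """Assign each value a 1-based group number by rank (unweighted)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    groups = [0] * len(values)
    for rank, index in enumerate(order):
        groups[index] = rank * n // len(values) + 1
    return groups


def aggregate_mean(rows: list[dict], variable: str, quantile: Optional[dict] = None) -> float:
    """Mean of `variable`, optionally restricted to one quantile group."""
    values = [row[variable] for row in rows]
    if quantile is not None:
        filter_values = [row[quantile["variable"]] for row in rows]
        groups = quantile_groups(filter_values, quantile["n"])
        values = [v for v, g in zip(values, groups) if g == quantile["eq"]]
    return fmean(values)


# Illustrative data: 20 households with incomes 1000, 2000, ..., 20000.
rows = [{"household_net_income": 1000 * i} for i in range(1, 21)]
bottom_decile_mean = aggregate_mean(
    rows,
    "household_net_income",
    {"variable": "household_net_income", "n": 10, "eq": 1},
)
print(bottom_decile_mean)  # prints "1500.0"
```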
+
+## Adding new analysis types
+
+To add a new common analysis (e.g. marginal tax rates):
+
+1. **policyengine package**: Add `MarginalTaxRate` output class and `calculate_marginal_rates()` function
+2. **API**: Add `/analysis/marginal-rates/*` endpoint that wraps the function
+3. **Modal**: Add function to run it
+
+The API endpoint is ~20 lines - just parameter parsing and calling the policyengine function.
+
+## URL structure summary
+
+```
+# Level 0: Simulations
+POST /simulate/household
+GET /simulate/household/{job_id}
+POST /simulate/economy
+GET /simulate/economy/{simulation_id}
+
+# Level 1: Common analyses (baked-in, trivial)
+POST /analysis/decile-impact/economy
+GET /analysis/decile-impact/economy/{job_id}
+POST /analysis/budget-impact/economy
+GET /analysis/budget-impact/economy/{job_id}
+POST /analysis/winners-losers/economy
+GET /analysis/winners-losers/economy/{job_id}
+
+# Level 1: Flexible analyses
+POST /analysis/compare/economy
+GET /analysis/compare/economy/{job_id}
+POST /analysis/compare/household
+GET /analysis/compare/household/{job_id}
+POST /analysis/aggregate/economy
+GET /analysis/aggregate/economy/{job_id}
+
+# Level 2: Reports (future)
+POST /reports/policy-impact
+GET /reports/policy-impact/{report_id}
+```
+
+## Use cases
+
+| Use case | Endpoint |
+|----------|----------|
+| My tax under current law | `/simulate/household` |
+| Reform impact on my household | `/analysis/compare/household` with 2 scenarios |
+| Revenue impact of reform | `/analysis/budget-impact/economy` |
+| Decile breakdown of reform | `/analysis/decile-impact/economy` |
+| Who wins and loses | `/analysis/winners-losers/economy` |
+| Full reform analysis | `/analysis/compare/economy` |
+| Compare 3 reform proposals | `/analysis/compare/economy` with 4 scenarios |
+| Static vs dynamic comparison | `/analysis/compare/economy` with 3 scenarios |
+| Custom aggregation | `/analysis/aggregate/economy` |
+
+## Migration
+
+Deprecate existing endpoints:
+- `/household/calculate` → `/simulate/household`
+- `/household/impact` → `/analysis/compare/household`
+- `/analysis/economic-impact` → `/analysis/compare/economy`
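During a deprecation window, the old paths could be mapped to their replacements with a simple lookup. The sketch below only illustrates the mapping listed above; `DEPRECATED_ROUTES` and `resolve_route` are hypothetical names, and how the API actually redirects or warns about deprecated paths is not stated in the design doc.

```python
# Old path -> replacement path, taken from the Migration list.
DEPRECATED_ROUTES = {
    "/household/calculate": "/simulate/household",
    "/household/impact": "/analysis/compare/household",
    "/analysis/economic-impact": "/analysis/compare/economy",
}


def resolve_route(path: str) -> str:
    """Return the replacement for a deprecated path, or the path unchanged."""
    normalized = path.rstrip("/")
    return DEPRECATED_ROUTES.get(normalized, path)


print(resolve_route("/household/calculate"))  # prints "/simulate/household"
print(resolve_route("/simulate/economy"))     # prints "/simulate/economy"
```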
+
+## Implementation notes
+
+1. All Modal functions import from `policyengine` package
+2. API endpoints do minimal work: parse request, call Modal, store results
+3. New analysis types require:
+   - Add to policyengine package (logic)
+   - Add API endpoint (orchestration)
+   - Add Modal function (compute)
