Commit ff52628

chore: add HUGGING_FACE_TOKEN to env example
1 parent 3483459 commit ff52628

4 files changed

Lines changed: 266 additions & 19 deletions

.env.example

Lines changed: 6 additions & 0 deletions
@@ -42,6 +42,12 @@ DEBUG=true
 LOGFIRE_TOKEN=
 LOGFIRE_ENVIRONMENT=local
 
+# =============================================================================
+# HUGGING FACE (for dataset downloads)
+# =============================================================================
+# Get token from https://huggingface.co/settings/tokens
+HUGGING_FACE_TOKEN=hf_...
+
 # =============================================================================
 # AGENT (Claude Code)
 # =============================================================================
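
The commit only adds the variable to the example file; how the application reads it is not shown. A minimal sketch of one way to consume it, assuming the value arrives through the process environment and that the literal placeholder `hf_...` should be treated as unset (the helper name `get_hugging_face_token` is hypothetical):

```python
import os


def get_hugging_face_token(env=None):
    """Return the Hugging Face token, or None if unset or still the placeholder.

    `env` defaults to os.environ; passing a dict makes the helper easy to test.
    """
    env = os.environ if env is None else env
    token = env.get("HUGGING_FACE_TOKEN", "").strip()
    # Assumption: the example value "hf_..." from .env.example means "not configured".
    if not token or token == "hf_...":
        return None
    return token


if __name__ == "__main__":
    if get_hugging_face_token() is None:
        print("HUGGING_FACE_TOKEN is not configured")
    else:
        print("HUGGING_FACE_TOKEN is configured")
```

Returning `None` for the placeholder lets callers fail early with a clear message instead of sending an invalid token.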

docs/AGENT_TESTING.md

Lines changed: 27 additions & 0 deletions
@@ -146,12 +146,35 @@ Tests are in `tests/test_agent_policy_questions.py` (integration tests requiring
 | Seed script | Deduplicate parameters by name |
 | System prompt | Fixed model names (hyphen not underscore) |
 
+### Issue 8: Agent not using country filter in economy-wide analysis
+
+**Problem**: When answering economic impact questions, agent didn't use `tax_benefit_model_name` filter despite it being in the system prompt. This led to 18 turns for UK budgetary impact question (12 turns just searching for parameter).
+
+**Root cause**: System prompt mentioned the filter but didn't emphasize it enough; economic impact workflow didn't show the filter in example.
+
+**Solution implemented**: Restructured system prompt with:
+- **CRITICAL** section at the top emphasizing country filter
+- Explanation of why filter is needed (mixed results waste turns)
+- Added filter to all workflow examples including economic impact
+
+**Result**: UK budgetary impact question now completes in **6 turns** (down from 18).
+
+### Issue 9: Key US parameters missing from database
+
+**Problem**: Core CTC parameters like `gov.irs.credits.ctc.amount.base[0].amount` have `label=None` in policyengine-us package, so they're not seeded (seed script only includes parameters with labels).
+
+**Impact**: Agent can't find the main CTC amount parameter to double it. Had to use `refundable.individual_max` as a proxy.
+
+**Solution needed**: Add labels to core parameters in policyengine-us package (upstream fix).
+
 ## Measurements
 
 | Question type | Baseline | After improvements | Target |
 |---------------|----------|-------------------|--------|
 | Parameter lookup (UK personal allowance) | 10 turns | **3 turns** | 3-4 |
 | Household calculation (UK £50k income) | 6 turns | - | 5-6 |
+| Economy-wide (UK budgetary impact) | 18 turns | **6 turns** | 5-8 |
+| Economy-wide (US CTC impact) | 20+ turns | - | 8-10 |
 
 ## Progress log
 
@@ -163,3 +186,7 @@ Tests are in `tests/test_agent_policy_questions.py` (integration tests requiring
 - 2024-12-30: Fixed model name mismatch (policyengine-uk with hyphen, not underscore)
 - 2024-12-30: Added case-insensitive search using ILIKE
 - 2024-12-30: Tested personal allowance lookup - **3 turns** (target met!)
+- 2025-12-30: Tested UK economy-wide (budgetary impact) - 18 turns initially
+- 2025-12-30: Restructured system prompt to emphasize country filter at top
+- 2025-12-30: UK economy-wide now **6 turns** (3x improvement)
+- 2025-12-30: Discovered US CTC parameters missing labels (upstream issue in policyengine-us)
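
Issue 8 comes down to always sending `tax_benefit_model_name` with a parameter search. The sketch below builds such a query path; the `search` and `tax_benefit_model_name` query names and the two model names are taken from the system prompt in this commit, while the helper `parameter_search_path` and the `MODEL_NAMES` mapping are hypothetical illustrations.

```python
from urllib.parse import urlencode

# Model names as given in the system prompt (hyphen, not underscore).
MODEL_NAMES = {"uk": "policyengine-uk", "us": "policyengine-us"}


def parameter_search_path(search: str, country: str) -> str:
    """Build a /parameters/ query path that always carries the country filter."""
    model_name = MODEL_NAMES[country]  # raises KeyError for an unknown country
    query = urlencode({"search": search, "tax_benefit_model_name": model_name})
    return f"/parameters/?{query}"


print(parameter_search_path("personal allowance", "uk"))
# /parameters/?search=personal+allowance&tax_benefit_model_name=policyengine-uk
```

Making the country a required argument means a search cannot be issued without the filter, which is the behaviour the restructured prompt asks the agent to follow.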

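Issue 9 follows from two seed-script rules mentioned in the document: parameters are deduplicated by name, and only parameters with a label are seeded. The actual seed script is not part of this commit, so the following is only a sketch of that filtering under those two assumptions; `select_seedable` and the `gov.example.labelled` entry are hypothetical.

```python
def select_seedable(parameters):
    """Split parameters into those to seed and the names skipped for lacking a label."""
    seen = set()
    seedable = []
    skipped = []
    for param in parameters:
        name = param["name"]
        if name in seen:
            continue  # deduplicate by name: keep only the first occurrence
        seen.add(name)
        if param.get("label") is None:
            skipped.append(name)  # unlabelled parameters never reach the database
        else:
            seedable.append(param)
    return seedable, skipped


params = [
    {"name": "gov.irs.credits.ctc.amount.base[0].amount", "label": None},
    {"name": "gov.example.labelled", "label": "Example labelled parameter"},
    {"name": "gov.example.labelled", "label": "Example labelled parameter"},
]
seedable, skipped = select_seedable(params)
print([p["name"] for p in seedable])
print(skipped)
```

Reporting the skipped names alongside the seeded ones would make gaps such as the missing CTC amount visible at seed time rather than during an agent run.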
docs/src/app/modal/page.tsx

Lines changed: 205 additions & 0 deletions
@@ -0,0 +1,205 @@
+export default function ModalPage() {
+  return (
+    <div className="max-w-4xl">
+      <h1 className="text-3xl font-semibold text-[var(--color-text-primary)] mb-4">
+        Modal compute
+      </h1>
+      <p className="text-lg text-[var(--color-text-secondary)] mb-8">
+        PolicyEngine uses Modal.com for serverless compute, with two separate apps for different workloads.
+      </p>
+
+      <div className="space-y-8">
+        <section className="p-6 border border-[var(--color-border)] rounded-xl bg-white">
+          <h2 className="text-xl font-semibold text-[var(--color-text-primary)] mb-4">Why two apps?</h2>
+          <p className="text-sm text-[var(--color-text-secondary)] mb-4">
+            The API uses two separate Modal apps rather than one combined app. This separation is intentional and provides several benefits:
+          </p>
+          <div className="space-y-4">
+            <div>
+              <h3 className="font-medium text-[var(--color-text-primary)] mb-2">Image size</h3>
+              <p className="text-sm text-[var(--color-text-secondary)]">
+                The <code className="px-1.5 py-0.5 bg-[var(--color-surface-sunken)] rounded text-xs">policyengine</code> app has massive container images (multiple GB) with the full UK and US tax-benefit models pre-loaded. The <code className="px-1.5 py-0.5 bg-[var(--color-surface-sunken)] rounded text-xs">policyengine-sandbox</code> app is minimal - just the Anthropic SDK and requests library.
+              </p>
+            </div>
+            <div>
+              <h3 className="font-medium text-[var(--color-text-primary)] mb-2">Cold start optimisation</h3>
+              <p className="text-sm text-[var(--color-text-secondary)]">
+                The main app uses Modal&apos;s memory snapshot feature to pre-load PolicyEngine models at build time. When a function cold starts, it restores from the snapshot rather than re-importing the models, achieving sub-1s cold starts for functions that would otherwise take 30+ seconds to import.
+              </p>
+            </div>
+            <div>
+              <h3 className="font-medium text-[var(--color-text-primary)] mb-2">Architectural decoupling</h3>
+              <p className="text-sm text-[var(--color-text-secondary)]">
+                The sandbox/agent calls the public API endpoints, which then trigger the simulation functions. They&apos;re independent - the agent doesn&apos;t directly import PolicyEngine models, it makes HTTP calls.
+              </p>
+            </div>
+            <div>
+              <h3 className="font-medium text-[var(--color-text-primary)] mb-2">Independent scaling</h3>
+              <p className="text-sm text-[var(--color-text-secondary)]">
+                Simulation workloads scale differently from agent chat sessions. Keeping them separate lets Modal scale each independently based on demand.
+              </p>
+            </div>
+          </div>
+        </section>
+
+        <section className="p-6 border border-[var(--color-border)] rounded-xl bg-white">
+          <h2 className="text-xl font-semibold text-[var(--color-text-primary)] mb-4">policyengine app</h2>
+          <p className="text-sm text-[var(--color-text-secondary)] mb-4">
+            The main compute app for running simulations. Located at <code className="px-1.5 py-0.5 bg-[var(--color-surface-sunken)] rounded text-xs">src/policyengine_api/modal_app.py</code>.
+          </p>
+
+          <div className="overflow-x-auto">
+            <table className="w-full text-sm">
+              <thead>
+                <tr className="border-b border-[var(--color-border)]">
+                  <th className="text-left py-2 pr-4 font-medium text-[var(--color-text-primary)]">Function</th>
+                  <th className="text-left py-2 pr-4 font-medium text-[var(--color-text-primary)]">Image</th>
+                  <th className="text-left py-2 pr-4 font-medium text-[var(--color-text-primary)]">Memory</th>
+                  <th className="text-left py-2 font-medium text-[var(--color-text-primary)]">Purpose</th>
+                </tr>
+              </thead>
+              <tbody className="text-[var(--color-text-secondary)]">
+                <tr className="border-b border-[var(--color-border)]">
+                  <td className="py-2 pr-4 font-mono text-xs">simulate_household_uk</td>
+                  <td className="py-2 pr-4">uk_image</td>
+                  <td className="py-2 pr-4">4GB</td>
+                  <td className="py-2">Single UK household calculation</td>
+                </tr>
+                <tr className="border-b border-[var(--color-border)]">
+                  <td className="py-2 pr-4 font-mono text-xs">simulate_household_us</td>
+                  <td className="py-2 pr-4">us_image</td>
+                  <td className="py-2 pr-4">4GB</td>
+                  <td className="py-2">Single US household calculation</td>
+                </tr>
+                <tr className="border-b border-[var(--color-border)]">
+                  <td className="py-2 pr-4 font-mono text-xs">simulate_economy_uk</td>
+                  <td className="py-2 pr-4">uk_image</td>
+                  <td className="py-2 pr-4">8GB</td>
+                  <td className="py-2">UK economy simulation</td>
+                </tr>
+                <tr className="border-b border-[var(--color-border)]">
+                  <td className="py-2 pr-4 font-mono text-xs">simulate_economy_us</td>
+                  <td className="py-2 pr-4">us_image</td>
+                  <td className="py-2 pr-4">8GB</td>
+                  <td className="py-2">US economy simulation</td>
+                </tr>
+                <tr className="border-b border-[var(--color-border)]">
+                  <td className="py-2 pr-4 font-mono text-xs">economy_comparison_uk</td>
+                  <td className="py-2 pr-4">uk_image</td>
+                  <td className="py-2 pr-4">8GB</td>
+                  <td className="py-2">UK decile impacts, budget impact</td>
+                </tr>
+                <tr>
+                  <td className="py-2 pr-4 font-mono text-xs">economy_comparison_us</td>
+                  <td className="py-2 pr-4">us_image</td>
+                  <td className="py-2 pr-4">8GB</td>
+                  <td className="py-2">US decile impacts, budget impact</td>
+                </tr>
+              </tbody>
+            </table>
+          </div>
+
+          <div className="mt-4 p-3 bg-[var(--color-surface-sunken)] rounded-lg">
+            <p className="text-xs text-[var(--color-text-muted)]">
+              Deploy with: <code className="font-mono">modal deploy src/policyengine_api/modal_app.py</code>
+            </p>
+          </div>
+        </section>
+
+        <section className="p-6 border border-[var(--color-border)] rounded-xl bg-white">
+          <h2 className="text-xl font-semibold text-[var(--color-text-primary)] mb-4">policyengine-sandbox app</h2>
+          <p className="text-sm text-[var(--color-text-secondary)] mb-4">
+            Lightweight app for the AI agent. Located at <code className="px-1.5 py-0.5 bg-[var(--color-surface-sunken)] rounded text-xs">src/policyengine_api/agent_sandbox.py</code>.
+          </p>
+
+          <div className="overflow-x-auto">
+            <table className="w-full text-sm">
+              <thead>
+                <tr className="border-b border-[var(--color-border)]">
+                  <th className="text-left py-2 pr-4 font-medium text-[var(--color-text-primary)]">Function</th>
+                  <th className="text-left py-2 pr-4 font-medium text-[var(--color-text-primary)]">Dependencies</th>
+                  <th className="text-left py-2 font-medium text-[var(--color-text-primary)]">Purpose</th>
+                </tr>
+              </thead>
+              <tbody className="text-[var(--color-text-secondary)]">
+                <tr>
+                  <td className="py-2 pr-4 font-mono text-xs">run_agent</td>
+                  <td className="py-2 pr-4">anthropic, requests</td>
+                  <td className="py-2">Agentic loop using Claude with API tools</td>
+                </tr>
+              </tbody>
+            </table>
+          </div>
+
+          <p className="text-sm text-[var(--color-text-secondary)] mt-4">
+            The agent dynamically generates Claude tools from the OpenAPI spec, then executes an agentic loop to answer policy questions by making API calls. It doesn&apos;t import PolicyEngine directly.
+          </p>
+
+          <div className="mt-4 p-3 bg-[var(--color-surface-sunken)] rounded-lg">
+            <p className="text-xs text-[var(--color-text-muted)]">
+              Deploy with: <code className="font-mono">modal deploy src/policyengine_api/agent_sandbox.py</code>
+            </p>
+          </div>
+        </section>
+
+        <section className="p-6 border border-[var(--color-border)] rounded-xl bg-white">
+          <h2 className="text-xl font-semibold text-[var(--color-text-primary)] mb-4">Memory snapshots</h2>
+          <p className="text-sm text-[var(--color-text-secondary)] mb-4">
+            The <code className="px-1.5 py-0.5 bg-[var(--color-surface-sunken)] rounded text-xs">policyengine</code> app uses Modal&apos;s <code className="px-1.5 py-0.5 bg-[var(--color-surface-sunken)] rounded text-xs">run_function</code> to snapshot the Python interpreter state after importing the models:
+          </p>
+          <pre className="p-4 bg-[var(--color-surface-sunken)] rounded-lg text-xs font-mono overflow-x-auto text-[var(--color-text-secondary)]">
+{`def _import_uk():
+    from policyengine.tax_benefit_models.uk import uk_latest
+    print("UK model loaded and snapshotted")
+
+uk_image = base_image.run_commands(
+    "uv pip install --system policyengine-uk>=2.0.0"
+).run_function(_import_uk)`}
+          </pre>
+          <p className="text-sm text-[var(--color-text-secondary)] mt-4">
+            When a cold start happens, Modal restores from this snapshot rather than re-running the imports. This turns a 30+ second import into sub-second startup.
+          </p>
+        </section>
+
+        <section className="p-6 border border-[var(--color-border)] rounded-xl bg-white">
+          <h2 className="text-xl font-semibold text-[var(--color-text-primary)] mb-4">Secrets</h2>
+          <p className="text-sm text-[var(--color-text-secondary)] mb-4">
+            Each app uses different Modal secrets:
+          </p>
+          <div className="space-y-3">
+            <div className="flex items-start gap-3">
+              <span className="px-2 py-1 bg-[var(--color-surface-sunken)] rounded text-xs font-mono text-[var(--color-text-secondary)]">policyengine-db</span>
+              <p className="text-sm text-[var(--color-text-secondary)]">Database credentials for the main app (DATABASE_URL, SUPABASE_URL, SUPABASE_KEY)</p>
+            </div>
+            <div className="flex items-start gap-3">
+              <span className="px-2 py-1 bg-[var(--color-surface-sunken)] rounded text-xs font-mono text-[var(--color-text-secondary)]">anthropic-api-key</span>
+              <p className="text-sm text-[var(--color-text-secondary)]">Anthropic API key for the sandbox app (ANTHROPIC_API_KEY)</p>
+            </div>
+          </div>
+        </section>
+
+        <section className="p-6 border border-[var(--color-border)] rounded-xl bg-white">
+          <h2 className="text-xl font-semibold text-[var(--color-text-primary)] mb-4">Request flow</h2>
+          <div className="space-y-3">
+            {[
+              "Client calls API endpoint (e.g. POST /household/calculate)",
+              "FastAPI validates request and creates job record in Supabase",
+              "FastAPI triggers Modal function asynchronously",
+              "API returns job ID immediately",
+              "Modal function runs calculation with pre-loaded models",
+              "Modal function writes results directly to Supabase",
+              "Client polls API until job status = completed",
+            ].map((step, index) => (
+              <div key={index} className="flex items-start gap-3">
+                <span className="flex-shrink-0 w-6 h-6 rounded-full bg-[var(--color-pe-green)] text-white text-xs font-medium flex items-center justify-center">
+                  {index + 1}
+                </span>
+                <p className="text-sm text-[var(--color-text-secondary)] pt-0.5">{step}</p>
+              </div>
+            ))}
+          </div>
+        </section>
+      </div>
+    </div>
+  );
+}
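
The page says the agent generates Claude tools from the OpenAPI spec, but the generator itself is not in this commit. The sketch below shows one plausible shape for that conversion, assuming each operation becomes one tool with the `name`, `description` and `input_schema` fields used by Anthropic tool definitions. The function `tools_from_openapi`, the naming fallback, and the sample spec are all hypothetical.

```python
import re

HTTP_METHODS = {"get", "post", "put", "patch", "delete"}


def tools_from_openapi(spec: dict) -> list:
    """Turn each OpenAPI operation into a tool definition dict."""
    tools = []
    for path, item in spec.get("paths", {}).items():
        for method, op in item.items():
            if method not in HTTP_METHODS:
                continue  # skip path-level keys such as "parameters" or "summary"
            properties, required = {}, []
            for param in op.get("parameters", []):
                properties[param["name"]] = param.get("schema", {"type": "string"})
                if param.get("required"):
                    required.append(param["name"])
            # Assumption: fall back to method + path when operationId is absent,
            # replacing characters that are not letters, digits, "_" or "-".
            raw_name = op.get("operationId", f"{method}_{path}")
            tools.append({
                "name": re.sub(r"[^a-zA-Z0-9_-]", "_", raw_name)[:64],
                "description": f"{method.upper()} {path}: {op.get('summary', '')}".strip(),
                "input_schema": {
                    "type": "object",
                    "properties": properties,
                    "required": required,
                },
            })
    return tools


sample_spec = {
    "paths": {
        "/parameters/": {
            "get": {
                "operationId": "list_parameters",
                "summary": "Search parameters",
                "parameters": [
                    {"name": "search", "in": "query", "schema": {"type": "string"}},
                    {"name": "tax_benefit_model_name", "in": "query",
                     "required": True, "schema": {"type": "string"}},
                ],
            }
        }
    }
}
tools = tools_from_openapi(sample_spec)
print(tools[0]["name"], tools[0]["input_schema"]["required"])
```

Deriving tools from the spec keeps the agent in step with the API without hand-written tool lists, which fits the decoupling described above: the sandbox only needs the spec and HTTP access.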

src/policyengine_api/agent_sandbox.py

Lines changed: 28 additions & 19 deletions
@@ -18,30 +18,39 @@
 
 SYSTEM_PROMPT = """You are a PolicyEngine assistant that helps users understand tax and benefit policies.
 
-You have access to the full PolicyEngine API. Key workflows:
+You have access to the full PolicyEngine API.
 
-1. **Household calculations**: POST to /household/calculate with people array, then poll GET /household/calculate/{job_id}
-2. **Parameter lookup**: GET /parameters/ with search query and tax_benefit_model_name, then GET /parameter-values/ with parameter_id
-3. **Economic impact**:
-   - GET /parameters/ to find parameter_id
+## CRITICAL: Always filter by country
+
+When searching for parameters or datasets, ALWAYS include tax_benefit_model_name:
+- "policyengine-uk" for UK questions
+- "policyengine-us" for US questions
+
+Parameters and datasets from both countries are in the same database. Without the filter, you'll get mixed results and waste turns finding the right ones.
+
+## Key workflows
+
+1. **Household calculations**:
+   - POST /household/calculate with model_name and people array
+   - Poll GET /household/calculate/{job_id} until completed
+
+2. **Parameter lookup**:
+   - GET /parameters/?search=...&tax_benefit_model_name=policyengine-uk (ALWAYS include country filter)
+   - GET /parameter-values/?parameter_id=...&current=true for the current value
+
+3. **Economic impact analysis** (budget impact, decile impacts):
+   - GET /parameters/?search=...&tax_benefit_model_name=policyengine-uk to find parameter_id
    - POST /policies/ to create reform with parameter_values
-   - GET /datasets/ to find dataset_id
-   - POST /analysis/economic-impact with policy_id and dataset_id
-   - Poll GET /analysis/economic-impact/{report_id} until completed
+   - GET /datasets/?tax_benefit_model_name=policyengine-uk to find dataset_id
+   - POST /analysis/economic-impact with tax_benefit_model_name, policy_id and dataset_id
+   - GET /analysis/economic-impact/{report_id} for results (includes decile_impacts and program_statistics)
 
-When searching for parameters, use tax_benefit_model_name to filter by country:
-- "policyengine-uk" for UK parameters
-- "policyengine-us" for US parameters
+## Guidelines
 
-When answering questions:
 1. Use the API tools to get accurate, current data
-2. Show your calculations clearly
-3. Be concise but thorough
-4. For UK, amounts are in GBP. For US, amounts are in USD.
-5. Poll async endpoints until status is "completed"
-
-IMPORTANT: When polling async endpoints, ALWAYS use the sleep tool to wait 5-10 seconds between requests.
-Do not poll in a tight loop - this wastes resources and may hit rate limits.
+2. Be concise but thorough
+3. For UK, amounts are in GBP. For US, amounts are in USD.
+4. When polling async endpoints, use the sleep tool to wait 5-10 seconds between requests
 """
 
 # Sleep tool for polling delays
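
The prompt asks the agent to wait 5-10 seconds between polls of an async endpoint until the job is completed. A minimal sketch of that loop from a client's point of view, assuming the job payload is a dict with a `status` field; the `"completed"` value comes from the prompt, while the `"failed"` status, the `poll_until_completed` helper, and the injected `fetch_status` callable are assumptions for illustration.

```python
import time


def poll_until_completed(fetch_status, job_id, delay_seconds=5.0,
                         max_attempts=60, sleep=time.sleep):
    """Call fetch_status(job_id) until the job reports status "completed".

    Sleeps delay_seconds between attempts instead of polling in a tight loop.
    """
    for _ in range(max_attempts):
        job = fetch_status(job_id)
        status = job.get("status")
        if status == "completed":
            return job
        if status == "failed":  # assumed failure status, not confirmed by the source
            raise RuntimeError(f"job {job_id} failed")
        sleep(delay_seconds)
    raise TimeoutError(f"job {job_id} not completed after {max_attempts} attempts")


# Usage with a stand-in for the HTTP call (GET /household/calculate/{job_id}).
responses = iter([{"status": "pending"}, {"status": "running"},
                  {"status": "completed", "result": {"ok": True}}])
waits = []
job = poll_until_completed(lambda job_id: next(responses), "job-1",
                           delay_seconds=5.0, sleep=waits.append)
print(job["status"], waits)
```

Injecting `fetch_status` and `sleep` keeps the loop independent of any HTTP library and lets it be exercised without real delays; the attempt cap bounds how long a stuck job can hold a client.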
