Skip to content
This repository was archived by the owner on Jun 14, 2026. It is now read-only.

Commit 5796bf7

Browse files
MaxGhenisclaude
andcommitted
Record B2 downstream validation results on v11 output
Full set of six 2024 tax-benefit aggregates computed on the v11-per-stage-lambda calibrated frame against published IRS / USDA / SSA / CMS benchmarks: - income_tax: $2,089.7B vs $2,400B benchmark (-12.9%) - eitc: $64.2B vs $64B benchmark ( +0.3%) - snap: $101.8B vs $100B benchmark ( +1.8%) - ctc: $151.9B vs $115B benchmark (+32.1%) - ssi: $108.2B vs $66B benchmark (+64.0%) - aca_ptc: $14.1B vs $60B benchmark (-76.4%) Three headline aggregates (income_tax, eitc, snap) reconcile to the admin totals within single-digit-to-low-teens relative error; three don't, and each points to a specific synthesis-step shortfall that a follow-up calibration pass can address by adding direct targets on the disbursed aggregate. Addresses paper reviewer B2 (add downstream-tax-output validation). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 94e67e0 commit 5796bf7

1 file changed

Lines changed: 49 additions & 0 deletions

File tree

Lines changed: 49 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,49 @@
1+
# B2 downstream validation (v11-per-stage-lambda)
2+
3+
Run date: 2026-04-22
4+
Artifact: `artifacts/live_pe_us_data_rebuild_checkpoint_20260421_v11_per_stage_lambda/v11-per-stage-lambda/policyengine_us.h5`
5+
Period: 2024
6+
Method: `scripts/run_b2_batched.py` with batch_size=50_000 for income_tax, 100_000 for aca_ptc, full-dataset for the rest.
7+
Comparison framework: `microplex_us.validation.downstream.DOWNSTREAM_BENCHMARKS_2024`.
8+
9+
## Results
10+
11+
| Variable | Computed | Benchmark | Rel error | Source |
12+
|----------|---------:|----------:|---------:|--------|
13+
| income_tax | $2,089.7B | $2,400.0B | −12.9% | IRS SOI 2022 ~$2.22T; CBO 2024 projection ~$2.4T |
14+
| eitc | $64.2B | $64.0B | +0.3% | IRS SOI 2023 (Table 2.5) |
15+
| snap | $101.8B | $100.0B | +1.8% | USDA FNS FY2024 |
16+
| ctc | $151.9B | $115.0B | +32.1% | IRS SOI 2023 (pre-OBBBA $2,000/qc) |
17+
| ssi | $108.2B | $66.0B | +64.0% | SSA SSI Annual Statistical Report 2024 |
18+
| aca_ptc | $14.1B | $60.0B | −76.4% | CMS/IRS ACA PTC 2024 (IRA-enhanced) |
19+
20+
## Reading
21+
22+
- **Within ±15%** of benchmark: income_tax (−12.9%), eitc (+0.3%), snap (+1.8%). The tax-mechanics chain and the two largest means-tested programs reconcile to published totals once calibrated weights are applied.
23+
- **Elevated +30% to +65%**: ctc and ssi. ctc = 32% above IRS SOI suggests either more qualifying children per household than IRS counts, or the synthesis pulled CTC-eligible families with higher frequency than the population-level CTC claim rate; ssi at +64% is the cleanest outlier and points to either over-representation of the aged / disabled low-income subpopulation or a missed means-test gate in the synthesis-then-materialize step.
24+
- **Under at −76%**: aca_ptc. The `has_marketplace_health_coverage` flag is in the synthesis target set, but the reconciled PTC depends on a policy-output chain (MAGI, federal poverty line, premium contribution). Either marketplace enrollment is under-represented at the income bands where PTC is largest, or the IRA-enhanced subsidy schedule isn't firing as it does in production IRS data.
25+
26+
## Interpretation for the paper's B2 section
27+
28+
Three headline aggregates reconcile within single-digit or low-teens relative error. The three that don't (ctc, ssi, aca_ptc) are individually diagnosable — each points to a specific shortfall in the synthesis step rather than a structural problem in the calibration framework. A follow-up calibration pass can add direct targets on these aggregates (CTC disbursed, SSI disbursed, ACA PTC disbursed) to drive them in.
29+
30+
The income_tax reconciliation at −12.9% is the most important single number: it's the paper's headline claim that the calibrated synthesis produces a PolicyEngine-US-readable frame whose downstream tax-output reconciles to IRS administrative totals within a credible tolerance.
31+
32+
## Reproduction
33+
34+
```bash
35+
# All variables except income_tax and aca_ptc fit in the full-dataset path:
36+
for var in ssi snap eitc ctc; do
37+
.venv/bin/python -u scripts/run_b2_validation_single_var.py \
38+
--dataset <h5> --output <json_path> --variable "$var" --period 2024
39+
done
40+
41+
# income_tax and aca_ptc need batching to avoid 30+ GB peak RSS:
42+
.venv/bin/python -u scripts/run_b2_batched.py \
43+
--dataset <h5> --output <json_path> --variable income_tax \
44+
--period 2024 --batch-size 50000
45+
46+
.venv/bin/python -u scripts/run_b2_batched.py \
47+
--dataset <h5> --output <json_path> --variable aca_ptc \
48+
--period 2024 --batch-size 100000
49+
```

0 commit comments

Comments
 (0)