Commit 1f57e98

MaxGhenis and claude authored
Add state and AGI cross-tab EITC calibration targets (#802) (#803)
* Add state and AGI cross-tab EITC calibration targets (#802)

  Extend build_loss_matrix() with two new target families sourced from IRS SOI:

  * Per-state EITC returns and amounts from Historical Table 2 (eitc_state.csv), ~102 new loss-matrix columns covering 50 states + DC.
  * Per-(qualifying-children x AGI bucket) EITC returns and amounts from Publication 1304 Table 2.5 (eitc_by_agi_and_children.csv), ~224 new columns over the SOI small-bin AGI structure.

  Both targets use the existing eitc_spending_uprating / population_uprating factors so they move with the Treasury EITC and population trajectories. A _skip_unverified_target helper keeps the optimizer from consuming "[TO BE CALCULATED]" placeholders.

  Also adds refresh_eitc_state_and_agi_targets.py, a parameterized data-pull script that future-year refreshes can run with --year <tax_year>, plus tests/unit/calibration/test_eitc_extended_targets.py covering CSV shape, the IRS state-sum-to-national crosscheck, loss-matrix column naming, and placeholder skipping.

  State sum crosscheck for TY2022: 23,679,560 returns / $59,178,091,000 vs IRS US row 23,692,190 returns / $59,204,588,000, ~0.05% off, within disclosure rounding. Gap vs Treasury outlay target ($77.3B) reflects the refundable-only Treasury definition; IRS SOI is the correct comparator for the full eitc variable.

  Related to #802.

* fixup! Add state and AGI cross-tab EITC calibration targets (#802)

* Drop contradictory Treasury+legacy EITC targets; add regression tests

  Codex review of #803 found two internal contradictions in the EITC target set: (1) the loss function targeted Treasury's $67B outlay parameter alongside the new SOI-derived $59B state-row sum and $60B AGI×children-row sum, forcing the optimizer onto an unsatisfiable Pareto front; (2) the legacy eitc.csv carried TY2020 per-child-count values that duplicated (and conflicted with) the new cross-tab.

  Fix by anchoring EITC calibration on IRS SOI TY2022 tables alone: keep state and (child × AGI bucket) targets, drop the Treasury aggregate column and the stale per-child-count rows. Treasury's parameter is still used to derive the dollar uprating trajectory.

  New tests cover the cases Codex flagged as unverified: mixed-placeholder rows (valid returns + [TO BE CALCULATED] amount) must keep the valid metric and drop the invalid one without breaking matrix/target alignment; the "3 or more children" bucket uses >= so a 4-child household registers once, in c3 only; non-unity uprating factors propagate to target values. Two regression tests pin the removals: nation/treasury/eitc must never appear as a loss-matrix column, and count_children_ slugs stay out of the source.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Apply ruff format to cps.py

  Pre-existing format drift from #801 that ruff 0.9.0+ flags; unblocks the lint check on this branch. No behavior change.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
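The mixed-placeholder behavior described in the commit message can be sketched as follows. This is a minimal illustration only, not the repository's `_skip_unverified_target` helper: the function names `is_unverified` and `collect_targets`, the row layout, and the `name/metric` label format are all assumptions made for the example.

```python
PLACEHOLDER = "[TO BE CALCULATED]"


def is_unverified(value):
    """Return True when a target cell still holds the placeholder string."""
    return isinstance(value, str) and value.strip() == PLACEHOLDER


def collect_targets(rows, metrics=("returns", "amount")):
    """Build parallel (label, target) lists, skipping unverified cells.

    Each metric is checked on its own, so a row with a valid `returns`
    and a placeholder `amount` still contributes its `returns` target.
    Labels and targets are appended together, which keeps them aligned.
    """
    labels, targets = [], []
    for row in rows:
        for metric in metrics:
            value = row[metric]
            if is_unverified(value):
                continue  # drop only this metric, not the whole row
            labels.append(f"{row['name']}/{metric}")  # hypothetical label format
            targets.append(float(value))
    return labels, targets


# Hypothetical rows: the second one mixes a valid count with a placeholder.
rows = [
    {"name": "state_a", "returns": "440510", "amount": "1262145000"},
    {"name": "state_b", "returns": "38650", "amount": PLACEHOLDER},
]
labels, targets = collect_targets(rows)
print(labels)
```

Appending the label and the target in the same step is what keeps the matrix columns and the target vector the same length when a single metric is dropped.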
1 parent b1904f0 commit 1f57e98

9 files changed

Lines changed: 1140 additions & 53 deletions

.gitignore

Lines changed: 2 additions & 0 deletions
@@ -16,6 +16,8 @@ node_modules
 !medicaid_enrollment_2024.csv
 !medicaid_enrollment_2025.csv
 !eitc.csv
+!eitc_state.csv
+!eitc_by_agi_and_children.csv
 !spm_threshold_agi.csv
 !population_by_state.csv
 !aca_spending_and_enrollment_2024.csv
Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+Rebuilt EITC calibration on a coherent IRS SOI TY2022 target set. Added ~102 per-state targets (SOI Historical Table 2) and ~224 per-(child x AGI) targets (Publication 1304 Table 2.5), and removed the contradictory Treasury `tax_expenditures.eitc` aggregate column (which measures outlays, not total claimed) plus the stale TY2020 `eitc.csv` per-child-count targets. The optimizer now has geographic and AGI-shape coverage over EITC without fighting definition mismatches between outlay- and claim-based totals. Addresses #802.
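The approximate target counts quoted in this entry can be reproduced with simple arithmetic. The sketch below assumes every CSV row contributes one `returns` column and one `amount` column; the constant names are hypothetical, and the actual loss matrix may skip some rows, which would explain the "~".

```python
N_STATE_ROWS = 51     # 50 states + DC, per the commit message
N_CHILD_BUCKETS = 4   # 0, 1, 2, and "three or more" qualifying children
N_AGI_BINS = 28       # AGI rows per child bucket in the cross-tab CSV
N_METRICS = 2         # returns and amount

# One column per (row, metric) pair, under the assumption stated above.
state_columns = N_STATE_ROWS * N_METRICS
cross_tab_columns = N_CHILD_BUCKETS * N_AGI_BINS * N_METRICS

print(state_columns, cross_tab_columns)  # 102 224
```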

policyengine_us_data/datasets/cps/cps.py

Lines changed: 5 additions & 8 deletions
@@ -491,14 +491,11 @@ def add_marketplace_plan_benchmark_ratio(self):
         map_to="tax_unit",
         period=period,
     ).values
-    takes_up_aca = (
-        baseline.calculate(
-            "takes_up_aca_if_eligible",
-            map_to="tax_unit",
-            period=period,
-        )
-        .values.astype(bool)
-    )
+    takes_up_aca = baseline.calculate(
+        "takes_up_aca_if_eligible",
+        map_to="tax_unit",
+        period=period,
+    ).values.astype(bool)
 
     data["selected_marketplace_plan_benchmark_ratio"] = (
         compute_marketplace_plan_benchmark_ratio(

policyengine_us_data/storage/calibration_targets/eitc.csv

Lines changed: 0 additions & 5 deletions
This file was deleted.
Lines changed: 114 additions & 0 deletions
@@ -0,0 +1,114 @@
+# IRS SOI Publication 1304 Table 2.5, Tax Year 2022 (2022in25ic.xls). 'Total earned income credit' columns by qualifying-children bucket. count_children=3 means 'three or more'. Amount converted from thousands of dollars to dollars. Pulled from https://www.irs.gov/pub/irs-soi/22in25ic.xls.
+count_children,agi_lower,agi_upper,returns,amount
+0,-inf,1,97411,41206000
+0,1,1000,285178,22736000
+0,1000,2000,281955,40315000
+0,2000,3000,275689,59028000
+0,3000,4000,301329,88188000
+0,4000,5000,358319,133584000
+0,5000,6000,358897,167605000
+0,6000,7000,404201,207757000
+0,7000,8000,437284,251673000
+0,8000,9000,462996,270165000
+0,9000,10000,482714,294050000
+0,10000,11000,420008,235544000
+0,11000,12000,381887,168944000
+0,12000,13000,446747,179250000
+0,13000,14000,459049,150419000
+0,14000,15000,437013,103694000
+0,15000,16000,439608,87875000
+0,16000,17000,232043,43063000
+0,17000,18000,51053,24150000
+0,18000,19000,63385,23912000
+0,19000,20000,62954,17414000
+0,20000,25000,134454,20098000
+0,25000,30000,4170,535000
+0,30000,35000,0,0
+0,35000,40000,0,0
+0,40000,45000,0,0
+0,45000,50000,0,0
+0,50000,inf,0,0
+1,-inf,1,23329,62880000
+1,1,1000,34761,17017000
+1,1000,2000,50466,27935000
+1,2000,3000,64957,62144000
+1,3000,4000,93280,110863000
+1,4000,5000,88772,137179000
+1,5000,6000,104160,193576000
+1,6000,7000,119274,264356000
+1,7000,8000,135586,340845000
+1,8000,9000,154283,444678000
+1,9000,10000,174617,551338000
+1,10000,11000,293113,1024890000
+1,11000,12000,421475,1552950000
+1,12000,13000,320843,1180848000
+1,13000,14000,245140,906659000
+1,14000,15000,221785,807535000
+1,15000,16000,187967,683914000
+1,16000,17000,224242,830137000
+1,17000,18000,212303,782155000
+1,18000,19000,211815,777619000
+1,19000,20000,223321,829736000
+1,20000,25000,995954,3334770000
+1,25000,30000,1025698,2717219000
+1,30000,35000,1101137,2056413000
+1,35000,40000,988295,1108084000
+1,40000,45000,637801,323346000
+1,45000,50000,136044,53661000
+1,50000,inf,0,0
+2,-inf,1,14584,57695000
+2,1,1000,15469,15804000
+2,1000,2000,27954,27509000
+2,2000,3000,23008,21351000
+2,3000,4000,34563,50099000
+2,4000,5000,44777,87719000
+2,5000,6000,46699,96938000
+2,6000,7000,53604,135658000
+2,7000,8000,45036,135960000
+2,8000,9000,59455,199938000
+2,9000,10000,62223,235095000
+2,10000,11000,63624,263643000
+2,11000,12000,81574,376262000
+2,12000,13000,92973,456085000
+2,13000,14000,117291,630684000
+2,14000,15000,136024,779526000
+2,15000,16000,307492,1847652000
+2,16000,17000,189001,1153296000
+2,17000,18000,185878,1129855000
+2,18000,19000,194242,1170287000
+2,19000,20000,155523,953889000
+2,20000,25000,653409,3743482000
+2,25000,30000,622129,2987570000
+2,30000,35000,638436,2440873000
+2,35000,40000,638815,1816631000
+2,40000,45000,503441,913571000
+2,45000,50000,419620,411869000
+2,50000,inf,201243,115099000
+3,-inf,1,13131,65450000
+3,1,1000,7152,2079000
+3,1000,2000,7569,11110000
+3,2000,3000,18271,26498000
+3,3000,4000,9059,17226000
+3,4000,5000,8381,23895000
+3,5000,6000,15194,38890000
+3,6000,7000,17035,48284000
+3,7000,8000,25041,80532000
+3,8000,9000,27984,102798000
+3,9000,10000,31279,131685000
+3,10000,11000,29005,136747000
+3,11000,12000,53383,265942000
+3,12000,13000,49918,277924000
+3,13000,14000,51676,299649000
+3,14000,15000,59028,370537000
+3,15000,16000,150741,1034334000
+3,16000,17000,129318,893694000
+3,17000,18000,80126,552247000
+3,18000,19000,93481,642431000
+3,19000,20000,82251,566422000
+3,20000,25000,361521,2355192000
+3,25000,30000,317284,1821957000
+3,30000,35000,329316,1571389000
+3,35000,40000,321723,1204326000
+3,40000,45000,262410,722213000
+3,45000,50000,280652,503449000
+3,50000,inf,257570,234030000
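How a tax unit could be assigned to one (child bucket, AGI bin) cell of this table can be sketched as below. The function name `in_cell` is hypothetical, and the bin convention (lower bound inclusive, upper bound exclusive) is an assumption; only the "three or more" rule with `>=` comes from the commit message.

```python
def parse_bound(text):
    """Convert an AGI bound to float; float() accepts '-inf' and 'inf'."""
    return float(text)


def in_cell(n_children, agi, count_children, agi_lower, agi_upper):
    """Return True if a tax unit falls in one (child bucket, AGI bin) cell.

    Assumptions: AGI bins are lower-inclusive and upper-exclusive, and
    count_children == 3 is the open-ended "three or more" bucket.
    """
    if count_children == 3:
        child_match = n_children >= 3
    else:
        child_match = n_children == count_children
    return child_match and agi_lower <= agi < agi_upper


# Bounds of two rows from the table: (count_children, agi_lower, agi_upper).
cells = [
    (2, parse_bound("20000"), parse_bound("25000")),
    (3, parse_bound("20000"), parse_bound("25000")),
]

# A four-child household with $22,000 AGI should match the c3 cell only.
matches = [in_cell(4, 22_000.0, *cell) for cell in cells]
print(matches)  # [False, True]
```

Using `>=` only for the top bucket means a household with four or more children is counted exactly once, which mirrors the regression case described in the commit message.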
Lines changed: 53 additions & 0 deletions
@@ -0,0 +1,53 @@
+# IRS SOI Historical Table 2 (2022in55cmcsv.csv), EIC columns N59660 (returns) and A59660 (amount, thousands USD). Pulled from https://www.irs.gov/pub/irs-soi/22in55cmcsv.csv. Amount converted to dollars.
+GEO_ID,Returns,Amount
+0400000US01,440510,1262145000
+0400000US02,38650,82237000
+0400000US04,515920,1311591000
+0400000US05,266840,726773000
+0400000US06,2519120,5770703000
+0400000US08,302060,657919000
+0400000US09,200860,454175000
+0400000US10,68480,167226000
+0400000US11,45200,108269000
+0400000US12,2035890,5049026000
+0400000US13,1010000,2824708000
+0400000US15,84970,185872000
+0400000US16,111860,256435000
+0400000US17,823080,2083249000
+0400000US18,469900,1166303000
+0400000US19,177190,417372000
+0400000US20,176390,425246000
+0400000US21,354590,897259000
+0400000US22,471100,1395618000
+0400000US23,81740,165071000
+0400000US24,379150,902856000
+0400000US25,335060,716200000
+0400000US26,695110,1767534000
+0400000US27,287420,646938000
+0400000US28,331450,981642000
+0400000US29,434870,1084558000
+0400000US30,67140,144278000
+0400000US31,112520,268520000
+0400000US32,246160,602123000
+0400000US33,59420,116091000
+0400000US34,545770,1302340000
+0400000US35,185030,450664000
+0400000US36,1451910,3464518000
+0400000US37,822140,2090845000
+0400000US38,39070,88053000
+0400000US39,818070,2069390000
+0400000US40,322440,839474000
+0400000US41,228960,477312000
+0400000US42,812680,1920624000
+0400000US44,69890,160174000
+0400000US45,443280,1150240000
+0400000US46,52530,118308000
+0400000US47,557800,1448214000
+0400000US48,2610520,7329572000
+0400000US49,161140,366882000
+0400000US50,35070,67922000
+0400000US51,535670,1293831000
+0400000US53,368920,804110000
+0400000US54,129510,309932000
+0400000US55,313960,715162000
+0400000US56,32550,72587000
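Reading this file and running the state-sum-to-national crosscheck might look like the sketch below. It uses only the standard library and a three-row excerpt of the table; the function names `read_state_targets` and `relative_gap` are hypothetical, and the national totals are the TY2022 figures quoted in the commit message rather than values recomputed here.

```python
import csv
import io

# Three data rows copied from the state table above; the real file has 51.
EXCERPT = """\
# comment line describing the source
GEO_ID,Returns,Amount
0400000US01,440510,1262145000
0400000US02,38650,82237000
0400000US04,515920,1311591000
"""


def read_state_targets(text):
    """Parse the CSV, skipping '#' comment lines, keyed by 2-digit state FIPS."""
    lines = [ln for ln in text.splitlines() if not ln.startswith("#")]
    targets = {}
    for row in csv.DictReader(io.StringIO("\n".join(lines))):
        fips = row["GEO_ID"].split("US")[-1]  # "0400000US01" -> "01"
        targets[fips] = (int(row["Returns"]), int(row["Amount"]))
    return targets


def relative_gap(state_sum, national_total):
    """Fractional difference between the state sum and the national row."""
    return abs(national_total - state_sum) / national_total


targets = read_state_targets(EXCERPT)
excerpt_returns = sum(returns for returns, _ in targets.values())

# TY2022 totals quoted in the commit message (state sum vs IRS US row).
returns_gap = relative_gap(23_679_560, 23_692_190)
amount_gap = relative_gap(59_178_091_000, 59_204_588_000)
print(f"{returns_gap:.4%} {amount_gap:.4%}")
```

Both gaps come out below 0.1%, consistent with the "~0.05% off" figure in the commit message.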
