Context
This issue was broken out of the skipped-tests umbrella issue (#501) so the design question can be discussed on its own, without crowding the three concrete PRs that the umbrella proposes (fingerprint, SOI sanity checks, cleanup).
@martinholmer — would welcome your thoughts on this. If you want me to make a concrete proposal first, please let me know.
Intent
We want a running test that weighted total individual income tax (iitax) and weighted total payroll tax (payrolltax) on TMD data are reasonable at a small number of future years. This is the one kind of check that exercises the growfactor / uprating path for more than one year out. The running tests/test_tax_expenditures.py already exercises uprating for 2022 → 2023 (tax-expenditure estimates compared against committed reference values), so single-step uprating is covered; what is not covered today is multi-year or far-future uprating — which is precisely where growfactors and post-OBBBA policy effects matter most.
A post-OBBBA CBO source is available: the 2026-02-01 Winter baseline in US-CBO/eval-projections/input_data/baselines.csv. In that vintage, individual income tax is FY26 = $2,751.291 B and FY33 = $3,743.854 B — about 3% lower than the pre-OBBBA values currently shipped in tests/expected_itax_rev_2022_data.yaml.
Before writing this test we need to figure out what to compare against and how strict a tolerance is defensible — which is the subject of this issue.
The problem: tax-calculator and CBO don't measure the same thing
When we set up a 2022 baseline comparison to check how well the two sides agree before we project forward, we hit a roughly 13% gap on individual income tax and about 11% on payroll tax. Neither is a TMD bug — both come from differences in what each side is measuring:
- CBO "Individual Income Taxes" is Treasury cash receipts on Monthly Treasury Statement (MTS) methodology: cash basis, net of refunds, on a combined unified basis. Per MTS notes, individual income tax is derived as the residual of the combined payment after SS / Medicare estimates are deducted from combined FICA+IIT Treasury deposits. That line structurally includes Form 1041 (estates and trusts), Form 1042 (nonresident-alien withholding), cash-vs-accrual timing effects, and refund-netting. TaxCalc iitax is an accrued-liability measure on the 1040 universe only (c09200 − refund in calcfunctions.py).
- CBO "Payroll Taxes" = OASDI + HI (Medicare Part A) + Unemployment Insurance + federal employees' retirement + Railroad Retirement, both employer and employee shares, including SECA. TaxCalc payrolltax = ptax_was + extra_payrolltax: FICA on wages plus SECA only, with no UI, federal employee retirement, or Railroad Retirement.
Observed gaps, calendar-year 2022
Fiscal-year CBO figures are converted to calendar year using CY22 = FY22 + 0.25·(FY23 − FY22), the same FY→CY interpolation pattern used in the skipped test_tax_revenue.py. The 0.25 weight reflects that the final quarter of a calendar year falls in the next federal fiscal year. The TMD side uses the formulas from that test: for itax, iitax + refund on PUF records (the + refund adds back the refundable-credit payout portion to match CBO's treatment of those as outlays rather than negative revenue); for ptax, payrolltax on all records.
| Aggregate | TMD CY2022 | CBO CY2022 (pre-OBBBA interp) | Δ |
| --- | --- | --- | --- |
| iitax + refund (PUF records, weighted) | $2,253.9 B | $2,605.3 B | −13.5% |
| payrolltax (all records, weighted) | $1,342.8 B | $1,503.2 B | −10.7% |
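The interpolation and the gap percentages above can be reproduced with a few lines of Python. This is a minimal sketch, not the code in test_tax_revenue.py: the helper names `fy_to_cy` and `pct_gap` are hypothetical, and the fiscal-year inputs in the first example are made-up round numbers, since the underlying FY22 and FY23 CBO values are not quoted in this issue. The gap calculation uses the table values as given.

```python
def fy_to_cy(fy_current: float, fy_next: float, weight: float = 0.25) -> float:
    """Approximate a calendar-year value from two adjacent fiscal-year values.

    weight=0.25 matches the FY22 + 0.25*(FY23 - FY22) pattern quoted above.
    """
    return fy_current + weight * (fy_next - fy_current)


def pct_gap(model: float, benchmark: float) -> float:
    """Percent difference of the model value relative to the benchmark."""
    return 100.0 * (model / benchmark - 1.0)


# Hypothetical fiscal-year inputs, for illustration only (not CBO data).
cy_example = fy_to_cy(100.0, 108.0)  # 100 + 0.25 * 8

# Gaps recomputed from the CY2022 table values ($ billions).
itax_gap = pct_gap(2253.9, 2605.3)
ptax_gap = pct_gap(1342.8, 1503.2)

print(f"CY example: {cy_example:.1f}")
print(f"itax gap: {itax_gap:.1f}%  ptax gap: {ptax_gap:.1f}%")
```

Keeping the weight as a parameter makes it easy to see how sensitive the calendar-year figure is to the interpolation assumption.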
What plausibly accounts for the 13.5% individual-income-tax gap
The TMD side is lower by roughly $350 B in 2022. No single factor dominates; the gap is a combination of items that are in CBO but not in the TaxCalc 1040 universe:
| Component | Approximate magnitude |
| --- | --- |
| Form 1041 (estates and trusts income tax), in CBO but not in TaxCalc iitax | $30–40 B |
| Form 1042 / NRA withholding, net of refunds and treaty/credit offsets | $50–100 B |
| Late assessments, audit collections, penalties, interest on 1040 accounts | $10–30 B |
| Cash-vs-accrual timing, unusually large in 2022 from ARPA CTC reconciliation and pandemic-era processing backlogs | $50–100 B |
| Treasury MTS residual methodology (individual income tax derived as residual after SS / Medicare estimates are subtracted from combined deposits) | indeterminate, nonzero |
| Total plausibly accounted for | $140–270 B |
| Unexplained residual | $80–210 B |
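The totals and residual in the table follow from simple range arithmetic, sketched below. The component ranges and the roughly $350 B observed gap are taken from this issue; the MTS residual-methodology row is left out because its magnitude is indeterminate. Variable names are illustrative only.

```python
# Approximate component ranges from the table above, in $ billions (low, high).
components = {
    "Form 1041": (30, 40),
    "Form 1042 / NRA withholding": (50, 100),
    "Late assessments, audits, penalties, interest": (10, 30),
    "Cash-vs-accrual timing": (50, 100),
}
observed_gap = 350  # rough CY2022 itax gap, $ billions

# Sum the low ends and the high ends separately.
accounted_low = sum(low for low, _ in components.values())
accounted_high = sum(high for _, high in components.values())

# The residual is smallest when the components are at their high ends.
residual_low = observed_gap - accounted_high
residual_high = observed_gap - accounted_low

print(f"accounted for: ${accounted_low}-{accounted_high} B")
print(f"unexplained:   ${residual_low}-{residual_high} B")
```

Because the component ranges are wide, the unexplained residual is itself only bounded loosely, which is part of why a tight level tolerance is hard to defend.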
We can effectively rule out one hypothesis: "withholding from people who never file for refunds." The IRS reports only $1–2 B per year of unclaimed refunds, far too small to contribute meaningfully. Cash-vs-accrual timing would normally average out over time; 2022 is an unusual year because of ARPA CTC advance-payment reconciliation and pandemic-era processing backlogs.
Cross-check against SOI
The same TMD iitax ($2,147.4 B) matches SOI tottax ($2,139.9 B) to within 0.35% — see the SOI sanity-check PR in the umbrella for detail. SOI and TaxCalc agree closely on 1040-universe individual income tax liability. CBO disagrees with both SOI and TaxCalc by 13–17% in 2022, which is evidence that the CBO-vs-TMD gap is primarily a CBO-vs-SOI definitional gap, not a TMD modeling problem.
Four options for how to build the future-year test
We don't have a strong view and would appreciate your read before committing to an approach.
1. Growth-rate comparison. Check (TMD_future / TMD_2022) against (CBO_future / CBO_2022) for each aggregate. Taking ratios cancels the level gap (it appears in both numerator and denominator) and directly tests the uprating path, which is the thing we actually want to validate. A tight tolerance (3–5%) becomes possible. Cleanest option, but assumes the definitional gap stays roughly constant over time.
2. Level comparison with wide tolerance. Accept the level gap and use a ~25% tolerance. Simpler, but less useful diagnostically: a failure could not tell us whether the growfactor is wrong or the base year drifted.
3. Narrower CBO / JCT / Treasury publication. Is there a breakdown that separates 1040-only individual income tax (excluding 1041, 1042, NRA withholding, and refund netting)? Or a payroll-tax subset that excludes UI, Railroad Retirement, and federal employee retirement? If yes, level comparison with a tight tolerance becomes viable. (We did not find one, but you may know the literature better.)
4. Restrict the test to population as the primary future-year check. tmd/storage/input/cbo26_population.yaml extends to 2075 and is cleanly comparable to CBO (no definition issues, 0% gap in 2022). Keep itax and payrolltax out of the test entirely until there is a clean comparator.
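To make the growth-rate option concrete, here is a minimal sketch of what such a check might look like. Everything in it is an assumption for discussion: the helper names `growth_ratio_gap` and `within_tolerance` are hypothetical, the dollar figures are made-up round numbers rather than TMD or CBO values, and the 5% default tolerance is simply the upper end of the 3–5% range mentioned above.

```python
def growth_ratio_gap(model_base: float, model_future: float,
                     bench_base: float, bench_future: float) -> float:
    """Relative difference between model growth and benchmark growth."""
    model_growth = model_future / model_base
    bench_growth = bench_future / bench_base
    return model_growth / bench_growth - 1.0


def within_tolerance(gap: float, tol: float = 0.05) -> bool:
    """True when the growth-rate gap is inside the symmetric tolerance."""
    return abs(gap) <= tol


# Hypothetical benchmark levels ($ billions), not CBO data.
bench_base, bench_future = 2600.0, 3500.0
level_factor = 0.865  # model sits about 13.5% below the benchmark

# Case 1: the level gap is constant, so it cancels in the ratio.
gap_constant = growth_ratio_gap(level_factor * bench_base,
                                level_factor * bench_future,
                                bench_base, bench_future)

# Case 2: the model grows 8% faster than the benchmark over the period.
gap_drift = growth_ratio_gap(level_factor * bench_base,
                             1.08 * level_factor * bench_future,
                             bench_base, bench_future)

print(within_tolerance(gap_constant), within_tolerance(gap_drift))
```

The first case shows why a constant definitional gap does not trip the check, and the second shows that an uprating error does. The check cannot distinguish an uprating error from a definitional gap that itself changes over time, which is the caveat noted under option 1.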
Proposed anchor years
Once the comparability question is settled: FY2026 (near-term, post-OBBBA effective) and FY2034 (last year with a published CBO figure), possibly a midpoint like FY2030. Same FY → CY interpolation pattern as the existing test_tax_revenue.py.
This replaces the multi-year 2023–2033 sweep in the skipped test_tax_revenue, narrowing to a small number of defensible anchor years.
Requests
- Approach for itax / payrolltax: which of the four options above do you prefer? Is there an option 5 we missed?
- Anchor years: 2-year (FY26, FY34), or 3-year (+ FY30)?
- Tolerance: dependent on the approach answer — probably 3–5% under option 1, 25% under option 2, tight under option 3, n/a under option 4.
Related
- test_tax_revenue)
- US-CBO/eval-projections/input_data/baselines.csv: post-OBBBA CBO baseline source