
Design discussion: future-year revenue sanity check — how to align tax-calculator iitax / payrolltax with a CBO or Treasury comparator #502

@donboyd5

Context

This issue was broken out of the skipped-tests umbrella issue (#501) so the design question can be discussed on its own, without crowding the three concrete PRs that the umbrella proposes (fingerprint, SOI sanity checks, cleanup).

@martinholmer, I would welcome your thoughts on this. If you would like me to make a concrete proposal first, please let me know.

Intent

We want a running test that checks whether weighted total individual income tax (iitax) and weighted total payroll tax (payrolltax) on TMD data are reasonable in a small number of future years. This is the one kind of check that exercises the growfactor / uprating path more than one year out. The running tests/test_tax_expenditures.py already exercises uprating from 2022 to 2023 (tax-expenditure estimates compared against committed reference values), so single-step uprating is covered. What is not covered today is multi-year, far-future uprating, which is precisely where growfactors and post-OBBBA policy effects matter most.

A post-OBBBA CBO source is available: the 2026-02-01 Winter baseline in US-CBO/eval-projections/input_data/baselines.csv. In that vintage, individual income tax is FY26 = $2,751.291 B and FY33 = $3,743.854 B — about 3% lower than the pre-OBBBA values currently shipped in tests/expected_itax_rev_2022_data.yaml.

Before writing this test we need to figure out what to compare against and how strict a tolerance is defensible — which is the subject of this issue.

The problem: tax-calculator and CBO don't measure the same thing

When we set up a 2022 baseline comparison to check how well the two sides agree before we project forward, we hit a roughly 13% gap on individual income tax and about 11% on payroll tax. Neither is a TMD bug — both come from differences in what each side is measuring:

  • CBO "Individual Income Taxes" is Treasury cash receipts on Monthly Treasury Statement (MTS) methodology — cash basis, net of refunds, on a combined unified basis. Per MTS notes, individual income tax is derived as the residual of the combined payment after SS / Medicare estimates are deducted from combined FICA+IIT Treasury deposits. That line structurally includes Form 1041 (estates and trusts), Form 1042 (nonresident-alien withholding), cash-vs-accrual timing effects, and refund-netting. TaxCalc iitax is an accrued-liability measure on the 1040 universe only (c09200 − refund in calcfunctions.py).
  • CBO "Payroll Taxes" = OASDI + HI (Medicare Part A) + Unemployment Insurance + federal employees' retirement + Railroad Retirement, both employer and employee shares, including SECA. TaxCalc payrolltax = ptax_was + extra_payrolltax — FICA on wages plus SECA only, with no UI, federal employee retirement, or Railroad Retirement.

Observed gaps, calendar-year 2022

CBO fiscal-year figures are converted to calendar year as CY22 = FY22 + 0.25·(FY23 − FY22). This is the same FY→CY interpolation pattern used in the skipped test_tax_revenue.py: FY23 begins in October 2022, so CY2022 shares nine months with FY22 and three with FY23. The TMD side uses the formulas from that test. For itax, it is iitax + refund on PUF records; adding refund back restores the refundable-credit payout portion, matching CBO's treatment of those payouts as outlays rather than negative revenue. For ptax, it is payrolltax on all records.
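Below is a minimal sketch of the two conversions just described. It assumes CBO figures arrive as a plain {fiscal_year: $ billions} dict. It also assumes TMD records are dicts keyed by the Tax-Calculator output names (iitax, refund, payrolltax), the weight s006, and a data_source flag where 1 marks a PUF-derived record. The helper names and record layout are hypothetical illustrations, not the code in test_tax_revenue.py.

```python
def fy_to_cy(fy_values, year):
    """Interpolate a calendar-year value from fiscal-year values.

    FY(t) runs October of t-1 through September of t, so CY(t) shares nine
    months with FY(t) and three months with FY(t+1).
    """
    return fy_values[year] + 0.25 * (fy_values[year + 1] - fy_values[year])


def weighted_total(records, value, puf_only=False):
    """Sum value(record) * s006 over records, optionally PUF records only."""
    # Hypothetical layout: each record is a dict with Tax-Calculator output
    # names, the s006 weight, and data_source (assumed 1 = PUF-derived).
    return sum(
        value(rec) * rec["s006"]
        for rec in records
        if not puf_only or rec["data_source"] == 1
    )


def tmd_itax(records):
    # iitax + refund on PUF records: adding refund back matches CBO's
    # treatment of refundable-credit payouts as outlays.
    return weighted_total(records, lambda r: r["iitax"] + r["refund"], puf_only=True)


def tmd_ptax(records):
    # payrolltax (FICA on wages plus SECA) on all records.
    return weighted_total(records, lambda r: r["payrolltax"])


# Toy inputs, not real data.
cbo_itax_fy = {2022: 100.0, 2023: 120.0}
records = [
    {"iitax": 10.0, "refund": 2.0, "payrolltax": 5.0, "s006": 100.0, "data_source": 1},
    {"iitax": 3.0, "refund": 1.0, "payrolltax": 4.0, "s006": 50.0, "data_source": 0},
]
print(fy_to_cy(cbo_itax_fy, 2022))  # 105.0
print(tmd_itax(records), tmd_ptax(records))  # 1200.0 700.0
```

In practice the TMD totals would come from a full Tax-Calculator run rather than hand-built dicts.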

| Aggregate | TMD CY2022 | CBO CY2022 (pre-OBBBA, interpolated) | Δ |
| --- | --- | --- | --- |
| iitax + refund (PUF records, weighted) | $2,253.9 B | $2,605.3 B | −13.5% |
| payrolltax (all records, weighted) | $1,342.8 B | $1,503.2 B | −10.7% |
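The Δ column is TMD relative to CBO, expressed as a percent of the CBO figure. The short sketch below reproduces it from the table's own numbers. The dict layout is just for illustration.

```python
# CY2022 totals from the table above, in $ billions.
TOTALS_2022 = {
    "iitax + refund": (2253.9, 2605.3),  # (TMD, CBO)
    "payrolltax": (1342.8, 1503.2),
}


def pct_gap(tmd, cbo):
    """Signed gap of TMD relative to CBO, as a percent of the CBO figure."""
    return 100.0 * (tmd - cbo) / cbo


for name, (tmd, cbo) in TOTALS_2022.items():
    print(f"{name}: {tmd - cbo:+.1f} B ({pct_gap(tmd, cbo):+.1f}%)")
# iitax + refund: -351.4 B (-13.5%)
# payrolltax: -160.4 B (-10.7%)
```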

What plausibly accounts for the 13.5% individual-income-tax gap

The TMD side is lower by roughly $350 B in 2022. No single factor dominates; the gap is a combination of items that are in CBO but not in the TaxCalc 1040 universe:

| Component | Approximate magnitude |
| --- | --- |
| Form 1041 (estates and trusts income tax), in CBO but not in TaxCalc iitax | $30–40 B |
| Form 1042 / NRA withholding, net of refunds and treaty/credit offsets | $50–100 B |
| Late assessments, audit collections, penalties, interest on 1040 accounts | $10–30 B |
| Cash-vs-accrual timing, unusually large in 2022 from ARPA CTC reconciliation and pandemic-era processing backlogs | $50–100 B |
| Treasury MTS residual methodology (individual income tax derived as the residual after SS / Medicare estimates are subtracted from combined deposits) | indeterminate, nonzero |
| Total plausibly accounted for | $140–270 B |
| Unexplained residual | $80–210 B |
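The last two rows follow from interval arithmetic. Add the low ends and the high ends of the quantified components, then subtract each total from the roughly $351 B gap. The quick check below does this; note that the table rounds the residual to the nearest $10 B.

```python
# (low, high) ranges in $ billions from the table above. The MTS-residual row
# is indeterminate, so it is left out of the sum.
COMPONENTS = {
    "Form 1041": (30, 40),
    "Form 1042 / NRA withholding": (50, 100),
    "Late assessments, audits, penalties, interest": (10, 30),
    "Cash-vs-accrual timing": (50, 100),
}
GAP = 2605.3 - 2253.9  # CBO minus TMD, CY2022 iitax + refund


def explained_range(components):
    """Sum the low ends and the high ends of the component ranges."""
    lows, highs = zip(*components.values())
    return sum(lows), sum(highs)


low, high = explained_range(COMPONENTS)
# The bounds swap: the most explained leaves the least unexplained.
residual = (GAP - high, GAP - low)
print(f"explained ${low}-{high} B; residual ${residual[0]:.0f}-{residual[1]:.0f} B")
# explained $140-270 B; residual $81-211 B
```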

We can effectively rule out one hypothesis: "withholding from people who never file for refunds." The IRS reports only $1–2 B per year of unclaimed refunds, far too small to contribute meaningfully. Cash-vs-accrual timing would normally average out over time; 2022 is an unusual year because of ARPA CTC advance-payment reconciliation and pandemic-era processing backlogs.

Cross-check against SOI

The same TMD iitax ($2,147.4 B) matches SOI tottax ($2,139.9 B) to within 0.35% — see the SOI sanity-check PR in the umbrella for detail. SOI and TaxCalc agree closely on 1040-universe individual income tax liability. CBO disagrees with both SOI and TaxCalc by 13–17% in 2022, which is evidence that the CBO-vs-TMD gap is primarily a CBO-vs-SOI definitional gap, not a TMD modeling problem.

Four options for how to build the future-year test

We don't have a strong view and would appreciate your read before committing to an approach.

  1. Growth-rate comparison. Check (TMD_future / TMD_2022) against (CBO_future / CBO_2022) for each aggregate. Taking ratios cancels a proportional level gap, because the same factor appears in both numerator and denominator. It therefore directly tests the uprating path, which is the thing we actually want to validate, and makes a tight tolerance (3–5%) possible. This is the cleanest option, but it assumes the definitional gap stays roughly constant as a share of each total over time.

  2. Level comparison with wide tolerance. Accept the level gap and use a ~25% tolerance. Simpler, but less useful diagnostically — a failure could not tell us whether the growfactor is wrong or the base year drifted.

  3. Narrower CBO / JCT / Treasury publication. Is there a breakdown that separates 1040-only individual income tax (excluding 1041, 1042, NRA withholding, and refund netting)? Or a payroll-tax subset that excludes UI, Railroad Retirement, and federal employee retirement? If yes, level comparison with a tight tolerance becomes viable. (We did not find one, but you may know the literature better.)

  4. Restrict the test to population as the primary future-year check. tmd/storage/input/cbo26_population.yaml extends to 2075 and is cleanly comparable to CBO (no definition issues — 0% gap in 2022). Keep itax and payrolltax out of the test entirely until there is a clean comparator.
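Below is a minimal sketch of the checks behind options 1 and 2. It assumes each side is available as a {year: total} dict already on a calendar-year basis. The function names are hypothetical, and the default tolerances are the ones floated in this issue. The toy numbers (not real data) first give TMD a constant 13.5% level shortfall, then inject a 10% growth error. Together they show why ratios cancel a proportional level gap, and why a 25% level tolerance can let a growfactor error of that size through.

```python
def growth_check(tmd, cbo, base_year, year, rtol=0.05):
    """Option 1: compare growth since base_year instead of levels.

    If TMD = k * CBO in both years, k cancels in each ratio, so only a
    difference in growth can fail the check. A gap that is constant in
    dollars, rather than as a share, does not cancel exactly.
    """
    tmd_growth = tmd[year] / tmd[base_year]
    cbo_growth = cbo[year] / cbo[base_year]
    return abs(tmd_growth / cbo_growth - 1.0) <= rtol


def level_check(tmd, cbo, year, rtol=0.25):
    """Option 2: compare levels directly, with a wide tolerance."""
    return abs(tmd[year] / cbo[year] - 1.0) <= rtol


# Toy numbers, not real data: TMD is 13.5% below CBO in every year.
cbo = {2022: 100.0, 2026: 130.0}
tmd = {y: 0.865 * v for y, v in cbo.items()}
print(growth_check(tmd, cbo, 2022, 2026), level_check(tmd, cbo, 2026))  # True True

# Inject a 10% growfactor error in 2026.
tmd_bad = dict(tmd)
tmd_bad[2026] *= 1.10
print(growth_check(tmd_bad, cbo, 2022, 2026), level_check(tmd_bad, cbo, 2026))  # False True
```

Under option 1 the injected error fails the check and points at the uprating path. Under option 2 the same error passes unnoticed.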

Proposed anchor years

Once the comparability question is settled, the proposed anchors are FY2026 (near term, with OBBBA in effect) and FY2034 (the last year with a published CBO figure), possibly with a midpoint such as FY2030. These would use the same FY → CY interpolation pattern as the existing test_tax_revenue.py.

This would replace the multi-year 2023–2033 sweep in the skipped test_tax_revenue.py with a small number of defensible anchor years.

Requests

  • Approach for itax / payrolltax: which of the four options above do you prefer? Is there an option 5 we missed?
  • Anchor years: 2-year (FY26, FY34), or 3-year (+ FY30)?
  • Tolerance: dependent on the approach answer — probably 3–5% under option 1, 25% under option 2, tight under option 3, n/a under option 4.

Related
