Design discussion: future-year revenue sanity check — how to align tax-calculator `iitax` / `payrolltax` with a CBO or Treasury comparator


## Context

This issue was broken out of the skipped-tests umbrella issue (#501) so the design question can be discussed on its own, without crowding the three concrete PRs that the umbrella proposes (fingerprint, SOI sanity checks, cleanup).

`@martinholmer`, I would welcome your thoughts on this. If you would prefer that I make a concrete proposal first, please let me know.

## Intent

We want a running test confirming that weighted total individual income tax (`iitax`) and weighted total payroll tax (`payrolltax`) on TMD data are reasonable at a small number of future years. This is the one kind of check that exercises the growfactor / uprating path more than one year out. The running `tests/test_tax_expenditures.py` already exercises uprating for 2022 → 2023 (tax-expenditure estimates compared against committed reference values), so single-step uprating is covered. What is **not** covered today is multi-year or far-future uprating, which is precisely where growfactors and post-OBBBA policy effects matter most.

A post-OBBBA CBO source is available: the 2026-02-01 Winter baseline in [`US-CBO/eval-projections/input_data/baselines.csv`](https://github.com/US-CBO/eval-projections/blob/main/input_data/baselines.csv). In that vintage, individual income tax is FY26 = \$2,751.291 B and FY33 = \$3,743.854 B — about 3% lower than the pre-OBBBA values currently shipped in `tests/expected_itax_rev_2022_data.yaml`.

Before writing this test we need to figure out what to compare against and how strict a tolerance is defensible — which is the subject of this issue.

## The problem: tax-calculator and CBO don't measure the same thing

When we set up a 2022 baseline comparison to check how well the two sides agree before we project forward, we hit a roughly 13% gap on individual income tax and about 11% on payroll tax. Neither is a TMD bug — both come from differences in what each side is measuring:

- **CBO "Individual Income Taxes"** is Treasury cash receipts on Monthly Treasury Statement (MTS) methodology — cash basis, net of refunds, on a combined unified basis. Per MTS notes, individual income tax is derived as the residual of the combined payment after SS / Medicare estimates are deducted from combined FICA+IIT Treasury deposits. That line structurally includes Form 1041 (estates and trusts), Form 1042 (nonresident-alien withholding), cash-vs-accrual timing effects, and refund-netting. TaxCalc `iitax` is an accrued-liability measure on the 1040 universe only (`c09200 − refund` in `calcfunctions.py`).
- **CBO "Payroll Taxes"** = OASDI + HI (Medicare Part A) + Unemployment Insurance + federal employees' retirement + Railroad Retirement, both employer and employee shares, including SECA. TaxCalc `payrolltax` = `ptax_was + extra_payrolltax` — FICA on wages plus SECA only, with no UI, federal employee retirement, or Railroad Retirement.

### Observed gaps, calendar-year 2022

CBO fiscal-year figures are converted to calendar year using `FY22 + 0.25·(FY23 − FY22)` (the same FY→CY interpolation pattern used in the skipped `test_tax_revenue.py`). The TMD side uses the formulas from that test: for itax, `iitax + refund` on PUF records (the `+ refund` adds back the refundable-credit payout portion to match CBO's treatment of those payouts as outlays rather than negative revenue); for ptax, `payrolltax` on all records.

| Aggregate                                              | TMD CY2022  | CBO CY2022 (pre-OBBBA interp) | Δ          |
|--------------------------------------------------------|-------------|-------------------------------|------------|
| `iitax + refund` (PUF records, weighted)               | \$2,253.9 B | \$2,605.3 B                   | **−13.5%** |
| `payrolltax` (all records, weighted)                   | \$1,342.8 B | \$1,503.2 B                   | **−10.7%** |
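
The interpolation and the Δ column can be reproduced in a few lines of Python. This is a minimal sketch: `fy_to_cy` and `pct_gap` are hypothetical helper names, not functions from the TMD repository, and the dollar figures are copied from the table above.

```python
def fy_to_cy(fy_this: float, fy_next: float, weight: float = 0.25) -> float:
    """Approximate a calendar-year value from two adjacent fiscal years.

    Federal fiscal year N runs from October of year N-1 through September of
    year N, so calendar year N overlaps FY N for nine months and FY N+1 for
    three months. That is the source of the 0.25 weight.
    """
    return fy_this + weight * (fy_next - fy_this)


def pct_gap(tmd: float, cbo: float) -> float:
    """Percentage difference of the TMD total relative to the CBO total."""
    return 100.0 * (tmd / cbo - 1.0)


# Figures in $ billions, copied from the table above.
itax_gap = pct_gap(2253.9, 2605.3)
ptax_gap = pct_gap(1342.8, 1503.2)
print(f"itax gap: {itax_gap:.1f}%  ptax gap: {ptax_gap:.1f}%")
```

Expressing the gap relative to the CBO figure matches the sign convention of the Δ column (negative means TMD is lower).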

### What plausibly accounts for the 13.5% individual-income-tax gap

The TMD side is lower by roughly \$350 B in 2022. No single factor dominates; the gap is a combination of items that are in CBO but not in the TaxCalc 1040 universe:

| Component                                                                 | Approximate magnitude |
|---------------------------------------------------------------------------|-----------------------|
| Form 1041 (estates and trusts income tax) — in CBO, not in TaxCalc `iitax` | \$30–40 B             |
| Form 1042 / NRA withholding, net of refunds and treaty/credit offsets      | \$50–100 B            |
| Late assessments, audit collections, penalties, interest on 1040 accounts | \$10–30 B             |
| Cash-vs-accrual timing, unusually large in 2022 from ARPA CTC reconciliation and pandemic-era processing backlogs | \$50–100 B |
| Treasury MTS residual methodology (individual income tax derived as residual after SS / Medicare estimates are subtracted from combined deposits) | indeterminate, nonzero |
| **Total plausibly accounted for**                                         | **\$140–270 B**       |
| **Unexplained residual**                                                  | \$80–210 B            |
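
The totals in the last two rows follow from simple range arithmetic, sketched below. The component ranges and the level gap are taken from the tables in this issue; the MTS residual-methodology row is left out because its magnitude is indeterminate. The variable names are illustrative only.

```python
# Component ranges in $ billions, copied from the table above as (low, high).
components = {
    "Form 1041": (30, 40),
    "Form 1042 / NRA withholding": (50, 100),
    "Late assessments, audits, penalties, interest": (10, 30),
    "Cash-vs-accrual timing": (50, 100),
}

# Level gap from the calendar-year 2022 table: CBO minus TMD.
gap = 2605.3 - 2253.9

explained_low = sum(low for low, _ in components.values())
explained_high = sum(high for _, high in components.values())

# The residual is largest when the explained amount is smallest, and vice versa.
residual_low = gap - explained_high
residual_high = gap - explained_low

print(f"gap: {gap:.1f}")
print(f"explained: {explained_low} to {explained_high}")
print(f"residual: {residual_low:.0f} to {residual_high:.0f}")
```

The computed residual is about \$81–211 B, which the table rounds to \$80–210 B.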

We can effectively rule out one hypothesis: "withholding from people who never file for refunds." The IRS reports only \$1–2 B per year of unclaimed refunds, far too small to contribute meaningfully. Cash-vs-accrual timing would normally average out over time; 2022 is an unusual year because of ARPA CTC advance-payment reconciliation and pandemic-era processing backlogs.

### Cross-check against SOI

TMD `iitax` on its own (\$2,147.4 B) matches SOI `tottax` (\$2,139.9 B) to within 0.35%; see the SOI sanity-check PR in the umbrella for detail. SOI and TaxCalc therefore agree closely on 1040-universe individual income tax liability. Both sit roughly 13–18% below the CBO figure in 2022, which is evidence that **the CBO-vs-TMD gap is primarily a CBO-vs-SOI definitional gap, not a TMD modeling problem**.

## Four options for how to build the future-year test

We don't have a strong view and would appreciate your read before committing to an approach.

1. **Growth-rate comparison.** Check `(TMD_future / TMD_2022)` against `(CBO_future / CBO_2022)` for each aggregate. Taking ratios cancels the level gap (it appears in both numerator and denominator) and directly tests the uprating path — which is the thing we actually want to validate. A tight tolerance (3–5%) becomes possible. Cleanest option, but assumes the definitional gap stays roughly constant over time.

2. **Level comparison with wide tolerance.** Accept the level gap and use a ~25% tolerance. Simpler, but less useful diagnostically: a failure would not tell us whether the growfactor is wrong or the base year drifted.

3. **Narrower CBO / JCT / Treasury publication.** Is there a breakdown that separates 1040-only individual income tax (excluding 1041, 1042, NRA withholding, and refund netting)? Or a payroll-tax subset that excludes UI, Railroad Retirement, and federal employee retirement? If yes, level comparison with a tight tolerance becomes viable. (We did not find one, but you may know the literature better.)

4. **Restrict the test to population as the primary future-year check.** `tmd/storage/input/cbo26_population.yaml` extends to 2075 and is cleanly comparable to CBO (no definition issues — 0% gap in 2022). Keep itax and payrolltax out of the test entirely until there is a clean comparator.
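
To make option 1 concrete, here is a minimal sketch of a growth-rate check. It is not a proposed implementation: `growth_ratio_ok` is a hypothetical helper, the 5% tolerance is just the upper end of the range mentioned in option 1, and the future-year CBO level is a made-up number used only to show how a constant proportional gap cancels out of the ratio.

```python
def growth_ratio_ok(tmd_base, tmd_future, cbo_base, cbo_future, rel_tol=0.05):
    """Return True if TMD growth since the base year is within rel_tol of CBO growth."""
    tmd_growth = tmd_future / tmd_base
    cbo_growth = cbo_future / cbo_base
    return abs(tmd_growth / cbo_growth - 1.0) <= rel_tol


# CY2022 levels in $ billions from the table in this issue.
TMD_2022, CBO_2022 = 2253.9, 2605.3

# Hypothetical future-year CBO level, for illustration only (not a CBO figure).
cbo_future = 3500.0

# Case 1: the definitional gap stays at its 2022 proportion.
# The growth check passes, while a level check at the same tolerance fails.
tmd_future_same_gap = cbo_future * (TMD_2022 / CBO_2022)
passes_growth = growth_ratio_ok(TMD_2022, tmd_future_same_gap, CBO_2022, cbo_future)
passes_level = abs(tmd_future_same_gap / cbo_future - 1.0) <= 0.05

# Case 2: uprating overshoots by 10% on top of the same gap.
# The growth check now fails, which is the signal we want from the test.
overshoot_passes = growth_ratio_ok(
    TMD_2022, 1.10 * tmd_future_same_gap, CBO_2022, cbo_future
)

print(passes_growth, passes_level, overshoot_passes)
```

The sketch also shows the limitation noted under option 1: if the definitional gap itself drifts over time, that drift is indistinguishable from an uprating error in this check.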

## Proposed anchor years

Once the comparability question is settled: **FY2026** (near-term, post-OBBBA effective) and **FY2034** (last year with a published CBO figure), possibly a midpoint like **FY2030**. Same FY → CY interpolation pattern as the existing `test_tax_revenue.py`.

This replaces the multi-year 2023–2033 sweep in the skipped `test_tax_revenue`, narrowing to a small number of defensible anchor years.

## Requests

- **Approach for itax / payrolltax**: which of the four options above do you prefer? Is there an option 5 we missed?
- **Anchor years**: 2-year (FY26, FY34), or 3-year (+ FY30)?
- **Tolerance**: dependent on the approach answer — probably 3–5% under option 1, 25% under option 2, tight under option 3, n/a under option 4.

## Related

- #501 — parent umbrella issue (skipped-tests plan) — to be linked when posted
- #430 — original tracking issue enumerating the six skipped tests (this issue covers item #1 of that list, `test_tax_revenue`)
- [`US-CBO/eval-projections/input_data/baselines.csv`](https://github.com/US-CBO/eval-projections/blob/main/input_data/baselines.csv) — post-OBBBA CBO baseline source

