
Commit b7f4f85

MaxGhenis and claude authored and committed
Add voluntary tax filer assignment (#513)
* Add voluntary tax filer assignment

  SOI data shows many low-AGI filers who file taxes voluntarily even when not required and not receiving a refund. This affects calibration accuracy when comparing CPS-based filer counts to SOI totals. Add a would_file_taxes_voluntarily variable at the tax_unit level with ~5% probability, using a seeded RNG for reproducibility. This enables policyengine-us to incorporate voluntary filing behavior in its tax_unit_is_filer variable.

  Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Improve voluntary tax filer logic with refund-seeking behavior

  Replace the simple 5% voluntary filing rate with a more nuanced approach:

  1. Add a would_file_for_refund variable that identifies tax units taking up EITC (95% of EITC takers are assumed to know they'll get a refund).
  2. Apply the voluntary filing rate (3%) only to those NOT already filing for a refund, to avoid double-counting.

  This better models the actual filing decision, where refundable credit recipients have a clear financial incentive to file, while others may file because of state requirements, for documentation, or out of habit.

  Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Simplify voluntary filer logic and add filer count calibration targets

  Voluntary filer changes (cps.py):
  - Remove the redundant would_file_for_refund variable, since takes_up_eitc already captures refund-seeking behavior
  - Simplify to a single would_file_taxes_voluntarily variable that applies only to tax units NOT taking up EITC
  - Use a 5% voluntary filing rate for non-EITC takers

  Calibration target changes (loss.py):
  - Add SOI Table 1.1 filer counts by AGI band as calibration targets
  - Cover 7 bands: <$0, $0-5k, $5k-10k, $10k-25k, $25k-50k, $50k-100k, $100k+
  - Include all filers (not just taxable returns) to properly calibrate low-income filer counts, which are important for distribution accuracy
  - Uprate 2015 SOI counts to the current year using population growth

  This consolidates PR #514 into PR #513.

  Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Add changelog entry

  Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
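The double-counting argument in the second and third bullets can be checked with a little expected-value arithmetic. This is an illustrative sketch only: e stands for the share of tax units taking up EITC and r for the voluntary filing rate, and the "independent draw" is a hypothetical alternative used for comparison, not something this commit implements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FilingShares:
    voluntary_flagged: float  # share of all tax units with the voluntary flag set
    overlap_with_eitc: float  # share flagged voluntary that also take up EITC
    any_flag: float  # share with at least one of the two flags


def restricted_draw(e: float, r: float) -> FilingShares:
    """Rule used in this commit: the voluntary rate r applies only to the
    (1 - e) share of tax units that do not take up EITC."""
    return FilingShares(
        voluntary_flagged=(1 - e) * r,
        overlap_with_eitc=0.0,
        any_flag=e + (1 - e) * r,
    )


def independent_draw(e: float, r: float) -> FilingShares:
    """Hypothetical alternative: draw the voluntary flag for every tax unit,
    independently of EITC take-up."""
    return FilingShares(
        voluntary_flagged=r,
        overlap_with_eitc=e * r,
        any_flag=e + r - e * r,  # inclusion-exclusion for two independent events
    )


# Illustrative inputs only: 20% EITC take-up, 5% voluntary rate.
restricted = restricted_draw(0.20, 0.05)
independent = independent_draw(0.20, 0.05)
```

With these illustrative inputs both rules flag about 24% of tax units through at least one route, but the independent draw sets the voluntary flag on 5% of units, 1 percentage point of which already file for EITC. The restricted rule sets it only on the 4% who would not otherwise file through EITC, so the flag counts exactly the additional filers.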
1 parent 749d985 commit b7f4f85

3 files changed

Lines changed: 48 additions & 0 deletions


changelog_entry.yaml

Lines changed: 4 additions & 0 deletions
```diff
@@ -0,0 +1,4 @@
+- bump: minor
+  changes:
+    added:
+    - Add voluntary tax filer variable and filer count calibration targets by AGI band.
```

policyengine_us_data/datasets/cps/cps.py

Lines changed: 11 additions & 0 deletions
```diff
@@ -289,6 +289,17 @@ def add_takeup(self):
         imputed_risk = rng.random(n_persons) < wic_risk_rate_by_person
         data["is_wic_at_nutritional_risk"] = receives_wic | imputed_risk
 
+        # Voluntary tax filing: some people file even when not required and not
+        # seeking a refund. EITC take-up already captures refund-seeking behavior
+        # (if you take up EITC, you file). This variable captures people who file
+        # for other reasons: state requirements, documentation, habit.
+        # ~5% of tax units who don't take up EITC still file voluntarily.
+        voluntary_filing_rate = 0.05
+        rng = seeded_rng("would_file_taxes_voluntarily")
+        data["would_file_taxes_voluntarily"] = ~data["takes_up_eitc"] & (
+            rng.random(n_tax_units) < voluntary_filing_rate
+        )
+
         self.save_dataset(data)
 
 
```
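A self-contained sketch of the assignment in the cps.py hunk, assuming a plain list of EITC take-up flags per tax unit. It swaps NumPy's generator for Python's standard random.Random seeded with the variable name; assign_voluntary_filers is a hypothetical helper, and the repository's seeded_rng() may derive its seed differently.

```python
import random
from typing import List, Sequence


def assign_voluntary_filers(
    takes_up_eitc: Sequence[bool],
    voluntary_filing_rate: float = 0.05,
    seed_name: str = "would_file_taxes_voluntarily",
) -> List[bool]:
    """Flag tax units that file voluntarily, excluding EITC takers.

    Sketch only: random.Random stands in for the repository's seeded_rng(),
    whose implementation is not shown in this commit.
    """
    # A str seed is converted to an integer deterministically (not via the
    # per-process randomized hash()), so results repeat across runs.
    rng = random.Random(seed_name)
    # One uniform draw per tax unit, mirroring rng.random(n_tax_units):
    # every unit consumes a draw whether or not it takes up EITC.
    draws = [rng.random() for _ in takes_up_eitc]
    # EITC takers already file for a refund; only non-takers can be flagged.
    return [
        (not eitc) and u < voluntary_filing_rate
        for eitc, u in zip(takes_up_eitc, draws)
    ]


if __name__ == "__main__":
    eitc = [True, False] * 50_000
    flags = assign_voluntary_filers(eitc)
    non_takers = sum(not e for e in eitc)
    print(f"voluntary share among non-takers: {sum(flags) / non_takers:.3f}")
```

Drawing a number for every tax unit and then masking out EITC takers, rather than drawing only for non-takers, keeps each unit's draw tied to its position, so changing one unit's EITC take-up does not reshuffle the draws of the others.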

policyengine_us_data/utils/loss.py

Lines changed: 33 additions & 0 deletions
```diff
@@ -339,6 +339,39 @@ def build_loss_matrix(dataset: type, time_period):
         )
         targets_array.append(row["eitc_total"] * eitc_spending_uprating)
 
+    # Tax filer counts by AGI band (SOI Table 1.1)
+    # This calibrates total filers (not just taxable returns) including
+    # low-AGI filers who are important for income distribution accuracy
+    SOI_FILER_COUNTS_2015 = {
+        # (agi_lower, agi_upper): total_returns
+        (-np.inf, 0): 2_072_066,
+        (0, 5_000): 10_134_703,
+        (5_000, 10_000): 11_398_595,
+        (10_000, 25_000): 23_447_927,
+        (25_000, 50_000): 23_727_745,
+        (50_000, 100_000): 32_801_908,
+        (100_000, np.inf): 25_120_985,
+    }
+
+    # Get AGI and filer status at tax unit level, mapped to household
+    agi_tu = sim.calculate("adjusted_gross_income").values
+    is_filer_tu = sim.calculate("tax_unit_is_filer").values > 0
+
+    for (
+        agi_lower,
+        agi_upper,
+    ), filer_count_2015 in SOI_FILER_COUNTS_2015.items():
+        in_band = (agi_tu >= agi_lower) & (agi_tu < agi_upper)
+        label = f"nation/soi/filer_count/agi_{fmt(agi_lower)}_{fmt(agi_upper)}"
+        loss_matrix[label] = sim.map_result(
+            (is_filer_tu & in_band).astype(float),
+            "tax_unit",
+            "household",
+        )
+        # Uprate from 2015 to current year using population growth
+        uprated_target = filer_count_2015 * population_uprating
+        targets_array.append(uprated_target)
+
     # Hard-coded totals
     for variable_name, target in HARD_CODED_TOTALS.items():
         label = f"nation/census/{variable_name}"
```
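A pure-Python sketch of how each AGI band in the loss.py hunk becomes one loss-matrix column and one target. Assumptions, all hypothetical: plain lists replace the simulation arrays, household_index gives each tax unit's household position, fmt_bound stands in for loss.py's fmt() (not shown in this diff), population_uprating is passed in as a number, and sim.map_result is assumed to sum tax-unit values into households.

```python
import math
from typing import Dict, List, Sequence, Tuple

# 2015 total returns by AGI band, as listed in the diff.
SOI_FILER_COUNTS_2015 = {
    (-math.inf, 0): 2_072_066,
    (0, 5_000): 10_134_703,
    (5_000, 10_000): 11_398_595,
    (10_000, 25_000): 23_447_927,
    (25_000, 50_000): 23_727_745,
    (50_000, 100_000): 32_801_908,
    (100_000, math.inf): 25_120_985,
}


def fmt_bound(x: float) -> str:
    """Hypothetical label formatter; loss.py's own fmt() may differ."""
    if math.isinf(x):
        return "inf" if x > 0 else "-inf"
    return str(int(x))


def build_filer_count_targets(
    agi: Sequence[float],
    is_filer: Sequence[bool],
    household_index: Sequence[int],
    n_households: int,
    population_uprating: float,
) -> Dict[str, Tuple[List[float], float]]:
    """Return {label: (household column, uprated target)} for each AGI band."""
    targets = {}
    for (lower, upper), count_2015 in SOI_FILER_COUNTS_2015.items():
        column = [0.0] * n_households
        for a, filer, hh in zip(agi, is_filer, household_index):
            # Half-open band [lower, upper), matching the diff's in_band mask.
            if filer and lower <= a < upper:
                # Sum filing tax units into their household (assumed to match
                # sim.map_result(..., "tax_unit", "household")).
                column[hh] += 1.0
        label = f"nation/soi/filer_count/agi_{fmt_bound(lower)}_{fmt_bound(upper)}"
        targets[label] = (column, count_2015 * population_uprating)
    return targets


def weighted_total(column: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted filer count that calibration pushes toward the target."""
    return sum(c * w for c, w in zip(column, weights))
```

Calibration then adjusts household weights so that weighted_total(column, weights) for each label approaches its uprated target; because the bands are half-open, a tax unit with AGI exactly 0 falls in the $0-5k band rather than the negative band.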
