Commit 261d78a

Add TANF administrative calibration targets (#750)
* Add TANF administrative calibration targets
* Tighten TANF target semantics
* Document upstream-only PR workflow
* Format TANF calibration files
1 parent 9ab9c70 commit 261d78a

16 files changed

Lines changed: 528 additions & 22 deletions

Makefile

Lines changed: 1 addition & 0 deletions
@@ -82,6 +82,7 @@ database:
 	python policyengine_us_data/db/etl_age.py --year $(YEAR)
 	python policyengine_us_data/db/etl_medicaid.py --year $(YEAR)
 	python policyengine_us_data/db/etl_snap.py --year $(YEAR)
+	python policyengine_us_data/db/etl_tanf.py --year $(YEAR)
 	python policyengine_us_data/db/etl_state_income_tax.py --year $(YEAR)
 	python policyengine_us_data/db/etl_irs_soi.py --year $(YEAR)
 	python policyengine_us_data/db/etl_pregnancy.py --year $(YEAR)

README.md

Lines changed: 15 additions & 0 deletions
@@ -14,6 +14,21 @@ which installs the development dependencies in a reference-only manner (so that
 to the package code will be reflected immediately); `policyengine-us-data` is a dev package
 and not intended for direct access.

+## Pull Requests
+
+PRs must come from branches pushed to `PolicyEngine/policyengine-us-data`, not from
+personal forks. The PR workflow hard-fails fork-based PRs before the real test suite
+runs because the required secrets are unavailable there.
+
+Before opening a PR, push the current branch to the upstream repo:
+
+```bash
+make push-pr-branch
+```
+
+That target pushes the current branch to the `upstream` remote and sets tracking so
+`gh pr create` opens the PR from `PolicyEngine/policyengine-us-data`.
+
 ## SSA Data Sources

 The following SSA data sources are used in this project:
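
The README addition above says `make push-pr-branch` pushes the current branch to the `upstream` remote and sets tracking. The Makefile recipe for that target is not shown in this commit view, so the following is only a sketch of equivalent behaviour in Python, assuming it amounts to `git push --set-upstream`. The function names are hypothetical and are not part of the repository.

```python
import subprocess


def build_push_command(branch: str, remote: str = "upstream") -> list[str]:
    """Return the git command that pushes `branch` to `remote` and sets tracking."""
    # `git rev-parse --abbrev-ref HEAD` prints "HEAD" on a detached checkout,
    # which is not a branch name that can be pushed for a PR.
    if not branch or branch == "HEAD":
        raise ValueError("not on a named branch (detached HEAD?)")
    return ["git", "push", "--set-upstream", remote, branch]


def current_branch() -> str:
    """Ask git for the name of the currently checked-out branch."""
    result = subprocess.run(
        ["git", "rev-parse", "--abbrev-ref", "HEAD"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()


def push_pr_branch() -> None:
    """Hypothetical stand-in for the Makefile target (assumed behaviour)."""
    subprocess.run(build_push_command(current_branch()), check=True)
```

Building the command separately from running it keeps the assumed behaviour easy to inspect without touching a real remote.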
Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+Added HHS ACF TANF caseload and cash-assistance ETL targets, exposed baseline CPS liquid-asset inputs, and aligned TANF calibration totals to FY2024 administrative data.

policyengine_us_data/calibration/target_config.yaml

Lines changed: 8 additions & 0 deletions
@@ -51,8 +51,13 @@ include:
   # REMOVED: is_pregnant — 100% unachievable across all 51 state geos
   - variable: snap
     geo_level: state
+  - variable: tanf
+    geo_level: state
   - variable: adjusted_gross_income
     geo_level: state
+  - variable: spm_unit_count
+    geo_level: state
+    domain_variable: tanf

   # === STATE — fine AGI bracket targets (stubs 9/10 from in55cmcsv) ===
   - variable: person_count
@@ -127,6 +132,9 @@ include:
     geo_level: national
   - variable: tanf
     geo_level: national
+  - variable: spm_unit_count
+    geo_level: national
+    domain_variable: tanf
   - variable: tip_income
     geo_level: national
   - variable: unemployment_compensation
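
The new entries pair `spm_unit_count` with `domain_variable: tanf` alongside the plain `tanf` target. One plausible reading, which this diff does not confirm, is that the plain target is weighted TANF spending while the domain-restricted count is the weighted number of SPM units with positive TANF, to be compared with the HHS ACF caseload. A minimal NumPy sketch under that assumption follows; the function name and toy data are hypothetical.

```python
import numpy as np


def domain_totals(tanf, weights):
    """Return (weighted TANF spending, weighted count of units in the TANF domain).

    Assumption: a unit is "in the domain" when its TANF amount is positive.
    """
    tanf = np.asarray(tanf, dtype=float)
    weights = np.asarray(weights, dtype=float)
    in_domain = tanf > 0
    spending = float(np.sum(tanf * weights))
    caseload = float(np.sum(weights[in_domain]))
    return spending, caseload


# Toy data: four SPM units, two of which receive TANF.
tanf = [0.0, 1200.0, 0.0, 3000.0]
weights = [100.0, 50.0, 200.0, 10.0]
spending, caseload = domain_totals(tanf, weights)
print(spending, caseload)
```

Targeting both quantities constrains the average benefit per recipient unit as well as the total, which a spending-only target would leave free.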

policyengine_us_data/datasets/cps/cps.py

Lines changed: 1 addition & 8 deletions
@@ -1534,8 +1534,6 @@ def select_random_subset_to_target(
     }
 )

-final_counts = pd.Series(ssn_card_type).value_counts().sort_index()
-
 # ============================================================================
 # PROBABILISTIC FAMILY CORRELATION ADJUSTMENT
 # ============================================================================
@@ -1559,8 +1557,6 @@ def select_random_subset_to_target(
 )
 print(f"Additional undocumented needed: {undocumented_needed:,.0f}")

-families_adjusted = 0
-
 if undocumented_needed > 0:
     # Identify households with mixed status (code 0 + code 3 members)
     mixed_household_candidates = []
@@ -1584,7 +1580,6 @@ def select_random_subset_to_target(
 # Randomly select from eligible code 3 members in mixed households to hit target
 if len(mixed_household_candidates) > 0:
     mixed_household_candidates = np.array(mixed_household_candidates)
-    candidate_weights = person_weights[mixed_household_candidates]

     # Use probabilistic selection to hit target
     selected_indices = select_random_subset_to_target(
@@ -1596,7 +1591,6 @@ def select_random_subset_to_target(

 if len(selected_indices) > 0:
     ssn_card_type[selected_indices] = 0
-    families_adjusted = len(selected_indices)
     print(
         f"Selected {len(selected_indices)} people from {len(mixed_household_candidates)} candidates in mixed households"
     )
@@ -1735,7 +1729,7 @@ def get_arrival_year_midpoint(peinusyr):
 # Save as immigration_status_str since that's what PolicyEngine expects
 cps["immigration_status_str"] = immigration_status.astype("S")
 # Final population summary
-print(f"\nFinal populations:")
+print("\nFinal populations:")
 code_to_str = {
     0: "NONE", # Likely undocumented immigrants
     1: "CITIZEN", # US citizens
@@ -1952,7 +1946,6 @@ def add_tips(self, cps: h5py.File):
 # is_married is person-level here but policyengine-us defines it at Family
 # level, so we must not save it
 cps = cps.drop(columns=["is_married", "is_under_18", "is_under_6"], errors="ignore")
-
 self.save_dataset(cps)


policyengine_us_data/db/DATABASE_GUIDE.md

Lines changed: 5 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -29,10 +29,11 @@ make database-refresh # Force re-download all sources and rebuild
2929
| 4 | `etl_age.py` | Census ACS 1-year | Age distribution: 18 bins x 488 geographies |
3030
| 5 | `etl_medicaid.py` | Census ACS + CMS | Medicaid enrollment (admin state-level, survey district-level) |
3131
| 6 | `etl_snap.py` | USDA FNS + Census ACS | SNAP participation (admin state-level, survey district-level) |
32-
| 7 | `etl_state_income_tax.py` | Census STC | State income tax collections (Census STC FY2023 `T40`, downloaded and cached) |
33-
| 8 | `etl_irs_soi.py` | IRS | Tax variables, EITC by child count, AGI brackets, conditional strata |
34-
| 9 | `etl_pregnancy.py` | CDC VSRR + Census ACS | Pregnancy prevalence by state (provisional birth counts) |
35-
| 10 | `validate_database.py` | No | Checks all target variables exist in policyengine-us |
32+
| 7 | `etl_tanf.py` | HHS ACF | TANF caseload families and cash-assistance spending (FY2024) |
33+
| 8 | `etl_state_income_tax.py` | Census STC | State income tax collections (Census STC FY2023 `T40`, downloaded and cached) |
34+
| 9 | `etl_irs_soi.py` | IRS | Tax variables, EITC by child count, AGI brackets, conditional strata |
35+
| 10 | `etl_pregnancy.py` | CDC VSRR + Census ACS | Pregnancy prevalence by state (provisional birth counts) |
36+
| 11 | `validate_database.py` | No | Checks all target variables exist in policyengine-us |
3637

3738
### Raw Input Caching
3839

policyengine_us_data/db/create_field_valid_values.py

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -75,6 +75,8 @@ def populate_field_valid_values(session: Session) -> None:
7575
("source", "Census ACS S2201", "survey"),
7676
("source", "Census STC", "administrative"),
7777
("source", "CDC VSRR Natality", "administrative"),
78+
("source", "HHS ACF TANF Caseload", "administrative"),
79+
("source", "HHS ACF TANF Financial", "administrative"),
7880
("source", "PolicyEngine", "hardcoded"),
7981
]
8082

policyengine_us_data/db/etl_national_targets.py

Lines changed: 0 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -203,13 +203,6 @@ def extract_national_targets(year: int = DEFAULT_YEAR):
203203
"notes": "Housing subsidies",
204204
"year": HARDCODED_YEAR,
205205
},
206-
{
207-
"variable": "tanf",
208-
"value": 9e9,
209-
"source": "HHS/ACF",
210-
"notes": "TANF cash assistance",
211-
"year": HARDCODED_YEAR,
212-
},
213206
{
214207
"variable": "real_estate_taxes",
215208
"value": 500e9,
