Commit 83cdc3f

baogorek and claude committed
Add name-based seeding, state-specific Medicaid, SSI and WIC variables
Replace shared RNG (seed=100) with per-variable name-based seeding using _stable_string_hash for order-independent reproducibility.

Add state-specific Medicaid takeup rates (53%-99%), SSI resource test pass rate, and WIC takeup/nutritional risk draw variables.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 4b70d64 commit 83cdc3f
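
The commit message above contrasts a shared generator with name-based seeding. The implementation of `seeded_rng` and `_stable_string_hash` is not shown in this part of the diff, so the following is only a minimal sketch of the idea under stated assumptions: the helper names `sketch_stable_hash` and `sketch_seeded_rng` are hypothetical, and SHA-256 is a stand-in for whatever hash the commit actually uses.

```python
import hashlib

import numpy as np


def sketch_stable_hash(name: str) -> int:
    """Map a string to a 64-bit integer that is the same in every process.

    Python's built-in hash() is salted per process for strings, so it is
    unsuitable for reproducible seeds. SHA-256 is an assumption here; the
    commit's _stable_string_hash may use a different algorithm.
    """
    digest = hashlib.sha256(name.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "little")


def sketch_seeded_rng(name: str) -> np.random.Generator:
    """Return a generator whose seed depends only on the variable name."""
    return np.random.default_rng(sketch_stable_hash(name))


# Shared generator: a variable's draws depend on what was drawn before it.
shared = np.random.default_rng(seed=100)
_ = shared.random(5)  # some other variable is drawn first
snap_after_other = shared.random(3)

shared = np.random.default_rng(seed=100)
snap_first = shared.random(3)  # same variable, but drawn first this time

# Name-based seeding: the draws are the same regardless of ordering.
snap_a = sketch_seeded_rng("takes_up_snap_if_eligible").random(3)
_ = sketch_seeded_rng("takes_up_eitc").random(5)
snap_b = sketch_seeded_rng("takes_up_snap_if_eligible").random(3)
```

With the shared generator, `snap_after_other` and `snap_first` differ because inserting or reordering a variable shifts every later draw. With a generator per name, `snap_a` and `snap_b` are identical.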

7 files changed

Lines changed: 242 additions & 92 deletions

changelog_entry.yaml

Lines changed: 8 additions & 1 deletion
@@ -1,4 +1,11 @@
 - bump: minor
   changes:
     added:
-    - Move all randomness to data package for deterministic country package. Take-up decisions for SNAP, Medicaid, ACA, EITC, DC PTC, Head Start, and Early Head Start are now generated stochastically during dataset creation using take-up rates from YAML parameter files.
+    - Name-based seeding (seeded_rng) for order-independent reproducibility
+    - State-specific Medicaid takeup rates (53%-99% range, 51 jurisdictions)
+    - SSI resource test pass rate parameter (0.4)
+    - WIC takeup and nutritional risk draw variables (float)
+    - meets_ssi_resource_test boolean generation
+    changed:
+    - Replaced shared RNG (seed=100) with per-variable name-based seeding
+    - Medicaid takeup now uses state-specific rates instead of uniform 93%
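
The changelog notes that the WIC variables are stored as float draws rather than booleans. A rough sketch of why that can be useful follows: the uniform draw is saved once, and a downstream model compares it against whatever rate applies. The category names and rates below are hypothetical placeholders, not values from policyengine-us, and `default_rng(0)` stands in for `seeded_rng("wic_takeup_draw")`.

```python
import numpy as np

# Hypothetical category-specific take-up rates, for illustration only.
HYPOTHETICAL_WIC_TAKEUP = {"INFANT": 0.9, "CHILD": 0.5, "NONE": 0.0}

# Stand-in for seeded_rng("wic_takeup_draw"); the seed is arbitrary.
rng = np.random.default_rng(0)
draws = rng.random(6).astype(np.float32)  # stored in the dataset

# Later, the consumer maps each person to a rate and compares.
categories = np.array(["INFANT", "CHILD", "NONE", "INFANT", "CHILD", "NONE"])
rates = np.array(
    [HYPOTHETICAL_WIC_TAKEUP[c] for c in categories], dtype=np.float32
)
would_take_up = draws < rates
```

Because the draw is fixed per person, changing a rate only flips people whose draw lies between the old and new rate, and the dataset does not need to be regenerated.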

policyengine_us_data/datasets/cps/cps.py

Lines changed: 46 additions & 23 deletions
@@ -15,7 +15,7 @@
 from microimpute.models.qrf import QRF
 import logging
 from policyengine_us_data.parameters import load_take_up_rate
-
+from policyengine_us_data.utils.randomness import seeded_rng
 
 test_lite = os.environ.get("TEST_LITE") == "true"
 print(f"TEST_LITE == {test_lite}")
@@ -205,24 +205,25 @@ def add_rent(self, cps: h5py.File, person: DataFrame, household: DataFrame):
 def add_takeup(self):
     data = self.load_dataset()
 
-    from policyengine_us import system, Microsimulation
+    from policyengine_us import Microsimulation
 
     baseline = Microsimulation(dataset=self)
 
-    # Generate all stochastic take-up decisions using take-up rates from parameter files
-    # This keeps the country package purely deterministic
-    generator = np.random.default_rng(seed=100)
+    n_persons = len(data["person_id"])
+    n_tax_units = len(data["tax_unit_id"])
+    n_spm_units = len(data["spm_unit_id"])
 
-    # Load take-up rates from parameter files
+    # Load take-up rates
     eitc_rates_by_children = load_take_up_rate("eitc", self.time_period)
     dc_ptc_rate = load_take_up_rate("dc_ptc", self.time_period)
     snap_rate = load_take_up_rate("snap", self.time_period)
     aca_rate = load_take_up_rate("aca", self.time_period)
-    medicaid_rate = load_take_up_rate("medicaid", self.time_period)
+    medicaid_rates_by_state = load_take_up_rate("medicaid", self.time_period)
     head_start_rate = load_take_up_rate("head_start", self.time_period)
     early_head_start_rate = load_take_up_rate(
         "early_head_start", self.time_period
     )
+    ssi_pass_rate = load_take_up_rate("ssi_pass_rate", self.time_period)
 
     # EITC: varies by number of children
     eitc_child_count = baseline.calculate("eitc_child_count").values
@@ -232,38 +233,60 @@ def add_takeup(self):
             for c in eitc_child_count
         ]
     )
-    data["takes_up_eitc"] = (
-        generator.random(len(data["tax_unit_id"])) < eitc_takeup_rate
-    )
+    rng = seeded_rng("takes_up_eitc")
+    data["takes_up_eitc"] = rng.random(n_tax_units) < eitc_takeup_rate
 
     # DC Property Tax Credit
-    data["takes_up_dc_ptc"] = (
-        generator.random(len(data["tax_unit_id"])) < dc_ptc_rate
-    )
+    rng = seeded_rng("takes_up_dc_ptc")
+    data["takes_up_dc_ptc"] = rng.random(n_tax_units) < dc_ptc_rate
 
     # SNAP
-    data["takes_up_snap_if_eligible"] = (
-        generator.random(len(data["spm_unit_id"])) < snap_rate
-    )
+    rng = seeded_rng("takes_up_snap_if_eligible")
+    data["takes_up_snap_if_eligible"] = rng.random(n_spm_units) < snap_rate
 
     # ACA
-    data["takes_up_aca_if_eligible"] = (
-        generator.random(len(data["tax_unit_id"])) < aca_rate
-    )
+    rng = seeded_rng("takes_up_aca_if_eligible")
+    data["takes_up_aca_if_eligible"] = rng.random(n_tax_units) < aca_rate
 
-    # Medicaid
+    # Medicaid: state-specific rates
+    state_codes = baseline.calculate("state_code_str").values
+    hh_ids = data["household_id"]
+    person_hh_ids = data["person_household_id"]
+    hh_to_state = dict(zip(hh_ids, state_codes))
+    person_states = np.array(
+        [hh_to_state.get(hh_id, "CA") for hh_id in person_hh_ids]
+    )
+    medicaid_rate_by_person = np.array(
+        [medicaid_rates_by_state.get(s, 0.93) for s in person_states]
+    )
+    rng = seeded_rng("takes_up_medicaid_if_eligible")
     data["takes_up_medicaid_if_eligible"] = (
-        generator.random(len(data["person_id"])) < medicaid_rate
+        rng.random(n_persons) < medicaid_rate_by_person
     )
 
     # Head Start
+    rng = seeded_rng("takes_up_head_start_if_eligible")
     data["takes_up_head_start_if_eligible"] = (
-        generator.random(len(data["person_id"])) < head_start_rate
+        rng.random(n_persons) < head_start_rate
     )
 
     # Early Head Start
+    rng = seeded_rng("takes_up_early_head_start_if_eligible")
     data["takes_up_early_head_start_if_eligible"] = (
-        generator.random(len(data["person_id"])) < early_head_start_rate
+        rng.random(n_persons) < early_head_start_rate
+    )
+
+    # SSI resource test
+    rng = seeded_rng("meets_ssi_resource_test")
+    data["meets_ssi_resource_test"] = rng.random(n_persons) < ssi_pass_rate
+
+    # WIC draws (country package compares against category-specific rates)
+    rng = seeded_rng("wic_takeup_draw")
+    data["wic_takeup_draw"] = rng.random(n_persons).astype(np.float32)
+
+    rng = seeded_rng("wic_nutritional_risk_draw")
+    data["wic_nutritional_risk_draw"] = rng.random(n_persons).astype(
+        np.float32
     )
 
     self.save_dataset(data)
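
The Medicaid block in this diff maps each person to a household, each household to a state, and each state to a take-up rate before drawing. As an illustration only, the same steps can be run on toy inputs. The arrays and the three-state rate table below are made up, and `default_rng(0)` stands in for `seeded_rng("takes_up_medicaid_if_eligible")`; the 0.93 fallback mirrors the one in the diff.

```python
import numpy as np

# Toy inputs; names mirror the diff, values are invented for illustration.
rates_by_state = {"CA": 0.78, "UT": 0.53, "CO": 0.99}
household_id = np.array([10, 20, 30])
household_state = np.array(["CA", "UT", "CO"])
person_household_id = np.array([10, 10, 20, 30, 30, 30])

# Household -> state, then person -> state via the person's household.
hh_to_state = dict(zip(household_id.tolist(), household_state.tolist()))
person_states = [hh_to_state[h] for h in person_household_id.tolist()]

# State -> rate, falling back to 0.93 for states missing from the table.
rate_by_person = np.array([rates_by_state.get(s, 0.93) for s in person_states])

# One uniform draw per person, compared elementwise with that person's rate.
rng = np.random.default_rng(0)  # stand-in for the name-seeded generator
takes_up = rng.random(len(person_household_id)) < rate_by_person
```

The comparison broadcasts elementwise, so each person is tested against their own state's rate rather than a single national figure.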

policyengine_us_data/parameters/__init__.py

Lines changed: 10 additions & 8 deletions
@@ -11,36 +11,38 @@
 PARAMETERS_DIR = Path(__file__).parent
 
 
-def load_take_up_rate(variable_name: str, year: int = 2018) -> float:
+def load_take_up_rate(variable_name: str, year: int = 2018):
     """Load take-up rate from YAML parameter files.
 
     Args:
         variable_name: Name of the take-up parameter file (without .yaml)
         year: Year for which to get the rate
 
     Returns:
-        Take-up rate as a float between 0 and 1
+        float, dict (EITC rates_by_children), or dict (Medicaid
+        rates_by_state)
     """
     yaml_path = PARAMETERS_DIR / "take_up" / f"{variable_name}.yaml"
 
     with open(yaml_path) as f:
         data = yaml.safe_load(f)
 
-    # Handle EITC special case (has rates_by_children instead of values)
+    # EITC: rates by number of children
     if "rates_by_children" in data:
-        return data["rates_by_children"]  # Return the dict
+        return data["rates_by_children"]
 
-    # Find the applicable value for the year
+    # Medicaid: state-specific rates
+    if "rates_by_state" in data:
+        return data["rates_by_state"]
+
+    # Standard time-series values
     values = data["values"]
     applicable_value = None
 
     for date_key, value in sorted(values.items()):
-        # Handle both string and datetime.date objects from YAML
         if hasattr(date_key, "year"):
-            # It's a datetime.date object
             date_year = date_key.year
         else:
-            # It's a string
             date_year = int(date_key.split("-")[0])
 
         if date_year <= year:
policyengine_us_data/parameters/take_up/medicaid.yaml

Lines changed: 57 additions & 3 deletions
@@ -3,8 +3,62 @@ metadata:
   label: Medicaid takeup rate
   unit: /1
   period: year
+  breakdown:
+    - state_code
   reference:
     - title: KFF "A Closer Look at the Remaining Uninsured Population Eligible for Medicaid and CHIP"
-      href: https://www.kff.org/uninsured/issue-brief/a-closer-look-at-the-remaining-uninsured-population-eligible-for-medicaid-and-chip/#:~:text=the%20uninsured%20rate%20dropped%20to,States%20began%20the
-values:
-  2018-01-01: 0.93
+      href: https://www.kff.org/uninsured/issue-brief/a-closer-look-at-the-remaining-uninsured-population-eligible-for-medicaid-and-chip/
+    - title: State-specific rates derived from MACPAC enrollment targets vs modeled eligibility
+      href: https://www.medicaid.gov/medicaid/program-information/medicaid-and-chip-enrollment-data/report-highlights/index.html
+rates_by_state:
+  AK: 0.88
+  AL: 0.92
+  AR: 0.79
+  AZ: 0.95
+  CA: 0.78
+  CO: 0.99
+  CT: 0.89
+  DC: 0.99
+  DE: 0.86
+  FL: 0.98
+  GA: 0.73
+  HI: 0.88
+  IA: 0.84
+  ID: 0.78
+  IL: 0.85
+  IN: 0.99
+  KS: 0.92
+  KY: 0.87
+  LA: 0.79
+  MA: 0.94
+  MD: 0.95
+  ME: 0.92
+  MI: 0.91
+  MN: 0.89
+  MO: 0.89
+  MS: 0.75
+  MT: 0.83
+  NC: 0.94
+  ND: 0.91
+  NE: 0.79
+  NH: 0.84
+  NJ: 0.74
+  NM: 0.84
+  NV: 0.93
+  NY: 0.86
+  OH: 0.82
+  OK: 0.77
+  OR: 0.92
+  PA: 0.64
+  RI: 0.94
+  SC: 0.93
+  SD: 0.88
+  TN: 0.92
+  TX: 0.76
+  UT: 0.53
+  VA: 0.82
+  VT: 0.93
+  WA: 0.98
+  WI: 0.91
+  WV: 0.83
+  WY: 0.70

Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
+description: Proportion of SSI-aged-blind-disabled recipients who meet the asset test.
+metadata:
+  label: SSI resource test pass rate
+  unit: /1
+  period: year
+  reference:
+    - title: SSI resource test pass rate from policyengine-us
+      href: https://github.com/PolicyEngine/policyengine-us
+values:
+  2018-01-01: 0.4
