Make country package purely deterministic - read stochastic variables from dataset #6635
Closed
MaxGhenis wants to merge 8 commits into
… from dataset

This change removes all random number generation from policyengine-us. All stochastic take-up variables are now generated in policyengine-us-data and read from the dataset. The country package is now a purely deterministic rules engine.

## Key Changes

### Removed
- All take-up seed variables (snap_take_up_seed, aca_take_up_seed, medicaid_take_up_seed)
- All take-up rate parameters (moved to policyengine-us-data)

### Simplified
All takes_up_* variables now use dataset values with deterministic fallbacks:
- takes_up_snap_if_eligible (default: True)
- takes_up_aca_if_eligible (default: True)
- takes_up_medicaid_if_eligible (default: True)

## Trade-offs
**IMPORTANT**: Take-up rates can no longer be adjusted dynamically via policy reforms or in the web app. They are fixed in the microdata. This is an acceptable trade-off for the cleaner architecture of keeping the country package purely deterministic.

To adjust take-up rates for analysis, the microdata must be regenerated with updated parameter values in policyengine-us-data.

Related: policyengine-us-data PR (must be merged FIRST)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
I think the Mass branch got merged into this PR @MaxGhenis
- Create takes_up_head_start_if_eligible and takes_up_early_head_start_if_eligible
- Update head_start and early_head_start to use takeup in microsimulation
- Add unit=USD and simplify labels to match conventions
- Takeup is generated stochastically in dataset, defaults to True in policy calculator
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #6635 +/- ##
============================================
- Coverage 100.00% 71.42% -28.58%
============================================
Files 16 7 -9
Lines 229 84 -145
Branches 0 2 +2
============================================
- Hits 229 60 -169
- Misses 0 24 +24
Changed np.any(programs) to programs > 0 to preserve array structure. The np.any() call was collapsing the entire array into a single boolean, causing all people to be categorically eligible if ANY tax unit qualified. This manifested when using axes - eligibility showed True at all income levels even when income_eligible was correctly False at high incomes. Fixes the issue where Early Head Start benefits were incorrectly given to high-income households (e.g., $200k) in vectorized calculations.
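The difference between the two expressions can be reproduced with plain NumPy. This is a minimal sketch, not the actual formula from the repository: the array values are made up, and combining categorical and income eligibility with `|` is an assumption about how the result is used downstream.

```python
import numpy as np

# Hypothetical per-person count of qualifying programs (illustrative values).
programs = np.array([2, 0, 1, 0])

# Old behaviour: np.any reduces the whole array to one scalar boolean.
collapsed = np.any(programs)

# New behaviour: an elementwise comparison keeps one value per person.
elementwise = programs > 0

# Hypothetical per-person income eligibility, False at high incomes.
income_eligible = np.array([True, True, False, False])

# Assumed combination rule: eligible if income OR categorically eligible.
# The scalar broadcasts over every person, so everyone becomes eligible.
buggy_eligible = income_eligible | collapsed
fixed_eligible = income_eligible | elementwise

print(buggy_eligible)
print(fixed_eligible)
```

With the scalar, the last person (not income eligible and with no qualifying program) is still marked eligible; with the elementwise comparison they are not.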
The vectorization fix is now in its own PR (PolicyEngine#6804) to keep the takeup migration PR focused on moving randomness to the data package.
These tests covered the old formula-based takeup using seed variables. In the new design, takeup is generated in the dataset (policyengine-us-data) and the variables have no formula (just default_value = True).

Removed:
- takes_up_snap_if_eligible.yaml
- takes_up_medicaid_if_eligible.yaml
- takes_up_aca_if_eligible.yaml

The stochastic behavior is now tested in the data package, not the rules engine.
baogorek added a commit that referenced this pull request Feb 5, 2026
…try package

Remove all random() calls and seed variables from the country package. Takeup variables (ACA, SNAP, Medicaid) are now formula-less with default True. WIC uses draw variables instead of random(). SSI resource test uses only policy logic. Add state-specific Medicaid rates, Section 1931 deprivation rules, Head Start/Early Head Start takeup variables.

Supersedes #6635, #7317. Fixes #7316.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Superseded by #7326, which includes all changes from this PR plus state-specific Medicaid rates, Section 1931 deprivation rules, WIC draw variables, SSI resource test, Head Start takeup, and name-based seeding. Companion data PR: PolicyEngine/policyengine-us-data#451.
Summary
This PR removes all random number generation from policyengine-us. All stochastic take-up variables are now generated in policyengine-us-data and read from the dataset. The country package is now a purely deterministic rules engine.
Changes
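Removed
- All take-up seed variables (snap_take_up_seed, aca_take_up_seed, medicaid_take_up_seed)
- All take-up rate parameters (moved to policyengine-us-data)
Simplified
All takes_up_* variables now use dataset values with deterministic fallbacks:
- takes_up_snap_if_eligible (default: True)
- takes_up_aca_if_eligible (default: True)
- takes_up_medicaid_if_eligible (default: True)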
These variables have no formula: when a value is present in the dataset, OpenFisca uses it. In the policy calculator (non-microsimulation), they default to True (full take-up assumption).
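The fallback behaviour described here can be illustrated without the framework. The sketch below is not the OpenFisca or policyengine-us API: `resolve_takeup` is a hypothetical helper and a plain dict stands in for the dataset. It only shows the rule "use the dataset value if there is one, otherwise assume full take-up".

```python
import numpy as np


def resolve_takeup(dataset: dict, name: str, n_units: int) -> np.ndarray:
    """Hypothetical stand-in for a formula-less variable lookup.

    Returns the stored values when the dataset provides them,
    otherwise an all-True array (the default_value = True case).
    """
    if name in dataset:
        return np.asarray(dataset[name], dtype=bool)
    return np.ones(n_units, dtype=bool)


# Microsimulation: the dataset carries pre-generated take-up values.
microdata = {"takes_up_snap_if_eligible": [True, False, True]}
from_dataset = resolve_takeup(microdata, "takes_up_snap_if_eligible", 3)

# Policy calculator: no dataset value, so every unit takes up.
calculator_default = resolve_takeup({}, "takes_up_snap_if_eligible", 3)

print(from_dataset)
print(calculator_default)
```

Either way the rules engine performs no random draw; the only source of variation is what the dataset already contains.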
Trade-offs
IMPORTANT: Take-up rates can no longer be adjusted dynamically via policy reforms or in the web app. They are fixed in the microdata. This is an acceptable trade-off for the cleaner architecture of keeping the country package purely deterministic.
To adjust take-up rates for analysis, the microdata must be regenerated with updated parameter values in policyengine-us-data.
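To make the regeneration step concrete, here is a rough sketch of how a data package could turn a take-up rate into stored boolean values. It is an assumption for illustration only, not the code in policyengine-us-data: the function name, the rates, and the integer seed are all hypothetical.

```python
import numpy as np


def generate_takeup(rate: float, n_units: int, seed: int) -> np.ndarray:
    """Hypothetical generator: one seeded uniform draw per unit,
    compared against the take-up rate."""
    rng = np.random.default_rng(seed)
    draws = rng.random(n_units)
    return draws < rate


# Same seed and rate: the stored column is reproducible.
baseline = generate_takeup(rate=0.5, n_units=1000, seed=0)
rerun = generate_takeup(rate=0.5, n_units=1000, seed=0)

# A different rate requires regenerating the column. With the same
# seed the draws are identical, so a higher rate can only add units.
higher_rate = generate_takeup(rate=0.9, n_units=1000, seed=0)

print(np.array_equal(baseline, rerun))
print(baseline.mean(), higher_rate.mean())
```

Because the rules engine only reads the resulting column, changing `rate` has no effect until the column is regenerated and the dataset is rebuilt.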
Test Plan
Related PRs
🤖 Generated with Claude Code
Co-Authored-By: Claude noreply@anthropic.com