Generate random seeds in dataset for reproducible stochastic simulations by MaxGhenis · Pull Request #202 · PolicyEngine/policyengine-uk-data

MaxGhenis · 2025-10-05T14:14:12Z

Summary

This PR moves random number generation from policyengine-uk into the dataset generation, following the pattern established in policyengine-us-data.

⚠️ MERGE ORDER: This PR must be merged BEFORE the companion policyengine-uk PR

Changes

- Add random seed generation in FRS dataset for person, benunit, and household entities
- Update SPI dataset to use seeded generator for age assignment
- Update income imputation to use seeded generator for age assignment
- Update capital gains imputation to use seeded generator for quantile sampling
- Update childcare assumptions to use seeded generator
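A minimal sketch of what per-entity seed generation could look like. It assumes the builder knows the row count of each entity table; the helper name `generate_entity_seeds` and the 32-bit seed range are hypothetical and not taken from the FRS dataset code.

```python
import numpy as np

# Entity order is fixed so the sequence of draws is identical on every build.
ENTITIES = ("person", "benunit", "household")


def generate_entity_seeds(counts: dict[str, int], seed: int = 100) -> dict[str, np.ndarray]:
    """Return one integer seed per row for each entity (hypothetical helper).

    All seeds come from a single generator seeded with `seed`, so the same
    row counts and the same seed always yield the same seed arrays.
    """
    rng = np.random.default_rng(seed=seed)
    seeds = {}
    for entity in ENTITIES:
        # Draw seeds in [0, 2**32); the upper bound is exclusive.
        seeds[entity] = rng.integers(0, 2**32, size=counts[entity])
    return seeds
```

Because the draws are sequential, household seeds depend on the number of person and benunit rows drawn before them; changing a table's size changes the seeds of every entity drawn after it.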

All random generation now uses np.random.default_rng(seed=100) for full reproducibility across dataset builds.
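As an illustration of the seeded age assignment, the sketch below draws an integer age uniformly within each record's age band using `np.random.default_rng(seed=100)`. The function name `assign_ages` and the band boundaries are hypothetical; the SPI and imputation code may assign ages differently.

```python
import numpy as np


def assign_ages(lower: np.ndarray, upper: np.ndarray, seed: int = 100) -> np.ndarray:
    """Draw an integer age uniformly within each [lower, upper] band (hypothetical)."""
    rng = np.random.default_rng(seed=seed)
    # Generator.integers broadcasts array bounds; endpoint=True makes `upper` inclusive.
    return rng.integers(lower, upper, endpoint=True)


# Hypothetical age bands, one per record.
lower = np.array([16, 25, 35, 45, 55, 65])
upper = np.array([24, 34, 44, 54, 64, 74])

first = assign_ages(lower, upper)
second = assign_ages(lower, upper)  # a second "build" with the same seed
```

Since each call builds a fresh generator from the same seed, `first` and `second` are identical, unlike draws from the global `np.random` state, which depend on whatever consumed random numbers earlier in the process.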

This replaces the previous approach, in which random variables were calculated directly during simulation. Instead, the dataset now stores random seeds that variables in policyengine-uk can use.
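One way a simulation-side variable could consume a stored seed is to turn it into a uniform draw and compare that draw against a rate. This is a sketch under assumptions: `uniform_from_seeds` and `stochastic_take_up` are hypothetical names, and the take-up use case is only an example, not a description of the companion policyengine-uk change.

```python
import numpy as np


def uniform_from_seeds(seeds: np.ndarray) -> np.ndarray:
    """Map each stored seed to a reproducible draw in [0, 1) (hypothetical helper)."""
    # Each row gets its own generator, so its draw depends only on its own seed.
    return np.array([np.random.default_rng(int(s)).random() for s in seeds])


def stochastic_take_up(seeds: np.ndarray, rate: float) -> np.ndarray:
    """Hypothetical example: a unit takes up a benefit if its draw is below `rate`."""
    return uniform_from_seeds(seeds) < rate
```

Because each draw depends only on the row's stored seed, the outcome for a given person or household stays the same across reforms and regardless of the order in which the simulation evaluates rows. Building one generator per row is simple but slow for large datasets; a vectorised mapping would be preferable at FRS scale.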

Related PRs

policyengine-uk: [Will be created next]

Test Plan

- FRS dataset generation completes successfully
- Random seeds are generated for all three entity types
- Seeds are reproducible across builds with same seed value
- Companion policyengine-uk PR passes all tests after this is merged
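The reproducibility item in the test plan could be checked with pytest-style tests along these lines. `build_entity_seeds` is a hypothetical stand-in for the dataset build step, not a function from this repository.

```python
import numpy as np


def build_entity_seeds(n_rows: int, seed: int = 100) -> np.ndarray:
    # Hypothetical stand-in for the build step; the real seeds are
    # generated inside the FRS dataset code.
    return np.random.default_rng(seed=seed).integers(0, 2**32, size=n_rows)


def test_seeds_reproducible_across_builds():
    # Two builds with the same seed value must produce identical seeds.
    assert np.array_equal(build_entity_seeds(1_000), build_entity_seeds(1_000))


def test_seeds_change_with_seed_value():
    # A different seed value should produce a different seed array.
    assert not np.array_equal(
        build_entity_seeds(1_000, seed=100), build_entity_seeds(1_000, seed=101)
    )
```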

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>


MaxGhenis closed this Oct 5, 2025

MaxGhenis deleted the migrate-random-to-data branch October 5, 2025 14:15

MaxGhenis wants to merge 1 commit into main from migrate-random-to-data
