Generate random seeds in dataset for reproducible stochastic simulations#202

Closed
MaxGhenis wants to merge 1 commit into main from migrate-random-to-data

Summary

This PR moves random number generation from policyengine-uk into the dataset generation, following the pattern established in policyengine-us-data.

⚠️ MERGE ORDER: This PR must be merged BEFORE the companion policyengine-uk PR

Changes

  • Add random seed generation in FRS dataset for person, benunit, and household entities
  • Update SPI dataset to use seeded generator for age assignment
  • Update income imputation to use seeded generator for age assignment
  • Update capital gains imputation to use seeded generator for quantile sampling
  • Update childcare assumptions to use seeded generator

All random generation now uses np.random.default_rng(seed=100) for full reproducibility across dataset builds.
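As a minimal sketch of what a seeded generator looks like for the age-assignment case, the snippet below draws an integer age inside a per-row band. The helper name `assign_ages` and the band arrays are hypothetical and only illustrate the `np.random.default_rng(seed=100)` pattern; they are not taken from the actual dataset code.

```python
import numpy as np


def assign_ages(lower, upper, seed=100):
    """Hypothetical helper: draw one integer age in [lower, upper) per row."""
    # A fresh Generator with a fixed seed yields the same draws on every build.
    rng = np.random.default_rng(seed=seed)
    lower = np.asarray(lower)
    upper = np.asarray(upper)
    # Generator.integers broadcasts array bounds; the upper bound is exclusive.
    return rng.integers(lower, upper)


# Illustrative age bands, not real SPI values.
ages = assign_ages([16, 25, 65], [25, 35, 80])
```

Because the generator is created from a fixed seed rather than drawn from NumPy's global state, the result depends only on the seed and the order of draws, not on what other code ran earlier in the build.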

Previously, policyengine-uk calculated random variables directly during simulation. With this change, the dataset stores random seeds that variables in policyengine-uk can use instead.
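The sketch below illustrates the general idea of generating per-entity values at dataset build time and consuming them deterministically later. It assumes the stored values are uniform draws in [0, 1); the function names, dictionary keys, and the take-up rule are hypothetical and do not come from policyengine-uk or this repository.

```python
import numpy as np


def generate_entity_seeds(n_person, n_benunit, n_household, seed=100):
    """Hypothetical build-time step: one uniform draw per entity row."""
    rng = np.random.default_rng(seed=seed)
    # Draw order matters: all three arrays come from the same generator.
    return {
        "person": rng.random(n_person),
        "benunit": rng.random(n_benunit),
        "household": rng.random(n_household),
    }


def takes_up(stored_draws, take_up_rate):
    """Hypothetical simulation-time rule: no RNG call, only a comparison."""
    return stored_draws < take_up_rate


seeds = generate_entity_seeds(n_person=5, n_benunit=3, n_household=2)
decisions = takes_up(seeds["benunit"], take_up_rate=0.6)
```

With this split, the simulation never calls a random number generator itself, so two runs on the same dataset produce the same stochastic outcomes.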

Related PRs

  • policyengine-uk: [Will be created next]

Test Plan

  • FRS dataset generation completes successfully
  • Random seeds are generated for all three entity types
  • Seeds are reproducible across builds with same seed value
  • Companion policyengine-uk PR passes all tests after this is merged
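The reproducibility item in this plan could be checked along the following lines. This is a sketch only: `build_seed_column` is a hypothetical stand-in for the real dataset build, which is much larger.

```python
import numpy as np


def build_seed_column(n_rows, seed):
    """Hypothetical stand-in for one seeded column of a dataset build."""
    return np.random.default_rng(seed=seed).random(n_rows)


def check_reproducible(n_rows=1000, seed=100):
    """Return True if equal seeds match and a different seed diverges."""
    first = build_seed_column(n_rows, seed)
    second = build_seed_column(n_rows, seed)
    # Raises AssertionError if two builds with the same seed differ.
    np.testing.assert_array_equal(first, second)
    other = build_seed_column(n_rows, seed + 1)
    return not np.array_equal(first, other)


result = check_reproducible()
```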

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>

MaxGhenis closed this on Oct 5, 2025
MaxGhenis deleted the migrate-random-to-data branch on October 5, 2025 at 14:15