Remove birth_year from FRS dataset generation #198
Merged
birth_year should be calculated from age and period in the model, not stored as static data in the dataset. This allows birth_year to update correctly in multi-year projections.

With static birth_year in the dataset:
- 2026: birth_year stays 2006-2023 (based on the 2023 survey)
- 2029: birth_year stays 2006-2023 (incorrect)

By calculating birth_year = period.year - age:
- 2026: birth_year becomes 2009-2026 (correct for 2026)
- 2029: birth_year becomes 2012-2029 (correct for 2029)

This fix is required for PolicyEngine/policyengine-uk#1352 to work correctly and to ensure two-child limit cost projections increase over time as expected.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
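The birth-year ranges above can be checked with a short sketch. It assumes, as the examples do, children aged 0-17 in the 2023 survey and an age distribution that stays constant across projection years. `birth_year_range` is a hypothetical helper for illustration, not part of policyengine-uk or policyengine-uk-data.

```python
SURVEY_YEAR = 2023          # FRS survey year used in the examples above
CHILD_AGES = range(0, 18)   # assumed child ages 0-17, held constant over time


def birth_year_range(reference_year: int) -> tuple:
    """Return (earliest, latest) birth year for children aged 0-17 in reference_year."""
    years = [reference_year - age for age in CHILD_AGES]
    return min(years), max(years)


for projection_year in (2026, 2029):
    static = birth_year_range(SURVEY_YEAR)       # stored once in the dataset, never updated
    dynamic = birth_year_range(projection_year)  # recomputed as period.year - age
    print(projection_year, "static:", static, "dynamic:", dynamic)
```

The static column is the same for every projection year, which is exactly the bug described above.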
Force-pushed from 6be231e to e336692.
Summary
Removes `birth_year` from the FRS dataset generation. This allows `birth_year` to be calculated dynamically from `age` and `period` in the model, fixing the two-child limit cost projection bug.

Problem
Currently, `birth_year` is stored as static data in the dataset:

This causes issues in multi-year projections. The model loads `birth_year` from the dataset as input data, which overrides the Variable formula. With a consistent age distribution:

Solution
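The fix relies on the precedence rule described under Problem: a variable supplied as input data is returned as-is, so its formula never runs. The toy sketch below illustrates that rule with a hypothetical `ToySimulation` class (not PolicyEngine's actual simulation API) and the formula `birth_year = period.year - age`; ages are held constant across years, matching the "consistent age distribution" assumption.

```python
from typing import Dict, List

Values = List[int]


class ToySimulation:
    """Hypothetical stand-in for a microsimulation engine (NOT PolicyEngine's API).

    Stored inputs take precedence over formulas; a formula runs only when
    no input exists for the requested variable.
    """

    def __init__(self, inputs: Dict[str, Values]):
        self.inputs = inputs
        # Assumed formula, mirroring birth_year = period.year - age.
        self.formulas = {
            "birth_year": lambda sim, year: [
                year - age for age in sim.calculate("age", year)
            ],
        }

    def calculate(self, variable: str, year: int) -> Values:
        if variable in self.inputs:
            # Input data overrides the formula and is returned unchanged for every year.
            return self.inputs[variable]
        return self.formulas[variable](self, year)


ages = [0, 10, 17]  # age distribution held constant across years

# Old dataset: birth_year stored as input, computed once from the 2023 survey.
old = ToySimulation({"age": ages, "birth_year": [2023 - a for a in ages]})
# New dataset: birth_year removed, so the formula runs for each year.
new = ToySimulation({"age": ages})

for year in (2026, 2029):
    print(year, "old:", old.calculate("birth_year", year), "new:", new.calculate("birth_year", year))
```

With the input present, `birth_year` is frozen at `[2023, 2013, 2006]`; once it is removed, the same call yields `[2026, 2016, 2009]` for 2026 and `[2029, 2019, 2012]` for 2029.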
Remove the line that generates `birth_year` in the dataset. The model already has a Variable formula to calculate it:

When `birth_year` is not present in the input data, this formula runs automatically for each year:

Impact
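One way to see why the projected cost should grow: the two-child limit applies to children born on or after 6 April 2017, so as older cohorts age out, a larger share of children falls under it each year. The sketch below is illustrative only. It simplifies the cutoff to `birth_year >= 2017`, assumes a uniform, constant distribution of child ages 0-17, and uses a hypothetical helper `share_born_after_cutoff`; it does not reproduce the model's eligibility rules or cost figures.

```python
CUTOFF_BIRTH_YEAR = 2017   # simplification of the 6 April 2017 cutoff
SURVEY_YEAR = 2023
AGES = range(0, 18)        # assumed uniform, constant child age distribution


def share_born_after_cutoff(reference_year: int) -> float:
    """Fraction of children whose birth_year (reference_year - age) is >= the cutoff."""
    birth_years = [reference_year - age for age in AGES]
    affected = sum(1 for by in birth_years if by >= CUTOFF_BIRTH_YEAR)
    return affected / len(birth_years)


for year in (2026, 2029):
    before = share_born_after_cutoff(SURVEY_YEAR)  # static birth_year: frozen at survey year
    after = share_born_after_cutoff(year)          # birth_year recomputed for each year
    print(f"{year}: before {before:.2f}, after {after:.2f}")
```

Under these assumptions the affected share stays at 7/18 with static birth years, but rises to 10/18 in 2026 and 13/18 in 2029 when `birth_year` is recomputed.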
Before (with static birth_year):
After (calculated dynamically):
This is the complete fix: no changes are needed in policyengine-uk, since the Variable formula already handles the calculation when input data is absent.
Testing
The fix has been verified with microsimulations showing the expected increase in two-child limit costs over time. Dataset regeneration is currently blocked by an unrelated bug in consumption imputation (documented in /tmp/policyengine-uk-data-consumption-bug.md).
🤖 Generated with Claude Code