Add microsimulation smoke test against unpinned latest data #1617
Merged
Catches silent model/data skew at the point the enhanced FRS dataset is republished on HuggingFace, not after a release.

Exercises whatever is currently on HF `main` (unlike `conftest.py`, which pins to an older version) and asserts plausibility bounds on:

- UK weighted population and household/benunit counts
- `is_parent` weighted population (>10M), which catches the defaulting-to-zero failure introduced by removing the inferred formula in #1595
- Universal credit aggregate in the £55-£95bn range around the OBR target
- state pension / child benefit / pension credit floors
- `extended_childcare_entitlement_eligible` reaching >500k benefit units

Verified against the 1.45.8 stale dataset: correctly fails 3 of 5 tests (is_parent=0, UC=£51.7bn, childcare=0), passes the other two.

Marked `microsimulation` so it only runs in CI when HUGGING_FACE_TOKEN is set.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Catches silent model/data skew at the point the enhanced FRS dataset is republished on HuggingFace, not after a release. Exercises whatever is currently on HF `main` (unlike `conftest.py`, which pins to an older version) and asserts plausibility bounds on:

- UK weighted population and household/benunit counts
- `is_parent` weighted population (>10M), which catches the defaulting-to-zero failure introduced by removing the inferred formula in #1595 (Remove inferred is_parent formula)
- Universal credit aggregate in the £55-£95bn range around the OBR target
- state pension / child benefit / pension credit floors
- `extended_childcare_entitlement_eligible` reaching >500k benefit units

Why
Between 2026-04-15 14:04 UTC (when #1595 removed the inferred `is_parent` formula) and 2026-04-16 14:17 UTC (when uk-data 1.50.3 published an FRS build with an `is_parent` column), any run of `policyengine-uk` main against the prior dataset silently produced:

- `is_parent = 0` everywhere
- a Universal Credit aggregate below the £55bn floor (£51.7bn against the stale 1.45.8 snapshot)
- `extended_childcare_entitlement_eligible = 0` everywhere

None of that surfaced in CI because the existing `microsimulation`-marked tests pin to dataset version `1.40.3` via `conftest.py`. This adds a parallel, unpinned test that uses whatever is on HF `main`, so the next time model code and data pipelines drift apart, CI notices.

Verification
Local run against fresh 1.50.3: all 5 tests pass.
Local run against the stale 1.45.8 snapshot from before the fix: 3 of 5 tests fail (`is_parent` = 0, UC = £51.7bn, childcare = 0) and the other two pass, i.e. the three failure modes the April 15–16 window actually exhibited all trip.
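For readers who want to see the shape of the check, here is a minimal, self-contained sketch of plausibility bounds applied to weighted aggregates. It is not the content of `test_latest_data_smoke.py`: the names `Bound`, `BOUNDS`, `weighted_total` and `check` are hypothetical, the boundary values are treated as inclusive by assumption, and the `stale` figures are the ones quoted in this PR for the 1.45.8 snapshot.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Bound:
    """Plausibility range for one weighted aggregate (None means unbounded)."""

    name: str
    lower: Optional[float] = None
    upper: Optional[float] = None

    def violation(self, value: float) -> Optional[str]:
        # Boundary values count as passing here; that choice is an assumption.
        if self.lower is not None and value < self.lower:
            return f"{self.name}: {value:,.0f} is below {self.lower:,.0f}"
        if self.upper is not None and value > self.upper:
            return f"{self.name}: {value:,.0f} is above {self.upper:,.0f}"
        return None


# Thresholds quoted in this PR; the grouping into a list is illustrative.
BOUNDS = [
    Bound("is_parent", lower=10e6),
    Bound("universal_credit", lower=55e9, upper=95e9),
    Bound("extended_childcare_entitlement_eligible", lower=500e3),
]


def weighted_total(values, weights):
    """Sum of value * weight, e.g. a 0/1 flag times a survey weight."""
    return sum(v * w for v, w in zip(values, weights))


def check(aggregates):
    """Return one message per aggregate that falls outside its bound."""
    failures = []
    for bound in BOUNDS:
        message = bound.violation(aggregates[bound.name])
        if message is not None:
            failures.append(message)
    return failures


# Aggregates reported for the stale 1.45.8 snapshot.
stale = {
    "is_parent": 0.0,
    "universal_credit": 51.7e9,
    "extended_childcare_entitlement_eligible": 0.0,
}

for line in check(stale):
    print(line)  # one message per failing aggregate, three in total
```

Collecting every violation before failing, rather than stopping at the first, makes it easier to see at a glance which aggregates drifted together when model and data fall out of step.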
Scope
- `policyengine_uk/tests/test_latest_data_smoke.py`: 5 new tests, all marked `microsimulation` so they skip unless `HUGGING_FACE_TOKEN` is present.
- `changelog.d/latest-data-smoke.added.md`: changelog fragment.

Test plan

- `uvx ruff format --check` / `uvx ruff check` clean
- Local run against HF `main` (1.50.3)
- `Test` job passes
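The token gating described under Scope can be illustrated with a small sketch. The PR itself relies on the `microsimulation` pytest marker and this page does not show how the skip is wired up, so the example below swaps in the standard library's `unittest.skipUnless` purely to stay self-contained; `has_token` and `LatestDataSmokeTest` are hypothetical names.

```python
import os
import unittest

TOKEN_VAR = "HUGGING_FACE_TOKEN"


def has_token(environ=os.environ) -> bool:
    """True when the token variable is present and non-empty."""
    return bool(environ.get(TOKEN_VAR))


@unittest.skipUnless(has_token(), f"{TOKEN_VAR} is not set")
class LatestDataSmokeTest(unittest.TestCase):
    def test_placeholder(self):
        # A real test would load the latest dataset here and assert
        # plausibility bounds on the weighted aggregates.
        self.assertTrue(has_token())
```

Keeping the environment lookup in a small function makes the gating decision testable without touching the real environment.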