Add microsimulation smoke test against unpinned latest data by MaxGhenis · Pull Request #1617 · PolicyEngine/policyengine-uk

MaxGhenis · 2026-04-17T02:49:59Z

Summary

Catches silent model/data skew at the point the enhanced FRS dataset is republished on HuggingFace, not after a release. Exercises whatever is currently on HF main (unlike conftest.py which pins to an older version) and asserts plausibility bounds on:

UK weighted population and household/benunit counts
is_parent weighted population (>10M), which catches the defaulting-to-zero failure introduced when #1595 (Remove inferred is_parent formula) removed the inferred formula
Universal credit aggregate in £55–£95bn range around the OBR target
State pension / child benefit / pension credit floors
extended_childcare_entitlement_eligible reaching >500k benefit units
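A minimal sketch of the kind of plausibility check listed above, in plain Python rather than the PolicyEngine API. The helper names `weighted_total` and `check_bounds` and the toy weights are hypothetical and are not taken from test_latest_data_smoke.py; the point is only that a column defaulting to zero yields a weighted total of zero, which a floor assertion catches.

```python
from typing import Sequence


def weighted_total(values: Sequence[float], weights: Sequence[float]) -> float:
    """Sum of value * weight over records (hypothetical helper)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(v * w for v, w in zip(values, weights))


def check_bounds(
    name: str, total: float, low: float, high: float = float("inf")
) -> None:
    """Raise AssertionError when total falls outside [low, high]."""
    assert low <= total <= high, (
        f"{name} weighted total {total:,.0f} outside [{low:,.0f}, {high:,.0f}]"
    )


# Toy data: three survey records, each standing in for millions of people.
weights = [5_000_000.0, 5_000_000.0, 6_000_000.0]
is_parent_populated = [1.0, 0.0, 1.0]
# What a missing input column with no fallback formula looks like.
is_parent_defaulted = [0.0, 0.0, 0.0]

# Passes: 5M + 6M = 11M weighted parents clears the 10M floor.
check_bounds(
    "is_parent", weighted_total(is_parent_populated, weights), low=10_000_000
)
```

Running the same check on `is_parent_defaulted` raises an `AssertionError`, because the weighted total is 0 regardless of the weights.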

Why

Between 2026-04-15 14:04 UTC (when #1595 removed the inferred is_parent formula) and 2026-04-16 14:17 UTC (when uk-data 1.50.3 published an FRS build with an is_parent column), any run of policyengine-uk main against the prior dataset silently produced:

is_parent = 0 everywhere
extended_childcare_entitlement_eligible = 0 everywhere
Universal credit aggregate ≈ £51.7bn (vs OBR target ~£74bn), because capital-limit logic combined with stale savings imputations over-triggered disqualifications

None of that surfaced in CI because the existing microsimulation-marked tests pin to dataset version 1.40.3 via conftest.py. This adds a parallel, unpinned test that uses whatever is on HF main — so the next time model code and data pipelines drift apart, CI notices.

Verification

Local run against fresh 1.50.3: all 5 tests pass.

Local run against the stale 1.45.8 snapshot from before the fix:

PASS: test_population_totals_are_plausible
FAIL: test_is_parent_is_populated — is_parent weighted total 0 is too low
FAIL: test_universal_credit_aggregate_in_range — UC £51.7bn outside £55–£95bn
PASS: test_core_benefits_are_nonzero
FAIL: test_childcare_entitlement_populated — extended_childcare_entitlement_eligible total 0

That is, all three failure modes exhibited during the April 15–16 window trip their respective tests.
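As a worked check of the arithmetic, the sketch below replays the three stale-snapshot figures reported above against the thresholds from the Summary. The dictionary layout and the `failing_checks` function are illustrative assumptions, not the structure of the real test file; only the numbers come from this PR.

```python
from typing import Dict, List, Tuple

INF = float("inf")

# (observed stale value, lower bound, upper bound), figures as reported in this PR.
STALE_SNAPSHOT: Dict[str, Tuple[float, float, float]] = {
    "is_parent": (0.0, 10e6, INF),
    "universal_credit": (51.7e9, 55e9, 95e9),
    "extended_childcare_entitlement_eligible": (0.0, 500e3, INF),
}


def failing_checks(observed: Dict[str, Tuple[float, float, float]]) -> List[str]:
    """Return the names whose observed value lies outside [low, high]."""
    return [
        name
        for name, (value, low, high) in observed.items()
        if not (low <= value <= high)
    ]


print(failing_checks(STALE_SNAPSHOT))
```

All three names are returned for the stale figures. Substituting the OBR target of roughly £74bn for the universal credit value would place it inside the £55–£95bn range.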

Scope

policyengine_uk/tests/test_latest_data_smoke.py — 5 new tests, all marked microsimulation so they skip unless HUGGING_FACE_TOKEN is present.
changelog.d/latest-data-smoke.added.md — changelog fragment.
No production code touched.
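The token gating can be pictured with the sketch below. It is an assumption about the general pattern, not the repository's actual mechanism: the PR only states that the tests are marked microsimulation and skip unless HUGGING_FACE_TOKEN is present. The function name `hf_token_present` is hypothetical.

```python
import os
from typing import Mapping, Optional


def hf_token_present(environ: Optional[Mapping[str, str]] = None) -> bool:
    """True when HUGGING_FACE_TOKEN is set to a non-blank value.

    Accepts an explicit mapping so the check can be exercised without
    touching the real process environment.
    """
    env = os.environ if environ is None else environ
    return bool(env.get("HUGGING_FACE_TOKEN", "").strip())


# Example: an empty environment means the data-dependent tests should skip.
print(hf_token_present({}))
```

A predicate like this could feed `pytest.mark.skipif(not hf_token_present(), reason="HUGGING_FACE_TOKEN not set")`, so that local runs without credentials skip the tests rather than fail on download.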

Test plan

uvx ruff format --check / uvx ruff check clean
All 5 tests pass locally against fresh HF main (1.50.3)
3 of 5 tests fail as expected against stale 1.45.8 snapshot
CI Test job passes

Catches silent model/data skew at the point the enhanced FRS dataset is republished on HuggingFace, not after a release. Exercises whatever is currently on HF `main` (unlike `conftest.py` which pins to an older version) and asserts plausibility bounds on:

- UK weighted population and household/benunit counts
- `is_parent` weighted population (>10M), which catches the defaulting-to-zero failure introduced by removing the inferred formula in #1595
- Universal credit aggregate in £55-£95bn range around the OBR target
- state pension / child benefit / pension credit floors
- `extended_childcare_entitlement_eligible` reaching >500k benefit units

Verified against the 1.45.8 stale dataset: correctly fails 3 of 5 tests (is_parent=0, UC=£51.7bn, childcare=0), passes the other two.

Marked `microsimulation` so it only runs in CI when HUGGING_FACE_TOKEN is set.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

MaxGhenis marked this pull request as ready for review April 17, 2026 02:55

MaxGhenis merged commit 53adf33 into main Apr 17, 2026
2 checks passed

MaxGhenis deleted the add-latest-data-smoke-test branch April 17, 2026 02:56
