Add microsimulation smoke test against unpinned latest data #1617

Merged: MaxGhenis merged 1 commit into main from add-latest-data-smoke-test on Apr 17, 2026

Conversation

@MaxGhenis

Collaborator

Summary

Catches silent model/data skew at the point the enhanced FRS dataset is republished on HuggingFace, rather than after a release. Exercises whatever is currently on HF main (unlike conftest.py, which pins to an older version) and asserts plausibility bounds on:

  • UK weighted population and household/benunit counts
  • is_parent weighted population (>10M), which catches the defaulting-to-zero failure introduced by removing the inferred formula in #1595 (Remove inferred is_parent formula)
  • Universal credit aggregate within a £55–£95bn range around the OBR target
  • State pension / child benefit / pension credit floors
  • extended_childcare_entitlement_eligible reaching >500k benefit units
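To illustrate why a weighted floor catches the defaulting-to-zero failure, here is a minimal sketch in plain Python. The helper `weighted_total`, the toy weights, and the toy columns are all hypothetical and are not taken from the PR's test file; only the >10M floor comes from the list above.

```python
def weighted_total(values, weights):
    """Sum of value * weight across survey records."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(v * w for v, w in zip(values, weights))


# Made-up toy records: each record stands in for a number of real people.
weights = [6_000_000.0, 5_000_000.0, 7_000_000.0]
is_parent_populated = [1, 0, 1]  # column present in the dataset
is_parent_defaulted = [0, 0, 0]  # column missing, so every record defaults to 0

IS_PARENT_FLOOR = 10_000_000  # the ">10M" floor quoted in the summary

populated_total = weighted_total(is_parent_populated, weights)
defaulted_total = weighted_total(is_parent_defaulted, weights)

populated_ok = populated_total > IS_PARENT_FLOOR  # plausible total clears the floor
defaulted_ok = defaulted_total > IS_PARENT_FLOOR  # all-zero column cannot clear it
```

A floor well above zero is enough here: the failure mode is an all-zero column, so any positive threshold that real data comfortably clears will separate the two cases.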

Why

Between 2026-04-15 14:04 UTC (when #1595 removed the inferred is_parent formula) and 2026-04-16 14:17 UTC (when uk-data 1.50.3 published an FRS build with an is_parent column), any run of policyengine-uk main against the prior dataset silently produced:

  • is_parent = 0 everywhere
  • extended_childcare_entitlement_eligible = 0 everywhere
  • Universal credit aggregate ≈ £51.7bn (vs OBR target ~£74bn), because capital-limit logic combined with stale savings imputations over-triggered disqualifications

None of that surfaced in CI because the existing microsimulation-marked tests pin to dataset version 1.40.3 via conftest.py. This adds a parallel, unpinned test that uses whatever is on HF main — so the next time model code and data pipelines drift apart, CI notices.

Verification

Local run against fresh 1.50.3: all 5 tests pass.

Local run against the stale 1.45.8 snapshot from before the fix:

PASS: test_population_totals_are_plausible
FAIL: test_is_parent_is_populated — is_parent weighted total 0 is too low
FAIL: test_universal_credit_aggregate_in_range — UC £51.7bn outside £55–£95bn
PASS: test_core_benefits_are_nonzero
FAIL: test_childcare_entitlement_populated — extended_childcare_entitlement_eligible total 0

i.e. the three failure modes the April 15–16 window actually exhibited all trip.
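The three failures can be replayed as plain arithmetic against the bounds quoted in the summary. This is a sketch only: the `BOUNDS` and `STALE` dictionaries and the `within` helper are hypothetical names, the stale figures are the ones reported in this description, and inclusive boundary handling is an assumption (it does not affect these three values).

```python
# (low, high) bounds from the summary; None means no upper bound is stated.
BOUNDS = {
    "is_parent": (10e6, None),
    "universal_credit": (55e9, 95e9),
    "extended_childcare_entitlement_eligible": (500e3, None),
}

# Totals reported in this PR for the stale 1.45.8 snapshot.
STALE = {
    "is_parent": 0.0,
    "universal_credit": 51.7e9,
    "extended_childcare_entitlement_eligible": 0.0,
}


def within(value, low, high):
    """True when value sits inside [low, high]; high=None means unbounded above."""
    if value < low:
        return False
    return high is None or value <= high


# Names of the checks that the stale totals fail, in declaration order.
failures = [name for name, value in STALE.items() if not within(value, *BOUNDS[name])]
```

All three names end up in `failures`, matching the three FAIL lines above. The two passing tests are omitted because their observed totals are not stated here.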

Scope

  • policyengine_uk/tests/test_latest_data_smoke.py — 5 new tests, all marked microsimulation so they skip unless HUGGING_FACE_TOKEN is present.
  • changelog.d/latest-data-smoke.added.md — changelog fragment.
  • No production code touched.
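The skip-without-token behaviour can be pictured with a small stdlib-only gate. This is a sketch under assumptions: `should_skip_microsimulation` is a hypothetical helper, and the repository's actual wiring of the microsimulation marker may differ.

```python
import os


def should_skip_microsimulation(env=None):
    """Return True when the token needed to download the dataset is absent or empty."""
    env = os.environ if env is None else env  # allow injecting a dict for testing
    return not env.get("HUGGING_FACE_TOKEN")


# Example decisions with injected environments instead of the real one.
skip_without_token = should_skip_microsimulation({})
skip_with_token = should_skip_microsimulation({"HUGGING_FACE_TOKEN": "placeholder"})
```

In a pytest suite, a result like this could be passed as the condition to `pytest.mark.skipif`, so contributors without a token still get a green local run.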

Test plan

  • uvx ruff format --check / uvx ruff check clean
  • All 5 tests pass locally against fresh HF main (1.50.3)
  • 3 of 5 tests fail as expected against stale 1.45.8 snapshot
  • CI Test job passes

Catches silent model/data skew at the point the enhanced FRS dataset
is republished on HuggingFace, not after a release. Exercises whatever
is currently on HF `main` (unlike `conftest.py` which pins to an older
version) and asserts plausibility bounds on:

- UK weighted population and household/benunit counts
- `is_parent` weighted population (>10M) — catches the defaulting-to-
  zero failure introduced by removing the inferred formula in #1595
- Universal credit aggregate in £55-£95bn range around the OBR target
- state pension / child benefit / pension credit floors
- `extended_childcare_entitlement_eligible` reaching >500k benefit units

Verified against the 1.45.8 stale dataset: correctly fails 3 of 5 tests
(is_parent=0, UC=£51.7bn, childcare=0), passes the other two.

Marked `microsimulation` so it only runs in CI when HUGGING_FACE_TOKEN
is set.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
MaxGhenis marked this pull request as ready for review April 17, 2026 02:55
MaxGhenis merged commit 53adf33 into main Apr 17, 2026
2 checks passed
MaxGhenis deleted the add-latest-data-smoke-test branch April 17, 2026 02:56