Commit abee4e5

MaxGhenis, vahid-ahmadi, and claude authored
Tighten population tolerance and add fidelity tests (#366)
* Tighten population tolerance and add fidelity tests

  The weighted-UK-population drift that motivated #310 has already dropped from ~6.5% to ~1.6% on current main as a side-effect of the data-pipeline improvements landed yesterday (stage-2 QRF #362, TFC target refresh #363, reported-anchor takeup #359).

  Tightens `test_population` tolerance from 7% to 3% to lock in that gain: any future calibration change that regresses back toward the pre-April-2026 overshoot now trips CI instead of silently drifting.

  Adds a new `test_population_fidelity.py` with four regression tests extracted from the #310 draft:

  - weighted-total ONS match (3% tolerance)
  - household-count sanity range (25-33 M)
  - non-inflation guard (< 72 M)
  - country-populations-sum-to-UK consistency

  Does not include #310's loss-function change or Scotland target removal; those are independent proposals and should be evaluated on their own merits once the practical overshoot is resolved.

  Co-authored-by: Vahid Ahmadi <va.vahidahmadi@gmail.com>
  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Loosen population tolerance 3% -> 4% for stochastic calibration variance

  First CI run on this branch produced 71.8M (3.31% over target) where yesterday's main build produced 70.97M (1.58%). Stochastic dropout in the calibration optimiser (`dropout_weights(weights, 0.05)`) gives ~1-2 percentage point build-to-build variance on the population total. 4% keeps the regression gate well below the pre-April-2026 overshoot (~6.5%) while not flaking on normal stochastic variance.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Vahid Ahmadi <va.vahidahmadi@gmail.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
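The choice between a 3% and a 4% gate can be checked with a few lines of arithmetic. The following is a minimal sketch, not code from the repository: the helper names `relative_deviation`, `passes_gate`, and `upper_bound` are hypothetical, and the only inputs are figures quoted in the commit message (the 69.5M target, the 71.8M CI run, and the ~6.5% earlier overshoot).

```python
POPULATION_TARGET = 69.5  # millions, the target used by the tests


def relative_deviation(population_m, target_m=POPULATION_TARGET):
    """Signed relative gap between a weighted total and the target."""
    return population_m / target_m - 1


def passes_gate(population_m, tolerance, target_m=POPULATION_TARGET):
    """Same comparison the tests assert: |population / target - 1| < tolerance."""
    return abs(relative_deviation(population_m, target_m)) < tolerance


def upper_bound(tolerance, target_m=POPULATION_TARGET):
    """Population (millions) at which the gate starts to fail on the high side."""
    return target_m * (1 + tolerance)


first_ci_run = 71.8                        # first CI run on the branch, per the commit message
old_overshoot = POPULATION_TARGET * 1.065  # ~6.5% overshoot, roughly 74M

for tolerance in (0.03, 0.04):
    print(
        f"tolerance {tolerance:.0%}: "
        f"CI run passes={passes_gate(first_ci_run, tolerance)}, "
        f"old overshoot passes={passes_gate(old_overshoot, tolerance)}, "
        f"upper bound={upper_bound(tolerance):.2f}M"
    )
```

With these figures a 3% gate rejects the 71.8M run while a 4% gate accepts it, and both still reject a ~6.5% overshoot. A 4% gate admits totals up to roughly 72.3M, so the separate `< 72` guard is the slightly tighter upper limit.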
1 parent 92f84fe commit abee4e5

3 files changed

Lines changed: 78 additions & 3 deletions

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+Tightened `test_population` tolerance from 7% to 4% now that the stage-2 QRF (#362), TFC target refresh (#363), and reported-anchor takeup (#359) pulled the weighted UK population overshoot from ~6.5% down to ~1.6%. Added four regression tests in `test_population_fidelity.py` (weighted-total match, household-count range, non-inflation guard, country-sum consistency) extracted from the earlier #310 draft so any future calibration drift back toward the pre-April-2026 overshoot trips CI.
Lines changed: 8 additions & 3 deletions
@@ -1,7 +1,12 @@
 def test_population(baseline):
     population = baseline.calculate("people", 2025).sum() / 1e6
-    POPULATION_TARGET = 69.5  # Expected UK population in millions, per ONS 2022-based estimate here: https://www.ons.gov.uk/peoplepopulationandcommunity/populationandmigration/populationprojections/bulletins/nationalpopulationprojections/2022based
-    # Tolerance temporarily relaxed to 7% due to calibration inflation issue #217
-    assert abs(population / POPULATION_TARGET - 1) < 0.07, (
+    POPULATION_TARGET = 69.5  # ONS 2022-based projection for 2025, millions: https://www.ons.gov.uk/peoplepopulationandcommunity/populationandmigration/populationprojections/bulletins/nationalpopulationprojections/2022based
+    # Tightened from 7% to 4% after data-pipeline improvements in April 2026
+    # (stage-2 QRF imputation #362, TFC target refresh #363, reported-anchor
+    # takeup #359) pulled the weighted UK population down from ~74M (+6.5%)
+    # to ~71M (+1.6% - 3.3% depending on stochastic calibration variance).
+    # 4% headroom keeps CI stable across runs while still catching any
+    # regression back toward the pre-April-2026 overshoot.
+    assert abs(population / POPULATION_TARGET - 1) < 0.04, (
         f"Expected UK population of {POPULATION_TARGET:.1f} million, got {population:.1f} million."
     )
Lines changed: 69 additions & 0 deletions
@@ -0,0 +1,69 @@
+"""Population fidelity regression tests for the calibrated dataset.
+
+Guards against the April 2026 calibration drift (issue #217) where the
+weighted UK population inflated ~6.5% above the ONS target. The drift
+was pulled back to ~1.6% by the data-pipeline improvements that landed
+in #362 (stage-2 QRF), #363 (TFC target refresh), and #359 (reported-
+anchor takeup). These tests lock in that gain so future calibration
+changes can't regress past current fidelity without a test failure.
+
+Extracted from PolicyEngine/policyengine-uk-data#310 (Vahid Ahmadi).
+"""
+
+from __future__ import annotations
+
+import warnings
+
+import numpy as np
+
+POPULATION_TARGET = 69.5  # ONS 2022-based projection for 2025, millions
+TOLERANCE = 0.04  # 4%: covers ~1.6%-3.3% stochastic calibration variance
+
+
+def _raw(micro_series):
+    """Extract the raw numpy array from a MicroSeries without triggering
+    the `.values` deprecation warning."""
+    with warnings.catch_warnings():
+        warnings.simplefilter("ignore", UserWarning)
+        return np.array(micro_series.values)
+
+
+def test_weighted_population_matches_ons_target(baseline):
+    """Weighted UK population is within TOLERANCE (4 %) of the ONS projection."""
+    population = baseline.calculate("people", 2025).sum() / 1e6
+    assert abs(population / POPULATION_TARGET - 1) < TOLERANCE, (
+        f"Weighted population {population:.1f}M is >{TOLERANCE:.0%} "
+        f"from ONS target {POPULATION_TARGET:.1f}M."
+    )
+
+
+def test_household_count_reasonable(baseline):
+    """Total weighted households fall inside the ONS 25-33 M range."""
+    hw = _raw(baseline.calculate("household_weight", 2025))
+    total_hh = hw.sum() / 1e6
+    assert 25 < total_hh < 33, (
+        f"Total weighted households {total_hh:.1f}M outside 25-33M range."
+    )
+
+
+def test_population_not_inflated(baseline):
+    """Population stays below 72 M, under the pre-April-2026 inflated level (~74 M)."""
+    population = baseline.calculate("people", 2025).sum() / 1e6
+    assert population < 72, (
+        f"Population {population:.1f}M exceeds 72M; calibration has "
+        "regressed toward the pre-#217 overshoot."
+    )
+
+
+def test_country_populations_sum_to_uk(baseline):
+    """England + Scotland + Wales + NI populations sum to the UK total."""
+    people = baseline.calculate("people", 2025)
+    country = baseline.calculate("country", map_to="person")
+
+    uk_pop = people.sum()
+    country_sum = sum(people[country == c].sum() for c in country.unique())
+
+    assert abs(country_sum / uk_pop - 1) < 0.001, (
+        f"Country populations sum to {country_sum / 1e6:.1f}M "
+        f"but UK total is {uk_pop / 1e6:.1f}M."
+    )
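The country-sum test can look trivially true, so it helps to see a case where it fails. The sketch below mirrors its structure in plain Python on toy data; the function name `country_sum_gap`, the weights, and the labels are all hypothetical. It assumes the real series compares labels elementwise the way pandas does, where a missing label (NaN) never equals itself, so people with no country drop out of every per-country total.

```python
def country_sum_gap(weights, labels):
    """Relative gap between the sum of per-country totals and the overall total."""
    total = sum(weights)
    per_country = sum(
        sum(w for w, label in zip(weights, labels) if label == c)
        for c in set(labels)  # stands in for country.unique()
    )
    return per_country / total - 1


# Toy person weights and country labels (illustrative values only).
weights = [1200.0, 800.0, 500.0, 300.0, 200.0]
labels = ["ENGLAND", "ENGLAND", "SCOTLAND", "WALES", "NORTHERN_IRELAND"]

# Every person has a country, so the per-country totals partition the UK total.
clean_gap = country_sum_gap(weights, labels)

# One person loses their label: NaN != NaN, so their weight matches no country.
nan = float("nan")
labels_with_missing = labels[:4] + [nan]
missing_gap = country_sum_gap(weights, labels_with_missing)

print(f"clean gap: {clean_gap:+.4f}, gap with a missing label: {missing_gap:+.4f}")
```

Under these assumptions the check passes whenever the labels partition the population and trips once unlabelled weight exceeds the 0.1% threshold, which is the kind of mapping error the test appears designed to catch.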
