This repository was archived by the owner on Jun 19, 2026. It is now read-only.

Commit 604621c

vahid-ahmadi, claude and MaxGhenis authored
Calibrate LA council tax (band counts + net £) and fix national gross/net (#374)
* Add LA-level council tax calibration targets

Two families of LA-level targets, covering all 360 LAs in local_authorities_2021.csv, built from four public sources:

- `ons/council_tax_band_d/{code}` (350 targets): average Band D council tax inclusive of all precepts per billing authority. Sources: MHCLG *Council Tax levels set by local authorities in England 2026-27*, Welsh Government *Council Tax levels April 2026 to March 2027*, Scottish Government *Council Tax Assumptions 2025*. All 296 English + 22 Welsh + 32 Scottish LAs covered.
- `ons/council_tax_band_count/{code}/{band}` (2,541 targets): number of dwellings per band A-H per LA. Source: VOA *Council Tax: Stock of Properties, 2025*. Covers England + Wales (318 LAs × ~8 bands, minus City of London Band A which is VOA-suppressed).

NI is excluded: domestic rates, not council tax. Scotland band counts are not in VOA; Scottish Assessors publishes them separately and is a follow-up.

Files
-----
- `storage/la_council_tax.csv` (31 KB, 360 rows): canonical CSV joining DLUHC Table 10 column 17, Welsh Table 1 "Overall average band D", Scottish Gov "CT by Band 2025-26" Band D column, and VOA CTSOP1.0 bands A-H onto the reference LA list.
  - Post-2023 South Yorkshire E-codes (E08000038/39) re-mapped to pre-2023 codes (E08000016/19) to match the reference list.
  - Scottish ampersand/double-space naming normalised ("Argyll & Bute" → "Argyll and Bute", etc.).
- `targets/sources/la_council_tax.py`: reads the CSV, emits Target objects at geographic_level=LOCAL_AUTHORITY with per-country year tagging and per-country reference URL.

Testing
-------
22 hermetic tests (no network access, no baseline fixture needed):

Structure
- Row count matches local_authorities_2021.csv.
- Every expected column present.
- Four UK country codes represented.
- Every LA code matches the reference list.

Value plausibility (the #371 lesson)
- Band D amount in [£900, £3,500] for every row with a value.
- Total dwellings in [200, 800,000] for every row with a value.
- Explicit Isles of Scilly regression test: total dwellings in [500, 5,000], not the 2.49M outlier that slipped into #371.
- Band A-H counts sum to total dwellings within 20-property slack (VOA 10-property suppression allowance).
- Every band-count target value ≤ 500k (largest LA stock).

Coverage expectations
- Every English, Welsh and Scottish LA has a Band D value.
- Northern Ireland has no council tax flagged (has_council_tax=False).

Spot-checks of published facts
- Wandsworth (E09000032) and Westminster (E09000033) are the two lowest-Band-D English LAs (catches row-swap bugs).
- Scottish average Band D is £500+ below English average.

Target-API invariants
- get_targets() returns a non-empty list without network access.
- Band D target count matches the CSV's non-null Band D count.
- Band count target count matches Σ non-null band columns.
- Every target carries geographic_level=LOCAL_AUTHORITY and a geo_code.
- Band D targets use Unit.GBP; band count targets use Unit.COUNT with is_count=True.
- Every target has at least one year of values.

Sources
-------
- MHCLG (England 2026-27): https://www.gov.uk/government/statistics/council-tax-levels-set-by-local-authorities-in-england-2026-to-2027
- Welsh Government (Wales 2026-27): https://www.gov.wales/council-tax-levels-april-2026-march-2027-html
- Scottish Government (Scotland 2025-26): https://www.gov.scot/publications/council-tax-datasets/
- VOA (England + Wales 2025): https://www.gov.uk/government/statistics/council-tax-stock-of-properties-2025

Out of scope for this PR (follow-ups)
-------------------------------------
- Wiring these targets into datasets/local_areas/local_authorities/loss.py so the LA reweighting actually calibrates on them. Planned follow-up PR.
- Scottish Assessors per-LA chargeable-dwellings to fill the Scotland band-count gap.
- Council Tax Support caseload per LA (DWP StatXplore).
- Single Person Discount rate per LA (CIPFA).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Address review: add Welsh Band I, source totals from VOA, tidy module

Review points addressed:
- Add count_band_I column to la_council_tax.csv, populated for all 22 Welsh LAs (Wales revalued in 2005 and introduced a 9th band). Cardiff 1480, Monmouthshire 670, Vale of Glamorgan 1060, etc. English rows keep Band I null; VOA marks it [z] (not applicable).
- Re-source total_dwellings from VOA "All properties" column instead of deriving it as the sum of A-H. Previously Σ(A..H) was used for both sides of test_band_counts_sum_to_total, making the test self-referential; now it validates against the published total with a 20-property slack for VOA rounding.
- Rename count columns symmetrically: band_A..band_H + band_D_count → count_band_A..count_band_I. Removes the lopsided band_D_count name that existed only to avoid clashing with band_d_amount.
- Align band-count target names with voa_council_tax.py: voa/council_tax/{code}/{band} (was ons/council_tax_band_count/...); variable="council_tax_band" (was council_tax_band_count, which is not a real PolicyEngine-UK variable); drop breakdown_variable to match the regional VOA module.
- Cache the CSV read with @lru_cache(maxsize=1), matching voa_council_tax.
- Update module docstring: "A-H in England/Scotland, A-I in Wales".

Tests:
- New: test_welsh_las_have_band_i (all 22 Welsh LAs populated).
- New: test_english_las_have_no_band_i (guard against spurious fills).
- New: test_cardiff_band_i_matches_published_figure (~1,480 per VOA 2025).

Final target counts:
- 350 Band D amount targets (unchanged).
- 2,563 band-count targets, up from 2,541: +22 Welsh Band I plus two band-H rows that were null due to the earlier truncation.
* Satisfy ruff format on la_council_tax.py

* Wire LA council-tax band-count targets into the calibration loss matrix

The targets registered in la_council_tax.py were inert: the LA target matrix had no columns for them, so the reweighter could not see them. This wires the eight VOA Council Tax Stock-of-Properties band-count targets (A-H) into the LA loss matrix:

- matrix entry: per-household indicator 1[council_tax_band == B] from policyengine-uk.
- y entry: 360-vector of per-LA dwelling counts from storage/la_council_tax.csv. For LAs without VOA data (Scottish LAs, which the VOA summary tables don't cover, and Northern Irish LAs, which have no council tax) the value falls back to national_count × la_household_share, matching the existing tenure block's fallback pattern.

Two targets are deliberately not wired in this pass:
- Band I: Wales-only and mostly null in the CSV.
- The Band D £ amount (ons/council_tax_band_d/{code}): a per-rate quantity that does not fit the linear matrix-times-weights aggregation. Wiring it as total council-tax revenue would need Scotland-specific band ratios (different from England/Wales after 2017) and is worth a separate PR.

New tests in test_la_loss_council_tax.py cover both layers:
- Light: CSV joins to every LA code, the eight count_band_{X} columns exist, E/W rows are populated, Scotland is null as documented, and NI has has_council_tax=False.
- Full build (gated on enhanced FRS fixture): all eight columns present in matrix and y; y vectors length 360, finite and positive; matrix entries are 0/1 indicators with rows summing to ≤1; y matches the CSV verbatim for an English LA (Hartlepool); Scotland and NI LAs receive a positive fallback rather than NaN or zero.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Add LA-level net council tax £ target alongside band counts

Wires the second FRS data point into the LA reweighter, addressing the 28 Apr standup ALIGNED decision: "calibrate the two FRS data points as the council tax information is provided after deductions."

Both sides of the new constraint are net of CTR:
- matrix col = council_tax_less_benefit (gross − CTR benefit)
- y = directly observed net council tax requirement per LA

Sources (no national-total apportionment, all directly published):
- England (296 LAs): MHCLG Council Taxbase 2025, Table 1.35 "Tax base after allowance for council tax support" × Band D amount. Sums to £47.4bn, within 3.4% of the MHCLG Table 1 published England Council Tax Requirement of £45.86bn (small gap from year mismatch: 2025 taxbase × 2026-27 Band D).
- Wales (22 LAs): Welsh Government "Council Tax Levels April 2026 to March 2027" Table 3 "Council tax income (£m)". Sums to £2.45bn.
- Scotland (32) and NI (10): no source wired; loss.py routes through the existing national × la_household_share fallback, same pattern as the band-count target and the rent target.

Mirrors the rent block in loss.py: load CSV → merge into ct_merged → matrix col / y assignment / has_data mask / national-share fallback.

Files:
- storage/la_council_tax.csv: new column total_council_tax_net.
- targets/sources/la_council_tax.py: load_la_net_council_tax() + Target objects named housing/council_tax_net/{code}.
- datasets/local_areas/local_authorities/loss.py: housing/council_tax_net block immediately after the band-count block.
- tests/test_la_loss_council_tax.py: 11 new tests (4 layer-1 + 7 layer-2) covering CSV column presence, country coverage, value range, England-total ballpark vs MHCLG, matrix-col correctness, na-fallback behaviour, calibratability sanity check.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Fix gross/net mismatch in OBR national council tax compute

OBR EFO Table 4.1 reports "Total net council tax receipts", i.e. net of council tax reduction (CTR). The matching household-level signal is council_tax_less_benefit (= gross council tax − CTR award), not council_tax (which is the gross liability before CTR per its docstring "Gross amount spent on Council Tax, before discounts").

Calibrating gross household values against a net national target systematically pulls weights down to fit (Σ w × gross > Σ w × net), leaking bias into adjacent national targets that share the weight vector.

Order-of-magnitude sanity (UK 2024-25):
  Σ w × council_tax (gross)              ≈ £55bn
  Σ w × council_tax_less_benefit (net)   ≈ £47bn
  OBR Table 4.1 "Total net council tax"  ≈ £44bn

After the fix, the council tax constraint is internally consistent (both sides net) and aligns with Max's 28 Apr standup decision on FRS-net-of-CTR alignment. Pairs naturally with the LA-level housing/council_tax_net target this PR adds; both use the same net variable.

Adds three regression tests pinning the net-variable contract.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Zero NI council tax targets instead of fabricating fallbacks

Northern Ireland uses domestic rates, not council tax. The CSV's has_council_tax flag has been False for NI from the original commit, but loss.py was ignoring it and assigning national × la_household_share to NI LAs for both band counts and the new net £ column.

Effect: the optimiser was being told "NI households should pay this much council tax" with a positive target, while every NI household has council_tax_band == None and council_tax_less_benefit == 0. That is an unsatisfiable constraint that wastes loss the optimiser cannot drive to zero. Reported by @MaxGhenis in PR review.

Fix: read has_council_tax from the CSV, gate the np.where so NI LAs get y == 0 for all 9 council-tax columns.
Direct-value and fallback paths unchanged for E/W/S.

Updates two tests that previously asserted positive fallback for NI; adds explicit zero-NI assertion for housing/council_tax_net.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Document derived/proxy nature + lineage drift for #374 CT targets

Per @MaxGhenis PR review: both council-tax LA targets are derived proxies, not direct matches for the matrix-side variables. The PR description and code comments earlier overstated this.

voa/council_tax/{A..H}: target counts VOA dwellings (E&W only, includes exempt/empty/second homes); matrix counts policyengine-uk households. Banding ratios differ in Scotland post-2017 and Wales has Band I.

housing/council_tax_net: target value is MHCLG taxbase × Band D (taxbase = Band D equivalent dwellings adjusted for ~7 discount/premium/exemption classes); matrix col is FRS-reported council_tax_less_benefit (household-reported gross less reported CTB). Same intent, different construction paths.

Documentation only: no code, data, or test behaviour change. The la_council_tax.py docstring now has an explicit "Lineage caveats" section, and loss.py block comments label both targets as derived/proxy with cross-reference.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Mask unavailable LA council tax targets

* Remove redundant council tax availability gate

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Max Ghenis <mghenis@gmail.com>
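The value-plausibility checks listed under "Testing" can be sketched as follows. This is a minimal illustration, not the PR's test code: the column names (`band_d_amount`, `total_dwellings`, `count_band_A`..`count_band_H`) come from the commit message, while the helper `check_plausibility`, the LA codes and all numbers in the toy table are hypothetical.

```python
import pandas as pd

BANDS = "ABCDEFGH"


def check_plausibility(df: pd.DataFrame, slack: int = 20) -> pd.DataFrame:
    """Return the rows that fail the Band D range or band-sum checks."""
    band_cols = [f"count_band_{b}" for b in BANDS]

    # Band D amount must lie in [900, 3500] wherever a value is present.
    has_amount = df["band_d_amount"].notna()
    bad_amount = has_amount & ~df["band_d_amount"].between(900, 3500)

    # Band A-H counts must sum to the published total within the slack.
    has_total = df["total_dwellings"].notna()
    band_sum = df[band_cols].sum(axis=1)
    bad_sum = has_total & ((band_sum - df["total_dwellings"]).abs() > slack)

    return df[bad_amount | bad_sum]


# Toy table with made-up codes and values (assumption, not real data).
toy = pd.DataFrame(
    {
        "code": ["X1", "X2", "X3"],
        "band_d_amount": [2100.0, 250.0, None],
        "total_dwellings": [800.0, 800.0, 810.0],
        **{f"count_band_{b}": [100.0, 100.0, 100.0] for b in BANDS},
    }
)
failures = check_plausibility(toy)
print(failures["code"].tolist())
```

Only the row with the implausibly low Band D amount is flagged; the row with a missing amount is skipped, and a 10-dwelling gap between the band sum and the total sits inside the 20-property slack.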
1 parent 35b5ff4 commit 604621c

9 files changed

Lines changed: 1519 additions & 11 deletions

File tree

policyengine_uk_data/datasets/local_areas/local_authorities/loss.py

Lines changed: 54 additions & 0 deletions
@@ -11,6 +11,9 @@
 - ONS income: ONS small area income estimates
 - Tenure: English Housing Survey
 - Private rent: VOA/ONS private rental market statistics
+- Council tax bands A-H: VOA Council Tax Stock of Properties (per LA)
+- Council tax £ paid (net of CTR): MHCLG taxbase × Band D (England),
+  Welsh Government Council Tax Income (Wales)
 """

 from policyengine_uk import Microsimulation
@@ -252,6 +255,57 @@ def create_local_authority_target_matrix(
         national_rent * la_household_share,
     )

+    # ── Council tax band counts (LA targets) ───────────────────────
+    # Derived/proxy targets: per-LA VOA dwellings in each band A-H.
+    # Lineage drift vs the matrix-side household council_tax_band:
+    # VOA counts dwellings (incl. exempt / empty / second homes);
+    # matrix counts households. See la_council_tax.py for full
+    # caveat. Missing cells stay NaN and are masked out by the
+    # calibrator; this keeps the target direct instead of fabricating
+    # national-share fallbacks for Scotland or Northern Ireland. Band I
+    # is Wales-only and rarely populated, so it is intentionally
+    # excluded.
+    ct_path = STORAGE_FOLDER / "la_council_tax.csv"
+    if ct_path.exists():
+        ct_data = pd.read_csv(ct_path)
+        ct_columns = ["code"] + [f"count_band_{b}" for b in "ABCDEFGH"]
+        if "total_council_tax_net" in ct_data.columns:
+            ct_columns.append("total_council_tax_net")
+        ct_merged = la_codes.merge(ct_data[ct_columns], on="code", how="left")
+        ct_band = sim.calculate("council_tax_band").values
+        for band in "ABCDEFGH":
+            col = f"voa/council_tax/{band}"
+            matrix[col] = (ct_band == band).astype(float)
+            csv_col = f"count_band_{band}"
+            has_count = ct_merged[csv_col].notna().values
+            direct = ct_merged[csv_col].values
+            y[col] = np.where(
+                has_count,
+                direct,
+                np.nan,
+            )
+
+        # ── Council tax £ paid, net of CTR (LA targets) ────────────
+        # Derived/proxy target: y = MHCLG taxbase × Band D (E) or WG
+        # Council Tax Income (W). Matrix col is FRS-reported
+        # council_tax_less_benefit (gross − reported CTB). Same
+        # intent (household council tax paid net of CTR), different
+        # construction paths; see la_council_tax.py for the lineage
+        # caveat flagged in review by @MaxGhenis. Both sides are net
+        # of CTR, per Max's 28 Apr standup decision on FRS alignment.
+        # Missing cells remain NaN and are masked out by the calibrator.
+        if "total_council_tax_net" in ct_merged.columns:
+            matrix["housing/council_tax_net"] = sim.calculate(
+                "council_tax_less_benefit"
+            ).values
+            has_ct_net = ct_merged["total_council_tax_net"].notna().values
+            direct_net = ct_merged["total_council_tax_net"].values
+            y["housing/council_tax_net"] = np.where(
+                has_ct_net,
+                direct_net,
+                np.nan,
+            )
+
     # ── Country mask ───────────────────────────────────────────────
     country_mask = create_country_mask(
         household_countries=sim.calculate("country").values,
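The band-count block above pairs a per-household 0/1 indicator column with a per-LA target vector that stays NaN wherever the CSV has no value. A minimal sketch of that pattern on toy data follows; the LA codes, household bands and counts are invented for illustration, and plain pandas/NumPy objects stand in for the `sim`, `matrix` and `y` objects used in `loss.py`.

```python
import numpy as np
import pandas as pd

# Reference LA list (hypothetical codes): one English, one Scottish, one NI.
la_codes = pd.DataFrame({"code": ["E1", "S1", "N1"]})

# Source table only has a row for the English LA (made-up counts).
ct_data = pd.DataFrame(
    {"code": ["E1"], "count_band_A": [120.0], "count_band_B": [80.0]}
)

# Left join keeps every LA; LAs without a source row get NaN counts.
ct_merged = la_codes.merge(ct_data, on="code", how="left")

# Stand-in for sim.calculate("council_tax_band").values: one band per household.
household_band = np.array(["A", "B", "A", "C"])

matrix_cols = {}
y_cols = {}
for band in "AB":
    col = f"voa/council_tax/{band}"
    # 1 if the household is in this band, else 0.
    matrix_cols[col] = (household_band == band).astype(float)
    # Published count where available, NaN otherwise (no fallback).
    y_cols[col] = ct_merged[f"count_band_{band}"].to_numpy()

matrix = pd.DataFrame(matrix_cols)  # households × targets
y = pd.DataFrame(y_cols)            # LAs × targets
print(y)
```

Each household contributes to at most one band column, and the Scottish and Northern Irish rows of `y` remain NaN so that a downstream calibrator can mask them instead of training on invented values.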

policyengine_uk_data/storage/la_council_tax.csv

Lines changed: 361 additions & 0 deletions
Large diffs are not rendered by default.

policyengine_uk_data/targets/compute/council_tax.py

Lines changed: 11 additions & 2 deletions
@@ -19,9 +19,18 @@ def compute_council_tax_band(target, ctx) -> np.ndarray:


 def compute_obr_council_tax(target, ctx) -> np.ndarray:
-    """Compute OBR council tax receipts, optionally by country."""
+    """Compute OBR council tax receipts, optionally by country.
+
+    OBR Table 4.1 reports "Total net council tax receipts", net of
+    council tax reduction (CTR) support. The matching household-level
+    signal is therefore ``council_tax_less_benefit`` (= gross council
+    tax less the CTR award), not ``council_tax`` (which is the gross
+    liability before CTR). Using the gross variable here would
+    systematically push weights down to fit a net target, leaking
+    bias into adjacent national calibrations.
+    """
     name = target.name
-    ct = ctx.pe("council_tax")
+    ct = ctx.pe("council_tax_less_benefit")

     if name == "obr/council_tax":
         return ct
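The docstring's claim that a gross variable "pushes weights down" against a net target can be illustrated with a three-household toy example. All figures below are invented, and the uniform rescaling is a deliberate simplification of what a real reweighter does; it only shows the direction of the distortion.

```python
import numpy as np

# Hypothetical households: equal weights, made-up annual amounts in £.
weights = np.array([1000.0, 1000.0, 1000.0])
gross = np.array([1800.0, 1500.0, 1200.0])    # liability before CTR
ctr_award = np.array([0.0, 300.0, 1200.0])    # council tax reduction
net = gross - ctr_award                       # what households actually pay

# Assume the published net total equals the weighted net sum exactly.
target_net = float(weights @ net)

# Uniform factor the weights would need for each variable to hit the target.
scale_if_gross = target_net / float(weights @ gross)
scale_if_net = target_net / float(weights @ net)

print(f"gross variable: weights scaled by {scale_if_gross:.3f}")
print(f"net variable:   weights scaled by {scale_if_net:.3f}")
```

With the net variable the weights already satisfy the target, whereas the gross variable would require shrinking every weight by a third, which would also shrink every other aggregate that shares those weights.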
Lines changed: 240 additions & 0 deletions
@@ -0,0 +1,240 @@
+"""Local-authority council tax calibration targets (derived proxies).
+
+Produces three kinds of LA-level calibration target from public data:
+
+- ``ons/council_tax_band_d/{code}``: the average Band D council tax
+  (inclusive of all precepts) each household pays in billing authority
+  ``code``. Sourced from MHCLG, Welsh Government and Scottish
+  Government annual publications.
+- ``voa/council_tax/{code}/{band}``: the number of dwellings in band
+  ``A``–``H`` (England) or ``A``–``I`` (Wales) for billing authority
+  ``code``. Sourced from the VOA *Council Tax: Stock of Properties*
+  summary tables.
+- ``housing/council_tax_net/{code}``: net council tax requirement per
+  LA (net of CTR support). England derived from MHCLG taxbase × Band D;
+  Wales sourced directly from WG Council Tax Income (Table 3).
+
+Data for all 360 LAs in ``local_authorities_2021.csv`` is joined from
+the committed canonical file ``storage/la_council_tax.csv``. Rows where
+a source did not provide a value are omitted so calibrators cleanly
+skip them.
+
+Lineage caveats (flagged in PR review by @MaxGhenis):
+
+- ``voa/council_tax/{A..H}`` is a **derived proxy**, not a direct
+  match for the matrix-side household ``council_tax_band``:
+  * Target counts VOA dwellings; matrix counts policyengine-uk
+    households. A household ≠ a dwelling in general.
+  * VOA stock includes exempt, empty, and second-home dwellings,
+    which contribute zero to the matrix-side sum (no household lives
+    in them in the FRS).
+  * VOA covers England and Wales only. Scotland and NI cells are
+    masked out of the loss matrix unless a direct source is available.
+  * Banding ratios differ: Scotland diverged from the standard
+    6/9–18/9 E&W ratios after the 2017 reform; Wales has Band I,
+    England does not.
+
+- ``housing/council_tax_net`` is a **derived proxy**:
+  * Target value (England) is MHCLG ``taxbase × Band D``, where
+    taxbase is Band D equivalent dwellings adjusted for ~7
+    discount, premium, and exemption classes (single-person,
+    disabled relief, second-home, empty-home premium, family
+    annexe, etc.). Wales uses WG-published net council tax income
+    direct.
+  * Matrix col is FRS-reported ``council_tax_less_benefit``
+    (household-reported gross less reported CTB).
+  * Same intent (what households pay net of CTR), different
+    construction paths and underlying microdata sources.
+
+Known coverage gaps:
+
+- Northern Ireland is excluded because its domestic rates system is
+  distinct from council tax. ``loss.py`` masks NI cells rather than
+  fabricating a fallback.
+- Band-count rows for Scottish LAs are absent because the VOA summary
+  tables do not cover Scotland; Scottish Assessors publishes per-LA
+  chargeable-dwellings data separately and is a follow-up.
+- Band I only exists in Wales (introduced in the 2005 Welsh revaluation);
+  English rows leave it null.
+- City of London has Band A suppressed by VOA for disclosure control;
+  its other bands are populated.
+
+Sources:
+- MHCLG *Council Tax levels set by local authorities in England 2026-27*
+  https://www.gov.uk/government/statistics/council-tax-levels-set-by-local-authorities-in-england-2026-to-2027
+- MHCLG *Council Taxbase 2025 in England* (Table 1.35 taxbase after CTR)
+  https://www.gov.uk/government/statistics/council-taxbase-2025-in-england
+- Welsh Government *Council Tax levels: April 2026 to March 2027*
+  https://www.gov.wales/council-tax-levels-april-2026-march-2027-html
+- Scottish Government *Council Tax Assumptions 2025* (CT by Band, 2025-26)
+  https://www.gov.scot/publications/council-tax-datasets/
+- VOA *Council Tax: Stock of Properties, 2025*
+  https://www.gov.uk/government/statistics/council-tax-stock-of-properties-2025
+"""
+
+from __future__ import annotations
+
+from functools import lru_cache
+
+import pandas as pd
+
+from policyengine_uk_data.targets.schema import (
+    GeographicLevel,
+    Target,
+    Unit,
+)
+from policyengine_uk_data.targets.sources._common import STORAGE
+
+
+_CSV_NAME = "la_council_tax.csv"
+
+# Latest fiscal years covered by each source. The LA Band D amounts are
+# structurally single-year snapshots; callers that need longer time
+# series should uprate via the existing council-tax uprating index.
+_YEAR_BAND_D_ENGLAND = 2026
+_YEAR_BAND_D_WALES = 2026
+_YEAR_BAND_D_SCOTLAND = 2025
+_YEAR_BAND_COUNT = 2025
+
+_BAND_COUNT_COLUMNS = {band: f"count_band_{band}" for band in "ABCDEFGHI"}
+
+_ENGLAND_REF = (
+    "https://www.gov.uk/government/statistics/"
+    "council-tax-levels-set-by-local-authorities-in-england-2026-to-2027"
+)
+_WALES_REF = "https://www.gov.wales/council-tax-levels-april-2026-march-2027-html"
+_SCOTLAND_REF = "https://www.gov.scot/publications/council-tax-datasets/"
+_VOA_REF = (
+    "https://www.gov.uk/government/statistics/council-tax-stock-of-properties-2025"
+)
+# Net council tax requirement per LA. England derived from MHCLG
+# Council Taxbase 2025 Table 1.35 ("Tax base after allowance for council
+# tax support") × LA Band D amount. Wales sourced directly from the
+# Welsh Government Table 3 "Council tax income (£m)" (already net).
+_NET_CT_REF_ENG = (
+    "https://www.gov.uk/government/statistics/council-taxbase-2025-in-england"
+)
+_NET_CT_REF_WAL = _WALES_REF
+
+
+@lru_cache(maxsize=1)
+def _load_table() -> pd.DataFrame | None:
+    """Return the committed LA council-tax table, or ``None`` if missing."""
+    csv_path = STORAGE / _CSV_NAME
+    if not csv_path.exists():
+        return None
+    return pd.read_csv(csv_path)
+
+
+def load_la_net_council_tax() -> pd.DataFrame:
+    """Load per-LA net council tax requirement (£, after CTR support).
+
+    Returns a DataFrame with columns ``code, total_council_tax_net``
+    for LAs where a directly-observed net figure is available
+    (England + Wales). Scotland and NI are absent; loss-matrix callers
+    should mask those cells rather than fabricating fallback values.
+    """
+    df = _load_table()
+    if df is None or df.empty:
+        return pd.DataFrame(columns=["code", "total_council_tax_net"])
+    if "total_council_tax_net" not in df.columns:
+        return pd.DataFrame(columns=["code", "total_council_tax_net"])
+    return df.loc[
+        df["total_council_tax_net"].notna(),
+        ["code", "total_council_tax_net"],
+    ].reset_index(drop=True)
+
+
+def _year_for_band_d(country: str) -> int:
+    if country == "WALES":
+        return _YEAR_BAND_D_WALES
+    if country == "SCOTLAND":
+        return _YEAR_BAND_D_SCOTLAND
+    return _YEAR_BAND_D_ENGLAND
+
+
+def _ref_for_band_d(country: str) -> str:
+    if country == "WALES":
+        return _WALES_REF
+    if country == "SCOTLAND":
+        return _SCOTLAND_REF
+    return _ENGLAND_REF
+
+
+def get_targets() -> list[Target]:
+    """Emit LA-level Band D amount + band-count targets."""
+    df = _load_table()
+    if df is None or df.empty:
+        return []
+
+    targets: list[Target] = []
+
+    # Band D amount targets: one per LA with a reported value.
+    for _, row in df.iterrows():
+        amount = row.get("band_d_amount")
+        if pd.isna(amount):
+            continue
+        code = str(row["code"])
+        country = str(row["country"])
+        targets.append(
+            Target(
+                name=f"ons/council_tax_band_d/{code}",
+                variable="council_tax_band_d_amount",
+                source="ons",
+                unit=Unit.GBP,
+                geographic_level=GeographicLevel.LOCAL_AUTHORITY,
+                geo_code=code,
+                geo_name=str(row["name"]),
+                values={_year_for_band_d(country): float(amount)},
+                reference_url=_ref_for_band_d(country),
+            )
+        )
+
+    # Band count targets: one per (LA, band) where VOA has a value.
+    for _, row in df.iterrows():
+        code = str(row["code"])
+        name = str(row["name"])
+        for band, col in _BAND_COUNT_COLUMNS.items():
+            count = row.get(col)
+            if pd.isna(count):
+                continue
+            targets.append(
+                Target(
+                    name=f"voa/council_tax/{code}/{band}",
+                    variable="council_tax_band",
+                    source="voa",
+                    unit=Unit.COUNT,
+                    geographic_level=GeographicLevel.LOCAL_AUTHORITY,
+                    geo_code=code,
+                    geo_name=name,
+                    values={_YEAR_BAND_COUNT: float(count)},
+                    is_count=True,
+                    reference_url=_VOA_REF,
+                )
+            )
+
+    # Net council tax £ targets: one per LA with an observed value.
+    # Mirrors the FRS net-of-CTR amount; pairs with the band targets
+    # above to cover both FRS council-tax data points.
+    if "total_council_tax_net" in df.columns:
+        for _, row in df.iterrows():
+            net = row.get("total_council_tax_net")
+            if pd.isna(net):
+                continue
+            country = str(row["country"])
+            ref = _NET_CT_REF_WAL if country == "WALES" else _NET_CT_REF_ENG
+            targets.append(
+                Target(
+                    name=f"housing/council_tax_net/{row['code']}",
+                    variable="council_tax_less_benefit",
+                    source="mhclg" if country == "ENGLAND" else "stats_wales",
+                    unit=Unit.GBP,
+                    geographic_level=GeographicLevel.LOCAL_AUTHORITY,
+                    geo_code=str(row["code"]),
+                    geo_name=str(row["name"]),
+                    values={_YEAR_BAND_D_ENGLAND: float(net)},
+                    reference_url=ref,
+                )
+            )
+
+    return targets

policyengine_uk_data/tests/test_calibrate_save.py

Lines changed: 39 additions & 0 deletions
@@ -120,3 +120,42 @@ def test_calibrate_local_areas_saves_weights_in_nonverbose_branch(
     # Verify the saved weights have the area_count x n_households shape
     # produced by the calibrator.
     assert weights.shape == (2, 4)
+
+
+def test_calibrate_local_areas_masks_nan_local_targets(tmp_path, monkeypatch):
+    """Sparse local targets should be allowed.
+
+    Local-authority sources are not available for every area/metric pair.
+    A NaN target means "do not train on this cell", not "propagate NaN
+    through the loss".
+    """
+
+    import h5py
+
+    from policyengine_uk_data.utils import calibrate as calibrate_module
+    from policyengine_uk_data.utils.calibrate import calibrate_local_areas
+
+    monkeypatch.setattr(calibrate_module, "STORAGE_FOLDER", tmp_path)
+
+    matrix_fn, national_matrix_fn = _make_toy_inputs(n_households=4, area_count=2)
+
+    def sparse_matrix_fn(dataset):
+        matrix, local_targets, country_mask = matrix_fn(dataset)
+        local_targets.iloc[1, 0] = np.nan
+        return matrix, local_targets, country_mask
+
+    weight_file = "toy_sparse_weights.h5"
+    calibrate_local_areas(
+        dataset=_StubDataset(np.array([1.0, 1.0, 1.0, 1.0])),
+        matrix_fn=sparse_matrix_fn,
+        national_matrix_fn=national_matrix_fn,
+        area_count=2,
+        weight_file=weight_file,
+        dataset_key="2025",
+        epochs=5,
+        verbose=False,
+    )
+
+    with h5py.File(tmp_path / weight_file, "r") as f:
+        weights = f["2025"][:]
+    assert np.isfinite(weights).all()
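The contract this test pins down, that a NaN target means "skip this cell" rather than "poison the loss", can be sketched with a small masked loss. The function `masked_relative_sq_loss` is a hypothetical stand-in written for illustration; it is not the loss used inside `calibrate_local_areas`, whose internals are not shown in this diff, and it assumes every non-NaN target is non-zero.

```python
import numpy as np


def masked_relative_sq_loss(pred: np.ndarray, target: np.ndarray) -> float:
    """Mean squared relative error over cells whose target is finite."""
    mask = np.isfinite(target)
    if not mask.any():
        return 0.0
    rel = (pred[mask] - target[mask]) / target[mask]
    return float(np.mean(rel**2))


# Two areas × two metrics; area 1 has no source for metric 0 (toy numbers).
target = np.array([[100.0, 50.0], [np.nan, 40.0]])
pred = np.array([[110.0, 50.0], [999.0, 30.0]])

loss = masked_relative_sq_loss(pred, target)
naive = float(np.mean(((pred - target) / target) ** 2))

print(loss)   # finite: the NaN cell is ignored
print(naive)  # nan: the unmasked mean propagates the missing cell
```

The prediction for the masked cell (999.0 here) has no influence on the result, which is the behaviour the sparse-target test expects from the calibrator.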
