Background
After PR #374 (commit 96f5707), the council-tax blocks in policyengine_uk_data/datasets/local_areas/local_authorities/loss.py use NaN-masking for cells where no direct source is available:
- voa/council_tax/{A..H} → np.where(has_count, direct, np.nan)
- housing/council_tax_net → np.where(has_ct_net, direct, np.nan)
The calibrator (utils/calibrate.py) was updated to mask NaN cells out of the loss, so missing-source LAs simply don't contribute to training on those targets.
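To make the masking concrete, here is a minimal NumPy sketch of a loss that skips NaN target cells. It is illustrative only: the function name masked_relative_loss and the squared-relative-error form are assumptions, not the actual code in utils/calibrate.py.

```python
import numpy as np


def masked_relative_loss(pred: np.ndarray, target: np.ndarray) -> float:
    """Mean squared relative error over cells whose target is not NaN.

    Hypothetical helper for illustration; not the repository's loss.
    """
    mask = ~np.isnan(target)  # True where a direct source exists
    if not mask.any():
        return 0.0  # nothing observed, so nothing to fit
    # Swap NaN targets for 1.0 so the arithmetic stays finite; those
    # cells are dropped by the mask before averaging.
    safe_target = np.where(mask, target, 1.0)
    # The +1 in the denominator guards against zero-valued targets.
    rel_err = (pred - safe_target) / (np.abs(safe_target) + 1.0)
    return float(np.mean(rel_err[mask] ** 2))


# Three LAs; the third has no direct source for this target.
target = np.array([100.0, 200.0, np.nan])
pred = np.array([110.0, 190.0, 5000.0])
loss = masked_relative_loss(pred, target)
```

Whatever the model predicts for the third LA, the loss is unchanged, which is the sense in which a missing-source LA "doesn't contribute" on that target.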
Inconsistency
The other LA-level blocks in the same file still use the national-share fallback pattern:
# tenure/* (English Housing Survey: England-only)
y[f"tenure/{tenure_key}"] = np.where(
    has_tenure, targets.values, national * la_household_share
)
# rent/private_rent (VOA private rents: England + Wales)
y["rent/private_rent"] = np.where(
    has_rent, target.values, national_rent * la_household_share
)
# ons/equiv_net_income_* (ONS small-area income: England + Wales)
y["ons/equiv_net_income_bhc"] = np.where(
    has_ons_data, target.values, national_bhc * la_household_share
)
For LAs with missing source data (Wales / Scotland / NI for tenure; Scotland / NI for rent and ONS income), these blocks fabricate a target value as the LA's household share of the national total (national * la_household_share), rather than masking the cell out.
This means the LA reweighter currently follows two coexisting policies:
- Council tax: only train on directly observed cells.
- Tenure / rent / ONS income: train on observed cells plus fabricated national-share fallbacks.
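The difference between the two policies can be seen on toy numbers. The sketch below uses made-up values; the names direct, has_data, national_total and la_household_share mirror the naming in the excerpt above but are hypothetical stand-ins, not the real data.

```python
import numpy as np

# Toy inputs for four LAs; the last two have no direct source.
direct = np.array([120.0, 80.0, 0.0, 0.0])  # placeholder 0.0 where missing
has_data = np.array([True, True, False, False])
national_total = 1_000.0
la_household_share = np.array([0.10, 0.05, 0.02, 0.03])

# Fallback policy (tenure / rent / ONS income): fill gaps with a
# household-share slice of the national total.
y_fallback = np.where(has_data, direct, national_total * la_household_share)

# Masking policy (council tax): leave gaps as NaN so the calibrator
# can drop them from the loss.
y_masked = np.where(has_data, direct, np.nan)

# Cells that carry a target only under the fallback policy.
n_fabricated = int((~has_data).sum())
```

Under the fallback, the two unobserved LAs receive targets of roughly 20 and 30, and the optimiser treats them exactly like observed cells; under masking they carry no target at all.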
Question
Is the council-tax NaN-masking approach the new standard for all LA blocks, or is the national-share fallback intentional for the older blocks?
If it is the new standard, the existing tenure / rent / ONS-income blocks should be migrated to NaN-masking too (same shape: np.where(has_data, direct, np.nan)).
If the older approach is intentional for those targets (e.g., a national-share fallback is acceptable for tenure mix percentages because mix patterns vary less across countries than council tax band distributions), the council-tax block comment should say so explicitly, and ideally the reasoning should be captured in a short policy note alongside loss.py.
Proposed actions
Two tractable paths:
- Unify on NaN-masking. Migrate the tenure / rent / ONS-income blocks to np.where(has_data, direct, np.nan) and rely on the calibrator's NaN-masking. Pros: consistent, no fabricated targets anywhere. Cons: behaviour change for all callers; needs a calibration-quality check before/after to confirm residuals don't degrade.
- Document the asymmetry. Add a short comment at the top of loss.py (or in utils/calibrate.py) explaining when each pattern applies and why. Pros: tiny, no behaviour change. Cons: leaves the inconsistency in place for the next PR adding an LA target.
Either path is fine. The current state (no comment, no consistency) leaves the next PR author guessing.
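For the first path, the before/after check could compare fit only on cells that have a direct source, since fabricated cells have no ground truth to compare against. A rough sketch follows; the function name and all numbers are hypothetical.

```python
import numpy as np


def observed_mean_abs_rel_error(pred: np.ndarray, target: np.ndarray) -> float:
    """Mean absolute relative error over observed (non-NaN, non-zero) targets."""
    mask = ~np.isnan(target) & (target != 0)
    if not mask.any():
        return float("nan")
    return float(np.mean(np.abs(pred[mask] - target[mask]) / np.abs(target[mask])))


# Targets with NaN where no direct source exists (made-up values).
target = np.array([100.0, 200.0, np.nan, 50.0])

# Model outputs from a run with fallbacks and a run with NaN-masking
# (both invented for illustration).
pred_before = np.array([110.0, 180.0, 40.0, 55.0])
pred_after = np.array([105.0, 190.0, 70.0, 52.0])

err_before = observed_mean_abs_rel_error(pred_before, target)
err_after = observed_mean_abs_rel_error(pred_after, target)

# True would flag a regression on the observed cells.
degraded = err_after > err_before
```

Restricting the comparison to observed cells keeps the metric identical across both runs, so a change in the number reflects a change in fit rather than a change in which cells are counted.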
Related
- […] VALIDATION_TARGETS and excluded from training entirely.
cc @MaxGhenis @vahid-ahmadi