LA calibration: unify missing-source handling across loss.py blocks #381

## Background

After PR #374 (commit `96f5707`), the council-tax blocks in `policyengine_uk_data/datasets/local_areas/local_authorities/loss.py` use **NaN-masking** for cells where no direct source is available:

- `voa/council_tax/{A..H}` → `np.where(has_count, direct, np.nan)`
- `housing/council_tax_net` → `np.where(has_ct_net, direct, np.nan)`

The calibrator (`utils/calibrate.py`) was updated to mask NaN cells out of the loss, so missing-source LAs simply don't contribute to training on those targets.
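To make the masking concrete, here is a minimal NumPy sketch of a loss that skips NaN targets. It is illustrative only: `masked_mse` is a hypothetical name, and the actual code in `utils/calibrate.py` may use a different loss form or array library.

```python
import numpy as np


def masked_mse(pred, target):
    """Mean squared error over cells whose target is not NaN.

    Hypothetical helper for illustration; not the calibrator's real API.
    """
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)

    # Cells with a NaN target have no direct source, so they are dropped.
    observed = ~np.isnan(target)
    if not observed.any():
        # Nothing observed: this target contributes no loss at all.
        return 0.0

    squared_error = (pred[observed] - target[observed]) ** 2
    return float(squared_error.mean())
```

With this shape, an LA whose target cell is NaN has no influence on the loss, whatever the current estimate for that cell is.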

## Inconsistency

The other LA-level blocks in the same file still use the **national-share fallback** pattern:

```python
# tenure/*  (English Housing Survey — England-only)
y[f"tenure/{tenure_key}"] = np.where(
    has_tenure, targets.values, national * la_household_share
)

# rent/private_rent  (VOA private rents — England + Wales)
y["rent/private_rent"] = np.where(
    has_rent, target.values, national_rent * la_household_share
)

# ons/equiv_net_income_*  (ONS small-area income — England + Wales)
y["ons/equiv_net_income_bhc"] = np.where(
    has_ons_data, target.values, national_bhc * la_household_share
)
```

For LAs with missing source data (Wales / Scotland / NI for tenure; Scotland / NI for rent and ONS income), these blocks **fabricate** a target value as a household-share-weighted slice of the national total (`national * la_household_share`), rather than masking the cell out.

This means the LA reweighter currently follows two coexisting policies:
- Council tax: only train on directly observed cells.
- Tenure / rent / ONS income: train on observed cells **plus** fabricated national-share fallbacks.
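A toy example of how the two policies differ for a single target. All numbers and array names here are made up for illustration and are not taken from `loss.py`:

```python
import numpy as np

# Three hypothetical LAs; the third has no direct source for this target.
has_data = np.array([True, True, False])
direct = np.array([210.0, 90.0, 0.0])  # placeholder 0.0 where no source exists
national_total = 400.0
la_household_share = np.array([0.5, 0.25, 0.25])

# National-share fallback: the missing LA gets a fabricated target.
fallback_target = np.where(has_data, direct, national_total * la_household_share)
# -> [210., 90., 100.]

# NaN-masking: the missing LA gets NaN and is dropped from the loss.
masked_target = np.where(has_data, direct, np.nan)
# -> [210., 90., nan]
```

Under the fallback, the third LA is pulled towards 100 even though nothing was observed there. Under masking, it is left unconstrained on this target.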

## Question

Is the council-tax NaN-masking approach the new standard for all LA blocks, or is the national-share fallback intentional for the older blocks?

If the new standard, the existing tenure / rent / ONS-income blocks should be migrated to NaN-masking too (same shape: `np.where(has_data, direct, np.nan)`).

If the older approach is intentional for those targets (e.g. because tenure mix varies less across countries than council tax band distributions, so a national-share fallback is an acceptable approximation), the comment on the council-tax block should say so explicitly, and the reasoning should ideally be captured in a short policy note alongside `loss.py`.

## Proposed actions

Two tractable paths:

1. **Unify on NaN-masking.** Migrate the tenure / rent / ONS-income blocks to `np.where(has_data, direct, np.nan)` and rely on the calibrator's NaN-masking. Pros: consistent, no fabricated targets anywhere. Cons: behaviour change for all callers; needs a calibration-quality check before/after to confirm residuals don't degrade.

2. **Document the asymmetry.** Add a short comment at the top of `loss.py` (or in `utils/calibrate.py`) explaining when each pattern applies and why. Pros: tiny, no behaviour change. Cons: leaves the inconsistency in place for the next PR adding an LA target.

Either path is fine. The current state (no comment, no consistency) forces the next PR author to guess which pattern to follow.
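For path 1, one possible shape for the before/after check is to compare error on directly observed cells only, since those are the cells both policies agree on. This is a sketch under assumptions: `observed_mean_abs_rel_error` is a hypothetical helper, the arrays are invented, and it assumes observed targets are non-zero.

```python
import numpy as np


def observed_mean_abs_rel_error(estimate, direct, has_data):
    """Mean absolute relative error over directly observed cells only.

    Hypothetical helper; assumes direct values are non-zero where has_data.
    """
    estimate = np.asarray(estimate, dtype=float)
    direct = np.asarray(direct, dtype=float)
    has_data = np.asarray(has_data, dtype=bool)

    if not has_data.any():
        return float("nan")

    rel = np.abs(estimate[has_data] - direct[has_data]) / np.abs(direct[has_data])
    return float(rel.mean())


# Invented estimates from a run with fallbacks and a run with NaN-masking.
has_data = np.array([True, True, False])
direct = np.array([100.0, 100.0, 0.0])
estimate_before = np.array([110.0, 90.0, 50.0])
estimate_after = np.array([105.0, 95.0, 70.0])

error_before = observed_mean_abs_rel_error(estimate_before, direct, has_data)
error_after = observed_mean_abs_rel_error(estimate_after, direct, has_data)
```

If `error_after` is not worse than `error_before` across the migrated targets, that is some evidence that dropping the fabricated cells does not degrade fit on the observed ones.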

## Related

- #371 — uses a third pattern: derived proxy added to `VALIDATION_TARGETS` and excluded from training entirely.
- #374 — introduced the NaN-masking pattern and the calibrator-side mask handling.

cc @MaxGhenis @vahid-ahmadi
