
Commit fa5d8fd

vahid-ahmadi and claude committed
Add per-area H5 publishing (Phase 6 of OA calibration pipeline)
Extract per-area H5 subsets from sparse L0-calibrated weights. Each H5 contains only active households (non-zero weight after pruning) with linked person and benunit rows. Supports constituency and LA area types. Wired into create_datasets.py after calibration.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
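The extraction described in the commit message (keep an area's non-zero-weight households, then pull in their linked persons and benunits) can be sketched in plain Python. This is an illustration under stated assumptions, not the module's code: tables are lists of dicts, the column names `household_id`, `household_weight`, `person_household_id`, `person_benunit_id` and `benunit_id` are hypothetical stand-ins for the dataset's real schema, and the HDF5 writing step is omitted.

```python
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, object]


def extract_active_subset(
    households: Sequence[Row],
    persons: Sequence[Row],
    benunits: Sequence[Row],
    weights: Sequence[float],
    area_rows: Sequence[int],
) -> Tuple[List[Row], List[Row], List[Row]]:
    """Keep an area's households with non-zero weight, plus their linked rows.

    Hypothetical schema (assumption, not the real dataset layout):
    households have `household_id`; persons have `person_household_id`
    and `person_benunit_id`; benunits have `benunit_id`.
    """
    # Households assigned to this area that survived L0 pruning; a new dict
    # is built per row so the input table is not mutated.
    active_hh = [
        {**households[i], "household_weight": weights[i]}
        for i in area_rows
        if weights[i] != 0
    ]
    active_ids = {hh["household_id"] for hh in active_hh}

    # Person rows follow their household via the household foreign key.
    active_persons = [p for p in persons if p["person_household_id"] in active_ids]

    # A benefit unit is kept if any retained person belongs to it.
    active_bu_ids = {p["person_benunit_id"] for p in active_persons}
    active_benunits = [b for b in benunits if b["benunit_id"] in active_bu_ids]

    return active_hh, active_persons, active_benunits
```

Deriving benunits from the retained persons means the sketch needs no household column on the benunit table; a pruned household drops out together with all of its persons and benunits.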
1 parent 3bdacad commit fa5d8fd

4 files changed

Lines changed: 778 additions & 3 deletions


docs/oa_calibration_pipeline.md

Lines changed: 13 additions & 3 deletions
@@ -126,11 +126,21 @@ LA and constituency are parallel — a constituency can span multiple LAs and vi
 ---
 
 ### Phase 6: Local Area Publishing
-**Status: Not Started**
+**Status: Complete**
 
-Generate per-area H5 files from sparse weights. Modal integration for scale.
+Generate per-area H5 files from sparse L0-calibrated weights.
 
 **Deliverables:**
-- `policyengine_uk_data/calibration/publish_local_h5s.py`
+- `policyengine_uk_data/calibration/publish_local_h5s.py` — extracts per-area H5 subsets from the sparse weight vector; each H5 contains only active households (non-zero weight) with their calibrated weights, plus the linked person and benunit rows
+- `datasets/create_datasets.py` — publish step wired in after calibration, before downrating
+- `tests/test_publish_local_h5s.py` — 13 tests covering area-household mapping, H5 structure, pruned-household exclusion, weight correctness, person/benunit FK integrity, full publish cycle, summary statistics, and validation
+
+**Key design:**
+- `_get_area_household_indices()`: maps each area code to its household row indices via OA geography columns from clone-and-assign
+- `publish_area_h5()`: writes a single H5 per area — filters to active (non-zero weight) households, extracts linked persons and benunits via FK joins, stores as HDF5 groups with metadata attributes
+- `publish_local_h5s()`: orchestrates the full publish cycle — loads L0 weight vector, iterates over all areas, writes H5 files to `storage/local_h5s/{area_type}/`, produces `_summary.csv` with per-area statistics
+- `validate_local_h5s()`: post-publish validation checking file existence, HDF5 structure, and cross-area household ID uniqueness
+- Supports both constituency (650) and LA (360) area types
+- Zero-weight households (L0-pruned) are excluded from area H5 files — only active records are published
 
 **US reference:** PR #465 (modal)
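The area-to-rows mapping done by `_get_area_household_indices()` amounts to a group-by over the area code attached to each household row. The sketch below assumes a hypothetical flat list `area_codes` holding one constituency or LA code per household row; the actual function reads the OA geography columns produced by clone-and-assign, whose names are not given here, and the codes in the usage are placeholders.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def get_area_household_indices(area_codes: Sequence[str]) -> Dict[str, List[int]]:
    """Group household row indices by the area code assigned to each row.

    Assumption: area_codes[i] is the constituency or LA code that
    clone-and-assign attached to household row i (hypothetical layout).
    """
    indices: Dict[str, List[int]] = defaultdict(list)
    for row, code in enumerate(area_codes):
        indices[code].append(row)
    # Return a plain dict so unknown codes raise KeyError instead of
    # silently creating empty entries.
    return dict(indices)


if __name__ == "__main__":
    print(get_area_household_indices(["C1", "C2", "C1"]))
```

If every household row carries exactly one code per area type, the returned lists partition the rows, which is what lets each area file be built from its own index list independently.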

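`publish_local_h5s()` writes a `_summary.csv` with per-area statistics, and `validate_local_h5s()` checks that no household ID appears in more than one area. A minimal stdlib sketch of both steps follows, assuming each area's active household IDs and weights are already in memory. The helper names and the CSV columns (`area_code`, `n_households`, `total_weight`) are hypothetical; the real summary may record different statistics.

```python
import csv
import io
from collections import Counter
from typing import Dict, List, Sequence

FIELDS = ["area_code", "n_households", "total_weight"]  # hypothetical columns


def summary_rows(area_weights: Dict[str, Sequence[float]]) -> List[dict]:
    """One row per area: count and weight total of its active households."""
    return [
        {"area_code": code, "n_households": len(ws), "total_weight": sum(ws)}
        for code, ws in sorted(area_weights.items())
    ]


def write_summary_csv(rows: List[dict]) -> str:
    """Serialise summary rows as CSV text (a file path would work the same way)."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def duplicate_household_ids(area_household_ids: Dict[str, Sequence[int]]) -> List[int]:
    """Household IDs that occur in more than one area (should be empty)."""
    counts = Counter(hid for ids in area_household_ids.values() for hid in ids)
    return sorted(hid for hid, n in counts.items() if n > 1)
```

Counting IDs across all areas in one pass keeps the uniqueness check linear in the total number of published households.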