Commit 3bdacad
vahid-ahmadi and claude committed
Add SQLite target database (Phase 5 of OA calibration pipeline)
Hierarchical target storage with two parallel geographic branches:
- Administrative: country → region → LA → MSOA → LSOA → OA
- Parliamentary: country → constituency

Schema: areas (geographic hierarchy), targets (definitions), target_values (year-indexed values).
ETL loads areas from OA crosswalk + area code CSVs, targets from registry + local CSVs.
Query API: get_targets(), get_area_targets(), get_area_children(), get_area_hierarchy().

12 tests all passing.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 4d50734 commit 3bdacad
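
For orientation, the three-table layout named in the commit message could look roughly like the sketch below. It is not the contents of `schema.py`: only the table names `areas`, `targets`, `target_values` and the `parent_code` column come from the commit. Every other column, the `create_schema` helper, and the sample codes and values are assumptions made for illustration.

```python
import sqlite3

# Hypothetical DDL. Only the three table names and `parent_code` are taken
# from the commit; all other columns are illustrative assumptions.
SCHEMA = """
CREATE TABLE IF NOT EXISTS areas (
    code        TEXT PRIMARY KEY,
    name        TEXT,
    level       TEXT NOT NULL,               -- e.g. 'country', 'region', 'la'
    parent_code TEXT REFERENCES areas(code)  -- NULL at the top of the tree
);
CREATE TABLE IF NOT EXISTS targets (
    id        INTEGER PRIMARY KEY,
    area_code TEXT NOT NULL REFERENCES areas(code),
    variable  TEXT NOT NULL,
    source    TEXT
);
CREATE TABLE IF NOT EXISTS target_values (
    target_id INTEGER NOT NULL REFERENCES targets(id),
    year      INTEGER NOT NULL,
    value     REAL NOT NULL,
    PRIMARY KEY (target_id, year)
);
"""


def create_schema(conn: sqlite3.Connection) -> None:
    """Create the three tables if they do not already exist."""
    conn.executescript(SCHEMA)


conn = sqlite3.connect(":memory:")
create_schema(conn)

# Made-up area codes: the LA hangs off a region, the constituency off the country.
conn.executemany(
    "INSERT INTO areas (code, name, level, parent_code) VALUES (?, ?, ?, ?)",
    [
        ("C1", "Example country", "country", None),
        ("R1", "Example region", "region", "C1"),
        ("LA1", "Example LA", "la", "R1"),
        ("PC1", "Example constituency", "constituency", "C1"),
    ],
)
conn.execute(
    "INSERT INTO targets (id, area_code, variable, source) VALUES (?, ?, ?, ?)",
    (1, "LA1", "example_variable", "example_source"),
)
conn.executemany(
    "INSERT INTO target_values (target_id, year, value) VALUES (?, ?, ?)",
    [(1, 2024, 100.0), (1, 2025, 110.0)],
)

# Filter by geographic level and year, joining definitions to their values.
rows = conn.execute(
    """
    SELECT a.code, t.variable, v.year, v.value
    FROM target_values AS v
    JOIN targets AS t ON t.id = v.target_id
    JOIN areas AS a ON a.code = t.area_code
    WHERE a.level = ? AND v.year = ?
    """,
    ("la", 2025),
).fetchall()
```

Keeping definitions in `targets` and year-indexed numbers in `target_values` means a new year only adds rows to the second table, which is one plausible reason for the split.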

6 files changed

Lines changed: 1022 additions & 4 deletions

docs/oa_calibration_pipeline.md

Lines changed: 17 additions & 4 deletions
@@ -100,13 +100,26 @@ Build sparse calibration matrix from cloned dataset, bridging Phase 2 (clone-and
 ---
 
 ### Phase 5: SQLite Target Database
-**Status: Not Started**
+**Status: Complete**
+
+Hierarchical target storage with two parallel geographic branches:
+- Administrative: country → region → LA → MSOA → LSOA → OA
+- Parliamentary: country → constituency
 
-Hierarchical target storage: UK → country → region → LA → constituency → MSOA → LSOA → OA.
+LA and constituency are parallel — a constituency can span multiple LAs and vice versa.
 
 **Deliverables:**
-- `policyengine_uk_data/db/` directory with ETL scripts
-- Migrate existing CSV/Excel targets into SQLite
+- `policyengine_uk_data/db/schema.py` — SQLite schema: `areas` (geographic hierarchy), `targets` (definitions), `target_values` (year-indexed values)
+- `policyengine_uk_data/db/etl.py` — ETL loading areas from OA crosswalk + area code CSVs, targets from registry + local CSV/XLSX sources
+- `policyengine_uk_data/db/query.py` — query API: `get_targets()`, `get_area_targets()`, `get_area_children()`, `get_area_hierarchy()`
+- `tests/test_target_db.py` — tests covering schema creation, area hierarchy, target loading, queries
+
+**Key design:**
+- Areas table with `parent_code` encoding hierarchy; LAs parent to regions, constituencies parent to countries
+- Targets loaded from two sources: registry (national/country/region via `get_all_targets()`) and local CSVs (constituency/LA age, income, UC, LA extras)
+- Query API supports filtering by geographic level, area code, variable, source, year
+- `get_area_hierarchy()` walks up the tree from any code (e.g. OA → LSOA → MSOA → LA → region → country)
+- Full rebuild via `python -m policyengine_uk_data.db.etl`
 
 **US reference:** PR #398 (treasury) + PR #488 (db-work)
 
policyengine_uk_data/db/__init__.py

Whitespace-only changes.
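
The "Key design" notes in the docs diff describe a `parent_code` column and a walk up the tree from any area code. One way this could be done is sketched below with a recursive CTE. The real signatures of `get_area_hierarchy()` and `get_area_children()` are not shown in this commit view, so `walk_up_hierarchy` and `list_children` are hypothetical stand-ins, and the `areas` columns and sample codes are assumptions.

```python
import sqlite3


def walk_up_hierarchy(conn: sqlite3.Connection, code: str) -> list[tuple[str, str]]:
    """Return (code, level) pairs from `code` up to the root, nearest first."""
    return conn.execute(
        """
        WITH RECURSIVE chain(code, level, parent_code, depth) AS (
            SELECT code, level, parent_code, 0 FROM areas WHERE code = ?
            UNION ALL
            SELECT a.code, a.level, a.parent_code, chain.depth + 1
            FROM areas AS a
            JOIN chain ON a.code = chain.parent_code
        )
        SELECT code, level FROM chain ORDER BY depth
        """,
        (code,),
    ).fetchall()


def list_children(conn: sqlite3.Connection, code: str) -> list[str]:
    """Return the codes of the direct children of `code`, sorted by code."""
    cur = conn.execute(
        "SELECT code FROM areas WHERE parent_code = ? ORDER BY code", (code,)
    )
    return [row[0] for row in cur]


# Hypothetical minimal table; only `parent_code` is named in the commit.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE areas (code TEXT PRIMARY KEY, level TEXT, parent_code TEXT)"
)
conn.executemany(
    "INSERT INTO areas VALUES (?, ?, ?)",
    [
        ("C1", "country", None),
        ("R1", "region", "C1"),
        ("LA1", "la", "R1"),
        ("MSOA1", "msoa", "LA1"),
        ("LSOA1", "lsoa", "MSOA1"),
        ("OA1", "oa", "LSOA1"),
        ("PC1", "constituency", "C1"),  # parliamentary branch
    ],
)

admin_chain = walk_up_hierarchy(conn, "OA1")
parl_chain = walk_up_hierarchy(conn, "PC1")
country_children = list_children(conn, "C1")
```

Because each area stores a single `parent_code`, the walk from a constituency reaches the country directly and never passes through an LA or region, which matches the two parallel branches described in the docs.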