PolicyEngine's project to build accurate UK household survey data.
For a full local dataset build:
- Ensure the private prerequisite folders exist under policyengine_uk_data/storage/.
- Use Python 3.13. Python 3.14 currently fails while loading PyTables/Blosc2 in this repo.
- Prefer the sibling policyengine-uk checkout when building locally, because the published wheel in your active environment may not expose all variables required by the data pipeline.
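
The prerequisites above can be checked before starting a long build. The following is a minimal sketch, not part of the repository: `preflight_problems` is a hypothetical helper, and it only tests that the storage directory exists, because the names of the private prerequisite folders are not listed here.

```python
import sys
from pathlib import Path


def preflight_problems(
    repo_root: Path,
    version: tuple[int, int] = sys.version_info[:2],
) -> list[str]:
    """Return human-readable problems; an empty list means all checks passed.

    Hypothetical helper for illustration only.
    """
    problems = []
    # The build is documented to work on Python 3.13 and to fail on 3.14.
    if version != (3, 13):
        problems.append(f"Python 3.13 expected, found {version[0]}.{version[1]}")
    # Only the parent storage directory is checked; the private folders
    # inside it are not named in this README.
    if not (repo_root / "policyengine_uk_data" / "storage").is_dir():
        problems.append("policyengine_uk_data/storage/ is missing")
    # The sibling checkout is expected next to this repository.
    if not (repo_root.parent / "policyengine-uk").is_dir():
        problems.append("sibling ../policyengine-uk checkout not found")
    return problems


if __name__ == "__main__":
    for problem in preflight_problems(Path.cwd()):
        print(f"WARNING: {problem}")
```

Run it from the repository root; no output means the three checks passed.
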
If ../policyengine-uk exists, you can run:

    make data-local

When the pipeline is extended to produce a sequence of yearly snapshots (see issue #345), three identifier columns are the panel keys that link rows across years:

| Table | ID column |
|---|---|
| household | household_id |
| benunit | benunit_id |
| person | person_id |

These IDs are deterministic functions of the FRS sernum (see
policyengine_uk_data/datasets/frs.py) and must be preserved byte-for-byte
by every downstream transformation so that snapshot Y and snapshot Y + 1
can be joined on them. Use
policyengine_uk_data.utils.panel_ids.assert_panel_id_consistency to check
this invariant when adding new save-time or ageing logic.
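
To illustrate what the invariant means in practice, here is a minimal, dependency-free sketch. It is not the repository's `assert_panel_id_consistency`, whose signature is not shown in this README. `changed_panel_ids`, the dict-of-rows layout, and the ID values are all assumptions made for the example.

```python
# Hypothetical helper for illustration; NOT the repository's
# assert_panel_id_consistency.
PANEL_KEYS = {
    "household": "household_id",
    "benunit": "benunit_id",
    "person": "person_id",
}


def changed_panel_ids(before: dict, after: dict) -> dict:
    """Compare the ID column of each table before and after a transformation.

    `before` and `after` map a table name to a list of row dicts.
    Returns {table: sorted IDs present on only one side}; an empty dict
    means every panel ID survived unchanged.
    """
    changed = {}
    for table, key in PANEL_KEYS.items():
        ids_before = {row[key] for row in before[table]}
        ids_after = {row[key] for row in after[table]}
        difference = ids_before ^ ids_after  # symmetric difference
        if difference:
            changed[table] = sorted(difference)
    return changed


# Made-up snapshot: ageing changes a value but leaves the IDs alone.
snapshot = {
    "household": [{"household_id": 1001}],
    "benunit": [{"benunit_id": 100101}],
    "person": [{"person_id": 10010101, "age": 40}],
}
aged = {
    "household": [{"household_id": 1001}],
    "benunit": [{"benunit_id": 100101}],
    "person": [{"person_id": 10010101, "age": 41}],
}
assert changed_panel_ids(snapshot, aged) == {}
```

Because the sketch compares sets, it ignores row order and duplicate IDs; a stricter check would also compare row counts.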