
Update SPI ingestion to 2022-23 #412

Merged
MaxGhenis merged 9 commits into main from codex/update-ukds-inputs-2024 on May 24, 2026


@MaxGhenis MaxGhenis commented May 24, 2026

Summary

  • switch SPI private prerequisite downloads from spi_2020_21.zip to the existing spi_2022_23.zip private HF artifact
  • centralize the current SPI release name, tab filename, fiscal year, and H5 filename in datasets/spi.py
  • update income imputation to train from put2223uk.tab
  • scope the cached income QRF model by SPI release metadata so old 2020-21-trained income.pkl files are not silently reused
  • make SPI age and region handling tolerate current-file codes like AGERANGE=-1 and GORCODE values outside 1-12 without silently mapping them to London
  • regenerate storage/incomes_projection.csv from the 2022-23 SPI input and make the projection builder validate/rebuild stale SPI H5 files before use
  • keep full income model training at 100k SPI draws, while reducing the TESTING=1 data-build sample to 10k so PR CI stays bounded
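The age and region bullet above can be illustrated with a minimal sketch. It assumes an explicit "unknown" sentinel for out-of-range codes; the PR text does not say what the real code maps them to, only that they no longer fall through to London. The function names, the sentinel, and both lookup tables are hypothetical and are not copied from the SPI documentation or the repository.

```python
from typing import Optional, Tuple

# Illustrative lookup only (assumption): the real GORCODE table lives in the
# SPI documentation and may use different labels.
REGION_BY_GORCODE = {
    1: "NORTH_EAST",
    2: "NORTH_WEST",
    3: "YORKSHIRE",
    4: "EAST_MIDLANDS",
    5: "WEST_MIDLANDS",
    6: "EAST_OF_ENGLAND",
    7: "LONDON",
    8: "SOUTH_EAST",
    9: "SOUTH_WEST",
    10: "WALES",
    11: "SCOTLAND",
    12: "NORTHERN_IRELAND",
}
UNKNOWN_REGION = "UNKNOWN"  # hypothetical sentinel

# Hypothetical age bands keyed by AGERANGE code; the real bands are not
# given in the PR description.
AGE_BOUNDS_BY_AGERANGE = {
    1: (16, 25),
    2: (25, 35),
    3: (35, 45),
}


def region_from_gorcode(code: int) -> str:
    """Map a GORCODE to a region, never defaulting to a real region."""
    return REGION_BY_GORCODE.get(code, UNKNOWN_REGION)


def age_bounds(agerange: int) -> Optional[Tuple[int, int]]:
    """Return (lower, upper) for a known AGERANGE, or None for codes like -1."""
    return AGE_BOUNDS_BY_AGERANGE.get(agerange)
```

Returning a sentinel or `None` leaves the decision about unknown codes to the caller (drop, impute, or raise) instead of hiding it inside a default.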

Context

policyengine/policyengine-uk-data-private already contains spi_2022_23.zip and spi_2022_23.h5, but the income-imputation donor path still pointed at spi_2020_21/put2021uk.tab. I did not find an open issue specifically for that donor-file switch.
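A minimal sketch of how a single release record could drive the donor path, the cached model name, and the sample size, so that a stale path or a stale `income.pkl` cannot linger after the next release bump. The class, field, and function names are hypothetical, the directory layout is assumed from the old `spi_2020_21/put2021uk.tab` path, and the fiscal-year representation is a guess; only the filenames and the 100k/10k draw counts come from this PR description.

```python
import hashlib
import json
import os
from dataclasses import asdict, dataclass
from pathlib import Path
from typing import Mapping


@dataclass(frozen=True)
class SPIRelease:
    """Hypothetical record for the values centralized in datasets/spi.py."""

    name: str
    tab_filename: str
    fiscal_year: int  # assumption: start year of the fiscal year
    h5_filename: str


CURRENT_SPI = SPIRelease(
    name="spi_2022_23",
    tab_filename="put2223uk.tab",
    fiscal_year=2022,
    h5_filename="spi_2022_23.h5",
)


def donor_tab_path(root: Path, release: SPIRelease) -> Path:
    """Build the donor .tab path from the release record (layout assumed)."""
    return root / release.name / release.tab_filename


def release_fingerprint(release: SPIRelease) -> str:
    """Short, stable hash of the release metadata."""
    payload = json.dumps(asdict(release), sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()[:12]


def cached_model_path(storage: Path, release: SPIRelease) -> Path:
    """Cache filename that changes whenever the release metadata changes."""
    return storage / f"income_{release.name}_{release_fingerprint(release)}.pkl"


def training_sample_size(env: Mapping[str, str] = os.environ) -> int:
    """100k SPI draws for full training, 10k when TESTING=1."""
    return 10_000 if env.get("TESTING") == "1" else 100_000
```

With this shape, a model trained on the 2020-21 file lands under a different filename from one trained on 2022-23, so loading the cache for the current release cannot pick up the old model by accident.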

Related open issues are about target vintages rather than this private prerequisite path:

This PR advances the current SPI microdata input to 2022-23 and regenerates the checked-in SPI income projections from that base. The remaining stale non-SPI private inputs are tracked separately in #411.

Validation

  • TESTING=1 uv run --python 3.13 pytest policyengine_uk_data/tests/test_income_projection.py policyengine_uk_data/tests/test_spi_build.py policyengine_uk_data/tests/test_frs_prerequisites.py -q
  • uv run --python 3.13 ruff format --check policyengine_uk_data/datasets/imputations/income.py policyengine_uk_data/utils/incomes_projection.py policyengine_uk_data/tests/test_spi_build.py
  • git diff --check

@MaxGhenis MaxGhenis marked this pull request as ready for review May 24, 2026 04:02
@MaxGhenis MaxGhenis merged commit 2ca20b8 into main May 24, 2026
4 checks passed
@MaxGhenis MaxGhenis deleted the codex/update-ukds-inputs-2024 branch May 24, 2026 04:02
