Merged
60 commits
8ef214a
Convert SPI dataset to function
nikhilwoodruff Jul 13, 2025
482f577
Make progress on new entity dataset schema
nikhilwoodruff Jul 13, 2025
bf40ecd
Get extended FRS working
nikhilwoodruff Jul 13, 2025
495a824
Add imputed disability
nikhilwoodruff Jul 14, 2025
314c081
Add random variables
nikhilwoodruff Jul 14, 2025
46f17ad
Re-add datasets
nikhilwoodruff Jul 14, 2025
15d1a99
Fix bug
nikhilwoodruff Jul 14, 2025
83b0503
Add fixes
nikhilwoodruff Jul 14, 2025
794cf5a
Add changes
nikhilwoodruff Jul 14, 2025
839790d
Get calibration working!!
nikhilwoodruff Jul 14, 2025
5a185fb
Add changes
nikhilwoodruff Jul 14, 2025
0c2bf62
Add changes
nikhilwoodruff Jul 14, 2025
29063ae
Remove notebooks
nikhilwoodruff Jul 17, 2025
3fba11c
Merge branch 'main' of https://github.com/PolicyEngine/policyengine-u…
nikhilwoodruff Jul 17, 2025
f085b40
Adjust make test
nikhilwoodruff Jul 27, 2025
d493d4d
Merge branch 'main' of https://github.com/PolicyEngine/policyengine-u…
nikhilwoodruff Jul 27, 2025
4eba1dc
Versioning
nikhilwoodruff Jul 27, 2025
397b7b9
Fix action
nikhilwoodruff Jul 27, 2025
e576b4f
SingleYearDataset
nikhilwoodruff Jul 27, 2025
de76835
Use correct file
nikhilwoodruff Jul 27, 2025
bf417bb
Format
nikhilwoodruff Jul 27, 2025
a109374
Add changes
nikhilwoodruff Jul 28, 2025
e69dbbf
Fix bug in current education
nikhilwoodruff Jul 28, 2025
4bc468b
Refactor
nikhilwoodruff Jul 28, 2025
d8df4c4
Remove decode to str
nikhilwoodruff Jul 28, 2025
cbc9142
Fix uprating segments
nikhilwoodruff Jul 28, 2025
a2b6a82
Bump UK
nikhilwoodruff Jul 28, 2025
2e525a4
Add return type
nikhilwoodruff Jul 28, 2025
5ab420f
Add local area CSVs
nikhilwoodruff Jul 28, 2025
fbce0f4
Update tests
nikhilwoodruff Jul 29, 2025
f067f02
Ensure consistency with national demographics
nikhilwoodruff Jul 30, 2025
74beb46
Fix bounds
nikhilwoodruff Jul 30, 2025
2d62c5e
Silly error
nikhilwoodruff Jul 30, 2025
d0c26d5
Multiply demographics b 1e6
nikhilwoodruff Jul 30, 2025
641b34e
Standardise microcalibrate usage
nikhilwoodruff Jul 31, 2025
e871214
Update tests
nikhilwoodruff Aug 4, 2025
1e6ebd2
Remove tile usage
nikhilwoodruff Aug 4, 2025
c6d8fb0
Merge branch 'main' of https://github.com/PolicyEngine/policyengine-u…
nikhilwoodruff Aug 4, 2025
24c723c
Update calibrate.py
nikhilwoodruff Aug 4, 2025
9773ac3
Revert to old reweighting (microcalibrate being difficult)
nikhilwoodruff Aug 4, 2025
d690c49
Consolidate code
nikhilwoodruff Aug 4, 2025
72eca84
Update to fix bugs
nikhilwoodruff Aug 5, 2025
959ce91
Get working hopefully!!
nikhilwoodruff Aug 5, 2025
bd46438
Fix tests
nikhilwoodruff Aug 5, 2025
4f3d2fc
Increase epochs
nikhilwoodruff Aug 5, 2025
1743517
Remove None epochs
nikhilwoodruff Aug 5, 2025
0c37612
Add public services imputations
nikhilwoodruff Aug 5, 2025
daaebc5
Add docstrings
nikhilwoodruff Aug 5, 2025
ef435c6
Update reform impacts
nikhilwoodruff Aug 5, 2025
9eee308
Adjust tests
nikhilwoodruff Aug 5, 2025
6429d9f
Add download function
nikhilwoodruff Aug 5, 2025
af43462
Reduce population
nikhilwoodruff Aug 5, 2025
d556f9c
Adjust tests
nikhilwoodruff Aug 5, 2025
a382a75
Remove stamp duty target
nikhilwoodruff Aug 5, 2025
8a8d2fb
Add rich terminal output
nikhilwoodruff Aug 5, 2025
c467ccc
Adjust targets
nikhilwoodruff Aug 5, 2025
f75da83
Format
nikhilwoodruff Aug 5, 2025
b1be746
Adjust targets
nikhilwoodruff Aug 6, 2025
35c30e2
Fix bug
nikhilwoodruff Aug 6, 2025
cf4c314
Bump test
nikhilwoodruff Aug 6, 2025
12 changes: 5 additions & 7 deletions .github/workflows/pull_request.yaml
@@ -20,15 +20,15 @@ jobs:
       - name: Set up Python
         uses: actions/setup-python@v5
         with:
-          python-version: 3.12
+          python-version: 3.13
       - name: Install dependencies
         run: |
           python -m pip install --upgrade pip
           pip install black
       - name: Check formatting
         run: black . -l 79 --check
   test:
-    name: Build and test
+    name: Test
     runs-on: ubuntu-latest
     env:
       HUGGING_FACE_TOKEN: ${{ secrets.HUGGING_FACE_TOKEN }}
@@ -42,21 +42,19 @@ jobs:
- name: Set up Python
uses: actions/setup-python@v5
with:
python-version: 3.12
python-version: 3.13
- name: Install package
run: make install-uv
run: uv pip install -e ".[dev]" --system
- name: Download data inputs
run: make download
env:
HUGGING_FACE_TOKEN: ${{ secrets.HUGGING_FACE_TOKEN }}
- name: Build datasets
run: make data
env:
DATA_LITE: true
- name: Save calibration log
uses: actions/upload-artifact@v4
with:
name: calibration_log.csv
path: calibration_log.csv
- name: Run tests
run: pytest
run: make test
8 changes: 4 additions & 4 deletions .github/workflows/push.yaml
@@ -18,7 +18,7 @@ jobs:
       - name: Set up Python
         uses: actions/setup-python@v5
         with:
-          python-version: 3.12
+          python-version: 3.13
       - name: Install dependencies
         run: |
           python -m pip install --upgrade pip
@@ -44,13 +44,13 @@ jobs:
       - name: Set up Python
         uses: actions/setup-python@v5
         with:
-          python-version: 3.12
+          python-version: 3.13
       - uses: "google-github-actions/auth@v2"
         with:
           workload_identity_provider: "projects/322898545428/locations/global/workloadIdentityPools/policyengine-research-id-pool/providers/prod-github-provider"
           service_account: "policyengine-research@policyengine-research.iam.gserviceaccount.com"
       - name: Install package
-        run: make install-uv
+        run: uv pip install -e ".[dev]" --system
       - name: Download data inputs
         run: make download
         env:
@@ -63,7 +63,7 @@ jobs:
           name: calibration_log.csv
           path: calibration_log.csv
       - name: Run tests
-        run: pytest
+        run: make test
       - name: Upload data
         run: make upload
         env:
2 changes: 1 addition & 1 deletion .gitignore
@@ -12,7 +12,7 @@
 !tax_benefit.csv
 !demographics.csv
 !incomes_projection.csv
-!policyengine_uk_data/datasets/frs/local_areas/**/*.csv
+!policyengine_uk_data/datasets/local_areas/**/*.csv
 **/_build
 !policyengine_uk_data/storage/*.csv
 **/version.json
26 changes: 3 additions & 23 deletions Makefile
@@ -7,41 +7,21 @@ test:
pytest .

install:
pip install policyengine-uk
pip install policyengine>=2.4
pip install -e ".[dev]" --config-settings editable_mode=compat

install-uv:
uv pip install --system "jupyter-book>=2.0.0a0"
uv pip install --system -e ".[dev]" --config-settings editable_mode=compat
uv pip install -e ".[dev]" --config-settings editable_mode=compat

download:
python policyengine_uk_data/storage/download_private_prerequisites.py

upload:
python policyengine_uk_data/storage/upload_completed_datasets.py

docker:
docker buildx build --platform linux/amd64 . -t policyengine-uk-data:latest

documentation:
pip install --pre "jupyter-book>=2"
jb clean docs && jb build docs
python docs/add_plotly_to_book.py docs

data:
python policyengine_uk_data/datasets/frs/dwp_frs.py
python policyengine_uk_data/datasets/frs/frs.py
python policyengine_uk_data/datasets/frs/extended_frs.py
python policyengine_uk_data/datasets/frs/enhanced_frs.py
python policyengine_uk_data/datasets/frs/local_areas/constituencies/calibrate.py
python policyengine_uk_data/datasets/frs/local_areas/local_authorities/calibrate.py
python policyengine_uk_data/utils/create_multi_year_dataset.py
python policyengine_uk_data/storage/migrate_to_uk_single_year_datasets.py

efrs:
python policyengine_uk_data/datasets/frs/enhanced_frs.py
python policyengine_uk_data/datasets/frs/local_areas/constituencies/calibrate.py
python policyengine_uk_data/datasets/frs/local_areas/local_authorities/calibrate.py
python policyengine_uk_data/datasets/create_datasets.py

build:
python -m build
4 changes: 4 additions & 0 deletions changelog_entry.yaml
@@ -0,0 +1,4 @@
+- bump: patch
+  changes:
+    changed:
+    - Moved to functional, simplified architecture.
7 changes: 7 additions & 0 deletions policyengine_uk_data/__init__.py
@@ -1 +1,8 @@
 from .datasets import *
+from .storage.download_private_prerequisites import (
+    download_prerequisites,
+    check_prerequisites,
+)
+
+# Check prerequisites on import and warn if missing
+check_prerequisites()
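For context on the import-time check above: the body of `check_prerequisites` is not part of this diff, so the following is only a minimal sketch of one way a "warn if missing" check could work. The `folder` and `required` parameters and the example file names are assumptions made for illustration, not the package's actual signature or file list.

```python
import warnings
from pathlib import Path

# Hypothetical file names; the real list lives in
# download_private_prerequisites.py, which this diff does not show.
EXAMPLE_REQUIRED_FILES = ["frs_2023_24.zip", "lcfs_2021_22.zip"]


def check_prerequisites(folder, required=EXAMPLE_REQUIRED_FILES):
    """Return the names of missing files, warning once if any are absent."""
    folder = Path(folder)
    missing = [name for name in required if not (folder / name).exists()]
    if missing:
        # Warn rather than raise, so importing the package still succeeds.
        warnings.warn(
            "Missing prerequisite files: " + ", ".join(missing),
            stacklevel=2,
        )
    return missing
```

Warning instead of raising keeps `import policyengine_uk_data` usable on machines that have not downloaded the private inputs yet, which appears to be the intent of the comment in the diff.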
11 changes: 0 additions & 11 deletions policyengine_uk_data/datasets/__init__.py

This file was deleted.

158 changes: 158 additions & 0 deletions policyengine_uk_data/datasets/create_datasets.py
@@ -0,0 +1,158 @@
from policyengine_uk_data.datasets.frs import create_frs
from policyengine_uk_data.storage import STORAGE_FOLDER
import logging
from policyengine_uk.data import UKSingleYearDataset
from policyengine_uk_data.utils.uprating import uprate_dataset
from policyengine_uk_data.utils.progress import (
    ProcessingProgress,
    display_success_panel,
    display_error_panel,
)

logging.basicConfig(level=logging.INFO)


def main():
    """Create enhanced FRS dataset with rich progress tracking."""
    try:
        progress_tracker = ProcessingProgress()

        # Define dataset creation steps
        steps = [
            "Create base FRS dataset",
            "Impute consumption",
            "Impute wealth",
            "Impute VAT",
            "Impute public service usage",
            "Impute income",
            "Impute capital gains",
            "Uprate to 2025",
            "Calibrate dataset",
            "Downrate to 2023",
            "Save final dataset",
        ]

        with progress_tracker.track_dataset_creation(steps) as (
            update_dataset,
            nested_progress,
        ):

            # Create base FRS dataset
            update_dataset("Create base FRS dataset", "processing")
            frs = create_frs(
                raw_frs_folder=STORAGE_FOLDER / "frs_2023_24",
                year=2023,
            )
            frs.save(STORAGE_FOLDER / "frs_2023_24.h5")
            update_dataset("Create base FRS dataset", "completed")

            # Import imputation functions
            from policyengine_uk_data.datasets.imputations import (
                impute_consumption,
                impute_wealth,
                impute_vat,
                impute_income,
                impute_capital_gains,
                impute_services,
            )

            # Apply imputations with progress tracking
            update_dataset("Impute consumption", "processing")
            frs = impute_consumption(frs)
            update_dataset("Impute consumption", "completed")

            update_dataset("Impute wealth", "processing")
            frs = impute_wealth(frs)
            update_dataset("Impute wealth", "completed")

            update_dataset("Impute VAT", "processing")
            frs = impute_vat(frs)
            update_dataset("Impute VAT", "completed")

            update_dataset("Impute public service usage", "processing")
            frs = impute_services(frs)
            update_dataset("Impute public service usage", "completed")

            update_dataset("Impute income", "processing")
            frs = impute_income(frs)
            update_dataset("Impute income", "completed")

            update_dataset("Impute capital gains", "processing")
            frs = impute_capital_gains(frs)
            update_dataset("Impute capital gains", "completed")

            # Uprate dataset
            update_dataset("Uprate to 2025", "processing")
            frs = uprate_dataset(frs, 2025)
            update_dataset("Uprate to 2025", "completed")

            # Calibrate dataset with nested progress
            from policyengine_uk_data.datasets.local_areas.constituencies.calibrate import (
                calibrate,
            )

            update_dataset("Calibrate dataset", "processing")

            # Use a separate progress tracker for calibration with nested display
            from policyengine_uk_data.utils.calibrate import (
                calibrate_local_areas,
            )
            from policyengine_uk_data.datasets.local_areas.constituencies.loss import (
                create_constituency_target_matrix,
                create_national_target_matrix,
            )
            from policyengine_uk_data.datasets.local_areas.constituencies.calibrate import (
                get_performance,
            )

            # Run calibration with verbose progress
            frs_calibrated = calibrate_local_areas(
                dataset=frs,
                matrix_fn=create_constituency_target_matrix,
                national_matrix_fn=create_national_target_matrix,
                area_count=650,
                weight_file="parliamentary_constituency_weights.h5",
                excluded_training_targets=[],
                log_csv="calibration_log.csv",
                verbose=True,  # Enable nested progress display
                area_name="Constituency",
                get_performance=get_performance,
                nested_progress=nested_progress,  # Pass the nested progress manager
            )

            update_dataset("Calibrate dataset", "completed")

            # Downrate and save
            update_dataset("Downrate to 2023", "processing")
            frs_calibrated = uprate_dataset(frs_calibrated, 2023)
            update_dataset("Downrate to 2023", "completed")

            update_dataset("Save final dataset", "processing")
            frs_calibrated.save(STORAGE_FOLDER / "enhanced_frs_2023_24.h5")
            update_dataset("Save final dataset", "completed")

        # Display success message
        display_success_panel(
            "Dataset creation completed successfully",
            details={
                "base_dataset": "frs_2023_24.h5",
                "enhanced_dataset": "enhanced_frs_2023_24.h5",
                "imputations_applied": "consumption, wealth, VAT, services, income, capital_gains",
                "calibration": "national and constituency targets",
            },
        )

    except Exception as e:
        display_error_panel(
            f"Dataset creation failed: {str(e)}",
            suggestions=[
                "Check that all required data files are present in storage folder",
                "Verify sufficient disk space for dataset creation",
                "Review log files for detailed error information",
            ],
        )
        raise


if __name__ == "__main__":
    main()
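
The imputation steps in `main()` all follow the same shape: mark the step as processing, apply a function that takes a dataset and returns a dataset, then mark it completed. As a possible follow-up (not something this PR does), that pattern could be driven from a list of `(label, function)` pairs. The sketch below is self-contained and uses plain dictionaries and toy functions as stand-ins; `run_steps`, `add_consumption` and `add_wealth` are hypothetical names, and the `update` callback only mirrors the two-argument form of `update_dataset` used in the diff.

```python
from typing import Callable, List, Tuple

Dataset = dict  # stand-in for UKSingleYearDataset in this sketch
Step = Tuple[str, Callable[[Dataset], Dataset]]


def run_steps(
    dataset: Dataset,
    steps: List[Step],
    update: Callable[[str, str], None],
) -> Dataset:
    """Apply each step in order, reporting status before and after."""
    for label, step in steps:
        update(label, "processing")
        dataset = step(dataset)
        update(label, "completed")
    return dataset


# Toy steps standing in for impute_consumption, impute_wealth, and so on.
def add_consumption(ds: Dataset) -> Dataset:
    return {**ds, "consumption": 1.0}


def add_wealth(ds: Dataset) -> Dataset:
    return {**ds, "wealth": 2.0}


events = []
result = run_steps(
    {"income": 3.0},
    [
        ("Impute consumption", add_consumption),
        ("Impute wealth", add_wealth),
    ],
    lambda label, status: events.append((label, status)),
)
```

Because each step is a pure dataset-in, dataset-out function, the list of labels passed to the progress tracker and the list of functions actually run cannot drift apart.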