Commit 17b2c66

Point CONTRIBUTING.md at the shared PolicyEngine guide (#361)
1 parent 46446f8 commit 17b2c66

2 files changed

Lines changed: 44 additions & 4 deletions

.github/CONTRIBUTING.md

Lines changed: 43 additions & 4 deletions
@@ -1,7 +1,46 @@
-## Updating data
+# Contributing to policyengine-uk-data
 
-If your changes present a non-bugfix change to one or more datasets which are cloud-hosted (FRS and EFRS), then please change both the filename and URL (in both the class definition file and in `storage/upload_completed_datasets.py`). This enables us to store historical versions of datasets separately and reproducibly.
+See the [shared PolicyEngine contribution guide](https://github.com/PolicyEngine/.github/blob/main/CONTRIBUTING.md) for cross-repo conventions (towncrier changelog fragments, `uv run`, PR description format, anti-patterns). This file covers policyengine-uk-data specifics.
 
-## Updating the versioning
+## Commands
 
-Please add to `changelog.yaml` and then run `make changelog` before committing the results ONCE in this PR.
+```bash
+make install # install deps (uv)
+make format # format (required)
+make download # download raw FRS + SPI inputs from HF (needs HUGGING_FACE_TOKEN)
+make data # full dataset build (impute, calibrate, upload)
+make test # test suite
+uv run pytest policyengine_uk_data/tests/path/to/test.py -v
+```
+
+Python 3.13+. Default branch: `main`. Raw FRS / SPI microdata live on HuggingFace; set `HUGGING_FACE_TOKEN` before running anything that touches the dataset build.
+
+## What lives here
+
+This repo builds the `.h5` files that feed `policyengine-uk`:
+
+- `datasets/frs.py` — raw FRS → PolicyEngine variable mapping
+- `datasets/imputations/` — QRF / other imputations layered on top (income, wealth, consumption, etc.)
+- `datasets/local_areas/` — constituency and local-authority calibration
+- `targets/` — calibration target sources (OBR, DWP, HMRC, ONS, SLC, etc.)
+- `utils/calibrate.py` — the reweighting optimiser
+- `storage/` — raw inputs, intermediate artefacts, published outputs
+
+## Data-protection rules — no exceptions
+
+The enhanced FRS dataset is licensed under strict UK Data Service terms. Violating them risks losing access, which would end PolicyEngine UK.
+
+- **Never upload data to any public location.** The HuggingFace repo `policyengine/policyengine-uk-data-private` is private and authenticated.
+- **Never modify `upload_completed_datasets.py` or `utils/data_upload.py`** to change upload destinations without explicit confirmation from the data controller (currently Nikhil Woodruff).
+- **Never print, log, or output individual-level records.** Aggregates (sums, means, counts, weighted totals) are fine; individual rows are not.
+- **If you see a private/public repo split, assume it is intentional** — ask why before changing it.
+
+## Updating datasets
+
+If your change is a non-bugfix update to a cloud-hosted dataset (FRS, enhanced FRS), bump both the filename and URL in the class definition and in `storage/upload_completed_datasets.py`. That lets us store historical dataset versions separately and reproducibly.
+
+## Repo-specific anti-patterns
+
+- **Don't** hardcode dataset years in variable transforms; use `dataset.time_period` and the uprating pipeline.
+- **Don't** commit large binary artefacts — use HuggingFace storage.
+- **Don't** skip `make test` when touching the imputation or calibration pipeline; full CI rebuilds the dataset and takes ~25 minutes.
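
The data-protection rules added in this file allow aggregates (sums, means, counts, weighted totals) but forbid printing or logging individual rows. A minimal sketch of what aggregate-only reporting might look like follows. It is an illustration, not code from this repo: `weighted_summary`, the field names `net_income` and `household_weight`, and the three synthetic records are all hypothetical.

```python
from typing import Iterable, Mapping


def weighted_summary(
    records: Iterable[Mapping[str, float]],
    value_key: str,
    weight_key: str,
) -> dict:
    """Return aggregate statistics only; individual rows never leave this function."""
    rows = list(records)
    total_weight = sum(row[weight_key] for row in rows)
    if total_weight == 0:
        raise ValueError("weights sum to zero; cannot compute a weighted mean")
    weighted_total = sum(row[value_key] * row[weight_key] for row in rows)
    return {
        "count": len(rows),
        "weighted_total": weighted_total,
        "weighted_mean": weighted_total / total_weight,
    }


# Synthetic stand-in data, NOT real FRS records; field names are hypothetical.
households = [
    {"net_income": 20_000.0, "household_weight": 1_000.0},
    {"net_income": 35_000.0, "household_weight": 2_000.0},
    {"net_income": 50_000.0, "household_weight": 1_000.0},
]

summary = weighted_summary(households, "net_income", "household_weight")
print(summary)  # log the aggregates; avoid print(households) or per-row output
```

When debugging, the same idea applies: report a count, a total, or a mean for the rows of interest rather than dumping the rows themselves.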
Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+Point CONTRIBUTING.md at the shared PolicyEngine contribution guide (https://github.com/PolicyEngine/.github) and trim the per-repo file to commands, repo-specific conventions, and anti-patterns. Removes the stale `changelog_entry.yaml` / `make changelog` instructions.
