Commit 317a626

MaxGhenis and claude authored
Finish migration from yaml-changelog to towncrier (#795)
* Finish migration from yaml-changelog to towncrier; point CONTRIBUTING.md at shared guide

  This repo was partway through the yaml-changelog → towncrier migration already: `pr.yaml` ran `towncrier check`, `.github/bump_version.py` inferred version bumps from `changelog.d/` fragment types, and `pyproject.toml` carried `[tool.towncrier]` config.

  The stale bits that remained:

  - Unused dep `yaml-changelog>=0.1.7` in `pyproject.toml`
  - Unused `.github/workflows/reusable_changelog_check.yaml` (not wired into any call-graph)
  - Zero-byte `changelog_entry.yaml` at the repo root
  - 601-line `changelog.yaml` whose entire contents are already compiled into `CHANGELOG.md`
  - A CONTRIBUTING.md that described the old yaml-changelog flow and told contributors to edit `changelog_entry.yaml`

  This PR drops all of the above and rewrites CONTRIBUTING.md to point at the new shared PolicyEngine guide (https://github.com/PolicyEngine/.github/blob/main/CONTRIBUTING.md, proposed in PolicyEngine/.github#3) plus a repo-specific section on commands, test placement, dataset-versioning rules, and anti-patterns.

  After this lands the repo matches the rest of the org (uk-data, core, uk, us, microdf, policyengine.py) on a single towncrier flow.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Update uv.lock after removing yaml-changelog

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 424b81b commit 317a626

5 files changed

Lines changed: 62 additions & 1035 deletions

.github/CONTRIBUTING.md

Lines changed: 57 additions & 5 deletions
@@ -1,11 +1,63 @@
-## Updating data
+# Contributing to policyengine-us-data
 
-If your changes present a non-bugfix change to one or more datasets which are cloud-hosted (CPS, ECPS and PUF), then please change both the filename and URL (in both the class definition file and in `storage/upload_completed_datasets.py`. This enables us to store historical versions of datasets separately and reproducibly.
+See the [shared PolicyEngine contribution guide](https://github.com/PolicyEngine/.github/blob/main/CONTRIBUTING.md) for cross-repo conventions (towncrier changelog fragments, `uv run`, PR description format, anti-patterns). This file covers policyengine-us-data specifics.
+
+## Commands
+
+```bash
+make install # install deps (uv)
+make format # format (required)
+make test-unit # unit tests (synthetic / mocked, seconds)
+make test-integration # integration tests (need built H5 datasets)
+make test # both
+make data # full dataset build (long)
+make push-pr-branch # push to upstream with correct tracking (use before opening PRs)
+uv run pytest tests/unit/datasets/ -v
+```
+
+Python 3.12–3.14. Default branch: `main`.
+
+## Test organisation
+
+- `tests/unit/` — self-contained (synthetic data, mocks, checked-in fixtures). Run in seconds with no external deps.
+  - `unit/datasets/` — dataset code
+  - `unit/calibration/` — calibration code
+- `tests/integration/` — requires built H5 datasets, HuggingFace downloads, `Microsimulation` objects, or DB ETL. Named after the dataset under test (e.g. `test_cps.py` tests `cps_2024.h5`).
+
+**Placement rules:**
+
+- **Never** put tests that need H5 files or `Microsimulation` in `unit/`.
+- **Never** put synthetic-only tests in `integration/`.
+- Sanity checks (value ranges, population counts) go in the per-dataset integration file, not a separate sanity file.
+- When adding an integration test, extend the existing per-dataset file if one exists.
+
+## Updating datasets
+
+If your change is a non-bugfix update to a cloud-hosted dataset (CPS, enhanced CPS, PUF), bump both the filename and URL in the class definition and in `storage/upload_completed_datasets.py`. That lets us store historical dataset versions separately and reproducibly.
 
 ## Opening PRs
 
-Push PR branches to the upstream `PolicyEngine/policyengine-us-data` repository, not to a personal fork. From the repo root, run:
+**Always create branches on the upstream repo, not a fork.** Fork PRs can't access workflow secrets and will fail on data-download steps. The convenience target:
+
+```bash
+make push-pr-branch
+```
+
+pushes the current branch to `upstream` with the correct tracking so `gh pr create` just works.
+
+## Repo-specific anti-patterns
+
+- **Never fabricate data or results.** This is a research codebase; reproducible aggregates only. Use `[TO BE CALCULATED]` placeholders if a number isn't computed yet.
+- **Don't** open PRs from personal forks (CI will fail on secrets).
+- **Don't** add `[codex]` or other agent-label prefixes to PR titles.
+- **Don't** skip full-build CI when touching the imputation or calibration pipeline.
+- **Don't** commit large binary artefacts — HuggingFace storage only.
+
+## CI workflows
 
-`make push-pr-branch`
+Five workflow files in `.github/workflows/`:
 
-This avoids the fork-only CI failure path and sets the upstream tracking branch correctly before opening the PR.
+- `pr.yaml` — fork check, lint, uv.lock freshness, towncrier fragment check, unit tests, smoke test, docs build. Integration tests trigger when files in `policyengine_us_data/`, `modal_app/`, or `tests/integration/` change. ~2–3 min for the unit path.
+- `push.yaml` — on push to main: either version-bump + PyPI publish (on `Update package version` commits), or a full Modal data build with integration tests (on everything else).
+- `pipeline.yaml` — dispatch only, spawns the H5 generation pipeline on Modal with configurable GPU/epochs/workers.
+- `local_area_publish.yaml` / `local_area_promote.yaml` — manual dispatch to build/stage and then promote local-area H5 files.
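
Editorial note: the integration-test trigger described for `pr.yaml` amounts to a path-prefix check on the files a PR changes. The Python sketch below illustrates that rule only; it is not taken from the workflow, which may well use a GitHub Actions path filter instead. The names `INTEGRATION_TRIGGER_PREFIXES` and `needs_integration_tests`, and the example file paths, are hypothetical.

```python
# Hypothetical illustration of the trigger rule; these names are not from the repo.
INTEGRATION_TRIGGER_PREFIXES = (
    "policyengine_us_data/",
    "modal_app/",
    "tests/integration/",
)


def needs_integration_tests(changed_files):
    """Return True if any changed path sits under a trigger directory."""
    # str.startswith accepts a tuple and matches any of its prefixes.
    return any(
        path.startswith(INTEGRATION_TRIGGER_PREFIXES) for path in changed_files
    )


# Example paths are made up for illustration.
print(needs_integration_tests(["tests/unit/datasets/test_example.py"]))  # False
print(needs_integration_tests(["README.md", "modal_app/example.py"]))  # True
```

Under this reading, a PR that touches only `tests/unit/` or documentation stays on the short unit path.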
Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+Migrate the changelog tooling from `yaml-changelog` (`changelog_entry.yaml` + `changelog.yaml` + `build-changelog`) to towncrier (`changelog.d/<branch>.<type>.md` fragments). The repo's CI already ran `towncrier check` in `pr.yaml` and `bump_version.py` already read fragments from `changelog.d/`; this drops the leftover yaml-changelog artefacts (unused dep, unused reusable workflow, zero-byte `changelog_entry.yaml`, and duplicated `changelog.yaml` whose contents are already in `CHANGELOG.md`) so the tooling story matches the rest of the org.
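
Editorial note: the fragment naming scheme `changelog.d/<branch>.<type>.md` is what lets a script infer the version bump from fragment types. The sketch below shows one way that inference could work. It is not the repo's `bump_version.py`: the type names and the type-to-bump mapping in `BUMP_FOR_TYPE` are assumptions, as is the fallback to a patch bump for unknown types or an empty fragment list.

```python
from pathlib import PurePosixPath

# Assumed mapping; the real bump_version.py may use other type names or rules.
BUMP_FOR_TYPE = {
    "breaking": "major",
    "added": "minor",
    "removed": "minor",
    "changed": "patch",
    "fixed": "patch",
}
BUMP_ORDER = ["patch", "minor", "major"]  # smallest to largest


def fragment_type(filename):
    """Extract <type> from a '<branch>.<type>.md' fragment name."""
    parts = PurePosixPath(filename).name.split(".")
    if len(parts) < 3 or parts[-1] != "md":
        raise ValueError(f"not a changelog fragment: {filename}")
    # The type is the component just before the extension, so branch
    # names that themselves contain dots still parse.
    return parts[-2]


def infer_bump(fragment_names):
    """Return the largest bump implied by the given fragments."""
    bumps = [BUMP_FOR_TYPE.get(fragment_type(n), "patch") for n in fragment_names]
    return max(bumps, key=BUMP_ORDER.index, default="patch")


print(infer_bump(["changelog.d/my-branch.fixed.md", "changelog.d/other.added.md"]))
# minor
```

Taking the maximum over all pending fragments means a single `breaking` fragment outweighs any number of smaller ones, which is the usual semantic-versioning convention.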
