Skip to content

Migrate versioning workflow from PAT to GitHub App token#640

Closed
anth-volk wants to merge 6 commits into
mainfrom
fix/fix-us-data-pypi
Closed

Migrate versioning workflow from PAT to GitHub App token#640
anth-volk wants to merge 6 commits into
mainfrom
fix/fix-us-data-pypi

Conversation

@anth-volk
Copy link
Copy Markdown
Collaborator

Fixes #638

Summary

  • Replaced expired PAT (POLICYENGINE_GITHUB) with a GitHub App token in the versioning workflow
  • Uses actions/create-github-app-token@v1 with APP_ID and APP_PRIVATE_KEY secrets (already configured in the repo)
  • Matches the pattern used in policyengine-api-v2-alpha

Test plan

  • Merge a PR with a changelog fragment and verify the versioning workflow successfully pushes the "Update package version" commit
  • Verify that commit triggers code_changes.yaml and the PyPI publish step runs

🤖 Generated with Claude Code

Fixes #638. The versioning workflow used a PAT (POLICYENGINE_GITHUB)
to push the "Update package version" commit, which broke when the
token expired. Switch to a GitHub App token via
actions/create-github-app-token@v1, matching the pattern used in
policyengine-api-v2-alpha.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@anth-volk anth-volk marked this pull request as ready for review March 26, 2026 22:47
@anth-volk anth-volk requested a review from juaristi22 March 26, 2026 22:47
Copy link
Copy Markdown
Collaborator

@baogorek baogorek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude aggressively posted this comment on my behalf:

One thing I initially flagged: the `EndBug/add-and-commit@v9` step doesn't explicitly receive the app token — so I worried it might fall back to the default `GITHUB_TOKEN` for the push, which wouldn't trigger downstream workflows (like PyPI publish via `code_changes.yaml`).

After digging in, this is not a concern. Here's why:

- `actions/checkout@v4` with `token: <app-token>` persists that token into the local git config via an `http.extraheader` credential (this is the default `persist-credentials: true` behavior).
- `EndBug/add-and-commit@v9` just shells out to `git push` — it uses whatever credentials the local repo already has configured. Its `github_token` input is only used for GitHub API calls (fetching user info for commit metadata), not for git operations.
- Their README confirms: *"When pushing, the action uses the token that the local git repository has been configured with."*

So the version-bump commit will be pushed with the GitHub App token, which will correctly trigger subsequent workflow runs. LGTM.

So an LGTM is coming and I'll try to get this in before I go to sleep tonight.

Copy link
Copy Markdown
Collaborator

@baogorek baogorek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a note about Claude's anxiousness about a token carrying over, but an LGTM from the human.

@juaristi22
Copy link
Copy Markdown
Collaborator

There are other workflow files like reusable_tests.yaml that use similar tokens like secrets.POLICYENGINE_US_DATA_GITHUB_TOKEN or code_changes.yaml (which also seems responsible for publishing), that might also be affected by this. I'm not sure whether we want to trigger the entire test suite again on main (considering costs) now that we have the constraint that PRs need to be approved before merging. It might be worth reviewing the entire workflow now that running the pipeline has come in?

(Eg, the test suite was also responsible for publishing datasets, but now the pipeline run also does that, we might want to decide on a single responsible step -- we are overwriting them anyway otherwise). What do you think @baogorek @anth-volk ?

@anth-volk
Copy link
Copy Markdown
Collaborator Author

@juaristi22 This was just a minimal fix, as I figured PyPI publishing might be important to have working again. I don't think the current CI/CD pipeline is sufficient. I could transform this PR into that, or do separately, but would be happy to propose a solution unless anyone else is super eager to do the plumbing.

@anth-volk anth-volk marked this pull request as draft March 27, 2026 16:38
anth-volk and others added 5 commits March 30, 2026 20:30
Split tests into unit/ (self-contained, synthetic data, mocks) and
integration/ (requires built H5 datasets). Unit sub-folders use no
test_ prefix (datasets/, calibration/) to avoid confusion with
integration tests.

- Move 30+ unit test files to tests/unit/ with calibration/ and
  datasets/ sub-directories
- Move 11 integration test files to tests/integration/
- Merge test_dataset_sanity.py into per-dataset integration files
  (test_cps.py, test_enhanced_cps.py, test_sparse_enhanced_cps.py)
- Rename calibration integration tests to match dataset names
  (test_source_imputed_cps_masking.py, _consistency.py)
- Move top-level tests/ files into appropriate unit/ or integration/
- Add integration conftest.py with skip logic and shared fixtures
- Update pyproject.toml testpaths and add pytest-cov dependency
- Update modal_app/data_build.py TEST_MODULES
- Add make test-unit and make test-integration targets
- Fix Path(__file__) references in moved test files

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Delete 7 deprecated/redundant workflow files and replace with 3
consolidated workflows matching the policyengine-api-v2-alpha pattern.

pr.yaml: fork check, lint, uv.lock, changelog, unit tests with
Codecov (informational), smoke test. Runs in ~2-3 minutes.

push.yaml: two paths — version bump commits publish to PyPI; all
other commits run per-dataset Modal builds with integration tests
after each stage, then manual approval gate, then pipeline dispatch.

pipeline.yaml: remove push trigger (now dispatched from push.yaml),
add scope input (all/national/state/congressional/local/test), add
summary with Modal dashboard link.

Also adds:
- .codecov.yml with informational-only coverage reporting
- --script mode to data_build.py for per-dataset Modal execution
- SCRIPT_SHORT_NAMES mapping for human-friendly script names
- run_single_script() Modal function for single-dataset builds

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
New queue-based H5 build system that replaces the partition-based
N-worker model with one-container-per-item processing.

- generate_work_items(scope, db_path): auto-generates work item lists
  filtered by scope (all/national/state/congressional/local/test).
  Test scope builds national + NY + NV-01 only.
- build_single_area(): Modal function (1 CPU, 16GB) that processes
  exactly one work item per container via worker_script.py
- queue_coordinator(): spawns up to 50 single-item workers, collects
  results. No multi-threading, no chunking.
- main_queue entrypoint for CLI access
- Wire scope parameter from pipeline.yaml through run_pipeline()
- Fall back to legacy coordinate_publish for scope=all

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add testing standards: unit vs integration placement rules, per-dataset
naming convention, make test-unit/test-integration commands. Add CI/CD
overview documenting the four workflow files and their triggers. Update
Python version from 3.11 to 3.12-3.13.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@anth-volk
Copy link
Copy Markdown
Collaborator Author

Closing in favor of a new PR that supersedes this with a broader CI/CD restructuring (test reorganization, workflow consolidation, queue-based Modal architecture). The PAT→App token fix from this PR is included in the new one.

@anth-volk anth-volk closed this Mar 30, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Fix versioning workflow: migrate from expired PAT to GitHub App token

3 participants