Follow-up to PR #291's testing discussion (currently, the workflow paths are only verified by firing them for real).
`actionlint` is now landing via #292; that static analysis is Layer 1. This issue tracks the more substantive testing infrastructure.
Goals
Confirm that `upload_benchmark_result.yaml`, `aggregate_benchmark_results.yaml`, and `regenerate_entities.yaml` work end-to-end without writing to the real `_results` / `_web` branches, so we can iterate on workflow logic without polluting production state.
Proposal
Layer 2 — act smoke tests in CI
`act` runs GitHub Actions workflows locally via Docker and supports `workflow_call`. Add a CI job that:
- Synthesizes an `issues.labeled` event payload (committed to the repo as e.g. `.github/test_events/issue_labeled.json`).
- Runs the workflow under `act` with `MEDS_DEV_DRY_RUN=1`.
- Asserts the workflow completes without error.
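A minimal Python sketch of the fixture and the `act` invocation for such a job. The payload carries only the fields a labeled-issue handler typically reads (`action`, `issue`, `label`, `repository`); the label name, repository name, and issue text are placeholders rather than values from the MEDS-DEV workflows, so the committed fixture should mirror whatever fields the workflows actually read. The `act` flags used (`-W` to select a workflow file, `-e` for the event payload, `--env` to set a variable) are standard `act` options, but it is worth confirming that the dry-run flag reaches the workflow's `env` context under the `act` version in use.

```python
# Sketch: build a synthetic `issues` (action: labeled) payload and the act command
# for a dry-run smoke test. Label, repo, and issue text are hypothetical placeholders.
import json
import subprocess
from pathlib import Path

FIXTURE_PATH = Path(".github/test_events/issue_labeled.json")


def make_issue_labeled_payload(
    issue_number: int,
    title: str,
    body: str,
    label: str,
    repo_full_name: str = "example-org/MEDS-DEV",  # placeholder repository
) -> dict:
    """Minimal webhook-style body for an `issues` event whose action is `labeled`."""
    return {
        "action": "labeled",
        "issue": {
            "number": issue_number,
            "title": title,
            "body": body,
            "labels": [{"name": label}],
        },
        "label": {"name": label},  # the label that was just applied
        "repository": {"full_name": repo_full_name},
    }


def act_command(workflow_file: str, event_path: Path = FIXTURE_PATH) -> list[str]:
    """argv that runs a single workflow file under act with dry-run enabled."""
    return [
        "act", "issues",                             # event name to simulate
        "-W", f".github/workflows/{workflow_file}",  # restrict to one workflow file
        "-e", str(event_path),                       # synthetic event payload
        "--env", "MEDS_DEV_DRY_RUN=1",               # disable real pushes
    ]


def run_smoke_test(workflow_file: str) -> None:
    """Raise CalledProcessError if act exits with a non-zero status."""
    subprocess.run(act_command(workflow_file), check=True)


if __name__ == "__main__":
    # Print the fixture so it can be redirected into FIXTURE_PATH and committed.
    payload = make_issue_labeled_payload(
        1, "Placeholder benchmark result", "Placeholder body", "placeholder-label"
    )
    print(json.dumps(payload, indent=2))
```

The CI job would then call something like `run_smoke_test("upload_benchmark_result.yaml")`; relying on `act`'s exit status keeps the "completes without error" assertion trivial.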
Workflow changes needed:
- Gate the actual `git push origin _results` / `git push origin _web` steps behind `if: env.MEDS_DEV_DRY_RUN != '1'` (or equivalent).
- Optionally, emit a structured "would have pushed: " line on dry runs so the test can assert the intended side effects without performing them.
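As one "or equivalent" alternative to a step-level `if:`, the push steps could call a small script that checks `MEDS_DEV_DRY_RUN` itself and, on dry runs, prints the structured line instead of pushing. A minimal Python sketch; the script and function names and the `would have pushed: <remote> <branch>` line format are illustrative assumptions, not something the workflows define today.

```python
# Hypothetical push helper: workflow steps call this instead of a bare `git push`.
# The "would have pushed: <remote> <branch>" line format is an illustrative assumption.
import os
import subprocess
import sys
from typing import Mapping, Optional

DRY_RUN_PREFIX = "would have pushed: "


def is_dry_run(env: Optional[Mapping[str, str]] = None) -> bool:
    """True when MEDS_DEV_DRY_RUN is exactly "1" (mirrors the `!= '1'` gate)."""
    env = os.environ if env is None else env
    return env.get("MEDS_DEV_DRY_RUN") == "1"


def push_branch(branch: str, remote: str = "origin",
                env: Optional[Mapping[str, str]] = None) -> str:
    """Push `branch` to `remote`, or only report the push when dry-running."""
    if is_dry_run(env):
        line = f"{DRY_RUN_PREFIX}{remote} {branch}"
        print(line, flush=True)  # structured line a smoke test can search for
        return line
    subprocess.run(["git", "push", remote, branch], check=True)
    return f"pushed: {remote} {branch}"


if __name__ == "__main__":
    # e.g. `python push_branch.py _results _web` (script name is hypothetical)
    for target in sys.argv[1:]:
        push_branch(target)
```

Keeping the check inside the script means a push step cannot go live just because its `if:` condition was forgotten; the trade-off is one more file to maintain alongside the workflows.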
Catches:
- Wrong action versions, typos, and missing inputs (already covered by `actionlint`, but redundant safety doesn't hurt).
- `workflow_call` plumbing: job dependencies, permission inheritance, output passing.
- Issue-extraction → validate → would-push flow against a synthetic event payload.
- Most expression-language bugs.
Doesn't catch:
- Token-scoping issues (act uses a different auth model).
- Real GitHub branch-protection rules.
- Network failure modes (rate limits, transient GitHub API errors).
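For the would-push flow, the assertion can be a plain text check over the captured `act` output. A sketch, assuming dry-run steps print `would have pushed: <remote> <branch>` (an illustrative format, not one the workflows define yet). Because `act` typically prefixes step output with the workflow and job name, the pattern is matched anywhere in a line rather than at its start.

```python
# Sketch: check the intended side effects in captured `act` output.
# Assumes dry-run steps print "would have pushed: <remote> <branch>" (illustrative format).
import re
from typing import Set, Tuple

WOULD_PUSH = re.compile(r"would have pushed: (\S+) (\S+)")


def would_push_targets(log_text: str) -> Set[Tuple[str, str]]:
    """Collect every (remote, branch) pair announced anywhere in the log."""
    return {(m.group(1), m.group(2)) for m in WOULD_PUSH.finditer(log_text)}


def assert_pushes(log_text: str, expected: Set[Tuple[str, str]]) -> None:
    """Raise AssertionError unless the announced pushes match `expected` exactly."""
    actual = would_push_targets(log_text)
    missing, unexpected = expected - actual, actual - expected
    if missing or unexpected:
        raise AssertionError(
            f"missing={sorted(missing)} unexpected={sorted(unexpected)}"
        )


if __name__ == "__main__":
    # Illustrative log excerpt; real act output formatting may differ.
    sample = (
        "[Upload/validate] | validation passed\n"
        "[Upload/push]     | would have pushed: origin _results\n"
    )
    assert_pushes(sample, {("origin", "_results")})
    print("ok")
```

Checking for unexpected pushes as well as missing ones also catches a workflow that would write to a branch it should not touch.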
Layer 3 — sandbox repo / branches (manual, periodic)
For what act can't verify, maintain a separate sandbox where the real workflows can be fired without polluting production. Two options:
- Separate repo: clone of MEDS-DEV used only for workflow shakeouts. Pro: complete isolation. Con: drift from main repo state.
- `test_*` branch prefix in the same repo: workflows accept an input/env var that swaps `_results` → `test_results` and `_web` → `test_web`. Pro: same repo state. Con: more workflow complexity, easier to accidentally cross streams.
Probably start with the separate-repo approach.
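If the in-repo `test_*` option is revisited later, the branch swap can be confined to one function so that no step hardcodes `_results` or `_web`, which addresses part of the cross-streams risk. A Python sketch; the `MEDS_DEV_SANDBOX` variable name and the function are hypothetical.

```python
# Sketch of the in-repo `test_*` option: every push resolves its target through one function.
# MEDS_DEV_SANDBOX is a hypothetical variable name.
import os
from typing import Mapping, Optional

PRODUCTION_BRANCHES = ("_results", "_web")


def resolve_branch(name: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Return `name`, or its `test`-prefixed twin when sandbox mode is on."""
    if name not in PRODUCTION_BRANCHES:
        # Refuse anything unexpected rather than guessing a target.
        raise ValueError(f"unknown target branch: {name!r}")
    env = os.environ if env is None else env
    if env.get("MEDS_DEV_SANDBOX") == "1":
        return "test" + name  # "_results" -> "test_results", "_web" -> "test_web"
    return name
```

The mapping is cheap to unit-test, but it only helps if every push goes through it; a step that bypasses it still writes to production, which is consistent with starting from the separate repo.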
Layer 4 (stretch) — full integration with a docker-compose stack
If we ever wanted to be really thorough: spin up a local GitHub Enterprise emulator or use a self-hosted runner in a controlled environment. Probably overkill for the scale of MEDS-DEV.
Acceptance criteria
- `MEDS_DEV_DRY_RUN` (or equivalent) gating is in the three workflows, with documentation in `src/MEDS_DEV/web/README.md`.
- `.github/test_events/*.json` payload fixtures are committed.
- A CI job runs `act` against the synthetic payloads on every PR that touches `.github/workflows/**`.
Related