act-based workflow smoke tests + sandbox infrastructure #293

Follow-up to the testing discussion on PR #291: today the workflow paths are only verified by firing them for real.

[`actionlint`](https://github.com/rhysd/actionlint) is landing via #292; that static analysis is Layer 1. This issue tracks the more substantive testing infrastructure (Layers 2 through 4).

## Goals

Confirm `upload_benchmark_result.yaml` + `aggregate_benchmark_results.yaml` + `regenerate_entities.yaml` work end-to-end **without** writing to the real `_results` / `_web` branches, so we can iterate on workflow logic without polluting production state.

## Proposal

### Layer 2 — `act` smoke tests in CI

[`act`](https://github.com/nektos/act) runs GitHub Actions workflows locally via Docker and supports `workflow_call`. Add a CI job that:

1. Synthesizes an `issues.labeled` event payload (committed to the repo as e.g. `.github/test_events/issue_labeled.json`).
2. Runs the workflow under `act` with `MEDS_DEV_DRY_RUN=1`.
3. Asserts the workflow completes without error.
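The three steps above could be driven by a small helper along these lines. This is a minimal sketch, not a settled design: the function names, the choice of `upload_benchmark_result.yaml` as the workflow under test, and the `benchmark-result` label are all assumptions for illustration, and the real workflows may read more payload fields (e.g. `repository`, `sender`) than are filled in here.

```python
import json
from pathlib import Path


def build_issue_labeled_event(number: int, title: str, body: str, label: str) -> dict:
    """Return a minimal payload shaped like an `issues` event with action `labeled`.

    Only the fields a label-triggered workflow is most likely to read are included.
    """
    return {
        "action": "labeled",
        "label": {"name": label},
        "issue": {
            "number": number,
            "title": title,
            "body": body,
            "labels": [{"name": label}],
        },
    }


def write_fixture(path: Path, event: dict) -> None:
    """Write the payload as a committed fixture, e.g. under .github/test_events/."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(event, indent=2) + "\n")


def act_command(workflow: str, event_path: str) -> list[str]:
    """Build the `act` invocation for one workflow in dry-run mode.

    Uses act's `-W` (workflow file), `-e` (event payload) and `--env` flags.
    """
    return [
        "act", "issues",
        "-W", workflow,
        "-e", event_path,
        "--env", "MEDS_DEV_DRY_RUN=1",
    ]


if __name__ == "__main__":
    # Hypothetical label and issue contents, for illustration only.
    event = build_issue_labeled_event(1, "Synthetic result", "placeholder body", "benchmark-result")
    fixture = Path(".github/test_events/issue_labeled.json")
    write_fixture(fixture, event)
    print(" ".join(act_command(".github/workflows/upload_benchmark_result.yaml", str(fixture))))
```

A CI step would then run the printed command (for instance with `subprocess.run(cmd, check=True)`) and fail the job on a non-zero exit code, which covers step 3.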

Workflow changes needed:

- Gate the actual `git push origin _results` / `git push origin _web` steps behind `if: env.MEDS_DEV_DRY_RUN != '1'` (or equivalent).
- Optionally: emit a structured "would have pushed: <ref>" line on dry runs so the test can assert the intended side-effects without performing them.
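If the push logic were factored into a script rather than inline workflow steps, the dry-run gate and the structured "would have pushed" line might look like the sketch below. The helper names (`is_dry_run`, `push_or_report`, `would_push_refs`) are hypothetical, and moving the push into Python at all is an assumption; the same gate can live in the workflow's `if:` condition instead.

```python
import os
import re
import subprocess

# Not anchored to the start of the line, because act prefixes each log line
# with the job name.
WOULD_PUSH = re.compile(r"would have pushed: (\S+)")


def is_dry_run(env=None) -> bool:
    """True when MEDS_DEV_DRY_RUN is set to '1' in the given (or process) environment."""
    env = os.environ if env is None else env
    return env.get("MEDS_DEV_DRY_RUN") == "1"


def push_or_report(ref: str, env=None) -> str:
    """Push `ref` to origin, or on a dry run print the line a test can assert on."""
    if is_dry_run(env):
        line = f"would have pushed: {ref}"
        print(line)
        return line
    subprocess.run(["git", "push", "origin", ref], check=True)
    return f"pushed: {ref}"


def would_push_refs(log: str) -> list[str]:
    """Extract every ref a dry run reported it would have pushed, in log order."""
    return WOULD_PUSH.findall(log)
```

A smoke test can then capture the `act` output and check `would_push_refs(output)` against the expected branches, so the intended side effects are asserted without being performed.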

Catches:

- Wrong action versions / typos / missing inputs (already covered by actionlint, but redundant safety doesn't hurt).
- `workflow_call` plumbing — job dependencies, permission inheritance, output passing.
- Issue-extraction → validate → would-push flow against a synthetic event payload.
- Most expression-language bugs.

Doesn't catch:

- Token-scoping issues (act uses a different auth model).
- Real GitHub branch-protection rules.
- Network failure modes (rate limits, transient GitHub API errors).

### Layer 3 — sandbox repo / branches (manual, periodic)

For what `act` can't verify, maintain a separate sandbox where the real workflows can be fired without polluting production. Two options:

- **Separate repo**: clone of MEDS-DEV used only for workflow shakeouts. Pro: complete isolation. Con: drift from main repo state.
- **`test_*` branch prefix in the same repo**: workflows accept an input/env var that swaps `_results` → `test_results` and `_web` → `test_web`. Pro: same repo state. Con: more workflow complexity, easier to accidentally cross streams.
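For the branch-prefix option, the swap itself is small; the risk is in applying it inconsistently. A minimal sketch of a single mapping function follows, assuming a hypothetical boolean `sandbox` switch derived from a workflow input or env var. Rejecting unknown branch names is one way to make accidentally crossing streams harder.

```python
PRODUCTION_BRANCHES = ("_results", "_web")


def target_branch(branch: str, sandbox: bool) -> str:
    """Map a production branch to its sandbox counterpart when `sandbox` is set.

    `_results` becomes `test_results` and `_web` becomes `test_web`; any other
    name is rejected so a typo cannot silently target an unexpected branch.
    """
    if branch not in PRODUCTION_BRANCHES:
        raise ValueError(f"unexpected branch: {branch!r}")
    return f"test{branch}" if sandbox else branch
```

Routing every push through one function like this keeps the production and sandbox names from being hard-coded in more than one place.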

Probably start with the separate-repo approach.

### Layer 4 (stretch) — full integration with a docker-compose stack

If we ever wanted to be really thorough: spin up a local GitHub Enterprise emulator or use [a self-hosted runner](https://docs.github.com/en/actions/hosting-your-own-runners) in a controlled environment. Probably overkill for the scale of MEDS-DEV.

## Acceptance criteria

- [ ] `MEDS_DEV_DRY_RUN` (or equivalent) gating is in the three workflows, with documentation in `src/MEDS_DEV/web/README.md`.
- [ ] `.github/test_events/*.json` payload fixtures committed.
- [ ] CI job runs `act` against the synthetic payloads on every PR that touches `.github/workflows/**`.
- [ ] Sandbox-repo runbook documented (where it lives, how to refresh it, expected manual cadence).

## Related

- PR #291 — the workflow_call wiring this would smoke-test.
- PR #292 — actionlint (Layer 1).
- #238 — original symptom that motivated all this.

