Skip to content

Commit f02f5f2

Browse files
authored
Coverage-based test selection (done right) — shadow mode (#1461)
* docs: design spec for coverage-based test selection (done right) * docs: implementation plan for coverage-based test selection * feat(test): add stable param_hash key for coverage selection * feat(test): coverage map load/save with freshness metadata * feat(test): ALWAYS_RUN_ALL classification for coverage selection * feat(test): conservative coverage selection ladder * feat(test): TestCase.coverage_key() = param_hash(params) * feat(test): robust changed-file detection (CI list + self-healing git) * feat(test): coverage selection summary line * feat(test): coverage map health check (loud anti-rot) * fix(test): key coverage_key on full params, not mods (soundness) * feat(test): gcov coverage-map collector, keyed by param_hash * feat(test): wire --only-changes (shadow) and --build-coverage-map * ci: coverage map refresh + health workflows * fix(test): coverage_key on TestCaseBuilder + treat empty coverage as run (soundness) * feat(test): accept space/comma/newline separated --changed-files (CI paths-filter) * fix(test): force full run on src/**/include/ changes (gcov can't attribute includes) * Delete docs/superpowers/plans/2026-05-29-coverage-test-selection.md * Delete docs/superpowers directory * fix(test): empty --changed-files is uncertainty (run all), not skip-all * fix(test): ALWAYS_RUN_ALL covers test/run infra; cases.py runs new tests via rung 5 (soundness) * fix(coverage): exclude failed/partial-coverage tests from coverage map Fix 1: in build_coverage_map Phase 2, a test with non-empty failures produced only partial .gcda files (crash mid-pipeline). Previously those tests were still added to test_results, and their truncated coverage was cached. A later .fpp change that ran only in the missing stage would be silently skipped. Fix: when failures is non-empty, record in all_failures and continue without adding to test_results — absent entries are conservatively included by select_tests (rung 5). Fix 2: in _parse_gcov_json_output, a mid-stream json.JSONDecodeError returned the partial result set, which is untrustworthy (the truncated JSON stream may be missing coverage for .fpp files that were not yet serialised). Fix: return None on that error path so the caller omits the test from the map entirely. Fix 6 (comment): correct the post_process comment (~line 347) — the regular suite runs post_process only under --test-all (which CI sets), not never. * fix(coverage): broaden always-run-all and rung3 Fortran extension matching Fix 3: ALWAYS_RUN_ALL_EXACT + prefixes enumerated only a handful of toolchain files, missing case.py, build.py, common.py, state.py, sched.py, etc. Any toolchain/mfc/*.py change (except cases.py) affects every test's generation or execution, so under-enumeration was unsound. Replace with a catch-all: any(f.startswith('toolchain/mfc/') and f.endswith('.py') and f != CASES_PY). Drop the now-redundant individual file entries and toolchain/mfc/params/ and toolchain/mfc/run/ prefixes (all subsumed). Keep CMakeLists.txt, toolchain/cmake/, toolchain/bootstrap/, and src include rules. Fix 4: rung 3 matched only .f90 and .f, missing .F90, .F95, .F03, .F08, .FOR and all other uppercase/mixed variants. Changed files ending in those extensions under src/ would fall through to per-test selection against a coverage map that only tracks .fpp, causing silent under-inclusion. Fix: case-insensitive match against the full tuple (.f90, .f, .f95, .f03, .f08, .for). Tests: add three new unit tests covering the above fixes. * ci(coverage-refresh): add contents:write permission and document push caveat Add 'permissions: contents: write' at the workflow top level so the coverage-refresh job is authorized to commit and push the updated coverage_map.json.gz back to master. Without this, the GITHUB_TOKEN has only read permissions in newer default permission settings. Also add a comment on the git push step noting that branch protection may still reject the default GITHUB_TOKEN and that a PAT or GitHub App with bypass-branch-protection permission may be needed. * ci: coverage selection shadow mode on self-hosted jobs (git-detected changed files) * fix(coverage): invert ALWAYS_RUN_ALL to unattributable-catch-all and validate map shape Fix 1: Replace the allowlist of run-all files with an inverted design: define a small, conservative allowlist of provably test-irrelevant files (docs/*.md, LICENSE, etc.) and treat any changed file that is not .fpp, not cases.py, and not in that allowlist as unattributable -> run all. This closes the gap where mfc.sh, .github/**, tests/**, toolchain/pyproject.toml, and similar files would silently fall through to rung-7 (skip-all) under --select-enforce. Fix 2: In load_map, validate that every entry is str -> list[str] after popping _meta. A malformed entry returns (None, None) so the caller runs the full suite rather than silently misrouting tests. Tests: expand test_docs_only_still_skips_all to cover docs/, LICENSE, .claude/; add test_unattributable_nonsource_change_runs_all for mfc.sh / pyproject.toml / tests/** / .github/workflows; add test_load_map_rejects_malformed_entry. * fix(cli): clarify --changed-files help to mention space-separated input * fix(ci): gate --changed-files with --only-changes via bash array (PR-only) * fix(coverage-build): add mpiexec as fallback MPI launcher between mpirun and srun
1 parent 574e53d commit f02f5f2

13 files changed

Lines changed: 1343 additions & 2 deletions

File tree

Lines changed: 26 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,26 @@
1+
"""Fail loudly if the committed coverage map is stale or under-covers. Used by coverage-health.yml."""
2+
import datetime
3+
import sys
4+
from pathlib import Path
5+
6+
sys.path.insert(0, str(Path(__file__).resolve().parents[2] / "toolchain"))
7+
from mfc.test.coverage import COVERAGE_MAP_PATH, load_map, map_health # noqa: E402
8+
from mfc.test.cases import list_cases # noqa: E402 (returns the current test list)
9+
10+
MAX_AGE_DAYS = 10
11+
MIN_FRACTION = 0.80
12+
13+
entries, meta = load_map(COVERAGE_MAP_PATH)
14+
if entries is None:
15+
sys.exit("Coverage map missing or corrupt.")
16+
current_keys = {b.to_case().coverage_key() for b in list_cases()}
17+
ok, msg = map_health(
18+
meta=meta,
19+
current_keys=current_keys,
20+
mapped_keys=set(entries),
21+
now=datetime.datetime.now(datetime.timezone.utc).isoformat(),
22+
max_age_days=MAX_AGE_DAYS,
23+
min_fraction=MIN_FRACTION,
24+
)
25+
print(msg)
26+
sys.exit(0 if ok else 1)
Lines changed: 7 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,7 @@
1+
#!/bin/bash
2+
set -e
3+
NJOBS="${SLURM_CPUS_ON_NODE:-24}"; [ "$NJOBS" -gt 64 ] && NJOBS=64
4+
./mfc.sh clean
5+
source .github/scripts/retry-build.sh
6+
retry_build ./mfc.sh build --gcov -j 8
7+
./mfc.sh test --build-coverage-map --gcov -j "$NJOBS"

.github/workflows/common/test.sh

Lines changed: 9 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -57,4 +57,12 @@ if [ -n "${job_shard:-}" ]; then
5757
shard_opts="--shard $job_shard"
5858
fi
5959

60-
./mfc.sh test -v --max-attempts 3 --no-build -a -j $n_test_threads $rdma_opts $device_opts $build_opts $shard_opts -- -c $job_cluster
60+
# Coverage-based test selection in SHADOW mode on PRs: prints what it WOULD select but the
61+
# full suite still runs (no --select-enforce). Changed files come from git detection
62+
# (self-healing deepen) since the SLURM job doesn't receive the paths-filter list.
63+
select_opts=""
64+
if [ "${GITHUB_EVENT_NAME:-}" = "pull_request" ]; then
65+
select_opts="--only-changes"
66+
fi
67+
68+
./mfc.sh test -v --max-attempts 3 --no-build $select_opts -a -j $n_test_threads $rdma_opts $device_opts $build_opts $shard_opts -- -c $job_cluster
Lines changed: 18 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,18 @@
1+
# .github/workflows/coverage-health.yml
2+
name: 'Coverage Map Health'
3+
on:
4+
schedule:
5+
- cron: '0 7 * * *' # daily; loud if the refresh stopped working
6+
workflow_dispatch:
7+
jobs:
8+
health:
9+
if: github.repository == 'MFlowCode/MFC'
10+
runs-on: ubuntu-latest
11+
steps:
12+
- uses: actions/checkout@v4
13+
- uses: actions/setup-python@v5
14+
with: { python-version: '3.12' }
15+
- name: Initialize MFC
16+
run: ./mfc.sh init
17+
- name: Check coverage map freshness
18+
run: build/venv/bin/python3 .github/scripts/check_coverage_map_health.py
Lines changed: 44 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,44 @@
1+
# .github/workflows/coverage-refresh.yml
2+
name: 'Coverage Map Refresh'
3+
on:
4+
schedule:
5+
- cron: '0 6 * * 1' # weekly floor
6+
push:
7+
branches: [master]
8+
paths:
9+
- 'toolchain/mfc/test/cases.py'
10+
- 'src/**/*.fpp'
11+
workflow_dispatch:
12+
permissions:
13+
contents: write
14+
concurrency:
15+
group: coverage-refresh
16+
cancel-in-progress: true
17+
jobs:
18+
refresh:
19+
if: github.repository == 'MFlowCode/MFC'
20+
timeout-minutes: 240
21+
runs-on:
22+
group: phoenix
23+
labels: gt
24+
steps:
25+
- uses: actions/checkout@v4
26+
with: { clean: false }
27+
- name: Build + collect coverage map (SLURM)
28+
run: bash .github/scripts/submit-slurm-job.sh .github/workflows/common/coverage-refresh.sh cpu none phoenix
29+
- name: Commit refreshed map
30+
run: |
31+
if ! git diff --quiet tests/coverage_map.json.gz; then
32+
git config user.name "mfc-bot"
33+
git config user.email "mfc-bot@users.noreply.github.com"
34+
git add tests/coverage_map.json.gz
35+
git commit -m "test: refresh coverage map [skip ci]"
36+
# NOTE: pushing to a protected default branch requires a token or
37+
# GitHub App with bypass-branch-protection permission. The default
38+
# GITHUB_TOKEN may be rejected by branch protection rules; if so,
39+
# configure a PAT or App token with the `contents: write` scope and
40+
# pass it as `GITHUB_TOKEN` in the environment for this step.
41+
git push origin HEAD:master
42+
else
43+
echo "Coverage map unchanged."
44+
fi

.github/workflows/test.yml

Lines changed: 9 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -53,6 +53,7 @@ jobs:
5353
runs-on: 'ubuntu-latest'
5454
outputs:
5555
checkall: ${{ steps.changes.outputs.checkall }}
56+
changed_files: ${{ steps.changes.outputs.checkall_files }}
5657
steps:
5758
- name: Clone
5859
uses: actions/checkout@v4
@@ -62,6 +63,7 @@ jobs:
6263
id: changes
6364
with:
6465
filters: ".github/file-filter.yml"
66+
list-files: shell
6567

6668
github:
6769
name: ${{ matrix.nvhpc && format('NVHPC {0} ({1})', matrix.nvhpc, matrix.target) || format('Github ({0}, {1}, {2}, {3})', matrix.os, matrix.mpi, matrix.debug, matrix.intel && 'intel' || 'GNU') }}
@@ -262,11 +264,17 @@ jobs:
262264
- name: Test
263265
if: '!matrix.nvhpc'
264266
run: |
265-
/bin/bash mfc.sh test -v --max-attempts 3 -j $(nproc) $TEST_ALL $TEST_PCT $PRECISION
267+
# Coverage-based test selection in SHADOW mode on PRs: the selector
268+
# prints what it WOULD run, but the full suite still runs (no
269+
# --select-enforce). Enforcement is a separate, later change.
270+
SELECT=()
271+
[ "${{ github.event_name }}" = "pull_request" ] && SELECT=(--only-changes --changed-files "$CHANGED_FILES")
272+
/bin/bash mfc.sh test -v --max-attempts 3 -j $(nproc) "${SELECT[@]}" $TEST_ALL $TEST_PCT $PRECISION
266273
env:
267274
TEST_ALL: ${{ matrix.mpi == 'mpi' && '--test-all' || '' }}
268275
TEST_PCT: ${{ matrix.debug == 'reldebug' && '-% 20' || '' }}
269276
PRECISION: ${{ matrix.precision != '' && format('--{0}', matrix.precision) || '' }}
277+
CHANGED_FILES: ${{ needs.file-changes.outputs.changed_files }}
270278

271279
# ── NVHPC build + test (via docker exec into long-lived container) ──
272280
- name: Build (NVHPC)

tests/coverage_map.json.gz

14.1 KB
Binary file not shown.

toolchain/mfc/cli/commands.py

Lines changed: 19 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -464,6 +464,25 @@
464464
type=str,
465465
default=None,
466466
),
467+
Argument(
468+
name="build-coverage-map", dest="build_coverage_map", action=ArgAction.STORE_TRUE, default=False, help="Build the gcov coverage map (requires a prior --gcov build). Master-side only."
469+
),
470+
Argument(
471+
name="only-changes",
472+
dest="only_changes",
473+
action=ArgAction.STORE_TRUE,
474+
default=False,
475+
help="Select only tests whose covered files overlap changed files (shadow mode unless --select-enforce).",
476+
),
477+
Argument(
478+
name="select-enforce",
479+
dest="select_enforce",
480+
action=ArgAction.STORE_TRUE,
481+
default=False,
482+
help="With --only-changes, actually skip unselected tests (otherwise shadow: print selection, run all).",
483+
),
484+
Argument(name="changed-files", dest="changed_files", type=str, default=None, help="Changed-file list (newline-, space-, or comma-separated; from CI paths-filter). Overrides git detection."),
485+
Argument(name="changes-branch", dest="changes_branch", type=str, default="master", help="Branch to diff against for --only-changes."),
467486
],
468487
mutually_exclusive=[
469488
MutuallyExclusiveGroup(

toolchain/mfc/test/case.py

Lines changed: 8 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -246,6 +246,11 @@ def get_trace(self) -> str:
246246
def get_uuid(self) -> str:
247247
return trace_to_uuid(self.trace)
248248

249+
def coverage_key(self) -> str:
250+
from .coverage import param_hash
251+
252+
return param_hash(self.params)
253+
249254
def get_dirpath(self):
250255
return os.path.join(common.MFC_TEST_DIR, self.get_uuid())
251256

@@ -371,6 +376,9 @@ class TestCaseBuilder:
371376
def get_uuid(self) -> str:
372377
return trace_to_uuid(self.trace)
373378

379+
def coverage_key(self) -> str:
380+
return self.to_case().coverage_key()
381+
374382
def to_case(self) -> TestCase:
375383
if self.kind == "convergence":
376384
# Convergence cases drive their own runs — the BASE_CFG mods/path

0 commit comments

Comments
 (0)