Skip to content

Commit f94ceca

Browse files
mjp41Copilot
andauthored
Add cross-platform Coverage CI (#842)
* Add cross-platform Coverage CI Adds a coverage workflow that mirrors the regular ctest invocation across the CI matrix and surfaces a per-PR coverage delta. Coverage is advisory only and never gates a PR. Build wiring (CMakeLists.txt, cmake/run_coverage.cmake) - New SNMALLOC_COVERAGE option (Clang/AppleClang only) adding -fprofile-instr-generate -fcoverage-mapping to compile and link. - Per-test LLVM_PROFILE_FILE so each test's profraws live in their own subdir and can be invalidated independently by the cached runner. - Global SNMALLOC_TEST_BINARIES property collects every test target so the new `coverage` custom target can pass them all to llvm-cov without globbing build outputs. - run_coverage.cmake drives ctest under coverage with sha256-keyed per-test caching, then merges per-test profdata and exports a unified coverage.json. Hashes are only cached when ctest reports overall success, so a timed-out or crashed test does not get its partial profraw frozen as the canonical answer for that binary. Merge tool (.github/scripts/merge_coverage.py + tests) - Per-line set-union merger of llvm-cov JSON exports across platforms: merged.executable = ⋃ executable lines, merged.covered = ⋃ covered lines, with covered ⊆ executable asserted at merge. - Emits per-platform breakdown alongside the union for the comment renderer. - 16 pytest cases cover union semantics, invariant enforcement, rendering, and determinism. Reusable workflows (reusable-cmake-build.yml, reusable-vm-build.yml) - New coverage-mode (off | tests | tests+selfhost) and coverage-artifact-name inputs; artifact name validated against ^[A-Za-z0-9_-]+$. - When coverage-mode != off: forces Debug, sets SNMALLOC_COVERAGE=ON, invokes the `coverage` target instead of plain ctest. - A single self-host step (always sets LLVM_PROFILE_FILE; harmless when not a coverage build) plus a separate post-step that exports selfhost.json only under coverage-mode == 'tests+selfhost'. Coverage workflow (.github/workflows/coverage.yml) - pull_request + nightly schedule + workflow_dispatch. - Matrix: ubuntu-24.04, macos-14 (brew llvm@19, absolute tool paths), linux-self-host-shim, linux-self-host-shim-checks (adds SNMALLOC_MEMCPY_BOUNDS=ON + SNMALLOC_CHECK_LOADS=ON, runs tests+selfhost), freebsd-14 (absolute /usr/local/llvm19/bin paths), netbsd-10 (pkgsrc clang/clang++), and a Windows clang-cl pre-gate build that exercises configure-time code paths without producing coverage. - Merge job: deterministic find | sort over downloaded artifacts, produces a coverage-merged artifact consumed by the commenter. Comment workflow (.github/workflows/coverage-comment.yml) - workflow_run trigger keeps write permissions out of the build job (read-all token in coverage.yml; pull-requests:write + issues:write here, no contents). - Validates artifact size/structure and the snmalloc-coverage-bot marker before posting. - PR path: dual-marker find-or-create with 3-attempt 409/403 backoff and pagination via github.paginate(). - Tracking-issue path (vars.COVERAGE_TRACKING_ISSUE) for nightly runs and pushes to the default branch. Marker-string coupling between the Python merger and the JS commenter is documented at both call sites; maintainers must update them in lockstep. * coverage CI: fix VM-build expression parse error GitHub Actions expressions only accept single-quoted strings, but the ctest invocation contains single quotes (the -E regex). Move the off-vs-coverage selection from a expression into a shell inside the script body. * coverage CI: fix four CI failures - Validation regex: allow `.` in coverage-artifact-name so matrix labels like `ubuntu-24.04` pass. - FreeBSD: llvm19 port installs versioned binaries (clang19, clang++19, llvm-cov19, llvm-profdata19) directly under /usr/local/bin/, not /usr/local/llvm19/bin/. Update absolute paths. - macOS: brew clang on the Apple SDK warns -Wundef on the SDK's `__STDC_VERSION__` check because brew clang doesn't treat the SDK as system headers by default. Export SDKROOT in the dependencies step so brew clang picks up the SDK as a sysroot (system include path), suppressing -Wundef in those headers. - Self-host coverage export: ls with two glob arguments returns nonzero when one glob has no match (e.g. .dylib on Linux), and pipefail propagated the failure. Use bash nullglob + array instead. Co-authored-by: Copilot <copilot@github.com> * coverage CI: add NetBSD back with compiler-rt pkgsrc's `llvm`/`clang` packages do not bundle libclang_rt.profile.a; the profile runtime lives in the separate `compiler-rt` package. Add it to pkg_add so -fprofile-instr-generate links cleanly. If this leg goes red because pkgsrc cannot resolve `compiler-rt`, the fallback is to fetch it from the NetBSD binary package mirror. Co-authored-by: Copilot <copilot@github.com> * coverage CI: drop NetBSD (pkgsrc compiler-rt-19 broken) pkgsrc's compiler-rt-19 ships a profile runtime in which __llvm_profile_raw_version is declared hidden but not defined. This makes every -fprofile-instr-generate shared library unlinkable — including libsnmallocshim.so during the regular build, not just the self-host step: R_X86_64_PC32 against undefined hidden symbol `__llvm_profile_raw_version` can not be used when making a shared object There is no fix on our side. The leg can be reinstated once pkgsrc ships a fixed compiler-rt, or by adding an in-VM compiler-rt build from llvm-project source (rejected for now — ~10 min per CI run for marginal NetBSD-pal coverage). * coverage CI: drop -format=json from selfhost llvm-cov export llvm-cov export emits JSON by default; the -format flag only accepts text|lcov, so -format=json failed CLI parsing. Co-authored-by: Copilot <copilot@github.com> * coverage CI: promote Windows clang-cl from pre-gate to full leg Drop build-only and contribute Windows coverage to the merged report. Exercises the Windows PAL surface, which no other leg in the matrix touches. windows-2022 runners ship LLVM (clang-cl, llvm-profdata, llvm-cov under C:\Program Files\LLVM\bin, on PATH) and ninja preinstalled, so no \`dependencies:\` step is needed. * coverage CI: review-pass fixes - Windows: pass absolute -DLLVM_COV / -DLLVM_PROFDATA paths instead of relying on PATH ordering inside the runner image. - Self-host step: tighten the if condition so it doesn't run a pointless self-host build when self-host=true is requested alongside an in-progress coverage build (the coverage-mode=='tests+selfhost' branch already covers that). Add a comment explaining the coupling with the Export step. - test_all_platforms_empty: assert covered ⊆ executable invariant for consistency with every other test in the suite. Co-authored-by: Copilot <copilot@github.com> * Coverage: trim Linux matrix to one self-host leg; fix multi-shim export The coverage matrix had two redundant Linux legs: - plain ubuntu-24.04 ctest, fully covered by macOS/FreeBSD/Windows ctest - linux-self-host-shim with no extra mitigations The remaining linux-self-host-shim-checks leg mirrors main.yml's self-host job (SNMALLOC_MEMCPY_BOUNDS=ON + SNMALLOC_CHECK_LOADS=ON) and is the only leg that exercises the bounds-checked memcpy and load-check paths. Also fix the self-host export: llvm-cov export was only given the first shim variant as -object, silently dropping coverage mappings for libsnmallocshim-checks.so and libsnmallocshim-checks-memcpy-only.so even though the for-loop ran ninja under each. Pass every built shim as a separate -object. * Coverage: drop spaced absolute LLVM paths on Windows; rely on PATH The Windows job passed -DLLVM_PROFDATA=C:/Program Files/LLVM/bin/llvm-profdata.exe via a YAML folded scalar. The unquoted space survived YAML but split at shell expansion of ${{ inputs.extra-cmake-flags }} in the reusable workflow, so CMake received -DLLVM_PROFDATA=C:/Program plus a phantom positional. Configure passed (find_program does not override a preset cache value, even a bogus one), but the coverage target later failed in execute_process with: llvm-profdata merge failed (exit no such file or directory) The windows-2022 runner image installs LLVM with C:\Program Files\LLVM\bin already on PATH, so find_program(NAMES ... llvm-profdata) resolves correctly without any override. * Coverage: replace coverage-mode enum with coverage boolean The 'coverage-mode' input had three values (off / tests / tests+selfhost) that conflated two orthogonal concerns: 1. Is this a coverage build at all? 2. Should self-host run? (2) is already expressed by the existing 'self-host' boolean input, so 'tests+selfhost' was redundant — it just meant 'coverage && self-host'. Replace with a 'coverage' boolean. Self-host now runs whenever the caller asks for it, regardless of coverage. The export step's gate becomes 'coverage && self-host'. The validate step drops its enum case, keeping the artifact-name regex check. No behavioural change for any existing call site: coverage-mode: 'off' -> coverage: false (default) coverage-mode: 'tests' -> coverage: true, self-host: false coverage-mode: 'tests+selfhost' -> coverage: true, self-host: true --------- Co-authored-by: Copilot <copilot@github.com>
1 parent 25b0d60 commit f94ceca

8 files changed

Lines changed: 1540 additions & 15 deletions

File tree

.github/scripts/merge_coverage.py

Lines changed: 327 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,327 @@
1+
#!/usr/bin/env python3
2+
"""Merge per-platform llvm-cov JSON exports into a per-line set-union report.
3+
4+
Usage:
5+
merge_coverage.py --output-json OUT.json --output-md OUT.md \\
6+
LABEL=PATH [LABEL=PATH ...]
7+
8+
Each PATH is a coverage.json from `llvm-cov export -format=json`. LABEL is
9+
a short platform identifier (e.g. ``linux``, ``macos``, ``selfhost-shim``).
10+
11+
Outputs:
12+
OUT.json per-file (executable, covered) line sets, plus per-platform map.
13+
OUT.md markdown summary suitable for a PR comment.
14+
15+
The per-line set-union design and its invariant (covered(f) ⊆ executable(f))
16+
are documented inline.
17+
"""
18+
from __future__ import annotations
19+
20+
import argparse
21+
import json
22+
import sys
23+
from pathlib import Path
24+
25+
26+
# --- Schema constants (llvm-cov JSON segment format) -------------------------
27+
# Segment = [line, col, count, has_count, is_region_entry, is_gap_region]
28+
_SEG_LINE = 0
29+
_SEG_COUNT = 2
30+
_SEG_HAS_COUNT = 3
31+
_SEG_IS_GAP = 5
32+
_SEG_MIN_LEN = 6
33+
34+
35+
def normalise_path(path: str) -> str:
36+
"""Normalise a filename so the same source file from different CI runners
37+
or build trees compares equal.
38+
39+
Rule: strip everything before (and including) the last ``/src/snmalloc/``
40+
occurrence, after replacing backslashes with forward slashes. Paths
41+
without ``/src/snmalloc/`` are kept verbatim (they are out-of-tree).
42+
"""
43+
p = path.replace("\\", "/")
44+
needle = "/src/snmalloc/"
45+
idx = p.rfind(needle)
46+
if idx >= 0:
47+
# Result begins with ``src/snmalloc/...`` (no leading slash).
48+
return p[idx + 1:]
49+
return p
50+
51+
52+
def parse_platform(doc: dict) -> dict[str, dict]:
53+
"""Convert a single llvm-cov JSON doc into
54+
``{normalised_path: {"executable": set[int], "covered": set[int],
55+
"regions_executable": int, "regions_covered": int}}``.
56+
57+
A line ``ℓ`` enters ``executable`` iff at least one segment on ``ℓ``
58+
has ``has_count == true`` AND ``is_gap_region == false``. It enters
59+
``covered`` iff additionally at least one such segment has ``count > 0``.
60+
By construction ``covered ⊆ executable``.
61+
"""
62+
files: dict[str, dict] = {}
63+
for entry in doc.get("data", []):
64+
for f in entry.get("files", []):
65+
fn = f.get("filename")
66+
if not isinstance(fn, str):
67+
continue
68+
key = normalise_path(fn)
69+
executable = files.setdefault(key, {
70+
"executable": set(),
71+
"covered": set(),
72+
"regions_executable": 0,
73+
"regions_covered": 0,
74+
})
75+
for seg in f.get("segments", []):
76+
if not isinstance(seg, list) or len(seg) < _SEG_MIN_LEN:
77+
continue
78+
if not seg[_SEG_HAS_COUNT]:
79+
continue
80+
if seg[_SEG_IS_GAP]:
81+
continue
82+
line = seg[_SEG_LINE]
83+
if not isinstance(line, int):
84+
continue
85+
executable["executable"].add(line)
86+
if isinstance(seg[_SEG_COUNT], (int, float)) and seg[_SEG_COUNT] > 0:
87+
executable["covered"].add(line)
88+
# Region totals: read from per-file summary (advisory only).
89+
summary = f.get("summary", {})
90+
regions = summary.get("regions", {})
91+
r_count = regions.get("count")
92+
r_covered = regions.get("covered")
93+
if isinstance(r_count, int):
94+
executable["regions_executable"] += r_count
95+
if isinstance(r_covered, int):
96+
executable["regions_covered"] += r_covered
97+
return files
98+
99+
100+
def merge(platforms: dict[str, dict[str, dict]]) -> dict:
101+
"""Merge per-platform parsed maps into the canonical merged structure."""
102+
all_files: set[str] = set()
103+
for pmap in platforms.values():
104+
all_files.update(pmap.keys())
105+
106+
merged_files: dict[str, dict] = {}
107+
for fn in sorted(all_files):
108+
executable: set[int] = set()
109+
covered: set[int] = set()
110+
for pmap in platforms.values():
111+
entry = pmap.get(fn)
112+
if entry is None:
113+
continue
114+
executable |= entry["executable"]
115+
covered |= entry["covered"]
116+
# Defensive: assert invariant covered ⊆ executable
117+
assert covered <= executable, f"invariant violation for {fn}"
118+
merged_files[fn] = {
119+
"executable": sorted(executable),
120+
"covered": sorted(covered),
121+
}
122+
123+
total_exec = sum(len(v["executable"]) for v in merged_files.values())
124+
total_covered = sum(len(v["covered"]) for v in merged_files.values())
125+
126+
plat_out: dict[str, dict] = {}
127+
for label, pmap in platforms.items():
128+
files_view = {
129+
fn: {
130+
"executable": len(entry["executable"]),
131+
"covered": len(entry["covered"]),
132+
"regions": {
133+
"executable": entry["regions_executable"],
134+
"covered": entry["regions_covered"],
135+
},
136+
}
137+
for fn, entry in pmap.items()
138+
}
139+
p_total_exec = sum(v["executable"] for v in files_view.values())
140+
p_total_covered = sum(v["covered"] for v in files_view.values())
141+
p_total_r_exec = sum(entry["regions_executable"] for entry in pmap.values())
142+
p_total_r_covered = sum(entry["regions_covered"] for entry in pmap.values())
143+
plat_out[label] = {
144+
"files": files_view,
145+
"totals": {
146+
"executable": p_total_exec,
147+
"covered": p_total_covered,
148+
"regions": {
149+
"executable": p_total_r_exec,
150+
"covered": p_total_r_covered,
151+
},
152+
},
153+
}
154+
155+
return {
156+
"files": merged_files,
157+
"totals": {"executable": total_exec, "covered": total_covered},
158+
"platforms": plat_out,
159+
}
160+
161+
162+
# --- Markdown rendering ------------------------------------------------------
163+
164+
def md_escape(s: str) -> str:
165+
"""Escape a filename for inclusion in a markdown table cell."""
166+
return s.replace("|", r"\|").replace("\r", " ").replace("\n", " ")
167+
168+
169+
def _pct(covered: int, executable: int) -> str:
170+
if executable == 0:
171+
return "n/a"
172+
return f"{100.0 * covered / executable:.2f}%"
173+
174+
175+
def _top_dir(path: str) -> str:
176+
"""Group key for the per-directory table.
177+
178+
For paths under ``src/snmalloc/``, group by the immediate sub-directory
179+
(e.g. ``src/snmalloc/pal/...`` → ``src/snmalloc/pal``). Other paths are
180+
grouped under ``other``.
181+
"""
182+
if path.startswith("src/snmalloc/"):
183+
rest = path[len("src/snmalloc/"):]
184+
head, _, _ = rest.partition("/")
185+
return f"src/snmalloc/{head}" if head else "src/snmalloc"
186+
return "other"
187+
188+
189+
def _in_scope(path: str) -> bool:
190+
"""Filter a normalised path to ``src/snmalloc/**`` (excluding tests and
191+
concept headers). Same scoping as ``.copilot/coverage_diff.py``."""
192+
if not path.startswith("src/snmalloc/"):
193+
return False
194+
if path.endswith("_concept.h"):
195+
return False
196+
return True
197+
198+
199+
def render_markdown(merged: dict) -> str:
200+
out: list[str] = []
201+
# Marker used by .github/workflows/coverage-comment.yml's
202+
# find-or-create logic (see the dual-marker policy: comment must
203+
# be authored by github-actions[bot] AND its body must contain
204+
# this marker). If you change this string you MUST update both
205+
# occurrences in coverage-comment.yml in lockstep, or comment
206+
# dedup silently breaks (every run posts a new comment).
207+
out.append("<!-- snmalloc-coverage-bot -->")
208+
out.append("## Coverage report (cross-platform merged)")
209+
out.append("")
210+
# Headline is in-scope (``src/snmalloc/**``) only — that is the project
211+
# code being measured. The JSON artifact retains the full unfiltered
212+
# data for downstream consumers (e.g. ``coverage_diff.py``).
213+
scoped_exec = 0
214+
scoped_cov = 0
215+
for fn, v in merged["files"].items():
216+
if not _in_scope(fn):
217+
continue
218+
scoped_exec += len(v["executable"])
219+
scoped_cov += len(v["covered"])
220+
out.append(
221+
f"**Lines covered (`src/snmalloc/**`): {scoped_cov} / {scoped_exec} "
222+
f"({_pct(scoped_cov, scoped_exec)})**"
223+
)
224+
out.append("")
225+
out.append(
226+
"_Merged line coverage is the per-line union across all platforms. "
227+
"Region coverage is reported per-platform only; no cross-platform "
228+
"region total is computed._"
229+
)
230+
out.append("")
231+
232+
# Per-directory breakdown (in-scope only).
233+
dir_totals: dict[str, dict[str, int]] = {}
234+
for fn, v in merged["files"].items():
235+
if not _in_scope(fn):
236+
continue
237+
d = _top_dir(fn)
238+
bucket = dir_totals.setdefault(d, {"executable": 0, "covered": 0})
239+
bucket["executable"] += len(v["executable"])
240+
bucket["covered"] += len(v["covered"])
241+
242+
out.append("### Per-directory breakdown")
243+
out.append("")
244+
out.append("| Directory | Lines covered | Lines executable | % |")
245+
out.append("| --- | ---: | ---: | ---: |")
246+
rows = sorted(
247+
dir_totals.items(),
248+
key=lambda kv: (
249+
(kv[1]["covered"] / kv[1]["executable"]) if kv[1]["executable"] else 1.0,
250+
kv[0],
251+
),
252+
)
253+
for d, t in rows:
254+
out.append(
255+
f"| {md_escape(d)} | {t['covered']} | {t['executable']} | "
256+
f"{_pct(t['covered'], t['executable'])} |"
257+
)
258+
out.append("")
259+
260+
# Per-platform contributions (advisory).
261+
out.append("<details><summary>Per-platform contributions (advisory)</summary>")
262+
out.append("")
263+
out.append("| Platform | Lines covered | Lines executable | Lines % | Regions covered | Regions executable | Regions % |")
264+
out.append("| --- | ---: | ---: | ---: | ---: | ---: | ---: |")
265+
for label in sorted(merged["platforms"].keys()):
266+
pt = merged["platforms"][label]["totals"]
267+
rt = pt["regions"]
268+
out.append(
269+
f"| {md_escape(label)} | {pt['covered']} | {pt['executable']} | "
270+
f"{_pct(pt['covered'], pt['executable'])} | "
271+
f"{rt['covered']} | {rt['executable']} | "
272+
f"{_pct(rt['covered'], rt['executable'])} |"
273+
)
274+
out.append("")
275+
out.append("</details>")
276+
out.append("")
277+
return "\n".join(out)
278+
279+
280+
# --- CLI ---------------------------------------------------------------------
281+
282+
def parse_inputs(spec_list: list[str]) -> dict[str, Path]:
283+
inputs: dict[str, Path] = {}
284+
for spec in spec_list:
285+
if "=" not in spec:
286+
raise SystemExit(f"error: input must be LABEL=PATH, got {spec!r}")
287+
label, _, path = spec.partition("=")
288+
if not label or not path:
289+
raise SystemExit(f"error: empty label or path in {spec!r}")
290+
if label in inputs:
291+
raise SystemExit(f"error: duplicate label {label!r}")
292+
inputs[label] = Path(path)
293+
return inputs
294+
295+
296+
def load_doc(path: Path) -> dict:
297+
try:
298+
with path.open() as f:
299+
doc = json.load(f)
300+
except (OSError, json.JSONDecodeError) as exc:
301+
raise SystemExit(f"error: cannot load {path}: {exc}")
302+
if not isinstance(doc, dict) or "data" not in doc:
303+
raise SystemExit(f"error: {path} missing top-level 'data' key")
304+
return doc
305+
306+
307+
def main(argv: list[str] | None = None) -> int:
308+
ap = argparse.ArgumentParser(description=__doc__)
309+
ap.add_argument("--output-json", required=True, type=Path)
310+
ap.add_argument("--output-md", required=True, type=Path)
311+
ap.add_argument("inputs", nargs="+", help="LABEL=PATH pairs")
312+
args = ap.parse_args(argv)
313+
314+
spec = parse_inputs(args.inputs)
315+
platforms: dict[str, dict[str, dict]] = {}
316+
for label, path in spec.items():
317+
platforms[label] = parse_platform(load_doc(path))
318+
319+
merged = merge(platforms)
320+
321+
args.output_json.write_text(json.dumps(merged, indent=2, sort_keys=True))
322+
args.output_md.write_text(render_markdown(merged))
323+
return 0
324+
325+
326+
if __name__ == "__main__":
327+
sys.exit(main())

0 commit comments

Comments
 (0)