Surface untested source files at 0% coverage in reports #9

Draft
olivembo wants to merge 6 commits into eclipse-score:main from etas-contrib:centralized-coverage

Summary

llvm-cov only reports files whose object files are linked into at least one test. Source files that exist in the workspace but are not pulled in by any cc_test silently disappear from the coverage report.

This PR adds a Bazel aspect that walks the dependency graph to collect all C/C++ sources, compares them against what llvm-cov actually reported, and augments the LCOV, HTML, and text outputs with synthetic 0%-coverage entries for the missing files.

Approach

Why an aspect + manifest instead of patching --instrumentation_filter?

instrumentation_filter controls which targets get built with coverage instrumentation, but llvm-cov still only reports files whose .o is linked into a test binary. There's no Bazel-native way to surface the gap. The aspect walks deps/srcs/implementation_deps transitively and writes a manifest of all reachable .cpp/.cc/.cxx/.c files. The reporter then diffs manifest vs. LCOV output.
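The manifest-vs-LCOV diff can be sketched roughly as follows. This is a minimal illustration, not the code in reporter.py: the function names are hypothetical stand-ins, and it assumes both the manifest and the `SF:` records use the same workspace-relative path form (real LCOV output often carries absolute paths that would need normalizing first).

```python
from pathlib import PurePosixPath


def covered_sources_from_lcov(lcov_text: str) -> set[str]:
    """Collect the paths named by SF: records in an LCOV tracefile."""
    covered = set()
    for line in lcov_text.splitlines():
        if line.startswith("SF:"):
            covered.add(PurePosixPath(line[3:].strip()).as_posix())
    return covered


def missing_from_report(manifest_text: str, lcov_text: str) -> list[str]:
    """Return manifest entries with no SF: record, sorted for stable output."""
    manifest = {
        PurePosixPath(entry.strip()).as_posix()
        for entry in manifest_text.splitlines()
        if entry.strip()  # skip blank lines in the path-per-line manifest
    }
    return sorted(manifest - covered_sources_from_lcov(lcov_text))


# Hypothetical inputs: two sources in the manifest, one reported by llvm-cov.
manifest = "src/covered.cpp\nsrc/uncovered.cpp\n"
lcov = "SF:src/covered.cpp\nDA:1,1\nLF:1\nLH:1\nend_of_record\n"
print(missing_from_report(manifest, lcov))  # ['src/uncovered.cpp']
```

Using a set difference keeps the comparison independent of the order in which the aspect or llvm-cov emit paths.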

Why heuristic line counts instead of exact numbers?

Without running llvm-cov against actual instrumented object files (which don't exist for untested sources), we can't get exact instrumentable line counts. The heuristic (_count_instrumentable_lines) filters blank lines, comments, preprocessor directives, lone braces, and namespace declarations. All outputs explicitly label these as estimates (~N, "estimated via heuristic") to avoid false precision. The TOTALS line in summary.txt is intentionally left untouched — only a WARNING banner is appended.
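To make the heuristic concrete, here is a small sketch of a line classifier in the same spirit. The regular expression is an assumption written for illustration and is not the pattern used in reporter.py; it also does not track multi-line block comments whose body lines lack a leading `*`.

```python
import re

# Lines matching this pattern are treated as non-instrumentable.
# Illustrative pattern only; the PR's actual regex is not reproduced here.
_NON_EXECUTABLE = re.compile(
    r"""^\s*(?:
        $                        # blank line
      | //.*                     # line comment
      | /\*.*                    # block-comment opener
      | \*(?:[/\s].*)?           # block-comment continuation or closer
      | \#.*                     # preprocessor directive
      | [{}]\s*;?                # lone brace, optionally followed by ';'
      | namespace\b[^;]*\{?      # namespace declaration
    )\s*$""",
    re.VERBOSE,
)


def count_instrumentable_lines(source: str) -> int:
    """Estimate how many lines a coverage tool might instrument."""
    return sum(1 for line in source.splitlines() if not _NON_EXECUTABLE.match(line))


sample = """\
// widget.cpp
#include "widget.h"

namespace demo {

int Widget::size() const
{
    /* cached value */
    return *size_ptr_;
}

}
"""
print(f"~{count_instrumentable_lines(sample)} lines (estimated via heuristic)")
```

In this sample only the function signature and the `return` statement survive the filter, which is why the result is presented as an estimate rather than an exact count.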

What changed

- coverage/defs.bzl: _collect_sources_aspect, score_instrumented_sources_manifest rule, instrumented_sources_manifest parameter on score_coverage_reporter macro
- coverage/reporter.py: LCOV augmentation, HTML augmentation (top banner + per-file pages + detail table), text summary banner, workspace-bounds check, HTML escaping including quotes
- coverage/BUILD.bazel: reporter_lib py_library for testability
- tests/coverage/: uncovered.cpp/.h fixture (library with no test), reporter_test.py with unit tests for all augmentation helpers including path-traversal rejection
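
The workspace-bounds check mentioned above could look roughly like this. It is a sketch under assumptions: `resolve_within_workspace` is a hypothetical helper name, and the real reporter may structure the check differently. `Path.is_relative_to()` requires Python 3.9 or newer.

```python
import tempfile
from pathlib import Path
from typing import Optional


def resolve_within_workspace(workspace_root: Path, manifest_entry: str) -> Optional[Path]:
    """Resolve a manifest entry, returning None if it escapes the workspace.

    resolve() collapses '..' segments and follows symlinks, so the check
    runs against the final location rather than the spelled-out path.
    """
    root = workspace_root.resolve()
    candidate = (root / manifest_entry).resolve()
    return candidate if candidate.is_relative_to(root) else None


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "ws"
    (root / "src").mkdir(parents=True)
    # An entry inside the workspace is accepted.
    print(resolve_within_workspace(root, "src/uncovered.cpp") is not None)
    # An entry that climbs out of the workspace is rejected.
    print(resolve_within_workspace(root, "../outside.cpp"))
```

Resolving both the root and the candidate before comparing matters on systems where the temporary or workspace directory is itself reached through a symlink.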

Consumer usage

load("@score_cpp_policies//coverage:defs.bzl",
     "score_coverage_reporter", "score_instrumented_sources_manifest")

score_instrumented_sources_manifest(
    name = "instrumented_sources",
    targets = ["//src:mylib"],
)

score_coverage_reporter(
    name = "reporter_wrapper",
    llvm_cov = "@llvm_toolchain//:llvm-cov",
    llvm_profdata = "@llvm_toolchain//:llvm-profdata",
    instrumented_sources_manifest = ":instrumented_sources",
)

olivembo added 6 commits June 23, 2026 08:25
Introduces a reusable coverage toolchain based on llvm-cov:

- coverage/merger.py: per-test profraw -> profdata + object file packaging
- coverage/reporter.py: cross-test aggregation, HTML/LCOV/text reports
- coverage/effective_coverage.py: justification overlay + effective metrics
- coverage/justify.py: justification manifest resolution
- coverage/defs.bzl: score_coverage_reporter macro for consumer wiring
- coverage/coverage.bazelrc: shared coverage flags
- coverage/filter_regexes.txt: baseline source exclusions
- coverage/generate_coverage_html.sh: convenience entry point

Adds an end-to-end example under tests/coverage exercising the pipeline
with a small instrumented library, test, justification file, and consumer
filter regexes.

Known limitation: source files not linked into any cc_test are not yet
included in the report (no instrumented object file -> invisible to
llvm-cov).

llvm-cov only reports files linked into at least one test binary.
Sources that exist in the workspace but no test pulls in silently
disappear from the report, causing coverage to appear higher than
it actually is.

Three mechanisms work together to fix this:

Bazel aspect + manifest rule (coverage/defs.bzl):
  _collect_sources_aspect walks the dependency graph of all configured
  targets and collects C/C++ source files. score_instrumented_sources_manifest
  writes a workspace-relative path-per-line manifest.
  score_coverage_reporter gains an optional instrumented_sources_manifest
  parameter that passes the manifest to the reporter via
  --instrumented_sources_manifest.

Reporter augmentation (coverage/reporter.py):
  After llvm-cov export, the reporter compares the manifest against
  covered sources from the LCOV output. For each missing file a
  synthetic 0%-coverage LCOV record is appended (SF + DA per
  non-blank line + LF/LH). The llvm-cov text summary TOTALS line is
  updated in-place using re.finditer to preserve fixed-width column
  alignment. Per-file HTML pages and a "Not Linked Into Tests" index
  section are generated for visibility.
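
  For illustration, a synthetic record of the shape described above (SF, one DA per counted line with a hit count of 0, then LF/LH) can be built like this. The helper name and its signature are hypothetical; which line numbers are passed in is up to the caller.

```python
def zero_coverage_record(source_path: str, line_numbers: list[int]) -> str:
    """Build one LCOV record that marks every given line as never executed."""
    lines = [f"SF:{source_path}"]
    lines += [f"DA:{number},0" for number in line_numbers]  # hit count 0
    lines.append(f"LF:{len(line_numbers)}")  # lines found
    lines.append("LH:0")                     # lines hit
    lines.append("end_of_record")
    return "\n".join(lines) + "\n"


# Hypothetical file with three counted lines.
print(zero_coverage_record("tests/coverage/uncovered.cpp", [3, 4, 7]), end="")
```

  Appending such a record to the existing tracefile lets downstream LCOV consumers list the file at 0% without any change to how they parse records.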

  Correctness and security fixes applied during review:
  - Use str.replace() instead of str.format() for HTML template
    rendering so that { and } in C++ source bodies do not crash the
    reporter with KeyError/ValueError.
  - Separate stderr from stdout for run_llvm_cov_export and
    run_llvm_cov_report (separate_stderr=True) so that llvm-cov
    warning messages are not mixed into LCOV/summary output.
  - Validate that resolved manifest paths stay within workspace_root
    via Path.is_relative_to() before reading files.
  - Extend _escape_html to cover ' and " (emitted as HTML character
    references) so that file paths with apostrophes do not break HTML
    attributes.
  - Count only non-blank lines for LF in synthetic LCOV records
    to avoid inflating the denominator in aggregate metrics.
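
  The template and escaping fixes can be illustrated with a short sketch. The template text and the `@@NAME@@` placeholder style are assumptions made for this example; the standard library's `html.escape` stands in for the reporter's own `_escape_html`.

```python
import html

# Hypothetical page template; placeholders are plain markers, not format fields.
PAGE_TEMPLATE = '<h1 title="@@PATH@@">@@PATH@@</h1>\n<pre>@@BODY@@</pre>\n'


def render_page(path: str, body: str) -> str:
    """Fill the template with str.replace so braces in C++ code are inert."""
    return (
        PAGE_TEMPLATE
        .replace("@@PATH@@", html.escape(path, quote=True))  # escapes ' and "
        .replace("@@BODY@@", html.escape(body, quote=True))
    )


page = render_page("it's.cpp", 'int main() { return puts("hi"); }')
print(page, end="")
```

  Because `str.replace` treats `{` and `}` as ordinary characters, the C++ body passes through unchanged apart from HTML escaping. One caveat of chained replaces: a path that itself contained the text `@@BODY@@` would be substituted by the second call, so placeholder markers should be chosen so they cannot occur in the inputs.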

Test fixture (tests/coverage/uncovered.cpp, uncovered.h, BUILD.bazel):
  cc_library intentionally not linked into any cc_test. Verifies that
  the reporter surfaces the file at 0% coverage rather than omitting it.

Docs (coverage/README.md): new section 5a with usage example.

- Use heuristic to identify instrumentable lines instead of counting all
  non-blank lines. Filters comments, preprocessor directives, lone braces,
  namespace declarations, and access specifiers to avoid inflating LF values
  in synthetic LCOV records.
- Augment summary.txt and console output with untested file line counts so
  the visible TOTALS reflect the true coverage including 0%-files.
- Parse the llvm-cov column header to determine the Lines group index
  dynamically instead of hardcoding position 1.
- Add workspace-bounds check after resolve() in _find_untested_sources to
  prevent path traversal via symlinks.
- Escape single and double quotes in _escape_html to prevent attribute
  breakout in generated HTML pages.
- Narrow _NON_EXECUTABLE_RE block-comment pattern from `|\*.*` to
  `|\*(?:[/\s].*)?` so that pointer dereferences (`*ptr = value;`) are
  correctly classified as executable.
- Add py_test with unit tests for all reporter augmentation helpers:
  _is_likely_executable, _count_instrumentable_lines,
  _covered_sources_from_lcov, _find_untested_sources,
  _append_zero_coverage_lcov, _augment_text_summary, _escape_html.
  Includes a path-traversal rejection test for _find_untested_sources.
- Add py_library target for reporter so unit tests can import it.
- Remove coverable_test from instrumented_sources manifest targets
  to avoid testonly dependency violation (test sources don't need to
  appear in the manifest; they're tested by definition).

The heuristic line count (_count_instrumentable_lines) cannot replicate
what llvm-cov would report for actually-instrumented objects. Rewriting
TOTALS with approximate numbers gives false precision.

- _augment_text_summary: no longer rewrites the TOTALS line; appends a
  clearly-labelled WARNING banner with ~N estimated lines instead.
- _inject_untested_section_into_index: injects a prominent banner right
  after <body> so it is the first thing reviewers see. Detail table uses
  ~N notation and includes a disclaimer about the heuristic.
- _append_zero_coverage_lcov docstring documents the approximation.
- Tests updated to match banner-only behavior.
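
The banner-only behavior might be sketched as below. The function name, the banner wording, and the input shape (a mapping from path to estimated line count) are all assumptions for illustration; only the idea of leaving the original summary text, including TOTALS, byte-for-byte intact comes from the description above.

```python
def append_untested_banner(summary: str, untested: dict[str, int]) -> str:
    """Append a WARNING banner without touching any existing summary line."""
    if not untested:
        return summary
    total = sum(untested.values())
    banner = [
        "",
        f"WARNING: {len(untested)} source file(s) are not linked into any test.",
        f"They add ~{total} uncovered lines (estimated via heuristic) "
        "that TOTALS does not include.",
    ]
    banner += [f"  {path}: ~{count} lines" for path, count in sorted(untested.items())]
    return summary.rstrip("\n") + "\n" + "\n".join(banner) + "\n"


# Hypothetical llvm-cov style summary, abbreviated to two lines.
summary = "Filename        Lines\nTOTALS             10\n"
print(append_untested_banner(summary, {"tests/coverage/uncovered.cpp": 12}), end="")
```

Keeping the measured TOTALS and the estimate in separate, clearly labelled places avoids mixing exact llvm-cov numbers with heuristic ones.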