Surface untested source files at 0% coverage in reports by olivembo · Pull Request #9 · eclipse-score/score_cpp_policies

olivembo · 2026-06-23T09:46:38Z

Summary

llvm-cov only reports files whose object files are linked into at least one test. Source files that exist in the workspace but that no cc_test pulls in silently disappear from the coverage report.

This PR adds a Bazel aspect that walks the dependency graph to collect all C/C++ sources, compares them against what llvm-cov actually reported, and augments the LCOV, HTML, and text outputs with synthetic 0%-coverage entries for the missing files.

Approach

Why an aspect + manifest instead of patching `--instrumentation_filter`?

instrumentation_filter controls which targets get built with coverage instrumentation, but llvm-cov still only reports files whose .o is linked into a test binary. There's no Bazel-native way to surface the gap. The aspect walks deps/srcs/implementation_deps transitively and writes a manifest of all reachable .cpp/.cc/.cxx/.c files. The reporter then diffs manifest vs. LCOV output.
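The manifest-vs-LCOV diff can be pictured with a minimal sketch. It is not the code in coverage/reporter.py: the function names are hypothetical, and it assumes the manifest and the LCOV `SF:` records use the same workspace-relative path form (real llvm-cov output may need path normalization first).

```python
def covered_sources_from_lcov(lcov_text: str) -> set[str]:
    """Collect the file paths named in LCOV 'SF:' records."""
    covered = set()
    for line in lcov_text.splitlines():
        if line.startswith("SF:"):
            covered.add(line[3:].strip())
    return covered


def find_untested(manifest_text: str, lcov_text: str) -> list[str]:
    """Return manifest entries that llvm-cov never reported.

    Assumption: one path per manifest line, in the same form as the
    paths in the LCOV 'SF:' records.
    """
    covered = covered_sources_from_lcov(lcov_text)
    manifest = [p.strip() for p in manifest_text.splitlines() if p.strip()]
    return sorted(p for p in manifest if p not in covered)
```

Anything this returns is a candidate for a synthetic 0%-coverage entry.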

Why heuristic line counts instead of exact numbers?

Without running llvm-cov against actual instrumented object files (which don't exist for untested sources), we can't get exact instrumentable line counts. The heuristic (_count_instrumentable_lines) filters blank lines, comments, preprocessor directives, lone braces, and namespace declarations. All outputs explicitly label these as estimates (~N, "estimated via heuristic") to avoid false precision. The TOTALS line in summary.txt is intentionally left untouched — only a WARNING banner is appended.
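To give a feel for what such a heuristic does, here is a simplified sketch. It is an illustration under stated assumptions, not the actual `_count_instrumentable_lines`: the names are hypothetical, it uses plain prefix checks rather than the PR's regular expression, and it ignores the body of multi-line block comments.

```python
def is_likely_executable(line: str) -> bool:
    """Rough guess at whether a C/C++ line would be instrumented."""
    s = line.strip()
    if not s:
        return False  # blank line
    if s.startswith(("//", "/*", "#")):
        return False  # comment opener or preprocessor directive
    if s in ("{", "}", "};"):
        return False  # lone brace
    if s.startswith("namespace"):
        return False  # namespace declaration (crude prefix check)
    return True


def count_instrumentable_lines(source: str) -> int:
    """Estimate the number of instrumentable lines in a source file."""
    return sum(1 for line in source.splitlines() if is_likely_executable(line))
```

For a file holding one comment, one `#include`, a namespace wrapper, and a two-line function, this counts only the function signature and the `return` statement, which is why the outputs present the figure as `~N` rather than an exact count.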

What changed

- coverage/defs.bzl: _collect_sources_aspect, score_instrumented_sources_manifest rule, instrumented_sources_manifest parameter on score_coverage_reporter macro
- coverage/reporter.py: LCOV augmentation, HTML augmentation (top banner + per-file pages + detail table), text summary banner, workspace-bounds check, HTML escaping including quotes
- coverage/BUILD.bazel: reporter_lib py_library for testability
- tests/coverage/: uncovered.cpp/.h fixture (library with no test), reporter_test.py with unit tests for all augmentation helpers, including path-traversal rejection

Consumer usage

load("@score_cpp_policies//coverage:defs.bzl",
     "score_coverage_reporter", "score_instrumented_sources_manifest")

score_instrumented_sources_manifest(
    name = "instrumented_sources",
    targets = ["//src:mylib"],
)

score_coverage_reporter(
    name = "reporter_wrapper",
    llvm_cov = "@llvm_toolchain//:llvm-cov",
    llvm_profdata = "@llvm_toolchain//:llvm-profdata",
    instrumented_sources_manifest = ":instrumented_sources",
)

Introduces a reusable coverage toolchain based on llvm-cov:

- coverage/merger.py: per-test profraw -> profdata + object file packaging
- coverage/reporter.py: cross-test aggregation, HTML/LCOV/text reports
- coverage/effective_coverage.py: justification overlay + effective metrics
- coverage/justify.py: justification manifest resolution
- coverage/defs.bzl: score_coverage_reporter macro for consumer wiring
- coverage/coverage.bazelrc: shared coverage flags
- coverage/filter_regexes.txt: baseline source exclusions
- coverage/generate_coverage_html.sh: convenience entry point

Adds an end-to-end example under tests/coverage exercising the pipeline with a small instrumented library, test, justification file, and consumer filter regexes.

Known limitation: source files not linked into any cc_test are not yet included in the report (no instrumented object file -> invisible to llvm-cov).

llvm-cov only reports files linked into at least one test binary. Sources that exist in the workspace but that no test pulls in silently disappear from the report, causing coverage to appear higher than it actually is. Three mechanisms work together to fix this:

Bazel aspect + manifest rule (coverage/defs.bzl): _collect_sources_aspect walks the dependency graph of all configured targets and collects C/C++ source files. score_instrumented_sources_manifest writes a manifest with one workspace-relative path per line. score_coverage_reporter gains an optional instrumented_sources_manifest parameter that passes the manifest to the reporter via --instrumented_sources_manifest.

Reporter augmentation (coverage/reporter.py): After llvm-cov export, the reporter compares the manifest against covered sources from the LCOV output. For each missing file a synthetic 0%-coverage LCOV record is appended (SF + DA per non-blank line + LF/LH). The llvm-cov text summary TOTALS line is updated in-place using re.finditer to preserve fixed-width column alignment. Per-file HTML pages and a "Not Linked Into Tests" index section are generated for visibility.

Correctness and security fixes applied during review:

- Use str.replace() instead of str.format() for HTML template rendering so that { and } in C++ source bodies do not crash the reporter with KeyError/ValueError.
- Separate stderr from stdout for run_llvm_cov_export and run_llvm_cov_report (separate_stderr=True) so that llvm-cov warning messages are not mixed into LCOV/summary output.
- Validate that resolved manifest paths stay within workspace_root via Path.is_relative_to() before reading files.
- Extend _escape_html to cover single quotes (') and double quotes (") so that file paths with apostrophes do not break HTML attributes.
- Count only non-blank lines for LF in synthetic LCOV records to avoid inflating the denominator in aggregate metrics.

Test fixture (tests/coverage/uncovered.cpp, uncovered.h, BUILD.bazel): cc_library intentionally not linked into any cc_test. Verifies that the reporter surfaces the file at 0% coverage rather than omitting it.

Docs (coverage/README.md): new section 5a with usage example.
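The synthetic record described in this commit (SF, one DA per non-blank line, then LF/LH) could be produced along these lines. This is a sketch with a hypothetical function name, not the reporter's `_append_zero_coverage_lcov`, and it follows the non-blank-line rule stated in this commit message.

```python
def zero_coverage_record(path: str, source: str) -> str:
    """Build one LCOV record that marks every non-blank line as unexecuted."""
    out = [f"SF:{path}"]
    found = 0
    for lineno, line in enumerate(source.splitlines(), start=1):
        if line.strip():
            out.append(f"DA:{lineno},0")  # line number, execution count 0
            found += 1
    out.append(f"LF:{found}")  # lines found
    out.append("LH:0")         # lines hit: none, the file was never run
    out.append("end_of_record")
    return "\n".join(out) + "\n"
```

Appending such a record to the exported LCOV text makes the file appear in downstream tools at 0% instead of being absent.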

- Use heuristic to identify instrumentable lines instead of counting all non-blank lines. Filters comments, preprocessor directives, lone braces, namespace declarations, and access specifiers to avoid inflating LF values in synthetic LCOV records.
- Augment summary.txt and console output with untested file line counts so the visible TOTALS reflect the true coverage including 0%-files.
- Parse the llvm-cov column header to determine the Lines group index dynamically instead of hardcoding position 1.
- Add workspace-bounds check after resolve() in _find_untested_sources to prevent path traversal via symlinks.
- Escape single and double quotes in _escape_html to prevent attribute breakout in generated HTML pages.
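The workspace-bounds check after `resolve()` might look like the following sketch. The helper name is hypothetical and the real `_find_untested_sources` may be structured differently; `Path.is_relative_to()` requires Python 3.9 or newer.

```python
from pathlib import Path
from typing import Optional


def resolve_within_workspace(workspace_root: Path, rel_path: str) -> Optional[Path]:
    """Resolve a manifest entry, rejecting anything outside the workspace.

    resolve() follows symlinks and collapses '..' segments, so the check
    runs on the final location rather than on the textual path.
    """
    root = workspace_root.resolve()
    candidate = (root / rel_path).resolve()
    if not candidate.is_relative_to(root):
        return None  # path escapes the workspace: skip it
    return candidate
```

Checking after resolution matters because a purely textual check would let a symlink inside the workspace point the reporter at a file outside it.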

- Narrow _NON_EXECUTABLE_RE block-comment pattern from `|\*.*` to `|\*(?:[/\s].*)?` so that pointer dereferences (`*ptr = value;`) are correctly classified as executable.
- Add py_test with unit tests for all reporter augmentation helpers: _is_likely_executable, _count_instrumentable_lines, _covered_sources_from_lcov, _find_untested_sources, _append_zero_coverage_lcov, _augment_text_summary, _escape_html. Includes a path-traversal rejection test for _find_untested_sources.
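The effect of the narrowed pattern can be checked in isolation. The sketch below lifts only the block-comment alternative out of the larger expression; the surrounding anchors (`^\s*` and `$`) and the names `OLD`/`NEW` are assumptions made for the demonstration, not the PR's actual `_NON_EXECUTABLE_RE`.

```python
import re

# Old alternative: any line starting with '*' counts as a comment line.
OLD = re.compile(r"^\s*\*.*$")

# New alternative: '*' must stand alone or be followed by '/' or whitespace.
NEW = re.compile(r"^\s*\*(?:[/\s].*)?$")


def is_comment_line(pattern: re.Pattern, line: str) -> bool:
    """True if the pattern classifies the line as block-comment text."""
    return pattern.match(line) is not None


for sample in (" * continuation text", " */", " *", "*ptr = value;"):
    print(repr(sample), is_comment_line(OLD, sample), is_comment_line(NEW, sample))
```

Only the last sample changes classification: the old pattern treats `*ptr = value;` as comment text, the new one does not. A dereference written with a space after the star (`* ptr = value;`) would still match the new pattern, so the heuristic remains an approximation.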

- Add py_library target for reporter so unit tests can import it.
- Remove coverable_test from instrumented_sources manifest targets to avoid testonly dependency violation (test sources don't need to appear in the manifest; they're tested by definition).

The heuristic line count (_count_instrumentable_lines) cannot replicate what llvm-cov would report for actually-instrumented objects. Rewriting TOTALS with approximate numbers gives false precision.

- _augment_text_summary: no longer rewrites the TOTALS line; appends a clearly-labelled WARNING banner with ~N estimated lines instead.
- _inject_untested_section_into_index: injects a prominent banner right after <body> so it is the first thing reviewers see. Detail table uses ~N notation and includes a disclaimer about the heuristic.
- _append_zero_coverage_lcov docstring documents the approximation.
- Tests updated to match banner-only behavior.
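A banner-only augmentation of the text summary could be sketched as follows. The function name, the banner wording, and the `untested` mapping (path to estimated line count) are all assumptions for illustration; the actual `_augment_text_summary` output format is not shown in this PR description.

```python
def augment_text_summary(summary: str, untested: dict[str, int]) -> str:
    """Append a WARNING banner; the llvm-cov text above it is left as-is."""
    if not untested:
        return summary
    total = sum(untested.values())
    banner = [
        "",
        f"WARNING: {len(untested)} source file(s) not linked into any test (0% coverage).",
        f"  ~{total} lines estimated via heuristic; TOTALS above do not include them.",
    ]
    for path in sorted(untested):
        banner.append(f"  {path}  ~{untested[path]} lines")
    return summary.rstrip("\n") + "\n" + "\n".join(banner) + "\n"
```

Because the original summary text is only appended to, the exact figures from llvm-cov stay intact and the estimate is visibly separate from them.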

olivembo added 6 commits June 23, 2026 08:25

olivembo requested a review from RSingh1511 June 23, 2026 09:52

olivembo mentioned this pull request Jun 23, 2026

use llvm cov for coverage and justification eclipse-score/communication#549

Merged

olivembo wants to merge 6 commits into eclipse-score:main from etas-contrib:centralized-coverage
