Skip to content

Track C++ Google Benchmark microbenchmarks via ASV#2943

Open
vasil-pashov wants to merge 3 commits into
masterfrom
track-google-bench-asv
Open

Track C++ Google Benchmark microbenchmarks via ASV#2943
vasil-pashov wants to merge 3 commits into
masterfrom
track-google-bench-asv

Conversation

@vasil-pashov
Copy link
Copy Markdown
Collaborator

@vasil-pashov vasil-pashov commented Feb 27, 2026

Track C++ Google Benchmark microbenchmarks via ASV

Registering benchmarks

C++ microbenchmark results were not visible in the ASV dashboard and could not block PRs on regressions. This wires the existing Google Benchmark suite into ASV so C++ performance regressions are caught with the same tooling and blocking semantics as Python benchmarks.

python/benchmarks/cpp_microbenchmarks.py is a new ASV module bridging Google Benchmark and ASV. At import time it invokes the benchmark binary with --benchmark_list_tests=true and dynamically registers one ASV track_time_ms class per top-level Google Benchmark function. Parameter variants (e.g. BM_foo/10000/1) become ASV params. Each track_time_ms call shells out to the binary with --benchmark_filter and --benchmark_format=json.

Handling flaky benchmarks

Discovering flaky benchmarks

build_tooling/check_cpp_bench_stability.py is a script that can run a subset of google benchmarks multiple times. It can be used to discover flaky benchmarks. The script records statistics about the runs. If max_time / min_time is more than than ASV's threshold the benchmark is considered as potentially flaky.

Handling flaky benchmarks

There we a couple of flaky benchmarks. Google benchmark supports several options similar to ASV that control benchmark execution and we can use to stabilize run time.

  • Iterations - control the inner (benchmark loop). This tells GB how many time to run the body of the inner loop. Internally GB times each run of the body and gives us back the average.
  • Repetitions - This tells GB how many time to run the benchmark function itself. It then returns statistics about each run, these statistics can be aggregated. We use the median. Example if we set iterations to 10 an then repetitions to 3 it will run the the whole body of the test 3 times and each run will run the inner benchmark loop 10 times. The timings we'll get will be for the time it took to run the whole test i.e. 10 iteration of the inner loop.
  • MinTime - controls the inner loop (cannot be used alongside iterations). How many seconds to run the inner loop for.

Timing: when a benchmark uses ->Repetitions(N)->ReportAggregatesOnly(true), Google Benchmark emits only aggregate rows; the bridge picks the median row. This matches ASV's own convention for timing benchmarks. When no repetitions are configured there is one iteration row and its value is used directly.

Workflow changes

Both benchmark_commits.yml and analysis_workflow.yml gain a new "Build C++ benchmarks" step that compiles the benchmarks CMake target. The binary path defaults to cpp/out/linux-release-build/arcticdb/benchmarks and can be overridden via ARCTICDB_CPP_BENCHMARKS_BINARY.

Results land in the same ArcticDB library (asv_results on S3 bucket arcticdb-ci-benchmark-results, symbol = short commit hash) as the Python benchmarks. On non-master branches the library is _asv_results.

The analysis_workflow.yml step is needed so cpp_microbenchmarks.py can register its classes at import time during the asv_checks.py hash verification — without it the check would fail on every PR.

Comment thread .github/workflows/analysis_workflow.yml Fixed
Comment thread .github/workflows/benchmark_cpp_microbenchmarks.yml Fixed
Comment thread .github/workflows/benchmark_cpp_microbenchmarks.yml Fixed
Comment thread .github/workflows/benchmark_cpp_microbenchmarks.yml Fixed
@vasil-pashov vasil-pashov force-pushed the track-google-bench-asv branch from 0bd2cee to d1b5fb4 Compare February 27, 2026 17:30
@vasil-pashov vasil-pashov added no-release-notes This PR shouldn't be added to release notes. patch Small change, should increase patch version labels Feb 27, 2026
@vasil-pashov vasil-pashov marked this pull request as ready for review February 27, 2026 17:35
@vasil-pashov vasil-pashov force-pushed the track-google-bench-asv branch from d1b5fb4 to 09d395c Compare February 27, 2026 17:39
Comment thread .github/workflows/benchmark_cpp_microbenchmarks.yml Outdated
Comment thread .github/workflows/benchmark_cpp_microbenchmarks.yml Outdated
Comment thread .github/workflows/benchmark_cpp_microbenchmarks.yml Outdated
Comment thread python/benchmarks/cpp_microbenchmarks.py Outdated
@claude
Copy link
Copy Markdown
Contributor

claude Bot commented Feb 27, 2026

ArcticDB Code Review Summary

PR: Track C++ Google Benchmark microbenchmarks via ASV (#2943)
Reviewed: synchronize event — delta (6a8f96f to c130515)

This push adds:

  • A "Build C++ benchmarks" step to benchmark_commits.yml (compiles the benchmarks target before asv run).
  • A "Build C++ benchmarks" step to the run-asv-check-script job in analysis_workflow.yml (needed so cpp_microbenchmarks.py can register classes at import time).
  • 25 new C++ benchmark entries in python/.asv/results/benchmarks.json, one ASV track class per top-level Google Benchmark function.

The run-asv-check-script job already has submodules: recursive checkout, vcpkg caching, and sccache configured, so the new cmake step there is correctly placed.

API & Compatibility

  • No breaking changes to public Python API (Arctic, Library, NativeVersionStore)
  • Deprecation protocol not applicable (no API removed/renamed)
  • On-disk format unchanged
  • Protobuf schema unchanged
  • Key types and ValueType enum unchanged

Memory & Safety

  • No C++ memory management involved (Python/CI-only change)
  • RAII not applicable
  • No GIL concerns

Correctness

  • FIXED: warnings.warn fallback replacedcpp_microbenchmarks.py now raises FileNotFoundError with clear build instructions when the binary is absent.
  • FIXED: Silent identifier collision resolvedcpp_microbenchmarks.py now raises RuntimeError when two C++ benchmark names map to the same Python identifier.
  • OPEN: Import-time CalledProcessErrorcpp_microbenchmarks.py line ~154 (_NAMES = list_benchmark_names(CPP_BENCHMARKS_BINARY)): if the binary exists but --benchmark_list_tests=true exits non-zero (corrupt binary, linker error, crash on startup), check=True raises CalledProcessError uncaught at module level, aborting asv run with an unformatted traceback instead of a clear error message. The binary-absent case is handled; the binary-exists-but-fails case is not. Still open.
  • OPEN: C++ build step can block Python benchmarks in benchmark_commits.yml — "Build C++ benchmarks" step (~line 74) has no continue-on-error: true. If the C++ build fails, all subsequent steps (asv run for Python benchmarks) are skipped because their default if: success() condition evaluates to false — losing Python benchmark results for that commit. Still open.

Code Quality

  • benchmarks.json entries are correctly structured: all 25 C++ entries have "type": "track", "unit": "ms", and "timeout": 3600. Parameter sets match the Google Benchmark variant naming scheme.
  • All 25 entries share a consistent version hash (9e23b54b...), as expected from a single code generation run.
  • INFO: When the binary exists but fails, stderr is captured (capture_output=True) but not surfaced in the CalledProcessError. If the crash-on-import issue is fixed, consider including e.stderr in the error message for diagnosability.

Testing

  • Infrastructure/CI change — no behavioural change to the ArcticDB data path; no functional tests required.
  • Existing Python ASV benchmarks unaffected.

Build & Dependencies

  • No new Python or C++ dependencies introduced.

Security

  • No hardcoded credentials.

PR Title & Description

  • Title is clear and uses imperative form.
  • WARNING: Description is partially stale — the description still references a separate benchmark_cpp_microbenchmarks.yml workflow and a separate cpp_microbenchmarks_asv_results library, neither of which exist in the current implementation. Results now land in the same ArcticDB library as Python benchmarks. The description should be updated to reflect the actual implementation. Still open.

Open Issues Summary

# File Line Issue
1 python/benchmarks/cpp_microbenchmarks.py ~154 Binary exists but fails: CalledProcessError propagates uncaught at module level with no friendly message
2 .github/workflows/benchmark_commits.yml ~74 No continue-on-error: true on C++ build step — Python benchmark steps are skipped if C++ build fails
3 PR description Description references deleted benchmark_cpp_microbenchmarks.yml workflow and separate library; should describe current implementation

Comment thread .github/workflows/benchmark_cpp_microbenchmarks.yml Fixed
Comment thread .github/workflows/analysis_workflow.yml Outdated
apt update
apt install -y git
python -m pip install arcticdb[Testing] "protobuf<6"
python -m pip install arcticdb[Testing] asv "protobuf<6"
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

asv is already in the Testing deps?

env:
CMAKE_BUILD_PARALLEL_LEVEL: ${{vars.CMAKE_BUILD_PARALLEL_LEVEL}}

- name: Build C++ benchmarks binary
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was already done in the other workflow file?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a lot of duplication with the analysis workflow. Can't we just fold the new steps in to that one?

shell: bash -l {0}
run: |
# Look up the most recent master commit stored in our ASV database and compare against it.
ARCTICDB_REAL_S3_BUCKET=${ASV_RESULTS_BUCKET} ./tooling_venv/bin/python build_tooling/transform_asv_results.py --mode extract-recent --arcticdb_library cpp_microbenchmarks_asv_results || exit 1
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need a totally separate library to store these? Are they presented properly in the UI?

# python/benchmarks/cpp_microbenchmarks.py → python/benchmarks → python → repo root).
REPO_ROOT = Path(__file__).resolve().parents[2]

CPP_BENCHMARKS_BINARY = Path(
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should make sure we blow up with a clear error message explaining that you need to build the benchmarks binary, if this is pointing at a binary that doesn't exist. Otherwise it will be hard for people to update the ASV .json file.

@poodlewars
Copy link
Copy Markdown
Collaborator

I'm fine with the script to convert the benchmarks to ASV format, but think it should all be folded in to the existing analysis workflow rather than creating a new one.

@vasil-pashov vasil-pashov force-pushed the track-google-bench-asv branch from 64e121b to bc8ca04 Compare April 23, 2026 15:41
Comment thread .github/workflows/benchmark_commits.yml Outdated
@vasil-pashov vasil-pashov force-pushed the track-google-bench-asv branch from bc8ca04 to 137ee1e Compare April 24, 2026 10:25
Comment thread python/benchmarks/cpp_microbenchmarks.py Outdated
@vasil-pashov vasil-pashov force-pushed the track-google-bench-asv branch 2 times, most recently from b10cc66 to c7ac738 Compare April 24, 2026 10:58
Comment thread python/benchmarks/cpp_microbenchmarks.py Outdated
@vasil-pashov vasil-pashov force-pushed the track-google-bench-asv branch 2 times, most recently from 98b3899 to 6a8f96f Compare April 24, 2026 13:13
Adds C++ microbenchmark tracking alongside the existing Python ASV benchmarks
so C++ regressions are visible on the same dashboard, with the same tooling,
and the same blocking semantics as Python benchmarks.

python/benchmarks/cpp_microbenchmarks.py
  ASV module that owns both discovery and execution. At import time it
  invokes the benchmark binary with --benchmark_list_tests=true and registers
  one ASV class per top-level Google Benchmark function; parameter variants
  (e.g. BM_foo/10000/1) become ASV params of that class. Each track_time_ms
  call shells out to the binary with --benchmark_filter and --benchmark_format=json
  and returns real_time converted to milliseconds. Binary path comes from
  ARCTICDB_CPP_BENCHMARKS_BINARY or a repo-relative default. If the binary
  is missing the module raises FileNotFoundError at import with clear build
  instructions — a silent fallback would cause asv_checks.py to regenerate
  benchmarks.json without the C++ entries and produce a confusing hash
  mismatch on every PR.

.github/workflows/benchmark_commits.yml
  Adds a "Build C++ benchmarks" step that compiles the benchmarks target.
  The binary is invoked by ASV during the existing asv run step via the
  Python module — no separate run, no results-JSON coordination, no new
  library. Results land in the same ArcticDB library as the Python
  benchmarks.

.github/workflows/analysis_workflow.yml
  run-asv-check-script also builds the benchmarks target so cpp_microbenchmarks.py
  can register its classes at import time; without it asv_checks.py would
  fail the benchmarks.json hash comparison on every PR.
@vasil-pashov vasil-pashov force-pushed the track-google-bench-asv branch from 6a8f96f to c130515 Compare April 27, 2026 12:04
@vasil-pashov vasil-pashov force-pushed the track-google-bench-asv branch from ec8e570 to 01798c5 Compare May 11, 2026 08:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

no-release-notes This PR shouldn't be added to release notes. patch Small change, should increase patch version

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants