Track C++ Google Benchmark microbenchmarks via ASV#2943
Conversation
0bd2cee to
d1b5fb4
Compare
d1b5fb4 to
09d395c
Compare
ArcticDB Code Review SummaryPR: Track C++ Google Benchmark microbenchmarks via ASV (#2943) This push adds:
The API & Compatibility
Memory & Safety
Correctness
Code Quality
Testing
Build & Dependencies
Security
PR Title & Description
Open Issues Summary
|
| apt update | ||
| apt install -y git | ||
| python -m pip install arcticdb[Testing] "protobuf<6" | ||
| python -m pip install arcticdb[Testing] asv "protobuf<6" |
There was a problem hiding this comment.
asv is already in the Testing deps?
| env: | ||
| CMAKE_BUILD_PARALLEL_LEVEL: ${{vars.CMAKE_BUILD_PARALLEL_LEVEL}} | ||
|
|
||
| - name: Build C++ benchmarks binary |
There was a problem hiding this comment.
This was already done in the other workflow file?
There was a problem hiding this comment.
There's a lot of duplication with the analysis workflow. Can't we just fold the new steps in to that one?
| shell: bash -l {0} | ||
| run: | | ||
| # Look up the most recent master commit stored in our ASV database and compare against it. | ||
| ARCTICDB_REAL_S3_BUCKET=${ASV_RESULTS_BUCKET} ./tooling_venv/bin/python build_tooling/transform_asv_results.py --mode extract-recent --arcticdb_library cpp_microbenchmarks_asv_results || exit 1 |
There was a problem hiding this comment.
Why do we need a totally separate library to store these? Are they presented properly in the UI?
| # python/benchmarks/cpp_microbenchmarks.py → python/benchmarks → python → repo root). | ||
| REPO_ROOT = Path(__file__).resolve().parents[2] | ||
|
|
||
| CPP_BENCHMARKS_BINARY = Path( |
There was a problem hiding this comment.
We should make sure we blow up with a clear error message explaining that you need to build the benchmarks binary, if this is pointing at a binary that doesn't exist. Otherwise it will be hard for people to update the ASV .json file.
|
I'm fine with the script to convert the benchmarks to ASV format, but think it should all be folded in to the existing analysis workflow rather than creating a new one. |
64e121b to
bc8ca04
Compare
bc8ca04 to
137ee1e
Compare
b10cc66 to
c7ac738
Compare
98b3899 to
6a8f96f
Compare
Adds C++ microbenchmark tracking alongside the existing Python ASV benchmarks so C++ regressions are visible on the same dashboard, with the same tooling, and the same blocking semantics as Python benchmarks. python/benchmarks/cpp_microbenchmarks.py ASV module that owns both discovery and execution. At import time it invokes the benchmark binary with --benchmark_list_tests=true and registers one ASV class per top-level Google Benchmark function; parameter variants (e.g. BM_foo/10000/1) become ASV params of that class. Each track_time_ms call shells out to the binary with --benchmark_filter and --benchmark_format=json and returns real_time converted to milliseconds. Binary path comes from ARCTICDB_CPP_BENCHMARKS_BINARY or a repo-relative default. If the binary is missing the module raises FileNotFoundError at import with clear build instructions — a silent fallback would cause asv_checks.py to regenerate benchmarks.json without the C++ entries and produce a confusing hash mismatch on every PR. .github/workflows/benchmark_commits.yml Adds a "Build C++ benchmarks" step that compiles the benchmarks target. The binary is invoked by ASV during the existing asv run step via the Python module — no separate run, no results-JSON coordination, no new library. Results land in the same ArcticDB library as the Python benchmarks. .github/workflows/analysis_workflow.yml run-asv-check-script also builds the benchmarks target so cpp_microbenchmarks.py can register its classes at import time; without it asv_checks.py would fail the benchmarks.json hash comparison on every PR.
6a8f96f to
c130515
Compare
ec8e570 to
01798c5
Compare
Track C++ Google Benchmark microbenchmarks via ASV
Registering benchmarks
C++ microbenchmark results were not visible in the ASV dashboard and could not block PRs on regressions. This wires the existing Google Benchmark suite into ASV so C++ performance regressions are caught with the same tooling and blocking semantics as Python benchmarks.
python/benchmarks/cpp_microbenchmarks.pyis a new ASV module bridging Google Benchmark and ASV. At import time it invokes the benchmark binary with--benchmark_list_tests=trueand dynamically registers one ASVtrack_time_msclass per top-level Google Benchmark function. Parameter variants (e.g.BM_foo/10000/1) become ASV params. Each track_time_ms call shells out to the binary with--benchmark_filterand--benchmark_format=json.Handling flaky benchmarks
Discovering flaky benchmarks
build_tooling/check_cpp_bench_stability.pyis a script that can run a subset of google benchmarks multiple times. It can be used to discover flaky benchmarks. The script records statistics about the runs. Ifmax_time / min_timeis more than than ASV's threshold the benchmark is considered as potentially flaky.Handling flaky benchmarks
There we a couple of flaky benchmarks. Google benchmark supports several options similar to ASV that control benchmark execution and we can use to stabilize run time.
iterationsto 10 an thenrepetitionsto 3 it will run the the whole body of the test 3 times and each run will run the inner benchmark loop 10 times. The timings we'll get will be for the time it took to run the whole test i.e. 10 iteration of the inner loop.iterations). How many seconds to run the inner loop for.Timing: when a benchmark uses ->Repetitions(N)->ReportAggregatesOnly(true), Google Benchmark emits only aggregate rows; the bridge picks the median row. This matches ASV's own convention for timing benchmarks. When no repetitions are configured there is one iteration row and its value is used directly.
Workflow changes
Both
benchmark_commits.ymlandanalysis_workflow.ymlgain a new "Build C++ benchmarks" step that compiles the benchmarks CMake target. The binary path defaults tocpp/out/linux-release-build/arcticdb/benchmarksand can be overridden viaARCTICDB_CPP_BENCHMARKS_BINARY.Results land in the same ArcticDB library (asv_results on S3 bucket arcticdb-ci-benchmark-results, symbol = short commit hash) as the Python benchmarks. On non-master branches the library is _asv_results.
The
analysis_workflow.ymlstep is needed so cpp_microbenchmarks.py can register its classes at import time during the asv_checks.py hash verification — without it the check would fail on every PR.