From ffcd82c2d1d925566487f5a8838596f3052ad26b Mon Sep 17 00:00:00 2001
From: kholdrex
Date: Sat, 6 Jun 2026 15:25:36 -0500
Subject: [PATCH] docs: add benchmark baseline reporting guidance

---
 CHANGELOG.md    |  6 ++++++
 CONTRIBUTING.md | 33 +++++++++++++++++++++++++++++++++
 README.md       | 28 +++++++++++++++++++++++++++-
 3 files changed, 66 insertions(+), 1 deletion(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index a5dfc5d..e40e246 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -5,6 +5,12 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
+## [Unreleased]
+
+### Added
+- Documentation guidance for reproducible Criterion benchmark baseline capture,
+  local comparisons, and non-flaky performance regression reporting.
+
 ## [0.3.0] - 2026-06-06
 
 ### Added
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index bbb2b14..580ce8b 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -45,6 +45,39 @@ cargo doc --all-features --no-deps
 cargo bench
 ```
 
+### Benchmark Baseline Reporting
+
+Benchmarks use Criterion and are intended to inform reviews without making CI
+flaky. Do not add mandatory performance pass/fail gates unless the project later
+adopts a dedicated, stable benchmarking runner.
+
+For performance-sensitive changes, capture a reproducible local baseline before
+editing and compare your branch against it:
+
+```bash
+# From the baseline branch or commit you want to compare against
+cargo bench --all-features -- --save-baseline before-change
+
+# From your feature branch
+cargo bench --all-features -- --baseline before-change
+```
+
+Include the following details in the PR description when benchmark results are
+relevant:
+
+- Baseline and candidate git SHAs (`git rev-parse --short HEAD` for each)
+- `rustc --version` and `cargo --version`
+- CPU model/core count, operating system, and any relevant power or thermal
+  constraints
+- Exact benchmark command, including feature flags and any Criterion filters
+- A short summary of the Criterion comparison, plus the HTML report location
+  (`target/criterion/report/index.html`, or the equivalent path under
+  `$CARGO_TARGET_DIR`) or attached artifacts when helpful
+
+Prefer local Criterion comparisons and posted artifacts over CI performance
+thresholds. CI should continue checking that benchmarks compile, for example with
+`cargo bench --no-run --all-features`, without failing on noisy timing deltas.
+
 ## Contribution Guidelines
 
 ### Code Style
diff --git a/README.md b/README.md
index 6228267..6e073f5 100644
--- a/README.md
+++ b/README.md
@@ -333,12 +333,38 @@ println!("Slope estimate: {:.3} ± {:.3}",
 
 ## Performance
 
-Run benchmarks to see performance characteristics:
+Run the Criterion benchmarks to see performance characteristics:
 
 ```bash
 cargo bench
 ```
 
+When sharing benchmark results, include enough context for others to reproduce
+and interpret them:
+
+- Git revision, for example `git rev-parse --short HEAD`
+- Rust toolchain versions from `rustc --version` and `cargo --version`
+- CPU, operating system, and any notable power/thermal settings
+- Exact command used, including feature flags or Criterion filters
+- Criterion output location, such as `target/criterion/report/index.html` (or
+  the equivalent path under `$CARGO_TARGET_DIR`), when the HTML report is useful
+  for review
+
+For local regression checks, capture a named Criterion baseline and compare the
+current work against it:
+
+```bash
+# From the baseline branch or commit you want to compare against
+cargo bench --all-features -- --save-baseline before-change
+
+# From your feature branch
+cargo bench --all-features -- --baseline before-change
+```
+
+Treat benchmark numbers as local evidence rather than CI pass/fail gates. Runtime
+noise varies by machine, so prefer posting the captured context and Criterion
+comparison in reviews over adding flaky performance thresholds to automation.
+
 The library is optimized for:
 - Efficient matrix operations using `nalgebra`
 - Minimal memory allocations during sampling
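
The reporting checklist in this patch (git revision, toolchain versions, CPU and OS, exact command) can be gathered by a small script instead of by hand. What follows is a minimal sketch, not part of the patch or the project: `collect_bench_context`, `first_line_or`, and `cpu_model` are hypothetical helper names, and the script assumes bash on Linux (CPU model read from `/proc/cpuinfo`) or macOS (`sysctl -n machdep.cpu.brand_string`). It only records the benchmark command it is given; it does not run the benchmarks.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not shipped by this project): prints the benchmark
# context that the reporting guidance asks for, as a Markdown bullet list.

# Run a command and print the first line of its output, or print the
# fallback text if the command is missing, fails, or prints nothing.
first_line_or() {
  local fallback="$1"
  shift
  local out
  if out="$("$@" 2>/dev/null)" && [ -n "$out" ]; then
    printf '%s\n' "$out" | head -n 1
  else
    printf '%s\n' "$fallback"
  fi
}

# Best-effort CPU model: Linux /proc/cpuinfo first, then macOS sysctl.
cpu_model() {
  if [ -r /proc/cpuinfo ]; then
    local model
    model="$(grep -m1 'model name' /proc/cpuinfo | cut -d: -f2- | sed 's/^ *//')"
    if [ -n "$model" ]; then
      printf '%s\n' "$model"
      return
    fi
  fi
  first_line_or "unknown CPU" sysctl -n machdep.cpu.brand_string
}

# Usage: collect_bench_context <benchmark command and arguments...>
# The command is only recorded in the report, never executed.
collect_bench_context() {
  printf '%s\n' "- Git revision: $(first_line_or unknown git rev-parse --short HEAD)"
  printf '%s\n' "- rustc: $(first_line_or 'not found' rustc --version)"
  printf '%s\n' "- cargo: $(first_line_or 'not found' cargo --version)"
  printf '%s\n' "- CPU: $(cpu_model) ($(first_line_or '?' getconf _NPROCESSORS_ONLN) online logical CPUs)"
  printf '%s\n' "- OS: $(uname -srm)"
  printf '%s\n' "- Command: $*"
}

collect_bench_context cargo bench --all-features -- --baseline before-change
```

Run it once on the baseline checkout and once on the candidate branch, then paste both lists into the PR description next to the Criterion comparison summary. Missing tools are reported as `not found` or `unknown` rather than aborting, so the sketch still works on machines without a Rust toolchain on `PATH`.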