From ffcd82c2d1d925566487f5a8838596f3052ad26b Mon Sep 17 00:00:00 2001
From: kholdrex <alexandrkholodniak@gmail.com>
Date: Sat, 6 Jun 2026 15:25:36 -0500
Subject: [PATCH] docs: add benchmark baseline reporting guidance

---
 CHANGELOG.md    |  6 ++++++
 CONTRIBUTING.md | 33 +++++++++++++++++++++++++++++++++
 README.md       | 28 +++++++++++++++++++++++++++-
 3 files changed, 66 insertions(+), 1 deletion(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index a5dfc5d..e40e246 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -5,6 +5,12 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
+## [Unreleased]
+
+### Added
+- Documentation on capturing reproducible Criterion benchmark baselines, comparing
+  them locally, and reporting performance regressions without flaky CI gates.
+
 ## [0.3.0] - 2026-06-06
 
 ### Added
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index bbb2b14..580ce8b 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -45,6 +45,39 @@ cargo doc --all-features --no-deps
 cargo bench
 ```
 
+### Benchmark Baseline Reporting
+
+Benchmarks use Criterion and are intended to inform reviews without making CI
+flaky. Do not add mandatory performance pass/fail gates unless the project later
+adopts a dedicated, stable benchmarking runner.
+
+For performance-sensitive changes, capture a reproducible local baseline before
+editing and compare your branch against it:
+
+```bash
+# From the baseline branch or commit you want to compare against
+cargo bench --all-features -- --save-baseline before-change
+
+# From your feature branch
+cargo bench --all-features -- --baseline before-change
+```
+
+Include the following details in the PR description when benchmark results are
+relevant:
+
+- Baseline and candidate Git SHAs (`git rev-parse --short HEAD` for each)
+- `rustc --version` and `cargo --version`
+- CPU model/core count, operating system, and any relevant power or thermal
+  constraints
+- Exact benchmark command, including feature flags and any Criterion filters
+- A short summary of the Criterion comparison and, when helpful, the HTML
+  report location (`target/criterion/report/index.html`, or the equivalent path
+  under `$CARGO_TARGET_DIR`) or attached artifacts
+
+Prefer local Criterion comparisons and posted artifacts over CI performance
+thresholds. CI should continue checking that benchmarks compile, for example with
+`cargo bench --no-run --all-features`, without failing on noisy timing deltas.
+
 ## Contribution Guidelines
 
 ### Code Style
diff --git a/README.md b/README.md
index 6228267..6e073f5 100644
--- a/README.md
+++ b/README.md
@@ -333,12 +333,38 @@ println!("Slope estimate: {:.3} ± {:.3}",
 
 ## Performance
 
-Run benchmarks to see performance characteristics:
+Run the Criterion benchmarks to see performance characteristics:
 
 ```bash
 cargo bench
 ```
 
+When sharing benchmark results, include enough context for others to reproduce
+and interpret them:
+
+- Git revision, for example `git rev-parse --short HEAD`
+- Rust toolchain versions from `rustc --version` and `cargo --version`
+- CPU, operating system, and any notable power/thermal settings
+- Exact command used, including feature flags or Criterion filters
+- Criterion output location, such as `target/criterion/report/index.html` (or
+  the equivalent path under `$CARGO_TARGET_DIR`), when the HTML report is useful
+  for review
+
+For local regression checks, capture a named Criterion baseline and compare the
+current work against it:
+
+```bash
+# From the baseline branch or commit you want to compare against
+cargo bench --all-features -- --save-baseline before-change
+
+# From your feature branch
+cargo bench --all-features -- --baseline before-change
+```
+
+Treat benchmark numbers as local evidence rather than CI pass/fail gates. Runtime
+noise varies by machine, so prefer posting the captured context and Criterion
+comparison in reviews over adding flaky performance thresholds to automation.
+
 The library is optimized for:
 - Efficient matrix operations using `nalgebra`
 - Minimal memory allocations during sampling