Commit 252a1a9

docs: mirco-benchmarks best practices (#6922)
Signed-off-by: Alexander Droste <alexander.droste@protonmail.com>
1 parent c870564 commit 252a1a9

1 file changed

Lines changed: 146 additions & 49 deletions

docs/developer-guide/benchmarking.md

@@ -1,21 +1,158 @@
11
# Benchmarking
22

33
Vortex has two categories of benchmarks: microbenchmarks for individual operations, and SQL
4-
benchmarks for end-to-end query performance. The `bench-orchestrator` tool coordinates running
5-
SQL benchmarks across different engines without compiling them all into a single binary.
4+
benchmarks for end-to-end query performance.
65

76
## Microbenchmarks
87

9-
Microbenchmarks use the Divan framework and live in `benches/` directories within individual
10-
crates. They cover low-level operations such as encoding, decoding, compute kernels, buffer
11-
operations, and scalar access.
8+
Microbenchmarks use the Divan framework and live in `benches/` directories within individual crates.
129

1310
Run microbenchmarks for a specific crate with:
1411

1512
```bash
1613
cargo bench -p <crate-name>
1714
```
1815

## Best Practices

### Separate setup from profiled code

Always use `bencher.with_inputs(|| ...)` so that fixture construction is excluded from timing:

```rust
bencher
    .with_inputs(|| bench_fixture())
    .bench_refs(|(array, indices)| {
        array.take(indices.to_array()).unwrap()
    });
```
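
To make the timing boundary concrete, here is a minimal, dependency-free sketch of the idea behind `with_inputs`: the fixture is built while the clock is stopped, and only the routine is timed. `time_excluding_setup` is a hypothetical helper written for illustration, not a Divan API, and Divan's real sampling logic is more involved.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Hypothetical helper (not a Divan API): runs `routine` `iters` times and
/// returns the accumulated time spent inside `routine` only.
fn time_excluding_setup<I, O>(
    iters: u32,
    mut setup: impl FnMut() -> I,
    mut routine: impl FnMut(&mut I) -> O,
) -> Duration {
    let mut total = Duration::ZERO;
    for _ in 0..iters {
        // Fixture construction happens while the clock is stopped.
        let mut input = black_box(setup());
        let start = Instant::now();
        let output = routine(&mut input);
        total += start.elapsed();
        // Keep the result observable, then drop it outside the timed region.
        drop(black_box(output));
    }
    total
}
```

A call such as `time_excluding_setup(100, || vec![3u32, 1, 2], |v: &mut Vec<u32>| v.sort())` would report only the time spent sorting, not the time spent allocating the vectors.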
29+
30+
### Exclude `Drop` from measurements
31+
32+
Divan measures only the closure body, **not** the `Drop` of its return value.
33+
Structure your benchmark so that expensive drops happen via the return value or
34+
via bench_refs inputs.
35+
36+
- **Return the value** from the closure — Divan will drop it after timing stops:
37+
38+
```rust
39+
bencher
40+
.with_inputs(|| make_big_vec())
41+
.bench_values(|v| transform(v)) // drop of the result is NOT timed
42+
```
43+
44+
- **Use `bench_refs`** — the input is dropped after the entire sample loop, not per-iteration:
45+
46+
```rust
47+
bencher
48+
.with_inputs(|| make_big_vec())
49+
.bench_refs(|v| v.sort()) // v is dropped outside the timed region
50+
```
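
The effect of returning a value can be illustrated without Divan. In the sketch below, `Tracked` and `time_once` are hypothetical names introduced for illustration only: the timer stops the clock before handing the result back, so the destructor runs outside the measured region.

```rust
use std::cell::Cell;
use std::time::{Duration, Instant};

/// Illustrative type whose `Drop` records that it ran.
struct Tracked<'a> {
    dropped: &'a Cell<bool>,
}

impl Drop for Tracked<'_> {
    fn drop(&mut self) {
        self.dropped.set(true);
    }
}

/// Hypothetical single-shot timer (not a Divan API): stops the clock before
/// returning the result, so the caller drops it outside the timed region.
fn time_once<O>(routine: impl FnOnce() -> O) -> (Duration, O) {
    let start = Instant::now();
    let output = routine();
    let elapsed = start.elapsed();
    (elapsed, output)
}
```

If the closure instead created a `Tracked` value and let it go out of scope internally, the destructor would run between `Instant::now()` and `elapsed()` and would be counted in the measurement.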

### Black-box inputs to prevent compiler optimization

The compiler can constant-fold or eliminate work if it can prove that inputs are known at
compile time.

Values provided through `with_inputs` are automatically black-boxed by Divan, so no action is
needed:

```rust
// ✓ `array` and `indices` are automatically black-boxed by Divan
bencher
    .with_inputs(|| (&prebuilt_array, &prebuilt_indices))
    .bench_refs(|(array, indices)| array.take(indices.to_array()).unwrap());
```

### Captured variables

Variables captured from the surrounding scope are _not_ black-boxed. Wrap them with
`divan::black_box()` or pass them through `with_inputs` instead:

```rust
let array = make_array();

// ✗ `array` is captured, so the compiler may optimize based on its known contents
bencher.bench(|| process(&array));

// ✓ Option A: pass through with_inputs
bencher
    .with_inputs(|| &array)
    .bench_refs(|array| process(array));

// ✓ Option B: explicit black_box on the capture
bencher.bench(|| process(divan::black_box(&array)));
```

### Return values and manual loops

Return values are automatically black-boxed. You only need an explicit `black_box` for
side-effect-free results inside manual loops:

```rust
bencher.with_inputs(|| &array).bench_refs(|array| {
    for idx in 0..len {
        divan::black_box(array.scalar_at(idx).unwrap());
    }
});
```
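
The same manual-loop pattern can be tried outside a benchmark harness with the standard library's `std::hint::black_box`. This sketch assumes that `divan::black_box` behaves like the standard function; `visit_all` is a hypothetical function used only for illustration.

```rust
use std::hint::black_box;

/// Visits every element without producing a result the caller uses.
/// Without `black_box`, the optimizer may remove the loop entirely because
/// the loads have no observable effect.
fn visit_all(values: &[u64]) -> usize {
    // Hide the slice from the optimizer, similar to passing it via `with_inputs`.
    let values = black_box(values);
    let mut visited = 0;
    for value in values {
        // Mark each loaded value as used.
        black_box(*value);
        visited += 1;
    }
    visited
}
```

Note that the standard library documents `black_box` as a best-effort hint, so it is worth checking the generated code when a result looks implausibly fast.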

### Use deterministic, seeded RNG

Always use `StdRng::seed_from_u64(N)` for reproducible data generation:

```rust
use rand::{rngs::StdRng, SeedableRng};
let mut rng = StdRng::seed_from_u64(0);
```
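
Why a fixed seed matters can be shown with a small, dependency-free sketch. The `SplitMix64` generator below is only a stand-in for `StdRng` so that the example needs no external crate, and `make_fixture` is a hypothetical fixture builder; real benchmarks should keep using `StdRng` as described above.

```rust
/// Minimal SplitMix64 generator, used here only as a dependency-free stand-in
/// for `StdRng`. It is not what the benchmarks use.
struct SplitMix64 {
    state: u64,
}

impl SplitMix64 {
    fn seed_from_u64(seed: u64) -> Self {
        Self { state: seed }
    }

    fn next_u64(&mut self) -> u64 {
        self.state = self.state.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.state;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
}

/// Hypothetical fixture builder: the same seed always yields the same data.
fn make_fixture(seed: u64, len: usize) -> Vec<u64> {
    let mut rng = SplitMix64::seed_from_u64(seed);
    (0..len).map(|_| rng.next_u64()).collect()
}
```

Because the fixture depends only on the seed and the length, two runs of the same benchmark operate on identical data, which keeps comparisons between commits meaningful.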

### Parameterize with `args`, `consts`, and `types`

Use Divan's parameterization features and define parameter arrays as named constants:

```rust
const NUM_INDICES: &[usize] = &[1_000, 10_000, 100_000];
const VECTOR_SIZE: &[usize] = &[16, 256, 2048, 8192];

#[divan::bench(args = NUM_INDICES, consts = VECTOR_SIZE)]
fn my_bench<const N: usize>(bencher: Bencher, num_indices: usize) { ... }
```

### Keep per-iteration execution time under ~1 ms

Each individual iteration of the benchmarked closure should complete in **less than 1 ms**.
This keeps benchmarks fast, both locally and on CI.
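
One way to sanity-check a benchmark against this budget is to divide a measured batch time by its iteration count. The helpers below are a hypothetical sketch, not part of Divan or the Vortex tooling; only the 1 ms threshold comes from the guideline.

```rust
use std::time::Duration;

/// Budget taken from the guideline above.
const ITERATION_BUDGET: Duration = Duration::from_millis(1);

/// Hypothetical check: average time per iteration for a measured batch.
fn mean_iteration_time(total: Duration, iterations: u32) -> Duration {
    assert!(iterations > 0, "need at least one iteration");
    total / iterations
}

/// Returns true when the average iteration stays under the budget.
fn within_budget(total: Duration, iterations: u32) -> bool {
    mean_iteration_time(total, iterations) < ITERATION_BUDGET
}
```

For instance, 100 iterations taking 50 ms in total average 0.5 ms each and fit the budget, while 100 iterations taking 500 ms do not. When a case exceeds the budget, shrinking the input size is usually simpler than reducing the iteration count.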

### Gate CodSpeed-incompatible benchmarks

Use `#[cfg(not(codspeed))]` for benchmarks that are incompatible with CodSpeed.

### CodSpeed's single-run model

CI benchmarks run under [CodSpeed's CPU simulation](https://codspeed.io/docs/instruments/cpu),
which executes each benchmark **exactly once** and estimates CPU cycles from the instruction
trace, including cache and memory access costs. This has several implications:

- **`sample_count` and `sample_size` have no effect**: CodSpeed always runs one iteration.
- **Results are deterministic**: the simulated cycle count is derived from the instruction
  trace, not wall-clock time, so there is no noise from system load or scheduling.
- **System calls are excluded**: CodSpeed only measures user-space code. Benchmarks that
  rely on I/O or kernel interactions will not reflect those costs, so they should use the
  [walltime instrument](https://codspeed.io/docs/instruments/walltime) or be gated with
  `#[cfg(not(codspeed))]`.

### Prefer `mimalloc` for throughput benchmarks

Throughput benchmarks should use `mimalloc` as the global allocator to reduce system allocator
noise:

```rust
use mimalloc::MiMalloc;

#[global_allocator]
static GLOBAL: MiMalloc = MiMalloc;
```

19156
## SQL Benchmarks
20157

21158
SQL benchmarks measure end-to-end query performance across different engines and file formats.
@@ -48,51 +185,11 @@ cargo run --release --bin duckdb-bench -- <benchmark>
48185

49186
## Orchestrator
50187

51-
The `bench-orchestrator` is a Python CLI tool (`vx-bench`) that coordinates running benchmarks
52-
across multiple engines. It builds and invokes the per-engine binaries, stores results, and
53-
provides comparison tooling. This avoids compiling all engines into a single binary, which
54-
would be slow and create dependency conflicts.
55-
56-
Install it with:
57-
58-
```bash
59-
uv tool install "bench_orchestrator @ ./bench-orchestrator/"
60-
```
61-
62-
### Running Benchmarks
63-
64-
```bash
65-
# Run TPC-H on DataFusion and DuckDB, comparing Parquet and Vortex
66-
vx-bench run tpch --engine datafusion,duckdb --format parquet,vortex
67-
68-
# Run a subset of queries with fewer iterations
69-
vx-bench run tpch -q 1,6,12 -i 3
70-
71-
# Run with memory tracking
72-
vx-bench run tpch --track-memory
73-
74-
# Run with CPU profiling
75-
vx-bench run tpch --samply
76-
```
77-
78-
### Comparing Results
79-
80-
```bash
81-
# Compare formats/engines within the most recent run
82-
vx-bench compare --run latest
83-
84-
# Compare across two labeled runs
85-
vx-bench compare --runs baseline,feature
86-
```
87-
88-
Comparison output is color-coded: green for improvements (>10%), yellow for neutral, red for
89-
regressions.
90-
91-
### Result Storage
188+
The `bench-orchestrator` is a Python CLI tool (`vx-bench`) that coordinates running SQL
189+
benchmarks across multiple engines, stores results, and provides comparison tooling.
92190

93-
Results are stored as JSON Lines files under `target/vortex-bench/runs/`, with each run
94-
containing metadata (git commit, timestamp, configuration) and per-query timing data. The
95-
`vx-bench list` command shows recent runs.
191+
See [`bench-orchestrator/README.md`](https://github.com/vortex-data/vortex/blob/develop/bench-orchestrator/README.md) for installation,
192+
commands, and example workflows.
96193

97194
## CI Benchmarks
98195
