Commit 5cb3235

docs: add custom harness integration guide
Adds CUSTOM_HARNESS.md, a step-by-step guide for building a CodSpeed integration for a new language or benchmarking framework using the instrument-hooks C library. Includes a copy-paste prompt for scaffolding the integration with an AI agent. Linked from the main README.
1 parent ecdf31a commit 5cb3235

2 files changed

Lines changed: 356 additions & 13 deletions

File tree

CUSTOM_HARNESS.md

Lines changed: 354 additions & 0 deletions
@@ -0,0 +1,354 @@
1+
# Building a Custom Harness
2+
3+
This guide is for developers building a CodSpeed integration ("custom harness") for a new language or benchmarking framework. It explains how to use the `instrument-hooks` C library to connect your benchmarks to the CodSpeed runner.
4+
5+
A minimal working C harness lives in [`example/`](./example/) — refer to it alongside this guide.
6+
7+
For existing integrations you can reference as examples, see:
8+
- [codspeed-rust](https://github.com/CodSpeedHQ/codspeed-rust) (Criterion, Divan)
9+
- [codspeed-cpp](https://github.com/CodSpeedHQ/codspeed-cpp) (Google Benchmark)
10+
- [codspeed-go](https://github.com/CodSpeedHQ/codspeed-go)
11+
12+
## Let Your Agent Build the Integration
13+
14+
Copy this block and paste it to your AI assistant to scaffold an instrument-hooks integration:
15+
16+
```text
17+
I want to build a CodSpeed integration for [LANGUAGE/FRAMEWORK] using the instrument-hooks C library.
18+
19+
Repository: https://github.com/CodSpeedHQ/instrument-hooks
20+
Read the full guide: CUSTOM_HARNESS.md in that repo.
21+
22+
Reference integrations to study:
23+
- Rust: https://github.com/CodSpeedHQ/codspeed-rust
24+
- C++: https://github.com/CodSpeedHQ/codspeed-cpp
25+
- Go: https://github.com/CodSpeedHQ/codspeed-go
26+
27+
What instrument-hooks is:
28+
- Single-file C library (dist/core.c + includes/) that bridges benchmark integrations with the CodSpeed runner via IPC
29+
- Supports CPU Simulation (Callgrind) and Walltime (perf) measurement modes — auto-detected, the integration doesn't choose
30+
31+
What I need you to do:
32+
1. Add instrument-hooks to my project as a git submodule (or fetch script for dist/ + includes/)
33+
2. Set up the build to compile dist/core.c with warning suppression flags (see Build Notes in the guide)
34+
3. Implement the benchmark lifecycle via FFI:
35+
a. instrument_hooks_init() → check for NULL
36+
b. instrument_hooks_is_instrumented() → gate CodSpeed-specific code paths
37+
c. instrument_hooks_set_integration(name, version) → register metadata
38+
d. instrument_hooks_start_benchmark() / instrument_hooks_stop_benchmark() → wrap benchmark execution
39+
e. instrument_hooks_set_executed_benchmark(pid, uri) → report what ran
40+
f. instrument_hooks_deinit() → clean up
41+
4. Implement __codspeed_root_frame__:
42+
- The benchmarked code MUST execute inside a function whose name starts with __codspeed_root_frame__
43+
- This function MUST be marked noinline (__attribute__((noinline)), #[inline(never)], etc.)
44+
- This is required for flamegraphs to have a clean root
45+
5. Construct benchmark URIs in the format: {git_relative_file_path}::{benchmark_name}[optional_params]
46+
6. Test with: codspeed run --skip-upload -- <benchmark_command>
47+
48+
Critical rules:
49+
- Most functions return uint8_t where 0 = success (instrument_hooks_init() returns a pointer, NULL on failure). Always check return values.
50+
- For CPU Simulation: start_benchmark/stop_benchmark must be as CLOSE as possible to the actual benchmark code (every instruction between them is counted)
51+
- Benchmark markers (add_marker with BENCHMARK_START/END) are OPTIONAL and only relevant for Walltime flamegraph precision — skip them for a first implementation
52+
- If using markers: every BENCHMARK_START must have a matching BENCHMARK_END, in chronological order
53+
54+
My setup:
55+
- Language: [FILL IN]
56+
- Benchmarking framework: [FILL IN]
57+
- Build system: [FILL IN]
58+
```
59+
60+
## Getting the Library
61+
62+
The library is distributed as a single C file (`dist/core.c`) plus headers (`includes/`).
63+
64+
**Preferred: Git submodule**
65+
66+
```bash
67+
git submodule add https://github.com/CodSpeedHQ/instrument-hooks.git
68+
```
69+
70+
Then reference `instrument-hooks/dist/core.c` and `instrument-hooks/includes/` in your build system.
71+
72+
**Alternative: Fetch script**
73+
74+
If your language's build system doesn't support submodules well, write a small script that downloads the `dist/` and `includes/` directories from a pinned release.
75+
76+
## Build Notes
77+
78+
The generated `dist/core.c` produces compiler warnings that are harmless. Suppress them in your build:
79+
80+
**GCC/Clang:**
81+
```
82+
-Wno-maybe-uninitialized -Wno-unused-variable -Wno-unused-parameter -Wno-unused-but-set-variable -Wno-type-limits
83+
```
84+
85+
**MSVC:**
86+
```
87+
/wd4101 /wd4189 /wd4100 /wd4245 /wd4132 /wd4146
88+
```
89+
90+
See the [example CMakeLists.txt](CMakeLists.txt) for a complete build configuration.
91+
92+
## Concepts
93+
94+
### CPU Simulation vs Walltime
95+
96+
CodSpeed supports two main measurement instruments. The choice is made by the user when configuring their CI — your integration doesn't need to detect or switch between them. However, understanding the difference matters for how you structure your integration code.
97+
98+
- **CPU Simulation**: Simulates CPU behavior to measure performance. Hardware-agnostic and deterministic. Best for small, CPU-bound workloads. See [CPU Simulation docs](https://codspeed.io/docs/instruments/cpu-simulation).
99+
- **Walltime**: Measures real elapsed time on bare-metal runners with low noise. Supports flamegraphs and profiling. Best for I/O-heavy or longer-running benchmarks. See [Walltime docs](https://codspeed.io/docs/instruments/walltime).
100+
101+
Both instruments are supported through `instrument-hooks`. The main difference for integration authors is that **CPU Simulation requires `start_benchmark` / `stop_benchmark` to be as close as possible to the actual benchmark code** (see [Simulation Mode Notes](#simulation-mode-notes)).
102+
103+
### Benchmark Lifecycle
104+
105+
From your integration's perspective, the lifecycle is:
106+
107+
1. **Initialize** the library
108+
2. **Check** if running under CodSpeed instrumentation
109+
3. **Register** your integration's name and version
110+
4. **For each benchmark:**
111+
- Start the benchmark measurement
112+
- Execute the benchmarked code (inside a [`__codspeed_root_frame__`](#codspeed-root-frame))
113+
- Stop the benchmark measurement
114+
- Report which benchmark was executed
115+
116+
5. **Clean up**
117+
118+
## Integration Walkthrough
119+
120+
### 1. Initialize
121+
122+
```c
123+
InstrumentHooks *hooks = instrument_hooks_init();
124+
if (!hooks) {
125+
// Initialization failed — handle error
126+
return 1;
127+
}
128+
```
129+
130+
### 2. Check if Instrumented
131+
132+
```c
133+
if (instrument_hooks_is_instrumented(hooks)) {
134+
// Running under CodSpeed — enable measurement code paths
135+
}
136+
```
137+
138+
When `is_instrumented()` returns `false`, your integration should fall back to the framework's normal benchmarking behavior. When `true`, the CodSpeed runner is active and all `instrument-hooks` calls will communicate with it.
139+
140+
### 3. Register Your Integration
141+
142+
```c
143+
instrument_hooks_set_integration(hooks, "my-framework-codspeed", "1.0.0");
144+
```
145+
146+
This metadata helps CodSpeed identify which integration produced the results.
147+
148+
### 4. Run a Benchmark
149+
150+
```c
151+
// Start measurement — tells the runner to begin recording
152+
if (instrument_hooks_start_benchmark(hooks) != 0) {
153+
// handle error
154+
}
155+
156+
// Execute the benchmark inside __codspeed_root_frame__ (see below)
157+
run_benchmark();
158+
159+
// Stop measurement — tells the runner to stop recording
160+
if (instrument_hooks_stop_benchmark(hooks) != 0) {
161+
// handle error
162+
}
163+
```
164+
165+
### 5. Report the Benchmark
166+
167+
```c
168+
instrument_hooks_set_executed_benchmark(hooks, getpid(), "path/to/bench.rs::bench_name");
169+
```
170+
171+
See [URI Convention](#uri-convention) for the expected format.
172+
173+
### 6. Clean Up
174+
175+
```c
176+
instrument_hooks_deinit(hooks);
177+
```
178+
179+
### CodSpeed Root Frame
180+
181+
For flamegraphs to work correctly, the actual benchmark code must execute inside a function named with the `__codspeed_root_frame__` prefix. This function acts as the root of the flamegraph — everything inside it is attributed to the benchmark, everything outside is filtered out.
182+
183+
**Requirements:**
184+
- The function name must start with `__codspeed_root_frame__`
185+
- It must **not** be inlined (use `__attribute__((noinline))`, `#[inline(never)]`, or equivalent)
186+
- It must wrap the actual benchmark execution (the code being measured)
187+
188+
**C/C++ example:**
189+
190+
```c
191+
__attribute__((noinline))
192+
void __codspeed_root_frame__run(void (*benchmark_fn)(void)) {
193+
benchmark_fn();
194+
}
195+
```
196+
197+
**Rust example** (from the Criterion integration):
198+
199+
```rust
200+
#[inline(never)]
201+
pub fn __codspeed_root_frame__iter<O, R>(&mut self, mut routine: R)
202+
where
203+
R: FnMut() -> O,
204+
{
205+
let bench_start = InstrumentHooks::current_timestamp();
206+
for _ in 0..self.iters {
207+
black_box(routine());
208+
}
209+
let bench_end = InstrumentHooks::current_timestamp();
210+
InstrumentHooks::instance().add_benchmark_timestamps(bench_start, bench_end);
211+
}
212+
213+
// Public API delegates to the root frame function:
214+
#[inline(never)]
215+
pub fn iter<O, R: FnMut() -> O>(&mut self, routine: R) {
216+
self.__codspeed_root_frame__iter(routine)
217+
}
218+
```
219+
220+
The pattern is: your public API method delegates to a `__codspeed_root_frame__`-prefixed implementation that contains all the measurement logic.
221+
222+
## URI Convention
223+
224+
The benchmark URI passed to `set_executed_benchmark` should follow this format:
225+
226+
```
227+
{git_relative_file_path}::{benchmark_name_components}
228+
```
229+
230+
- **`git_relative_file_path`**: Path to the benchmark file, relative to the git repository root
231+
- **`benchmark_name_components`**: Benchmark identifiers separated by `::`, optionally with parameters in `[]`
232+
233+
**Examples:**
234+
235+
```
236+
benches/my_bench.rs::group_name::bench_function
237+
benches/my_bench.rs::group_name::bench_function[parameter_value]
238+
bench_test.go::BenchmarkSort::BySize[100]
239+
```
240+
241+
For reference, see how existing integrations construct URIs:
242+
- **Rust/Criterion**: `{file}::{macro_group}::{bench_id}[::function][params]`
243+
- **Rust/Divan**: `{file}::{module_path}::{bench_name}[type, arg]`
244+
- **Go**: `{file}::{sub_bench_components}`
245+
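As a rough illustration of the format above, the following C sketch assembles a URI from a file path, one or more name components, and an optional parameter. `build_benchmark_uri` and `uri_append` are hypothetical helpers, not part of `instrument-hooks`; an integration written in another language would do the same with its own string facilities.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Appends s to out, keeping room for the terminating NUL.
 * Returns 0 on success, -1 if s does not fit. */
static int uri_append(char *out, size_t out_size, size_t *used, const char *s)
{
    size_t len = strlen(s);
    if (len >= out_size - *used) {
        return -1;
    }
    memcpy(out + *used, s, len + 1); /* copies the NUL as well */
    *used += len;
    return 0;
}

/* Hypothetical helper: writes "{file}::{c0}::{c1}...[param]" into out.
 * param may be NULL when the benchmark has no parameter.
 * Returns 0 on success, -1 if out is too small. */
int build_benchmark_uri(char *out, size_t out_size, const char *file,
                        const char *const *components, size_t n_components,
                        const char *param)
{
    assert(out != NULL && out_size > 0);
    assert(file != NULL && components != NULL && n_components > 0);

    size_t used = 0;
    out[0] = '\0';

    if (uri_append(out, out_size, &used, file) != 0) {
        return -1;
    }
    for (size_t i = 0; i < n_components; i++) {
        if (uri_append(out, out_size, &used, "::") != 0 ||
            uri_append(out, out_size, &used, components[i]) != 0) {
            return -1;
        }
    }
    if (param != NULL) {
        if (uri_append(out, out_size, &used, "[") != 0 ||
            uri_append(out, out_size, &used, param) != 0 ||
            uri_append(out, out_size, &used, "]") != 0) {
            return -1;
        }
    }
    return 0;
}
```

Called with `"bench_test.go"`, the components `BenchmarkSort` and `BySize`, and the parameter `"100"`, it produces `bench_test.go::BenchmarkSort::BySize[100]`, matching the Go example above.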
246+
## Precise Flamegraphs (Optional)
247+
248+
By default, the flamegraph shows everything that happened between `start_benchmark()` and `stop_benchmark()`. This is often good enough.
249+
250+
For more precise flamegraphs, you can add **benchmark markers** that mark exactly when the benchmarked code was running, excluding setup and teardown code within the measurement window.
251+
252+
This is **only relevant for Walltime**; CPU Simulation does not use markers for flamegraphs.
253+
254+
### How It Works
255+
256+
1. Capture a timestamp **before** the benchmarked code runs
257+
2. Execute the benchmark
258+
3. Capture a timestamp **after** the benchmarked code runs
259+
4. Send both timestamps as `BENCHMARK_START` and `BENCHMARK_END` markers
260+
261+
```c
262+
uint32_t pid = getpid();
263+
264+
// Inside the measurement window (between start_benchmark/stop_benchmark):
265+
for (int i = 0; i < iterations; i++) {
266+
expensive_setup(); // This will be EXCLUDED from the flamegraph
267+
268+
uint64_t start_time = instrument_hooks_current_timestamp();
269+
benchmark_function(); // This will be INCLUDED in the flamegraph
270+
uint64_t end_time = instrument_hooks_current_timestamp();
271+
272+
instrument_hooks_add_marker(hooks, pid, MARKER_TYPE_BENCHMARK_START, start_time);
273+
instrument_hooks_add_marker(hooks, pid, MARKER_TYPE_BENCHMARK_END, end_time);
274+
}
275+
```
276+
277+
You can add multiple pairs of `BENCHMARK_START` / `BENCHMARK_END` markers within a single benchmark — for example, one pair per iteration.
278+
279+
### Marker Ordering Rules
280+
281+
Markers must follow this strict ordering:
282+
283+
```
284+
start_benchmark()
285+
└─ BENCHMARK_START(t1)
286+
└─ BENCHMARK_END(t2) // t2 > t1
287+
└─ BENCHMARK_START(t3) // t3 > t2 (optional, more iterations)
288+
└─ BENCHMARK_END(t4) // t4 > t3
289+
└─ ...
290+
stop_benchmark()
291+
```
292+
293+
- Every `BENCHMARK_START` must have a matching `BENCHMARK_END`
294+
- Markers must be in chronological order
295+
- Markers are optional — if you don't add any, the entire `start_benchmark` / `stop_benchmark` window is used
296+
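If an integration buffers its markers before sending them, the rules above can be checked locally first. The sketch below is one way to do that; `sketch_marker_type`, `sketch_marker`, and `sketch_markers_are_well_ordered` are hypothetical names, and the real marker constants come from the `instrument-hooks` headers. It assumes timestamps are strictly increasing, as in the diagram.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins for illustration only. */
typedef enum {
    SKETCH_BENCHMARK_START,
    SKETCH_BENCHMARK_END
} sketch_marker_type;

typedef struct {
    sketch_marker_type type;
    uint64_t timestamp;
} sketch_marker;

/* Returns true when the sequence alternates START/END, begins with START,
 * ends with END, and has strictly increasing timestamps.
 * An empty sequence is accepted, since markers are optional. */
bool sketch_markers_are_well_ordered(const sketch_marker *markers, size_t count)
{
    assert(markers != NULL || count == 0);

    bool open = false;      /* true while a START awaits its END */
    uint64_t previous = 0;  /* timestamp of the previous marker */

    for (size_t i = 0; i < count; i++) {
        const sketch_marker *m = &markers[i];

        if (i > 0 && m->timestamp <= previous) {
            return false;   /* not in chronological order */
        }
        if (m->type == SKETCH_BENCHMARK_START) {
            if (open) {
                return false;   /* START while another is still open */
            }
            open = true;
        } else {
            if (!open) {
                return false;   /* END without a matching START */
            }
            open = false;
        }
        previous = m->timestamp;
    }
    return !open;   /* a trailing START without END is rejected */
}
```

Running a check like this in debug builds turns an ordering mistake into a local failure instead of a rejected upload.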
297+
## Simulation Mode Notes
298+
299+
In CPU Simulation mode, measurement works differently from Walltime. The key point:
300+
301+
**`start_benchmark()` and `stop_benchmark()` must be as close as possible to the actual benchmark code.** In simulation mode, the simulator counts every instruction between start and stop — any framework overhead (setup, teardown, bookkeeping) will be included in the measurement and distort the results.
302+
303+
For reference on how existing integrations handle this:
304+
- **Rust/Criterion**: [`crates/criterion_compat/criterion_fork/src/routine.rs`](https://github.com/CodSpeedHQ/codspeed-rust/blob/main/crates/criterion_compat/criterion_fork/src/routine.rs): `start_benchmark()` and `stop_benchmark()` wrap only the benchmark execution
305+
- **C++/Google Benchmark**: [`google_benchmark/src/benchmark_runner.cc`](https://github.com/CodSpeedHQ/codspeed-cpp/blob/main/google_benchmark/src/benchmark_runner.cc)
306+
307+
Markers (`add_marker`) are **not needed** for simulation mode.
308+
309+
## Testing Your Integration
310+
311+
### Basic Verification
312+
313+
Run your integration with CodSpeed using the `--skip-upload` flag to test locally without sending data:
314+
315+
```bash
316+
codspeed run --skip-upload -- <your_benchmark_command>
317+
```
318+
319+
Check that:
320+
- `is_instrumented()` returns `true`
321+
- Benchmarks execute without errors
322+
- The output shows your benchmarks being detected
323+
324+
### Full Test
325+
326+
Once the basic flow works, try without `--skip-upload`:
327+
328+
```bash
329+
codspeed run -- <your_benchmark_command>
330+
```
331+
332+
This will attempt to upload results to CodSpeed, verifying the full pipeline.
333+
334+
### Getting Help
335+
336+
If you run into issues, reach out on [Discord](https://discord.com/invite/MxpaCfKSqF) or by email.
337+
338+
## Common Pitfalls
339+
340+
### Marker Ordering Violations
341+
342+
The backend strictly validates marker ordering. Every `BENCHMARK_START` must be followed by a `BENCHMARK_END` before the next `BENCHMARK_START`. Unclosed or out-of-order markers will cause errors.
343+
344+
### Simulation: Start/Stop Distance
345+
346+
In CPU Simulation mode, every instruction between `start_benchmark()` and `stop_benchmark()` is counted. If your framework does bookkeeping, memory allocation, or logging between these calls, it will show up in the measurement. Keep the window tight around the actual benchmark code.
347+
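The shape of a tight window can be sketched without the library by passing the start and stop calls in as function pointers. Everything named `sketch_*` below is hypothetical: `sketch_window` stands in for `instrument_hooks_start_benchmark` / `instrument_hooks_stop_benchmark`, and the fakes exist only to show the resulting order of events.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the start/stop hook calls so the sketch builds on its own. */
typedef struct {
    uint8_t (*start)(void *ctx);
    uint8_t (*stop)(void *ctx);
    void *ctx;
} sketch_window;

typedef struct {
    void (*setup)(void *state);    /* may be NULL */
    void (*body)(void *state);     /* the code being measured */
    void (*teardown)(void *state); /* may be NULL */
    void *state;
} sketch_benchmark;

/* Runs one benchmark so that only body() executes between start() and stop().
 * Returns 0 on success, -1 if start or stop reported a failure. */
int sketch_run_tight(const sketch_window *w, const sketch_benchmark *b)
{
    assert(w != NULL && w->start != NULL && w->stop != NULL);
    assert(b != NULL && b->body != NULL);

    if (b->setup != NULL) {
        b->setup(b->state);          /* before the window opens */
    }

    int result = 0;
    if (w->start(w->ctx) != 0) {
        result = -1;                 /* window never opened: skip the body */
    } else {
        b->body(b->state);
        if (w->stop(w->ctx) != 0) {
            result = -1;
        }
    }

    if (b->teardown != NULL) {
        b->teardown(b->state);       /* after the window closes */
    }
    return result;
}

/* Demo-only fakes: each appends one character to a shared trace. */
typedef struct {
    char events[8];
    size_t count;
} sketch_trace;

static void sketch_record(void *trace, char event)
{
    sketch_trace *t = trace;
    if (t->count + 1 < sizeof t->events) {
        t->events[t->count++] = event;
        t->events[t->count] = '\0';
    }
}

static void sketch_fake_setup(void *t)    { sketch_record(t, 's'); }
static void sketch_fake_body(void *t)     { sketch_record(t, 'b'); }
static void sketch_fake_teardown(void *t) { sketch_record(t, 't'); }
static uint8_t sketch_fake_start(void *t) { sketch_record(t, '['); return 0; }
static uint8_t sketch_fake_stop(void *t)  { sketch_record(t, ']'); return 0; }

/* Runs the fakes through sketch_run_tight; the order ends up in trace->events. */
int sketch_demo(sketch_trace *trace)
{
    trace->count = 0;
    trace->events[0] = '\0';
    sketch_window w = { sketch_fake_start, sketch_fake_stop, trace };
    sketch_benchmark b = { sketch_fake_setup, sketch_fake_body,
                           sketch_fake_teardown, trace };
    return sketch_run_tight(&w, &b);
}
```

After `sketch_demo`, the trace reads `s[b]t`: setup and teardown sit outside the brackets, so only the body would be counted. In a real harness the two function pointers would simply be direct calls to the hook functions.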
348+
### Function Return Values
349+
350+
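The `instrument_hooks_*` functions that send something to the runner, such as `start_benchmark` and `stop_benchmark`, return `uint8_t` where `0` means success; `instrument_hooks_init()` instead returns a pointer that is `NULL` on failure. Always check return values: a non-zero return indicates communication with the runner failed.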
351+
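One way to make these checks hard to forget is to route every `uint8_t`-returning call through a small helper. This is a sketch under assumptions: `sketch_check_hook` and `SKETCH_CHECK` are hypothetical names, and writing to `stderr` is just one possible reaction to a failure.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: reports a failed hook call.
 * rc is the value returned by the hook, what describes the call.
 * Returns 0 when rc signals success, -1 otherwise. */
int sketch_check_hook(uint8_t rc, const char *what)
{
    assert(what != NULL);
    if (rc != 0) {
        fprintf(stderr, "instrument-hooks: %s failed (code %u)\n",
                what, (unsigned)rc);
        return -1;
    }
    return 0;
}

/* Stringifies the call expression so the log names the failing hook. */
#define SKETCH_CHECK(call) sketch_check_hook((call), #call)
```

With the library linked, a call site could look like `if (SKETCH_CHECK(instrument_hooks_start_benchmark(hooks)) != 0) { /* skip this benchmark */ }`.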
352+
### Root Frame Optimization
353+
354+
If `__codspeed_root_frame__` gets inlined by the compiler, flamegraphs won't have a clean root. Always mark it as `noinline`. In C/C++, use `__attribute__((noinline))`. In Rust, use `#[inline(never)]`.
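
For harnesses that must also build with MSVC, which does not accept `__attribute__((noinline))`, a small macro can select the right spelling. `CODSPEED_NOINLINE`, `benchmark_fn`, and `sketch_count_call` are hypothetical names for this sketch; `__declspec(noinline)` is the MSVC keyword.

```c
#include <assert.h>
#include <stddef.h>

/* Pick the compiler-specific way to forbid inlining. */
#if defined(_MSC_VER)
#  define CODSPEED_NOINLINE __declspec(noinline)
#else
#  define CODSPEED_NOINLINE __attribute__((noinline))
#endif

typedef void (*benchmark_fn)(void *state);

/* Root frame: the name carries the required prefix and the function
 * is never inlined, so it stays visible as its own stack frame. */
CODSPEED_NOINLINE
void __codspeed_root_frame__run(benchmark_fn fn, void *state, unsigned iterations)
{
    assert(fn != NULL);
    for (unsigned i = 0; i < iterations; i++) {
        fn(state);
    }
}

/* Sample benchmark body for demonstration: counts how often it was called. */
void sketch_count_call(void *state)
{
    (*(unsigned *)state)++;
}
```

Passing `sketch_count_call` with a counter and `iterations = 5` leaves the counter at 5, confirming that the body runs once per iteration inside the root frame.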

README.md

Lines changed: 2 additions & 13 deletions
@@ -13,20 +13,9 @@ Zig library to control instrumentations via IPC.
1313
- **Zig**: 0.14
1414
- [**Just**](https://github.com/casey/just) (optional): To easily run the build, formatter or tests
1515

16-
## How to add new integration?
16+
## Adding CodSpeed support for a new language
1717

18-
This library is intended to be used as a C library. The main source file is in `dist/core.c` and the headers are in `includes/`. See `examples/main.c` for an example on how to use it.
19-
20-
To test if it worked, call `is_instrumented` which should return `false` when running without Codspeed. To run with Codspeed, execute the following:
21-
```
22-
codspeed run -- <your_cmd>
23-
```
24-
25-
To make sure your integration is fully working, you have to implement all these hooks:
26-
- start_benchmark: Call this when the benchmark starts, to start measuring the performance.
27-
- stop_benchmark: Stop measuring the performance after the benchmark stopped.
28-
- set_executed_benchmark: Provide metadata about which benchmark was executed.
29-
- set_integration: Provide metadata about the integration.
18+
To integrate CodSpeed with a new language or benchmarking framework, you need to build a **custom harness** on top of `instrument-hooks`. See the **[custom harness guide](./CUSTOM_HARNESS.md)** for a step-by-step walkthrough, including a copy-paste prompt for setting it up with an AI agent. A minimal C harness is available in [`example/`](./example/).
3019

3120
## Run tests
3221
