|
| 1 | +# Building a Custom Harness |
| 2 | + |
| 3 | +This guide is for developers building a CodSpeed integration ("custom harness") for a new language or benchmarking framework. It explains how to use the `instrument-hooks` C library to connect your benchmarks to the CodSpeed runner. |
| 4 | + |
| 5 | +A minimal working C harness lives in [`example/`](./example/) — refer to it alongside this guide. |
| 6 | + |
| 7 | +For existing integrations you can reference as examples, see: |
| 8 | +- [codspeed-rust](https://github.com/CodSpeedHQ/codspeed-rust) (Criterion, Divan) |
| 9 | +- [codspeed-cpp](https://github.com/CodSpeedHQ/codspeed-cpp) (Google Benchmark) |
| 10 | +- [codspeed-go](https://github.com/CodSpeedHQ/codspeed-go) |
| 11 | + |
| 12 | +## Let Your Agent Build the Integration |
| 13 | + |
| 14 | +Copy this block and paste it to your AI assistant to scaffold an instrument-hooks integration: |
| 15 | + |
| 16 | +```text |
| 17 | +I want to build a CodSpeed integration for [LANGUAGE/FRAMEWORK] using the instrument-hooks C library. |
| 18 | +
|
| 19 | +Repository: https://github.com/CodSpeedHQ/instrument-hooks |
| 20 | +Read the full guide: CUSTOM_HARNESS.md in that repo. |
| 21 | +
|
| 22 | +Reference integrations to study: |
| 23 | +- Rust: https://github.com/CodSpeedHQ/codspeed-rust |
| 24 | +- C++: https://github.com/CodSpeedHQ/codspeed-cpp |
| 25 | +- Go: https://github.com/CodSpeedHQ/codspeed-go |
| 26 | +
|
| 27 | +What instrument-hooks is: |
| 28 | +- Single-file C library (dist/core.c + includes/) that bridges benchmark integrations with the CodSpeed runner via IPC |
| 29 | +- Supports CPU Simulation (Callgrind) and Walltime (perf) measurement modes — auto-detected, the integration doesn't choose |
| 30 | +
|
| 31 | +What I need you to do: |
| 32 | +1. Add instrument-hooks to my project as a git submodule (or fetch script for dist/ + includes/) |
| 33 | +2. Set up the build to compile dist/core.c with warning suppression flags (see Build Notes in the guide) |
| 34 | +3. Implement the benchmark lifecycle via FFI: |
| 35 | + a. instrument_hooks_init() → check for NULL |
| 36 | + b. instrument_hooks_is_instrumented() → gate CodSpeed-specific code paths |
| 37 | + c. instrument_hooks_set_integration(name, version) → register metadata |
| 38 | + d. instrument_hooks_start_benchmark() / instrument_hooks_stop_benchmark() → wrap benchmark execution |
| 39 | + e. instrument_hooks_set_executed_benchmark(pid, uri) → report what ran |
| 40 | + f. instrument_hooks_deinit() → clean up |
| 41 | +4. Implement __codspeed_root_frame__: |
| 42 | + - The benchmarked code MUST execute inside a function whose name starts with __codspeed_root_frame__ |
| 43 | + - This function MUST be marked noinline (__attribute__((noinline)), #[inline(never)], etc.) |
| 44 | + - This is required for flamegraphs to have a clean root |
| 45 | +5. Construct benchmark URIs in the format: {git_relative_file_path}::{benchmark_name}[optional_params] |
| 46 | +6. Test with: codspeed run --skip-upload -- <benchmark_command> |
| 47 | +
|
| 48 | +Critical rules: |
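| 49 | +- Status-returning functions (e.g. start_benchmark, stop_benchmark) return uint8_t where 0 = success. Always check return values. instrument_hooks_init() returns a pointer instead (check for NULL). |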
| 50 | +- For CPU Simulation: start_benchmark/stop_benchmark must be as CLOSE as possible to the actual benchmark code (every instruction between them is counted) |
| 51 | +- Benchmark markers (add_marker with BENCHMARK_START/END) are OPTIONAL and only relevant for Walltime flamegraph precision — skip them for a first implementation |
| 52 | +- If using markers: every BENCHMARK_START must have a matching BENCHMARK_END, in chronological order |
| 53 | +
|
| 54 | +My setup: |
| 55 | +- Language: [FILL IN] |
| 56 | +- Benchmarking framework: [FILL IN] |
| 57 | +- Build system: [FILL IN] |
| 58 | +``` |
| 59 | + |
| 60 | +## Getting the Library |
| 61 | + |
| 62 | +The library is distributed as a single C file (`dist/core.c`) plus headers (`includes/`). |
| 63 | + |
| 64 | +**Preferred: Git submodule** |
| 65 | + |
| 66 | +```bash |
| 67 | +git submodule add https://github.com/CodSpeedHQ/instrument-hooks.git |
| 68 | +``` |
| 69 | + |
| 70 | +Then reference `instrument-hooks/dist/core.c` and `instrument-hooks/includes/` in your build system. |
| 71 | + |
| 72 | +**Alternative: Fetch script** |
| 73 | + |
| 74 | +If your language's build system doesn't support submodules well, write a small script that downloads the `dist/` and `includes/` directories from a pinned release. |
| 75 | + |
| 76 | +## Build Notes |
| 77 | + |
| 78 | +The generated `dist/core.c` produces compiler warnings that are harmless. Suppress them in your build: |
| 79 | + |
| 80 | +**GCC/Clang:** |
| 81 | +``` |
| 82 | +-Wno-maybe-uninitialized -Wno-unused-variable -Wno-unused-parameter -Wno-unused-but-set-variable -Wno-type-limits |
| 83 | +``` |
| 84 | + |
| 85 | +**MSVC:** |
| 86 | +``` |
| 87 | +/wd4101 /wd4189 /wd4100 /wd4245 /wd4132 /wd4146 |
| 88 | +``` |
| 89 | + |
| 90 | +See the [example CMakeLists.txt](CMakeLists.txt) for a complete build configuration. |
| 91 | + |
| 92 | +## Concepts |
| 93 | + |
| 94 | +### CPU Simulation vs Walltime |
| 95 | + |
| 96 | +CodSpeed supports two main measurement instruments. The choice is made by the user when configuring their CI — your integration doesn't need to detect or switch between them. However, understanding the difference matters for how you structure your integration code. |
| 97 | + |
| 98 | +- **CPU Simulation**: Simulates CPU behavior to measure performance. Hardware-agnostic and deterministic. Best for small, CPU-bound workloads. See [CPU Simulation docs](https://codspeed.io/docs/instruments/cpu-simulation). |
| 99 | +- **Walltime**: Measures real elapsed time on bare-metal runners with low noise. Supports flamegraphs and profiling. Best for I/O-heavy or longer-running benchmarks. See [Walltime docs](https://codspeed.io/docs/instruments/walltime). |
| 100 | + |
| 101 | +Both instruments are supported through `instrument-hooks`. The main difference for integration authors is that **CPU Simulation requires `start_benchmark` / `stop_benchmark` to be as close as possible to the actual benchmark code** (see [Simulation Mode Notes](#simulation-mode-notes)). |
| 102 | + |
| 103 | +### Benchmark Lifecycle |
| 104 | + |
| 105 | +From your integration's perspective, the lifecycle is: |
| 106 | + |
| 107 | +1. **Initialize** the library |
| 108 | +2. **Check** if running under CodSpeed instrumentation |
| 109 | +3. **Register** your integration's name and version |
| 110 | +4. **For each benchmark:** |
| 111 | + - Start the benchmark measurement |
| 112 | + - Execute the benchmarked code (inside a [`__codspeed_root_frame__`](#codspeed-root-frame)) |
| 113 | + - Stop the benchmark measurement |
| 114 | + - Report which benchmark was executed |
| 115 | + |
| 116 | +5. **Clean up** |
| 117 | + |
| 118 | +## Integration Walkthrough |
| 119 | + |
| 120 | +### 1. Initialize |
| 121 | + |
| 122 | +```c |
| 123 | +InstrumentHooks *hooks = instrument_hooks_init(); |
| 124 | +if (!hooks) { |
| 125 | + // Initialization failed — handle error |
| 126 | + return 1; |
| 127 | +} |
| 128 | +``` |
| 129 | + |
| 130 | +### 2. Check if Instrumented |
| 131 | + |
| 132 | +```c |
| 133 | +if (instrument_hooks_is_instrumented(hooks)) { |
| 134 | + // Running under CodSpeed — enable measurement code paths |
| 135 | +} |
| 136 | +``` |
| 137 | + |
| 138 | +When `is_instrumented()` returns `false`, your integration should fall back to the framework's normal benchmarking behavior. When `true`, the CodSpeed runner is active and all `instrument-hooks` calls will communicate with it. |
| 139 | + |
| 140 | +### 3. Register Your Integration |
| 141 | + |
| 142 | +```c |
| 143 | +instrument_hooks_set_integration(hooks, "my-framework-codspeed", "1.0.0"); |
| 144 | +``` |
| 145 | +
|
| 146 | +This metadata helps CodSpeed identify which integration produced the results. |
| 147 | +
|
| 148 | +### 4. Run a Benchmark |
| 149 | +
|
| 150 | +```c |
| 151 | +// Start measurement — tells the runner to begin recording |
| 152 | +if (instrument_hooks_start_benchmark(hooks) != 0) { |
| 153 | + // handle error |
| 154 | +} |
| 155 | +
|
| 156 | +// Execute the benchmark inside __codspeed_root_frame__ (see below) |
| 157 | +run_benchmark(); |
| 158 | +
|
| 159 | +// Stop measurement — tells the runner to stop recording |
| 160 | +if (instrument_hooks_stop_benchmark(hooks) != 0) { |
| 161 | + // handle error |
| 162 | +} |
| 163 | +``` |
| 164 | + |
| 165 | +### 5. Report the Benchmark |
| 166 | + |
| 167 | +```c |
| 168 | +instrument_hooks_set_executed_benchmark(hooks, getpid(), "path/to/bench.rs::bench_name"); |
| 169 | +``` |
| 170 | +
|
| 171 | +See [URI Convention](#uri-convention) for the expected format. |
| 172 | +
|
| 173 | +### 6. Clean Up |
| 174 | +
|
| 175 | +```c |
| 176 | +instrument_hooks_deinit(hooks); |
| 177 | +``` |
| 178 | + |
| 179 | +### CodSpeed Root Frame |
| 180 | + |
| 181 | +For flamegraphs to work correctly, the actual benchmark code must execute inside a function named with the `__codspeed_root_frame__` prefix. This function acts as the root of the flamegraph — everything inside it is attributed to the benchmark, everything outside is filtered out. |
| 182 | + |
| 183 | +**Requirements:** |
| 184 | +- The function name must start with `__codspeed_root_frame__` |
| 185 | +- It must **not** be inlined (use `__attribute__((noinline))`, `#[inline(never)]`, or equivalent) |
| 186 | +- It must wrap the actual benchmark execution (the code being measured) |
| 187 | + |
| 188 | +**C/C++ example:** |
| 189 | + |
| 190 | +```c |
| 191 | +__attribute__((noinline)) |
| 192 | +void __codspeed_root_frame__run(void (*benchmark_fn)(void)) { |
| 193 | + benchmark_fn(); |
| 194 | +} |
| 195 | +``` |
| 196 | +
|
| 197 | +**Rust example** (from the Criterion integration): |
| 198 | +
|
| 199 | +```rust |
| 200 | +#[inline(never)] |
| 201 | +pub fn __codspeed_root_frame__iter<O, R>(&mut self, mut routine: R) |
| 202 | +where |
| 203 | + R: FnMut() -> O, |
| 204 | +{ |
| 205 | + let bench_start = InstrumentHooks::current_timestamp(); |
| 206 | + for _ in 0..self.iters { |
| 207 | + black_box(routine()); |
| 208 | + } |
| 209 | + let bench_end = InstrumentHooks::current_timestamp(); |
| 210 | + InstrumentHooks::instance().add_benchmark_timestamps(bench_start, bench_end); |
| 211 | +} |
| 212 | +
|
| 213 | +// Public API delegates to the root frame function: |
| 214 | +#[inline(never)] |
| 215 | +pub fn iter<O, R: FnMut() -> O>(&mut self, routine: R) { |
| 216 | + self.__codspeed_root_frame__iter(routine) |
| 217 | +} |
| 218 | +``` |
| 219 | + |
| 220 | +The pattern is: your public API method delegates to a `__codspeed_root_frame__`-prefixed implementation that contains all the measurement logic. |
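Real frameworks usually need to pass state and an iteration count into the benchmark body. The sketch below extends the C example above under those assumptions; the `bench_fn` type, the `_run_with_ctx` suffix, and `count_calls` are hypothetical names chosen for illustration, and `__attribute__((noinline))` is the GCC/Clang spelling.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback type: a benchmark body that receives framework state. */
typedef void (*bench_fn)(void *ctx);

/* Only the __codspeed_root_frame__ prefix matters; the suffix is arbitrary.
 * noinline keeps this function as a distinct frame in the call stack. */
__attribute__((noinline))
void __codspeed_root_frame__run_with_ctx(bench_fn fn, void *ctx, size_t iters) {
    assert(fn != NULL);
    for (size_t i = 0; i < iters; i++) {
        fn(ctx);
    }
}

/* Example benchmark body: counts how many times it was invoked. */
static void count_calls(void *ctx) {
    size_t *calls = ctx;
    (*calls)++;
}
```

A harness would call `__codspeed_root_frame__run_with_ctx(count_calls, &calls, iters)` between `start_benchmark` and `stop_benchmark`, so that everything the benchmark does appears under this one frame.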
| 221 | + |
| 222 | +## URI Convention |
| 223 | + |
| 224 | +The benchmark URI passed to `set_executed_benchmark` should follow this format: |
| 225 | + |
| 226 | +``` |
| 227 | +{git_relative_file_path}::{benchmark_name_components} |
| 228 | +``` |
| 229 | + |
| 230 | +- **`git_relative_file_path`**: Path to the benchmark file, relative to the git repository root |
| 231 | +- **`benchmark_name_components`**: Benchmark identifiers separated by `::`, optionally with parameters in `[]` |
| 232 | + |
| 233 | +**Examples:** |
| 234 | + |
| 235 | +``` |
| 236 | +benches/my_bench.rs::group_name::bench_function |
| 237 | +benches/my_bench.rs::group_name::bench_function[parameter_value] |
| 238 | +bench_test.go::BenchmarkSort::BySize[100] |
| 239 | +``` |
| 240 | + |
| 241 | +For reference, see how existing integrations construct URIs: |
| 242 | +- **Rust/Criterion**: `{file}::{macro_group}::{bench_id}[::function][params]` |
| 243 | +- **Rust/Divan**: `{file}::{module_path}::{bench_name}[type, arg]` |
| 244 | +- **Go**: `{file}::{sub_bench_components}` |
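As an illustration of the format above, here is a minimal sketch of a URI builder in C. `build_benchmark_uri` is a hypothetical helper, not part of `instrument-hooks`; it assumes the caller has already computed the git-relative file path and joined any nested name components with `::`.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of instrument-hooks.
 * Writes "{file}::{name}" or "{file}::{name}[{params}]" into buf.
 * Returns 0 on success, 1 if the URI did not fit or formatting failed. */
int build_benchmark_uri(char *buf, size_t buf_len, const char *file,
                        const char *name, const char *params) {
    assert(buf != NULL && file != NULL && name != NULL);
    int written;
    if (params != NULL && strlen(params) > 0) {
        written = snprintf(buf, buf_len, "%s::%s[%s]", file, name, params);
    } else {
        written = snprintf(buf, buf_len, "%s::%s", file, name);
    }
    /* snprintf reports the length it wanted; >= buf_len means truncation. */
    return (written < 0 || (size_t)written >= buf_len) ? 1 : 0;
}
```

The resulting buffer can then be passed as the URI argument of `instrument_hooks_set_executed_benchmark`. Returning a status rather than silently truncating avoids reporting a cut-off benchmark name.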
| 245 | + |
| 246 | +## Precise Flamegraphs (Optional) |
| 247 | + |
| 248 | +By default, the flamegraph shows everything that happened between `start_benchmark()` and `stop_benchmark()`. This is often good enough. |
| 249 | + |
| 250 | +For more precise flamegraphs, you can add **benchmark markers** that mark exactly when the benchmarked code was running, excluding setup and teardown code within the measurement window. |
| 251 | + |
| 252 | +This is **only relevant for walltime** — CPU Simulation does not use markers for flamegraphs. |
| 253 | + |
| 254 | +### How It Works |
| 255 | + |
| 256 | +1. Capture a timestamp **before** the benchmarked code runs |
| 257 | +2. Execute the benchmark |
| 258 | +3. Capture a timestamp **after** the benchmarked code runs |
| 259 | +4. Send both timestamps as `BENCHMARK_START` and `BENCHMARK_END` markers |
| 260 | + |
| 261 | +```c |
| 262 | +uint32_t pid = getpid(); |
| 263 | + |
| 264 | +// Inside the measurement window (between start_benchmark/stop_benchmark): |
| 265 | +for (int i = 0; i < iterations; i++) { |
| 266 | + expensive_setup(); // This will be EXCLUDED from the flamegraph |
| 267 | + |
| 268 | + uint64_t start_time = instrument_hooks_current_timestamp(); |
| 269 | + benchmark_function(); // This will be INCLUDED in the flamegraph |
| 270 | + uint64_t end_time = instrument_hooks_current_timestamp(); |
| 271 | + |
| 272 | + instrument_hooks_add_marker(hooks, pid, MARKER_TYPE_BENCHMARK_START, start_time); |
| 273 | + instrument_hooks_add_marker(hooks, pid, MARKER_TYPE_BENCHMARK_END, end_time); |
| 274 | +} |
| 275 | +``` |
| 276 | + |
| 277 | +You can add multiple pairs of `BENCHMARK_START` / `BENCHMARK_END` markers within a single benchmark — for example, one pair per iteration. |
| 278 | + |
| 279 | +### Marker Ordering Rules |
| 280 | + |
| 281 | +Markers must follow this strict ordering: |
| 282 | + |
| 283 | +``` |
| 284 | +start_benchmark() |
| 285 | + └─ BENCHMARK_START(t1) |
| 286 | + └─ BENCHMARK_END(t2) // t2 > t1 |
| 287 | + └─ BENCHMARK_START(t3) // t3 > t2 (optional, more iterations) |
| 288 | + └─ BENCHMARK_END(t4) // t4 > t3 |
| 289 | + └─ ... |
| 290 | +stop_benchmark() |
| 291 | +``` |
| 292 | + |
| 293 | +- Every `BENCHMARK_START` must have a matching `BENCHMARK_END` |
| 294 | +- Markers must be in chronological order |
| 295 | +- Markers are optional — if you don't add any, the entire `start_benchmark` / `stop_benchmark` window is used |
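To catch ordering mistakes before they reach the backend, an integration could validate its markers locally in debug builds. The following is a sketch only: the `sketch_marker` types are hypothetical stand-ins (the real marker constants come from the `instrument-hooks` headers), and treating the `>` comparisons in the diagram as strict is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical local types for this sketch only. */
typedef enum { SKETCH_BENCHMARK_START, SKETCH_BENCHMARK_END } sketch_marker_kind;

typedef struct {
    sketch_marker_kind kind;
    uint64_t timestamp;
} sketch_marker;

/* Returns 1 if markers alternate START, END, START, END, ... with strictly
 * increasing timestamps and no unclosed START; returns 0 otherwise.
 * An empty sequence is valid because markers are optional. */
int markers_are_well_ordered(const sketch_marker *markers, size_t count) {
    assert(markers != NULL || count == 0);
    int open = 0;      /* 1 while a START is waiting for its END */
    uint64_t last = 0; /* timestamp of the previous marker */
    for (size_t i = 0; i < count; i++) {
        if (i > 0 && markers[i].timestamp <= last) return 0;
        if (markers[i].kind == SKETCH_BENCHMARK_START) {
            if (open) return 0; /* START while another START is open */
            open = 1;
        } else {
            if (!open) return 0; /* END without a preceding START */
            open = 0;
        }
        last = markers[i].timestamp;
    }
    return !open;
}
```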
| 296 | + |
| 297 | +## Simulation Mode Notes |
| 298 | + |
| 299 | +In CPU Simulation mode, the measurement works differently from walltime. The key thing to know: |
| 300 | + |
| 301 | +**`start_benchmark()` and `stop_benchmark()` must be as close as possible to the actual benchmark code.** In simulation mode, the simulator counts every instruction between start and stop — any framework overhead (setup, teardown, bookkeeping) will be included in the measurement and distort the results. |
| 302 | + |
| 303 | +For reference on how existing integrations handle this: |
| 304 | +- **Rust/Criterion**: [`crates/criterion_compat/criterion_fork/src/routine.rs`](https://github.com/CodSpeedHQ/codspeed-rust/blob/main/crates/criterion_compat/criterion_fork/src/routine.rs) — `start_benchmark()` and `stop_benchmark()` wrap only the benchmark execution |
| 305 | +- **C++/Google Benchmark**: [`google_benchmark/src/benchmark_runner.cc`](https://github.com/CodSpeedHQ/codspeed-cpp/blob/main/google_benchmark/src/benchmark_runner.cc) |
| 306 | + |
| 307 | +Markers (`add_marker`) are **not needed** for simulation mode. |
| 308 | + |
| 309 | +## Testing Your Integration |
| 310 | + |
| 311 | +### Basic Verification |
| 312 | + |
| 313 | +Run your integration with CodSpeed using the `--skip-upload` flag to test locally without sending data: |
| 314 | + |
| 315 | +```bash |
| 316 | +codspeed run --skip-upload -- <your_benchmark_command> |
| 317 | +``` |
| 318 | + |
| 319 | +Check that: |
| 320 | +- `is_instrumented()` returns `true` |
| 321 | +- Benchmarks execute without errors |
| 322 | +- The output shows your benchmarks being detected |
| 323 | + |
| 324 | +### Full Test |
| 325 | + |
| 326 | +Once the basic flow works, try without `--skip-upload`: |
| 327 | + |
| 328 | +```bash |
| 329 | +codspeed run -- <your_benchmark_command> |
| 330 | +``` |
| 331 | + |
| 332 | +This will attempt to upload results to CodSpeed, verifying the full pipeline. |
| 333 | + |
| 334 | +### Getting Help |
| 335 | + |
| 336 | +If you run into issues, reach out on [Discord](https://discord.com/invite/MxpaCfKSqF) or by email. |
| 337 | + |
| 338 | +## Common Pitfalls |
| 339 | + |
| 340 | +### Marker Ordering Violations |
| 341 | + |
| 342 | +The backend strictly validates marker ordering. Every `BENCHMARK_START` must be followed by a `BENCHMARK_END` before the next `BENCHMARK_START`. Unclosed or out-of-order markers will cause errors. |
| 343 | + |
| 344 | +### Simulation: Start/Stop Distance |
| 345 | + |
| 346 | +In CPU Simulation mode, every instruction between `start_benchmark()` and `stop_benchmark()` is counted. If your framework does bookkeeping, memory allocation, or logging between these calls, it will show up in the measurement. Keep the window tight around the actual benchmark code. |
| 347 | + |
| 348 | +### Function Return Values |
| 349 | + |
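| 350 | +The `instrument_hooks_*` functions that report a status, such as `instrument_hooks_start_benchmark()` and `instrument_hooks_stop_benchmark()`, return `uint8_t` where `0` means success. Always check those return values: a non-zero return indicates that communication with the runner failed. The exceptions shown in this guide are `instrument_hooks_init()` (returns a pointer, `NULL` on failure), `instrument_hooks_is_instrumented()` (returns a boolean), and `instrument_hooks_current_timestamp()` (returns a `uint64_t` timestamp). |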
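One way to make these checks hard to forget is to route every status code through a small helper. This is a sketch under assumptions: `check_hook_status` is a hypothetical name, not a library function, and logging to `stderr` is just one possible failure policy.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper, not part of instrument-hooks: reports a failed call
 * and passes the status through so the caller can still branch on it. */
uint8_t check_hook_status(uint8_t status, const char *call_name) {
    assert(call_name != NULL);
    if (status != 0) {
        fprintf(stderr, "instrument-hooks: %s failed with status %u\n",
                call_name, (unsigned)status);
    }
    return status;
}
```

A call site would then read `if (check_hook_status(instrument_hooks_start_benchmark(hooks), "start_benchmark") != 0) { /* skip or abort */ }`.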
| 351 | + |
| 352 | +### Root Frame Optimization |
| 353 | + |
| 354 | +If `__codspeed_root_frame__` gets inlined by the compiler, flamegraphs won't have a clean root. Always mark it as `noinline`. In C/C++, use `__attribute__((noinline))`. In Rust, use `#[inline(never)]`. |