Commit 01bb72d

docs(bench): use multi-run medians and correct version label
Archive 5 runs per side under runs/, report medians with run-to-run spread as the noise floor. The prior single-run 64-subscriber regression was a statistical outlier (within 1% across 5 runs per side). Rename the directory to match the v3.1.0 release label.
1 parent a08f609 commit 01bb72d

14 files changed

Lines changed: 238 additions & 106 deletions

Lines changed: 108 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,108 @@
1+
# EventDispatcher Bench, v3.1.0
2+
3+
Before/after numbers for the lock-free COW snapshot `emit()` landed in v3.1.0.
4+
The previous implementation used `std::shared_mutex` for `emit()` / `emit_safe()`
5+
and an exclusive lock for `subscribe()` / `unsubscribe()`. The new implementation
6+
stores handlers in a `std::atomic<std::shared_ptr<const std::vector<Entry>>>`
7+
snapshot published on mutation, with a lock-free atomic handler-count fast
8+
path for the zero-subscriber case.
9+
10+
## Results (median of 5 runs per side)
11+
12+
| Scenario | Subs | Before (ns/op) | After (ns/op) | Delta | Verdict |
13+
| --------------------------- | ---: | -------------: | ------------: | ----------------- | :------ |
14+
| `emit` | 0 | 103.9 | **6.0** | **-94.2% (17x)** | REAL |
15+
| `emit` | 1 | 120.1 | 94.4 | **-21.4%** | REAL |
16+
| `emit` | 8 | 245.6 | 216.3 | **-11.9%** | REAL |
17+
| `emit` | 64 | 1103.5 | 1092.1 | -1.0% | NOISE |
18+
| `emit_safe` | 0 | 103.1 | **5.7** | **-94.5% (18x)** | REAL |
19+
| `emit_safe` | 1 | 118.6 | 96.4 | **-18.7%** | REAL |
20+
| `emit_safe` | 8 | 233.2 | 219.1 | -6.0% | REAL |
21+
| `emit_safe` | 64 | 1086.3 | 1099.8 | +1.2% | NOISE |
22+
| `emit_concurrent_4_threads` | 8 | 517.9 | **248.2** | **-52.1% (2.1x)** | REAL |
23+
| `subscribe_unsub_roundtrip` |  n/a |          446.0 |        1150.4 | +158.0%           | REAL    |
24+
| `reentrancy_rejection` | 1 | 212.5 | 192.7 | -9.4% | marginal|
25+
26+
Verdict key:
27+
28+
- **REAL**: median delta exceeds 1.5x the run-to-run spread (max minus min across the 5 runs) on both sides.
29+
- **NOISE**: median delta is smaller than the run-to-run spread; cannot be distinguished from measurement jitter.
30+
- **marginal**: delta is larger than spread but smaller than 1.5x spread.
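
The verdict key can be read as a small classification rule. The sketch below is one interpretation, not the script used to produce the table: it assumes the noise floor is the larger of the two per-side spreads (max minus min), so that "exceeds 1.5x on both sides" becomes a single comparison. The function names `median`, `spread`, and `verdict` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <string>
#include <vector>

// Median of a non-empty set of per-run values (sorts a copy).
double median(std::vector<double> values) {
    assert(!values.empty());
    std::sort(values.begin(), values.end());
    const std::size_t n = values.size();
    return n % 2 == 1 ? values[n / 2]
                      : 0.5 * (values[n / 2 - 1] + values[n / 2]);
}

// Run-to-run spread: max minus min across the runs of one side.
double spread(const std::vector<double>& values) {
    assert(!values.empty());
    const auto [lo, hi] = std::minmax_element(values.begin(), values.end());
    return *hi - *lo;
}

// Classifies a before/after pair of run sets (ns/op values).
// Assumption: the noise floor is the larger of the two per-side spreads.
std::string verdict(const std::vector<double>& before,
                    const std::vector<double>& after) {
    const double delta = std::fabs(median(after) - median(before));
    const double noise = std::max(spread(before), spread(after));
    if (delta > 1.5 * noise) return "REAL";
    if (delta > noise) return "marginal";
    return "NOISE";
}
```

For example, the five NEW-side `emit` runs at 64 subscribers in this commit range from 1085.13 to 1126.05 ns/op, a spread of about 41 ns. That alone exceeds the 11.4 ns median delta in the table, which is why that row is labelled NOISE.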
31+
32+
Run-to-run coefficient of variation was 1% to 5% per scenario. Full per-run
33+
TSVs live in [runs/](runs/) (5 OLD + 5 NEW). A representative single run per
34+
side is preserved in [before.tsv](before.tsv) and [after.tsv](after.tsv) for
35+
quick reference.
36+
37+
## Interpretation
38+
39+
**Zero-subscriber fast path.** The atomic handler-count short-circuit in
40+
`emit()` / `emit_safe()` collapses a `shared_mutex` acquire/release plus
41+
iteration setup into a single `memory_order_acquire` load of an 8-byte counter.
42+
The 17x factor is the cost of an uncontended `shared_mutex` acquire/release
43+
on Windows SRWLOCK relative to a naked atomic load, and it is the dominant
44+
result for dispatchers that are wired up at init but rarely subscribed to.
45+
46+
**1 to 8 subscriber uncontended emit.** Consistent wins (6% to 21%) from
47+
removing the reader lock. The snapshot load is an acquire load (paired with the writer's release store) plus
48+
a `shared_ptr` refcount bump, which is cheaper than touching a mutex's state
49+
word unconditionally.
50+
51+
**Concurrent emit (4 threads, 8 subs).** 2.1x throughput. No reader lock
52+
means no cache-line contention on the mutex state, so all four threads make
53+
progress in parallel instead of serializing on the SRWLOCK read side.
54+
55+
**64 subscriber emit.** Within noise on both `emit` (-1.0%) and `emit_safe`
56+
(+1.2%). An earlier single-run measurement suggested an 18% regression; that
57+
was a statistical outlier. Across 5 runs per side the two implementations
58+
are indistinguishable at this subscriber count: the per-handler iteration
59+
cost dominates, and both paths iterate the same contiguous
60+
`std::vector<Entry>` buffer, each reaching it through one extra dereference.
61+
62+
**Subscribe / unsubscribe round-trip.** 2.6x slower (446 ns to 1150 ns).
63+
Each mutation allocates a fresh handler vector, appends or removes the
64+
entry, and publishes via atomic store. This is documented in the header
65+
and is the accepted tradeoff for lock-free reads. Subscribe is not on a
66+
hot path in any realistic mod workload.
67+
68+
**Reentrancy rejection.** Marginal improvement (above the spread, below 1.5x it). Not a
69+
meaningful claim; effectively unchanged.
70+
71+
## Methodology
72+
73+
- Host: Windows 11, MinGW `mingw-release` preset (GCC 13, libstdc++, -O3 LTO).
74+
- CMake: `cmake --preset mingw-release -DDMK_BUILD_BENCHMARKS=ON -DDMK_BUILD_TESTS=OFF`.
75+
- Build: `DetourModKit_bench` target only. No gtest linkage, no other test deps.
76+
- Each sample runs N iterations of the scenario inside a single
77+
`steady_clock::now()` pair. Reported value is the median per-op cost across
78+
11 samples inside one process invocation. Iteration counts are chosen so
79+
each sample takes roughly the same wall time.
80+
- 5 process invocations per side (OLD vs NEW), back-to-back, same machine,
81+
same thermal state. The table above reports the median across those 5 runs
82+
for each scenario.
83+
- Verdicts use run-to-run spread (max minus min across the 5 runs) as the
84+
noise floor. A claim is "REAL" only when the median delta exceeds 1.5x
85+
that noise floor on both sides.
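
The sampling scheme in the list above might look roughly like the following. This is a sketch under assumptions, not the code of `DetourModKit_bench`: the helper names `sample_ns_per_op` and `median_ns_per_op` are hypothetical, and the caller is expected to make `op` write to some observable sink so the optimizer cannot discard the work.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// One sample: `iterations` calls of `op` inside a single steady_clock::now() pair.
template <typename Op>
double sample_ns_per_op(Op&& op, std::size_t iterations) {
    assert(iterations > 0);
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < iterations; ++i) op();
    const auto stop = std::chrono::steady_clock::now();
    const std::chrono::duration<double, std::nano> elapsed = stop - start;
    return elapsed.count() / static_cast<double>(iterations);
}

// Median per-op cost across `samples` samples within one process invocation.
template <typename Op>
double median_ns_per_op(Op&& op, std::size_t iterations, std::size_t samples = 11) {
    assert(samples > 0);
    std::vector<double> results;
    results.reserve(samples);
    for (std::size_t s = 0; s < samples; ++s) {
        results.push_back(sample_ns_per_op(op, iterations));
    }
    // nth_element places the middle value at `mid`; for an odd sample count
    // such as 11 that value is the exact median.
    const auto mid = results.begin() + static_cast<std::ptrdiff_t>(results.size() / 2);
    std::nth_element(results.begin(), mid, results.end());
    return *mid;
}
```

Taking the median rather than the mean keeps a single slow sample, for example one hit by a scheduler preemption, from skewing the reported per-op cost.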
86+
87+
## Reproduce
88+
89+
```bash
cmake --preset mingw-release -DDMK_BUILD_BENCHMARKS=ON -DDMK_BUILD_TESTS=OFF
PATH="/c/msys64/mingw64/bin:$PATH" cmake --build build/mingw-release --target DetourModKit_bench --parallel
PATH="/c/msys64/mingw64/bin:$PATH" ./build/mingw-release/tests/DetourModKit_bench.exe > run.tsv
```
94+
95+
For a clean before/after comparison, bench the new implementation first,
96+
copy the header aside, run `git checkout HEAD -- include/DetourModKit/event_dispatcher.hpp`
97+
to restore the baseline header (this assumes the new header is still uncommitted, so `HEAD` holds the baseline), rebuild the `DetourModKit_bench` target, run
98+
again into the baseline TSV, then restore the new header. Repeat N times
99+
per side and compare medians with an explicit noise-floor check.
100+
101+
## Caveat on committed TSVs
102+
103+
The TSVs in this directory are raw artifacts from a specific host and
104+
compiler version. They are not a stable baseline. Treat them as evidence
105+
for the claims in this document, not as a regression gate. Future bench
106+
runs should regenerate their own numbers and compare against the structure
107+
of the results (17x fast-path win, 2x concurrent win, COW subscribe cost)
108+
rather than the absolute nanosecond values.
File renamed without changes.

docs/analysis/event_dispatcher_bench_v3.2.0/before.tsv renamed to docs/analysis/event_dispatcher_bench_v3.1.0/before.tsv

File renamed without changes.
Lines changed: 13 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,13 @@
1+
scenario subscribers iterations median_ns_per_op total_ms
2+
emit 0 10000000 6.06 668
3+
emit 1 5000000 104.53 5715
4+
emit 8 1000000 220.31 2449
5+
emit 64 200000 1126.05 2521
6+
emit_safe 0 10000000 5.71 630
7+
emit_safe 1 5000000 97.17 5387
8+
emit_safe 8 1000000 219.06 2428
9+
emit_safe 64 200000 1120.66 2478
10+
subscribe_unsub_roundtrip 0 100000 1180.98 1313
11+
emit_concurrent_4_threads 8 4000000 245.09 980
12+
reentrancy_rejection 1 500000 203.29 1120
13+
# sink=23939455106
Lines changed: 13 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,13 @@
1+
scenario subscribers iterations median_ns_per_op total_ms
2+
emit 0 10000000 6.00 663
3+
emit 1 5000000 96.19 5303
4+
emit 8 1000000 220.13 2431
5+
emit 64 200000 1116.99 2462
6+
emit_safe 0 10000000 5.69 624
7+
emit_safe 1 5000000 96.83 5311
8+
emit_safe 8 1000000 220.24 2438
9+
emit_safe 64 200000 1111.00 2443
10+
subscribe_unsub_roundtrip 0 100000 1156.80 1275
11+
emit_concurrent_4_threads 8 4000000 248.19 992
12+
reentrancy_rejection 1 500000 190.95 1072
13+
# sink=23940408562
Lines changed: 13 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,13 @@
1+
scenario subscribers iterations median_ns_per_op total_ms
2+
emit 0 10000000 6.08 675
3+
emit 1 5000000 94.44 5200
4+
emit 8 1000000 215.45 2394
5+
emit 64 200000 1092.06 2412
6+
emit_safe 0 10000000 5.79 641
7+
emit_safe 1 5000000 96.42 5376
8+
emit_safe 8 1000000 216.73 2395
9+
emit_safe 64 200000 1099.84 2487
10+
subscribe_unsub_roundtrip 0 100000 1150.42 1277
11+
emit_concurrent_4_threads 8 4000000 257.35 1029
12+
reentrancy_rejection 1 500000 192.65 1081
13+
# sink=23936874150
Lines changed: 13 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,13 @@
1+
scenario subscribers iterations median_ns_per_op total_ms
2+
emit 0 10000000 5.72 627
3+
emit 1 5000000 93.81 5154
4+
emit 8 1000000 216.31 2375
5+
emit 64 200000 1091.75 2408
6+
emit_safe 0 10000000 5.57 614
7+
emit_safe 1 5000000 95.42 5236
8+
emit_safe 8 1000000 220.80 2418
9+
emit_safe 64 200000 1095.09 2407
10+
subscribe_unsub_roundtrip 0 100000 1123.53 1244
11+
emit_concurrent_4_threads 8 4000000 235.05 940
12+
reentrancy_rejection 1 500000 188.63 1060
13+
# sink=23945296277
Lines changed: 13 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,13 @@
1+
scenario subscribers iterations median_ns_per_op total_ms
2+
emit 0 10000000 5.80 642
3+
emit 1 5000000 92.25 5104
4+
emit 8 1000000 211.34 2341
5+
emit 64 200000 1085.13 2377
6+
emit_safe 0 10000000 5.67 618
7+
emit_safe 1 5000000 93.57 5143
8+
emit_safe 8 1000000 218.15 2393
9+
emit_safe 64 200000 1082.98 2385
10+
subscribe_unsub_roundtrip 0 100000 1127.63 1247
11+
emit_concurrent_4_threads 8 4000000 255.91 1023
12+
reentrancy_rejection 1 500000 193.98 1071
13+
# sink=23933756560
Lines changed: 13 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,13 @@
1+
scenario subscribers iterations median_ns_per_op total_ms
2+
emit 0 10000000 106.61 11705
3+
emit 1 5000000 128.61 7036
4+
emit 8 1000000 249.45 2768
5+
emit 64 200000 1139.86 2500
6+
emit_safe 0 10000000 105.04 11582
7+
emit_safe 1 5000000 120.53 6652
8+
emit_safe 8 1000000 241.97 2673
9+
emit_safe 64 200000 1093.30 2430
10+
subscribe_unsub_roundtrip 0 100000 469.98 514
11+
emit_concurrent_4_threads 8 4000000 519.06 2076
12+
reentrancy_rejection 1 500000 203.05 1151
13+
# sink=24040673944
Lines changed: 13 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,13 @@
1+
scenario subscribers iterations median_ns_per_op total_ms
2+
emit 0 10000000 106.19 11701
3+
emit 1 5000000 120.09 6623
4+
emit 8 1000000 245.57 2706
5+
emit 64 200000 1109.16 2439
6+
emit_safe 0 10000000 105.08 11571
7+
emit_safe 1 5000000 118.56 6567
8+
emit_safe 8 1000000 233.16 2585
9+
emit_safe 64 200000 1081.20 2374
10+
subscribe_unsub_roundtrip 0 100000 444.50 488
11+
emit_concurrent_4_threads 8 4000000 509.16 2036
12+
reentrancy_rejection 1 500000 212.53 1178
13+
# sink=24038786247
