Inter-process RTT benchmark. Two processes, two SPSC ring buffers over memfd shared memory. The hot path contains no
syscalls, no allocations, and no kernel involvement after the UDS handshake.
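
To illustrate why the hot path needs no syscalls, here is a minimal sketch of a single-producer/single-consumer ring built only on atomic loads and stores. It is not Tachyon's implementation: the `Slot` layout, the `SpscRing` name, and the fixed 32-byte payload are assumptions made for this example. In a shared-memory setup the object would be placed inside the memfd mapping, which is why the constructor checks that the indices are lock-free.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical fixed-size slot; the real frame layout is not shown in this README.
struct Slot {
    std::uint8_t bytes[32];
};

// Minimal single-producer/single-consumer ring. N must be a power of two.
template <std::size_t N>
class SpscRing {
    static_assert(N != 0 && (N & (N - 1)) == 0, "N must be a power of two");

public:
    SpscRing() {
        // Cross-process use only works if the atomics do not hide a lock.
        assert(head_.is_lock_free() && tail_.is_lock_free());
    }

    // Producer side: returns false when the ring is full.
    bool try_push(const Slot& s) {
        const std::uint64_t head = head_.load(std::memory_order_relaxed);
        const std::uint64_t tail = tail_.load(std::memory_order_acquire);
        if (head - tail == N) return false;                // full
        slots_[head & (N - 1)] = s;
        head_.store(head + 1, std::memory_order_release);  // publish the slot
        return true;
    }

    // Consumer side: returns false when the ring is empty.
    bool try_pop(Slot& out) {
        const std::uint64_t tail = tail_.load(std::memory_order_relaxed);
        const std::uint64_t head = head_.load(std::memory_order_acquire);
        if (head == tail) return false;                    // empty
        out = slots_[tail & (N - 1)];
        tail_.store(tail + 1, std::memory_order_release);  // free the slot
        return true;
    }

private:
    alignas(64) std::atomic<std::uint64_t> head_{0};  // written by the producer only
    alignas(64) std::atomic<std::uint64_t> tail_{0};  // written by the consumer only
    alignas(64) Slot slots_[N];
};
```

Both sides spin on `try_push` / `try_pop`, so neither the producer nor the consumer enters the kernel. Keeping the two indices on separate cache lines avoids false sharing between the cores.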
Hardware: i7-12650H · DDR5-5600 · Fedora 43 · Linux 6.19.11 · pinned to cores 8 and 9 · SCHED_FIFO priority 99 ·
mlockall · no isolcpus.

    ┌─────────────────────────────────────────────────┐
    │ Tachyon SHM — inter-process RTT benchmark       │
    │ Payload: 32 bytes   Samples: 1000000            │
    │ Cores: ping= 8 pong= 9   rdtsc / spin-only      │
    ├──────────────────────────────────┬──────────────┤
    │ Metric                           │     RTT (ns) │
    ├──────────────────────────────────┼──────────────┤
    │ Min                              │         51.3 │
    │ p50 (median)                     │         56.5 │
    │ p90                              │        101.2 │
    │ p99                              │        112.4 │
    │ p99.9                            │        122.0 │
    │ p99.99                           │        467.3 │
    │ Max                              │       4938.0 │
    ├──────────────────────────────────┼──────────────┤
    │ One-way p50 estimate             │         28.3 │
    │ Throughput (K RTT/s)             │      13229.0 │
    └──────────────────────────────────┴──────────────┘

One-way p50: 28.3 ns. p99.99 at 467.3 ns is scheduler jitter — isolcpus=8,9 brings it below 200 ns.
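
The percentile rows and the one-way figure can be reproduced from raw RTT samples along these lines. This is a sketch under assumptions: it uses the nearest-rank definition of a percentile and treats the two directions as symmetric, neither of which this README confirms for the benchmark itself. The function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Nearest-rank percentile over an already sorted, non-empty sample vector.
// p is in (0, 100]; the result is the smallest sample with at least p% of
// all samples less than or equal to it.
double percentile(const std::vector<double>& sorted, double p) {
    assert(!sorted.empty() && p > 0.0 && p <= 100.0);
    const auto rank = static_cast<std::size_t>(
        std::ceil(p / 100.0 * static_cast<double>(sorted.size())));
    return sorted[std::max<std::size_t>(rank, 1) - 1];
}

// One-way estimate, assuming ping-to-pong and pong-to-ping cost the same.
double one_way_estimate(double rtt_ns) {
    return rtt_ns / 2.0;
}
```

Sort the samples once with `std::sort`, then call `percentile` for each row; halving the p50 RTT gives the one-way estimate.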
Requires GCC ≥ 14 or Clang ≥ 17.

    for dir in ping pong; do
      cmake -S examples/cpp_producer_cpp_consumer/$dir \
            -B examples/cpp_producer_cpp_consumer/$dir/build \
            -DCMAKE_BUILD_TYPE=Release
      cmake --build examples/cpp_producer_cpp_consumer/$dir/build -j$(nproc)
    done

    rm -f /tmp/tachyon_pa.sock /tmp/tachyon_ap.sock

    # terminal 1
    sudo ./pong/build/tachyon_pong

    # terminal 2
    sudo ./ping/build/tachyon_ping

Start order is arbitrary: both sides retry until connected. sudo is required
for SCHED_FIFO and mlockall; omitting it degrades tail latency.

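
The privileged setup that sudo enables can be sketched with standard Linux calls as follows. The helper names (`pin_to_core`, `enable_fifo`, `lock_memory`, `apply_rt_settings`) are hypothetical and are not taken from the Tachyon sources; only `sched_setaffinity`, `sched_setscheduler`, and `mlockall` are real system interfaces.

```cpp
#include <sched.h>
#include <sys/mman.h>
#include <cerrno>
#include <cstdio>

// Pin the calling thread to one logical CPU. Returns true on success.
bool pin_to_core(int cpu) {
    if (cpu < 0 || cpu >= CPU_SETSIZE) {
        errno = EINVAL;
        return false;
    }
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return sched_setaffinity(0, sizeof(set), &set) == 0;
}

// Request SCHED_FIFO at the given priority. Needs root or CAP_SYS_NICE.
bool enable_fifo(int priority) {
    sched_param sp{};
    sp.sched_priority = priority;
    return sched_setscheduler(0, SCHED_FIFO, &sp) == 0;
}

// Lock current and future pages into RAM. Needs root, CAP_IPC_LOCK,
// or a large enough RLIMIT_MEMLOCK.
bool lock_memory() {
    return mlockall(MCL_CURRENT | MCL_FUTURE) == 0;
}

// Apply all three settings and return how many of them failed.
// Failures are reported but not fatal, so the benchmark can still run
// without privileges at the cost of worse tail latency.
int apply_rt_settings(int cpu, int priority) {
    int failures = 0;
    if (!pin_to_core(cpu))      { std::perror("sched_setaffinity");  ++failures; }
    if (!enable_fifo(priority)) { std::perror("sched_setscheduler"); ++failures; }
    if (!lock_memory())         { std::perror("mlockall");           ++failures; }
    return failures;
}
```

With the values from this benchmark, the ping side would call `apply_rt_settings(8, 99)` and the pong side `apply_rt_settings(9, 99)` before entering the measurement loop.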
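tachyon_bus_listen() blocks on accept(). With two buses in opposing
directions, a sequential handshake deadlocks: if both processes call listen
first, each waits in accept() and neither reaches connect. Each process
therefore runs listen in a background thread while the main thread retries
connect at 50 ms intervals.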

    ping                                 pong
    ────                                 ────
    thread → listen(SOCK_PA)     ←───    connect(SOCK_PA) retrying
    connect(SOCK_AP) retrying    ───→    thread → listen(SOCK_AP)

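
The pattern in the diagram can be sketched with plain POSIX Unix domain sockets. This example deliberately does not call `tachyon_bus_listen()`, whose signature is not shown here; `listen_once`, `connect_with_retry`, `handshake`, and `Endpoints` are hypothetical names, and the cap of 200 attempts is an arbitrary choice for the sketch.

```cpp
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>
#include <cassert>
#include <chrono>
#include <cstring>
#include <future>
#include <string>
#include <thread>

// Build a sockaddr_un for a filesystem socket path.
static sockaddr_un make_addr(const std::string& path) {
    sockaddr_un addr{};
    addr.sun_family = AF_UNIX;
    std::strncpy(addr.sun_path, path.c_str(), sizeof(addr.sun_path) - 1);
    return addr;
}

// Blocking side: bind, listen, accept one peer. Returns the peer fd or -1.
int listen_once(const std::string& path) {
    const int srv = socket(AF_UNIX, SOCK_STREAM, 0);
    if (srv < 0) return -1;
    unlink(path.c_str());  // drop a stale socket file from an earlier run
    sockaddr_un addr = make_addr(path);
    int peer = -1;
    if (bind(srv, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) == 0 &&
        listen(srv, 1) == 0) {
        peer = accept(srv, nullptr, nullptr);  // blocks until the peer connects
    }
    close(srv);
    return peer;
}

// Retrying side: attempt connect every `interval` until it succeeds.
int connect_with_retry(const std::string& path,
                       std::chrono::milliseconds interval, int max_attempts) {
    sockaddr_un addr = make_addr(path);
    for (int i = 0; i < max_attempts; ++i) {
        const int fd = socket(AF_UNIX, SOCK_STREAM, 0);
        if (fd < 0) return -1;
        if (connect(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) == 0)
            return fd;
        close(fd);
        std::this_thread::sleep_for(interval);
    }
    return -1;
}

struct Endpoints {
    int accepted;   // fd from our own listener
    int connected;  // fd from connecting to the peer's listener
};

// One process's half of the handshake: listen in a background thread while
// the calling thread retries connect. Note that if the peer never shows up,
// the final get() blocks, just as a plain accept() would.
Endpoints handshake(const std::string& listen_path, const std::string& connect_path) {
    auto listener = std::async(std::launch::async, listen_once, listen_path);
    const int connected =
        connect_with_retry(connect_path, std::chrono::milliseconds(50), 200);
    return {listener.get(), connected};
}
```

Ping would call `handshake(SOCK_PA, SOCK_AP)` and pong `handshake(SOCK_AP, SOCK_PA)`. Because the listener never runs on the thread that connects, the start order of the two processes does not matter.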
Timing uses __rdtsc() calibrated against CLOCK_MONOTONIC over 10 ms.
rdtsc overhead is ~0.37 ns on this machine; clock_gettime(CLOCK_MONOTONIC)
costs ~25 ns per call, nearly half of the 56.5 ns p50 RTT reported above.
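
A calibration of this kind might look like the sketch below: spin for a fixed window, read both clocks at each end, and divide. The function names are hypothetical, and the non-x86 fallback exists only so the example compiles elsewhere. The sketch also assumes an invariant TSC, which this README does not state explicitly.

```cpp
#include <time.h>
#include <cstdint>
#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>
#endif

// Current CLOCK_MONOTONIC time in nanoseconds.
static inline std::uint64_t monotonic_ns() {
    timespec ts{};
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return static_cast<std::uint64_t>(ts.tv_sec) * 1000000000ull +
           static_cast<std::uint64_t>(ts.tv_nsec);
}

// Raw tick counter: the TSC on x86, a nanosecond clock otherwise.
static inline std::uint64_t read_ticks() {
#if defined(__x86_64__) || defined(__i386__)
    return __rdtsc();
#else
    return monotonic_ns();  // fallback so the sketch builds on other targets
#endif
}

// Spin for window_ns (10 ms by default) and return nanoseconds per tick.
double calibrate_ns_per_tick(std::uint64_t window_ns = 10'000'000) {
    const std::uint64_t t0 = monotonic_ns();
    const std::uint64_t c0 = read_ticks();
    std::uint64_t t1 = t0;
    while (t1 - t0 < window_ns) t1 = monotonic_ns();
    const std::uint64_t c1 = read_ticks();
    return static_cast<double>(t1 - t0) / static_cast<double>(c1 - c0);
}

// Convert a tick delta measured on the hot path into nanoseconds.
inline double ticks_to_ns(std::uint64_t ticks, double ns_per_tick) {
    return static_cast<double>(ticks) * ns_per_tick;
}
```

Calibration runs once before the warmup; the hot path then only records `read_ticks()` deltas and converts them after the run, so `clock_gettime` never appears inside a measured round trip.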
| Parameter | Value | Notes |
|---|---|---|
| PING_CORE / PONG_CORE | 8 / 9 | Must be distinct physical cores. Verify with lscpu -e. |
| SCHED_FIFO priority | 99 | Prevents preemption mid-RTT. |
| mlockall | MCL_CURRENT \| MCL_FUTURE | No page faults on hot path. |
| isolcpus=8,9 | kernel cmdline | Reduces OS jitter. Cuts p99.99 to ~150 ns. |
| CAPACITY | 1 << 16 (64 KB) | Holds 512 in-flight 32-byte frames. Not the bottleneck. |
| WARMUP | 10 000 | Fills caches and branch predictors before measurement. |