bench(streaming): mpsc vs crossbeam vs direct callback (refs #482) #485

Merged
userFRM merged 1 commit into main from bench/482-streaming-channels on May 6, 2026
@userFRM userFRM commented May 6, 2026


Summary

Adds a streaming_channels criterion bench (crates/thetadatadx/benches/streaming_channels.rs) measuring the per-event cost of the producer/consumer hand-off paid for every FPSS event delivered to a buffered FFI consumer. Five variants timed end-to-end (1 producer + 1 consumer thread, 100k events per sample, payload sized like the real FfiBufferedEvent — 488 B with the tagged union and two heap-owned tails):

  1. std_mpsc_unbounded: current live shape, std::sync::mpsc::channel() + recv_timeout(100ms) poll loop, mirroring tdx_unified_next_event / tdx_fpss_next_event.
  2. crossbeam_bounded_{256,1024,8192}: crossbeam_channel::bounded(N), the lock-free candidate, driven here with one producer and one consumer at three capacity points covering the proposed backpressure range.
  3. direct_callback: no channel; producer invokes an extern "C" fn(*const FfiBufferedEvent, *mut c_void) through a Box<dyn Fn> trampoline, modelling the C/C++ tier-1 path proposed in #482 ("Audit std::sync::mpsc on FFI streaming hot path").
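
For readers unfamiliar with the first variant, the hand-off shape can be sketched as below. This is a minimal illustration, not the bench code: `DemoEvent` and `run_mpsc_handoff` are hypothetical names, and the payload is only a 488 B stand-in rather than the real `FfiBufferedEvent` layout.

```rust
use std::sync::mpsc::{self, RecvTimeoutError};
use std::thread;
use std::time::Duration;

/// Hypothetical stand-in payload (8 B sequence number + 480 B body = 488 B).
/// It does NOT reproduce the real FfiBufferedEvent fields.
pub struct DemoEvent {
    pub seq: u64,
    pub body: [u8; 480],
}

/// Sends `n` events from one producer thread and drains them on the calling
/// thread with a `recv_timeout(100ms)` poll loop. Returns the events received.
pub fn run_mpsc_handoff(n: u64) -> u64 {
    let (tx, rx) = mpsc::channel::<DemoEvent>();

    let producer = thread::spawn(move || {
        for seq in 0..n {
            // On an unbounded channel, send fails only if the receiver is gone.
            if tx.send(DemoEvent { seq, body: [0u8; 480] }).is_err() {
                break;
            }
        }
        // `tx` is dropped here, which disconnects the channel.
    });

    let mut received = 0u64;
    loop {
        match rx.recv_timeout(Duration::from_millis(100)) {
            Ok(ev) => {
                // A single producer preserves send order.
                debug_assert_eq!(ev.seq, received);
                received += 1;
            }
            // Nothing arrived within 100 ms: poll again.
            Err(RecvTimeoutError::Timeout) => continue,
            // Producer finished and the queue is empty.
            Err(RecvTimeoutError::Disconnected) => break,
        }
    }

    producer.join().expect("producer thread panicked");
    received
}
```

Calling `run_mpsc_handoff(100_000)` moves one sample's worth of events through the channel; a benchmark would time that call and divide by the event count.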

Payload mirrors ffi::streaming::FfiBufferedEvent field-for-field. crossbeam-channel is added to dev-dependencies only so the runtime dep graph is unchanged. The implementation switch ships in a follow-up PR after we read the numbers — this PR is bench harness alone.
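
The direct_callback variant listed above can be sketched roughly as follows. It assumes a hypothetical `DemoEvent` payload and hypothetical helper names (`make_trampoline`, `count_events`, `run_direct_callback`); only the callback signature shape, a pointer to the event plus a `*mut c_void` user-data pointer, comes from the description. The sketch runs on one thread for simplicity, whereas the bench uses separate producer and consumer threads.

```rust
use std::ffi::c_void;

/// Hypothetical 488 B stand-in payload, not the real FfiBufferedEvent.
#[repr(C)]
pub struct DemoEvent {
    pub seq: u64,
    pub body: [u8; 480],
}

/// C-ABI callback shape: (event pointer, opaque user data).
pub type EventCallback = extern "C" fn(*const DemoEvent, *mut c_void);

/// Wraps a C callback and its user-data pointer in a boxed closure,
/// so the producer side can call it like any Rust `Fn`.
pub fn make_trampoline(
    cb: EventCallback,
    user_data: *mut c_void,
) -> Box<dyn Fn(&DemoEvent)> {
    Box::new(move |ev: &DemoEvent| cb(ev as *const DemoEvent, user_data))
}

/// Example consumer: treats `user_data` as a `*mut u64` counter.
pub extern "C" fn count_events(ev: *const DemoEvent, user_data: *mut c_void) {
    if ev.is_null() || user_data.is_null() {
        return;
    }
    // SAFETY: the caller passes a pointer to a live, exclusively owned u64.
    unsafe {
        *(user_data as *mut u64) += 1;
    }
}

/// Delivers `n` events straight to the callback with no queue in between.
pub fn run_direct_callback(n: u64) -> u64 {
    let mut count: u64 = 0;
    {
        let deliver =
            make_trampoline(count_events, &mut count as *mut u64 as *mut c_void);
        for seq in 0..n {
            let ev = DemoEvent { seq, body: [0u8; 480] };
            // The event is borrowed for the duration of the call only.
            deliver(&ev);
        }
    }
    count
}
```

Because the event is passed by pointer and never moved into a queue, the per-event cost is an indirect call plus whatever the consumer does, which is consistent with this variant being the cheapest in the bench.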

Results

| Variant | Median (ns/event) | p99 (ns/event) | Throughput (M events/sec) |
| --- | --- | --- | --- |
| std_mpsc_unbounded | 73.77 | 79.62 | 13.56 |
| crossbeam_bounded_256 | 100.44 | 105.53 | 9.96 |
| crossbeam_bounded_1024 | 86.39 | 88.90 | 11.58 |
| crossbeam_bounded_8192 | 57.94 | 64.24 | 17.26 |
| direct_callback | 12.29 | 12.55 | 81.35 |

p50/p99 derived from criterion sample.json (10 samples per variant, divided by 100k events per iteration). Median taken from criterion's point-estimate. Throughput in events/sec = 1e9 / median_ns_per_event; the table reports it in millions (1e3 / median_ns_per_event).
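
The arithmetic behind the table can be sketched as below. The function names are hypothetical, the inputs are assumed to be per-sample total times and iteration counts already read out of sample.json, and the nearest-rank percentile is an assumption; the PR does not say which percentile method was used.

```rust
/// Converts per-sample totals into ns/event.
/// `total_ns[i]` is the wall time of sample i, `iters[i]` its iteration count.
pub fn ns_per_event(total_ns: &[f64], iters: &[f64], events_per_iter: f64) -> Vec<f64> {
    total_ns
        .iter()
        .zip(iters.iter())
        .map(|(t, i)| t / i / events_per_iter)
        .collect()
}

/// Nearest-rank percentile (assumed method). Returns None for empty input.
/// With 10 samples, p99 resolves to the largest sample.
pub fn percentile_nearest_rank(values: &[f64], pct: f64) -> Option<f64> {
    if values.is_empty() {
        return None;
    }
    let mut sorted = values.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    let n = sorted.len();
    // rank = ceil(pct/100 * n), clamped to 1..=n
    let rank = ((pct / 100.0) * n as f64).ceil() as usize;
    Some(sorted[rank.clamp(1, n) - 1])
}

/// Throughput in millions of events per second: 1e9 / ns / 1e6 = 1e3 / ns.
pub fn throughput_m_events_per_sec(median_ns_per_event: f64) -> f64 {
    1e3 / median_ns_per_event
}
```

For example, `throughput_m_events_per_sec(73.77)` gives roughly 13.56, matching the std_mpsc_unbounded row.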

Recommendation

Direct callback is ~4.7x faster than the best channel (crossbeam_bounded_8192, 57.94 ns vs 12.29 ns median) and ~6x faster than std::sync::mpsc (73.77 ns), so for C/C++ consumers, drop the queue (tier-1 path from #482). For Python / TS bindings, where the GIL (Python) or the single-threaded event loop (TS) forces a queue, switch to crossbeam_channel::bounded(8192): its median is ~21% lower than std::sync::mpsc and it is tighter on tail latency (p99 64 ns vs 80 ns). The 256-capacity case is ~36% slower than std mpsc because backpressure stalls dominate at that bound; 1024 is still ~17% slower; the crossover where crossbeam wins lands between 1024 and 8192. Conclusion: the proposal in #482 is justified. Ship it, default capacity 8192.
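
To make the backpressure point concrete, here is a small sketch of how a bounded queue rejects (or, with a blocking send, stalls) a producer once the consumer falls behind by more than the capacity. It uses `std::sync::mpsc::sync_channel` purely as a standard-library stand-in for `crossbeam_channel::bounded`; the two differ in implementation and speed, so this illustrates the capacity effect only, not the measured numbers. `burst_without_consumer` is a hypothetical helper name.

```rust
use std::sync::mpsc::{sync_channel, TrySendError};

/// Pushes a burst of `burst` events into a bounded channel of `capacity`
/// (must be > 0) while nothing is draining it.
/// Returns (accepted, would_block): every `would_block` event is one a
/// blocking `send` would have had to stall on.
pub fn burst_without_consumer(capacity: usize, burst: usize) -> (usize, usize) {
    let (tx, rx) = sync_channel::<u64>(capacity);
    let mut accepted = 0usize;
    let mut would_block = 0usize;

    for seq in 0..burst as u64 {
        match tx.try_send(seq) {
            Ok(()) => accepted += 1,
            // Queue is at capacity: the producer would stall here.
            Err(TrySendError::Full(_)) => would_block += 1,
            // Receiver dropped; cannot happen before the explicit drop below.
            Err(TrySendError::Disconnected(_)) => break,
        }
    }

    // Keep the receiver alive until the burst is done.
    drop(rx);
    (accepted, would_block)
}
```

With a 1,000-event burst, a capacity of 256 accepts 256 events and would stall on the remaining 744, while a capacity of 8192 absorbs the whole burst.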

Hardware / OS

  • uname -a: Linux thetagamma-systems 6.8.0-110-generic #110-Ubuntu SMP PREEMPT_DYNAMIC Thu Mar 19 15:09:20 UTC 2026 x86_64 x86_64 x86_64 GNU/Linux
  • CPU: Intel(R) Core(TM) i7-10700KF CPU @ 3.80GHz (8C/16T)
  • Toolchain: workspace rust-toolchain.toml; criterion 0.8.2; crossbeam-channel 0.5.15

Test plan

  • cargo fmt --all -- --check
  • cargo bench --bench streaming_channels --no-run
  • cargo clippy --workspace --all-targets -- -D warnings
  • Full criterion run completed locally; results above.

Add `streaming_channels` criterion bench measuring the per-event cost
of the producer/consumer hand-off the FFI streaming surface pays for
every FPSS event delivered to a buffered consumer. Five variants:

- `std_mpsc_unbounded` — current live shape (`std::sync::mpsc::channel()`
  + `recv_timeout(100ms)` poll).
- `crossbeam_bounded_{256,1024,8192}` — lock-free SPSC candidate at
  three capacity points covering the proposed backpressure range.
- `direct_callback` — no channel; producer invokes an `extern "C" fn`
  through a `Box<dyn Fn>` trampoline, modelling the C/C++ tier-1 path.

Payload mirrors `ffi::streaming::FfiBufferedEvent` field-for-field:
`#[repr(C)]` tagged event with `TdxContract` embedded in every data
variant plus two heap-owned tails (`Option<CString>` + `Option<Vec<u8>>`).
Sizes match the generated `fpss_event_structs.rs` byte-for-byte
(Event = 448 B, BufferedEvent = 488 B on x86_64).

`crossbeam-channel` is added to dev-dependencies only so the runtime
dep graph is unchanged. The implementation switch ships in a follow-up
PR after we read the numbers.
@userFRM userFRM merged commit a532f6a into main May 6, 2026
32 checks passed
@userFRM userFRM deleted the bench/482-streaming-channels branch May 6, 2026 09:09