bench(streaming): mpsc vs crossbeam vs direct callback (refs #482) by userFRM · Pull Request #485 · userFRM/ThetaDataDx

userFRM · 2026-05-06T08:48:26Z

Summary

Adds a streaming_channels criterion bench (crates/thetadatadx/benches/streaming_channels.rs) measuring the per-event cost of the producer/consumer hand-off paid for every FPSS event delivered to a buffered FFI consumer. Five variants timed end-to-end (1 producer + 1 consumer thread, 100k events per sample, payload sized like the real FfiBufferedEvent — 488 B with the tagged union and two heap-owned tails):

- std_mpsc_unbounded: current live shape, std::sync::mpsc::channel() + recv_timeout(100ms) poll loop, mirroring tdx_unified_next_event / tdx_fpss_next_event.
- crossbeam_bounded_{256,1024,8192}: crossbeam_channel::bounded(N), an MPMC channel driven here by a single producer and a single consumer, at three capacity points covering the proposed backpressure range.
- direct_callback: no channel; the producer invokes an extern "C" fn(*const FfiBufferedEvent, *mut c_void) through a Box<dyn Fn> trampoline, modelling the C/C++ tier-1 path proposed in #482 (Audit std::sync::mpsc on FFI streaming hot path).
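The two extremes of that list can be sketched with the standard library alone. This is a minimal illustration, not the bench source: `Event` is a hypothetical 488-byte stand-in for `FfiBufferedEvent`, and `run_mpsc`, `run_callback` and `on_event` are names invented here. Timing is left out; the point is only the shape of each hand-off.

```rust
use std::ffi::c_void;
use std::sync::mpsc::{self, RecvTimeoutError};
use std::thread;
use std::time::Duration;

/// Hypothetical stand-in for `FfiBufferedEvent`; only the 488 B size is borrowed from the PR.
#[repr(C)]
pub struct Event {
    pub seq: u64,
    pub payload: [u8; 480],
}

/// Channel shape: unbounded `std::sync::mpsc` drained by a `recv_timeout` poll loop.
pub fn run_mpsc(n: u64) -> u64 {
    let (tx, rx) = mpsc::channel::<Event>();
    let consumer = thread::spawn(move || {
        let mut sum = 0u64;
        loop {
            match rx.recv_timeout(Duration::from_millis(100)) {
                Ok(ev) => sum += ev.seq,
                // Nothing arrived within the window: poll again.
                Err(RecvTimeoutError::Timeout) => continue,
                // The producer dropped its sender and the queue is drained.
                Err(RecvTimeoutError::Disconnected) => break,
            }
        }
        sum
    });
    for seq in 0..n {
        tx.send(Event { seq, payload: [0; 480] }).unwrap();
    }
    drop(tx); // signals end of stream to the consumer
    consumer.join().unwrap()
}

/// C-compatible callback signature: event pointer plus opaque user data.
pub type EventCallback = extern "C" fn(*const Event, *mut c_void);

/// Example consumer callback: adds the event's sequence number to a `u64` behind `user_data`.
pub extern "C" fn on_event(ev: *const Event, user_data: *mut c_void) {
    // SAFETY: `run_callback` passes a live `Event` and a pointer to a live `u64`.
    unsafe { *(user_data as *mut u64) += (*ev).seq };
}

/// Direct shape: no queue, the producer calls the consumer through a boxed trampoline.
pub fn run_callback(n: u64) -> u64 {
    let mut sum = 0u64;
    let user_data = &mut sum as *mut u64 as *mut c_void;
    let cb: EventCallback = on_event;
    // The trampoline owns the fn pointer and user data and forwards each event.
    let trampoline: Box<dyn Fn(&Event)> =
        Box::new(move |ev| cb(ev as *const Event, user_data));
    for seq in 0..n {
        let ev = Event { seq, payload: [0; 480] };
        trampoline(&ev);
    }
    drop(trampoline);
    sum
}
```

Both functions return the sum of the sequence numbers they observed, so a caller can check that no event was lost on either path before comparing timings.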

Payload mirrors ffi::streaming::FfiBufferedEvent field-for-field. crossbeam-channel is added to dev-dependencies only so the runtime dep graph is unchanged. The implementation switch ships in a follow-up PR after we read the numbers — this PR is bench harness alone.

Results

| Variant | median ns/event | p99 ns/event | M events/sec |
|---|---:|---:|---:|
| `std_mpsc_unbounded` | 73.77 | 79.62 | 13.56 |
| `crossbeam_bounded_256` | 100.44 | 105.53 | 9.96 |
| `crossbeam_bounded_1024` | 86.39 | 88.90 | 11.58 |
| `crossbeam_bounded_8192` | 57.94 | 64.24 | 17.26 |
| `direct_callback` | 12.29 | 12.55 | 81.35 |

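Percentiles (p50/p99) are derived from criterion's sample.json (10 samples per variant, each divided by 100k events per iteration); with only 10 samples, p99 is close to the slowest sample. The median column reports criterion's point estimate. Throughput = 1e9 / median_ns_per_event, shown in millions of events per second.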
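The derivation of the table columns can be written out as a few small functions. This is a sketch under assumptions: the function names are invented here, each sample is taken to be a total time in nanoseconds together with the iteration count criterion ran for it, and the percentile uses the nearest-rank method, which may differ from whatever the author actually used.

```rust
/// Events processed per criterion iteration, as stated in the PR.
pub const EVENTS_PER_ITER: f64 = 100_000.0;

/// Converts one sample (total ns over `iters` iterations) into ns per event.
pub fn ns_per_event(total_ns: f64, iters: f64) -> f64 {
    total_ns / iters / EVENTS_PER_ITER
}

/// Throughput in millions of events per second: 1e9 / median, scaled by 1e6.
pub fn m_events_per_sec(median_ns_per_event: f64) -> f64 {
    1e9 / median_ns_per_event / 1e6
}

/// Nearest-rank percentile over a non-empty slice (assumed method, `p` in 0..=100).
pub fn percentile(samples: &[f64], p: f64) -> f64 {
    let mut sorted = samples.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    // Rank is 1-based; clamp keeps p = 0 and p = 100 inside the slice.
    let rank = ((p / 100.0) * sorted.len() as f64).ceil() as usize;
    sorted[rank.clamp(1, sorted.len()) - 1]
}
```

With ten samples, nearest-rank p99 selects the tenth sorted value, i.e. the slowest sample, so the p99 column is best read as a worst-of-ten figure.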

Recommendation

Direct callback is ~4.7x faster than the best channel (crossbeam_bounded_8192, 57.94 ns) and ~6x faster than std::sync::mpsc (73.77 ns), so for C/C++ consumers, drop the queue (tier-1 path from #482). For Python / TS bindings, where the runtime forces a queue (the GIL on the Python side), switch to crossbeam_channel::bounded(8192): its median is ~21% lower than std::sync::mpsc and its tail is tighter (p99 64 ns vs 80 ns). The 256-capacity case is ~36% slower than std mpsc because backpressure stalls dominate at that bound; 1024 is still ~17% slower; the crossover where crossbeam wins lands between 1024 and 8192. Conclusion: the proposal in #482 is justified. Ship it, default capacity 8192.
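The ratios behind this recommendation can be recomputed from the medians in the Results table. The helper names below are invented for illustration; the inputs are the published medians in ns/event.

```rust
/// How many times faster `candidate_ns` is than `baseline_ns` (ratio of per-event times).
pub fn speedup(baseline_ns: f64, candidate_ns: f64) -> f64 {
    baseline_ns / candidate_ns
}

/// Percentage by which `candidate_ns` is lower than `baseline_ns`; negative means slower.
pub fn percent_lower(baseline_ns: f64, candidate_ns: f64) -> f64 {
    (baseline_ns - candidate_ns) / baseline_ns * 100.0
}

/// Medians from the Results table, in ns/event.
pub const STD_MPSC: f64 = 73.77;
pub const CROSSBEAM_256: f64 = 100.44;
pub const CROSSBEAM_1024: f64 = 86.39;
pub const CROSSBEAM_8192: f64 = 57.94;
pub const DIRECT_CALLBACK: f64 = 12.29;
```

On these inputs, `speedup(STD_MPSC, DIRECT_CALLBACK)` is about 6.0, `speedup(CROSSBEAM_8192, DIRECT_CALLBACK)` is about 4.7, and `percent_lower(STD_MPSC, CROSSBEAM_8192)` is about 21.5; the 256 and 1024 capacities come out negative, i.e. slower than std mpsc by roughly 36% and 17%.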

Hardware / OS

uname -a: Linux thetagamma-systems 6.8.0-110-generic #110-Ubuntu SMP PREEMPT_DYNAMIC Thu Mar 19 15:09:20 UTC 2026 x86_64 x86_64 x86_64 GNU/Linux
CPU: Intel(R) Core(TM) i7-10700KF CPU @ 3.80GHz (8C/16T)
Toolchain: workspace rust-toolchain.toml; criterion 0.8.2; crossbeam-channel 0.5.15

Test plan

cargo fmt --all -- --check
cargo bench --bench streaming_channels --no-run
cargo clippy --workspace --all-targets -- -D warnings
Full criterion run completed locally; results above.

Add `streaming_channels` criterion bench measuring the per-event cost of the producer/consumer hand-off the FFI streaming surface pays for every FPSS event delivered to a buffered consumer.

Five variants:
- `std_mpsc_unbounded`: current live shape (`std::sync::mpsc::channel()` + `recv_timeout(100ms)` poll).
- `crossbeam_bounded_{256,1024,8192}`: lock-free SPSC candidate at three capacity points covering the proposed backpressure range.
- `direct_callback`: no channel; producer invokes an `extern "C" fn` through a `Box<dyn Fn>` trampoline, modelling the C/C++ tier-1 path.

Payload mirrors `ffi::streaming::FfiBufferedEvent` field-for-field: `#[repr(C)]` tagged event with `TdxContract` embedded in every data variant plus two heap-owned tails (`Option<CString>` + `Option<Vec<u8>>`). Sizes match the generated `fpss_event_structs.rs` byte-for-byte (Event = 448 B, BufferedEvent = 488 B on x86_64).

`crossbeam-channel` is added to dev-dependencies only so the runtime dep graph is unchanged. The implementation switch ships in a follow-up PR after we read the numbers.
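The two quoted sizes are consistent with a 448-byte event followed by the two tails, since `Option<CString>` and `Option<Vec<u8>>` occupy 16 and 24 bytes on current 64-bit Rust targets. The sketch below checks that arithmetic with placeholder types: `Event` here is an opaque 448-byte blob, not the real tagged union, and the field names are assumptions rather than the generated `fpss_event_structs.rs` layout.

```rust
use std::ffi::CString;
use std::mem::size_of;

/// Placeholder for the 448-byte tagged event; the real fields are not reproduced here.
#[repr(C, align(8))]
pub struct Event {
    pub tag: u32,
    pub body: [u8; 444],
}

/// Assumed shape: the event followed by two heap-owned tails.
#[repr(C)]
pub struct BufferedEvent {
    pub event: Event,
    pub text_tail: Option<CString>,
    pub bytes_tail: Option<Vec<u8>>,
}

/// Returns (event size, buffered event size) in bytes for the current target.
pub fn sizes() -> (usize, usize) {
    (size_of::<Event>(), size_of::<BufferedEvent>())
}
```

On x86_64 this yields 448 and 488 bytes. A bench that wants to fail loudly when the payload drifts from the generated structs could assert on such sizes at startup.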

userFRM merged commit a532f6a into main May 6, 2026
32 checks passed

userFRM deleted the bench/482-streaming-channels branch May 6, 2026 09:09
