bench(streaming): mpsc vs crossbeam vs direct callback (refs #482)#485
Merged
Conversation
Add `streaming_channels` criterion bench measuring the per-event cost
of the producer/consumer hand-off the FFI streaming surface pays for
every FPSS event delivered to a buffered consumer. Five variants:
- `std_mpsc_unbounded` — current live shape (`std::sync::mpsc::channel()`
+ `recv_timeout(100ms)` poll).
- `crossbeam_bounded_{256,1024,8192}` — lock-free SPSC candidate at
three capacity points covering the proposed backpressure range.
- `direct_callback` — no channel; producer invokes an `extern "C" fn`
through a `Box<dyn Fn>` trampoline, modelling the C/C++ tier-1 path.
Payload mirrors `ffi::streaming::FfiBufferedEvent` field-for-field:
`#[repr(C)]` tagged event with `TdxContract` embedded in every data
variant plus two heap-owned tails (`Option<CString>` + `Option<Vec<u8>>`).
Sizes match the generated `fpss_event_structs.rs` byte-for-byte
(Event = 448 B, BufferedEvent = 488 B on x86_64).
`crossbeam-channel` is added to dev-dependencies only so the runtime
dep graph is unchanged. The implementation switch ships in a follow-up
PR after we read the numbers.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Adds a
streaming_channelscriterion bench (crates/thetadatadx/benches/streaming_channels.rs) measuring the per-event cost of the producer/consumer hand-off paid for every FPSS event delivered to a buffered FFI consumer. Five variants timed end-to-end (1 producer + 1 consumer thread, 100k events per sample, payload sized like the realFfiBufferedEvent— 488 B with the tagged union and two heap-owned tails):std_mpsc_unbounded— current live shape:std::sync::mpsc::channel()+recv_timeout(100ms)poll loop, mirroringtdx_unified_next_event/tdx_fpss_next_event.crossbeam_bounded_{256,1024,8192}—crossbeam_channel::bounded(N)lock-free SPSC at three capacity points covering the proposed backpressure range.direct_callback— no channel; producer invokes anextern "C" fn(*const FfiBufferedEvent, *mut c_void)through aBox<dyn Fn>trampoline, modelling the C/C++ tier-1 path proposed in Audit std::sync::mpsc on FFI streaming hot path #482.Payload mirrors
ffi::streaming::FfiBufferedEventfield-for-field.crossbeam-channelis added to dev-dependencies only so the runtime dep graph is unchanged. The implementation switch ships in a follow-up PR after we read the numbers — this PR is bench harness alone.Results
std_mpsc_unboundedcrossbeam_bounded_256crossbeam_bounded_1024crossbeam_bounded_8192direct_callbackp50/p99 derived from criterion
sample.json(10 samples per variant, divided by 100k events per iteration). Median taken from criterion's point-estimate. Throughput =1e9 / median_ns_per_event.Recommendation
Direct callback is ~6x faster than the best channel and ~6x faster than
std::sync::mpsc— for C/C++ consumers, drop the queue (tier-1 path from #482). For Python / TS bindings where the GIL forces a queue, switch tocrossbeam_channel::bounded(8192): it's ~22% faster thanstd::sync::mpscmedian and tighter on tail latency (p99 64 ns vs 80 ns). The 256-capacity case is slower than std mpsc because backpressure stalls dominate at that bound; 1024 is roughly even with std mpsc; the crossover where lock-free wins lands between 1024 and 8192. Conclusion: the proposal in #482 is justified. Ship it, default capacity 8192.Hardware / OS
uname -a:Linux thetagamma-systems 6.8.0-110-generic #110-Ubuntu SMP PREEMPT_DYNAMIC Thu Mar 19 15:09:20 UTC 2026 x86_64 x86_64 x86_64 GNU/LinuxIntel(R) Core(TM) i7-10700KF CPU @ 3.80GHz(8C/16T)rust-toolchain.toml; criterion 0.8.2; crossbeam-channel 0.5.15Test plan
cargo fmt --all -- --checkcargo bench --bench streaming_channels --no-runcargo clippy --workspace --all-targets -- -D warnings