perf(zenoh-sync): replace event_listener with parking_lot Mutex+Condvar#2513
Open
YuanYuYuan wants to merge 5 commits into
Open
perf(zenoh-sync): replace event_listener with parking_lot Mutex+Condvar#2513YuanYuYuan wants to merge 5 commits into
YuanYuYuan wants to merge 5 commits into
Conversation
9 tasks
The Waiter::wait() hot path in the transport pipeline created and dropped
an event_listener::EventListener on every batch send cycle. In perf
profiles of ros-z-pingpong this showed as 12.18% CPU in
drop_in_place<EventListener> on the tx-0 thread.
Replace EventInner { EventLib + AtomicU8 } with { Mutex<u8> + Condvar }.
This eliminates per-wakeup EventListener allocation and intrusive-list
register/deregister overhead. Semantics are preserved:
- Sticky notification: if notify() fires with no waiter, the next wait()
returns immediately (flag stored in the mutex-protected state).
- Error propagation: last Notifier drop sets ERR and wakes all waiters.
- wait_async: delegated to spawn_blocking (not on the hot path).
All existing zenoh-sync tests pass.
The original EventListener alloc+intrusive-list per wait() cost CPU on the transport hot path (12% in perf profiles). Previous fix attempt used spawn_blocking for wait_async(), which is not cancellation-safe: tokio::time::timeout() cancels the outer future but the blocking thread detaches and eventually consumes the next real notification, causing a hang at 64KB+ payloads (fragmented messages). New approach: - sync paths (wait/wait_deadline/wait_timeout): std Mutex<u8> + Condvar (zero allocation, no intrusive list) - async path (wait_async): tokio::sync::Notify with enable() pattern (cancellation-safe: dropped Notified futures return their permit) The enable()-before-state-check pattern ensures notifications are never missed between the state check and the .await.
…hroughput std::sync::Mutex uncontended lock/unlock costs ~90 ns; parking_lot::Mutex costs ~20 ns. The ev-fix design requires notify() to briefly acquire the condvar mutex on every call (lost-wakeup fence). With std that made the sticky (already-set) wait() path cost 179 ns vs event_listener's 31 ns. With parking_lot the sticky path measures 32 ns — matching event_listener — while the cross-thread wakeup path improves to 4.7 µs (from 5.1 µs with event_listener). No semantic change; parking_lot is already in the lockfile.
39d103a to
c4712ae
Compare
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #2513 +/- ##
==========================================
+ Coverage 72.43% 72.48% +0.05%
==========================================
Files 390 390
Lines 63374 63367 -7
==========================================
+ Hits 45902 45931 +29
+ Misses 17472 17436 -36 ☔ View full report in Codecov by Sentry. |
Measures the sticky (hot) path and cross-thread wakeup path. Used to compare event_listener vs the Mutex+Condvar+parking_lot replacement introduced in the preceding commits.
Sort [[bench]] keys alphabetically (harness before name) to satisfy taplo reorder_keys=true. Clear the default peer listen endpoint in pub_config of test_session_from_cloned_config to prevent the OS from assigning port 38446 to pub_session's random listener, which collided with sub_session's explicit bind on that port.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Replace
event_listenerinzenoh-sync'sEventwith aparking_lotMutex + Condvar +tokio::sync::Notifytriple. The previous implementation allocated a newevent_listener::Listeneron everywait()call, triggering a lazy DWARF/backtrace init insideevent_listenerthat showed up as ~3% CPU overhead on the sender thread in profiling.Key Changes
commons/zenoh-sync/src/event.rs: replaceEvent + AtomicU8withAtomicU8 flag + parking_lot::Mutex<()> + parking_lot::Condvar + tokio::sync::Notify; async path usesNotify::notified(), sync path uses the Condvarcommons/zenoh-sync/benches/notify_wait.rs: criterion benchmark covering sticky (hot, same-thread re-check) and cross-thread wakeup pathsBenchmark Results
Flamegraph:
std::backtrace_rs::symbolize::gimli::Context::newat 3.12% on the sender thread before, completely absent after.Reproducing
Results land in
target/criterion/notify_wait/.Breaking Changes
None
🏷️ Label-Based Checklist
Based on the labels applied to this PR, please complete these additional requirements:
Labels:
enhancement✨ Enhancement Requirements
Since this PR enhances existing functionality:
Remember: Enhancements should not introduce new APIs or breaking changes.
Instructions:
- [ ]to- [x])This checklist updates automatically when labels change, but preserves your checked boxes.