Skip to content

perf(zenoh-sync): replace event_listener with parking_lot Mutex+Condvar#2513

Open
YuanYuYuan wants to merge 5 commits into
eclipse-zenoh:mainfrom
ZettaScaleLabs:feat/ev-fix
Open

perf(zenoh-sync): replace event_listener with parking_lot Mutex+Condvar#2513
YuanYuYuan wants to merge 5 commits into
eclipse-zenoh:mainfrom
ZettaScaleLabs:feat/ev-fix

Conversation

@YuanYuYuan
Copy link
Copy Markdown
Contributor

@YuanYuYuan YuanYuYuan commented Mar 23, 2026

Summary

Replace event_listener in zenoh-sync's Event with a parking_lot Mutex + Condvar + tokio::sync::Notify triple. The previous implementation allocated a new event_listener::Listener on every wait() call, triggering a lazy DWARF/backtrace init inside event_listener that showed up as ~3% CPU overhead on the sender thread in profiling.

Key Changes

  • commons/zenoh-sync/src/event.rs: replace Event + AtomicU8 with AtomicU8 flag + parking_lot::Mutex<()> + parking_lot::Condvar + tokio::sync::Notify; async path uses Notify::notified(), sync path uses the Condvar
  • commons/zenoh-sync/benches/notify_wait.rs: criterion benchmark covering sticky (hot, same-thread re-check) and cross-thread wakeup paths

Benchmark Results

path before after delta
sticky (hot re-check) 31.7 ns 32.3 ns ~0%
cross-thread wakeup 4.67 µs 4.39 µs −8%

Flamegraph: std::backtrace_rs::symbolize::gimli::Context::new at 3.12% on the sender thread before, completely absent after.

Reproducing

# save baseline on main
git stash  # or checkout main
taskset -c 0,1 cargo bench -j4 -p zenoh-sync --bench notify_wait -- --save-baseline before

# compare on this branch
git stash pop  # or checkout feat/ev-fix
taskset -c 0,1 cargo bench -j4 -p zenoh-sync --bench notify_wait -- --baseline before

Results land in target/criterion/notify_wait/.

Breaking Changes

None


🏷️ Label-Based Checklist

Based on the labels applied to this PR, please complete these additional requirements:

Labels: enhancement

✨ Enhancement Requirements

Since this PR enhances existing functionality:

  • Enhancement scope documented - Clear description of what is being improved
  • Minimum necessary code - Implementation is as simple as possible, doesn't overcomplicate the system
  • Backwards compatible - Existing code/APIs still work unchanged
  • No new APIs added - Only improving existing functionality
  • Tests updated - Existing tests pass, new test cases added if needed
  • Performance improvement measured - If applicable, before/after metrics provided
  • Documentation updated - Existing docs updated to reflect improvements
  • User impact documented - How users benefit from this enhancement

Remember: Enhancements should not introduce new APIs or breaking changes.

Instructions:

  1. Check off items as you complete them (change - [ ] to - [x])
  2. The PR checklist CI will verify these are completed

This checklist updates automatically when labels change, but preserves your checked boxes.

The Waiter::wait() hot path in the transport pipeline created and dropped
an event_listener::EventListener on every batch send cycle. In perf
profiles of ros-z-pingpong this showed as 12.18% CPU in
drop_in_place<EventListener> on the tx-0 thread.

Replace EventInner { EventLib + AtomicU8 } with { Mutex<u8> + Condvar }.
This eliminates per-wakeup EventListener allocation and intrusive-list
register/deregister overhead. Semantics are preserved:
- Sticky notification: if notify() fires with no waiter, the next wait()
  returns immediately (flag stored in the mutex-protected state).
- Error propagation: last Notifier drop sets ERR and wakes all waiters.
- wait_async: delegated to spawn_blocking (not on the hot path).

All existing zenoh-sync tests pass.
The original EventListener alloc+intrusive-list per wait() cost CPU on
the transport hot path (12% in perf profiles).

Previous fix attempt used spawn_blocking for wait_async(), which is not
cancellation-safe: tokio::time::timeout() cancels the outer future but
the blocking thread detaches and eventually consumes the next real
notification, causing a hang at 64KB+ payloads (fragmented messages).

New approach:
- sync paths (wait/wait_deadline/wait_timeout): std Mutex<u8> + Condvar
  (zero allocation, no intrusive list)
- async path (wait_async): tokio::sync::Notify with enable() pattern
  (cancellation-safe: dropped Notified futures return their permit)

The enable()-before-state-check pattern ensures notifications are never
missed between the state check and the .await.
…hroughput

std::sync::Mutex uncontended lock/unlock costs ~90 ns; parking_lot::Mutex
costs ~20 ns. The ev-fix design requires notify() to briefly acquire the
condvar mutex on every call (lost-wakeup fence). With std that made the
sticky (already-set) wait() path cost 179 ns vs event_listener's 31 ns.

With parking_lot the sticky path measures 32 ns — matching event_listener —
while the cross-thread wakeup path improves to 4.7 µs (from 5.1 µs with
event_listener). No semantic change; parking_lot is already in the lockfile.
@YuanYuYuan YuanYuYuan added the enhancement Existing things could work better label Mar 23, 2026
@YuanYuYuan YuanYuYuan force-pushed the feat/ev-fix branch 3 times, most recently from 39d103a to c4712ae Compare March 23, 2026 12:42
@codecov
Copy link
Copy Markdown

codecov Bot commented Mar 23, 2026

Codecov Report

❌ Patch coverage is 92.40506% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 72.48%. Comparing base (1fab5d0) to head (9b3faca).
✅ All tests successful. No failed tests found.

Files with missing lines Patch % Lines
commons/zenoh-sync/src/event.rs 92.40% 6 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2513      +/-   ##
==========================================
+ Coverage   72.43%   72.48%   +0.05%     
==========================================
  Files         390      390              
  Lines       63374    63367       -7     
==========================================
+ Hits        45902    45931      +29     
+ Misses      17472    17436      -36     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Measures the sticky (hot) path and cross-thread wakeup path.
Used to compare event_listener vs the Mutex+Condvar+parking_lot
replacement introduced in the preceding commits.
Sort [[bench]] keys alphabetically (harness before name) to satisfy
taplo reorder_keys=true.

Clear the default peer listen endpoint in pub_config of
test_session_from_cloned_config to prevent the OS from assigning
port 38446 to pub_session's random listener, which collided with
sub_session's explicit bind on that port.
@YuanYuYuan YuanYuYuan requested a review from yellowhatter March 24, 2026 08:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement Existing things could work better

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant