perf(runtime): use parking_lot for TCP stream registry#11065
Conversation
Signed-off-by: jthomson04 <jwillthomson19@gmail.com>
Walkthrough
ChangesSync Mutex Migration and Stream Bookkeeping
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
Comment |
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
lib/runtime/src/pipeline/network/tcp/server.rs (1)
472-485: 🩺 Stability & Availability | 🟠 Major | ⚡ Quick winRun both cleanup closures inline —
RegisteredStreamalready executes the cleanup closure synchronously onDrop, sotokio::spawnonly defers registry removal and makes dropping outside a Tokio runtime panic. Lockcleanup_statedirectly in both closures.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@lib/runtime/src/pipeline/network/tcp/server.rs` around lines 472 - 485, The cleanup logic in RegisteredStream’s with_cleanup callback is being deferred with tokio::spawn, which is unnecessary because the Drop path already runs synchronously and can panic outside a Tokio runtime. Update the cleanup closures in the tcp server code to lock cleanup_state directly and perform the registry removals inline, keeping the same subject_instance and instance_subjects updates without spawning a task.
🧹 Nitpick comments (1)
lib/runtime/src/pipeline/network/tcp/server.rs (1)
1972-2009: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick winMake the registration half concurrent too.
This test currently awaits
server.register(options)inside the loop before spawning each client task, so it stresses concurrent call-home but not concurrent registration. Spawn the per-stream registration + call-home body together to cover the registry contention this PR is targeting.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@lib/runtime/src/pipeline/network/tcp/server.rs` around lines 1972 - 2009, The test only exercises concurrent call-home because `server.register(options)` is still awaited serially inside the loop. Update `test_concurrent_response_registration_and_call_home` so each stream’s registration and client setup run together in a spawned task, letting `server.register`, `registered_stream.into_parts`, and `TcpClient::create_response_stream` contend concurrently. Keep the same `STREAMS` coverage and assertions, but move the per-stream registration/body into the task so the registry path is stressed as intended.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@lib/runtime/src/pipeline/network/tcp/server.rs`:
- Around line 472-485: The cleanup logic in RegisteredStream’s with_cleanup
callback is being deferred with tokio::spawn, which is unnecessary because the
Drop path already runs synchronously and can panic outside a Tokio runtime.
Update the cleanup closures in the tcp server code to lock cleanup_state
directly and perform the registry removals inline, keeping the same
subject_instance and instance_subjects updates without spawning a task.
---
Nitpick comments:
In `@lib/runtime/src/pipeline/network/tcp/server.rs`:
- Around line 1972-2009: The test only exercises concurrent call-home because
`server.register(options)` is still awaited serially inside the loop. Update
`test_concurrent_response_registration_and_call_home` so each stream’s
registration and client setup run together in a spawned task, letting
`server.register`, `registered_stream.into_parts`, and
`TcpClient::create_response_stream` contend concurrently. Keep the same
`STREAMS` coverage and assertions, but move the per-stream registration/body
into the task so the registry path is stressed as intended.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 1b946797-6e02-4fbc-b53a-1def7ed0108d
📒 Files selected for processing (1)
lib/runtime/src/pipeline/network/tcp/server.rs
What changed
TcpStreamServer's shared registry state withparking_lot::Mutex.State, including request/response maps, instance associations, tombstones, and listener handle.This is deliberately a one-file change. It does not add telemetry or alter the public API, TCP wire format, routing, cancellation, tombstone, or cleanup behavior.
Why
The response registry's Tokio mutex serializes frontend registration with worker call-home lookup. An instrumented v1.1.1 experiment isolated that wait as nearly the entire request-plane
pre_sendsetup time at N=8/C=1024.The parking-lot prototype retained one
Stateand produced the same registry-latency result as the earlier DashMap approach:pre_sendA fresh repeat reproduced those timing changes. All four benchmark cells completed 50,000 requests with zero failures or 503s and exact OSL 32.
The experiment did not verify an end-to-end throughput gain. Parking-lot was -0.77% and -1.37% in the two paired comparisons (-1.06% by the two-run mean), while the identical Tokio baselines varied by 4.52%. This PR should therefore be evaluated as removing measured registry setup latency with a smaller state-preserving design, not as a demonstrated throughput optimization.
This is a narrower alternative to #11008 and avoids splitting the registry across DashMaps.
Validation
cargo fmt --allcargo test -p dynamo-runtime pipeline::network::tcp::server::tests -- --nocapture(30 passed)cargo check -p dynamo-runtime -p dynamo-llmcargo clippy -p dynamo-runtime --all-targets -- -D warningsgit diff --checkSummary by CodeRabbit