Skip to content

perf(runtime): use parking_lot for TCP stream registry#11065

Merged
jthomson04 merged 1 commit into
mainfrom
jthomson04/parking-lot-response-registry
Jun 30, 2026
Merged

perf(runtime): use parking_lot for TCP stream registry#11065
jthomson04 merged 1 commit into
mainfrom
jthomson04/parking-lot-response-registry

Conversation

@jthomson04

@jthomson04 jthomson04 commented Jun 29, 2026

Copy link
Copy Markdown
Contributor

What changed

  • Replace the Tokio mutex around TcpStreamServer's shared registry state with parking_lot::Mutex.
  • Keep the existing single State, including request/response maps, instance associations, tombstones, and listener handle.
  • Bound registration and call-home map operations in private synchronous helpers so an async caller cannot await while holding the guard.
  • Prepare stream metadata and cleanup state outside the critical section, and format unknown-subject errors after releasing it.
  • Add a 128-stream concurrent registration/call-home test with unique payload validation and a timeout.

This is deliberately a one-file change. It does not add telemetry or alter the public API, TCP wire format, routing, cancellation, tombstone, or cleanup behavior.

Why

The response registry's Tokio mutex serializes frontend registration with worker call-home lookup. An instrumented v1.1.1 experiment isolated that wait as nearly the entire request-plane pre_send setup time at N=8/C=1024.

The parking-lot prototype retained one State and produced the same registry-latency result as the earlier DashMap approach:

Timing Tokio avg/p99 Parking-lot avg/p99
request-plane pre_send 14.24/68.42 ms 0.05/0.40 ms
response registration 14.208/68.264 ms 0.011/0.049 ms
registry acquisition 14.195/68.183 ms 0.00019/0.010 ms
call-home lookup acquisition 13.736/49.679 ms 0.00020/0.010 ms

A fresh repeat reproduced those timing changes. All four benchmark cells completed 50,000 requests with zero failures or 503s and exact OSL 32.

The experiment did not verify an end-to-end throughput gain. Parking-lot was -0.77% and -1.37% in the two paired comparisons (-1.06% by the two-run mean), while the identical Tokio baselines varied by 4.52%. This PR should therefore be evaluated as removing measured registry setup latency with a smaller state-preserving design, not as a demonstrated throughput optimization.

This is a narrower alternative to #11008 and avoids splitting the registry across DashMaps.

Validation

  • cargo fmt --all
  • cargo test -p dynamo-runtime pipeline::network::tcp::server::tests -- --nocapture (30 passed)
  • cargo check -p dynamo-runtime -p dynamo-llm
  • cargo clippy -p dynamo-runtime --all-targets -- -D warnings
  • git diff --check

Summary by CodeRabbit

  • Bug Fixes
    • Improved reliability when handling concurrent TCP stream activity, reducing the chance of connection or stream registration issues under load.
    • Stream cleanup and cancellation now behave more consistently, helping prevent stale or missed stream state.
  • Tests
    • Added broader concurrency coverage to validate TCP stream handling in parallel scenarios.

Signed-off-by: jthomson04 <jwillthomson19@gmail.com>
@github-actions github-actions Bot added the perf label Jun 29, 2026
@jthomson04 jthomson04 marked this pull request as ready for review June 29, 2026 22:35
@jthomson04 jthomson04 requested a review from a team as a code owner June 29, 2026 22:35

@devin-ai-integration devin-ai-integration Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no bugs or issues to report.

Open in Devin Review

@coderabbitai

coderabbitai Bot commented Jun 29, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

Walkthrough

TcpStreamServer switches its internal state lock from tokio::sync::Mutex (async) to parking_lot::Mutex (sync). New insert_request_stream/insert_response_stream and take_request_stream/take_response_stream helpers centralize pending stream bookkeeping. All call sites, RAII cleanup closures, and tests are updated. A new concurrent integration test is added.

Changes

Sync Mutex Migration and Stream Bookkeeping

Layer / File(s) Summary
Import wiring and stream management methods
lib/runtime/src/pipeline/network/tcp/server.rs
Replaces tokio::sync::Mutex with parking_lot::Mutex and updates associate_instance, cancel_recv_stream, cancel_send_stream, cancel_instance_streams, clear_instance_tombstone, and start to use synchronous lock acquisition; introduces insert_request_stream, insert_response_stream, take_request_stream, and take_response_stream helpers on State.
register() and process_*_stream updates
lib/runtime/src/pipeline/network/tcp/server.rs
register() uses the new insertion helpers for tx/rx subjects and rewrites RAII cleanup closures to acquire the lock synchronously; process_request_stream and process_response_stream replace inline async-locked mutations with calls to take_request_stream/take_response_stream.
Test updates and new concurrency test
lib/runtime/src/pipeline/network/tcp/server.rs
All existing tests replace .lock().await with .lock() for state inspection; a new multi-thread tokio test concurrently registers many response streams, dials them via TcpClient, sends prologue and payloads, and asserts correct delivery inside a timeout guard.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Description check ⚠️ Warning The PR description is detailed, but it does not follow the required template sections and omits the mandatory Related Issues entry. Add the missing template sections: Overview, Details, Where should the reviewer start?, and the required Related Issues block with an issue link or confirmation of none.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly summarizes the main change: switching the TCP stream registry mutex to parking_lot for runtime performance.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

Comment @coderabbitai help to get the list of available commands.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
lib/runtime/src/pipeline/network/tcp/server.rs (1)

472-485: 🩺 Stability & Availability | 🟠 Major | ⚡ Quick win

Run both cleanup closures inlineRegisteredStream already executes the cleanup closure synchronously on Drop, so tokio::spawn only defers registry removal and makes dropping outside a Tokio runtime panic. Lock cleanup_state directly in both closures.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/runtime/src/pipeline/network/tcp/server.rs` around lines 472 - 485, The
cleanup logic in RegisteredStream’s with_cleanup callback is being deferred with
tokio::spawn, which is unnecessary because the Drop path already runs
synchronously and can panic outside a Tokio runtime. Update the cleanup closures
in the tcp server code to lock cleanup_state directly and perform the registry
removals inline, keeping the same subject_instance and instance_subjects updates
without spawning a task.
🧹 Nitpick comments (1)
lib/runtime/src/pipeline/network/tcp/server.rs (1)

1972-2009: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Make the registration half concurrent too.

This test currently awaits server.register(options) inside the loop before spawning each client task, so it stresses concurrent call-home but not concurrent registration. Spawn the per-stream registration + call-home body together to cover the registry contention this PR is targeting.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/runtime/src/pipeline/network/tcp/server.rs` around lines 1972 - 2009, The
test only exercises concurrent call-home because `server.register(options)` is
still awaited serially inside the loop. Update
`test_concurrent_response_registration_and_call_home` so each stream’s
registration and client setup run together in a spawned task, letting
`server.register`, `registered_stream.into_parts`, and
`TcpClient::create_response_stream` contend concurrently. Keep the same
`STREAMS` coverage and assertions, but move the per-stream registration/body
into the task so the registry path is stressed as intended.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@lib/runtime/src/pipeline/network/tcp/server.rs`:
- Around line 472-485: The cleanup logic in RegisteredStream’s with_cleanup
callback is being deferred with tokio::spawn, which is unnecessary because the
Drop path already runs synchronously and can panic outside a Tokio runtime.
Update the cleanup closures in the tcp server code to lock cleanup_state
directly and perform the registry removals inline, keeping the same
subject_instance and instance_subjects updates without spawning a task.

---

Nitpick comments:
In `@lib/runtime/src/pipeline/network/tcp/server.rs`:
- Around line 1972-2009: The test only exercises concurrent call-home because
`server.register(options)` is still awaited serially inside the loop. Update
`test_concurrent_response_registration_and_call_home` so each stream’s
registration and client setup run together in a spawned task, letting
`server.register`, `registered_stream.into_parts`, and
`TcpClient::create_response_stream` contend concurrently. Keep the same
`STREAMS` coverage and assertions, but move the per-stream registration/body
into the task so the registry path is stressed as intended.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 1b946797-6e02-4fbc-b53a-1def7ed0108d

📥 Commits

Reviewing files that changed from the base of the PR and between 0adf974 and bc9c536.

📒 Files selected for processing (1)
  • lib/runtime/src/pipeline/network/tcp/server.rs

@jthomson04 jthomson04 merged commit 97b8acf into main Jun 30, 2026
102 checks passed
@jthomson04 jthomson04 deleted the jthomson04/parking-lot-response-registry branch June 30, 2026 15:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants