[serve][3/3] Expose choose_replica/dispatch on deployment handles#63255
Conversation
There was a problem hiding this comment.
Code Review
This pull request introduces a two-step routing mechanism in Ray Serve, adding choose_replica and dispatch methods to the Router interface and DeploymentHandle. This allows for explicit replica selection and slot reservation prior to request execution, supporting both standard and streaming workflows. The implementation includes logic to bridge these calls across event loops in SingletonThreadRouter and provides validation to ensure selections are dispatched to the correct deployment. Feedback identifies a critical resource leak vulnerability in SingletonThreadRouter.choose_replica where cancellation during context entry could result in leaked request slots. It is recommended to wrap the context entry in a try...finally block to ensure proper cleanup even if the task is cancelled early.
82037f1 to
f1a3114
Compare
e4b95cb to
905f7df
Compare
choose_replica/dispatch on deployment handleschoose_replica/dispatch on deployment handles
905f7df to
14b4d44
Compare
8db5cab to
6afface
Compare
873ecc4 to
874103a
Compare
| self._asyncio_router._dispatch_to_marked_selection( | ||
| selection, request_meta, *request_args, **request_kwargs | ||
| ) | ||
| ) |
There was a problem hiding this comment.
Race causes double-fire of on_request_completed callback
Medium Severity
CurrentLoopRouter.dispatch and SingletonThreadRouter.dispatch schedule _dispatch_to_marked_selection as a task/future that hasn't executed yet when they return. If the choose_replica context exits before that task completes, the AsyncioRouter.choose_replica finally block sees _dispatched=True but _completion_callback_registered=False, and fires on_request_completed. Later, when the dispatch task actually runs and registers its done-callback, on_request_completed fires a second time upon request completion — violating the "exactly once" invariant stated in the comment.
Additional Locations (2)
Reviewed by Cursor Bugbot for commit 874103a. Configure here.
Co-Authored-By: machichima <nary12321@gmail.com> Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>
Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>
Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>
874103a to
e95f4a6
Compare
kouroshHakha
left a comment
There was a problem hiding this comment.
Overall the design is sound — the slot-reservation contract, the _mark_dispatched synchronous flag to prevent the choose_replica finally from racing the dispatch task, and the _completion_callback_registered exactly-once invariant are all well-thought-out. The CurrentLoopRouter wrapper is clean. One critical resource safety issue in the SingletonThreadRouter bridge needs fixing before merge; everything else is minor.
The open Gemini bot comment (on SingletonThreadRouter.choose_replica) is still unaddressed — I've investigated it and believe it's valid; details in the inline comment.
Note
This review was co-written with AI assistance (Claude Code).
Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>
…text() completes Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>
Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>
Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
There are 2 total unresolved issues (including 1 from previous review).
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, have a team admin enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit 3a4056c. Configure here.
Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>
|
Before we merge, I'd like to understand the overhead of using The interesting case is Concretely, could you add a async with self._handle.choose_replica() as sel:
await self._handle.dispatch(sel)and run both modes ( |
|
Going to follow up nightly benchmarks with #63293. |
kouroshHakha
left a comment
There was a problem hiding this comment.
ok so we cannot just replace .remote() with choose + dispatch.
…ray-project#63255) Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com> Co-authored-by: machichima <nary12321@gmail.com>
…ray-project#63255) Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com> Co-authored-by: machichima <nary12321@gmail.com> Signed-off-by: anindyam1969 <amukherjee@kinetica.com>
…ray-project#63255) Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com> Co-authored-by: machichima <nary12321@gmail.com>
…ray-project#63255) Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com> Co-authored-by: machichima <nary12321@gmail.com> Signed-off-by: Alexandr Plashchinsky <alexandr.plashchinsky@alexandrplashchinsky-H765G66H9V.local>
) ## Description Following up on #63255 (comment), we'd like to show the delta between `choose_replica + dispatch` and `remote` in our DB dashboards. **Release test results** -- latencies in ms (https://buildkite.com/ray-project/release/builds/92517/canvas?sid=019e1a3c-642f-4a42-867b-2818577272b6&tab=output) | Percentile | `remote` | `choose_replica + dispatch` | Delta | |---|---|---|---| | p50 | 1.040 | 1.784 | +0.74 | | p90 | 1.149 | 1.922 | +0.77 | | p95 | 1.199 | 1.970 | +0.77 | | p99 | 1.437 | 2.137 | +0.70 | ## Related issues > Link related issues: "Fixes #1234", "Closes #1234", or "Related to #1234". ## Additional information > Optional: Add implementation details, API changes, usage examples, screenshots, etc. Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>


Description
Plumb
choose_replicaanddispatchthroughDeploymentHandle, not just the underlyingAsyncioRouterwhich #63254 ends at.Deployment handle
DeploymentHandle.choose_replica()/dispatch(selection, ...)validates that the selection belongs to this deployment.SingletonThreadRouter / CurrentLoopRouter
SingletonThreadRouter.choose_replica: bridges the async context manager onto the router thread viarun_coroutine_threadsafe, so all state mutations happen on the singleton router loop.SingletonThreadRouter.dispatchandCurrentLoopRouter.dispatch: both callselection._mark_dispatched()synchronously before scheduling the dispatch coroutine, so thechoose_replica__aexit__cleanup can't race with the dispatch task. (The race would otherwise double-decrement the queue-length cache onCurrentLoopRouter, where__aexit__runs synchronously without yielding.)CurrentLoopRouter.choose_replica: thin async-with passthrough since it shares an event loop with the caller.Test-only router
LocalRouterplaceholder inlocal_testing_mode:choose_replica/dispatchraiseNotImplementedErrorbecause the local-testing path invokes user callables directly and doesn't exercise router selection.Benchmark
The overhead of
choose_replica() + dispatch()overremote()is constant. This is consistent with the one extrarun_coroutine_threadsaferound-trip:remotedoes one thread crossing,choose_dispatchdoes two (one for the__aenter__bridge, one for the_dispatch_to_marked_selectiontask). Replica count doesn't change the delta, as the overhead is in the bridge, not routing.Related issues
RFC: #59792
Original PR: #60865
Next PR: Metric changes #62356
Additional information