[Serve] decouple routing primitives #60865

Closed
machichima wants to merge 63 commits into ray-project:master from machichima:59792-serve-decouple-routing-primitives

@machichima machichima commented Feb 9, 2026

Description

Based on RFC #59792, this PR adds two new primitives to Ray Serve's DeploymentHandle API to better support the Parallel PD pattern for prefill-decode disaggregation: choose_replica() (an async context manager) and dispatch().

Main changes

Public API

python/ray/serve/handle.py

  • Implemented DeploymentHandle.choose_replica(): Returns an async context manager for replica selection
  • Implemented DeploymentHandle.dispatch(): Dispatches a request to the selected replica
    • Added deployment ID validation to prevent dispatching to the wrong deployment
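
The two-phase flow described above can be sketched with a toy handle. This is a minimal illustration that does not import Ray: ToyHandle, ToySelection, the reserved set, and the ValueError on a mismatched deployment are all hypothetical stand-ins, and the real signatures in this PR may differ.

```python
import asyncio
from contextlib import asynccontextmanager
from dataclasses import dataclass


@dataclass
class ToySelection:
    """Hypothetical stand-in for the object yielded by choose_replica()."""
    deployment_id: str
    replica_id: str


class ToyHandle:
    """Toy model of a handle; NOT the Ray Serve DeploymentHandle."""

    def __init__(self, deployment_id, replica_ids):
        self.deployment_id = deployment_id
        self._replica_ids = list(replica_ids)
        self.reserved = set()  # replica IDs with an outstanding reservation

    @asynccontextmanager
    async def choose_replica(self):
        # Phase 1: pick the first replica without a reservation and hold it.
        free = [r for r in self._replica_ids if r not in self.reserved]
        if not free:
            raise RuntimeError("no free replica")
        replica_id = free[0]
        self.reserved.add(replica_id)
        try:
            yield ToySelection(self.deployment_id, replica_id)
        finally:
            # Leaving the context always gives the reservation back.
            self.reserved.discard(replica_id)

    async def dispatch(self, selection, payload):
        # Phase 2: refuse selections made through another deployment's handle.
        if selection.deployment_id != self.deployment_id:
            raise ValueError("selection belongs to a different deployment")
        await asyncio.sleep(0)  # stands in for the remote call
        return f"{selection.replica_id} handled {payload}"


async def main():
    prefill = ToyHandle("prefill", ["p0", "p1"])
    decode = ToyHandle("decode", ["d0"])
    async with prefill.choose_replica() as sel:
        held = set(prefill.reserved)
        result = await prefill.dispatch(sel, "prompt")
        try:
            await decode.dispatch(sel, "prompt")
            rejected = False
        except ValueError:
            rejected = True
    return held, result, rejected, set(prefill.reserved)


held, result, rejected, after = asyncio.run(main())
```

In this sketch the reservation exists only while the async with block is open, which is the property the context-manager form is meant to give callers.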

python/ray/serve/exceptions.py

  • Added ReplicaUnavailableError: Raised when the selected replica is no longer available

Core Router Logic

python/ray/serve/_private/router.py

  • Added AsyncioRouter.choose_replica(): Context manager that selects a replica and reserves a slot on it
  • Added AsyncioRouter.dispatch(): Sends a request to a previously selected replica
  • Added choose_replica() and dispatch() to SingletonThreadRouter and CurrentLoopRouter

python/ray/serve/_private/request_router/request_router.py

  • Added RequestRouter.on_replica_result_finished(): Decrements the queue length cache when a reserved slot is released
  • Updated replica selection logic to account for reserved slots when checking availability

python/ray/serve/_private/request_router/replica_wrapper.py

  • Added slot reservation tracking: a _reserved_slots set tracks active reservations
  • Added reserve_slot(), which reserves a slot and returns a unique token, and release_slot(), which releases a reserved slot
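
Token-based reservation of this kind can be sketched as follows. The names reserve_slot(), release_slot(), and _reserved_slots come from the description above; the ToyReplica class, the has_capacity() helper, the uuid tokens, and the return values are assumptions made for illustration only.

```python
import uuid


class ToyReplica:
    """Hypothetical replica wrapper; not the class changed in this PR."""

    def __init__(self, max_ongoing_requests):
        self.max_ongoing_requests = max_ongoing_requests
        self.num_ongoing = 0
        self._reserved_slots = set()  # tokens of active reservations

    def has_capacity(self):
        # Reserved slots count against capacity just like in-flight requests.
        used = self.num_ongoing + len(self._reserved_slots)
        return used < self.max_ongoing_requests

    def reserve_slot(self):
        # Returns a unique token, or None when the replica is full.
        if not self.has_capacity():
            return None
        token = uuid.uuid4().hex
        self._reserved_slots.add(token)
        return token

    def release_slot(self, token):
        # Idempotent: True only if the token was still reserved.
        if token in self._reserved_slots:
            self._reserved_slots.remove(token)
            return True
        return False


replica = ToyReplica(max_ongoing_requests=2)
first = replica.reserve_slot()
second = replica.reserve_slot()
third = replica.reserve_slot()            # full: both slots are reserved
released = replica.release_slot(first)
released_again = replica.release_slot(first)  # second release is a no-op
fourth = replica.reserve_slot()           # capacity is available again
```

Making release idempotent means a caller that is unsure whether a token was already consumed can release it without corrupting the count.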

Tests

python/ray/serve/tests/test_handle_1.py (integration tests based on the examples shown in #59792)

  • Added test_choose_replica_and_dispatch_single(): Tests basic single selection pattern
  • Added test_choose_replica_and_dispatch_parallel(): Tests parallel selection pattern (PD proxy use case) using AsyncExitStack
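
The parallel selection pattern with AsyncExitStack can be sketched roughly like this. The choose_replica() and dispatch() functions below are module-level toys written for this example, not the handle methods from the PR, and the replica names are made up.

```python
import asyncio
from contextlib import AsyncExitStack, asynccontextmanager

events = []  # records the order of reserve/release for inspection


@asynccontextmanager
async def choose_replica(name):
    # Hypothetical stand-in for handle.choose_replica().
    events.append(f"reserve {name}")
    try:
        yield name
    finally:
        events.append(f"release {name}")


async def dispatch(selection, payload):
    # Hypothetical stand-in for handle.dispatch().
    await asyncio.sleep(0)
    return f"{selection}:{payload}"


async def pd_request(prompt):
    async with AsyncExitStack() as stack:
        # Reserve both replicas before sending anything.
        prefill = await stack.enter_async_context(choose_replica("prefill-0"))
        decode = await stack.enter_async_context(choose_replica("decode-0"))
        # Dispatch to both selections concurrently.
        return await asyncio.gather(
            dispatch(prefill, prompt), dispatch(decode, prompt)
        )


results = asyncio.run(pd_request("hi"))
```

AsyncExitStack unwinds in reverse order of entry, so the decode reservation is released before the prefill one, and both are released even if one dispatch raises.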

python/ray/serve/tests/unit/test_router.py (unit test)

  • Added tests under TestChooseReplica that check the behavior of AsyncioRouter's choose_replica() and dispatch()

Related issues

Related to #59792

Signed-off-by: machichima <nary12321@gmail.com>

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces new primitives to decouple replica selection from request dispatching in Ray Serve's routing logic. It adds a new ReplicaSelection data class and choose_replica and dispatch methods to the AsyncioRouter. This change enables a two-phase process where a replica can be chosen and a slot reserved before the request is actually sent.

While this is a good step towards more flexible routing, I've found a few critical issues in the current implementation that will cause runtime errors. Specifically, there are calls to attributes and methods that don't exist (_deployment_handle on AsyncioRouter, and reserve_slot, release_slot, send_request_with_slot on RunningReplica). Additionally, there's a logic error in checking for replica availability that would cause all dispatches to fail. Please see my detailed comments for suggestions on how to fix these issues.

machichima and others added 15 commits February 14, 2026 20:30
@machichima machichima marked this pull request as ready for review February 17, 2026 12:28
@machichima machichima requested a review from a team as a code owner February 17, 2026 12:28
@ray-gardener ray-gardener Bot added the community-contribution Contributed by the community label Feb 17, 2026
eicherseiji added a commit to eicherseiji/ray that referenced this pull request Apr 21, 2026
This method is consumed by the ingress routing logic in this PR, not by
the base substrate PR. Renamed to private with TODO to migrate to
DeploymentHandle.choose_replica() once ray-project#60865 lands.

Signed-off-by: Seiji Eicher <seiji@anyscale.com>
Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>
@jeffreywang-anyscale

Going to break this PR into 2-3 PRs.

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Reviewed by Cursor Bugbot for commit c9509df.

    self.request_router.on_new_queue_len_info(
        replica.replica_id, queue_info
    )
    raise
Slot leak when CancelledError interrupts dispatch cleanup

Medium Severity

When _dispatch_to_marked_selection catches asyncio.CancelledError via except BaseException, the subsequent await selection._release_slot(force=True) can itself raise CancelledError (since the task is still cancelled). This prevents the slot from being released. The choose_replica finally block won't help because _dispatched is already True, causing _release_slot() without force to return None. The reserved semaphore slot on the replica leaks until the actor restarts.
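
One general asyncio mitigation for cleanup that must survive cancellation is to run the release in its own task and await it through asyncio.shield(). The sketch below is not the PR's code: release_slot(), dispatch(), and the cleanup_tasks set are hypothetical, and it models the case where a second cancel() arrives while the handler is awaiting the release.

```python
import asyncio

released = []          # tokens whose slot was given back
cleanup_tasks = set()  # strong references so cleanup tasks stay alive


async def release_slot(token):
    await asyncio.sleep(0.01)  # stands in for the remote release call
    released.append(token)


async def dispatch(token):
    try:
        await asyncio.sleep(10)  # stands in for the in-flight request
    except BaseException:
        cleanup = asyncio.create_task(release_slot(token))
        cleanup_tasks.add(cleanup)
        # shield() lets a later cancel() interrupt this await without
        # cancelling the cleanup task itself.
        await asyncio.shield(cleanup)
        raise


async def main():
    task = asyncio.create_task(dispatch("tok"))
    await asyncio.sleep(0)   # let dispatch() start its request
    task.cancel()            # first cancel: interrupts the request
    await asyncio.sleep(0)   # dispatch() is now awaiting the shielded cleanup
    task.cancel()            # second cancel: interrupts the await only
    try:
        await task
    except asyncio.CancelledError:
        pass
    await asyncio.gather(*cleanup_tasks)  # cleanup still runs to completion
    return task.cancelled()


was_cancelled = asyncio.run(main())
```

Without the shield, the second cancel() would propagate into release_slot() and the token would never be appended to released, which corresponds to the leak described in this comment.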


self._release_slot_if_still_reserved(selection)
)
)
)
Unnecessary remote call on every dispatch completion

Medium Severity

The _release_slot_if_still_reserved done callback fires on every completed dispatch, unconditionally calling _release_slot(force=True) which always issues a remote actor call to release_slot. In the normal happy path (the vast majority of requests), _start_request has already consumed the token, so this remote call is a guaranteed no-op that adds latency and actor mailbox pressure. For the PD disaggregation use case (latency-sensitive), this adds overhead proportional to request throughput.
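
One way to avoid the extra call is to track locally whether the token was consumed and release only when it was not. The sketch below assumes such a local signal exists; ToySelection, token_consumed, and remote_release_calls are hypothetical names, and the PR may not expose an equivalent flag.

```python
import asyncio


class ToySelection:
    """Hypothetical selection object; field names are illustrative."""

    def __init__(self):
        self.token_consumed = False    # set once the replica starts the request
        self.remote_release_calls = 0  # counts simulated remote calls

    async def release_slot(self):
        self.remote_release_calls += 1  # stands in for a remote actor call
        await asyncio.sleep(0)


async def dispatch(selection, fail_before_start):
    if fail_before_start:
        raise RuntimeError("request never reached the replica")
    selection.token_consumed = True  # the replica consumed the reservation
    return "ok"


async def dispatch_and_cleanup(selection, fail_before_start=False):
    try:
        return await dispatch(selection, fail_before_start)
    finally:
        # Only pay for the release call when the token is still outstanding.
        if not selection.token_consumed:
            await selection.release_slot()


async def main():
    happy, failed = ToySelection(), ToySelection()
    result = await dispatch_and_cleanup(happy)
    try:
        await dispatch_and_cleanup(failed, fail_before_start=True)
    except RuntimeError:
        pass
    return result, happy.remote_release_calls, failed.remote_release_calls


result, happy_calls, failed_calls = asyncio.run(main())
```

The happy path issues no release call in this sketch, while the failure path still returns the slot. The trade-off is that the local flag has to be kept reliably in sync with the replica's view of the token.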


Labels

  • community-contribution: Contributed by the community
  • go: add ONLY when ready to merge, run all tests
  • unstale: A PR that has been marked unstale. It will not get marked stale again if this label is on it.
