
[Serve][2/N] Implement AcceleratorConfig to enable custom scheduling logic for accelerators with Serve deployments#63179

Open
ryanaoleary wants to merge 32 commits into
ray-project:masterfrom
ryanaoleary:e-serve-accelerator-config

Conversation

@ryanaoleary
Contributor

@ryanaoleary ryanaoleary commented May 7, 2026

Description

This PR introduces a structured AcceleratorConfig (starting with TPUAcceleratorConfig) for Ray Serve deployments to support advanced accelerator provisioning. Deployments with accelerator_config set use a per-replica PG creation path that dispatches to slice_placement_group for TPU. Gang scheduling is bypassed for these deployments - SlicePlacementGroup is itself a gang-scheduling primitive, so layering Gang PG on top would solve the same problem twice.

Specific Changes:

  1. API & Configuration
  • Added AcceleratorConfig and TPUAcceleratorConfig Pydantic models defining hardware requirements (topology, version, chips per VM).
  • Added bytes accelerator_config to serve.proto and threaded through ReplicaSchedulingRequest and CreatePlacementGroupRequest.
  2. Per-Replica PG Creation
  • Added ReplicaPlacementGroup wrapper delegating shutdown() and release_head_pgs() to the underlying TPU-specific PG.
  • Added _create_replica_placement_group as the internal scheduler entry point; dispatches on accelerator_config and wraps the result. _default_create_placement_group's public signature is unchanged, so external create_placement_group_fn_override users keep working.
  • Deployment-state cleanup calls ReplicaPlacementGroup.shutdown() on teardown and release_reservation_holders() after worker PG readiness.
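
The dispatch described in the list above can be sketched roughly as follows. This is a minimal, dependency-free illustration and not the PR's code: the PR uses Pydantic models, whereas this sketch uses dataclasses, and the field names (`topology`, `version`, `chips_per_vm`), the `FakeSlicePG` stand-in, and the body of `create_replica_placement_group` are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AcceleratorConfig:
    """Base class for accelerator-specific scheduling requirements (hypothetical)."""


@dataclass(frozen=True)
class TPUAcceleratorConfig(AcceleratorConfig):
    # Field names are assumed from the PR description
    # (topology, version, chips per VM); the real model may differ.
    topology: str
    version: str
    chips_per_vm: int


class FakeSlicePG:
    """Stand-in for a TPU slice placement group; it only records shutdown."""

    def __init__(self, config):
        self.config = config
        self.is_shut_down = False

    def shutdown(self):
        self.is_shut_down = True


class ReplicaPlacementGroup:
    """Wrapper that delegates shutdown to the underlying PG when it supports it."""

    def __init__(self, pg):
        self._pg = pg

    def shutdown(self):
        shutdown = getattr(self._pg, "shutdown", None)
        if callable(shutdown):
            shutdown()


def create_replica_placement_group(accelerator_config, default_create_fn):
    """Dispatch on the config type; otherwise fall back to the default path."""
    if isinstance(accelerator_config, TPUAcceleratorConfig):
        return ReplicaPlacementGroup(FakeSlicePG(accelerator_config))
    return ReplicaPlacementGroup(default_create_fn())
```

Keeping the dispatch inside a single internal entry point is what lets the default creation function keep its existing signature: callers that supply their own override never see the accelerator branch.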

Edit: I scoped this PR down to exclude the unrelated Gang PG changes, which can go in a separate PR.

Related issues

#57137


@ryanaoleary ryanaoleary requested review from a team as code owners May 7, 2026 02:53
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces a structured AcceleratorConfig for Ray Serve to support TPU slice reservations. It implements a new placement group management layer (_ReplicaPlacementGroup) that handles accelerator-specific lifecycle tasks, such as releasing head placement groups after scheduling. The changes span the Serve controller, deployment scheduler, and LLM engine configurations to enable per-host TPU bundle allocation. Review feedback highlights a critical runtime error where an invalid label_selector is passed to actor options, identifies missing logic for passing user-defined bundle label selectors, and notes a documentation mismatch in the TPU utility classes.

Comment thread python/ray/llm/_internal/serve/core/configs/accelerators.py
Comment thread python/ray/serve/_private/default_impl.py Outdated
Comment thread python/ray/serve/_private/default_impl.py Outdated
Comment thread python/ray/util/tpu.py Outdated
Comment thread python/ray/llm/_internal/serve/core/configs/accelerators.py Outdated
@ryanaoleary ryanaoleary force-pushed the e-serve-accelerator-config branch from 6885e61 to aede10e Compare May 7, 2026 02:56
Comment thread python/ray/serve/_private/deployment_scheduler.py Outdated
Comment thread src/ray/protobuf/serve.proto
Comment thread python/ray/serve/tests/unit/test_accelerator_config.py Outdated
@ryanaoleary ryanaoleary force-pushed the e-serve-accelerator-config branch from aede10e to e737494 Compare May 7, 2026 03:29
Comment thread python/ray/serve/_private/deployment_scheduler.py Outdated
@ryanaoleary ryanaoleary force-pushed the e-serve-accelerator-config branch from e737494 to c3cdf41 Compare May 7, 2026 03:36
Comment thread python/ray/serve/_private/default_impl.py Outdated
@ray-gardener ray-gardener Bot added serve Ray Serve Related Issue community-contribution Contributed by the community labels May 7, 2026
@ryanaoleary ryanaoleary force-pushed the e-serve-accelerator-config branch from c3cdf41 to 0f0088a Compare May 7, 2026 08:07
Comment thread python/ray/serve/_private/deployment_scheduler.py Outdated
Comment thread python/ray/serve/_private/deployment_scheduler.py Outdated
Comment thread python/ray/serve/_private/deployment_scheduler.py
Comment thread python/ray/serve/_private/config.py Outdated
Comment thread python/ray/serve/_private/deployment_state.py Outdated
@abrarsheikh
Contributor

Please break up the PR, at least into the Serve parts first and then the LLM changes; that would make it easier to review.

@ryanaoleary ryanaoleary force-pushed the e-serve-accelerator-config branch from d8ff7d5 to 9d52483 Compare May 7, 2026 23:16
Comment thread python/ray/serve/_private/deployment_scheduler.py
Comment thread python/ray/serve/_private/default_impl.py Outdated
@ryanaoleary ryanaoleary changed the title [Serve] Implement AcceleratorConfig to enable custom scheduling logic for accelerators with Serve deployments [Serve][2/N] Implement AcceleratorConfig to enable custom scheduling logic for accelerators with Serve deployments May 7, 2026
… tests

Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>
…t `bundle_label_selector`

Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>
@ryanaoleary ryanaoleary force-pushed the e-serve-accelerator-config branch from 9d52483 to 5e79faa Compare May 8, 2026 02:54
Comment thread python/ray/serve/_private/common.py Outdated
Comment thread python/ray/serve/_private/deployment_scheduler.py Outdated
@ryanaoleary
Contributor Author

ryanaoleary commented May 8, 2026

Please break up the PR, at least into the Serve parts first and then the LLM changes; that would make it easier to review.

Sounds good. I'm going to make this PR the Serve changes (for now it also includes changes from #63171 so that tests work, but this should merge first). #63216 will include the changes from this PR, so that integration tests work, plus the LLM-specific changes.

@ryanaoleary ryanaoleary changed the title [Serve][2/N] Implement AcceleratorConfig to enable custom scheduling logic for accelerators with Serve deployments [Serve][1/N] Implement AcceleratorConfig to enable custom scheduling logic for accelerators with Serve deployments May 8, 2026
@ryanaoleary ryanaoleary force-pushed the e-serve-accelerator-config branch from 5e79faa to c547e2a Compare May 8, 2026 11:55
Comment thread python/ray/serve/_private/deployment_state.py Outdated
Comment thread python/ray/serve/_private/default_impl.py Outdated
@ryanaoleary
Contributor Author

Will fix merge conflicts once #63177 is merged.

@ryanaoleary
Contributor Author

The tpu.py change in this PR is from a separate PR that is already approved and just needs to be merged; this PR won't include that change.

Contributor

@jeffreywang-anyscale jeffreywang-anyscale left a comment


Took a first pass -- could we break the PR down into the following?

  • Config / frontend only: ensure that AcceleratorConfig is plumbed through @serve.deployment, Deployment.options, DeploymentSchema (declarative YAMLs), and the protobuf surfaces. I think we're missing some plumbing in this PR.
  • Scheduler and state reconciliation

Tests haven't been reviewed.

Comment thread python/ray/serve/_private/common.py Outdated
Comment thread python/ray/serve/api.py
Comment thread python/ray/serve/api.py Outdated
Comment thread python/ray/serve/config.py Outdated
Comment thread python/ray/serve/_private/config.py
Comment thread python/ray/serve/api.py Outdated
Comment thread python/ray/serve/_private/default_impl.py Outdated
Comment thread python/ray/serve/_private/common.py Outdated
Comment thread python/ray/serve/_private/default_impl.py Outdated
Comment thread python/ray/serve/_private/deployment_state.py Outdated
@jeffreywang-anyscale
Contributor

Some other questions:

Lifecycle of head_pg and worker_pg

  • head_pg is claimed first to retrieve the slice name.
  • A worker_pg with num_hosts bundles is created using that slice name to claim the entire slice.
  • Once worker_pg has been created, the head_pg is released.

Question

  • At that point, is it possible for another replica to successfully claim head_pg, retrieve the same slice name, and then attempt to create its own worker_pg, only to discover that the slice is still occupied by the existing worker_pg?
  • What does the controller do in this case?
  • If the worker_pg's 0th bundle also asks for head_pg's bundle, could we avoid this race?
  • Taking a step back, can we tolerate this race or do we want to avoid it?
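
The race in question can be illustrated with a toy model. This is only a sketch under stated assumptions: `ToyCluster` and `simulate_race` are hypothetical names, a single slice is modeled as one head resource plus a host counter, and nothing here calls Ray APIs.

```python
class ToyCluster:
    """Toy model of one TPU slice: a head resource plus a pool of worker hosts."""

    def __init__(self, slice_name, num_hosts):
        self.slice_name = slice_name
        self.head_free = True
        self.free_hosts = num_hosts

    def claim_head(self):
        """Return the slice name if the head resource is free, else None."""
        if not self.head_free:
            return None
        self.head_free = False
        return self.slice_name

    def release_head(self):
        self.head_free = True

    def claim_workers(self, num_hosts):
        """Return True if the worker PG can be placed; False means it would pend."""
        if self.free_hosts < num_hosts:
            return False
        self.free_hosts -= num_hosts
        return True


def simulate_race():
    cluster = ToyCluster("slice-0", num_hosts=4)

    # Replica A: claim the head, claim the whole slice, then release the head early.
    a_slice = cluster.claim_head()
    a_ready = cluster.claim_workers(4)
    cluster.release_head()

    # Replica B: the head looks available again, so B resolves the same slice
    # name, but its worker PG cannot be placed while A still occupies the hosts.
    b_slice = cluster.claim_head()
    b_ready = cluster.claim_workers(4)
    return a_slice, a_ready, b_slice, b_ready
```

In this model, replica B obtains the same slice name as replica A and its worker claim then fails, which corresponds to a worker PG that never becomes ready.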

ryanaoleary and others added 2 commits May 21, 2026 00:08
Co-authored-by: Jeffrey Wang <jeffreywang@anyscale.com>
Signed-off-by: Ryan O'Leary <113500783+ryanaoleary@users.noreply.github.com>
Signed-off-by: Ryan O'Leary <113500783+ryanaoleary@users.noreply.github.com>
Comment thread python/ray/serve/_private/default_impl.py Outdated
Comment thread python/ray/serve/_private/config.py
…strings, change from Dev API to PublicAPI, and fix other comments

Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>
Comment thread python/ray/serve/_private/common.py Outdated
Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>
Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>
Comment thread python/ray/serve/_private/deployment_scheduler.py
@ryanaoleary
Contributor Author

ryanaoleary commented May 21, 2026

Some other questions:

Lifecycle of head_pg and worker_pg

  • head_pg is claimed first to retrieve the slice name.
  • A worker_pg with num_hosts bundles is created using that slice name to claim the entire slice.
  • Once worker_pg has been created, the head_pg is released.

Question

  • At that point, is it possible for another replica to successfully claim head_pg, retrieve the same slice name, and then attempt to create its own worker_pg, only to discover that the slice is still occupied by the existing worker_pg?

Yeah, that's possible and is a valid issue. The current behavior would allow a slice PG call to discover a seemingly available TPU head, attempt to reserve it, and leave the worker PG hanging indefinitely.

  • What does the controller do in this case?

The controller would just time out waiting for the PG to become ready.

  • If the worker_pg's 0th bundle also asks for head_pg's bundle, could we avoid this race?

This wouldn't work because the two PGs would be in contention for the same resource; if we release the head_pg first, we risk a race with another slice claiming it.

  • Taking a step back, can we tolerate this race or do we want to avoid it?

Yeah, I think we should fix this. The simplest solution is to go with what SlicePlacementGroup currently does by default: continue to hold the head_pg until shutdown is called, and then release both PGs at the same time. I'll refactor the logic so that we aren't releasing the head_pg early. Since both PGs are managed by the ReplicaPlacementGroup abstraction, this shouldn't be an issue or add complexity to the code.

Should be fixed with 26ae3e2
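
A rough sketch of the hold-until-shutdown approach described above is shown here. It is not the code from the referenced commit: `ToyPG` and `HeldSliceReservation` are hypothetical names, and releasing the worker PG before the head PG inside `shutdown()` is an assumption of this sketch (the comment only says both are released together).

```python
class ToyPG:
    """Stand-in for a placement group handle; it only records removal."""

    def __init__(self, name):
        self.name = name
        self.released = False

    def remove(self):
        self.released = True


class HeldSliceReservation:
    """Holds the head and worker PGs together; nothing is released before shutdown()."""

    def __init__(self, head_pg, worker_pg):
        self._head_pg = head_pg
        self._worker_pg = worker_pg
        self._shut_down = False

    def shutdown(self):
        # Idempotent: repeated teardown calls are harmless.
        if self._shut_down:
            return
        self._shut_down = True
        # Assumed ordering: release workers first so the head is never free
        # while the slice's hosts are still occupied.
        self._worker_pg.remove()
        self._head_pg.remove()
```

Because the head PG stays claimed for the whole lifetime of the reservation, a second replica cannot resolve the same slice name until the first one has been torn down.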

… change default to SPREAD strategy

Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>

@cursor cursor Bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.


Reviewed by Cursor Bugbot for commit 26ae3e2.

Comment thread python/ray/serve/_private/placement_group_utils.py Outdated
Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>
@ryanaoleary
Contributor Author

Addressed the outstanding design-related comments and bugs; will work on splitting this PR into two smaller ones.

Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>
@ryanaoleary
Contributor Author

Opened up #63581 to address #63179 (review), adding just the AcceleratorConfig, related classes, and the plumbing through the Serve proto and deployment, etc. That should be merged first and then this PR is just the scheduler change. cc: @jeffreywang-anyscale

@ryanaoleary ryanaoleary changed the title [Serve][1/N] Implement AcceleratorConfig to enable custom scheduling logic for accelerators with Serve deployments [Serve][2/N] Implement AcceleratorConfig to enable custom scheduling logic for accelerators with Serve deployments May 21, 2026
@jeffreywang-anyscale
Contributor

Opened up #63581 to address #63179 (review), adding just the AcceleratorConfig, related classes, and the plumbing through the Serve proto and deployment, etc. That should be merged first and then this PR is just the scheduler change. cc: @jeffreywang-anyscale

Nice thank you! Taking a look now.

Signed-off-by: Ryan O'Leary <113500783+ryanaoleary@users.noreply.github.com>

Labels

community-contribution Contributed by the community serve Ray Serve Related Issue


3 participants