[LLM][3/N] Update Ray LLM to consume `TPUAcceleratorConfig` from Serve #63216

Open

ryanaoleary wants to merge 21 commits into ray-project:master from ryanaoleary:e-llm-accelerator-config


ryanaoleary commented May 8, 2026 (edited)

Description

This PR enables Ray LLM to use the new `TPUAcceleratorConfig` from Serve, implemented in #63179, replacing the deferred placement group (PG) logic previously used to schedule workers. This resolves a critical bug where deferred placement groups left `LLMServer` replicas scheduled without accelerator constraints.

Instead of deferring PG creation, we now pass the hardware topology directly to Serve. This ensures replicas are scheduled with the correct accelerator constraints and lets Serve own the placement group lifecycle, including cleaning up the temporary head placement groups once replicas are ready.
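To make "passing the hardware topology" concrete, here is a minimal sketch, not Ray's implementation, of how a topology string such as "4x4" could be expanded into one placement-group bundle per TPU host (one of this PR's commits changes the default TPU bundles to per-host). It assumes every host exposes the same number of chips (`chips_per_host`, defaulting to 4; the real value depends on the TPU generation) and that chips are requested through Ray's `"TPU"` resource. The helper names are hypothetical.

```python
from math import prod

# Resource name used to request TPU chips in Ray.
TPU_RESOURCE = "TPU"


def topology_to_num_chips(topology: str) -> int:
    """Multiply the dimensions of a topology string: "4x4" -> 16, "2x2x4" -> 16."""
    dims = [int(d) for d in topology.lower().split("x")]
    if any(d <= 0 for d in dims):
        raise ValueError(f"invalid topology: {topology!r}")
    return prod(dims)


def per_host_bundles(topology: str, chips_per_host: int = 4) -> list[dict[str, int]]:
    """Hypothetical helper: one bundle per TPU host, each reserving that host's chips.

    ASSUMPTION: every host in the slice exposes exactly `chips_per_host` chips.
    """
    num_chips = topology_to_num_chips(topology)
    if num_chips <= chips_per_host:
        # The whole slice fits on a single host: one bundle with all its chips.
        return [{TPU_RESOURCE: num_chips}]
    if num_chips % chips_per_host != 0:
        raise ValueError(
            f"{num_chips} chips cannot be split evenly across hosts "
            f"with {chips_per_host} chips each"
        )
    num_hosts = num_chips // chips_per_host
    return [{TPU_RESOURCE: chips_per_host} for _ in range(num_hosts)]
```

For example, `per_host_bundles("4x4")` returns four `{"TPU": 4}` bundles, one per host. Because a Ray bundle must be placed on a single node, expressing the request per host rather than as one 16-chip bundle lets each bundle be satisfied by one TPU VM.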

Key Changes:

- Updated `LLMServer.get_deployment_options` to translate `TPUConfig` into Serve's `TPUAcceleratorConfig` and inject it into the Serve deployment decorator options.
- Refactored the multi-host TPU tests to pass options via the decorator, and added assertions verifying that the head placement group is torn down successfully.
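The translation step in `LLMServer.get_deployment_options` can be pictured with the hedged sketch below. It uses hypothetical stand-in dataclasses instead of the real `TPUConfig` and `TPUAcceleratorConfig` (whose actual fields are defined in #63179 and this PR), and it assumes the config is injected under an `accelerator_config` option key. The `kind` field mirrors the commit that switched the discriminator to `'kind'`, but its `"tpu"` value is a guess.

```python
import copy
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class LLMTPUConfigSketch:
    """Hypothetical stand-in for Ray LLM's TPUConfig."""

    topology: str  # e.g. "4x4"
    accelerator_version: Optional[str] = None  # assumed field, e.g. "v6e"


@dataclass(frozen=True)
class ServeTPUAcceleratorConfigSketch:
    """Hypothetical stand-in for Serve's TPUAcceleratorConfig."""

    topology: str
    accelerator_version: Optional[str] = None
    kind: str = "tpu"  # discriminator field; the "tpu" value is assumed


def build_deployment_options_sketch(
    base_options: Dict[str, Any],
    tpu_config: Optional[LLMTPUConfigSketch],
) -> Dict[str, Any]:
    """Return a copy of base_options with a Serve accelerator config injected.

    The "accelerator_config" key is a hypothetical option name.
    """
    options = copy.deepcopy(base_options)
    if tpu_config is None:
        # Non-TPU deployments keep their options unchanged.
        return options
    # Hand the topology to Serve instead of building placement groups here,
    # leaving placement group creation and cleanup to Serve.
    options["accelerator_config"] = ServeTPUAcceleratorConfigSketch(
        topology=tpu_config.topology,
        accelerator_version=tpu_config.accelerator_version,
    )
    return options
```

Copying the options before injecting the config keeps the caller's dict unchanged, so the same base options can be reused for non-TPU deployments.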

This PR includes the changes from #63177 and #63179 so that the integration tests work. It only modifies the following files; the rest come from the PR it is rebased on:

- `python/ray/llm/_internal/serve/core/configs/accelerators.py`
- `python/ray/llm/_internal/serve/core/server/llm_server.py`
- `python/ray/llm/tests/serve/cpu/deployments/conftest.py`
- `python/ray/llm/tests/serve/cpu/deployments/llm/test_llm_engine_tpu.py`
- `python/ray/llm/tests/serve/cpu/deployments/llm/test_llm_server.py`
- `python/ray/serve/tests/test_accelerator_config.py`

Related issues

Additional information


ryanaoleary added 2 commits

May 8, 2026 02:44


Change default bundles constructed for TPU in LLM to per-host and fix tests (f98d5af)

Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>


Improve lifecycle handling of SlicePlacementGroup and support explicit `bundle_label_selector` (6a1511d)

Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>

ryanaoleary requested review from a team as code owners

May 8, 2026 03:00

gemini-code-assist Bot reviewed



gemini-code-assist Bot left a comment

Code Review

This pull request introduces structured accelerator configurations for Ray Serve, with a primary focus on supporting TPU topologies. It adds TPUAcceleratorConfig to the serve.deployment API, enabling users to specify pod topologies (e.g., '4x4') which are then managed via ray.util.tpu.slice_placement_group. The changes include updates to the deployment scheduler to handle these specialized placement groups, logic to release temporary reservation-holder PGs once replicas are ready, and necessary protobuf/serialization updates. One critical issue was identified in the LLM-specific accelerator logic where an invalid label_selector key was added to actor options, which would cause a runtime TypeError.

python/ray/llm/_internal/serve/core/configs/accelerators.py (outdated)

ryanaoleary mentioned this pull request

[Serve][2/N] Implement AcceleratorConfig to enable custom scheduling logic for accelerators with Serve deployments #63179

Open

cursor Bot reviewed


python/ray/llm/_internal/serve/core/server/llm_server.py

src/ray/protobuf/serve.proto

ray-gardener Bot added serve community-contribution labels

ryanaoleary changed the title ~~[LLM][3/N] Update Ray LLM to consume TPUAcceleratorConfig from Serve~~ [LLM][2/N] Update Ray LLM to consume TPUAcceleratorConfig from Serve

ryanaoleary force-pushed the e-llm-accelerator-config branch from 35059d2 to 2ea2663

May 8, 2026 12:02

cursor Bot reviewed


python/ray/serve/_private/deployment_state.py (outdated)

python/ray/llm/tests/serve/cpu/deployments/conftest.py (outdated)


Add AcceleratorConfig to Serve and fix gang scheduling (5ec15c0)

Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>

ryanaoleary force-pushed the e-llm-accelerator-config branch from 2ea2663 to db1f9c0

May 8, 2026 12:12

cursor Bot reviewed


python/ray/serve/_private/deployment_scheduler.py

ryanaoleary and others added 9 commits

May 9, 2026 01:28


fix tests, change discriminator to 'kind', and fix cleanup logic (1c1dda8)

Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>


fix import and var name (649229e)

Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>


add missing import (779d95d)

Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>

scope down PR and remove gang scheduling changes

Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>


lint and remove unused type alias (3a1d724)

Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>


Merge branch 'master' into e-serve-accelerator-config (da95aac)


add comment to inline import (d603123)

Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>


Tighten typing for placement-group fields after PR restructure (80162c2)

Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>


remove added whitespace (e45ee82)

Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>


fix external placement group function override (f96ef1e)

Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>

ryanaoleary force-pushed the e-llm-accelerator-config branch from db1f9c0 to 812ef63

May 11, 2026 20:59

cursor Bot reviewed


python/ray/serve/_private/deployment_scheduler.py


add resources_per_bundle and fix bundles defaulting logic, also add tests (5d95c78)

Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>

ryanaoleary force-pushed the e-llm-accelerator-config branch from 812ef63 to 6796e7f

May 12, 2026 00:47

cursor Bot reviewed


python/ray/serve/_private/default_impl.py

ryanaoleary added 2 commits

May 12, 2026 01:58


Safely unwrap ReplicaPlacementGroup for gangs and fix type alias (afb07a0)

Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>


Fix placement group leakage on actor creation failure for custom overrides with accelerators (70b6a6f)

Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>

ryanaoleary force-pushed the e-llm-accelerator-config branch from 6796e7f to 036a9a8

May 12, 2026 02:33

ryanaoleary added 2 commits

May 12, 2026 03:01


Release TPU reservation holders in cross-language replica startup path to prevent placement group leaks (89dc61f)

Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>


Remove redundant replica_pg reassignment in deployment scheduler (f82eab9)

Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>

ryanaoleary force-pushed the e-llm-accelerator-config branch from 036a9a8 to 92fdb51

May 12, 2026 03:25

cursor Bot reviewed


python/ray/serve/_private/deployment_state.py


Safeguard check_stopped placement group teardown with robust exception handling to prevent controller crashes (19c8aeb)

Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>

ryanaoleary force-pushed the e-llm-accelerator-config branch from 92fdb51 to b1c19d9

May 12, 2026 03:37

ryanaoleary added 2 commits

May 12, 2026 11:45


fix gang pg cleanup to fix tests (8eab85a)

Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>


use Serve's TPUAcceleratorConfig in Ray LLM (6f172f3)

Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>

Update Ray LLM to use TPUAcceleratorConfig

Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>

ryanaoleary force-pushed the e-llm-accelerator-config branch from b1c19d9 to 6f172f3

May 12, 2026 12:06


Merge branch 'master' into e-llm-accelerator-config (18e66cb)

cursor Bot reviewed


cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Reviewed by Cursor Bugbot for commit 18e66cb.

python/ray/serve/_private/deployment_state.py

kouroshHakha added the llm label

jeffreywang-anyscale reviewed



jeffreywang-anyscale left a comment (edited)

Reviewed python/ray/llm/_internal/serve/core/configs/accelerators.py and python/ray/llm/_internal/serve/core/server/llm_server.py. Other changes seem to be from preceding PRs.

ryanaoleary changed the title ~~[LLM][2/N] Update Ray LLM to consume TPUAcceleratorConfig from Serve~~ [LLM][3/N] Update Ray LLM to consume TPUAcceleratorConfig from Serve


Labels

community-contribution llm serve