
[LLM][3/N] Update Ray LLM to consume TPUAcceleratorConfig from Serve#63216

Open
ryanaoleary wants to merge 21 commits into
ray-project:masterfrom
ryanaoleary:e-llm-accelerator-config

Conversation

@ryanaoleary

@ryanaoleary ryanaoleary commented May 8, 2026

Description

This PR enables Ray LLM to use the new `TPUAcceleratorConfig` from Serve (implemented in #63179), replacing the deferred placement group (PG) logic previously used to schedule workers. This resolves a critical bug where deferred placement groups left `LLMServer` replicas scheduled without accelerator constraints.

Instead of deferring PG creation, we now pass the hardware topology directly to Serve. This ensures replicas are correctly constrained and lets Serve own the placement group lifecycle, including cleaning up the temporary head placement groups once replicas are ready.
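To illustrate what a topology string such as `4x4` encodes, the sketch below multiplies its dimensions to get a chip count and divides by chips per host to get a host count. This is a minimal sketch, not Ray's code: `chips_in_topology`, `hosts_in_topology`, and the `chips_per_host` parameter are hypothetical names, and chips per host is an assumption the caller must supply because it varies by TPU generation and machine type.

```python
from math import prod


def chips_in_topology(topology: str) -> int:
    """Return the number of TPU chips described by a topology like '4x4'."""
    dims = [int(d) for d in topology.lower().split("x")]
    if any(d <= 0 for d in dims):
        raise ValueError(f"Invalid topology: {topology!r}")
    return prod(dims)


def hosts_in_topology(topology: str, chips_per_host: int) -> int:
    """Return how many hosts a slice of this topology spans.

    chips_per_host is an assumed input; it is not derived from the topology.
    """
    chips = chips_in_topology(topology)
    if chips % chips_per_host != 0:
        raise ValueError(
            f"{chips} chips cannot be split evenly into hosts of {chips_per_host}"
        )
    return chips // chips_per_host


print(chips_in_topology("4x4"))       # 16
print(hosts_in_topology("4x4", 4))    # 4
print(hosts_in_topology("2x2x4", 4))  # 4
```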

Key Changes:

  • Updated `LLMServer.get_deployment_options` to translate `TPUConfig` into Serve's `TPUAcceleratorConfig` and inject it into the Serve deployment decorator options.
  • Refactored the multi-host TPU tests to pass options via the decorator, and added assertions verifying that the head PG is torn down successfully.
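The translation step in the first bullet can be pictured as a pure function that takes the LLM-side TPU settings and returns Serve deployment options carrying an accelerator config. This is a minimal sketch under simplifying assumptions: the dataclass fields, the `accelerator_config` options key, the `"TPU-V6E"` value, and the function signature are hypothetical and may not match the real `LLMServer.get_deployment_options` or Serve's `TPUAcceleratorConfig`.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class TPUConfig:
    """Hypothetical stand-in for Ray LLM's TPU configuration."""
    accelerator_type: str  # assumed field, e.g. "TPU-V6E"
    topology: str          # e.g. "4x4"


@dataclass(frozen=True)
class TPUAcceleratorConfig:
    """Hypothetical stand-in for Serve's TPUAcceleratorConfig (#63179)."""
    accelerator_type: str
    topology: str


def get_deployment_options(
    base_options: Dict[str, Any], tpu_config: Optional[TPUConfig]
) -> Dict[str, Any]:
    """Return a copy of base_options with a Serve accelerator config injected.

    The "accelerator_config" key is an assumed name used for illustration.
    """
    options = dict(base_options)  # copy so the caller's dict is not mutated
    if tpu_config is not None:
        options["accelerator_config"] = TPUAcceleratorConfig(
            accelerator_type=tpu_config.accelerator_type,
            topology=tpu_config.topology,
        )
    return options


opts = get_deployment_options({"num_replicas": 1}, TPUConfig("TPU-V6E", "4x4"))
print(opts["accelerator_config"].topology)  # 4x4
```

Returning a fresh dict keeps the translation side-effect free, so the same base options can be reused across deployments with and without TPU settings.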

This PR includes the changes from #63177 and #63179 so that the integration tests pass. It modifies only the following files; all other changes come from the PR this branch is rebased on:

  • python/ray/llm/_internal/serve/core/configs/accelerators.py
  • python/ray/llm/_internal/serve/core/server/llm_server.py
  • python/ray/llm/tests/serve/cpu/deployments/conftest.py
  • python/ray/llm/tests/serve/cpu/deployments/llm/test_llm_engine_tpu.py
  • python/ray/llm/tests/serve/cpu/deployments/llm/test_llm_server.py
  • python/ray/serve/tests/test_accelerator_config.py

Related issues

#57137


… tests

Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>
…t `bundle_label_selector`

Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>
@ryanaoleary ryanaoleary requested review from a team as code owners May 8, 2026 03:00

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces structured accelerator configurations for Ray Serve, with a primary focus on supporting TPU topologies. It adds TPUAcceleratorConfig to the serve.deployment API, enabling users to specify pod topologies (e.g., '4x4') which are then managed via ray.util.tpu.slice_placement_group. The changes include updates to the deployment scheduler to handle these specialized placement groups, logic to release temporary reservation-holder PGs once replicas are ready, and necessary protobuf/serialization updates. One critical issue was identified in the LLM-specific accelerator logic where an invalid label_selector key was added to actor options, which would cause a runtime TypeError.
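The review summary mentions releasing temporary reservation-holder PGs once replicas are ready. A toy, in-memory model of that bookkeeping is sketched here, assuming one temporary PG per replica; `ReservationTracker` and its methods are hypothetical and do not reflect how Serve's deployment scheduler actually stores or removes placement groups.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ReservationTracker:
    """Toy model: release each replica's temporary PG once it reports ready."""

    pending: Dict[str, str] = field(default_factory=dict)  # replica_id -> pg_id
    released: List[str] = field(default_factory=list)

    def reserve(self, replica_id: str, pg_id: str) -> None:
        # Record the temporary PG holding this replica's reservation.
        self.pending[replica_id] = pg_id

    def on_replica_ready(self, replica_id: str) -> None:
        # pop() makes repeated readiness signals a no-op, so a PG is never
        # released twice; unknown replica IDs are ignored.
        pg_id = self.pending.pop(replica_id, None)
        if pg_id is not None:
            self.released.append(pg_id)  # stand-in for removing the real PG


t = ReservationTracker()
t.reserve("r1", "pg-a")
t.reserve("r2", "pg-b")
t.on_replica_ready("r1")
t.on_replica_ready("r1")  # duplicate signal, ignored
print(t.released)  # ['pg-a']
```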

Comment thread python/ray/llm/_internal/serve/core/configs/accelerators.py Outdated
Comment thread python/ray/llm/_internal/serve/core/server/llm_server.py
Comment thread src/ray/protobuf/serve.proto
@ray-gardener ray-gardener Bot added serve Ray Serve Related Issue community-contribution Contributed by the community labels May 8, 2026
@ryanaoleary ryanaoleary changed the title [LLM][3/N] Update Ray LLM to consume TPUAcceleratorConfig from Serve [LLM][2/N] Update Ray LLM to consume TPUAcceleratorConfig from Serve May 8, 2026
@ryanaoleary ryanaoleary force-pushed the e-llm-accelerator-config branch from 35059d2 to 2ea2663 Compare May 8, 2026 12:02
Comment thread python/ray/serve/_private/deployment_state.py Outdated
Comment thread python/ray/llm/tests/serve/cpu/deployments/conftest.py Outdated
Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>
@ryanaoleary ryanaoleary force-pushed the e-llm-accelerator-config branch from 2ea2663 to db1f9c0 Compare May 8, 2026 12:12
Comment thread python/ray/serve/_private/deployment_scheduler.py
ryanaoleary and others added 9 commits May 9, 2026 01:28
Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>
Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>
Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>

scope down PR and remove gang scheduling changes

Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>
Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>
Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>
Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>
Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>
Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>
@ryanaoleary ryanaoleary force-pushed the e-llm-accelerator-config branch from db1f9c0 to 812ef63 Compare May 11, 2026 20:59
Comment thread python/ray/serve/_private/deployment_scheduler.py
…ests

Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>
@ryanaoleary ryanaoleary force-pushed the e-llm-accelerator-config branch from 812ef63 to 6796e7f Compare May 12, 2026 00:47
Comment thread python/ray/serve/_private/default_impl.py
Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>
…rides with accelerators

Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>
@ryanaoleary ryanaoleary force-pushed the e-llm-accelerator-config branch from 6796e7f to 036a9a8 Compare May 12, 2026 02:33
…h to prevent placement group leaks

Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>
Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>
@ryanaoleary ryanaoleary force-pushed the e-llm-accelerator-config branch from 036a9a8 to 92fdb51 Compare May 12, 2026 03:25
Comment thread python/ray/serve/_private/deployment_state.py
…n handling to prevent controller crashes

Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>
@ryanaoleary ryanaoleary force-pushed the e-llm-accelerator-config branch from 92fdb51 to b1c19d9 Compare May 12, 2026 03:37
Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>
Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>

Update Ray LLM to use TPUAcceleratorConfig

Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>
@ryanaoleary ryanaoleary force-pushed the e-llm-accelerator-config branch from b1c19d9 to 6f172f3 Compare May 12, 2026 12:06

@cursor cursor Bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.


Reviewed by Cursor Bugbot for commit 18e66cb.

Comment thread python/ray/serve/_private/deployment_state.py

@jeffreywang-anyscale jeffreywang-anyscale left a comment


Reviewed python/ray/llm/_internal/serve/core/configs/accelerators.py and python/ray/llm/_internal/serve/core/server/llm_server.py. Other changes seem to be from preceding PRs.

@ryanaoleary ryanaoleary changed the title [LLM][2/N] Update Ray LLM to consume TPUAcceleratorConfig from Serve [LLM][3/N] Update Ray LLM to consume TPUAcceleratorConfig from Serve May 21, 2026

Labels

community-contribution Contributed by the community llm serve Ray Serve Related Issue



3 participants