[LLM][3/N] Update Ray LLM to consume TPUAcceleratorConfig from Serve #63216
ryanaoleary wants to merge 21 commits into
… tests Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>
…t `bundle_label_selector` Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>
Code Review
This pull request introduces structured accelerator configurations for Ray Serve, with a primary focus on supporting TPU topologies. It adds TPUAcceleratorConfig to the serve.deployment API, enabling users to specify pod topologies (e.g., '4x4') which are then managed via ray.util.tpu.slice_placement_group. The changes include updates to the deployment scheduler to handle these specialized placement groups, logic to release temporary reservation-holder PGs once replicas are ready, and necessary protobuf/serialization updates. One critical issue was identified in the LLM-specific accelerator logic where an invalid label_selector key was added to actor options, which would cause a runtime TypeError.
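For readers unfamiliar with topology strings such as '4x4', here is a minimal sketch of how one might be read. It assumes the chip count of a slice is the product of the topology dimensions; `chips_in_topology` is a hypothetical helper for illustration and is not part of `ray.util.tpu` or this PR.

```python
import math


def chips_in_topology(topology: str) -> int:
    """Return the number of chips implied by a topology string like '4x4'.

    Hypothetical helper: assumes chip count is the product of the dimensions.
    """
    # Split "4x4" or "2x2x4" into integer dimensions.
    dims = [int(part) for part in topology.lower().split("x")]
    if any(d <= 0 for d in dims):
        raise ValueError(f"Invalid topology: {topology!r}")
    return math.prod(dims)


print(chips_in_topology("4x4"))
```

Under that assumption, a '4x4' topology describes a 16-chip slice.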
scope down PR and remove gang scheduling changes Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>
…ests Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>
…rides with accelerators Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>
…h to prevent placement group leaks Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>
…n handling to prevent controller crashes Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>
Update Ray LLM to use TPUAcceleratorConfig Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit 18e66cb.

Description
This PR enables Ray LLM to use the new `TPUAcceleratorConfig` from Serve implemented in #63179, replacing the deferred PG logic for scheduling workers. This resolves a critical bug where deferred placement groups left LLMServer replicas scheduled without accelerator constraints.

Instead of deferring PG creation, we now pass the hardware topology directly to Serve. This ensures replicas are correctly constrained and lets Serve take ownership of the placement group lifecycle and clean up temporary head placement groups upon readiness.
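
The hand-off described above can be sketched roughly as follows. This is an illustration only: the two dataclasses are stand-ins for the Ray LLM `TPUConfig` and the Serve `TPUAcceleratorConfig`, their fields are assumed, and the `accelerator_config` option key and the free function `get_deployment_options` are hypothetical names rather than the actual signatures in this PR.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


# Stand-in for the Ray LLM TPU config; the real class may have other fields.
@dataclass(frozen=True)
class TPUConfig:
    topology: str
    accelerator_type: str


# Stand-in for Serve's TPUAcceleratorConfig; fields are assumed for illustration.
@dataclass(frozen=True)
class TPUAcceleratorConfig:
    topology: str
    accelerator_type: str


def get_deployment_options(
    base_options: Dict[str, Any], tpu_config: Optional[TPUConfig]
) -> Dict[str, Any]:
    """Copy the deployment options and attach the accelerator config if present."""
    options = dict(base_options)  # avoid mutating the caller's dict
    if tpu_config is not None:
        # Pass the topology to Serve up front instead of deferring PG creation.
        # "accelerator_config" is a hypothetical key name.
        options["accelerator_config"] = TPUAcceleratorConfig(
            topology=tpu_config.topology,
            accelerator_type=tpu_config.accelerator_type,
        )
    return options


opts = get_deployment_options(
    {"name": "LLMServer"},
    TPUConfig(topology="4x4", accelerator_type="TPU-V6E"),
)
print(opts["accelerator_config"])
```

The point of the sketch is that the topology travels with the deployment options, so the component that creates the placement group is also the one that can constrain replicas and clean the group up.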
Key Changes:
- Update `LLMServer.get_deployment_options` to translate `TPUConfig` into Serve's `TPUAcceleratorConfig` and inject it into the Serve decorator options.

This PR includes the changes from #63177 and #63179 so that the integration tests work. This PR only modifies the following files; the rest are from the PR this is rebased on:
Related issues
#57137
Additional information