[Build] feat: multi-backend build system to consolidate CUDA, ROCm, CPU, etc by AlpinDale · Pull Request #1540 · dphnAI/aphrodite-engine

AlpinDale · 2025-09-29T12:01:44Z

This PR is an attempt to consolidate all our supported hardware backends, so that we can create one wheel for all of them.

For now, it copies the compiled binaries for each of the 3 backends that compile C++ code to aphrodite/extensions, and those are dispatched based on the detected hardware device.

PyTorch for each of the 3 backends has to be downloaded from a backend-specific index URL, so we may need to select that URL in setup.py based on the detected target device. Other than that, this should be pretty straightforward.
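
A minimal sketch of how the index selection could look. The `APHRODITE_TARGET_DEVICE` variable name, the `torch_index_url` helper, and the CUDA wheel tag are assumptions for illustration and are not taken from this PR; the URLs follow the `download.pytorch.org/whl/<tag>` layout and would need to match the pinned torch build.

```python
import os

# Hypothetical mapping from target device to PyTorch wheel index.
# The tags (cu128, rocm6.4) are assumptions; adjust them to the torch
# version the project actually pins.
TORCH_INDEX_URLS = {
    "cuda": "https://download.pytorch.org/whl/cu128",
    "rocm": "https://download.pytorch.org/whl/rocm6.4",
    "cpu": "https://download.pytorch.org/whl/cpu",
}


def torch_index_url(target_device=None):
    """Return the wheel index URL for a target device.

    Falls back to the (assumed) APHRODITE_TARGET_DEVICE environment
    variable, then to "cuda".
    """
    device = (target_device
              or os.environ.get("APHRODITE_TARGET_DEVICE", "cuda")).lower()
    try:
        return TORCH_INDEX_URLS[device]
    except KeyError:
        raise ValueError(f"unsupported target device: {device!r}") from None
```

The returned URL could then be passed to pip via `--index-url` or `--extra-index-url` when installing the torch requirement for that backend.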

Currently tested only with CUDA; CPU and ROCm still need testing.

gemini-code-assist · 2025-09-29T12:02:05Z

Summary of Changes

Hello @AlpinDale, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a significant architectural change to the project's build and distribution system. The primary objective is to enable the creation of a single Python wheel that supports multiple hardware backends, including CUDA, ROCm, and CPU. This is achieved by centralizing compiled C++ extensions into a new aphrodite/extensions directory, organized by backend, and implementing dynamic loading mechanisms in the Python codebase. This consolidation streamlines the build process and simplifies deployment across diverse hardware environments, ensuring that the correct, optimized extensions are loaded based on the detected platform.

Highlights

Build System Consolidation: The build system has been refactored to support a multi-backend approach, allowing for the compilation of extensions for CUDA, ROCm, and CPU within a unified structure, with the goal of creating a single distribution wheel.
Dynamic Extension Loading: Python code now dynamically imports backend-specific compiled extensions (e.g., _C, _moe_C, _flashmla_C, cumem_allocator) from dedicated subdirectories within aphrodite/extensions based on the detected hardware platform at runtime.
New Directory Structure for Extensions: A new aphrodite/extensions directory has been introduced. This directory will house compiled binaries for each supported backend (e.g., aphrodite/extensions/cuda, aphrodite/extensions/rocm, aphrodite/extensions/cpu), facilitating the consolidated build and dynamic loading.
ROCm Version Update: The rocm-build.txt requirements file has been updated to specify PyTorch wheels for ROCm 6.4, moving from the previous 6.3 version.
Ray Dependency Adjustment: The ray dependency in rocm.txt has been modified to remove its upper version bound, allowing for greater flexibility in ray versions.
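
The dynamic loading described in the highlights can be sketched as a small helper. This is an illustration only: `load_backend_extension` and its `importer` parameter are hypothetical names, not code from this PR. Only the `aphrodite.extensions.<backend>.<name>` module layout comes from the summary above.

```python
import importlib
from typing import Callable, Optional


def load_backend_extension(
    backend: str,
    name: str,
    required: bool = True,
    importer: Callable[[str], object] = importlib.import_module,
) -> Optional[object]:
    """Import aphrodite.extensions.<backend>.<name>.

    Returns the module, or None when an optional module is missing.
    `importer` is injectable so the lookup can be exercised without
    compiled binaries being present.
    """
    module_name = f"aphrodite.extensions.{backend}.{name}"
    try:
        return importer(module_name)
    except ImportError:
        if required:
            # A missing core extension is an error the caller should see.
            raise
        return None
```

With a helper like this, a core module such as `_C` would be loaded with `required=True`, while optional ones such as `_moe_C` would use `required=False` and the caller would check for `None`.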

¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, and Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

gemini-code-assist

Code Review

This pull request effectively refactors the build system to support multiple backends (CUDA, ROCm, CPU) within a single wheel, which is a great improvement for distribution and usability. The changes in CMake and setup.py are well-structured to handle the different backends by placing compiled extensions into backend-specific directories. The Python code is then correctly updated to dynamically load these extensions at runtime based on the detected platform.

My review includes a few suggestions:

A minor formatting fix in a CMake file for consistency.
A refactoring suggestion in _custom_ops.py to reduce code duplication and improve maintainability.
A note on a dependency change in requirements/rocm.txt regarding a removed version constraint, which could be a potential risk.

Overall, this is a solid contribution that simplifies the project's packaging and deployment.

gemini-code-assist · 2025-09-29T12:03:34Z

+define_gpu_extension_target(
+  cumem_allocator
+  DESTINATION aphrodite/extensions/cuda


The indentation for define_gpu_extension_target and its arguments appears to have been accidentally changed, making it inconsistent with the surrounding code style. Please restore the original indentation to improve readability and maintain consistency.

  define_gpu_extension_target(
    cumem_allocator
    DESTINATION aphrodite/extensions/cuda

gemini-code-assist · 2025-09-29T12:03:34Z

 if not current_platform.is_tpu() and not current_platform.is_xpu():
    try:
-        import aphrodite._C
+        if current_platform.is_cuda():
+            import aphrodite.extensions.cuda._C  # noqa: F401
+        elif current_platform.is_rocm():
+            import aphrodite.extensions.rocm._C  # noqa: F401
+            # Also register ROCm-specific ops if present
+            with contextlib.suppress(ImportError):
+                import aphrodite.extensions.rocm._rocm_C  # noqa: F401
+        elif current_platform.is_cpu():
+            import aphrodite.extensions.cpu._C  # noqa: F401
+        else:
+            # Other platforms not handled here
+            pass
    except ImportError as e:
-        logger.warning("Failed to import from aphrodite._C with {!r}", e)
+        logger.warning("Failed to import platform-specific _C with {!r}", e)

 supports_moe_ops = False
-with contextlib.suppress(ImportError):
-    import aphrodite._moe_C  # noqa: F401
-    supports_moe_ops = True
+if current_platform.is_cuda():
+    with contextlib.suppress(ImportError):
+        import aphrodite.extensions.cuda._moe_C  # noqa: F401
+        supports_moe_ops = True
+elif current_platform.is_rocm():
+    with contextlib.suppress(ImportError):
+        import aphrodite.extensions.rocm._moe_C  # noqa: F401
+        supports_moe_ops = True


The logic for importing platform-specific extensions and setting supports_moe_ops contains duplicated code for CUDA and ROCm backends. This can be refactored to improve readability and maintainability by first determining the backend and then using a single block of code for the import logic. Using importlib.import_module would make dynamic imports cleaner.

import importlib

if not current_platform.is_tpu() and not current_platform.is_xpu():
    backend = None
    if current_platform.is_cuda():
        backend = "cuda"
    elif current_platform.is_rocm():
        backend = "rocm"
    elif current_platform.is_cpu():
        backend = "cpu"

    if backend:
        try:
            importlib.import_module(f"aphrodite.extensions.{backend}._C")
            if backend == "rocm":
                # Also register ROCm-specific ops if present
                with contextlib.suppress(ImportError):
                    importlib.import_module("aphrodite.extensions.rocm._rocm_C")
        except ImportError as e:
            logger.warning("Failed to import platform-specific _C for backend {} with {!r}", backend, e)
    else:
        # Other platforms not handled here
        pass

supports_moe_ops = False
if current_platform.is_cuda() or current_platform.is_rocm():
    backend = "cuda" if current_platform.is_cuda() else "rocm"
    with contextlib.suppress(ImportError):
        importlib.import_module(f"aphrodite.extensions.{backend}._moe_C")
        supports_moe_ops = True

gemini-code-assist · 2025-09-29T12:03:35Z

 botocore
 datasets
-ray>=2.10.0,<2.45.0
+ray>=2.10.0


Removing the upper version constraint for ray could introduce instability if a future version includes breaking changes. It's generally safer to specify a tested upper bound. If this change is intentional and has been tested, consider adding a comment to explain why the upper bound was removed.
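
To make the concern concrete, here is a rough sketch of a check against the previously pinned range. The helper names are hypothetical and the parser handles only plain numeric release strings such as `2.44.1`; real code would more likely rely on `packaging.version.Version`, which also understands pre-release and local version tags.

```python
def parse_version(text):
    """Parse a plain 'X.Y.Z' release string into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def within_tested_range(installed, lower="2.10.0", upper="2.45.0"):
    """True when lower <= installed < upper.

    The defaults mirror the old constraint `ray>=2.10.0,<2.45.0`.
    """
    return parse_version(lower) <= parse_version(installed) < parse_version(upper)
```

A startup check built on this could log a warning when the installed ray version falls outside the range that was last tested, instead of failing the install outright.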

Commit c3e81e0: [Build] feat: multi-backend build system to consolidate CUDA, ROCm, CPU, etc

AlpinDale wants to merge 1 commit into main from multibackend
