[Build] feat: multi-backend build system to consolidate CUDA, ROCm, CPU, etc #1540
AlpinDale wants to merge 1 commit into
Summary of Changes
Hello @AlpinDale, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request introduces a significant architectural change to the project's build and distribution system. The primary objective is to enable the creation of a single Python wheel that supports multiple hardware backends, including CUDA, ROCm, and CPU. This is achieved by centralizing compiled C++ extensions into a new
Highlights
Code Review
This pull request effectively refactors the build system to support multiple backends (CUDA, ROCm, CPU) within a single wheel, which is a great improvement for distribution and usability. The changes in CMake and setup.py are well-structured to handle the different backends by placing compiled extensions into backend-specific directories. The Python code is then correctly updated to dynamically load these extensions at runtime based on the detected platform.
My review includes a few suggestions:
- A minor formatting fix in a CMake file for consistency.
- A refactoring suggestion in _custom_ops.py to reduce code duplication and improve maintainability.
- A note on a dependency change in requirements/rocm.txt regarding a removed version constraint, which could be a potential risk.
Overall, this is a solid contribution that simplifies the project's packaging and deployment.
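The runtime dispatch described in this review can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: `EXTENSION_PACKAGES`, `extension_module_name`, and `try_import` are hypothetical names, and the package paths only assume the aphrodite/extensions/<backend> layout that the PR describes.

```python
import importlib
from typing import Optional

# Hypothetical mapping from a backend name to the package that would hold
# its compiled extensions (assumes the aphrodite/extensions/<backend> layout).
EXTENSION_PACKAGES = {
    "cuda": "aphrodite.extensions.cuda",
    "rocm": "aphrodite.extensions.rocm",
    "cpu": "aphrodite.extensions.cpu",
}


def extension_module_name(backend: str, name: str = "_C") -> Optional[str]:
    """Return the dotted module path for a backend, or None if unknown."""
    package = EXTENSION_PACKAGES.get(backend)
    return f"{package}.{name}" if package else None


def try_import(module_name: Optional[str]) -> bool:
    """Import a module by name and report whether the import succeeded."""
    if module_name is None:
        return False
    try:
        importlib.import_module(module_name)
    except ImportError:
        return False
    return True
```

With helpers like these, a flag such as supports_moe_ops could be computed as `try_import(extension_module_name(backend, "_moe_C"))`, so a backend that ships no MoE extension simply yields False.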
define_gpu_extension_target(
cumem_allocator
DESTINATION aphrodite/extensions/cuda
The indentation for define_gpu_extension_target and its arguments appears to have been accidentally changed, making it inconsistent with the surrounding code style. Please restore the original indentation to improve readability and maintain consistency.
define_gpu_extension_target(
cumem_allocator
DESTINATION aphrodite/extensions/cuda
 if not current_platform.is_tpu() and not current_platform.is_xpu():
     try:
-        import aphrodite._C
+        if current_platform.is_cuda():
+            import aphrodite.extensions.cuda._C  # noqa: F401
+        elif current_platform.is_rocm():
+            import aphrodite.extensions.rocm._C  # noqa: F401
+            # Also register ROCm-specific ops if present
+            with contextlib.suppress(ImportError):
+                import aphrodite.extensions.rocm._rocm_C  # noqa: F401
+        elif current_platform.is_cpu():
+            import aphrodite.extensions.cpu._C  # noqa: F401
+        else:
+            # Other platforms not handled here
+            pass
     except ImportError as e:
-        logger.warning("Failed to import from aphrodite._C with {!r}", e)
+        logger.warning("Failed to import platform-specific _C with {!r}", e)

 supports_moe_ops = False
-with contextlib.suppress(ImportError):
-    import aphrodite._moe_C  # noqa: F401
-    supports_moe_ops = True
+if current_platform.is_cuda():
+    with contextlib.suppress(ImportError):
+        import aphrodite.extensions.cuda._moe_C  # noqa: F401
+        supports_moe_ops = True
+elif current_platform.is_rocm():
+    with contextlib.suppress(ImportError):
+        import aphrodite.extensions.rocm._moe_C  # noqa: F401
+        supports_moe_ops = True
The logic for importing platform-specific extensions and setting supports_moe_ops contains duplicated code for CUDA and ROCm backends. This can be refactored to improve readability and maintainability by first determining the backend and then using a single block of code for the import logic. Using importlib.import_module would make dynamic imports cleaner.
import importlib

if not current_platform.is_tpu() and not current_platform.is_xpu():
    backend = None
    if current_platform.is_cuda():
        backend = "cuda"
    elif current_platform.is_rocm():
        backend = "rocm"
    elif current_platform.is_cpu():
        backend = "cpu"

    if backend:
        try:
            importlib.import_module(f"aphrodite.extensions.{backend}._C")
            if backend == "rocm":
                # Also register ROCm-specific ops if present
                with contextlib.suppress(ImportError):
                    importlib.import_module("aphrodite.extensions.rocm._rocm_C")
        except ImportError as e:
            logger.warning(
                "Failed to import platform-specific _C for backend {} with {!r}",
                backend, e)
    else:
        # Other platforms not handled here
        pass

supports_moe_ops = False
if current_platform.is_cuda() or current_platform.is_rocm():
    backend = "cuda" if current_platform.is_cuda() else "rocm"
    with contextlib.suppress(ImportError):
        importlib.import_module(f"aphrodite.extensions.{backend}._moe_C")
        supports_moe_ops = True

 botocore
 datasets
-ray>=2.10.0,<2.45.0
+ray>=2.10.0
This PR is an attempt to consolidate all our supported hardware backends, so that we can create one wheel for all of them.
For now, it copies the compiled binaries for each of the 3 backends that compile C++ code to aphrodite/extensions, and those are dispatched based on the detected hardware device.
PyTorch on each of the 3 backends needs to be downloaded from a specific index URL, so perhaps we'll need to add that in the setup.py based on the detected target device. Other than that, this should be pretty straightforward.
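One way the per-backend PyTorch index could be selected is sketched below. This is only an illustration under stated assumptions: `PYTORCH_INDEX_URLS`, `torch_index_url`, and `pip_install_args` are hypothetical names, and the CUDA and ROCm URL suffixes are placeholders that would have to match whichever PyTorch build the project pins. How this would be wired into setup.py is left open.

```python
from typing import List

# Placeholder index URLs. The "cu128" and "rocm6.3" suffixes are assumptions
# for illustration and must match the PyTorch version actually targeted.
PYTORCH_INDEX_URLS = {
    "cuda": "https://download.pytorch.org/whl/cu128",
    "rocm": "https://download.pytorch.org/whl/rocm6.3",
    "cpu": "https://download.pytorch.org/whl/cpu",
}


def torch_index_url(target_device: str) -> str:
    """Look up the PyTorch wheel index for a target device name."""
    try:
        return PYTORCH_INDEX_URLS[target_device.lower()]
    except KeyError:
        raise ValueError(f"unsupported target device: {target_device!r}") from None


def pip_install_args(target_device: str, requirement: str = "torch") -> List[str]:
    """Build pip arguments that install a requirement from the matching index."""
    return ["install", requirement, "--index-url", torch_index_url(target_device)]
```

For example, `pip_install_args("cpu")` yields the arguments for `pip install torch --index-url https://download.pytorch.org/whl/cpu`.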
Currently only tested with CUDA; CPU and ROCm still need more testing.