feat: add MUSA support for FlexKV #126

Open
superleo wants to merge 1 commit into taco-project:main from superleo:main

@superleo

Description

This PR adds support for MUSA as an alternative backend to CUDA in FlexKV.

Summary

  • Backend abstraction: Introduces gpu_backend.py and gpu_runtime.py so Python code uses a single dispatch layer instead of scattered torch.cuda calls. This keeps the CUDA path unchanged and allows MUSA to be added without #ifdef in existing CUDA sources.

  • MUSA C++ extension: Adds a parallel MUSA implementation under csrc/musa/:

    • Transfer kernels (transfer_musa.mu)
    • GDS manager and layout transform for MUSA
    • Thread groups for transfer and GDS
    • Python bindings (bindings_musa.cpp)
  • Build system: Adds build_config.py and updates setup.py to support conditional MUSA builds via FLEXKV_USE_MUSA=1. CUDA and MUSA extensions can be built independently.

  • Integration: Updates memory_handle.py, worker.py, allocator.py, and the vLLM/TensorRT-LLM adapters to use the GPU runtime abstraction.

  • Documentation: Adds docs/musa/musa_support_system_design.md and docs/musa/musa_test_plan.md.

  • Tests: Adds tests for backend dispatch, GPU runtime, MUSA build, and MUSA transfer.
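For reviewers unfamiliar with the dispatch idea, here is a minimal sketch of how a single selection point can replace scattered torch.cuda calls. It is not the code in gpu_backend.py: the function names, the selection policy, and the use of FLEXKV_USE_MUSA as a runtime hint are all assumptions made for illustration, as is the `torch_musa` module name for the MUSA PyTorch plugin.

```python
import importlib
import os
from types import ModuleType
from typing import Callable, Mapping, Optional


def _try_import(name: str) -> Optional[ModuleType]:
    """Return the imported module, or None if it cannot be imported."""
    try:
        return importlib.import_module(name)
    except Exception:
        return None


def select_backend(
    env: Optional[Mapping[str, str]] = None,
    importer: Callable[[str], Optional[object]] = _try_import,
) -> str:
    """Pick "musa" or "cuda" (hypothetical policy, not FlexKV's actual one)."""
    env = os.environ if env is None else env
    # Assumption: the build flag also acts as a runtime hint, and the MUSA
    # PyTorch plugin is importable as `torch_musa`.
    if env.get("FLEXKV_USE_MUSA") == "1" and importer("torch_musa") is not None:
        return "musa"
    # Default: leave the existing CUDA path untouched.
    return "cuda"


def device_namespace(backend: str, torch_module):
    """Return torch.musa or torch.cuda so callers never name the backend."""
    return getattr(torch_module, backend)
```

Call sites would then write something like `device_namespace(select_backend(), torch).synchronize()` instead of `torch.cuda.synchronize()`. The injectable `importer` argument exists only so the selection logic can be tested without either backend installed.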

Design principles

  • Same API shape as CUDA (musa* types/functions, mcc compiler)
  • No changes to existing CUDA code paths
  • Backend abstraction first, then MUSA wiring
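To illustrate the "same API shape" principle, the sketch below shows a thin facade that works against any namespace exposing `synchronize`, `current_device`, and `Stream`, which is the shape of torch.cuda. The `GpuRuntime` class and its method names are hypothetical and are not taken from gpu_runtime.py; a stand-in namespace is used so the example runs without a GPU or PyTorch.

```python
from types import SimpleNamespace


class GpuRuntime:
    """Thin facade over a torch.cuda-shaped namespace (hypothetical class)."""

    def __init__(self, namespace):
        # `namespace` is expected to look like torch.cuda or torch.musa.
        self._ns = namespace

    def synchronize(self) -> None:
        """Block until queued device work has finished."""
        self._ns.synchronize()

    def current_device(self) -> int:
        """Return the index of the currently selected device."""
        return self._ns.current_device()

    def new_stream(self):
        """Create a stream object using the backend's own Stream type."""
        return self._ns.Stream()


# Stand-in namespace that records calls instead of touching a device.
calls = []
fake_ns = SimpleNamespace(
    synchronize=lambda: calls.append("synchronize"),
    current_device=lambda: 0,
    Stream=lambda: "fake-stream",
)

runtime = GpuRuntime(fake_ns)
runtime.synchronize()
```

Because the facade only delegates, adding a backend with a matching namespace requires no change to callers, which is consistent with leaving the existing CUDA code paths untouched.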

Testing

  • tests/test_gpu_backend_dispatch.py – backend selection
  • tests/test_gpu_runtime.py – runtime abstraction
  • tests/test_musa_build.py – MUSA build (when FLEXKV_USE_MUSA=1)
  • tests/test_transfer_musa.py – MUSA transfer behavior
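One way to keep MUSA-only tests from failing on CUDA-only machines is to gate them on the same FLEXKV_USE_MUSA=1 flag used by the build. The sketch below uses the standard library's unittest rather than whatever test runner the PR actually uses; the helper `musa_build_enabled` and the test class are hypothetical and do not reflect the contents of tests/test_musa_build.py.

```python
import os
import unittest
from typing import Mapping, Optional


def musa_build_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True only when FLEXKV_USE_MUSA is exactly "1"."""
    env = os.environ if env is None else env
    return env.get("FLEXKV_USE_MUSA") == "1"


# The condition is evaluated once, when the class is defined.
@unittest.skipUnless(musa_build_enabled(), "set FLEXKV_USE_MUSA=1 to run MUSA tests")
class MusaBuildSmokeTest(unittest.TestCase):
    def test_flag_is_set(self):
        # Placeholder check; a real test would import the built extension.
        self.assertEqual(os.environ.get("FLEXKV_USE_MUSA"), "1")
```

With the flag unset, the class is reported as skipped rather than failed, so the CUDA test run stays green.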

- Add GPU backend abstraction layer (gpu_backend.py, gpu_runtime.py) for
  dispatching between CUDA and MUSA
- Implement MUSA C++ extension (csrc/musa/):
  transfer kernels, GDS manager, layout transform, thread groups
- Add build_config.py and extend setup.py for conditional MUSA build
  (FLEXKV_USE_MUSA=1)
- Modify memory_handle.py, worker.py, allocator.py to use gpu_runtime
  for backend-agnostic stream/device/memory operations
- Update vLLM and TensorRT-LLM adapters for backend dispatch
- Add requirements-musa.txt and MUSA build/test documentation
- Add tests: gpu_backend_dispatch, gpu_runtime, musa_build, transfer_musa
@YconquestY
Collaborator

Hi @superleo. We appreciate your contribution :) Please wait a moment. We are designing an official abstraction and API for integrating various AI accelerators.

cc @linhu-nv @feiqiangs

2 participants