
[Proposal] Universal wheel: runtime CUDA/ROCm detection to eliminate separate builds #68

@wjabbour


Problem

Today fastsafetensors ships two separate builds — one for CUDA, one for ROCm. This means:

  • ROCm users need a separate index URL or a git reference to get the right wheel
  • Downstream projects like vLLM have to special-case fastsafetensors in their ROCm packaging
  • ROCm ends up as a second-class citizen requiring extra install steps that CUDA users never see

There is no fundamental reason this has to be the case.

Observation

The C++ extension already loads the GPU runtime entirely at runtime via dlopen(); nothing is linked against CUDA or ROCm at build time. The only reason two separate builds exist today is that the names the loader asks for differ between the two platforms: the library passed to dlopen() (libcudart.so vs. libamdhip64.so) and the symbols passed to dlsym() (for example "cudaMemcpy" vs. "hipMemcpy"). Those strings are currently baked in at compile time.

Proposal

Move the CUDA/ROCm selection from compile time to runtime:

  1. load_library_functions() tries dlopen("libcudart.so") first, then falls back to dlopen("libamdhip64.so")
  2. Both sets of symbol name strings are compiled into the binary
  3. At runtime, whichever library loads successfully determines which symbol names are used

The result is a single universal wheel that works on both CUDA and ROCm systems with no user configuration. One PyPI entry, no extra index URL, no special-casing in downstream projects.

Relationship to PR #67

PR #67 lays the groundwork by moving symbol names into cuda_compat.h as GPU_SYM_* macros. The runtime detection idea builds naturally on top of that — instead of selecting CUDA or ROCm symbol names at compile time via #ifdef, load_library_functions() selects them at runtime based on which library is present.

Impact

  • Single wheel on PyPI works for all users
  • vLLM and other downstream projects drop ROCm-specific fastsafetensors packaging entirely
  • No behavior change for existing CUDA or ROCm users
