[CuTeDSL] Make editable installs use exact runtime companion wheels#3204

Open
alecco wants to merge 4 commits into NVIDIA:main from alecco:sm120-nvfp4-pr0-hygiene

@alecco alecco commented May 5, 2026

This PR tightens CuTe DSL editable-install hygiene and removes a few import hazards that can make local development and test collection depend on ambient Python environment state. The main change teaches prep_editable_install.py to resolve the generated runtime payload from the downloaded nvidia-cutlass-dsl wheel’s metadata instead of copying files from whatever compatible-looking package happens to be installed in site-packages.

What changed

  • Read the downloaded nvidia-cutlass-dsl wheel METADATA and extract exact-pinned nvidia-cutlass-dsl-libs-* companion requirements.

  • Download and extract the selected runtime companion wheel into the same temporary install workspace.

  • Support runtime provider selection through:

    • default provider: base
    • CUTLASS_DSL_RUNTIME_PROVIDER=cu13
    • matching package extras when present
  • Clean stale generated editable-install payload before copying new files:

    • cutlass/_mlir
    • lib/
    • copied py.typed markers
  • Validate that the editable runtime payload contains cutlass/_mlir, lib/, and at least one shared library before writing VERSION.EDITABLE.

  • Add .gitignore entries for generated CuTe DSL editable-install artifacts.

  • Replace the test-sharding fallback device_info import with CuTe DSL’s own CUDA runtime capability helper.

  • Remove import-time sys.path mutation from the base DSL compiler module.

  • Clean up ruff-visible unused imports / exception bindings.
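The metadata-driven resolution described in the first bullets can be sketched roughly as follows. This is a minimal illustration, not the code in prep_editable_install.py: the function names `exact_companion_pins` and `select_companion`, the sample METADATA text, and the `0.0.0` versions are all hypothetical, and the "matching package extras" path is left out.

```python
import re
from email.parser import Parser

# Matches only exact pins such as "nvidia-cutlass-dsl-libs-base==1.2.3",
# optionally followed by an environment marker after ";".
# Ranges ("==1.0,<2"), wildcards ("==1.*") and "===" are deliberately rejected.
_EXACT_PIN = re.compile(
    r"^\s*nvidia-cutlass-dsl-libs-(?P<provider>[a-z0-9]+)"
    r"\s*==\s*(?P<version>[^\s;,*=]+)\s*(?:;.*)?$"
)

# Hypothetical METADATA contents; names and versions are made up for the demo.
SAMPLE_METADATA = """\
Metadata-Version: 2.1
Name: nvidia-cutlass-dsl
Version: 0.0.0
Requires-Dist: numpy
Requires-Dist: nvidia-cutlass-dsl-libs-base==0.0.0
Requires-Dist: nvidia-cutlass-dsl-libs-cu13==0.0.0; extra == "cu13"
"""


def exact_companion_pins(metadata_text):
    """Map provider name -> exactly pinned version from Requires-Dist headers."""
    headers = Parser().parsestr(metadata_text)  # METADATA uses RFC 822 style headers
    pins = {}
    for requirement in headers.get_all("Requires-Dist") or []:
        match = _EXACT_PIN.match(requirement)
        if match:
            pins[match.group("provider")] = match.group("version")
    return pins


def select_companion(pins, environ):
    """Pick the companion requirement; the provider defaults to "base"."""
    provider = environ.get("CUTLASS_DSL_RUNTIME_PROVIDER") or "base"
    if provider not in pins:
        raise ValueError(f"no exact pin for runtime provider {provider!r}")
    return f"nvidia-cutlass-dsl-libs-{provider}=={pins[provider]}"
```

Passing `os.environ` as `environ` would give the environment-variable behavior listed above; taking the mapping as a parameter just keeps the sketch easy to test.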

Why

Recent nvidia-cutlass-dsl packaging can split the release into a metadata-only wheel and a generated runtime payload that ships in companion nvidia-cutlass-dsl-libs-* wheels. Copying generated Python and shared libraries from ambient site-packages is fragile because different runtime providers can install overlapping payload paths. This PR makes editable installs reproducible by sourcing the payload from the exact companion wheel declared by the downloaded DSL wheel.

It also prevents stale generated runtime files from surviving provider or version changes, and avoids test/import behavior that depends on unrelated packages installed in the developer environment.
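A rough sketch of the stale-payload cleanup, under stated assumptions: `clean_stale_payload` is a hypothetical helper, and because the PR text does not say where the copied py.typed markers live, the caller passes their relative paths explicitly.

```python
import shutil
from pathlib import Path

# Generated directories named in the PR description, relative to the
# editable-install root.
GENERATED_DIRS = ("cutlass/_mlir", "lib")


def clean_stale_payload(root, typed_markers=()):
    """Delete generated runtime state left by a previous run.

    Returns the relative paths that were actually removed, so the caller
    can log them. Anything not listed is left untouched.
    """
    root = Path(root)
    removed = []
    for rel in GENERATED_DIRS:
        target = root / rel
        if target.is_dir():
            shutil.rmtree(target)
            removed.append(rel)
    for rel in typed_markers:  # assumed: caller knows which markers it copied
        marker = root / rel
        if marker.is_file():
            marker.unlink()
            removed.append(rel)
    return removed
```

Removing whole directories rather than individual files is what keeps a file that only existed in the old provider or version from lingering after the new copy.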

Validation

  • python -m py_compile python/CuTeDSL/prep_editable_install.py
  • ruff check python/CuTeDSL/prep_editable_install.py
  • git diff --check
  • temp-only probe resolved default libs-base and explicit cu13 companion wheels
  • temp-only stale payload probe verified cleanup, fresh copy, and runtime validation
  • python -m py_compile test/utils/test_sharding.py
  • ruff check --select F401 test/utils/test_sharding.py

agent added 4 commits May 5, 2026 15:57
Teach prep_editable_install.py to handle metadata-only nvidia-cutlass-dsl wheels without copying generated runtime files from arbitrary ambient site-packages. Recent nvidia-cutlass-dsl wheels can declare exact nvidia-cutlass-dsl-libs-* companion packages for the generated cutlass._mlir Python payload and runtime shared libraries; mixing files from a different installed provider or version is invalid because libs-base and libs-cu13 install overlapping payload paths.

The script now reads the downloaded nvidia-cutlass-dsl wheel METADATA, extracts the exact Requires-Dist entries for nvidia-cutlass-dsl-libs-*, downloads the selected companion wheel into the same temporary directory, and copies from the extracted downloaded wheels only. The default runtime provider is base, with CUTLASS_DSL_RUNTIME_PROVIDER=cu13 or a matching package extra available for CUDA 13 payload selection.

Before copying, editable setup removes generated runtime state from previous runs: cutlass/_mlir, lib/, and copied py.typed markers. This makes the runtime payload replacement atomic enough for editable installs and prevents stale generated Python or shared libraries from surviving a version/provider change while VERSION.EDITABLE is updated.

After copying, setup validates that cutlass/_mlir, lib/, and at least one runtime shared library are present before writing VERSION.EDITABLE. Missing runtime payloads now fail hard instead of producing an editable install that cannot import cutlass._mlir.
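The validate-then-mark ordering could look something like the sketch below. The helper name `validate_and_mark`, the shared-library suffix list, and the assumption that VERSION.EDITABLE simply holds a version string are illustrative guesses, not details taken from the script.

```python
from pathlib import Path

REQUIRED_DIRS = ("cutlass/_mlir", "lib")
# Assumed suffixes; versioned names such as "libfoo.so.1" are also accepted.
SHARED_LIB_SUFFIXES = (".so", ".dylib", ".dll", ".pyd")


def _is_shared_library(path):
    return path.is_file() and (
        path.suffix in SHARED_LIB_SUFFIXES or ".so." in path.name
    )


def validate_and_mark(root, version):
    """Fail hard on an incomplete payload; write the marker only on success."""
    root = Path(root)
    missing = [rel for rel in REQUIRED_DIRS if not (root / rel).is_dir()]
    if missing:
        raise RuntimeError(f"editable payload is missing: {', '.join(missing)}")
    has_library = any(
        _is_shared_library(path)
        for rel in REQUIRED_DIRS
        for path in (root / rel).rglob("*")
    )
    if not has_library:
        raise RuntimeError("editable payload contains no runtime shared library")
    marker = root / "VERSION.EDITABLE"
    marker.write_text(version + "\n")  # assumed marker format
    return marker
```

Writing the marker last means a failed run never leaves a VERSION.EDITABLE that claims a payload which is not actually present.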

This keeps editable installs reproducible and aligned with the downloaded DSL wheel metadata while still supporting metadata-only packaging layouts. It also keeps the editable-install artifact ignores from the original change.

Validation: python -m py_compile python/CuTeDSL/prep_editable_install.py; ruff check python/CuTeDSL/prep_editable_install.py; git diff --check; temp-only probe resolved default libs-base and explicit cu13 companion wheels; temp-only stale payload probe verified cleanup, fresh copy, and runtime validation

Replace the pytest sharding fallback import of a top-level device_info.compute_capability module with cutlass.base_dsl.runtime.cuda.get_compute_capability_major_minor().

The previous import can resolve to unrelated third-party or environment-provided device_info packages; in our environment it resolved to an installed empty package and caused pytest collection to fail before any CuTe DSL tests ran. The CuTe DSL runtime helper already owns CUDA Driver API capability detection and is also used by the DSL environment manager, so using it here keeps test selection self-contained in the repo.
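The failure mode is easy to reproduce in isolation. The sketch below is a standalone demonstration, not code from the test suite: it creates an empty package under a hypothetical top-level name on `sys.path` and shows that importing a submodule from it raises `ModuleNotFoundError`, which is the kind of error that aborts pytest collection.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def import_fails_with_empty_package(top_level, submodule):
    """Return True if `top_level.submodule` fails to import when an empty
    package named `top_level` is the first match on sys.path."""
    with tempfile.TemporaryDirectory() as tmp:
        package_dir = Path(tmp) / top_level
        package_dir.mkdir()
        (package_dir / "__init__.py").write_text("")  # empty package
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            importlib.import_module(f"{top_level}.{submodule}")
        except ModuleNotFoundError:
            return True
        finally:
            # Undo the global interpreter state touched by the demo.
            sys.path.remove(tmp)
            for name in list(sys.modules):
                if name == top_level or name.startswith(top_level + "."):
                    del sys.modules[name]
        return False


# "ambient_device_info_demo" is a made-up stand-in for an unrelated package
# that happens to share the expected top-level name.
print(import_fails_with_empty_package("ambient_device_info_demo", "compute_capability"))
```

A package-qualified import from inside the repository avoids this because its resolution does not depend on which unrelated top-level names are installed.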

Validation: python -m py_compile test/utils/test_sharding.py; ruff check --select F401 test/utils/test_sharding.py.

Remove the base DSL compiler's import-time sys.path mutation. The module already imports _mlir through the package-relative path, so appending its own directory to sys.path is unnecessary global interpreter state.

While touching the compiler module, clean up two ruff-visible issues: avoid an unused CUDA exception binding and keep the TVM FFI availability check lint-clean while preserving the old importability semantics. enable_tvm_ffi now uses importlib.import_module("tvm_ffi") so a broken discoverable installation fails locally instead of passing a find_spec check and failing later.
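The difference between the two checks can be shown with a deliberately broken module. This is a self-contained sketch using made-up module names rather than the real tvm_ffi package: `importlib.util.find_spec` only locates a top-level module, while `importlib.import_module` executes it and therefore surfaces a broken installation immediately.

```python
import importlib
import importlib.util
import sys
import tempfile
from pathlib import Path


def compare_checks(module_name, source):
    """Return (find_spec_ok, import_ok) for a throwaway module with `source`."""
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / f"{module_name}.py").write_text(source)
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            # Locates the file but does not run the module body.
            found = importlib.util.find_spec(module_name) is not None
            try:
                importlib.import_module(module_name)  # runs the module body
                imported = True
            except Exception:
                imported = False
        finally:
            sys.path.remove(tmp)
            sys.modules.pop(module_name, None)
        return found, imported


# Hypothetical module that is discoverable but fails while importing.
BROKEN = 'raise ImportError("native library missing")\n'
print(compare_checks("broken_ffi_demo", BROKEN))
print(compare_checks("healthy_ffi_demo", "VALUE = 1\n"))
```

For the broken module the first value is true and the second is false, which is the gap the commit message describes: a `find_spec` check would pass and the failure would only appear later.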