[CuTeDSL] Make editable installs use exact runtime companion wheels by alecco · Pull Request #3204 · NVIDIA/cutlass

alecco · 2026-05-05T14:04:19Z

This PR tightens CuTe DSL editable-install hygiene and removes a few import hazards that can make local development and test collection depend on ambient Python environment state. The main change teaches prep_editable_install.py to resolve the generated runtime payload from the downloaded nvidia-cutlass-dsl wheel’s metadata instead of copying files from whatever compatible-looking package happens to be installed in site-packages.

What changed

- Read the downloaded nvidia-cutlass-dsl wheel METADATA and extract exact-pinned nvidia-cutlass-dsl-libs-* companion requirements.
- Download and extract the selected runtime companion wheel into the same temporary install workspace.
- Support runtime provider selection through:
  - default provider: base
  - CUTLASS_DSL_RUNTIME_PROVIDER=cu13
  - matching package extras when present
- Clean stale generated editable-install payload before copying new files:
  - cutlass/_mlir
  - lib/
  - copied py.typed markers
- Validate that the editable runtime payload contains cutlass/_mlir, lib/, and at least one shared library before writing VERSION.EDITABLE.
- Add .gitignore entries for generated CuTe DSL editable-install artifacts.
- Replace the test-sharding fallback device_info import with CuTe DSL’s own CUDA runtime capability helper.
- Remove import-time sys.path mutation from the base DSL compiler module.
- Clean up ruff-visible unused imports / exception bindings.
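
The metadata-driven resolution in the first items can be illustrated with a minimal sketch. It is not the code in prep_editable_install.py: the function names, the regular expression, and the way the provider is derived from the package-name suffix are all assumptions made for illustration. Only the nvidia-cutlass-dsl-libs-* naming, the exact-pin requirement, the base default, and the CUTLASS_DSL_RUNTIME_PROVIDER variable come from this PR.

```python
import os
import re
import zipfile
from email.parser import Parser

# Matches an exact pin such as "nvidia-cutlass-dsl-libs-base==X.Y.Z",
# optionally followed by an environment marker (e.g. '; extra == "cu13"').
# Hypothetical pattern: ranges and wildcards deliberately do not match.
_COMPANION_RE = re.compile(
    r"^(?P<name>nvidia-cutlass-dsl-libs-[A-Za-z0-9_.-]+)\s*"
    r"==\s*(?P<version>[A-Za-z0-9_.+!-]+)\s*"
    r"(?:;\s*(?P<marker>.*))?$"
)


def read_requires_dist(wheel_path):
    """Return the Requires-Dist values from a wheel's *.dist-info/METADATA."""
    with zipfile.ZipFile(wheel_path) as wheel:
        metadata_name = next(
            n for n in wheel.namelist() if n.endswith(".dist-info/METADATA")
        )
        text = wheel.read(metadata_name).decode("utf-8")
    # METADATA uses RFC 822 style headers, so the email parser can read it.
    return Parser().parsestr(text).get_all("Requires-Dist") or []


def exact_companion_pins(requires_dist):
    """Map a provider suffix (e.g. 'base', 'cu13') to (package, version)."""
    pins = {}
    for entry in requires_dist:
        match = _COMPANION_RE.match(entry.strip())
        if match:
            # Assumption: the provider is the last dash-separated name part.
            provider = match["name"].rsplit("-", 1)[-1]
            pins[provider] = (match["name"], match["version"])
    return pins


def select_companion(pins, environ=None):
    """Pick the companion pin for the requested provider, defaulting to base."""
    environ = os.environ if environ is None else environ
    provider = environ.get("CUTLASS_DSL_RUNTIME_PROVIDER", "base")
    if provider not in pins:
        raise RuntimeError(f"no exact companion pin for provider {provider!r}")
    return pins[provider]
```

The returned (package, version) pair would then be passed to a download step pinned with ==, so the payload never comes from whatever happens to be installed in site-packages.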

Why

Recent nvidia-cutlass-dsl packaging can split the metadata wheel from the generated runtime payload in companion nvidia-cutlass-dsl-libs-* wheels. Copying generated Python and shared libraries from ambient site-packages is fragile because different runtime providers can install overlapping payload paths. This PR makes editable installs reproducible by sourcing the payload from the exact companion wheel declared by the downloaded DSL wheel.

It also prevents stale generated runtime files from surviving provider or version changes, and avoids test/import behavior that depends on unrelated packages installed in the developer environment.

Validation

python -m py_compile python/CuTeDSL/prep_editable_install.py
ruff check python/CuTeDSL/prep_editable_install.py
git diff --check
temp-only probe resolved default libs-base and explicit cu13 companion wheels
temp-only stale payload probe verified cleanup, fresh copy, and runtime validation
python -m py_compile test/utils/test_sharding.py
ruff check --select F401 test/utils/test_sharding.py

Teach prep_editable_install.py to handle metadata-only nvidia-cutlass-dsl wheels without copying generated runtime files from arbitrary ambient site-packages.

Recent nvidia-cutlass-dsl wheels can declare exact nvidia-cutlass-dsl-libs-* companion packages for the generated cutlass._mlir Python payload and runtime shared libraries; mixing files from a different installed provider or version is invalid because libs-base and libs-cu13 install overlapping payload paths.

The script now reads the downloaded nvidia-cutlass-dsl wheel METADATA, extracts the exact Requires-Dist entries for nvidia-cutlass-dsl-libs-*, downloads the selected companion wheel into the same temporary directory, and copies from the extracted downloaded wheels only. The default runtime provider is base, with CUTLASS_DSL_RUNTIME_PROVIDER=cu13 or a matching package extra available for CUDA 13 payload selection.

Before copying, editable setup removes generated runtime state from previous runs: cutlass/_mlir, lib/, and copied py.typed markers. This makes the runtime payload replacement atomic enough for editable installs and prevents stale generated Python or shared libraries from surviving a version/provider change while VERSION.EDITABLE is updated.

After copying, setup validates that cutlass/_mlir, lib/, and at least one runtime shared library are present before writing VERSION.EDITABLE. Missing runtime payloads now fail hard instead of producing an editable install that cannot import cutlass._mlir.

This keeps editable installs reproducible and aligned with the downloaded DSL wheel metadata while still supporting metadata-only packaging layouts. It also keeps the editable-install artifact ignores from the original change.

Validation: python -m py_compile python/CuTeDSL/prep_editable_install.py; ruff check python/CuTeDSL/prep_editable_install.py; git diff --check; temp-only probe resolved default libs-base and explicit cu13 companion wheels; temp-only stale payload probe verified cleanup, fresh copy, and runtime validation
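
The clean-then-validate ordering described in this commit message might look roughly like the sketch below. The helper names, the shared-library suffix list, and the choice to search both generated directories are assumptions for illustration. Removal of copied py.typed markers is left out because the set of copied markers is not spelled out here.

```python
import shutil
from pathlib import Path

# Generated payload directories named in this PR, relative to the package root.
GENERATED_DIRS = ("cutlass/_mlir", "lib")
# Assumed suffixes for "shared library"; the real script may check differently.
SHARED_LIB_SUFFIXES = (".so", ".dylib", ".dll")


def clean_stale_payload(package_root):
    """Delete generated payload left over from a previous editable setup."""
    root = Path(package_root)
    for rel in GENERATED_DIRS:
        shutil.rmtree(root / rel, ignore_errors=True)


def _is_shared_library(path):
    # Also accept versioned names such as libfoo.so.1 on Linux.
    return path.is_file() and (
        path.suffix in SHARED_LIB_SUFFIXES or ".so." in path.name
    )


def validate_payload(package_root):
    """Raise RuntimeError unless the copied payload looks importable."""
    root = Path(package_root)
    missing = [rel for rel in GENERATED_DIRS if not (root / rel).is_dir()]
    if missing:
        raise RuntimeError(f"missing generated payload: {', '.join(missing)}")
    found = any(
        _is_shared_library(p)
        for rel in GENERATED_DIRS
        for p in (root / rel).rglob("*")
    )
    if not found:
        raise RuntimeError("no runtime shared library found in payload")
```

In this sketch a caller would run clean_stale_payload, copy the freshly extracted files, call validate_payload, and only then write VERSION.EDITABLE, so a failed copy never leaves a marker that claims a usable install.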

Replace the pytest sharding fallback import of a top-level device_info.compute_capability module with cutlass.base_dsl.runtime.cuda.get_compute_capability_major_minor(). The previous import can resolve to unrelated third-party or environment-provided device_info packages; in our environment it resolved to an installed empty package and caused pytest collection to fail before any CuTe DSL tests ran. The CuTe DSL runtime helper already owns CUDA Driver API capability detection and is also used by the DSL environment manager, so using it here keeps test selection self-contained in the repo. Validation: python -m py_compile test/utils/test_sharding.py; ruff check --select F401 test/utils/test_sharding.py.

Remove the base DSL compiler's import-time sys.path mutation. The module already imports _mlir through the package-relative path, so appending its own directory to sys.path is unnecessary global interpreter state. While touching the compiler module, clean up two ruff-visible issues: avoid an unused CUDA exception binding and keep the TVM FFI availability check lint-clean while preserving the old importability semantics. enable_tvm_ffi now uses importlib.import_module("tvm_ffi") so a broken discoverable installation fails locally instead of passing a find_spec check and failing later.
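
The difference between a find_spec check and an actual import can be shown with a small sketch. The two helper names are hypothetical and this is not the body of enable_tvm_ffi; it only illustrates why importing is the stricter test.

```python
import importlib
import importlib.util


def is_discoverable(name):
    """True if the import system can locate the module, without running it."""
    return importlib.util.find_spec(name) is not None


def is_importable(name):
    """True only if the module can actually be imported right now."""
    try:
        importlib.import_module(name)
    except Exception:
        # A broken installation may raise ImportError, OSError (for example a
        # missing shared library), or something else during module execution.
        return False
    return True
```

A package whose top-level module raises at import time is discoverable but not importable, which is the case the import_module-based check catches at the call site instead of later.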

agent added 4 commits May 5, 2026 15:57

[CuTeDSL] clean up unused import

f233380

alecco wants to merge 4 commits into NVIDIA:main from alecco:sm120-nvfp4-pr0-hygiene