[CuTeDSL] Make editable installs use exact runtime companion wheels #3204
Open
alecco wants to merge 4 commits into
added 4 commits
May 5, 2026 15:57
Teach prep_editable_install.py to handle metadata-only nvidia-cutlass-dsl wheels without copying generated runtime files from arbitrary ambient site-packages.

Recent nvidia-cutlass-dsl wheels can declare exact nvidia-cutlass-dsl-libs-* companion packages for the generated cutlass._mlir Python payload and runtime shared libraries; mixing files from a different installed provider or version is invalid because libs-base and libs-cu13 install overlapping payload paths.

The script now reads the downloaded nvidia-cutlass-dsl wheel METADATA, extracts the exact Requires-Dist entries for nvidia-cutlass-dsl-libs-*, downloads the selected companion wheel into the same temporary directory, and copies from the extracted downloaded wheels only. The default runtime provider is base, with CUTLASS_DSL_RUNTIME_PROVIDER=cu13 or a matching package extra available for CUDA 13 payload selection.

Before copying, editable setup removes generated runtime state from previous runs: cutlass/_mlir, lib/, and copied py.typed markers. This makes the runtime payload replacement atomic enough for editable installs and prevents stale generated Python or shared libraries from surviving a version/provider change while VERSION.EDITABLE is updated.

After copying, setup validates that cutlass/_mlir, lib/, and at least one runtime shared library are present before writing VERSION.EDITABLE. Missing runtime payloads now fail hard instead of producing an editable install that cannot import cutlass._mlir.

This keeps editable installs reproducible and aligned with the downloaded DSL wheel metadata while still supporting metadata-only packaging layouts. It also keeps the editable-install artifact ignores from the original change.
Validation: python -m py_compile python/CuTeDSL/prep_editable_install.py; ruff check python/CuTeDSL/prep_editable_install.py; git diff --check; temp-only probe resolved default libs-base and explicit cu13 companion wheels; temp-only stale payload probe verified cleanup, fresh copy, and runtime validation
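As a rough illustration of the metadata step described in this commit, the sketch below reads `Requires-Dist` entries from a wheel's `METADATA` and picks the exact-pinned companion for a provider. It is not the code in prep_editable_install.py: the function names, the regular expression, and the assumption that each companion is declared as `nvidia-cutlass-dsl-libs-<provider>==<version>` (optionally followed by a marker) are hypothetical. Only the package prefix and the `CUTLASS_DSL_RUNTIME_PROVIDER` variable come from the commit message.

```python
import os
import re
import zipfile
from email.parser import Parser

LIBS_PREFIX = "nvidia-cutlass-dsl-libs-"

# Assumed requirement shape: "<name>==<version>" or "<name> (==<version>)",
# optionally followed by an environment marker after ";".
_EXACT_PIN = re.compile(
    r"^\s*(nvidia-cutlass-dsl-libs-[A-Za-z0-9._-]+)\s*\(?\s*==\s*([A-Za-z0-9.+!_-]+)"
)


def read_requires_dist(wheel_file):
    """Return every Requires-Dist value from the wheel's *.dist-info/METADATA."""
    with zipfile.ZipFile(wheel_file) as zf:
        name = next(n for n in zf.namelist() if n.endswith(".dist-info/METADATA"))
        metadata = Parser().parsestr(zf.read(name).decode("utf-8"))
    return metadata.get_all("Requires-Dist") or []


def companion_pins(requires_dist):
    """Map provider name (for example 'base') to its exact 'name==version' pin."""
    pins = {}
    for requirement in requires_dist:
        match = _EXACT_PIN.match(requirement)
        if match:
            provider = match.group(1)[len(LIBS_PREFIX):]
            pins[provider] = f"{match.group(1)}=={match.group(2)}"
    return pins


def select_companion(pins, provider=None):
    """Pick the pin for the requested provider; the default provider is 'base'."""
    provider = provider or os.environ.get("CUTLASS_DSL_RUNTIME_PROVIDER", "base")
    if provider not in pins:
        raise RuntimeError(f"no exact companion pin for provider {provider!r}")
    return pins[provider]
```

The selected pin could then be handed to a downloader such as `pip download --no-deps` targeting the same temporary directory. Failing when no pin matches, rather than falling back to an installed package, mirrors the commit's goal of never mixing providers.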
Replace the pytest sharding fallback import of a top-level device_info.compute_capability module with cutlass.base_dsl.runtime.cuda.get_compute_capability_major_minor(). The previous import can resolve to unrelated third-party or environment-provided device_info packages; in our environment it resolved to an installed empty package and caused pytest collection to fail before any CuTe DSL tests ran. The CuTe DSL runtime helper already owns CUDA Driver API capability detection and is also used by the DSL environment manager, so using it here keeps test selection self-contained in the repo. Validation: python -m py_compile test/utils/test_sharding.py; ruff check --select F401 test/utils/test_sharding.py.
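The failure mode described above can be reproduced in isolation. The sketch below is illustrative only and is not taken from test/utils/test_sharding.py: the helper name is hypothetical. It places an empty top-level package on `sys.path` and shows that importing a submodule from it fails, which is what happens when an unrelated `device_info` package wins name resolution.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def import_from_empty_package(package, submodule):
    """Shadow `package` with an empty package, then try `package.submodule`.

    Returns the name of the raised exception type, or None if the import
    unexpectedly succeeds.
    """
    with tempfile.TemporaryDirectory() as tmp:
        pkg_dir = Path(tmp) / package
        pkg_dir.mkdir()
        (pkg_dir / "__init__.py").write_text("")  # empty package, no submodules
        sys.path.insert(0, tmp)  # the shadowing directory is searched first
        importlib.invalidate_caches()
        try:
            importlib.import_module(f"{package}.{submodule}")
            return None
        except ImportError as exc:
            return type(exc).__name__
        finally:
            # Undo the global state changes so the demo leaves no trace.
            sys.path.remove(tmp)
            sys.modules.pop(package, None)
```

Calling `import_from_empty_package("device_info", "compute_capability")` in an environment that has not already imported a `device_info` module returns `"ModuleNotFoundError"`. A package-qualified import such as the `cutlass.base_dsl.runtime.cuda` helper named in the commit is not exposed to this kind of top-level name collision.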
Remove the base DSL compiler's import-time sys.path mutation. The module already imports _mlir through the package-relative path, so appending its own directory to sys.path is unnecessary global interpreter state.
While touching the compiler module, clean up two ruff-visible issues: avoid an unused CUDA exception binding and keep the TVM FFI availability check lint-clean while preserving the old importability semantics. enable_tvm_ffi now uses importlib.import_module("tvm_ffi") so a broken discoverable installation fails locally instead of passing a find_spec check and failing later.
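The difference between the two availability checks can be sketched as follows. This is an illustration under assumptions, not the body of enable_tvm_ffi: both function names are hypothetical, and wrapping the failure in `RuntimeError` is a choice made for the example.

```python
import importlib
import importlib.util


def spec_says_available(name):
    """Discovery-only check: finds a top-level module without executing it."""
    return importlib.util.find_spec(name) is not None


def check_importable(name):
    """Actually import `name`, so a broken installation fails at this call."""
    try:
        return importlib.import_module(name)
    except Exception as exc:  # broken installs may raise more than ImportError
        raise RuntimeError(f"cannot import {name!r}: {exc}") from exc
```

For a module that is on the search path but raises while loading (for example because a shared library is missing), `spec_says_available` returns `True` while `check_importable` raises immediately. That is the behavior the commit message attributes to switching to `importlib.import_module("tvm_ffi")`.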
This PR tightens CuTe DSL editable-install hygiene and removes a few import hazards that can make local development and test collection depend on ambient Python environment state. The main change teaches `prep_editable_install.py` to resolve the generated runtime payload from the downloaded `nvidia-cutlass-dsl` wheel’s metadata instead of copying files from whatever compatible-looking package happens to be installed in `site-packages`.

What changed

- Read the downloaded `nvidia-cutlass-dsl` wheel `METADATA` and extract exact-pinned `nvidia-cutlass-dsl-libs-*` companion requirements.
- Download and extract the selected runtime companion wheel into the same temporary install workspace.
- Support runtime provider selection through:
  - `base` (the default)
  - `CUTLASS_DSL_RUNTIME_PROVIDER=cu13`
- Clean stale generated editable-install payload before copying new files:
  - `cutlass/_mlir`
  - `lib/`
  - `py.typed` markers
- Validate that the editable runtime payload contains `cutlass/_mlir`, `lib/`, and at least one shared library before writing `VERSION.EDITABLE`.
- Add `.gitignore` entries for generated CuTe DSL editable-install artifacts.
- Replace the test-sharding fallback `device_info` import with CuTe DSL’s own CUDA runtime capability helper.
- Remove import-time `sys.path` mutation from the base DSL compiler module.
- Clean up ruff-visible unused imports / exception bindings.
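The clean-then-validate sequence in the list above could look roughly like the sketch below. It is a minimal illustration, not the PR's implementation: the function names, the `copied_markers` parameter, the set of shared-library suffixes, the choice to look for libraries under `lib/`, and the content written to `VERSION.EDITABLE` are all assumptions.

```python
import shutil
from pathlib import Path

# Assumed suffixes; versioned names such as libfoo.so.1 are handled separately.
SHARED_LIB_SUFFIXES = (".so", ".dll", ".dylib")


def clean_stale_payload(root, copied_markers=()):
    """Remove generated runtime state left behind by a previous run."""
    shutil.rmtree(root / "cutlass" / "_mlir", ignore_errors=True)
    shutil.rmtree(root / "lib", ignore_errors=True)
    for marker in copied_markers:  # hypothetical list of copied py.typed files
        Path(marker).unlink(missing_ok=True)


def has_shared_library(lib_dir):
    """True if lib_dir contains at least one file that looks like a shared library."""
    return any(
        p.is_file() and (p.suffix in SHARED_LIB_SUFFIXES or ".so." in p.name)
        for p in lib_dir.rglob("*")
    )


def validate_and_stamp(root, version):
    """Fail hard on an incomplete payload; write VERSION.EDITABLE only on success."""
    mlir_dir = root / "cutlass" / "_mlir"
    lib_dir = root / "lib"
    if not mlir_dir.is_dir():
        raise RuntimeError(f"missing generated package: {mlir_dir}")
    if not lib_dir.is_dir():
        raise RuntimeError(f"missing runtime library directory: {lib_dir}")
    if not has_shared_library(lib_dir):
        raise RuntimeError(f"no shared library found under {lib_dir}")
    (root / "VERSION.EDITABLE").write_text(version + "\n")
```

Writing the version stamp last means an interrupted or failed run does not leave a stamp that claims a complete payload.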
Why
Recent `nvidia-cutlass-dsl` packaging can split the metadata wheel from the generated runtime payload in companion `nvidia-cutlass-dsl-libs-*` wheels. Copying generated Python and shared libraries from ambient `site-packages` is fragile because different runtime providers can install overlapping payload paths. This PR makes editable installs reproducible by sourcing the payload from the exact companion wheel declared by the downloaded DSL wheel.

It also prevents stale generated runtime files from surviving provider or version changes, and avoids test/import behavior that depends on unrelated packages installed in the developer environment.
Validation
- `python -m py_compile python/CuTeDSL/prep_editable_install.py`
- `ruff check python/CuTeDSL/prep_editable_install.py`
- `git diff --check`
- temp-only probe resolved default `libs-base` and explicit `cu13` companion wheels
- `python -m py_compile test/utils/test_sharding.py`
- `ruff check --select F401 test/utils/test_sharding.py`