Commit 5efdadc
fix(docker): bypass nvcr base-image poisoned cache for cutlass-dsl

nvcr.io/nvidia/pytorch:26.02-py3's pre-populated pip cache contains
an nvcr-built nvidia-cutlass-dsl-libs-base==4.4.1 wheel whose
cute/arch/__init__.py is 9 bytes shorter than PyPI's public 4.4.1
wheel and omits the top-level ProxyKind / SharedSpace re-export
that flash_attn.cute requires. Plain `pip install
'nvidia-cutlass-dsl[cu13]==4.4.1'` hits the bad cached wheel via
pip's extra-resolution code path, even with --no-cache-dir.

Switch to --no-deps with the three cutlass-dsl subpackages spelled
out explicitly; that routes pip through the simpler explicit-args
install path, where the cache trap doesn't apply. Re-pin all three
subpackages on the bundled `pip install` too; otherwise other
packages' deps (quack-kernels, apache-tvm-ffi) cascade and bump
cutlass-dsl to a mismatched newer minor version.
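
The re-pinning concern above could also be checked after installation. The following is a minimal sketch, not part of this commit: `find_pin_mismatches` and `CUTLASS_PINS` are hypothetical names, and only the two distribution names that appear in this message are listed (the third subpackage is not named here, so it is left out rather than guessed).

```python
from importlib.metadata import PackageNotFoundError, version

# Hypothetical pin table. Only distribution names mentioned in the commit
# message are included; the third subpackage would need to be added by hand.
CUTLASS_PINS = {
    "nvidia-cutlass-dsl": "4.4.1",
    "nvidia-cutlass-dsl-libs-base": "4.4.1",
}


def find_pin_mismatches(pins):
    """Return {name: (expected, installed)} for every pin that is not met.

    `installed` is None when the distribution is not installed at all.
    """
    mismatches = {}
    for name, expected in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches
```

Calling `find_pin_mismatches(CUTLASS_PINS)` after the bundled `pip install` would return an empty dict when nothing was bumped, and a non-empty dict naming each drifted package otherwise.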
The verify line `python -c "from cutlass.cute.arch import
ProxyKind, SharedSpace"` fails the build fast if the upgrade
ever stops taking effect.
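
For illustration, the same fail-fast check can be written as a small helper that reports every missing name at once. This is a sketch under assumptions: `missing_reexports` and `verify_reexports` are hypothetical helpers, not part of this commit, and they test module attributes with `hasattr`, which is slightly stricter than a `from ... import ...` statement (the latter can also fall back to importing a submodule).

```python
import importlib


def missing_reexports(module_name, names):
    """Return the names that the imported module does not expose as attributes."""
    module = importlib.import_module(module_name)
    return [name for name in names if not hasattr(module, name)]


def verify_reexports(module_name, names):
    """Raise ImportError naming every missing attribute; return None if all exist."""
    missing = missing_reexports(module_name, names)
    if missing:
        raise ImportError(
            f"{module_name} is missing re-exports: {', '.join(missing)}"
        )
```

In the image build, `verify_reexports("cutlass.cute.arch", ["ProxyKind", "SharedSpace"])` would raise, and so fail the build step, whenever the nvcr-built wheel is the one that ended up installed.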
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Runchu Zhao <zhaorunchu@gmail.com>

1 parent 2c996e5 commit 5efdadc
1 file changed
Lines changed: 12 additions & 11 deletions
[Diff content not preserved. The hunk spans original lines 39-59 (new lines 39-60): original lines 42-47 and 52-56 were removed; new lines 42-44, 47-51, and 54-57 were added.]