Commit 5efdadc

z52527 and claude committed
fix(docker): bypass nvcr base-image poisoned cache for cutlass-dsl
nvcr.io/nvidia/pytorch:26.02-py3's pre-populated pip cache contains an nvcr-built nvidia-cutlass-dsl-libs-base==4.4.1 wheel whose cute/arch/__init__.py is 9 bytes shorter than the one in PyPI's public 4.4.1 wheel and omits the top-level ProxyKind / SharedSpace re-export that flash_attn.cute requires. Plain `pip install 'nvidia-cutlass-dsl[cu13]==4.4.1'` hits the bad cached wheel via pip's extra-resolution code path, even with --no-cache-dir.

Switch to --no-deps plus the three cutlass-dsl subpackages spelled out explicitly; that routes pip through the simpler explicit-args install path, where the cache trap doesn't apply. Re-pin all three subpackages on the bundled `pip install` too, otherwise other packages' deps (quack-kernels, apache-tvm-ffi) cascade and bump cutlass-dsl to a mismatched newer minor.

The verify line `python -c "from cutlass.cute.arch import ProxyKind, SharedSpace"` fails the build immediately if the upgrade ever stops taking effect.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Runchu Zhao <zhaorunchu@gmail.com>
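The size difference described above can be checked without installing either wheel. The following is a minimal sketch, not part of the commit: it assumes both wheels have been saved locally (the file names in the usage note are hypothetical) and matches the module by path suffix, because the commit message does not give its full path inside the wheel. The symbol check is a plain substring search, so it is only a heuristic.

```python
import zipfile

# Path suffix of the module named in the commit message; the full path
# inside the wheel is not given there, so we match on the suffix only.
MEMBER_SUFFIX = "cute/arch/__init__.py"


def member_sizes(wheel_path, suffix=MEMBER_SUFFIX):
    """Return {archive_name: uncompressed_size} for members ending in suffix."""
    with zipfile.ZipFile(wheel_path) as wheel:
        return {
            info.filename: info.file_size
            for info in wheel.infolist()
            if info.filename.endswith(suffix)
        }


def mentions_symbols(wheel_path, symbols=("ProxyKind", "SharedSpace"),
                     suffix=MEMBER_SUFFIX):
    """Heuristic: True if every symbol name occurs in the first matching member."""
    with zipfile.ZipFile(wheel_path) as wheel:
        for info in wheel.infolist():
            if info.filename.endswith(suffix):
                text = wheel.read(info).decode("utf-8", errors="replace")
                return all(name in text for name in symbols)
    return False  # no matching member found in this wheel
```

Calling `member_sizes("cached.whl")` and `member_sizes("pypi.whl")` (hypothetical file names) and comparing the results would surface a byte-count difference like the one reported, and `mentions_symbols` gives a quick hint as to whether the re-export is present.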
1 parent 2c996e5 commit 5efdadc

1 file changed

Lines changed: 12 additions & 11 deletions

docker/Dockerfile

@@ -39,21 +39,22 @@ RUN if [ "${TRITONSERVER_BUILD}" = "1" ]; then \
 # Megatron-LM core_v0.13.1: contains d9608004f which gates
 # ChainedOptimizer.count_zeros_fp32 on log_num_zeros_in_grad (~4 ms/step
 # saving on HSTU bf16); 0.12.x always pays the cost.
-#
-# nvidia-cutlass-dsl: base image ships 4.3.0 with a .pth that survives
-# pip uninstall, so we rm -rf the tree before installing 4.4.x (which
-# adds ProxyKind / SharedSpace, required by flash_attn.cute). The final
-# `python -c "from cutlass.cute.arch ..."` fails the build immediately
-# if the upgrade ever stops taking effect.
+# nvidia-cutlass-dsl: --no-deps + --no-cache-dir + 3 explicit subpackages
+# avoids a poisoned base-image pip cache wheel; keep the re-pins on the
+# bundled install too.
 RUN pip uninstall -y nvidia-cutlass-dsl nvidia-cutlass-dsl-libs-base nvidia-cutlass-dsl-libs-cu13 || true && \
     rm -rf /usr/local/lib/python3.12/dist-packages/nvidia_cutlass_dsl* && \
+    pip install --no-cache-dir --no-deps \
+    'nvidia-cutlass-dsl==4.4.1' \
+    'nvidia-cutlass-dsl-libs-base==4.4.1' \
+    'nvidia-cutlass-dsl-libs-cu13==4.4.1' && \
+    python -c "from cutlass.cute.arch import ProxyKind, SharedSpace" && \
     git clone -b core_v0.13.1 https://github.com/NVIDIA/Megatron-LM.git megatron-lm && \
     pip install --no-deps -e ./megatron-lm && \
-    pip install torchx gin-config torchmetrics==1.0.3 typing-extensions iopath pyvers \
-    cloudpickle triton==3.6.0 'nvidia-cutlass-dsl[cu13]==4.4.1' \
-    'quack-kernels>=0.3.3' 'apache-tvm-ffi>=0.1.6' torch-c-dlpack-ext \
-    --no-cache pre-commit && \
-    python -c "from cutlass.cute.arch import ProxyKind, SharedSpace"
+    pip install --no-cache-dir torchx gin-config torchmetrics==1.0.3 typing-extensions iopath pyvers \
+    cloudpickle triton==3.6.0 \
+    'nvidia-cutlass-dsl==4.4.1' 'nvidia-cutlass-dsl-libs-base==4.4.1' 'nvidia-cutlass-dsl-libs-cu13==4.4.1' \
+    'quack-kernels>=0.3.3' 'apache-tvm-ffi>=0.1.6' torch-c-dlpack-ext pre-commit

 # -- Layer 3: FBGEMM (long build, own layer for caching) ---
 RUN pip install --no-cache-dir setuptools-git-versioning scikit-build && \
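
The in-build verify line only checks that two symbols import. A complementary check, sketched here as an assumption rather than something the commit does, is to confirm that the three pinned distributions are still at 4.4.1 after the bundled install, so a cascade from another package's dependencies is caught by version as well. The helper name `find_pin_mismatches` and its `get_version` hook are hypothetical; `importlib.metadata.version` is the standard-library lookup.

```python
from importlib import metadata

# Pins copied from the Dockerfile change in this commit.
PINS = {
    "nvidia-cutlass-dsl": "4.4.1",
    "nvidia-cutlass-dsl-libs-base": "4.4.1",
    "nvidia-cutlass-dsl-libs-cu13": "4.4.1",
}


def find_pin_mismatches(pins=PINS, get_version=metadata.version):
    """Return {name: installed_version_or_None} for every unsatisfied pin.

    get_version is injectable so the check can be exercised without the
    packages installed; by default it queries the current environment.
    """
    mismatches = {}
    for name, wanted in pins.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            installed = None  # distribution is not installed at all
        if installed != wanted:
            mismatches[name] = installed
    return mismatches
```

An empty result means all three pins hold. A build step could call `find_pin_mismatches()` after the last `pip install` and exit non-zero when the returned dict is non-empty.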
