ci(vllm-tensorizer): Separate BuildKit cache slots per matrix variant #164

Merged
JustinPerlman merged 2 commits into main from jperlman/matrix-sccache-fix
May 20, 2026

@JustinPerlman
Contributor

Summary

Pass cache-key: ${{ matrix.tag-suffix }} to the build workflow so each cuda variant of the vllm-tensorizer matrix gets its own BuildKit registry cache slot, instead of fighting over a single shared one.

The problem

Observed in build: the cuda13.2 variant builds in ~3 min while cuda12.9 builds take ~2 hours, even after multiple commits in-PR and with sccache reporting 100% hit rate on cuda12.9.

Root cause: build.yml computes the BuildKit registry cache reference as ${arch}-${image-name}[-${cache-key}]. We weren't passing cache-key, so both matrix variants pushed to and pulled from the same slot: amd64-vllm-tensorizer.
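
The collision can be reproduced with a small Python sketch of the naming scheme. `cache_ref` is a hypothetical stand-in for the logic in build.yml, which isn't shown here; only the `${arch}-${image-name}[-${cache-key}]` format is taken from the description above.

```python
from typing import Optional


def cache_ref(arch: str, image_name: str, cache_key: Optional[str] = None) -> str:
    """Hypothetical model of the tag format ${arch}-${image-name}[-${cache-key}]."""
    ref = f"{arch}-{image_name}"
    if cache_key:
        # The suffix is only appended when a cache-key is supplied.
        ref += f"-{cache_key}"
    return ref


# Neither matrix variant passes a cache-key, so both resolve to one slot.
cuda13_ref = cache_ref("amd64", "vllm-tensorizer")
cuda12_ref = cache_ref("amd64", "vllm-tensorizer")
print(cuda13_ref, cuda12_ref)
```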

On each matrix run, both variants pulled the most recently pushed cache. That cache's builder-base layer was built FROM the other cuda's base image, so the digest didn't match and BuildKit had to rebuild every layer from scratch. Once nvcc was invoked, sccache found everything in S3 (hence the 100% hit rate), but fetching tens of thousands of .o files one-by-one from S3 still takes hours.

The fix

cache-key: ${{ matrix.tag-suffix }}

Yields per-variant cache refs:
- ${registry}/buildcache:amd64-vllm-tensorizer-v0.21.0-cuda13.2.1-ubuntu24.04
- ${registry}/buildcache:amd64-vllm-tensorizer-v0.21.0-cuda12.9.1-ubuntu24.04
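
As a rough check that the new key separates the slots, the Python sketch below builds one ref per matrix entry. The `REGISTRY` value and the `buildcache_ref` helper are placeholders for illustration, not names from build.yml; the tag suffixes are the two from this PR.

```python
REGISTRY = "registry.example"  # placeholder; the real value comes from ${registry}

TAG_SUFFIXES = [
    "v0.21.0-cuda13.2.1-ubuntu24.04",
    "v0.21.0-cuda12.9.1-ubuntu24.04",
]


def buildcache_ref(registry: str, arch: str, image_name: str, cache_key: str) -> str:
    """Hypothetical helper: full cache ref with the cache-key suffix applied."""
    return f"{registry}/buildcache:{arch}-{image_name}-{cache_key}"


# With cache-key set to the tag suffix, each variant gets its own ref.
refs = [buildcache_ref(REGISTRY, "amd64", "vllm-tensorizer", s) for s in TAG_SUFFIXES]
for ref in refs:
    print(ref)
```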

After merge, the first cuda12.9 build is still slow (cold BuildKit slot), but subsequent commits that don't touch the builder stages will hit the per-variant slot and finish in minutes like cuda13.2 does today.

@JustinPerlman JustinPerlman requested a review from Eta0 May 20, 2026 17:53
@JustinPerlman JustinPerlman self-assigned this May 20, 2026
@JustinPerlman JustinPerlman requested a review from a team as a code owner May 20, 2026 17:53
Comment thread .github/workflows/vllm-tensorizer.yml Outdated
image-name: vllm-tensorizer
folder: vllm-tensorizer
tag-suffix: ${{ matrix.tag-suffix }}
cache-key: ${{ matrix.tag-suffix }}
Collaborator

The cache key should be the CUDA/Ubuntu version; it shouldn't include the vLLM commit (which is included in the tag-suffix). You won't be building from multiple vLLM versions in a matrix, and if you go forward, you are more than likely not going back, so you would want to use what you can from the last vLLM commit's build's cache, and then go from there.
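
One way to get such a key, sketched in Python, is to drop the vLLM version prefix from the tag suffix. This assumes every suffix has the form `<vllm-version>-cuda<...>-ubuntu<...>`, as the two in this PR do. `stable_cache_key` is a hypothetical helper, and the workflow could equally define the key as its own matrix field; how the follow-up commit handled it isn't shown in this thread.

```python
def stable_cache_key(tag_suffix: str) -> str:
    """Hypothetical helper: keep the CUDA/Ubuntu part, drop the vLLM version.

    Assumes the suffix contains a 'cuda...' segment after the vLLM version.
    """
    idx = tag_suffix.find("cuda")
    if idx == -1:
        raise ValueError(f"no 'cuda' segment in tag suffix: {tag_suffix!r}")
    return tag_suffix[idx:]


key = stable_cache_key("v0.21.0-cuda12.9.1-ubuntu24.04")
print(key)
```

With a key like this, a later vLLM bump would keep reading from and writing to the same per-CUDA slot instead of starting from a cold one.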

@github-actions

@JustinPerlman Build complete, success: https://github.com/coreweave/ml-containers/actions/runs/26179428631
Image: ghcr.io/coreweave/ml-containers/vllm-tensorizer:jperlman-matrix-sccache-fix-7d94427-v0.21.0-cuda13.2.1-ubuntu24.04

@github-actions

@JustinPerlman Build complete, success: https://github.com/coreweave/ml-containers/actions/runs/26179428631
Image: ghcr.io/coreweave/ml-containers/vllm-tensorizer:jperlman-matrix-sccache-fix-7d94427-v0.21.0-cuda12.9.1-ubuntu24.04

@github-actions

@JustinPerlman Build complete, success: https://github.com/coreweave/ml-containers/actions/runs/26180903764
Image: ghcr.io/coreweave/ml-containers/vllm-tensorizer:jperlman-matrix-sccache-fix-45e7639-v0.21.0-cuda13.2.1-ubuntu24.04

@JustinPerlman JustinPerlman merged commit b2d0717 into main May 20, 2026
8 checks passed
@JustinPerlman JustinPerlman deleted the jperlman/matrix-sccache-fix branch May 20, 2026 20:18

2 participants