ci(vllm-tensorizer): Separate BuildKit cache slots per matrix variant by JustinPerlman · Pull Request #164 · coreweave/ml-containers

JustinPerlman · 2026-05-20T17:53:17Z

Summary

Pass cache-key: ${{ matrix.tag-suffix }} to the build workflow so each CUDA variant of the vllm-tensorizer matrix gets its own BuildKit registry cache slot, instead of fighting over a single shared one.

The problem

Observed in build: the cuda13.2 variant builds in ~3 min while cuda12.9 builds take ~2 hours, even after multiple commits in-PR and with sccache reporting 100% hit rate on cuda12.9.

Root cause: build.yml computes the BuildKit registry cache reference as ${arch}-${image-name}[-${cache-key}]. We weren't passing cache-key, so both matrix variants pushed to and pulled from the same slot: amd64-vllm-tensorizer.

On each matrix run, both variants pulled the most recently pushed cache. That cache's builder-base layer was built FROM the other CUDA variant's base image, so the digest didn't match and BuildKit had to rebuild every layer from scratch. Once nvcc was invoked, sccache found everything in S3 (hence the 100% hit rate), but fetching tens of thousands of .o files one-by-one from S3 still takes hours.

The fix

cache-key: ${{ matrix.tag-suffix }}

Yields per-variant cache refs:
- ${registry}/buildcache:amd64-vllm-tensorizer-v0.21.0-cuda13.2.1-ubuntu24.04
- ${registry}/buildcache:amd64-vllm-tensorizer-v0.21.0-cuda12.9.1-ubuntu24.04
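
To make the collision concrete, here is a minimal Python sketch of the cache reference scheme described above. It only mirrors the ${arch}-${image-name}[-${cache-key}] pattern from this description; the function name cache_ref and the registry value are placeholders for illustration, not the actual logic in build.yml.

```python
from typing import Optional


def cache_ref(registry: str, arch: str, image_name: str,
              cache_key: Optional[str] = None) -> str:
    """Hypothetical helper: build <registry>/buildcache:<arch>-<image-name>[-<cache-key>]."""
    tag = f"{arch}-{image_name}"
    if cache_key:
        # The optional suffix is what separates one matrix variant from another.
        tag += f"-{cache_key}"
    return f"{registry}/buildcache:{tag}"


# Tag suffixes of the two matrix variants named in this PR.
suffixes = [
    "v0.21.0-cuda13.2.1-ubuntu24.04",
    "v0.21.0-cuda12.9.1-ubuntu24.04",
]
registry = "registry.example"  # placeholder, stands in for ${registry}

# Without a cache key, every variant resolves to the same slot.
shared = {cache_ref(registry, "amd64", "vllm-tensorizer") for _ in suffixes}

# With the tag suffix as cache key, each variant gets its own slot.
separate = {cache_ref(registry, "amd64", "vllm-tensorizer", s) for s in suffixes}

print(len(shared), "shared slot;", len(separate), "separate slots")
```

With no cache key, both variants map to the single amd64-vllm-tensorizer slot, which is the contention described under "The problem".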

After merge, the first cuda12.9 build is still slow (cold BuildKit slot), but subsequent commits that don't touch the builder stages will hit the per-variant slot and finish in minutes like cuda13.2 does today.

Eta0 · 2026-05-20T17:57:24Z

      image-name: vllm-tensorizer
      folder: vllm-tensorizer
      tag-suffix: ${{ matrix.tag-suffix }}
+      cache-key: ${{ matrix.tag-suffix }}


The cache key should be the CUDA/Ubuntu version; it shouldn't include the vLLM commit (which is included in the tag-suffix). You won't be building from multiple vLLM versions in a matrix, and if you go forward, you are more than likely not going back, so you would want to use what you can from the last vLLM commit's build's cache, and then go from there.
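
As an illustration of the scoping suggested here, the following Python sketch derives a CUDA/Ubuntu-only key from a tag suffix. It assumes the suffix always ends in cuda<version>-ubuntu<version>, as the two suffixes in this PR do; the helper name platform_cache_key is hypothetical, and this is not necessarily how the follow-up commit implements the change.

```python
import re


def platform_cache_key(tag_suffix: str) -> str:
    """Hypothetical helper: keep only the trailing cuda<ver>-ubuntu<ver> part."""
    match = re.search(r"cuda[\d.]+-ubuntu[\d.]+$", tag_suffix)
    if match is None:
        raise ValueError(f"unexpected tag suffix format: {tag_suffix!r}")
    return match.group(0)


# Two different vLLM versions on the same CUDA/Ubuntu base share one key,
# so a newer vLLM build can start from the previous build's cache.
# (v0.22.0 is an invented version, used only for this comparison.)
old = platform_cache_key("v0.21.0-cuda12.9.1-ubuntu24.04")
new = platform_cache_key("v0.22.0-cuda12.9.1-ubuntu24.04")
other = platform_cache_key("v0.21.0-cuda13.2.1-ubuntu24.04")

print(old, new, other)
```

Keyed this way, bumping the vLLM version keeps the same slot per CUDA/Ubuntu base, while the two CUDA variants still stay apart.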

github-actions · 2026-05-20T19:48:04Z

@JustinPerlman Build complete, success: https://github.com/coreweave/ml-containers/actions/runs/26179428631
Image: ghcr.io/coreweave/ml-containers/vllm-tensorizer:jperlman-matrix-sccache-fix-7d94427-v0.21.0-cuda13.2.1-ubuntu24.04

github-actions · 2026-05-20T19:52:00Z

@JustinPerlman Build complete, success: https://github.com/coreweave/ml-containers/actions/runs/26179428631
Image: ghcr.io/coreweave/ml-containers/vllm-tensorizer:jperlman-matrix-sccache-fix-7d94427-v0.21.0-cuda12.9.1-ubuntu24.04

github-actions · 2026-05-20T20:16:07Z

@JustinPerlman Build complete, success: https://github.com/coreweave/ml-containers/actions/runs/26180903764
Image: ghcr.io/coreweave/ml-containers/vllm-tensorizer:jperlman-matrix-sccache-fix-45e7639-v0.21.0-cuda13.2.1-ubuntu24.04

ci(vllm-tensorizer): Separate BuildKit cache slots per matrix variant

7d94427

JustinPerlman requested a review from Eta0 May 20, 2026 17:53

JustinPerlman self-assigned this May 20, 2026

JustinPerlman requested a review from a team as a code owner May 20, 2026 17:53

Eta0 requested changes May 20, 2026

ci(vllm-tensorizer): Scope cache key to CUDA/Ubuntu version, not vLLM commit

45e7639

Eta0 approved these changes May 20, 2026

JustinPerlman merged commit b2d0717 into main May 20, 2026
8 checks passed

JustinPerlman deleted the jperlman/matrix-sccache-fix branch May 20, 2026 20:18

JustinPerlman merged 2 commits into main from jperlman/matrix-sccache-fix
