
feat(vllm-tensorizer): Bump vLLM to v0.20.2 on CUDA 13.2 / Ubuntu 24.04 #160

Open

JustinPerlman wants to merge 12 commits into main from jperlman/vllm0.20.2

Conversation

@JustinPerlman (Contributor) commented May 12, 2026

Summary

  • Bump vLLM to v0.20.2
  • Upgrade base images from CUDA 12.9.1 / Ubuntu 22.04 to CUDA 13.2.1 / Ubuntu 24.04 (both builder and final)

Ubuntu 24.04 compatibility fixes

  • Remove `python3-pip` from apt in builder-base and add `rm -f /usr/lib/python3.*/EXTERNALLY-MANAGED` before the pip bootstrap — on Ubuntu 24.04, apt-installed pip has no RECORD file and blocks pip's self-upgrade
  • Purge `python3-jwt` in the final base stage before pip installs — same root cause: the Debian-managed PyJWT has no RECORD file and blocks vLLM's dependency resolution
  • Fix the `cuda-python` version spec from `~=${CUDA_VERSION}` to `~=${CUDA_VERSION%.*}` — patch-level CUDA versions (e.g. 13.2.1) don't match available `cuda-python` releases, so strip to major.minor
  • Install the `wheel` package in lmcache-builder and restore it to the builder-base pip install
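The apt/pip bootstrap fixes above can be sketched as a Dockerfile fragment. This is a minimal illustration only: the base-image tags, stage names, and package lists here are assumptions, not lines from the actual diff.

```dockerfile
# Assumed base image, following NVIDIA's tag convention for CUDA 13.2.1.
FROM nvidia/cuda:13.2.1-devel-ubuntu24.04 AS builder-base

# python3-pip is deliberately left out of the apt list: the Debian-packaged
# pip ships without a RECORD file, which blocks `pip install --upgrade pip`.
RUN apt-get update && apt-get install -y --no-install-recommends \
        python3 python3-venv curl ca-certificates \
    && rm -rf /var/lib/apt/lists/*

# Ubuntu 24.04 marks the system interpreter as externally managed (PEP 668);
# deleting the marker lets pip bootstrap itself and manage site-packages.
RUN rm -f /usr/lib/python3.*/EXTERNALLY-MANAGED \
    && curl -sS https://bootstrap.pypa.io/get-pip.py | python3 - \
    && python3 -m pip install --upgrade pip setuptools wheel

# Final stage: purge the Debian-managed PyJWT (also missing its RECORD file)
# before any pip install resolves vLLM's dependencies.
FROM nvidia/cuda:13.2.1-base-ubuntu24.04 AS final-base
RUN apt-get update && apt-get purge -y python3-jwt \
    && rm -rf /var/lib/apt/lists/*
```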

Relevant information: vllm-project/vllm@6c964bd
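The `~=${CUDA_VERSION%.*}` spec above relies on standard POSIX parameter expansion, where `%` removes the shortest matching trailing pattern. A quick sketch of the behavior, using an example value:

```shell
# ${VAR%.*} deletes the shortest trailing ".<suffix>", turning a
# patch-level version into major.minor.
CUDA_VERSION=13.2.1
echo "cuda-python~=${CUDA_VERSION}"      # prints: cuda-python~=13.2.1 (no matching release)
echo "cuda-python~=${CUDA_VERSION%.*}"   # prints: cuda-python~=13.2
```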

@JustinPerlman JustinPerlman self-assigned this May 12, 2026
@JustinPerlman JustinPerlman requested a review from a team as a code owner May 12, 2026 19:30
@github-actions

@JustinPerlman Build complete, success: https://github.com/coreweave/ml-containers/actions/runs/25751418629
Image: ghcr.io/coreweave/ml-containers/vllm-tensorizer:jperlman-vllm0.20.2-cc65ad3-v0.20.2

@JustinPerlman JustinPerlman requested review from abatilo and ritazh May 12, 2026 19:47
@abatilo (Contributor) left a comment


This seems reasonable to me but I'd feel better if @Eta0 could take a peek

@JustinPerlman (Contributor, Author) commented May 12, 2026

> This seems reasonable to me but I'd feel better if @Eta0 could take a peek

Fair enough lol

@JustinPerlman JustinPerlman requested a review from Eta0 May 12, 2026 19:51
@alexeldeib (Contributor) commented

Pure 13.2, no matrix with 12.9? 🫣 I would really like having both options… if it's a giant pain on the vLLM side it's fine, but I think you then need to validate this actually works on B40/RTX P6000 with the latest supported/installed drivers CW ships

@alexeldeib (Contributor) commented

I am still not aware of a cuda + driver combo that has decent support and works as expected, but haven’t followed too closely lately
