vllm: 0.16.0 -> 0.19.0#498040

Open
CertainLach wants to merge 8 commits into
NixOS:master from
CertainLach:push-lklxouywkrnv

Conversation

@CertainLach
Member

Diff: vllm-project/vllm@releases/v0.16.0...v0.17.0
Changelog: https://github.com/vllm-project/vllm/releases/tag/v0.17.0

Things done

  • Built on platform:
    • x86_64-linux
    • aarch64-linux
    • x86_64-darwin
    • aarch64-darwin
  • Tested, as applicable:
  • Ran nixpkgs-review on this PR. See nixpkgs-review usage.
  • Tested basic functionality of all binary files, usually in ./result/bin/.
  • Nixpkgs Release Notes
    • Package update: when the change is major or breaking.
  • NixOS Release Notes
    • Module addition: when adding a new NixOS module.
    • Module update: when the change is significant.
  • Fits CONTRIBUTING.md, pkgs/README.md, maintainers/README.md and other READMEs.

Only tested on ROCm (Strix Halo).

@CertainLach
Member Author

opentelemetry-api updated because opentelemetry-semantic-conventions-ai wants newer versions

Contributor

@github-actions github-actions Bot left a comment

The PR's base branch is set to master, but this PR causes 4377 rebuilds.
It is therefore considered a mass rebuild.
Please change the base branch to the right base branch for your changes (probably staging).

@CertainLach
Member Author

Oh, maybe I should split opentelemetry changes into another PR. Relaxing dependencies for opentelemetry-semantic-conventions-ai doesn't help, as it fails at runtime otherwise.

@nixpkgs-ci nixpkgs-ci Bot added labels Mar 8, 2026:
  • 8.has: package (new) [This PR adds a new package]
  • 8.has: package (update) [This PR updates a package to a newer version]
  • 10.rebuild-linux: 501+ [This PR causes many rebuilds on Linux and should normally target the staging branches.]
  • 10.rebuild-darwin: 501+ [This PR causes many rebuilds on Darwin and should normally target the staging branches.]
  • 10.rebuild-linux: 2501-5000 [This PR causes many rebuilds on Linux and should target the staging branches.]
  • 10.rebuild-darwin: 1001-2500 [This PR causes many rebuilds on Darwin and should most likely target the staging branches.]
@CertainLach CertainLach changed the base branch from master to staging March 8, 2026 22:05
@nixpkgs-ci nixpkgs-ci Bot closed this Mar 8, 2026
@nixpkgs-ci nixpkgs-ci Bot reopened this Mar 8, 2026
@CertainLach
Member Author

For opentelemetry changes, might switch to depend on #489017 instead

@GaetanLepage
Contributor

Thanks for the PR. However, I would prefer its scope to be narrower.

@nixpkgs-ci nixpkgs-ci Bot changed labels Mar 8, 2026:
  • added 10.rebuild-darwin: 2501-5000 [This PR causes many rebuilds on Darwin and should target the staging branches.]
  • added 6.topic: python [Python is a high-level, general-purpose programming language.]
  • removed 10.rebuild-darwin: 1001-2500 [This PR causes many rebuilds on Darwin and should most likely target the staging branches.]
@CertainLach
Member Author

I agree, I only aggregated all the required changes for now, and will split them into PRs

@GaetanLepage
Contributor

I agree, I only aggregated all the required changes for now, and will split them into PRs

Thanks!

@CertainLach
Member Author

CertainLach commented Mar 8, 2026

Dependency graph for this PR:

[Image: dependency graph]

Will create a separate PR for amd-quark once #498053 and #498052 are merged

Opentelemetry PRs: #498050 #498051

Also kaldi-native-fbank: #498056

@CertainLach
Member Author

amd-quark pr: #498069

@pkieltyka

As I recall, I was hitting tool-call warnings and errors and needed to upgrade the package to resolve them.

I think upgrading in this PR is a good idea

@d-goldin
Contributor

d-goldin commented Apr 19, 2026

So, this is how I got it to work with CUDA for now. PR briefly explains why I think it's safe enough to drop those deps for now.

@CertainLach: CertainLach#3

At the same time though, I must say that 0.19 seems to work a bit worse for me, startup times are longer despite already pre-compiled/filled caches. Also sometimes during startup some background worker seems to die after waiting for something else. But I have no reason to believe this has anything to do with our PR here, nothing in the logs points to it. Rather some new memory profiling features or similar. Also sometimes doesn't react well to clean SIGTERM in my llama-swap setup.

So I'll have to dig around a bit deeper to see, for my personal setup and use.

graham33 pushed a commit to graham33/nixos-dgx-spark that referenced this pull request Apr 19, 2026
Ports the vLLM 0.19.0 package from NixOS/nixpkgs#498040 into the
overlay, along with new dependencies (kaldi-native-fbank,
opentelemetry-semantic-conventions-ai) and bumped opentelemetry packages.

Key changes:
- vLLM 0.19.0 with Qwen3.5, Gemma 4, and many other new model archs
- triton-kernels v3.6.0
- Cap MAX_JOBS to 4 for CUDA compilation (nvcc OOMs at -j=20 on Spark)
- Remove flashinfer-cubin and nvidia-cudnn-frontend from runtime deps
  (not packaged, optional)
- New packages: kaldi-native-fbank, opentelemetry-semantic-conventions-ai

Tested: builds and runs on DGX Spark (aarch64-linux, CUDA 13.2, SM 12.0).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
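
For readers who want to reproduce the MAX_JOBS cap mentioned in the commit message above, here is a minimal sketch of one way it might be done from an overlay. It is not the referenced commit's actual code: the overlay structure, the `vllm` attribute name and the use of `preBuild` are all assumptions, and it relies on the vLLM build honouring the `MAX_JOBS` environment variable.

```nix
# Hypothetical overlay (not taken from the referenced commit).
# Assumes the vllm build reads MAX_JOBS to bound parallel compile jobs.
final: prev: {
  pythonPackagesExtensions = prev.pythonPackagesExtensions ++ [
    (pyFinal: pyPrev: {
      vllm = pyPrev.vllm.overridePythonAttrs (old: {
        # Appended after any existing preBuild so this export wins.
        preBuild = (old.preBuild or "") + ''
          # Limit parallel nvcc invocations to keep memory use bounded.
          export MAX_JOBS=4
        '';
      });
    })
  ];
}
```

The value 4 is the one quoted in the commit message; a different machine would likely want a different limit.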
@stefanboca
Contributor

stefanboca commented Apr 23, 2026

For mistral-common, I've opened #512667. vllm 0.19.0 depends on mistral-common>=1.10.0, and it looks like vllm 0.20.0, which depends on mistral-common>=1.11.0, is about to be released.

@d-goldin @CertainLach might I suggest using pythonRemoveDeps instead of patching the requirements files? See here. For example:

  pythonRemoveDeps = [
    "flashinfer-cubin"
    "nvidia-cudnn-frontend"

    # QuACK and Cutlass DSL seem to be added only for FA4
    # which in our case handles its own deps
    "nvidia-cutlass-dsl"
    "quack-kernels"
  ];
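
For anyone carrying this change outside nixpkgs, a minimal sketch of how the same attribute could be applied from an overlay instead of in the package expression. `pythonRemoveDeps` is the attribute suggested above; the overlay structure and the assumption that the Python package attribute is named `vllm` are illustrative and not taken from this PR.

```nix
# Hypothetical overlay (not from this PR): removes optional runtime
# requirements from vllm's declared dependencies instead of patching
# the requirements/* files.
final: prev: {
  pythonPackagesExtensions = prev.pythonPackagesExtensions ++ [
    (pyFinal: pyPrev: {
      vllm = pyPrev.vllm.overridePythonAttrs (old: {
        # Append to whatever the package already removes, if anything.
        pythonRemoveDeps = (old.pythonRemoveDeps or [ ]) ++ [
          "flashinfer-cubin"
          "nvidia-cudnn-frontend"
        ];
      });
    })
  ];
}
```

Appending to `old.pythonRemoveDeps` rather than replacing it keeps any removals the package already declares.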

@CertainLach
Member Author

CertainLach commented Apr 24, 2026

Oof. The HEAD of this branch is correct for vllm deployment, and I have also included mistral-common as a parent for this request, but github doesn't seem to like to see this pointed to staging instead of master, and we can't point it to master since we have opentelemetry changes unmerged. Thus lots of unnecessary commits

Should I point it to master for now?.. It should be pointed to master anyway, the only reason it is being pointed to staging is that we have opentelemetry changes here, and those do require staging

What a mess...

Pointing this PR to master is not correct, but since it is stacked, pointing it to staging makes GitHub very unhappy and it might request lots of reviewers here. Should I mark it as a draft until we sort out the situation with the Python opentelemetry packages?

@CertainLach CertainLach changed the base branch from staging to master April 24, 2026 00:32
@nixpkgs-ci nixpkgs-ci Bot closed this Apr 24, 2026
@nixpkgs-ci nixpkgs-ci Bot reopened this Apr 24, 2026
@github-actions github-actions Bot dismissed their stale review April 24, 2026 00:32

Review dismissed automatically

@CertainLach CertainLach force-pushed the push-lklxouywkrnv branch 3 times, most recently from 8dc0790 to dc73532 Compare April 24, 2026 00:46
@nixpkgs-ci nixpkgs-ci Bot requested a review from bgamari April 24, 2026 01:02
stefanboca and others added 8 commits April 24, 2026 03:39
Diff: mistralai/mistral-common@v1.8.8...v1.11.0

1.11.0 is published on PyPI, but for whatever reason is not listed in
GitHub releases.

All of the opentelemetry-instrumentation-requests tests hardcode the
requests version, and since the requests package in nixpkgs is newer than
the package expects, all of the tests fail. This should be fixed
upstream; I do not see a good way to patch that on the nixpkgs side.

- Bumping triton to a newer version, the older one didn't
  work for me with 0.17
- Drops quack-kernels and cuteDSL from dependencies.
  From what I can tell those are only needed for FA4
  and would also require some nvidia blobs. We are at FA2
  right now, so this shouldn't remove any functionality
  that was present before
- Adding NCCL to wrapper args, for better UX

vLLM also wants bash for aiter
@d-goldin
Contributor

d-goldin commented Apr 25, 2026

@d-goldin @CertainLach might I suggest using pythonRemoveDeps instead of patching the requirements files? See here. For example:

  pythonRemoveDeps = [
    "flashinfer-cubin"
    "nvidia-cudnn-frontend"

    # QuACK and Cutlass DSL seem to be added only for FA4
    # which in our case handles its own deps
    "nvidia-cutlass-dsl"
    "quack-kernels"
  ];

Will try to switch to it. Not sure why I did it the other way around, maybe it didn't work with the structure they have with the requirements/* folder.

Edit: Tried, seems to work fine. So yeah, cleaner to do it that way. Will add a commit later.

@leo60228
Member

leo60228 commented May 6, 2026

For mistral-common, I've opened #512667. vllm 0.19.0 depends on mistral-common>=1.10.0, and it looks like vllm 0.20.0, which depends on mistral-common>=1.11.0, is about to be released.

It seems like vllm 0.16.0 already has this dependency. I have the same issue on nixos-unstable.

@peperunas
Contributor

peperunas commented May 7, 2026

Is there anything I can help with to speed up the PR merge?

@CertainLach
Member Author

Is there anything I can help with to speed up the PR merge?

Merging this PR is blocked on the opentelemetry Python packages update anyway, and I have no idea what to do with them.
