vllm: 0.16.0 -> 0.19.0 #498040
|
opentelemetry-api updated because opentelemetry-semantic-conventions-ai wants newer versions |
The PR's base branch is set to master, but this PR causes 4377 rebuilds.
It is therefore considered a mass rebuild.
Please change the base branch to the right base branch for your changes (probably staging).
|
Oh, maybe I should split opentelemetry changes into another PR. Relaxing dependencies for opentelemetry-semantic-conventions-ai doesn't help, as it fails at runtime otherwise. |
Force-pushed d2d5fff to 5b12a24
|
For opentelemetry changes, might switch to depend on #489017 instead |
|
Thanks for the PR. However, I would prefer its scope to be narrower. |
|
I agree. I only aggregated all the required changes for now, and will split them into separate PRs.
Thanks! |
Force-pushed 5b12a24 to 224d2d6
|
amd-quark pr: #498069 |
Force-pushed 1158b57 to 45815df
|
As I recall, I was hitting tool call warnings and errors and needed to upgrade the package to resolve them. I think upgrading it in this PR is a good idea. |
|
So, this is how I got it to work with CUDA for now. The PR briefly explains why I think it's safe enough to drop those deps for now. At the same time, I must say that 0.19 seems to work a bit worse for me: startup times are longer despite pre-compiled, already-filled caches. Sometimes during startup a background worker also seems to die after waiting on something else. I have no reason to believe this has anything to do with our PR here, though; nothing in the logs points to it. More likely it comes from some new memory profiling features or similar. It also sometimes doesn't react well to a clean SIGTERM in my llama-swap setup. So I'll have to dig deeper for my personal setup and use. |
Ports the vLLM 0.19.0 package from NixOS/nixpkgs#498040 into the overlay, along with new dependencies (kaldi-native-fbank, opentelemetry-semantic-conventions-ai) and bumped opentelemetry packages.

Key changes:
- vLLM 0.19.0 with Qwen3.5, Gemma 4, and many other new model archs
- triton-kernels v3.6.0
- Cap MAX_JOBS to 4 for CUDA compilation (nvcc OOMs at -j=20 on Spark)
- Remove flashinfer-cubin and nvidia-cudnn-frontend from runtime deps (not packaged, optional)
- New packages: kaldi-native-fbank, opentelemetry-semantic-conventions-ai

Tested: builds and runs on DGX Spark (aarch64-linux, CUDA 13.2, SM 12.0).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
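
As a rough illustration of the MAX_JOBS cap mentioned in the commit message above, a downstream overlay could look something like the sketch below. This is not the code from the ported package, which is not shown in this thread. It assumes the package is exposed as `vllm` in the Python package set and that the build picks up `MAX_JOBS` from the environment; the `preConfigure` placement is an assumption chosen so the cap is applied after any earlier export in the same hook.

```nix
# Sketch only, not the actual change from the commit above.
# Assumption: the package attribute is `vllm` in the Python package set.
final: prev: {
  pythonPackagesExtensions = prev.pythonPackagesExtensions ++ [
    (pyFinal: pyPrev: {
      vllm = pyPrev.vllm.overridePythonAttrs (old: {
        # Append to any existing hook so this export runs last.
        preConfigure = (old.preConfigure or "") + ''
          # Cap parallel compile jobs at 4 without exceeding the cores Nix grants.
          export MAX_JOBS=$(( NIX_BUILD_CORES < 4 ? NIX_BUILD_CORES : 4 ))
        '';
      });
    })
  ];
}
```

A fixed cap trades build time for memory headroom; the value 4 is simply the one quoted in the commit message, and other machines may tolerate more.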
|
For mistral-common, I've opened #512667. vllm 0.19.0 depends on mistral-common>=1.10.0, and it looks like vllm 0.20.0, which depends on mistral-common>=1.11.0, is about to be released.
@d-goldin @CertainLach might I suggest using
pythonRemoveDeps = [
"flashinfer-cubin"
"nvidia-cudnn-frontend"
# QuACK and Cutlass DSL seem to be added only for FA4
# which in our case handles its own deps
"nvidia-cutlass-dsl"
"quack-kernels"
]; |
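
For readers applying the suggestion above outside of the nixpkgs tree, one possible way to place it is through `overridePythonAttrs` in an overlay. This is a sketch under assumptions: the attribute name `vllm` is assumed, and the merge with `old.pythonRemoveDeps` only works if the existing value is a list (or absent), which this thread does not confirm.

```nix
# Sketch only: attribute name `vllm` is an assumption.
final: prev: {
  pythonPackagesExtensions = prev.pythonPackagesExtensions ++ [
    (pyFinal: pyPrev: {
      vllm = pyPrev.vllm.overridePythonAttrs (old: {
        # Assumes any existing value is a list; adjust if the package sets something else.
        pythonRemoveDeps = (old.pythonRemoveDeps or [ ]) ++ [
          "flashinfer-cubin"
          "nvidia-cudnn-frontend"
          # Per the suggestion above: only added for FA4.
          "nvidia-cutlass-dsl"
          "quack-kernels"
        ];
      });
    })
  ];
}
```

Removing a requirement this way only drops it from the package metadata check; it does not make the corresponding code path work if it is actually exercised at runtime.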
Force-pushed 6a06753 to f19541c
|
Oof. The HEAD of this branch is correct for vllm deployment, and I have also included mistral-common as a parent for this request, but GitHub doesn't like seeing this pointed to staging instead of master, and we can't point it to master since we have unmerged opentelemetry changes. Hence the many unnecessary commits.
Should I point it to master for now? It should be pointed to master anyway; the only reason it is pointed to staging is that we have opentelemetry changes here, and those do require staging.
What a mess... Pointing this PR to master is not correct, but since it is stacked, pointing it to staging makes GitHub very unhappy, and it might request lots of reviewers here. Should I mark it as a draft until we sort out the situation with the Python opentelemetry packages? |
Force-pushed f19541c to d5c9bc9
Force-pushed 8dc0790 to dc73532
Diff: mistralai/mistral-common@v1.8.8...v1.11.0
1.11.0 is published on PyPI but, for whatever reason, is not listed in the GitHub releases.
All of the opentelemetry-instrumentation-requests tests hardcode the requests version, and since the requests package in nixpkgs is newer than the package expects, all of the tests fail. This should be fixed upstream; I do not see a good way to patch it on the nixpkgs side.
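
The commit message above does not say how the failing tests are handled. One common approach in such cases is to skip the check phase, sketched below; the attribute name `opentelemetry-instrumentation-requests` is assumed to match the package name, and this may differ from what the commit actually does.

```nix
# Sketch only: a generic way to skip a failing test suite, not necessarily this commit's approach.
final: prev: {
  pythonPackagesExtensions = prev.pythonPackagesExtensions ++ [
    (pyFinal: pyPrev: {
      opentelemetry-instrumentation-requests =
        pyPrev.opentelemetry-instrumentation-requests.overridePythonAttrs (old: {
          # Tests pin a specific requests version and fail against a newer one.
          doCheck = false;
        });
    })
  ];
}
```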
- Bumping triton to a newer version; the older one didn't work for me with 0.17
- Drops quack-kernels and cuteDSL from dependencies. From what I can tell, those are only needed for FA4 and would also require some nvidia blobs. We are at FA2 right now, so this shouldn't remove any functionality that was present before
- Adding NCCL to wrapper args, for better UX
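
The commit message above does not show how NCCL is added to the wrapper. As a hedged illustration only, one way to expose NCCL to wrapped executables is to extend `makeWrapperArgs` with a library path entry. The attribute name `vllm` and the choice of `LD_LIBRARY_PATH` are assumptions here; the actual commit may use a different variable or mechanism.

```nix
# Sketch only: the real commit may wire NCCL differently.
final: prev: {
  pythonPackagesExtensions = prev.pythonPackagesExtensions ++ [
    (pyFinal: pyPrev: {
      vllm = pyPrev.vllm.overridePythonAttrs (old: {
        makeWrapperArgs = (old.makeWrapperArgs or [ ]) ++ [
          # Let the wrapped executables find libnccl at runtime.
          "--prefix"
          "LD_LIBRARY_PATH"
          ":"
          (final.lib.makeLibraryPath [ final.cudaPackages.nccl ])
        ];
      });
    })
  ];
}
```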
vLLM also wants bash for aiter
Force-pushed dc73532 to ba2de00
Will try to switch to it. Not sure why I did it the other way around, maybe it didn't work with the structure they have with the
Edit: Tried, seems to work fine. So yeah, cleaner to do it that way. Will add a commit later. |
It seems like vllm 0.16.0 already has this dependency. I have the same issue on nixos-unstable. |
|
Is there anything I can help with to speed up the PR merge? |
Merging this PR is blocked on the opentelemetry Python packages update anyway, and I have no idea what to do with them.

Diff: vllm-project/vllm@releases/v0.16.0...v0.17.0
Changelog: https://github.com/vllm-project/vllm/releases/tag/v0.17.0
Things done
Only tested on ROCm, Strix Halo