Pull requests: vllm-project/vllm
[ROCm][CI] Fix fused RMS norm FP8 quant test on MI250 (gfx90a)
rocm
Related to AMD ROCm
#39047
opened Apr 5, 2026 by
AndreasKaratzas
Draft
Support FP8 block-quant TP when intermediate size per rank is not divisible by block_n
#39046
opened Apr 5, 2026 by
wzhao18
5 tasks
fix(reasoning): prevent streaming end-token desync in base and other parsers
deepseek
Related to DeepSeek models
#39044
opened Apr 5, 2026 by
kaiisfree
3 tasks
[SM120] Add b12x MoE and dense FP4 GEMM backends for Blackwell (+26% decode)
nvidia
#39042
opened Apr 5, 2026 by
voipmonitor
5 of 6 tasks
[QeRL] Reduce memory usage for wide MoEs via inplace quantization
#39041
opened Apr 5, 2026 by
kylesayrs
[Perf] Enable custom allreduce on PCIe-only multi-GPU topologies
#39040
opened Apr 5, 2026 by
voipmonitor
3 of 4 tasks
[Bugfix] Improve DCP/PCP error messages with actionable backend guidance
bug
Something isn't working
v1
#39036
opened Apr 5, 2026 by
Pawansingh3889
[vLLM IR] Cache the fx_replacement to avoid re-tracing the same impl
#39034
opened Apr 5, 2026 by
gcanlin
5 tasks
[Bugfix][MoE] Fix hardcoded SharedExperts output buffer size for DBO ubatches
bug
Something isn't working
#39033
opened Apr 5, 2026 by
Gregory-Pereira
NemotronH default mamba_ssm_cache_dtype=float32; enable auto-hook for NemotronHNanoVLV2Config
#39032
opened Apr 5, 2026 by
netanel-haber
Draft
[Core] Per-group BlockPool for hybrid Mamba/attention models
v1
#39031
opened Apr 5, 2026 by
arbi-dev
4 of 5 tasks
nano_nemotron_vl: fix tensor device mismatch exception when video profiling
ready
ONLY add when PR is ready to merge/full CI is needed
#39029
opened Apr 5, 2026 by
netanel-haber
Gemma4 multi-turn, tool calling, and reasoning fixes
documentation
Improvements or additions to documentation
frontend
tool-calling
Add structure to requirements/ directory
ci/build
cpu
Related to CPU backends
documentation
Improvements or additions to documentation
intel-gpu
Related to Intel GPU
nvidia
ready
ONLY add when PR is ready to merge/full CI is needed
ready-run-all-tests
Trigger CI with all tests for wide-ranging PRs
rocm
Related to AMD ROCm
#39024
opened Apr 5, 2026 by
hmellor
[MoE][Fix] Fix DeepEP HT hardcoded per_act_token_quant=False
#39023
opened Apr 5, 2026 by
thc1006
2 tasks
[Perf] Remove per-step KV offload touch, touch once at request_finished
kv-connector
v1
#39021
opened Apr 5, 2026 by
kfirtoledo
2 tasks
fix(attention): fix Gemma4 support for old gpus like Turing
v1
#39018
opened Apr 5, 2026 by
lisp19
[MoE] BF16 Triton MoE Perf regression - restore low latency path
ready
ONLY add when PR is ready to merge/full CI is needed
#39016
opened Apr 5, 2026 by
milesial
[vLLM IR] rework gemma_rms_norm
ready-run-all-tests
Trigger CI with all tests for wide-ranging PRs
#39014
opened Apr 5, 2026 by
ZJY0516
5 tasks
Refactor move experts
ci/build
documentation
Improvements or additions to documentation
nvidia
performance
Performance-related issues
ready
ONLY add when PR is ready to merge/full CI is needed
rocm
Related to AMD ROCm
#39013
opened Apr 5, 2026 by
Jackmin801
1 task
Fix async spec decode TOCTOU race and underflow on aborted requests
v1
#39012
opened Apr 5, 2026 by
gagandhakrey
Update MusicFlamingo and add AudioFlamingoNext
documentation
Improvements or additions to documentation
multi-modality
Related to multi-modality (#4194)
new-model
Requests to new models
#39011
opened Apr 5, 2026 by
lashahub
4 of 5 tasks
[MoE] Move remaining PrepareAndFinalize to prepare finalize folder
ready
ONLY add when PR is ready to merge/full CI is needed
ready-run-all-tests
Trigger CI with all tests for wide-ranging PRs
#39009
opened Apr 5, 2026 by
Jackmin801
1 task
[MoE] Move GPT OSS Triton kernel experts into fused_moe/experts/
documentation
Improvements or additions to documentation
gpt-oss
Related to GPT-OSS models
ready
ONLY add when PR is ready to merge/full CI is needed
ready-run-all-tests
Trigger CI with all tests for wide-ranging PRs
#39007
opened Apr 5, 2026 by
Jackmin801
3 tasks done