Pull requests: NVIDIA/TensorRT-LLM
[None][fix] Add missing allow_partial_loading param to CuteDSL and ConfigurableMoE load_weights
#12761 opened Apr 4, 2026 by qiaoxj07 · 2 tasks done
feat: add Prometheus metrics collection for gRPC server mode
Label: Community want to contribute (PRs initiated from Community)
#12760 opened Apr 4, 2026 by ConnorLi96 · 1 task
[#12477][feat] AutoDeploy: Mistral4 Eagle Support
#12759 opened Apr 4, 2026 by govind-ramnarayan · Draft · 1 task
[https://nvbugs/5997534][fix] AutoDeploy: Skip Eagle3 One Model Test on pre-Hopper
#12757 opened Apr 4, 2026 by govind-ramnarayan · 1 task done
[None][fix] Draft KV cache should not allocate host memory
Label: Community want to contribute (PRs initiated from Community)
#12756 opened Apr 3, 2026 by Shang-Pin · 1 task
feat: add standard gRPC health service for Kubernetes native probes
Label: Community want to contribute (PRs initiated from Community)
#12752 opened Apr 3, 2026 by ConnorLi96 · 1 task
Respect AutoDeploy trust_remote_code
Label: Community want to contribute (PRs initiated from Community)
#12751 opened Apr 3, 2026 by jmecom
[https://nvbugs/5940460][fix] Harden FP8 quant fusion matching after …
#12750 opened Apr 3, 2026 by dhansen-nvidia · 1 task done
[#12699][feat] AutoDeploy: Support Piecewise CG for VLMs
#12749 opened Apr 3, 2026 by nvchenghaoz
[https://nvbugs/5911304][fix] Add URL validation and request hardening for media input loading
#12748 opened Apr 3, 2026 by yibinl-nvidia · 1 task done
feat: support multiple model names in --served_model_name
Label: Community want to contribute (PRs initiated from Community)
#12746 opened Apr 3, 2026 by nvyutwu · 5 tasks
[feat] AutoDeploy: Support torch-cudagraph for Eagle
#12745 opened Apr 3, 2026 by govind-ramnarayan · Draft · 1 task
[None][feat] AutoDeploy: Gemma4 multimodal support with custom attention mask
#12744 opened Apr 3, 2026 by bmarimuthu-nv · Draft · 3 of 4 tasks
[None][feat] AutoDeploy: Custom attn mask support for attention backends
#12742 opened Apr 3, 2026 by bmarimuthu-nv · Draft · 1 task done
[None][feat] Optimize qwen3.5 decode delta kernel
#12740 opened Apr 3, 2026 by nv-guomingz · 1 task done
[None][feat] retune causalConv1d fwd dispatch for varlen and short sequences
#12739 opened Apr 3, 2026 by nv-guomingz · 1 task done
[None][feat] Add bf16 trtllm-gen moe support through flashinfer.
#12738 opened Apr 3, 2026 by nv-guomingz · 1 task done
[None][feat] reuse triton slicing kernel for GDN prefill transpose
#12737 opened Apr 3, 2026 by nv-guomingz · 1 task done
[None][feat] fix mamba metadata prefill bubble in chunked prefill serving
#12736 opened Apr 3, 2026 by nv-guomingz · 1 task done