Use official TRT-LLM image (1.3.0rc15.post1) for DSv4 B300 TRT (non-MTP + MTP) #1636
base: main
Changes from 1 commit
4bcefb7
6b7558c
bd3c94c
f441f9f
4bc5592
1b0afeb
14a1bb3
242ab88
e23a541
6118a76
5adfeb3
ad529fb
c2381b7
b09619e
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | @@ -3354,3 +3354,10 @@ |
| | | description: |
| | | - "Add MTP speculative-decoding sibling for dsv4-fp4-mi355x-vllm (model: deepseek-ai/DeepSeek-V4-Pro) on vllm/vllm-openai-rocm:v0.22.0, per vllm-project/vllm#43385" |
| | | pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1630 |
| | | + |
| | | + - config-keys: |
| | | + - dsv4-fp4-b200-trt |
| | | + - dsv4-fp4-b300-trt |
| | | + description: |
| | | + - "Update the TensorRT-LLM DeepSeek-V4-Pro image to ghcr.io/semianalysisai/trtllm-deepseek-v4:feat-deepseek_v4-2dd03e6" |
| | | + pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXXX |
Check warning on line 3363 in perf-changelog.yaml
Contributor
🟡 The new [...]

What the bug is. The diff appends a new entry to [...]

Why this is a real issue. Every other recent entry in the same file follows the convention of using the real PR number; the five entries immediately above this one link to [...]

Impact. This does not affect the actual image bump or any sweep behavior; the runtime is unchanged. The damage is to the changelog's documentation/audit value: anyone trying to find the originating PR for these two config-key changes from the changelog hits a dead 404, and any tooling that parses [...]

Why existing checks didn't prevent it. There appears to be no schema validation that rejects [...]

Fix. Replace the placeholder with the real PR number:

pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1636

Step-by-step proof. [...]
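The comment above notes that no validation seems to reject a placeholder link. As a rough illustration only, a check along the following lines could flag such entries before merge. This is a minimal sketch, not part of the repository: the function name `find_placeholder_pr_links`, the regular expressions, and the assumption that each `pr-link:` value sits on its own line are all hypothetical.

```python
import re

# Hypothetical check, not taken from the InferenceX repository.
# Assumes every pr-link value sits on its own "pr-link: <url>" line.
PR_LINK_RE = re.compile(r"^\s*pr-link:\s*(\S+)\s*$")
# A link counts as valid only if it ends in a numeric pull-request id.
VALID_PR_URL_RE = re.compile(r"^https://github\.com/[\w.-]+/[\w.-]+/pull/\d+$")


def find_placeholder_pr_links(text):
    """Return (line_number, url) pairs whose pr-link lacks a numeric PR id."""
    bad = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        match = PR_LINK_RE.match(line)
        if match and not VALID_PR_URL_RE.match(match.group(1)):
            bad.append((lineno, match.group(1)))
    return bad


# Sample text modeled on the entries in this diff (indentation is assumed).
sample = """\
  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1630

- config-keys:
    - dsv4-fp4-b200-trt
  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXXX
"""

for lineno, url in find_placeholder_pr_links(sample):
    print(f"line {lineno}: placeholder pr-link {url}")
```

Run against the sample, the sketch reports only the `XXXX` link. A line-based regex is used here to avoid a YAML parser dependency; a real check would more likely load the file with a YAML library and validate each entry's fields.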
is there any official nvidia RC that works...
Image is from dsv4 branch: https://github.com/NVIDIA/TensorRT-LLM/tree/feat/deepseek_v4
Main dsv4 failing DPA: https://github.com/SemiAnalysisAI/InferenceX/actions/runs/26786937394