
Commit 7b9843d

[NVIDIA][GB300] update DSR1 FP8 GB300 TRTLLM image to latest (#1767)
* dsr1-fp8-gb300-dynamo-trt: pin image to tensorrtllm-runtime:1.3.0-dev.1-cuda13
* dsr1-fp8-gb300-dynamo-trt: pin image to tensorrtllm-runtime:1.3.0-dev.1-cuda13, fix gsm8k accuracy
* perf changelog update
* change runner
* perf change log
* fix perf change log
* Enhance gsm8k accuracy fix description in changelog
  Updated description for gsm8k accuracy fix to include config updates.
* Refine accuracy fix description for DSR1 TRTLLM
  Updated description for gsm8k accuracy fix and config updates.
1 parent d99c824 commit 7b9843d

2 files changed

Lines changed: 12 additions & 2 deletions


.github/configs/nvidia-master.yaml

Lines changed: 2 additions & 2 deletions
@@ -6587,10 +6587,10 @@ dsr1-fp4-gb300-dynamo-sglang:
     dp-attn: true
 
 dsr1-fp8-gb300-dynamo-trt:
-  image: nvcr.io/nvidia/ai-dynamo/tensorrtllm-runtime:0.8.1.post2
+  image: nvcr.io/nvidia/ai-dynamo/tensorrtllm-runtime:1.3.0-dev.1-cuda13
   model: deepseek-ai/DeepSeek-R1-0528
   model-prefix: dsr1
-  runner: gb300
+  runner: gb300-nv
   precision: fp8
   framework: dynamo-trt
   multinode: true
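
The image change above keeps the same repository and only moves the tag. A minimal sketch of how that could be checked, assuming a plain `repository:tag` reference with no `@sha256:` digest; the helper name `split_image_ref` is hypothetical and not part of this repository:

```python
def split_image_ref(ref: str) -> tuple[str, str]:
    """Split 'registry/path/name:tag' into (repository, tag).

    Hypothetical helper for illustration only. The tag is '' when absent.
    Digest references (name@sha256:...) are not handled.
    """
    # Look for ':' only after the last '/', so a registry port such as
    # 'localhost:5000/img' is not mistaken for a tag separator.
    last_slash = ref.rfind("/")
    colon = ref.find(":", last_slash + 1)
    if colon == -1:
        return ref, ""
    return ref[:colon], ref[colon + 1:]


# The two image values taken from the diff above.
old = "nvcr.io/nvidia/ai-dynamo/tensorrtllm-runtime:0.8.1.post2"
new = "nvcr.io/nvidia/ai-dynamo/tensorrtllm-runtime:1.3.0-dev.1-cuda13"

old_repo, old_tag = split_image_ref(old)
new_repo, new_tag = split_image_ref(new)
same_repository = old_repo == new_repo
print(same_repository, old_tag, "->", new_tag)
```

Searching for the colon only after the last slash is a deliberate choice: it keeps the sketch correct for registries that include a port number.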

perf-changelog.yaml

Lines changed: 10 additions & 0 deletions
@@ -3909,3 +3909,13 @@
   description:
     - "Use the Marlin MoE backend for MiniMax-M3 B200/B300 TP-only vLLM configurations by adding --moe-backend marlin when expert parallelism is disabled."
   pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1809
+
+- config-keys:
+    - dsr1-fp8-gb300-dynamo-trt
+  description:
+    - "Fix gsm8k accuracy at 88% instead of 95% for a single point."
+    - "In previous submission, there was an numeric issue causing accuracy degradation and performance anomaly in some MTP points at certain concurrency."
+    - "This issue is now fixed in the latest TRTLLM release."
+    - "Also update all configs for DSR1 TRTLLM FP8 to reflect latest released image usage"
+  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1767
+
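
The new changelog entry follows the same three-key shape as the entry before it (`config-keys`, `description`, `pr-link`). A minimal sketch of a shape check for one such entry, assuming those three keys are required; that rule is inferred from the entries visible in this diff, not from a documented schema, and `entry_problems` is a hypothetical name:

```python
# Assumed required keys, inferred from the visible changelog entries.
REQUIRED_KEYS = ("config-keys", "description", "pr-link")


def entry_problems(entry: dict) -> list[str]:
    """Return a list of human-readable problems; empty means the entry looks fine."""
    problems = []
    for key in REQUIRED_KEYS:
        if key not in entry:
            problems.append(f"missing key: {key}")
    # Both list-valued fields must be present as non-empty lists.
    for key in ("config-keys", "description"):
        if key in entry and (not isinstance(entry[key], list) or not entry[key]):
            problems.append(f"{key} must be a non-empty list")
    link = entry.get("pr-link")
    if "pr-link" in entry and not (isinstance(link, str) and link.startswith("https://")):
        problems.append("pr-link must be an https URL")
    return problems


# The entry added by this commit, written out as a Python dict
# (description shortened to its first item for brevity).
entry = {
    "config-keys": ["dsr1-fp8-gb300-dynamo-trt"],
    "description": ["Fix gsm8k accuracy at 88% instead of 95% for a single point."],
    "pr-link": "https://github.com/SemiAnalysisAI/InferenceX/pull/1767",
}
print(entry_problems(entry))
```

The entry is given as a literal dict so the sketch needs only the standard library; loading the real file would require a YAML parser.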
