Commit 14a1bb3

Oseltamivir and claude committed
Point DSv4 B200/B300 TRT (non-MTP) at the SWA-scratch-fix image
Bump dsv4-fp4-b200-trt and dsv4-fp4-b300-trt to ghcr.io#semianalysisai/trtllm-deepseek-v4:fix-dsv4-swa-scratch-revert-shrink-c914d6d (TRT-LLM feat/deepseek_v4 @ 084cf2ba + kv_cache_manager_v2 fix). This resolves the engine crash on attention-DP context/generation reverts at high concurrency (the b300 8k1k conc>=512 "LLM is shutting down" hang). The -mtp variants stay on feat-deepseek_v4-9aa3715.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
1 parent 1b0afeb commit 14a1bb3

2 files changed

Lines changed: 3 additions & 3 deletions

.github/configs/nvidia-master.yaml

Lines changed: 2 additions & 2 deletions
@@ -1801,7 +1801,7 @@ dsv4-fp4-b200-vllm-agentic:
   - { tp: 8, ep: 8, dp-attn: true, offloading: cpu, conc-list: [64, 128, 256] }

 dsv4-fp4-b200-trt:
-  image: ghcr.io#semianalysisai/trtllm-deepseek-v4:feat-deepseek_v4-2dd03e6
+  image: ghcr.io#semianalysisai/trtllm-deepseek-v4:fix-dsv4-swa-scratch-revert-shrink-c914d6d
   model: deepseek-ai/DeepSeek-V4-Pro
   model-prefix: dsv4
   runner: b200-dsv4
@@ -3049,7 +3049,7 @@ dsv4-fp4-b300-vllm-agentic:
   - { tp: 8, ep: 8, dp-attn: true, offloading: cpu, conc-list: [128, 256, 512] }

 dsv4-fp4-b300-trt:
-  image: ghcr.io#semianalysisai/trtllm-deepseek-v4:feat-deepseek_v4-2dd03e6
+  image: ghcr.io#semianalysisai/trtllm-deepseek-v4:fix-dsv4-swa-scratch-revert-shrink-c914d6d
   model: deepseek-ai/DeepSeek-V4-Pro
   model-prefix: dsv4
   runner: b300
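
A minimal sketch of a consistency check for this change, assuming the config has already been parsed into a plain dict keyed by entry name. The `-trt-mtp` key suffix, the assumption that the -mtp variants use the same image repository, and the helper names `expected_image` and `find_mismatches` are hypothetical and not taken from the repository.

```python
# Tags quoted in the commit message.
NEW_TAG = "fix-dsv4-swa-scratch-revert-shrink-c914d6d"
MTP_TAG = "feat-deepseek_v4-9aa3715"
# Assumption: -mtp variants pull from the same image repository.
REPO = "ghcr.io#semianalysisai/trtllm-deepseek-v4"


def expected_image(key: str) -> str:
    """Image a DSv4 TRT entry should use after this commit."""
    tag = MTP_TAG if key.endswith("-mtp") else NEW_TAG
    return f"{REPO}:{tag}"


def find_mismatches(config: dict) -> list:
    """Return keys of DSv4 TRT entries whose image differs from the expected one."""
    bad = []
    for key, entry in config.items():
        is_trt = key.endswith("-trt") or key.endswith("-trt-mtp")
        if key.startswith("dsv4-fp4-") and is_trt:
            if entry.get("image") != expected_image(key):
                bad.append(key)
    return bad


# Hand-written sample, not the real nvidia-master.yaml contents.
sample = {
    "dsv4-fp4-b200-trt": {"image": f"{REPO}:{NEW_TAG}"},
    "dsv4-fp4-b300-trt": {"image": f"{REPO}:feat-deepseek_v4-2dd03e6"},  # stale tag
    "dsv4-fp4-b300-trt-mtp": {"image": f"{REPO}:{MTP_TAG}"},  # hypothetical key name
}
print(find_mismatches(sample))  # ['dsv4-fp4-b300-trt']
```

In practice the dict would come from loading `.github/configs/nvidia-master.yaml` with a YAML parser; that step is left out here to keep the sketch dependency-free.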

perf-changelog.yaml

Lines changed: 1 addition & 1 deletion
@@ -3372,7 +3372,7 @@
     - dsv4-fp4-b200-trt
     - dsv4-fp4-b300-trt
   description:
-    - "Update the TensorRT-LLM DeepSeek-V4-Pro image to ghcr.io/semianalysisai/trtllm-deepseek-v4:feat-deepseek_v4-2dd03e6"
+    - "Update the TensorRT-LLM DeepSeek-V4-Pro image to ghcr.io/semianalysisai/trtllm-deepseek-v4:fix-dsv4-swa-scratch-revert-shrink-c914d6d (TRT-LLM feat/deepseek_v4 @ 084cf2ba + kv_cache_manager_v2 fix: free SWA scratch slots on shrink instead of asserting, which crashed the engine on attention-DP context/generation reverts at high concurrency, e.g. b300 8k1k conc>=512)"
   pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1636

 - config-keys:
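
The changelog entry describes the fix as freeing SWA scratch slots on shrink instead of asserting. The toy model below illustrates only that behavioural difference. It is not TensorRT-LLM's kv_cache_manager_v2 code: the `ScratchPool` class, its method names, and the slot bookkeeping are all hypothetical.

```python
class ScratchPool:
    """Toy pool of scratch slots. NOT TensorRT-LLM code; all names are hypothetical."""

    def __init__(self, capacity: int) -> None:
        self.free_slots = list(range(capacity))
        self.held = {}  # sequence id -> list of slots held by that sequence

    def grow(self, seq_id: int, count: int) -> None:
        """Hand `count` free slots to a sequence."""
        if count > len(self.free_slots):
            raise RuntimeError("out of scratch slots")
        slots = self.held.setdefault(seq_id, [])
        for _ in range(count):
            slots.append(self.free_slots.pop())

    def shrink_strict(self, seq_id: int, keep: int) -> None:
        # Assumed pre-fix behaviour: a shrink that finds surplus slots fails hard.
        if len(self.held.get(seq_id, [])) > keep:
            raise AssertionError("scratch slots still held on shrink")

    def shrink_releasing(self, seq_id: int, keep: int) -> None:
        # Assumed post-fix behaviour: surplus slots go back to the pool.
        slots = self.held.get(seq_id, [])
        while len(slots) > keep:
            self.free_slots.append(slots.pop())


pool = ScratchPool(capacity=4)
pool.grow(seq_id=1, count=3)
pool.shrink_releasing(seq_id=1, keep=1)
print(len(pool.held[1]), len(pool.free_slots))  # 1 3
```

Under this model, a revert that shrinks a sequence while it still holds surplus slots raises in the strict variant, whereas the releasing variant returns the slots and continues.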
