Commit 0c4bf82

[Klaud Cold] Update qwen3.5-bf16-b200-sglang (+mtp) SGLang image to v0.5.12-cu130 (#1446)
* Update qwen3.5-bf16-b200-sglang (+mtp +agentic) SGLang image to v0.5.12-cu130

  The recipe was pinned to lmsysorg/sglang:nightly-dev-20260216-d3bae71e (86 days old). Bumps all three variants of the qwen3.5-bf16-b200-sglang recipe family to the stable v0.5.12-cu130 release tag (already used by other b200 sglang recipes on main).

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: fill in pr-link for #1446 in perf-changelog

* Drop qwen3.5-bf16-b200-sglang-agentic from this PR

  Comment out the agentic-coding sibling recipe and remove it from the changelog entry. The agentic flow is blocked by an e2e-tests.yml / benchmark-tmpl.yml artifact-name mismatch (downloads agentic_* but uploads as bmk_agentic_*) and shouldn't be exercised here. PR now only bumps the non-agentic and -mtp variants.

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
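
The "86 days old" figure in the commit message can be reproduced from the date embedded in the nightly tag. The following is a minimal sketch, not tooling from this repository: the helper name `nightly_age_days` is hypothetical, the tag shape is inferred from the one pinned tag, and the reference date is an assumption picked to be consistent with the stated age (the page does not show the commit date).

```python
import re
from datetime import date
from typing import Optional

# Assumed tag shape: "<repo>:nightly-dev-YYYYMMDD-<sha>", inferred from the old pin.
NIGHTLY_TAG = re.compile(r"nightly-dev-(\d{4})(\d{2})(\d{2})-[0-9a-f]+$")


def nightly_age_days(image: str, today: date) -> Optional[int]:
    """Return the age in days of a nightly image tag, or None for any other tag."""
    match = NIGHTLY_TAG.search(image)
    if match is None:
        return None
    year, month, day = (int(part) for part in match.groups())
    return (today - date(year, month, day)).days


# Hypothetical reference date, chosen to agree with the "86 days" in the message.
reference = date(2026, 5, 13)
print(nightly_age_days("lmsysorg/sglang:nightly-dev-20260216-d3bae71e", reference))  # 86
print(nightly_age_days("lmsysorg/sglang:v0.5.12-cu130", reference))  # None
```

Release tags such as v0.5.12-cu130 carry no date, so the sketch returns None for them rather than guessing an age.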
1 parent c824b1a commit 0c4bf82

2 files changed

Lines changed: 25 additions & 20 deletions

.github/configs/nvidia-master.yaml

Lines changed: 18 additions & 20 deletions
@@ -2052,7 +2052,7 @@ dsv4-fp4-b300-sglang-mtp:
     - { tp: 4, ep: 1, conc-start: 4, conc-end: 32, spec-decoding: mtp }
 
 qwen3.5-bf16-b200-sglang:
-  image: lmsysorg/sglang:nightly-dev-20260216-d3bae71e
+  image: lmsysorg/sglang:v0.5.12-cu130
   model: Qwen/Qwen3.5-397B-A17B
   model-prefix: qwen3.5
   runner: b200
@@ -2071,7 +2071,7 @@ qwen3.5-bf16-b200-sglang:
     - { tp: 8, ep: 1, conc-start: 4, conc-end: 64 }
 
 qwen3.5-bf16-b200-sglang-mtp:
-  image: lmsysorg/sglang:nightly-dev-20260216-d3bae71e
+  image: lmsysorg/sglang:v0.5.12-cu130
   model: Qwen/Qwen3.5-397B-A17B
   model-prefix: qwen3.5
   runner: b200
@@ -2089,24 +2089,22 @@ qwen3.5-bf16-b200-sglang-mtp:
     search-space:
     - { tp: 8, ep: 1, conc-start: 4, conc-end: 64, spec-decoding: mtp }
 
-# Diverged from qwen3.5-bf16-b200-sglang (agentic-coding sibling). Metadata is
-# identical to origin/main's qwen3.5-bf16-b200-sglang; the split exists because this
-# PR adds an agentic-coding scenarios block that differs from main
-# (either main had none or had a different conc/offload sweep).
-# The original qwen3.5-bf16-b200-sglang entry stays byte-identical to origin/main.
-qwen3.5-bf16-b200-sglang-agentic:
-  image: lmsysorg/sglang:nightly-dev-20260216-d3bae71e
-  model: Qwen/Qwen3.5-397B-A17B
-  model-prefix: qwen3.5
-  runner: b200
-  precision: bf16
-  framework: sglang
-  multinode: false
-  scenarios:
-    agentic-coding:
-    - duration: 1800
-      search-space:
-      - { tp: 8, ep: 1, offloading: none, conc-list: [1, 2, 4, 8, 16, 32] }
+# agentic-coding sibling - temporarily disabled, blocked by e2e-tests.yml
+# artifact-name mismatch (downloads `agentic_*` but benchmark-tmpl.yml uploads
+# as `bmk_agentic_*`). Re-enable once that workflow is aligned.
+# qwen3.5-bf16-b200-sglang-agentic:
+#   image: lmsysorg/sglang:v0.5.12-cu130
+#   model: Qwen/Qwen3.5-397B-A17B
+#   model-prefix: qwen3.5
+#   runner: b200
+#   precision: bf16
+#   framework: sglang
+#   multinode: false
+#   scenarios:
+#     agentic-coding:
+#     - duration: 1800
+#       search-space:
+#       - { tp: 8, ep: 1, offloading: none, conc-list: [1, 2, 4, 8, 16, 32] }
 
 qwen3.5-fp8-b200-sglang:
   image: lmsysorg/sglang:nightly-dev-20260422-de962f32
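
The artifact-name mismatch that blocks the agentic recipe (the download side looks for `agentic_*`, the upload side names artifacts `bmk_agentic_*`) can be illustrated with a small glob check. This is a sketch under stated assumptions: the artifact name below is hypothetical, the helper `matching_artifacts` is not part of either workflow, and Python's `fnmatch` only approximates the glob matching a CI system would apply.

```python
from fnmatch import fnmatchcase


def matching_artifacts(pattern, uploaded):
    """Return the uploaded artifact names that a download glob would select."""
    return [name for name in uploaded if fnmatchcase(name, pattern)]


# Hypothetical artifact name using the bmk_agentic_ prefix described in the commit.
uploaded = ["bmk_agentic_qwen3.5-bf16-b200-sglang"]

# The download pattern must match the whole name, so the bmk_ prefix defeats it.
print(matching_artifacts("agentic_*", uploaded))      # []
# A pattern aligned with the upload prefix selects the artifact.
print(matching_artifacts("bmk_agentic_*", uploaded))  # ['bmk_agentic_qwen3.5-bf16-b200-sglang']
```

An empty match on the download side is the likely failure mode here, which is consistent with the recipe being commented out until the two workflow files agree on a prefix.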

perf-changelog.yaml

Lines changed: 7 additions & 0 deletions
@@ -2667,3 +2667,10 @@
     - "Update SGLang image from v0.5.11-cu130 to v0.5.12-cu130"
     - "Temporarily disable agentic-coding scenario (blocked by e2e-tests.yml artifact-name mismatch)"
   pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1415
+
+- config-keys:
+    - qwen3.5-bf16-b200-sglang
+    - qwen3.5-bf16-b200-sglang-mtp
+  description:
+    - "Update SGLang image from nightly-dev-20260216-d3bae71e (86d old) to v0.5.12-cu130"
+  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1446
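
The commit message notes that the agentic recipe was removed from the changelog entry once it was commented out of the config. A minimal sketch of that consistency rule follows, assuming (this is not confirmed by the source) that every `config-keys` item is expected to name an active recipe. The helper `unknown_config_keys` is hypothetical, and the YAML is represented as plain dicts to keep the example dependency-free.

```python
def unknown_config_keys(entry, recipes):
    """Return changelog config-keys that have no active recipe of the same name."""
    return [key for key in entry["config-keys"] if key not in recipes]


# Active recipes after this commit (only the image field is shown).
recipes = {
    "qwen3.5-bf16-b200-sglang": {"image": "lmsysorg/sglang:v0.5.12-cu130"},
    "qwen3.5-bf16-b200-sglang-mtp": {"image": "lmsysorg/sglang:v0.5.12-cu130"},
}

# The changelog entry added by this commit.
entry = {
    "config-keys": ["qwen3.5-bf16-b200-sglang", "qwen3.5-bf16-b200-sglang-mtp"],
    "pr-link": "https://github.com/SemiAnalysisAI/InferenceX/pull/1446",
}
print(unknown_config_keys(entry, recipes))  # []

# Listing the commented-out agentic recipe would leave a dangling key.
with_agentic = {"config-keys": entry["config-keys"] + ["qwen3.5-bf16-b200-sglang-agentic"]}
print(unknown_config_keys(with_agentic, recipes))  # ['qwen3.5-bf16-b200-sglang-agentic']
```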
