Commit 165d625

KLAUD_DEBUG: B300 is sm_103 (not sm_120) + cross-link upstream issue (#1479)
Two corrections to §4 (B300 sglang v0.5.12 regressions):

1. **Arch fix.** B300 (Blackwell Ultra datacenter) is compute capability 10.3 / `sm_103`, NOT `sm_120`. sm_120 is for consumer Blackwell (RTX 50 series / GB20x dies). This had propagated through agent diagnoses and into upstream issue sgl-project/sglang#25563 (already corrected there).
2. **§4c reframe.** sm_103 is *nominally inside* the asserted range `sm_100 <= arch <= sm_110f` (since 100 <= 103 <= 110), so the assertion failure is more interesting than "outside the range" — best guess is the cute kernel's `Arch.sm_110f` set only matches the architecture-specific feature-flag variants it was compiled for (sm_100, sm_100f, sm_110, sm_110f) and sm_103/sm_103a isn't in that list.

Also cross-linked sgl-project/sglang#25563 under §4b (filed earlier this session for the EAGLE draft graph capture crash on GLM-5-NVFP4 at bs=128 — same B300 v0.5.12 regression family).

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 0c4bf82 commit 165d625

1 file changed

Lines changed: 4 additions & 4 deletions

KLAUD_DEBUG.md

@@ -66,7 +66,7 @@ Seen on: #1460 (dsv4-fp8-h200-sglang+mtp).

 ## 4. Upstream sglang v0.5.12 B300 regressions

-Two distinct upstream regressions on NVIDIA B300 (Blackwell, `sm_120`) shipped in `lmsysorg/sglang:v0.5.12-cu130`:
+Three distinct upstream regressions on NVIDIA B300 (Blackwell Ultra, `sm_103` — compute capability 10.3) shipped in `lmsysorg/sglang:v0.5.12-cu130`. (sm_120 is for *consumer* Blackwell / RTX 50 series, not B300 — don't propagate that.)

 ### 4a. DeepGemm TMA-descriptor crash (GLM-5-FP8)
 **Symptom:** CUDA graph capture aborts with `CUDA_ERROR_ILLEGAL_ADDRESS (700)` at `/deepgemm/csrc/.../runtime_utils.hpp:143` on the **first batch size** for **every TP rank**. Server never serves a prompt.
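The arch correction in this hunk (B300 is `sm_103`, not `sm_120`) is the kind of thing that can be confirmed on the host instead of inferred from the product name. The following is a minimal sketch, assuming PyTorch is installed in the container; `sm_name` and `describe_device` are hypothetical helper names introduced here for illustration, not part of sglang or KLAUD_DEBUG.md.

```python
def sm_name(major: int, minor: int) -> str:
    """Format a CUDA compute capability as an sm_XY name, e.g. (10, 3) -> "sm_103"."""
    return f"sm_{major}{minor}"


def describe_device(index: int = 0) -> str:
    """Report the device name and sm_XY target for one visible GPU.

    torch is imported lazily so that sm_name() stays usable on machines
    without PyTorch or without a GPU.
    """
    import torch

    if not torch.cuda.is_available():
        return "no CUDA device visible"
    # get_device_capability returns a (major, minor) tuple.
    major, minor = torch.cuda.get_device_capability(index)
    return f"{torch.cuda.get_device_name(index)}: {sm_name(major, minor)}"
```

Calling `describe_device()` inside the serving container should, per the corrected text, report `sm_103` on a B300; a consumer Blackwell card would be expected to report `sm_120`.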
@@ -86,17 +86,17 @@ Filed upstream: sgl-project/sglang#25551. Seen on #1421.
 2. Comment out the MTP/EAGLE scenarios on B300 in the recipe.
 3. Pin to v0.5.11-cu130.

-Seen on #1420.
+Filed upstream: sgl-project/sglang#25563. Seen on #1420.

 ### 4c. flash_attn SM-arch assertion (qwen3.5-bf16)
 **Symptom:** All 4 TP workers AssertionError on first forward pass:
 ```
 File "/opt/venv/.../sglang/srt/layers/attention/flashattention_backend.py:..."
 assert sm_100 <= arch <= sm_110f
 ```
-B300 is `sm_120`, outside the asserted range. Server never becomes healthy; warmup times out at 600s.
+B300 is `sm_103` (compute capability 10.3, Blackwell Ultra) — which is *nominally inside* the asserted `sm_100..sm_110f` range, yet the assertion still fires. Best guess is the cute kernel's `Arch.sm_110f` set only matches the architecture-specific feature-flag variants it was compiled for (e.g. `sm_100`, `sm_100f`, `sm_110`, `sm_110f`) and `sm_103` / `sm_103a` isn't in that explicit list. Server never becomes healthy; warmup times out at 600s.

-**Fix:** Needs sglang image with flash_attn supporting `sm_120` — no local workaround. Pin to v0.5.11-cu130 in the meantime.
+**Fix:** Needs an sglang image with `flash_attn` that recognises `sm_103` / `sm_103a` — no local workaround. Pin to `v0.5.11-cu130` in the meantime.

 Seen on #1422.

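The §4c "best guess" can be made concrete with a toy model. The sketch below is purely illustrative and is not the actual flash_attn or cute code: `SUPPORTED_VARIANTS`, `numeric_range_ok`, and `variant_ok` are hypothetical names, and the variant list is simply the one quoted in the commit message. It shows how an architecture can sit inside a numeric range while still failing a check that is really a membership test over an explicit list of compiled variants.

```python
# Hypothetical illustration only; not taken from flash_attn or sglang.
# The variant list is the one guessed at in the commit message.
SUPPORTED_VARIANTS = frozenset({"sm_100", "sm_100f", "sm_110", "sm_110f"})


def numeric_range_ok(arch_number: int, low: int = 100, high: int = 110) -> bool:
    """Naive reading of `sm_100 <= arch <= sm_110f` as an integer range."""
    return low <= arch_number <= high


def variant_ok(arch_name: str) -> bool:
    """Membership reading: only explicitly listed variants are accepted."""
    return arch_name in SUPPORTED_VARIANTS


# Print both readings side by side for a few architectures.
for name, number in [("sm_100", 100), ("sm_103", 103), ("sm_103a", 103)]:
    print(f"{name}: range={numeric_range_ok(number)} variant={variant_ok(name)}")
```

Under this assumed model, `sm_103` passes the integer-range reading but fails the membership reading, which would be consistent with the assertion firing on B300. Whether the real comparison works this way is unverified and would need to be checked against the kernel source.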
