KLAUD_DEBUG: B300 is sm_103 (not sm_120) + cross-link upstream issue (#1479)

functionstackx · claude · web-flow · commit 165d6252dc16 · 2026-05-18T00:16:09.000-04:00
Two corrections to §4 (B300 sglang v0.5.12 regressions):

1. **Arch fix.** B300 (Blackwell Ultra datacenter) is compute capability 10.3 / `sm_103`, NOT `sm_120`. `sm_120` is for consumer Blackwell (RTX 50 series / GB20x dies). This had propagated through agent diagnoses and into upstream issue sgl-project/sglang#25563 (already corrected there).
2. **§4c reframe.** `sm_103` is *nominally inside* the asserted range `sm_100 <= arch <= sm_110f` (since 100 <= 103 <= 110), so the assertion failure is more interesting than "outside the range". Best guess is that the cute kernel's `Arch.sm_110f` set only matches the architecture-specific feature-flag variants it was compiled for (`sm_100`, `sm_100f`, `sm_110`, `sm_110f`) and `sm_103` / `sm_103a` isn't in that list.

Also cross-linked sgl-project/sglang#25563 under §4b (filed earlier this session for the EAGLE draft graph capture crash on GLM-5-NVFP4 at bs=128; same B300 v0.5.12 regression family).

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
diff --git a/KLAUD_DEBUG.md b/KLAUD_DEBUG.md
@@ -66,7 +66,7 @@ Seen on: #1460 (dsv4-fp8-h200-sglang+mtp).
 
 ## 4. Upstream sglang v0.5.12 B300 regressions
 
-Two distinct upstream regressions on NVIDIA B300 (Blackwell, `sm_120`) shipped in `lmsysorg/sglang:v0.5.12-cu130`:
+Three distinct upstream regressions on NVIDIA B300 (Blackwell Ultra, `sm_103` — compute capability 10.3) shipped in `lmsysorg/sglang:v0.5.12-cu130`. (sm_120 is for *consumer* Blackwell / RTX 50 series, not B300 — don't propagate that.)
 
 ### 4a. DeepGemm TMA-descriptor crash (GLM-5-FP8)
 **Symptom:** CUDA graph capture aborts with `CUDA_ERROR_ILLEGAL_ADDRESS (700)` at `/deepgemm/csrc/.../runtime_utils.hpp:143` on the **first batch size** for **every TP rank**. Server never serves a prompt.
@@ -86,17 +86,17 @@ Filed upstream: sgl-project/sglang#25551. Seen on #1421.
 2. Comment out the MTP/EAGLE scenarios on B300 in the recipe.
 3. Pin to v0.5.11-cu130.
 
-Seen on #1420.
+Filed upstream: sgl-project/sglang#25563. Seen on #1420.
 
 ### 4c. flash_attn SM-arch assertion (qwen3.5-bf16)
 **Symptom:** All 4 TP workers AssertionError on first forward pass:
 ```
 File "/opt/venv/.../sglang/srt/layers/attention/flashattention_backend.py:..."
   assert sm_100 <= arch <= sm_110f
 ```
-B300 is `sm_120`, outside the asserted range. Server never becomes healthy; warmup times out at 600s.
+B300 is `sm_103` (compute capability 10.3, Blackwell Ultra) — which is *nominally inside* the asserted `sm_100..sm_110f` range, yet the assertion still fires. Best guess is the cute kernel's `Arch.sm_110f` set only matches the architecture-specific feature-flag variants it was compiled for (e.g. `sm_100`, `sm_100f`, `sm_110`, `sm_110f`) and `sm_103` / `sm_103a` isn't in that explicit list. Server never becomes healthy; warmup times out at 600s.
 
-**Fix:** Needs sglang image with flash_attn supporting `sm_120` — no local workaround. Pin to v0.5.11-cu130 in the meantime.
+**Fix:** Needs an sglang image with `flash_attn` that recognises `sm_103` / `sm_103a` — no local workaround. Pin to `v0.5.11-cu130` in the meantime.
 
 Seen on #1422.
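
To make the §4c hypothesis concrete, here is a minimal Python sketch of how a numeric range check and an explicit allow-list check can disagree for `sm_103`. Every name in it (`sm_name`, `arch_number`, `COMPILED_VARIANTS`, `passes_numeric_range`, `passes_allow_list`) is hypothetical and chosen for illustration; it is not the flash_attn or sglang implementation, and the allow-list contents simply mirror the variants named in the best guess above.

```python
# Illustrative only: none of these names come from sglang or flash_attn.

def sm_name(major: int, minor: int) -> str:
    """Format a CUDA compute capability (major, minor) as an sm_XY tag."""
    return f"sm_{major}{minor}"


def arch_number(major: int, minor: int) -> int:
    """Collapse (major, minor) into the integer used in sm_XY, e.g. (10, 3) -> 103."""
    return major * 10 + minor


# Hypothetical explicit allow-list, mirroring the variants named in the §4c guess.
COMPILED_VARIANTS = frozenset({"sm_100", "sm_100f", "sm_110", "sm_110f"})


def passes_numeric_range(major: int, minor: int, lo: int = 100, hi: int = 110) -> bool:
    """What a reader might expect `sm_100 <= arch <= sm_110f` to mean."""
    return lo <= arch_number(major, minor) <= hi


def passes_allow_list(major: int, minor: int, allowed=COMPILED_VARIANTS) -> bool:
    """What the check would mean if it were really a set-membership test."""
    return sm_name(major, minor) in allowed


b300 = (10, 3)   # Blackwell Ultra datacenter, compute capability 10.3
rtx50 = (12, 0)  # consumer Blackwell, compute capability 12.0

print(sm_name(*b300), passes_numeric_range(*b300), passes_allow_list(*b300))
print(sm_name(*rtx50), passes_numeric_range(*rtx50), passes_allow_list(*rtx50))
```

Under these assumptions B300 passes the numeric reading but fails the allow-list reading, which would match the observed assertion failure. On a live machine the `(major, minor)` pair can be read with `torch.cuda.get_device_capability()` instead of being hard-coded.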