
Commit 0fa33cb

unamedkr and claude committed
docs(tier): Qwen3.6-27B investigation log — narrow remaining scope
Updated tier benchmark doc with what we validated and ruled out in this session, narrowing the unknown surface for next session:

VALIDATED (no issue here):
- Tensor names match A3B exactly
- All shapes consistent with config (qkv 10240 = 16×128×2 + 48×128)
- GGUF metadata correct (rope, eps, conv_kernel)
- Layer pattern matches llama-debug (16 attn at L3,7,...,63)
- ssm_a values sensible (-0.34 to -0.004 at L0)
- is_moe correctly false (num_experts=0)
- QK-norm not the issue (TQ_FORCE_QK_NORM=1 didn't help)
- DN_LLAMACPP_PORT correctly auto-enabled

REMAINING (next session needs):
- Element-level sub-op trace at L0 pos=0
- Compare ours vs llama-debug per named tensor
- First materially divergent sub-op = the bug

QUICK PATHS TO TRY FIRST:
- Q4_K dequant validation on hidden_dim 5120 (unusual size)
- TQ_DELTANET_FP32=1 bypass to localize quant vs forward bug
- Smaller quant (UD-IQ2_M) to test Tier 3 persistence

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 1a3807f commit 0fa33cb

1 file changed

Lines changed: 20 additions & 0 deletions


docs/tier_benchmark_2026_04_25.md

@@ -64,6 +64,26 @@ Max rel_diff: **3.87** (L3). For comparison:
2. MoE-only code paths firing on a dense model (the load message "Fused MoE kernels ready" appeared on the dense model; likely harmless but worth confirming).
3. Tensor layout interpretation differences in the 27B GGUF.

**Validated this session (2026-04-25)**:
- Tensor names match A3B exactly (`attn_qkv`, `attn_gate`, `ssm_a/alpha/beta/dt/norm/out/conv1d`)
- All tensor shapes consistent with config (QKV split = 16×128×2 + 48×128 = 10240 ✓, i.e. Q and K at 16 heads × 128 each, V at 48 heads × 128)
- GGUF metadata matches expected (rope sections [11,11,10,0], eps 1e-7, ssm.conv_kernel=4)
- Layer pattern (16 attn at L3,7,11,...,63) matches llama-debug
- ssm_a values in a sensible range (-0.34 to -0.004 at L0, a pattern similar to A3B)
- `is_moe = (num_experts > 0) = false` ✓ (MoE code paths gated correctly)
- TQ_FORCE_QK_NORM=1 does NOT help (max rel_diff stays at 3.87 — not a QK-norm issue)
- TQ_DN_LLAMACPP_PORT auto-enabled (DeltaNet detected via delta_n_heads > 0) ✓
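
The shape and layer-pattern checks above are simple arithmetic on the config. A minimal sketch, with hypothetical constant and function names; the head counts and the every-4th-layer attention interval are inferred from the figures in this list, not read from the GGUF:

```python
# Values inferred from the validation notes above (hypothetical names).
N_LAYERS = 64
FULL_ATTN_INTERVAL = 4            # assumed: every 4th layer is full attention
N_K_HEADS, HEAD_K_DIM = 16, 128   # Q and K share this head count
N_V_HEADS, HEAD_V_DIM = 48, 128
NUM_EXPERTS = 0


def qkv_width(n_k_heads, head_k_dim, n_v_heads, head_v_dim):
    """Width of the fused attn_qkv projection in a DeltaNet layer: Q + K + V."""
    return n_k_heads * head_k_dim * 2 + n_v_heads * head_v_dim


def attention_layers(n_layers, interval):
    """Indices of full-attention layers when every `interval`-th layer is attention."""
    return [i for i in range(n_layers) if (i + 1) % interval == 0]


def is_moe(num_experts):
    """MoE code paths should only be enabled when the model has experts."""
    return num_experts > 0


if __name__ == "__main__":
    width = qkv_width(N_K_HEADS, HEAD_K_DIM, N_V_HEADS, HEAD_V_DIM)
    attn = attention_layers(N_LAYERS, FULL_ATTN_INTERVAL)
    print(width, len(attn), attn[:3], attn[-1], is_moe(NUM_EXPERTS))
```

The same numbers can be compared against the `attn_qkv` tensor's output dimension and the loader's per-layer attention/DeltaNet assignment.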

**Remaining investigation** (next session, multi-hour):
- Element-level sub-op trace at L0 pos=0: qkv-proj output, conv1d output, Q/K/V split, L2-norm, decay, delta, state update, output, ssm_norm
- Compare each sub-op against llama-debug's `cb(...)` named tensors at L0
- First materially divergent sub-op identifies the bug
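
An independent reference for the DeltaNet sub-ops at pos=0 makes the trace comparison mechanical. A minimal single-head sketch, assuming the recurrent form of the gated delta rule used by Qwen3-Next-style layers, assuming `ssm_a` stores `-exp(A_log)` (consistent with the negative values noted above), and assuming a `rel_diff` definition (the doc does not define it). conv1d, ssm_norm, and the grouping of 16 Q/K heads over 48 V heads are not modeled; all names are hypothetical:

```python
import math

# Sub-op names in execution order; hypothetical labels, to be mapped onto
# whatever names our trace and llama-debug's cb(...) tensors actually use.
ORDER = ["q_scaled", "k_l2", "gate", "beta", "delta", "state", "output"]


def l2norm(x, eps=1e-6):
    # eps inside the sqrt (assumed placement; check against both engines)
    inv = 1.0 / math.sqrt(sum(v * v for v in x) + eps)
    return [v * inv for v in x]


def softplus(x):
    return math.log1p(math.exp(x))


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gated_delta_step(q, k, v, ssm_a, alpha, dt_bias, b, state):
    """One token of the gated delta rule for a single head (pure Python).

    state is a len(k) x len(v) matrix stored as a list of rows.
    ssm_a is assumed to hold -exp(A_log), so gate = ssm_a * softplus(...) <= 0.
    Returns (output, new_state, trace); trace maps ORDER names to flat lists.
    """
    d_k, d_v = len(k), len(v)
    qs = [x / math.sqrt(d_k) for x in l2norm(q)]     # L2-norm, then 1/sqrt(d_k)
    kn = l2norm(k)
    gate = ssm_a * softplus(alpha + dt_bias)
    beta = sigmoid(b)
    decay = math.exp(gate)
    s = [[x * decay for x in row] for row in state]  # decay the previous state
    kv_mem = [sum(kn[i] * s[i][j] for i in range(d_k)) for j in range(d_v)]
    delta = [(v[j] - kv_mem[j]) * beta for j in range(d_v)]
    s = [[s[i][j] + kn[i] * delta[j] for j in range(d_v)] for i in range(d_k)]
    out = [sum(s[i][j] * qs[i] for i in range(d_k)) for j in range(d_v)]
    trace = {"q_scaled": qs, "k_l2": kn, "gate": [gate], "beta": [beta],
             "delta": delta, "state": [x for row in s for x in row],
             "output": out}
    return out, s, trace


def rel_diff(ours, ref, eps=1e-12):
    # Assumed definition: max abs error over the reference's max magnitude.
    err = max(abs(a - b) for a, b in zip(ours, ref))
    return err / (max(abs(b) for b in ref) + eps)


def first_divergence(ours, ref, order=ORDER, tol=1e-3):
    """Return (name, rel_diff) for the first sub-op above tol, or None."""
    for name in order:
        r = rel_diff(ours[name], ref[name])
        if r > tol:
            return name, r
    return None
```

At pos=0 the incoming state is zero, so the decay multiplies nothing and the output collapses to beta · (q·k) · v with normalized, scaled q and k. That gives a closed-form check on the delta and output sub-ops before any multi-token state is involved.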

**Quick paths to try first** (not done this session):
- Q4_K dequant validation: 27B has hidden_dim 5120 and the Q4_K super-block size is 256, so a 5120-wide row is 20 blocks. Previous models used hidden sizes 2048/3072/4096 (8/12/16 blocks). 5120 is unusual, so verify that dequant produces correct values.
- Run with TQ_DELTANET_FP32=1 to bypass DeltaNet quantization entirely and compare; this localizes the bug to quantization vs. the forward pass
- Run on a smaller quant (UD-IQ2_M, 10.1 GB) to see whether Tier 3 persists across bit-widths
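
For the Q4_K check, an independent dequantizer that dumps one row for comparison is cheap to write. A minimal pure-Python sketch, assuming the `block_q4_K` layout from ggml (fp16 `d`, fp16 `dmin`, 12 bytes of packed 6-bit scales and mins, 128 bytes of 4-bit quants per 256-value super-block); the function names are hypothetical, and the layout should be verified against ggml's `ggml-quants.c` before relying on it:

```python
import struct

QK_K = 256                                   # values per Q4_K super-block
BLOCK_BYTES = 2 + 2 + 12 + QK_K // 2         # d, dmin, scales, quants = 144


def _scale_min(j, sc):
    """Unpack the 6-bit scale and min of sub-block j (0..7) from 12 packed bytes."""
    if j < 4:
        return sc[j] & 63, sc[j + 4] & 63
    return ((sc[j + 4] & 0x0F) | ((sc[j - 4] >> 6) << 4),
            (sc[j + 4] >> 4) | ((sc[j] >> 6) << 4))


def dequant_q4_k_block(buf):
    """Dequantize one 144-byte super-block into 256 floats."""
    d, dmin = struct.unpack_from("<ee", buf, 0)   # two little-endian fp16 values
    sc = buf[4:16]
    qs = buf[16:16 + QK_K // 2]
    out = []
    for chunk in range(4):                        # 4 chunks of 64 values
        q = qs[chunk * 32:(chunk + 1) * 32]       # 32 bytes feed 64 values
        s1, m1 = _scale_min(2 * chunk, sc)
        s2, m2 = _scale_min(2 * chunk + 1, sc)
        out += [d * s1 * (b & 0x0F) - dmin * m1 for b in q]   # low nibbles
        out += [d * s2 * (b >> 4) - dmin * m2 for b in q]     # high nibbles
    return out


def dequant_q4_k_row(buf, n):
    """Dequantize a row of n values; n must be a multiple of 256."""
    if n % QK_K:
        raise ValueError(f"row length {n} is not a multiple of {QK_K}")
    nb = n // QK_K
    if len(buf) != nb * BLOCK_BYTES:
        raise ValueError("buffer size does not match row length")
    out = []
    for i in range(nb):
        out += dequant_q4_k_block(buf[i * BLOCK_BYTES:(i + 1) * BLOCK_BYTES])
    return out
```

Nothing in the block format depends on row length beyond `n % 256 == 0`; a 5120-wide row is simply 20 super-blocks, so a mismatch that appears only at 5120 would point toward a hard-coded block count or stride rather than the format itself.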

**Memory**: with the Q4_K_M model at 16.8 GB on a 16 GB RAM Mac, evaluation is impractical (constant swapping, ~0.3 tok/s; a `-n 30` test took 15+ min). For users who want to test 27B, smaller quants are available:
- UD-IQ2_M: 10.1 GB (recommended for 16 GB RAM)
- UD-Q2_K_XL: 11.0 GB
