docs(tier): Qwen3.6-27B element-level diff — outlier channel pattern

unamedkr · claude · unamedkr · commit 287336450816 · 2026-04-25T23:19:26.000+09:00
Continued investigation this session:

VALIDATED additional ruled-outs:
  TQ_DELTANET_FP32=1 → same Tier 3, max 3.87
  → DeltaNet quantization is NOT the cause
  → Points to a forward-pass arithmetic bug

ELEMENT-LEVEL L0 diff (BOS-aligned, both engines prefilled BOS+Hello):
  pos 0 (BOS), first 3 elements:
    ours:  [-0.055,  0.355, -0.790]
    llama: [-0.110, -0.039,  0.036]
    elem 0: same sign, ours ~0.5× llama magnitude (2× off)
    elem 1: SIGN FLIP, ours ~9× llama magnitude
    elem 2: SIGN FLIP, ours ~22× llama magnitude
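
The per-element figures above can be reproduced in a few lines. This is a
minimal sketch using the four-decimal values recorded in
docs/tier_benchmark_2026_04_25.md; compare_elements is a hypothetical helper,
not part of either engine:

```python
import math

def compare_elements(ours, ref):
    """Return (|ours| / |ref|, sign_flipped) for each element pair."""
    out = []
    for a, b in zip(ours, ref):
        ratio = abs(a) / abs(b) if b != 0.0 else math.inf
        flipped = (a < 0) != (b < 0)
        out.append((ratio, flipped))
    return out

# Values copied from the L0 comparison (position 0, first 3 elements).
ours = [-0.0548, 0.3547, -0.7901]
llama = [-0.1097, -0.0390, 0.0355]

for i, (ratio, flipped) in enumerate(compare_elements(ours, llama)):
    print(f"elem {i}: |ours|/|llama| = {ratio:.2f}, sign_flip = {flipped}")
```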

PATTERN: outlier channels — specific dimensions blown up while
overall sum stays manageable (23.5 vs 6.8). Classic signature of:
  - Mis-aligned norm weight (boundary issue at hidden=5120?)
  - Missing embed_scale = sqrt(hidden_dim)
  - QKV/conv1d channel split shifted by some offset

Updated docs/tier_benchmark_2026_04_25.md with concrete next
investigation step: dump first 20 elements of each named tensor
at L0 (post_embed, attn_norm_out, qkv_proj, conv1d, q/k/v split,
l2norm, decay, delta, state, output, ssm_norm). First materially
divergent named tensor identifies the bug location.
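
The localization step described above can be sketched as follows. It assumes
each engine's dump has already been loaded into a dict mapping tensor name to a
list of floats; the loader, the 0.05 threshold, and the function names are
hypothetical, and the values below are synthetic placeholders rather than real
dumps:

```python
def max_rel_diff(ours, ref, eps=1e-6):
    """Largest |a - b| / (|b| + eps) over paired elements."""
    return max(abs(a - b) / (abs(b) + eps) for a, b in zip(ours, ref))

def first_divergent(order, ours, ref, threshold=0.05):
    """Return (name, max_rel_diff) for the first tensor over threshold, else None."""
    for name in order:
        d = max_rel_diff(ours[name], ref[name])
        if d > threshold:
            return name, d
    return None

# Tensor names in forward-pass order (subset, for illustration only).
order = ["post_embed", "attn_norm_out", "qkv_proj"]
ref = {"post_embed": [0.1, -0.2], "attn_norm_out": [1.0, -1.0], "qkv_proj": [0.5, 0.5]}
ours = {"post_embed": [0.1, -0.2], "attn_norm_out": [1.0, 1.0], "qkv_proj": [3.0, 0.5]}

print(first_divergent(order, ours, ref))
```

Relative difference is sensitive to near-zero reference values, so a flagged
tensor is worth confirming against its absolute differences before concluding.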

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
diff --git a/docs/tier_benchmark_2026_04_25.md b/docs/tier_benchmark_2026_04_25.md
@@ -81,9 +81,26 @@ Max rel_diff: **3.87** (L3). For comparison:
 
 **Quick paths to try first** (not done this session):
 - Q4_K dequant validation — 27B has hidden_dim 5120; Q4_K block size is 256 (5120/256=20 blocks). For previous models hidden 2048/3072/4096 (8/12/16 blocks). 5120 is unusual — verify dequant produces correct values.
-- TQ_DELTANET_FP32=1 to bypass DN quant entirely and compare
+- TQ_DELTANET_FP32=1 to bypass DN quant entirely and compare. ✓ TESTED: same Tier 3, max 3.87. DN quant is NOT the cause.
 - Run on smaller quant (UD-IQ2_M 10.1 GB) to see if Tier 3 persists across bit-widths
 
+**Element-level L0 comparison** (2026-04-25, after BOS alignment confirmed):
+```
+position 0 (BOS), L0 first 3 elements:
+  ours:   [-0.0548,  0.3547, -0.7901]
+  llama:  [-0.1097, -0.0390,  0.0355]
+  diff:   [ 0.5× (same sign), SIGN FLIP + 9× magnitude, SIGN FLIP + 22× magnitude ]
+```
+
+Sum-level diff was 247% (23.5 vs 6.77). Element-level shows OUTLIER CHANNELS pattern — specific dimensions blown up while sum stays manageable.
+
+**Outlier-channel pattern suggests**:
+- A norm weight read at a misaligned offset (e.g., attn_norm at dim 5120; boundary issue?)
+- A missing embedding scaling factor (some Qwen variants have embed_scale = sqrt(hidden_dim))
+- A projection split (qkv split offsets, conv1d channel split) shifted by an offset, misassigning dims
+
+**Concrete next investigation step**: dump first 20 elements of each named tensor at L0 (post_embed, attn_norm_out, qkv_proj_out, conv1d_out, q_split, k_split, v_split, q_l2norm, k_l2norm, gate_silu, delta_state, delta_out, ssm_norm_out, residual). First materially-divergent step localizes the bug.
+
 **Memory**: at 16.8 GB Q4_K_M model size on 16 GB RAM Mac, evaluation is impractical (constant swap, ~0.3 tok/s, -n 30 test took 15+ min). For users wanting to test 27B, smaller quants are available:
 - UD-IQ2_M: 10.1 GB (recommended for 16 GB RAM)
 - UD-Q2_K_XL: 11.0 GB