Commit 2873364

unamedkr and claude committed
docs(tier): Qwen3.6-27B element-level diff — outlier channel pattern
Continued investigation this session:

VALIDATED additional ruled-outs:
  TQ_DELTANET_FP32=1 → same Tier 3, max 3.87
  → Quantization NOT the cause
  → Forward pass arithmetic bug confirmed

ELEMENT-LEVEL L0 diff (BOS-aligned, both engines prefilled BOS+Hello):
  pos 0 (BOS), first 3 elements:
  ours:  [-0.055, 0.355, -0.790]
  llama: [-0.110, -0.039, 0.036]
  elem 0: 2× magnitude, same sign
  elem 1: SIGN FLIP + 9× magnitude
  elem 2: SIGN FLIP + 22× magnitude

PATTERN: outlier channels, i.e. specific dimensions blown up while the
overall sum stays manageable (23.5 vs 6.8). Classic signature of:
  - Mis-aligned norm weight (boundary issue at hidden=5120?)
  - Missing embed_scale = sqrt(hidden_dim)
  - QKV/conv1d channel split shifted by some offset

Updated docs/tier_benchmark_2026_04_25.md with concrete next
investigation step: dump first 20 elements of each named tensor at L0
(post_embed, attn_norm_out, qkv_proj, conv1d, q/k/v split, l2norm,
decay, delta, state, output, ssm_norm). First materially divergent
named tensor identifies the bug location.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 0fa33cb commit 2873364

1 file changed

Lines changed: 18 additions & 1 deletion

docs/tier_benchmark_2026_04_25.md

@@ -81,9 +81,26 @@ Max rel_diff: **3.87** (L3). For comparison:

 **Quick paths to try first** (not done this session):
 - Q4_K dequant validation: 27B has hidden_dim 5120; Q4_K block size is 256 (5120/256=20 blocks). For previous models hidden 2048/3072/4096 (8/12/16 blocks). 5120 is unusual; verify dequant produces correct values.
-- TQ_DELTANET_FP32=1 to bypass DN quant entirely and compare
+- TQ_DELTANET_FP32=1 to bypass DN quant entirely and compare ✓ TESTED: same Tier 3, max 3.87. Quant is NOT the cause.
 - Run on smaller quant (UD-IQ2_M 10.1 GB) to see if Tier 3 persists across bit-widths

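The block arithmetic in the Q4_K bullet can be checked with a minimal sketch like the one below. The helper name `q4k_blocks_per_row` is hypothetical and not part of the engine; the only inputs are the block size of 256 and the hidden sizes quoted in the notes.

```python
Q4_K_BLOCK = 256  # weights per Q4_K block, as stated in the notes


def q4k_blocks_per_row(hidden_dim: int, block: int = Q4_K_BLOCK) -> int:
    """Return blocks per row; raise if hidden_dim is not block-aligned."""
    blocks, remainder = divmod(hidden_dim, block)
    if remainder:
        raise ValueError(f"{hidden_dim} is not a multiple of {block}")
    return blocks


# Hidden sizes mentioned in the notes: three earlier models plus the 27B.
block_counts = {h: q4k_blocks_per_row(h) for h in (2048, 3072, 4096, 5120)}
```

All four sizes are block-aligned, so any dequant problem at 5120 would have to come from how the 20 blocks are indexed rather than from a partial trailing block.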
+**Element-level L0 comparison** (2026-04-25, after BOS alignment confirmed):
+```
+position 0 (BOS), L0 first 3 elements:
+ours: [-0.0548, 0.3547, -0.7901]
+llama: [-0.1097, -0.0390, 0.0355]
+diff: [ 2× off (same sign), SIGN FLIP + 9× magnitude, SIGN FLIP + 22× magnitude ]
+```
+
+Sum-level diff was 247% (23.5 vs 6.77). Element-level shows OUTLIER CHANNELS pattern: specific dimensions blown up while sum stays manageable.
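The per-element figures above can be reproduced from the quoted values with a short sketch. The function name `compare` and the report layout are illustrative assumptions; the numbers are the three L0 elements and the two sums from the notes.

```python
import math

ours = [-0.0548, 0.3547, -0.7901]
llama = [-0.1097, -0.0390, 0.0355]


def compare(a, b):
    """Per-element sign agreement and magnitude factor (larger / smaller)."""
    rows = []
    for i, (x, y) in enumerate(zip(a, b)):
        big, small = max(abs(x), abs(y)), min(abs(x), abs(y))
        rows.append({
            "elem": i,
            "sign_flip": math.copysign(1.0, x) != math.copysign(1.0, y),
            "magnitude_factor": big / small if small else math.inf,
        })
    return rows


report = compare(ours, llama)

# Sum-level relative difference, using the two sums quoted in the notes.
sum_rel_diff = abs(23.5 - 6.77) / 6.77
```

Note that element 0 is off by 2× because ours is about half the llama value, while elements 1 and 2 are larger in ours, so the error is not a single scale factor.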
+
+**Outlier-channel pattern suggests**:
+- Specific norm weight reading mis-aligned (e.g., attn_norm dim 5120; boundary issue?)
+- Embedding lookup scaling factor missing (some Qwen variants have embed_scale = sqrt(hidden_dim))
+- Specific projection (qkv split offsets, conv1d channel split) shifting dim assignments
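One way to test the embed_scale hypothesis is to check whether the two engines differ by a single constant factor. The sketch below is an assumption about how such a check could look, not the engine's code: `is_uniform_scale` is a hypothetical helper, and the check is only meaningful on a tensor taken directly after the embedding lookup (post_embed), since later ops mix in other terms.

```python
import math


def ratios(ours, ref, eps=1e-12):
    """Element-wise ours/ref, skipping near-zero reference values."""
    return [o / r for o, r in zip(ours, ref) if abs(r) > eps]


def is_uniform_scale(ours, ref, rel_tol=0.05):
    """Return (True, factor) if every ratio is close to the median ratio."""
    rs = ratios(ours, ref)
    if not rs:
        return False, None
    median = sorted(rs)[len(rs) // 2]
    uniform = all(math.isclose(r, median, rel_tol=rel_tol) for r in rs)
    return uniform, median


# The three L0 elements from the notes: ratios differ in sign and size.
l0_uniform, _ = is_uniform_scale([-0.0548, 0.3547, -0.7901],
                                 [-0.1097, -0.0390, 0.0355])

# Synthetic post_embed case: ours is missing a sqrt(5120) scale.
scale = math.sqrt(5120)
ref_embed = [0.1, -0.2, 0.3, 0.05]
our_embed = [x / scale for x in ref_embed]
embed_uniform, factor = is_uniform_scale(our_embed, ref_embed)
```

A missing scale would show up at post_embed as a constant ratio of roughly 1/71.55 with no sign flips. The L0 sample fails the uniformity check, but because it is taken after the layer, that alone does not rule the hypothesis out.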
+
+**Concrete next investigation step**: dump first 20 elements of each named tensor at L0 (post_embed, attn_norm_out, qkv_proj_out, conv1d_out, q_split, k_split, v_split, q_l2norm, k_l2norm, gate_silu, delta_state, delta_out, ssm_norm_out, residual). First materially-divergent step localizes the bug.
+
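The dump-and-compare step could be driven by a small script along these lines. This is a sketch under stated assumptions: each engine's dump is taken to be a dict mapping tensor name to a list of floats, the 5% tolerance and the `max_rel_diff` definition are illustrative choices (the notes do not define how rel_diff is computed), and the demo data is synthetic.

```python
STAGES = [
    "post_embed", "attn_norm_out", "qkv_proj_out", "conv1d_out",
    "q_split", "k_split", "v_split", "q_l2norm", "k_l2norm",
    "gate_silu", "delta_state", "delta_out", "ssm_norm_out", "residual",
]


def max_rel_diff(a, b, eps=1e-6):
    """Largest |a - b| / |b| over paired elements (eps guards tiny b)."""
    return max(abs(x - y) / max(abs(y), eps) for x, y in zip(a, b))


def first_divergent(ours, ref, stages=STAGES, tol=0.05):
    """Return (name, diff) for the first stage exceeding tol, else None."""
    for name in stages:
        if name not in ours or name not in ref:
            continue  # stage not dumped by one of the engines
        diff = max_rel_diff(ours[name], ref[name])
        if diff > tol:
            return name, diff
    return None


# Synthetic demo: identical dumps, then one channel corrupted from
# qkv_proj_out onward to mimic a sign flip plus 9x outlier.
ref_dump = {name: [0.1 * (i + 1) for i in range(20)] for name in STAGES}
our_dump = {name: list(vals) for name, vals in ref_dump.items()}
for name in STAGES[STAGES.index("qkv_proj_out"):]:
    our_dump[name][3] *= -9.0

hit = first_divergent(our_dump, ref_dump)
```

Walking the stages in forward-pass order matters: every tensor downstream of the bug also diverges, so only the first hit points at the faulty op. Relative difference is noisy for reference values near zero, so an absolute-difference threshold may be a better fit for some tensors.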
 **Memory**: at 16.8 GB Q4_K_M model size on 16 GB RAM Mac, evaluation is impractical (constant swap, ~0.3 tok/s, -n 30 test took 15+ min). For users wanting to test 27B, smaller quants are available:
 - UD-IQ2_M: 10.1 GB (recommended for 16 GB RAM)
 - UD-Q2_K_XL: 11.0 GB
