Commit 0829285
docs(tier): Qwen3.6-27B llama sub-op tensor reference + split verified
Captured llama-debug's named sub-op tensors at L0 of Qwen3.6-27B:
attn_norm-N = MUL(norm × attn_norm.weight)
conv_input-N = concat(conv_states, qkv_mixed_transposed) shape {5, 10240}
conv_output_raw-N = SSM_CONV(input, conv1d.weight)
conv_output_silu-N = SILU(conv_output_raw)
q_conv-N VIEW {128, 16, n_tokens} ← offset 0
k_conv-N VIEW {128, 16, n_tokens} ← offset 16×128=2048
v_conv_predelta-N VIEW {128, 48, n_tokens} ← offset 2×16×128=4096
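The SILU between conv_output_raw and conv_output_silu is the standard x·sigmoid(x). A minimal reference sketch for spot-checking captured values against the engine (this is the textbook formula, not the engine's actual kernel):

```python
import math

def silu(x: float) -> float:
    """SILU (swish): x * sigmoid(x), as applied to conv_output_raw."""
    return x / (1.0 + math.exp(-x))

print(silu(0.0))               # 0.0
print(round(silu(1.0), 6))     # 0.731059
```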
VERIFIED: the Q/K/V split offsets match ours exactly. Both engines
extract Q at element offset 0, K at 2048, and V at 4096 within the
10240-dim conv output. The channel layout is identical.
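The split arithmetic can be sanity-checked directly from the view shapes above (element offsets; head_dim and head counts are taken from the views, f32 assumed):

```python
# Recompute the Q/K/V element offsets from the view shapes captured above.
head_dim = 128
n_qk_heads = 16   # Q and K views: {128, 16, n_tokens}
n_v_heads = 48    # V view: {128, 48, n_tokens}

q_off = 0
k_off = head_dim * n_qk_heads          # 2048
v_off = 2 * head_dim * n_qk_heads      # 4096
total = v_off + head_dim * n_v_heads   # 10240, matching conv_input-N

print(q_off, k_off, v_off, total)      # 0 2048 4096 10240
```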
Yet the element-level diff at L0, element 2, shows a 22× magnitude
difference with a sign flip, so the bug must be in one of:
- ssm_conv1d weight load/use (shape {4, 10240}; A3B uses {4, 8192})
- L2_NORM op (we may differ from ggml_l2_norm)
- input_layernorm boundary handling at hidden=5120
- BOS handling (verified: both engines DO add BOS, so not this)
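For the L2_NORM hypothesis, a likely source of divergence is the epsilon convention. A sketch of the two common variants to diff against each engine's output (I am not asserting which one ggml_l2_norm uses; both are shown so either can be tested):

```python
import math

def l2_norm_eps_inside(x, eps=1e-12):
    # Variant 1: eps added under the sqrt.
    s = math.sqrt(sum(v * v for v in x) + eps)
    return [v / s for v in x]

def l2_norm_eps_clamp(x, eps=1e-12):
    # Variant 2: the norm is clamped from below at eps.
    s = max(math.sqrt(sum(v * v for v in x)), eps)
    return [v / s for v in x]

x = [3.0, 4.0]
print(l2_norm_eps_inside(x))   # ≈ [0.6, 0.8]
print(l2_norm_eps_clamp(x))    # ≈ [0.6, 0.8]
```

The two variants only diverge near the zero vector, which is exactly where a sign flip on a small element could originate.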
Updated docs/tier_benchmark_2026_04_25.md with the named-tensor
reference for next-session paired-diff investigation.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent: 2873364
1 file changed
Lines changed: 23 additions & 0 deletions
[diff: docs/tier_benchmark_2026_04_25.md, lines 104-126 added; diff body not captured]