Commit dd42d51

yrougy and claude committed
kv-cache : disable attn_rot_k/v when SPLIT_MODE_TENSOR is active
The Hadamard rotation path (ggml_mul_mat_aux) reshapes the K/V tensor from [n_embd_head, n_head_kv, n_tokens] to a 2-D layout for the matmul, then restores it. The split-axis tracker in ggml-backend-meta.cpp does not follow this reshape correctly when the source split axis falls on the collapsed dimension. It produces an AXIS_1 tag on the values passed to ggml_set_rows, which then trips an assertion.

Disabling attn_rot when tensor parallelism is in use sidesteps the incompatibility without changing inference quality: the Hadamard pre-rotation is a lossless precision aid, not a model requirement.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
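The "lossless" claim can be illustrated with a minimal sketch. It assumes the rotation is an orthonormal Hadamard transform applied identically to the query and the key, so their dot product is unchanged in exact arithmetic. The names hadamard_rotate and dot are hypothetical helpers for this illustration, not functions from ggml or llama.cpp.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: in-place fast Walsh-Hadamard transform, scaled by
// 1/sqrt(n) so the transform is orthonormal. n must be a power of two.
static void hadamard_rotate(std::vector<float> & x) {
    const std::size_t n = x.size();
    assert(n > 0 && (n & (n - 1)) == 0);
    for (std::size_t h = 1; h < n; h *= 2) {
        for (std::size_t i = 0; i < n; i += 2 * h) {
            for (std::size_t j = i; j < i + h; ++j) {
                const float a = x[j];
                const float b = x[j + h];
                x[j]     = a + b; // butterfly: sum
                x[j + h] = a - b; // butterfly: difference
            }
        }
    }
    const float scale = 1.0f / std::sqrt(static_cast<float>(n));
    for (float & v : x) {
        v *= scale;
    }
}

// Hypothetical helper: plain dot product of two equally sized vectors.
static float dot(const std::vector<float> & a, const std::vector<float> & b) {
    assert(a.size() == b.size());
    float s = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) {
        s += a[i] * b[i];
    }
    return s;
}
```

Rotating both a query and a key with hadamard_rotate and comparing dot before and after should give the same value up to floating-point rounding, which is why skipping the rotation does not change what the model computes in exact arithmetic.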
1 parent 1c220db commit dd42d51

1 file changed

Lines changed: 2 additions & 0 deletions

src/llama-kv-cache.cpp

@@ -289,12 +289,14 @@ llama_kv_cache::llama_kv_cache(

     attn_rot_k =
         !attn_rot_disable &&
+        model.split_mode() != LLAMA_SPLIT_MODE_TENSOR &&
         n_embd_head_k_all > 0 &&
         ggml_is_quantized(type_k) &&
         hparams.n_embd_head_k() % 64 == 0;

     attn_rot_v =
         !attn_rot_disable &&
+        model.split_mode() != LLAMA_SPLIT_MODE_TENSOR &&
         n_embd_head_v_all > 0 &&
         ggml_is_quantized(type_v) &&
         hparams.n_embd_head_v() % 64 == 0;
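The gating condition after this change can be read as a single predicate. The sketch below restates it in standalone form so the effect of the new clause is easy to check. The enum split_mode_sketch, the struct rot_inputs, and the function attn_rot_enabled are hypothetical stand-ins for illustration and do not exist in llama.cpp.

```cpp
#include <cstdint>

// Hypothetical stand-in for the split-mode enum referenced in the diff.
enum split_mode_sketch {
    SPLIT_MODE_NONE,
    SPLIT_MODE_LAYER,
    SPLIT_MODE_ROW,
    SPLIT_MODE_TENSOR,
};

// Hypothetical bundle of the values the diff reads for one of K or V.
struct rot_inputs {
    bool              rot_disable;       // mirrors attn_rot_disable
    split_mode_sketch split_mode;        // mirrors model.split_mode()
    uint32_t          n_embd_head_all;   // mirrors n_embd_head_{k,v}_all
    bool              type_is_quantized; // mirrors ggml_is_quantized(type_{k,v})
    uint32_t          n_embd_head;       // mirrors hparams.n_embd_head_{k,v}()
};

// Same conjunction as in the diff, including the new tensor-split clause.
static bool attn_rot_enabled(const rot_inputs & in) {
    return !in.rot_disable &&
           in.split_mode != SPLIT_MODE_TENSOR &&
           in.n_embd_head_all > 0 &&
           in.type_is_quantized &&
           in.n_embd_head % 64 == 0;
}
```

With otherwise eligible inputs, switching split_mode to SPLIT_MODE_TENSOR is the only change needed to turn the rotation off, which matches the intent stated in the commit message.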
