Commit 1073622

TheTomclaude and Claude Opus 4.6 committed
fix: add TURBO2_0 to flash_attn auto-enable check
turbo2 V cache failed with "failed to create context" because the auto-enable predicate only listed turbo3/turbo4. Without auto-enable, the subsequent quantized-V-requires-FA check hard-fails.

Co-Authored-By: tturney@psyguard.ai
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent 0009301, commit 1073622

1 file changed: src/llama-context.cpp (2 additions, 2 deletions)
@@ -3002,8 +3002,8 @@ llama_context * llama_init_from_model(
 
     // TurboQuant cache types require flash attention — auto-enable if disabled
     if (params.flash_attn_type == LLAMA_FLASH_ATTN_TYPE_DISABLED &&
-        (params.type_k == GGML_TYPE_TURBO3_0 || params.type_k == GGML_TYPE_TURBO4_0 ||
-         params.type_v == GGML_TYPE_TURBO3_0 || params.type_v == GGML_TYPE_TURBO4_0)) {
+        (params.type_k == GGML_TYPE_TURBO2_0 || params.type_k == GGML_TYPE_TURBO3_0 || params.type_k == GGML_TYPE_TURBO4_0 ||
+         params.type_v == GGML_TYPE_TURBO2_0 || params.type_v == GGML_TYPE_TURBO3_0 || params.type_v == GGML_TYPE_TURBO4_0)) {
         LLAMA_LOG_WARN("%s: turbo cache types require flash_attn — enabling automatically\n", __func__);
         params.flash_attn_type = LLAMA_FLASH_ATTN_TYPE_ENABLED;
     }
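
For context, a hedged sketch of the failure path the commit message describes. The quantized-V-requires-FA check is not shown in the diff; the form below is a hypothetical reconstruction modeled on llama.cpp's historical V-cache validation, reusing only the parameter names visible in the hunk:

    // Hypothetical reconstruction of the downstream check (not verbatim source).
    // If flash attention is still disabled when a quantized V cache is requested,
    // context creation fails and callers see "failed to create context".
    if (params.flash_attn_type == LLAMA_FLASH_ATTN_TYPE_DISABLED &&
        ggml_is_quantized(params.type_v)) {
        LLAMA_LOG_ERROR("%s: V cache quantization requires flash_attn\n", __func__);
        return nullptr;
    }

Before this commit, a TURBO2_0 V cache slipped past the auto-enable predicate (which listed only TURBO3_0/TURBO4_0) and then tripped that check. A caller-side sketch of the same scenario, assuming this tree's public API matches upstream llama.cpp (llama_context_default_params, llama_init_from_model) and that GGML_TYPE_TURBO2_0 is exposed by the ggml headers:

    #include "llama.h"

    // Hypothetical caller: request a turbo2-quantized V cache while leaving
    // flash attention explicitly disabled.
    llama_context * make_turbo2_context(llama_model * model) {
        llama_context_params cparams = llama_context_default_params();
        cparams.type_v          = GGML_TYPE_TURBO2_0;             // quantized V cache
        cparams.flash_attn_type = LLAMA_FLASH_ATTN_TYPE_DISABLED; // FA left off

        // Before this commit: returned nullptr ("failed to create context").
        // After it: flash attention is auto-enabled with a warning and the
        // context is created.
        return llama_init_from_model(model, cparams);
    }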
