
Commit 44656e1

bugfix: fix DeepSeek V3 crash and V3.2 prefix-cache OOM.
1 parent 8a4cc1a commit 44656e1

2 files changed

Lines changed: 6 additions & 3 deletions

third_party/xllm_atb_layers

Submodule xllm_atb_layers updated from d6aa214 to 1147537

xllm/core/runtime/worker_impl.cpp

Lines changed: 5 additions & 2 deletions
```diff
@@ -288,9 +288,12 @@ bool WorkerImpl::allocate_kv_cache(
       } else {
         // Full attention layer: allocate key_cache and value_cache only
 #if defined(USE_NPU)
+        // Keep runtime allocation format consistent with capacity estimation in
+        // llm_engine: only deepseek_v3 uses FRACTAL_NZ with prefix cache.
+        const auto& model_type = context_.get_model_args().model_type();
         aclFormat npu_format_type =
-            context_.get_model_args().model_type() == "deepseek_v3" &&
-                    FLAGS_enable_prefix_cache
+            ((model_type == "deepseek_v3" || model_type == "deepseek_v3_mtp") &&
+             FLAGS_enable_prefix_cache)
                 ? ACL_FORMAT_FRACTAL_NZ
                 : ACL_FORMAT_ND;
         key_cache = at_npu::native::npu_format_cast(
```
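A minimal C++17 sketch of the format predicate before and after this change, assuming it can be factored into a standalone helper. `KvCacheFormat`, `select_kv_cache_format`, and `select_kv_cache_format_before_fix` are hypothetical names. The enum stands in for the two `aclFormat` values used in the diff (`ACL_FORMAT_ND`, `ACL_FORMAT_FRACTAL_NZ`), so the snippet builds without the Ascend headers. The `static_assert`s spell out the behavior change: with prefix cache enabled, `deepseek_v3_mtp` previously fell through to ND and now gets FRACTAL_NZ, the same as `deepseek_v3`.

```cpp
#include <string_view>

// Hypothetical stand-in for the two aclFormat values used in the diff
// (ACL_FORMAT_ND and ACL_FORMAT_FRACTAL_NZ from the Ascend CANN headers).
enum class KvCacheFormat { kND, kFractalNZ };

// Hypothetical helper mirroring the condition added by this commit:
// the DeepSeek V3 family (including the MTP draft model) uses FRACTAL_NZ
// only when prefix caching is enabled; everything else stays ND.
constexpr KvCacheFormat select_kv_cache_format(std::string_view model_type,
                                               bool enable_prefix_cache) {
  const bool is_deepseek_v3_family =
      model_type == "deepseek_v3" || model_type == "deepseek_v3_mtp";
  return (is_deepseek_v3_family && enable_prefix_cache)
             ? KvCacheFormat::kFractalNZ
             : KvCacheFormat::kND;
}

// Hypothetical helper mirroring the removed condition: it matched only
// "deepseek_v3", so "deepseek_v3_mtp" was always allocated as ND.
constexpr KvCacheFormat select_kv_cache_format_before_fix(
    std::string_view model_type, bool enable_prefix_cache) {
  return (model_type == "deepseek_v3" && enable_prefix_cache)
             ? KvCacheFormat::kFractalNZ
             : KvCacheFormat::kND;
}

// Compile-time truth table for the cases the diff touches.
static_assert(select_kv_cache_format("deepseek_v3", true) ==
              KvCacheFormat::kFractalNZ);
static_assert(select_kv_cache_format("deepseek_v3_mtp", true) ==
              KvCacheFormat::kFractalNZ);
static_assert(select_kv_cache_format("deepseek_v3_mtp", false) ==
              KvCacheFormat::kND);
static_assert(select_kv_cache_format_before_fix("deepseek_v3_mtp", true) ==
              KvCacheFormat::kND);
```

The diff keeps the condition inline in `worker_impl.cpp`; routing both the llm_engine capacity estimate and `WorkerImpl::allocate_kv_cache` through one shared predicate like this is one way to keep the two paths from drifting apart again.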
