Commit 1c85bdc

unamedkr and claude committed
feat(loader): warn loudly when arch=deepseek2 (MLA not yet supported)
Phase 1 of the MLA work confirmed DeepSeek V2/V3 / Coder-V2 models load through our generic GGUF reader without complaint, but the forward pass produces multilingual garbage because attn_kv_a_mqa / attn_kv_b are treated as standard wk / wv.

Add a one-time loud warning at load time so users do not mistake the garbage output for a quantization artifact. Points at the Phase 2 entry plan in docs/research/mla_support_plan.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 2ddd289 commit 1c85bdc

1 file changed

Lines changed: 17 additions & 0 deletions

src/engine/tq_model.c

@@ -3975,6 +3975,23 @@ tq_model_t* tq_load_gguf(const char* path) {
             c->is_moe ? ", MoE" : "",
             c->hidden_dim, c->n_heads, c->n_kv_heads, c->vocab_size);
 
+    /* deepseek2 (DeepSeek V2/V3, Coder-V2) uses Multi-head Latent Attention
+     * (MLA): the GGUF reader and quant kernels handle the model's tensors
+     * fine, but our standard attention forward path treats attn_kv_a_mqa /
+     * attn_kv_b as if they were wk / wv. That produces multilingual garbage
+     * tokens — see docs/research/mla_support_plan.md (Phase 1 results,
+     * 2026-04-26) for the architectural details and the Phase 2 entry plan.
+     * Print a loud one-time warning so users do not mistake garbage output
+     * for a quantization artifact. */
+    if (strcmp(gguf->arch, "deepseek2") == 0) {
+        fprintf(stderr,
+                "tq_load_gguf: WARNING — arch 'deepseek2' uses MLA "
+                "(Multi-head Latent Attention).\n"
+                " Our forward pass does NOT yet implement MLA decompression.\n"
+                " Output WILL be incoherent until Phase 2 lands.\n"
+                " See docs/research/mla_support_plan.md for the roadmap.\n");
+    }
+
     /* Hard-fail when no attention layers were detected. Without this,
      * the forward pass runs against zero-initialized weights → garbage.
      * This was the root cause of the Phi-3 first-time experience bug:
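
The warning in this commit is emitted on every call to tq_load_gguf for a deepseek2 model, which is "one-time" per load. If a process loads several models, a stricter once-per-process variant may be preferable. The following is a minimal standalone sketch of that idea, not code from tq_model.c: the helper name warn_if_mla_arch, its return convention, the FILE* parameter, and the shortened message text are all assumptions made for illustration.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of tq_model.c).
 * Writes the MLA warning to `out` at most once per process.
 * Returns 1 if the warning was written by this call, 0 otherwise. */
int warn_if_mla_arch(const char *arch, FILE *out)
{
    /* Assumption: a plain static flag is enough because model loading
     * is single-threaded; a threaded loader would need an atomic flag. */
    static int already_warned = 0;

    if (arch == NULL || strcmp(arch, "deepseek2") != 0)
        return 0;               /* not an MLA architecture we know of */
    if (already_warned)
        return 0;               /* stay quiet on repeat loads */

    already_warned = 1;
    fprintf(out,
            "WARNING: arch 'deepseek2' uses MLA "
            "(Multi-head Latent Attention).\n"
            "  The forward pass does not implement MLA decompression yet;\n"
            "  output will be incoherent.\n");
    return 1;
}
```

A loader would call it as warn_if_mla_arch(gguf->arch, stderr) right after printing the model summary. Taking the stream as a parameter keeps the helper testable, and the return value lets a caller decide whether to also refuse generation rather than only warn.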
