Commit 714cd4c

unamedkr and claude committed
fix(tokenizer): respect GGUF tokenizer.ggml.add_bos_token (R8)
Generalises R7 from Qwen3.6 family-only heuristic to a model-agnostic GGUF metadata read.

Adds tq_tokenizer_t.add_bos_token tristate field:
   1 = explicit true
  -1 = explicit false (suppress BOS prepend even if vocab lookup would have enabled it)
   0 = unset (fall through to existing heuristics)

tq_load_tokenizer_from_gguf parses tokenizer.ggml.add_bos_token (bool) and sets the field accordingly. tq_generate.c BOS-decision block consults the tristate before any heuristic, so models that explicitly declare add_bos_token=false (both Qwen3.6-27B Q4_K_M and 35B-A3B-UD-IQ4_XS do, per direct GGUF metadata read) are honoured regardless of vocab content.

Verification (clean rebuild, Metal=ON):
- 35B-A3B IQ4_XS quantum: 149 tok natural EOS (matches baseline, Tier 2; restored from 94 rep loop seen with R1 BOS auto-enable)
- add_bos=-1 logged at load time confirms the metadata path fires

Note: an earlier incremental rebuild after the struct change produced an ABI mismatch (102 rep loop instead of 149 EOS). Always do a clean rebuild after touching tokenizer struct layout.

Also rediscovered: src/engine/tq_moe.c uses `goto moe_shared_expert` outside the `#ifdef TQ_HAS_METAL` block but the label is inside, so non-Metal configurations fail to build. Keep TQ_BUILD_METAL=ON for now.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
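The non-Metal build failure mentioned in the commit message follows from a general C rule: a `goto` whose label has been preprocessed away is a compile error. The sketch below is not the code in src/engine/tq_moe.c; `DEMO_HAS_METAL`, `demo_moe_forward`, and `demo_check` are hypothetical names used only to show one possible shape of a fix, where the label sits outside the conditional block so both configurations can see it.

```c
#include <assert.h>

/* DEMO_HAS_METAL is a hypothetical stand-in for TQ_HAS_METAL.
 * The failing shape would place the `shared_expert:` label between
 * #ifdef and #endif while the goto stays outside; with the macro
 * undefined, the label vanishes and the goto has no target. */
int demo_moe_forward(int skip_routed) {
    int work = 0;
    if (skip_routed) goto shared_expert;
#ifdef DEMO_HAS_METAL
    work += 10;            /* stand-in for GPU-only routed-expert work */
#endif
shared_expert:             /* label is visible in every configuration */
    work += 1;             /* stand-in for shared-expert work */
    return work;
}

/* Small self-check: skipping the routed path leaves only the shared work. */
void demo_check(void) {
    assert(demo_moe_forward(1) == 1);
}
```

This compiles with or without `-DDEMO_HAS_METAL`, which is the property the real file currently lacks according to the note above.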
1 parent 475799a commit 714cd4c

3 files changed

Lines changed: 29 additions & 4 deletions

include/turboquant/tq_engine.h

Lines changed: 5 additions & 0 deletions
@@ -459,6 +459,11 @@ typedef struct {
     int* sorted_indices;
     /* Merge table: pairs of token IDs that merge into a result */
     int* merge_pairs; /* [n_merges * 3]: (token_a, token_b, result_id) */
+    /* GGUF metadata flag: tokenizer.ggml.add_bos_token
+     *  1 = explicitly true (force BOS prepend)
+     * -1 = explicitly false (suppress BOS prepend regardless of vocab lookup)
+     *  0 = unset (use heuristic vocab lookup) */
+    int add_bos_token;
 } tq_tokenizer_t;

 /* ============================================================
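The commit message warns that an incremental rebuild after this struct change produced an ABI mismatch. One plausible mechanism, offered here as an assumption rather than a diagnosis of the actual build, is that a stale object file still uses the old `sizeof` while freshly compiled code reads the new trailing field. The types below are hypothetical two-pointer stand-ins for `tq_tokenizer_t`, not the real layout.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, trimmed-down layout before the commit. */
typedef struct {
    int* sorted_indices;
    int* merge_pairs;
} demo_tokenizer_old_t;

/* Same layout with the new field appended, as in the hunk above. */
typedef struct {
    int* sorted_indices;
    int* merge_pairs;
    int  add_bos_token;
} demo_tokenizer_new_t;

/* Appending a field always grows the struct in this sketch. */
static_assert(sizeof(demo_tokenizer_new_t) > sizeof(demo_tokenizer_old_t),
              "new layout must be larger than old layout");

/* Returns 1 when an allocation sized with the old layout is too small
 * to hold the new field, so a reader built against the new header
 * would touch bytes the stale allocator never reserved. */
int demo_stale_alloc_too_small(void) {
    size_t end_of_new_field =
        offsetof(demo_tokenizer_new_t, add_bos_token) + sizeof(int);
    return sizeof(demo_tokenizer_old_t) < end_of_new_field;
}
```

If this is what happened, the value read for `add_bos_token` would be whatever follows the old allocation, which is consistent with the advice to do a clean rebuild after touching the struct.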

src/engine/tq_generate.c

Lines changed: 10 additions & 2 deletions
@@ -333,9 +333,17 @@ int tq_generate(tq_model_t* model, tq_tokenizer_t* tokenizer,
      * Gemma 3/4: model_type==1, BOS=2 (required)
      * Phi-3 / LLaMA 2: vocab has <s> as BOS (required)
      * LLaMA 3: BOS=128000 (<|begin_of_text|>) — tq_encode lookup chain handles it
-     * Qwen3.5 / GPT-2 BPE: no native BOS, skip */
+     * Qwen3.5 / GPT-2 BPE: no native BOS, skip
+     *
+     * Precedence: GGUF metadata `tokenizer.ggml.add_bos_token` wins over
+     * heuristics. -1 = explicit false (suppress), +1 = explicit true,
+     * 0 = unset (fall through to heuristic). */
     int add_bos = 0;
-    if (model->config.model_type == 1) {
+    if (tokenizer->add_bos_token == -1) {
+        /* explicit false — Qwen3.6 27B/35B-A3B path; skip everything */
+    } else if (tokenizer->add_bos_token == 1) {
+        add_bos = 1;
+    } else if (model->config.model_type == 1) {
         add_bos = 1; /* Gemma: always prepend BOS=2 */
     } else {
         /* Auto-detect BOS: check if vocab contains <s> (LLaMA 2, Phi-3)
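The precedence chain in this hunk can be restated as a small pure function, which makes the ordering easy to test in isolation. This is a sketch under assumptions: `demo_should_add_bos` is a hypothetical name, and `vocab_has_bos` stands in for the vocab auto-detect branch that the diff truncates, whose real logic is not shown here.

```c
#include <assert.h>

/* Hypothetical restatement of the BOS decision order:
 *   1. explicit GGUF metadata (-1 or +1) wins,
 *   2. then the model_type == 1 (Gemma) rule,
 *   3. then a vocab-based heuristic (assumed, not shown in the diff). */
int demo_should_add_bos(int add_bos_token, int model_type, int vocab_has_bos) {
    assert(add_bos_token >= -1 && add_bos_token <= 1);  /* tristate only */
    if (add_bos_token == -1) return 0;   /* explicit false: suppress */
    if (add_bos_token == 1)  return 1;   /* explicit true: force */
    if (model_type == 1)     return 1;   /* unset, Gemma rule applies */
    return vocab_has_bos ? 1 : 0;        /* unset, fall back to heuristic */
}
```

The case the commit targets is `demo_should_add_bos(-1, 0, 1)`: the vocab lookup would enable BOS, but the explicit metadata suppresses it.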

src/engine/tq_tokenizer.c

Lines changed: 14 additions & 2 deletions
@@ -950,8 +950,20 @@ tq_tokenizer_t* tq_load_tokenizer_from_gguf(const void* gguf_ctx_ptr) {
         }
     }

-    fprintf(stderr, "tq_load_tokenizer_from_gguf: loaded %d tokens (max_len=%d)\n",
-            tok->vocab_size, tok->max_token_len);
+    /* tokenizer.ggml.add_bos_token (bool, optional)
+     *   true  → tok->add_bos_token = +1 (force BOS prepend)
+     *   false → tok->add_bos_token = -1 (suppress BOS prepend)
+     *   unset → 0 (let tq_generate fall back to vocab heuristic) */
+    int64_t add_bos_idx = tq_gguf_find_key(gguf, "tokenizer.ggml.add_bos_token");
+    if (add_bos_idx >= 0) {
+        const tq_gguf_kv_t* kv = &gguf->kv[add_bos_idx];
+        if (kv->type == TQ_GGUF_TYPE_BOOL) {
+            tok->add_bos_token = kv->value.bool_val ? 1 : -1;
+        }
+    }
+
+    fprintf(stderr, "tq_load_tokenizer_from_gguf: loaded %d tokens (max_len=%d) add_bos=%d\n",
+            tok->vocab_size, tok->max_token_len, tok->add_bos_token);
     return tok;
 }
