Commit e0110e1
rubik
fix(basic_llm_processor): prevent duplicate BOS token in Llama-3/3.1 chat
When using `model.chat()` with Llama-3/3.1 models, the framework
inadvertently prepends two `<|begin_of_text|>` (BOS, token ID 128000)
tokens to the prompt_token_ids. The extra token shifts the RoPE
position of every following prompt token by 1, causing the greedy
decoding output to diverge significantly from the HuggingFace output.
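
A quick way to confirm the symptom is to count the BOS IDs at the start of `prompt_token_ids`. The following is a minimal sketch, not part of the framework: the helper name `count_leading_bos` is hypothetical, and the ID 128000 is taken from the description above.

```python
BOS_TOKEN_ID = 128000  # <|begin_of_text|> for Llama-3/3.1, per the commit message


def count_leading_bos(token_ids, bos_id=BOS_TOKEN_ID):
    """Return how many consecutive BOS IDs start the sequence.

    Hypothetical helper for illustration; not part of the framework.
    """
    count = 0
    for tid in token_ids:
        if tid != bos_id:
            break
        count += 1
    return count
```

A result of 2 for a chat prompt would indicate the duplicate described here; a correctly built prompt should give 1.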
Root cause:
The Llama-3/3.1 chat template explicitly includes `<|begin_of_text|>`
at the start of the rendered string. Later, `BasicLLMProcessor.__call__`
passes this string to `self.tokenizer(prompt)`, which defaults to
`add_special_tokens=True`. Since `LlamaTokenizerFast` initializes with
`add_bos_token=True` by default, the tokenizer automatically prepends
a second BOS token via its Rust backend PostProcessor.
Fix:
Explicitly pass `add_special_tokens=False` to the tokenizer calls in
`BasicLLMProcessor.__call__`. Since the chat template is already
responsible for adding necessary special tokens, the tokenizer should
only perform pure text-to-ID mapping.
1 parent b2eccc2 commit e0110e1
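
The effect of the fix can be sketched as follows. This is an illustration under stated assumptions, not the actual diff: `encode_rendered_prompt` is a hypothetical helper, and `StubTokenizer` is a toy stand-in that only mimics the BOS-prepending behaviour of a tokenizer with `add_bos_token=True`. The one real API detail relied on is that HuggingFace tokenizers accept `add_special_tokens` when called and return a mapping with `input_ids`.

```python
def encode_rendered_prompt(tokenizer, rendered_prompt):
    """Tokenize a string that the chat template has already rendered.

    Hypothetical helper. add_special_tokens=False keeps the tokenizer
    from prepending a BOS on top of the one already in the string.
    """
    return tokenizer(rendered_prompt, add_special_tokens=False)["input_ids"]


class StubTokenizer:
    """Toy stand-in (not a real tokenizer) that mimics add_bos_token=True."""

    BOS_TEXT = "<|begin_of_text|>"
    BOS_ID = 128000

    def __call__(self, text, add_special_tokens=True):
        ids = []
        # A literal BOS marker in the text maps to the BOS ID.
        if text.startswith(self.BOS_TEXT):
            ids.append(self.BOS_ID)
            text = text[len(self.BOS_TEXT):]
        # Remaining characters map to their code points (toy vocabulary).
        ids.extend(ord(ch) for ch in text)
        # With special tokens enabled, a BOS is prepended unconditionally.
        if add_special_tokens:
            ids.insert(0, self.BOS_ID)
        return {"input_ids": ids}


rendered = "<|begin_of_text|>hi"
before_fix = StubTokenizer()(rendered)["input_ids"]
after_fix = encode_rendered_prompt(StubTokenizer(), rendered)
```

With the stub, `before_fix` starts with two BOS IDs and `after_fix` with one, which mirrors the behaviour change the commit describes.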
1 file changed
Lines changed: 10 additions & 3 deletions