fix(tokenizer): fall back to direct fast-tokenizer load when model config build fails
`AutoTokenizer.from_pretrained` eagerly constructs the *model* config to
resolve the tokenizer class — even for a plain `PreTrainedTokenizerFast`.
That construction runs HF's RoPE validator, which rejects configs carrying
nested `rope_parameters` (e.g. poolside/Laguna-XS.2: `full_attention` /
`sliding_attention` blocks with no top-level `rope_theta`) when the config
is built outside vLLM's `patch_rope_parameters`. The resulting `KeyError`
escapes (AutoTokenizer only catches `ValueError`/`OSError`) and kills the
tokenizer load — a modeling-only concern breaking something the tokenizer
never needed.
`renderers` needs only the tokenizer, not the model. When `AutoTokenizer` fails
while building the config, fall back to loading the repo's self-contained
`tokenizer.json` directly via `PreTrainedTokenizerFast`, which never touches
the model config. The fallback runs under the fastokens patch, so models
like Laguna keep the Rust fast-path speedup. Custom `auto_map` tokenizers
and repos without a fast tokenizer are left to surface the original error.
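
A minimal sketch of the fallback described above, not the code in this commit. The helper names `load_tokenizer_with_fallback` and `load_hf_tokenizer` are hypothetical, and catching only `KeyError` is an assumption based on the failure described here. The fastokens patch and any `auto_map` check are omitted. The loaders are passed in as callables so the control flow can be exercised without downloading a model.

```python
from typing import Any, Callable


def load_tokenizer_with_fallback(
    name: str,
    primary: Callable[[str], Any],
    fallback: Callable[[str], Any],
) -> Any:
    """Try `primary`; on KeyError try `fallback`; else surface the original error."""
    try:
        return primary(name)
    except KeyError as original:
        # Assumption: the config-build failure surfaces as KeyError,
        # which AutoTokenizer does not catch itself.
        try:
            return fallback(name)
        except Exception:
            # No usable fast tokenizer: re-raise the first error, since
            # it describes the real problem.
            raise original


def load_hf_tokenizer(name: str) -> Any:
    """Hypothetical wiring to transformers (imported lazily)."""
    from transformers import AutoTokenizer, PreTrainedTokenizerFast

    return load_tokenizer_with_fallback(
        name,
        primary=AutoTokenizer.from_pretrained,
        # Loads the repo's tokenizer files directly.
        fallback=PreTrainedTokenizerFast.from_pretrained,
    )
```

Re-raising the original exception when the fallback also fails keeps the error message pointing at the config problem rather than at a missing `tokenizer.json`.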
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>