Commit 5be34e6

to_hf.py: rescue chat_template.jinja before deleting it
Modern HuggingFace transformers (~4.42+) moves long chat_template strings out of tokenizer_config.json into a separate chat_template.jinja file to keep the JSON readable. Qwen3-1.7B's 4168-char template triggers this split; Nemotron-Nano's shorter template stays inline.

The old code deleted chat_template.jinja before reading tokenizer_config.json, assuming the inline copy was always complete. For Qwen3 that meant the exported checkpoint shipped with an empty chat_template -- vLLM's apply_chat_template returned a prompt without the <|audio|> placeholder, which broke multimodal prompt replacement (Failed to apply prompt replacement for mm_items['audio'][0]).

Now read chat_template.jinja, inline it into tokenizer_config.json when non-empty, and only then delete the file. Nemotron's inline-only path is unchanged because .jinja doesn't get written for small templates.

Made-with: Cursor
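The order of operations described in the message can be tried in isolation. The following is a minimal sketch using only the standard library, not the code from to_hf.py: the helper name rescue_chat_template, the demo function, and the placeholder template strings are hypothetical, and the sketch writes the config back immediately, which is an assumption about a step the commit does not show.

```python
import json
import tempfile
from pathlib import Path


def rescue_chat_template(output_dir: Path) -> dict:
    """Inline chat_template.jinja into tokenizer_config.json, then delete it.

    Hypothetical helper: read the config first, read the .jinja file,
    inline its content when non-empty, and only then remove the file.
    """
    tok_cfg_path = output_dir / "tokenizer_config.json"
    tok_cfg = json.loads(tok_cfg_path.read_text())

    jinja_file = output_dir / "chat_template.jinja"
    if jinja_file.exists():
        jinja_from_file = jinja_file.read_text()
        # Only replace the inline copy when the file has real content.
        if jinja_from_file.strip():
            tok_cfg["chat_template"] = jinja_from_file
        jinja_file.unlink()

    # Assumption: persist right away so the directory is self-consistent.
    tok_cfg_path.write_text(json.dumps(tok_cfg, indent=2))
    return tok_cfg


def demo() -> dict:
    """Simulate the split and inline layouts in temporary directories."""
    results = {}
    with tempfile.TemporaryDirectory() as tmp:
        # Split layout: the template lives only in the .jinja file.
        split_dir = Path(tmp) / "split"
        split_dir.mkdir()
        (split_dir / "tokenizer_config.json").write_text(json.dumps({}))
        (split_dir / "chat_template.jinja").write_text("{{ messages }}")
        results["split"] = rescue_chat_template(split_dir)
        results["split_jinja_left"] = (split_dir / "chat_template.jinja").exists()

        # Inline layout: no .jinja file was written, config already has the template.
        inline_dir = Path(tmp) / "inline"
        inline_dir.mkdir()
        (inline_dir / "tokenizer_config.json").write_text(
            json.dumps({"chat_template": "{{ inline }}"})
        )
        results["inline"] = rescue_chat_template(inline_dir)
    return results


if __name__ == "__main__":
    print(demo())
```

Deleting the file before reading it, as the old code did, would leave the split layout with no template at all, which matches the failure described for Qwen3.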
1 parent 6f8a26f commit 5be34e6

1 file changed

Lines changed: 11 additions & 4 deletions

examples/speechlm2/to_hf.py

@@ -183,15 +183,22 @@ def prepare_for_vllm(output_dir: str, model_cfg: dict) -> None:
     if _AUDIO_TOKEN not in tok.get_vocab():
         tok.add_special_tokens({"additional_special_tokens": [_AUDIO_TOKEN]})
     tok.save_pretrained(str(output_dir))
-    # A separate chat_template.jinja file, if present, overrides the inline copy
-    # in tokenizer_config.json. Remove it so tokenizer_config.json wins.
+    # Newer transformers writes long chat templates to a separate
+    # ``chat_template.jinja`` file instead of inlining them in
+    # ``tokenizer_config.json`` (Qwen3's 4k-char template triggers this,
+    # Nemotron's shorter one stays inline). Read whichever is populated,
+    # inline it into tokenizer_config.json, and delete the .jinja file so
+    # downstream tooling sees a single canonical location.
+    tok_cfg_path = output_dir / "tokenizer_config.json"
+    tok_cfg = json.loads(tok_cfg_path.read_text())
     jinja_file = output_dir / "chat_template.jinja"
     if jinja_file.exists():
+        jinja_from_file = jinja_file.read_text()
+        if jinja_from_file.strip():
+            tok_cfg["chat_template"] = jinja_from_file
         jinja_file.unlink()
     # Normalize extra_special_tokens: transformers writes our added audio token
     # as a list, but HF/vLLM loaders expect a dict keyed by semantic name.
-    tok_cfg_path = output_dir / "tokenizer_config.json"
-    tok_cfg = json.loads(tok_cfg_path.read_text())
     tok_cfg["extra_special_tokens"] = {"audio_token": _AUDIO_TOKEN}
     # Some reasoning backbones (e.g. nemotron-nano-v3) ship a chat_template whose
     # default ``enable_thinking`` is ``True``; our SpeechLM fine-tuning renders
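
The conditions this hunk is meant to guarantee can be checked on an exported directory after the fact. The following is a hedged sketch of such a check, not part of to_hf.py: the function name check_export is hypothetical, and it assumes the exported config holds a single string chat_template and that the final tokenizer_config.json has already been written to disk.

```python
import json
import tempfile
from pathlib import Path


def check_export(output_dir: Path) -> list[str]:
    """Return a list of problems found in an exported tokenizer directory."""
    cfg_path = output_dir / "tokenizer_config.json"
    if not cfg_path.exists():
        return ["tokenizer_config.json is missing"]
    tok_cfg = json.loads(cfg_path.read_text())

    problems = []

    # Assumption: a single string template, not a list of named templates.
    template = tok_cfg.get("chat_template")
    if not isinstance(template, str) or not template.strip():
        problems.append("chat_template is empty or missing")

    # The hunk deletes the separate file after inlining it.
    if (output_dir / "chat_template.jinja").exists():
        problems.append("chat_template.jinja is still present")

    # The hunk normalizes extra_special_tokens to a dict keyed by name.
    extra = tok_cfg.get("extra_special_tokens")
    if not isinstance(extra, dict) or "audio_token" not in extra:
        problems.append("extra_special_tokens lacks an 'audio_token' key")

    return problems


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        out = Path(tmp)
        # An export with an empty template, as in the failure described above.
        (out / "tokenizer_config.json").write_text(json.dumps({"chat_template": ""}))
        for problem in check_export(out):
            print(problem)
```

An empty list means the directory has one canonical template location; the check says nothing about whether the template itself renders correctly.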
