Skip to content

Commit 5a2bf7f

Browse files
abrichrclaude
andauthored
fix: disable Qwen3.5 thinking mode in TRL generation (#249)
Root cause of persistent garbage output: Qwen3.5-9B's chat template inserts <think> which activates internal reasoning mode. The model produces opaque thinking tokens (# # # # #) instead of DSL actions. Fix: pass enable_thinking=False to apply_chat_template. Falls back to stripping <think> from rendered text if the kwarg is not supported. Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent dd0dd45 commit 5a2bf7f

File tree

1 file changed

+21
-2
lines changed

1 file changed

+21
-2
lines changed

openadapt_evals/training/trl_rollout.py

Lines changed: 21 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -514,9 +514,28 @@ def generate_fn(screenshot_bytes: bytes, instruction: str):
514514

515515
import torch
516516

517-
text_input = processor.apply_chat_template(
518-
messages, tokenize=False, add_generation_prompt=True
517+
# Disable thinking mode: Qwen3.5's chat template inserts
518+
# <think> which activates internal reasoning tokens (the
519+
# "# # # # #" garbage). We need DSL output, not thinking.
520+
# Try enable_thinking=False first; if not supported, strip
521+
# <think> from the rendered text.
522+
chat_kwargs = dict(
523+
tokenize=False, add_generation_prompt=True,
519524
)
525+
try:
526+
text_input = processor.apply_chat_template(
527+
messages, enable_thinking=False, **chat_kwargs,
528+
)
529+
except TypeError:
530+
# Older processor doesn't support enable_thinking kwarg
531+
text_input = processor.apply_chat_template(
532+
messages, **chat_kwargs,
533+
)
534+
535+
# Belt-and-suspenders: strip <think> tag if it slipped through
536+
if "<think>" in text_input:
537+
logger.info("Stripping <think> tag from prompt to disable thinking mode")
538+
text_input = text_input.replace("<think>\n", "").replace("<think>", "")
520539

521540
# Comprehensive prompt diagnostics on first call.
522541
# This logs everything needed to debug prompt construction:

0 commit comments

Comments
 (0)