fix: disable Qwen3.5 thinking mode in TRL generation #249

Merged
abrichr merged 1 commit into main from fix/disable-qwen-thinking-mode on Mar 29, 2026

@abrichr abrichr commented Mar 29, 2026

Summary

Root cause found: Qwen3.5-9B's chat template inserts <think>, which activates the model's internal reasoning mode. The model then produces opaque thinking tokens (# # # # #) instead of DSL actions (Thought: ...\nAction: CLICK(...)).

Fix: pass enable_thinking=False to apply_chat_template. If the kwarg is not supported (older processors), fall back to stripping <think> from the rendered text.

This explains every TRL garbage output report — the model was in thinking mode the entire time.

Test plan

  • 32 TRL tests pass
  • Client re-test — should see DSL output instead of # # # #

🤖 Generated with Claude Code

Root cause of persistent garbage output: Qwen3.5-9B's chat template
inserts <think> which activates internal reasoning mode. The model
produces opaque thinking tokens (# # # # #) instead of DSL actions.

Fix: pass enable_thinking=False to apply_chat_template. Falls back to
stripping <think> from rendered text if the kwarg is not supported.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@abrichr abrichr merged commit 5a2bf7f into main Mar 29, 2026
1 check passed
abrichr added a commit that referenced this pull request Mar 29, 2026
Stripping <think> from rendered text was insufficient — TRL or the
processor may re-apply the template, re-inserting the tags. The fix:
patch processor.chat_template and processor.tokenizer.chat_template
on first rollout call, removing <think>/<think> from the Jinja
template itself. This ensures no code path can re-insert thinking mode.

Also strips </think> (was missed in #249).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
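
A rough sketch of the template patch described in this commit message, under stated assumptions: `patch_processor` and the `_thinking_patched` marker attribute are hypothetical names, the processor objects are stand-ins built with `SimpleNamespace`, and the tags are removed by plain text substitution on the Jinja source. A real template may also reference these tags in its own logic, so the patched result should be inspected before relying on it.

```python
import re
from types import SimpleNamespace

# Matches the opening and closing think tags inside the template source.
_THINK_TAG = re.compile(r"</?think>")


def patch_processor(processor):
    """Strip think tags from both chat templates; return True if patched now."""
    # Hypothetical marker attribute so the patch runs only on the first call.
    if getattr(processor, "_thinking_patched", False):
        return False
    # Patch the processor's template and, if present, its tokenizer's template.
    for holder in (processor, getattr(processor, "tokenizer", None)):
        template = getattr(holder, "chat_template", None)
        if isinstance(template, str):
            holder.chat_template = _THINK_TAG.sub("", template)
    processor._thinking_patched = True
    return True


if __name__ == "__main__":
    # Stand-in objects; a real processor would come from the model checkpoint.
    tokenizer = SimpleNamespace(chat_template="{{ 'assistant\\n<think>\\n' }}")
    processor = SimpleNamespace(chat_template=tokenizer.chat_template, tokenizer=tokenizer)
    print(patch_processor(processor), processor.chat_template)
    print(patch_processor(processor))  # second call is a no-op
```

Patching the template source rather than the rendered text means any later re-rendering by TRL or the processor starts from a template that no longer contains the tags.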