fix: disable Qwen3.5 thinking mode in TRL generation by abrichr · Pull Request #249 · OpenAdaptAI/openadapt-evals

abrichr · 2026-03-29T22:04:16Z

Summary

Root cause found. Qwen3.5-9B's chat template inserts <think> which activates internal reasoning mode. The model produces opaque thinking tokens (# # # # #) instead of DSL actions (Thought: ...\nAction: CLICK(...)).

Fix: Pass enable_thinking=False to apply_chat_template. Falls back to stripping <think> from rendered text if the kwarg is not supported (older processors).

This explains every TRL garbage output report — the model was in thinking mode the entire time.

Test plan

32 TRL tests pass
Client re-test — should see DSL output instead of # # # #

🤖 Generated with Claude Code

Root cause of persistent garbage output: Qwen3.5-9B's chat template inserts <think> which activates internal reasoning mode. The model produces opaque thinking tokens (# # # # #) instead of DSL actions. Fix: pass enable_thinking=False to apply_chat_template. Falls back to stripping <think> from rendered text if the kwarg is not supported. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Stripping <think> from rendered text was insufficient — TRL or the processor may re-apply the template, re-inserting the tags. The fix: patch processor.chat_template and processor.tokenizer.chat_template on first rollout call, removing <think>/<think> from the Jinja template itself. This ensures no code path can re-insert thinking mode. Also strips </think> (was missed in #249). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Stripping <think> from rendered text was insufficient — TRL or the processor may re-apply the template, re-inserting the tags. The fix: patch processor.chat_template and processor.tokenizer.chat_template on first rollout call, removing <think>/<think> from the Jinja template itself. This ensures no code path can re-insert thinking mode. Also strips </think> (was missed in #249). Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

abrichr merged commit 5a2bf7f into main Mar 29, 2026
1 check passed

abrichr mentioned this pull request Mar 29, 2026

fix: patch chat_template to remove think tags at the source #250

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: disable Qwen3.5 thinking mode in TRL generation#249

fix: disable Qwen3.5 thinking mode in TRL generation#249
abrichr merged 1 commit into
mainfrom
fix/disable-qwen-thinking-mode

abrichr commented Mar 29, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

abrichr commented Mar 29, 2026

Summary

Test plan

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant