Commit 5a2bf7f
fix: disable Qwen3.5 thinking mode in TRL generation (#249)

Root cause of the persistent garbage output: Qwen3.5-9B's chat template
inserts <think>, which activates the model's internal reasoning mode. The
model then produces opaque thinking tokens (# # # # #) instead of DSL
actions.

Fix: pass enable_thinking=False to apply_chat_template. If the kwarg is
not supported, fall back to stripping <think> from the rendered text.

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

1 parent dd0dd45
1 file changed: +21 -2 lines

Diff hunk @@ -514,9 +514,28 @@: old lines 517-518 replaced by new lines 517-523; new lines 525-538 added.