Commit aa8799f
runtime: raise max_new_tokens 768 -> 1500 for verdict completion headroom
768 was too tight: the second turn (analyzing the tool result and
emitting the verdict) needs ~700-1200 tokens for think + structured
output. The model was truncating mid-think and never reaching the
<verdict> block.
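
The failure mode described above (output cut off before the verdict) can be detected mechanically. The following is a minimal sketch, not the runtime's actual parsing code, which this commit does not show. The function name `classify_completion` is hypothetical, and the `<think>`/`</think>` and closing `</verdict>` tag spellings are assumptions; the commit message only mentions a `<verdict>` block and "mid-think" truncation.

```python
import re

# Assumed tag format: the commit only names a "<verdict> block".
VERDICT_RE = re.compile(r"<verdict>(.*?)</verdict>", re.DOTALL)


def classify_completion(text: str) -> str:
    """Classify raw model output (hypothetical helper).

    Returns 'complete' if a closed verdict block is present,
    'truncated_in_think' if a think block was opened but never closed,
    and 'no_verdict' otherwise.
    """
    if VERDICT_RE.search(text):
        return "complete"
    # More opening than closing think tags suggests the token budget
    # ran out while the model was still reasoning.
    if text.count("<think>") > text.count("</think>"):
        return "truncated_in_think"
    return "no_verdict"
```

A check like this would make a too-small `max_new_tokens` visible as a `truncated_in_think` result rather than as a silently missing verdict.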
1500 gives the model enough room without making turns absurdly long.
At the RK3588 NPU's ~5-7 tokens/s thinking-mode rate, a turn that uses
the full budget takes roughly 3.5-5 minutes. Combined with all earlier
fixes (no set_chat_template, inlined system prompt, low temperature,
GC-safe ctypes, role-based input), the model should now complete full
multi-turn diagnostic flows.
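
The per-turn time quoted above follows from dividing the token budget by the decode rate. As a rough sketch, the helper below (hypothetical name `turn_seconds`) reproduces that arithmetic for the old and new budgets. It assumes the whole budget is consumed and ignores prompt prefill time, so it is an upper-bound estimate rather than a measurement.

```python
def turn_seconds(max_new_tokens: int, tokens_per_second: float) -> float:
    """Worst-case decode time for one turn, assuming the full budget is used."""
    return max_new_tokens / tokens_per_second


# 5 and 7 tokens/s are the approximate thinking-mode rates from the commit message.
for budget in (768, 1500):
    slow_min = turn_seconds(budget, 5.0) / 60
    fast_min = turn_seconds(budget, 7.0) / 60
    print(f"{budget} tokens: {fast_min:.1f}-{slow_min:.1f} min")

# prints:
# 768 tokens: 1.8-2.6 min
# 1500 tokens: 3.6-5.0 min
```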
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
1 parent dbd15ff commit aa8799f
1 file changed
Lines changed: 8 additions & 11 deletions
(Diff body not preserved. One hunk: original lines 355-365 were removed and replaced by new lines 355-362, with unchanged context at original lines 352-354 and 366-368.)