Commit d857c4d

ehsan6sha and claude committed
runtime: lower temperature 0.6->0.3 + top_k 20->5 for XML adherence

Lab observation 2026-05-27: the model emitted 'diag/summary' as plain
text instead of the trained '<tool_call>{...}</tool_call>' XML. At
temp=0.6/top_k=20 the sampling is loose enough that the model drifts
into narrative prose mode despite the structured-output training.
Tightening to temp=0.3/top_k=5 favours the most-likely next-token
(which, for the trained pattern, is the XML tag) without going fully
greedy (top_k=1 would risk verbose deterministic loops).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
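
The effect described in the commit message can be illustrated with a minimal sketch of temperature scaling followed by top-k truncation. This is not the RKLLM runtime's sampler: the function name `sampling_distribution` and the toy logits are hypothetical, and the `top_p=0.8` filter that the runtime also applies is left out for brevity.

```python
import math


def sampling_distribution(logits, temperature, top_k):
    """Return next-token probabilities after temperature scaling and top-k truncation."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Keep only the indices of the top_k highest logits.
    kept = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    # Divide by temperature; subtract the max for numerical stability.
    scaled = {i: logits[i] / temperature for i in kept}
    peak = max(scaled.values())
    weights = {i: math.exp(v - peak) for i, v in scaled.items()}
    total = sum(weights.values())
    # Tokens outside the top_k set get probability zero.
    probs = [0.0] * len(logits)
    for i, w in weights.items():
        probs[i] = w / total
    return probs


# Hypothetical logits: index 0 stands in for the "<tool_call>" token,
# the remaining entries stand in for prose continuations.
toy_logits = [2.0, 1.5, 1.2, 1.0, 0.8, 0.5, 0.3, 0.0]
loose = sampling_distribution(toy_logits, temperature=0.6, top_k=20)
tight = sampling_distribution(toy_logits, temperature=0.3, top_k=5)
print(f"P(top token) at temp=0.6/top_k=20: {loose[0]:.2f}")
print(f"P(top token) at temp=0.3/top_k=5:  {tight[0]:.2f}")
```

With these toy numbers the tighter settings give the most likely token a larger share of the probability mass while still leaving some to the runners-up, which is the trade-off the message describes against going fully greedy.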
1 parent 0b8e2db commit d857c4d

1 file changed

Lines changed: 10 additions & 2 deletions

src/runtime/rkllm_runtime.py

@@ -363,8 +363,16 @@ def init_model(
         # streaming SSE path that emits tokens as they arrive (current
         # generate() blocks until the full turn completes).
         max_new_tokens: int = 768,
-        temperature: float = 0.6,
-        top_k: int = 20,
+        # Lower temperature + tighter top_k for STRUCTURED-OUTPUT
+        # adherence. Lab observation 2026-05-27: at temp=0.6/top_k=20
+        # the model produced narrative prose ("diag/summary") instead
+        # of the trained XML format ("<tool_call>{...}</tool_call>").
+        # The training distribution heavily favours XML in the relevant
+        # positions; lower entropy = the model follows that mode more
+        # consistently. Trade-off is slightly less variety in reasoning
+        # prose, acceptable for a structured tool-calling task.
+        temperature: float = 0.3,
+        top_k: int = 5,
         top_p: float = 0.8,
     ) -> None:
         p = RKLLMParam()
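
One way to check whether a sampling change like this actually improves adherence is to count how many generated turns contain a well-formed `<tool_call>{...}</tool_call>` block. The sketch below is an assumption-laden illustration, not part of this commit: it assumes the payload between the tags is JSON, and the helper names `extract_tool_call` and `adherence_rate`, along with the `"name"` key in the sample payload, are hypothetical.

```python
import json
import re

# Assumed wire format, taken from the commit text: <tool_call>{...}</tool_call>
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)


def extract_tool_call(text):
    """Return the parsed JSON payload of the first tool call, or None if absent or malformed."""
    match = TOOL_CALL_RE.search(text)
    if match is None:
        return None
    try:
        return json.loads(match.group(1))
    except json.JSONDecodeError:
        return None


def adherence_rate(outputs):
    """Fraction of model outputs that contain a parseable tool call."""
    if not outputs:
        return 0.0
    hits = sum(1 for text in outputs if extract_tool_call(text) is not None)
    return hits / len(outputs)


# Hypothetical samples: one structured turn, one plain-text drift.
samples = [
    '<tool_call>{"name": "diag/summary"}</tool_call>',
    "diag/summary",
]
print(adherence_rate(samples))
```

Running the same prompts under the old and new defaults and comparing the two rates would give a rough measure of the improvement; a single lab observation does not establish one.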
