Commit 0b8e2db

ehsan6sha and claude committed
runtime: configure system prompt via set_chat_template (clean Qwen3 markers, no forced think)
Lab test of inline-system-into-user-message showed the model producing coherent narration prose ('Calling diag/summary to get an overall picture') but NOT the <tool_call> XML the parsers expect. The model fell out of structured-output mode because the SYSTEM rules arrived as part of the user-role content rather than in the canonical <|im_start|>system\n... slot — the model's training (174 examples with system in proper slot every time) expects that slot to contain the rules.

Switching to rkllm_set_chat_template(handle, sys_wrapped, prefix, postfix) with the canonical Qwen 3 markers (no forced <think>\n postfix — that broke generation on commit 609debc). The runtime configures it as the model's session template; the model sees SYSTEM in the proper slot and the trained patterns for XML output activate.

c_char_p lifetime: byte buffers held in named locals (sys_bytes / prefix_bytes / postfix_bytes) per the GC bug fixed in commit dec8471.

55/55 tests pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
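To illustrate the layout the message describes, here is a minimal sketch of the first-turn prompt. It assumes the runtime simply concatenates the system block, the user prefix, the user text, and the postfix. The real assembly happens inside the RKLLM runtime and is not shown in this commit; `build_template_parts` and `render_first_turn` are hypothetical helpers for illustration only.

```python
from typing import Tuple


def build_template_parts(system_prompt: str) -> Tuple[bytes, bytes, bytes]:
    """Return (system block, user prefix, user postfix) as UTF-8 bytes.

    The marker strings are the ones used in this commit; the helper itself
    is hypothetical and not part of the repository.
    """
    sys_bytes = f"<|im_start|>system\n{system_prompt}<|im_end|>\n".encode("utf-8")
    prefix_bytes = b"<|im_start|>user\n"
    # No "<think>\n" is appended after the assistant opener.
    postfix_bytes = b"<|im_end|>\n<|im_start|>assistant\n"
    return sys_bytes, prefix_bytes, postfix_bytes


def render_first_turn(system_prompt: str, user_text: str) -> str:
    """Assumed concatenation order: system block, prefix, user text, postfix."""
    sys_bytes, prefix_bytes, postfix_bytes = build_template_parts(system_prompt)
    raw = sys_bytes + prefix_bytes + user_text.encode("utf-8") + postfix_bytes
    return raw.decode("utf-8")


if __name__ == "__main__":
    print(render_first_turn("RULES", "Why is the link down?"))
```

Under that assumption the system rules land in their own `<|im_start|>system` slot instead of inside the user turn, which is the difference the lab test pointed to.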
1 parent 3e6448d commit 0b8e2db

1 file changed

Lines changed: 26 additions & 18 deletions

src/runtime/rkllm_runtime.py

@@ -1106,25 +1106,33 @@ async def run_troubleshoot(
         original_user_prompt = prompt
 
         # Per-turn role-based content for v1.2.3.
-        # Turn 0: role="user", content = system rules + user prompt as a
-        # single user-role message (since v1.2.3 doesn't expose a
-        # "system" role on RKLLMInput and rkllm_set_chat_template caused
-        # garbled output in lab tests — the runtime's parsing of the
-        # full <|im_start|>system\n...<|im_end|>\n template seems
-        # fragile when the system content is long). Inlining the system
-        # rules as "Instructions:\n[rules]\n\nRequest:\n[user]" is the
-        # most reliable channel — the 174-example fine-tune included
-        # the SYSTEM_PROMPT_TEMPLATE in every training example, so the
-        # model recognises these rules through training even when they
-        # arrive as user content.
-        # Turn 1+: role="tool" with the JSON tool response. The runtime
-        # appends to the existing KV cache via keep_history=1.
-        first_turn_content = (
-            f"Instructions:\n{system_prompt}\n\n"
-            f"Request: {prompt}"
-        )
+        #
+        # Inject system prompt via rkllm_set_chat_template with the
+        # canonical Qwen 3 markers (no <think>\n added — that broke
+        # generation in the lab on commit 609debc). The runtime wraps
+        # our system in <|im_start|>system\n...<|im_end|>\n, each user
+        # turn in <|im_start|>user\n...<|im_end|>\n, and opens
+        # assistant via <|im_start|>assistant\n. <|im_end|> stays as
+        # the per-turn stop token via the runtime's built-in handling.
+        try:
+            self._runtime._lib.rkllm_set_chat_template.restype = ctypes.c_int
+            self._runtime._lib.rkllm_set_chat_template.argtypes = [
+                ctypes.c_void_p, ctypes.c_char_p, ctypes.c_char_p, ctypes.c_char_p,
+            ]
+            # Hold byte buffers in locals (ctypes c_char_p doesn't own).
+            sys_bytes = (
+                f"<|im_start|>system\n{system_prompt}<|im_end|>\n"
+            ).encode("utf-8")
+            prefix_bytes = b"<|im_start|>user\n"
+            postfix_bytes = b"<|im_end|>\n<|im_start|>assistant\n"
+            self._runtime._lib.rkllm_set_chat_template(
+                self._runtime._handle, sys_bytes, prefix_bytes, postfix_bytes,
+            )
+        except Exception as e:  # noqa: BLE001
+            logger.warning("rkllm_set_chat_template failed: %s", e)
+
         next_role: str = "user"
-        next_content: str = first_turn_content
+        next_content: str = prompt
         next_keep_history: int = 0  # 0 on first turn; 1 thereafter
 
         for turn in range(MAX_TURNS):
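
The hunk above declares `restype = ctypes.c_int` but does not inspect the value the call returns, so the `except` branch only fires on Python-level errors such as a missing symbol or a bad argument type. As a hedged sketch, a wrapper could also check the return code. This assumes that `0` means success, which the commit does not state. `apply_chat_template` and its `set_template` parameter are hypothetical stand-ins for `self._runtime._lib.rkllm_set_chat_template`, not code from the repository.

```python
import logging
from typing import Any, Callable, Tuple

logger = logging.getLogger(__name__)

TemplateBuffers = Tuple[bytes, bytes, bytes]


def apply_chat_template(
    set_template: Callable[[Any, bytes, bytes, bytes], int],
    handle: Any,
    system_prompt: str,
) -> Tuple[bool, TemplateBuffers]:
    """Call the template setter and report whether it appeared to succeed.

    Returns (ok, buffers). The caller can keep `buffers` referenced for as
    long as the native side might read them; whether the runtime copies the
    strings is not stated in the commit.
    """
    buffers: TemplateBuffers = (
        f"<|im_start|>system\n{system_prompt}<|im_end|>\n".encode("utf-8"),
        b"<|im_start|>user\n",
        b"<|im_end|>\n<|im_start|>assistant\n",
    )
    try:
        rc = set_template(handle, *buffers)
    except Exception as e:  # noqa: BLE001
        logger.warning("rkllm_set_chat_template raised: %s", e)
        return False, buffers
    if rc != 0:  # ASSUMPTION: non-zero signals failure
        logger.warning("rkllm_set_chat_template returned %d", rc)
        return False, buffers
    return True, buffers
```

Returning the buffers alongside the status keeps the lifetime decision with the caller, in the same spirit as the named locals used in the commit.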
