
Commit 609debc

ehsan6sha and claude committed
runtime: inject system prompt via rkllm_set_chat_template (not inlined in user msg)
Lab smoke test after the role-based refactor showed the model's first thought event regurgitating the entire SYSTEM_PROMPT_TEMPLATE, because the prior fix inlined the system rules into the user message as 'System: [rules]\n\nUser request: [prompt]'. The model treated the rules as additional context to acknowledge / restate.

Proper fix: configure a session-level chat template via rkllm_set_chat_template(handle, system, prefix, postfix) so the runtime injects the system prompt in the canonical <|im_start|>system\n... <|im_end|> slot. The model recognises it as instructions, not as something to repeat back.

Trade-off documented earlier: calling set_chat_template disables the runtime's automatic enable_thinking handling. We work around this by including '<think>\n' in the postfix, so every assistant turn starts inside a think block:

    postfix = '<|im_end|>\n<|im_start|>assistant\n<think>\n'

This still keeps <|im_end|> as the stop-token signal (the first token of the postfix), so the runtime correctly terminates each turn at <|im_end|>, with no more fictitious continuations.

55/55 unit tests pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
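To make the difference between the two approaches concrete, here is a minimal sketch of the prompt text the model would see in each case. It assumes the runtime simply concatenates system + prefix + user text + postfix; that ordering is inferred from the argument names and is not confirmed by this commit. The wrapper around the old inlined message is likewise an assumption about the default template, and both function names are hypothetical.

```python
IM_START = "<|im_start|>"
IM_END = "<|im_end|>"


def render_inlined(system_prompt: str, user_prompt: str) -> str:
    """Previous approach: the rules travel inside the user turn.

    The surrounding user/assistant wrapper is an assumed default template.
    """
    content = f"{system_prompt}\n\nUser request: {user_prompt}"
    return f"{IM_START}user\n{content}{IM_END}\n{IM_START}assistant\n"


def render_templated(system_prompt: str, user_prompt: str) -> str:
    """This commit: the rules sit in their own system turn.

    Assumes the runtime concatenates system + prefix + text + postfix.
    """
    system = f"{IM_START}system\n{system_prompt}{IM_END}\n"
    prefix = f"{IM_START}user\n"
    # The postfix closes the user turn, opens the assistant turn,
    # and starts a think block.
    postfix = f"{IM_END}\n{IM_START}assistant\n<think>\n"
    return system + prefix + user_prompt + postfix


if __name__ == "__main__":
    print(render_templated("RULES", "why is wifi down?"))
```

In the templated form the user turn contains only the user's text, which is the property the fix relies on.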
1 parent 6bf700a commit 609debc

1 file changed

Lines changed: 28 additions & 12 deletions


src/runtime/rkllm_runtime.py

@@ -1093,21 +1093,37 @@ async def run_troubleshoot(
         original_user_prompt = prompt

         # Per-turn role-based content for v1.2.3.
-        # Turn 0: role="user", content = SYSTEM_PROMPT_TEMPLATE concatenated
-        # with the user's actual prompt. v1.2.3's RKLLMInput role enum
-        # doesn't document a "system" role, and rkllm_set_chat_template
-        # would disable thinking-mode handling, so we inline the system
-        # rules into the first user message. The model was trained with
-        # the full system prompt visible in every example, so it still
-        # recognises the rules even via this less-structured channel.
+        # Turn 0: role="user", content = user's actual prompt only. The
+        # SYSTEM prompt is configured ONCE via rkllm_set_chat_template
+        # below so the runtime injects it in the proper <|im_start|>
+        # system\n...<|im_end|> slot — the model sees it as
+        # instructions, not as part of the user's request, and doesn't
+        # regurgitate it back in its first thought event.
         # Turn 1+: role="tool" with the JSON tool response. The runtime
         # appends to the existing KV cache via keep_history=1.
-        first_turn_content = (
-            f"{system_prompt}\n\n"
-            f"User request: {prompt}"
-        )
+        # Configure session-specific chat template — uses our system
+        # prompt + Qwen 3 markers + `<think>\n` postfix to force
+        # thinking-mode (the auto-thinking flag is disabled when
+        # set_chat_template is called per runtime warning).
+        try:
+            self._runtime._lib.rkllm_set_chat_template.restype = ctypes.c_int
+            self._runtime._lib.rkllm_set_chat_template.argtypes = [
+                ctypes.c_void_p, ctypes.c_char_p, ctypes.c_char_p, ctypes.c_char_p,
+            ]
+            system_wrapped = (
+                f"<|im_start|>system\n{system_prompt}<|im_end|>\n"
+            ).encode("utf-8")
+            prefix = b"<|im_start|>user\n"
+            # postfix closes user turn + opens assistant + forces think
+            postfix = b"<|im_end|>\n<|im_start|>assistant\n<think>\n"
+            self._runtime._lib.rkllm_set_chat_template(
+                self._runtime._handle, system_wrapped, prefix, postfix,
+            )
+        except Exception as e:  # noqa: BLE001
+            logger.warning("rkllm_set_chat_template failed: %s", e)
+
         next_role: str = "user"
-        next_content: str = first_turn_content
+        next_content: str = prompt
         next_keep_history: int = 0  # 0 on first turn; 1 thereafter

         for turn in range(MAX_TURNS):
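One detail of the hunk above: the try/except only catches Python-level failures (for example, a missing symbol raising AttributeError), while the integer returned by rkllm_set_chat_template is discarded. The following is a minimal sketch of a helper that also inspects the return code. The helper name `configure_chat_template` is hypothetical, and treating 0 as success is an assumption based on common C convention, not something this commit states.

```python
import ctypes
import logging

logger = logging.getLogger(__name__)


def configure_chat_template(lib, handle, system_prompt: str) -> bool:
    """Bind rkllm_set_chat_template on `lib` and apply the session template.

    Returns True if the call reports success (ASSUMED to be status 0).
    """
    fn = lib.rkllm_set_chat_template
    fn.restype = ctypes.c_int
    fn.argtypes = [
        ctypes.c_void_p, ctypes.c_char_p, ctypes.c_char_p, ctypes.c_char_p,
    ]

    # Same three pieces as in the diff: system turn, user prefix, and a
    # postfix that closes the user turn and opens a thinking assistant turn.
    system = f"<|im_start|>system\n{system_prompt}<|im_end|>\n".encode("utf-8")
    prefix = b"<|im_start|>user\n"
    postfix = b"<|im_end|>\n<|im_start|>assistant\n<think>\n"

    status = fn(handle, system, prefix, postfix)
    if status != 0:  # assumption: nonzero means the runtime rejected it
        logger.warning("rkllm_set_chat_template returned %d", status)
        return False
    return True
```

A caller could use the boolean to decide whether to fall back to the inlined prompt or abort the session, rather than continuing with a template that was never applied.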
