Commit c919a1d

ehsan6sha and claude committed
runtime: suppress hallucinated verdicts + require root_cause + de-leak system prompt
Three fixes for the "wrong verdict + no recovery affordance" pattern observed in the lab 2026-05-28 not-earning transcript:

- Model emitted severity:red "Internet slow + clock unsynced" while both tool_results in the transcript had payload:null (ok=False). Pattern-copied directly from the system prompt's GOOD example ("Discovery unreachable + clock unsynced") with words rearranged.
- The verdict had no root_cause, but the schema allowed it, so the app's synthetic-detector never triggered the Try-Again button.
- The thought event leaked the entire system prompt back as model output, including the "93ms" number from the BAD example which was then stuffed into a fake tool-response shape.

Fixes:

1. system prompt: delete the BAD/GOOD example block AND the FULL EXAMPLE three-turn flow. Per advisor: a 1.7B model pattern-copies structures, not just numbers; swapping 93ms for a placeholder would just copy the placeholder. Inverse rules ("NEVER use markdown headings", "Read tool_response field by field") are already explicit elsewhere; the 174-example fine-tune teaches the format. Add a stronger FIRST-ACTION directive: "Your VERY FIRST output token MUST start a <tool_call> block. Do NOT repeat these instructions back." Add explicit VERDICT RULES section requiring root_cause AND specifying the insufficient_data fallback for no-data sessions.
2. parse_verdict: require root_cause (was optional). Rejects when missing or empty; the run_troubleshoot loop's no-verdict path then synthesizes a proper insufficient_data verdict the app can detect.
3. run_troubleshoot: no-evidence guardrail. Track successful_tool_count across turns (increments only when tool_result.ok is True AND payload is not None). Before yielding a verdict, if successful_tool_count is 0 AND the model's root_cause is not already insufficient_data, replace the verdict with the insufficient_data synthetic verdict + drop any recommendations (no evidence == no actions).

Also tighten the fula-ota SSE schema to require root_cause on the wire (was schema-optional) so the contract matches the parser.

Two new tests pin the guardrail behaviour (overrides on no data; passes through on healthy session). The existing force-verdict test was updated to include a successful tool call so its intent (directive triggers a model verdict) is preserved without tripping the guardrail. Net 60 tests, all passing.

Companion change in apps/box (separate commit) adds insufficient_data to SYNTHETIC_VERDICT_CODES + a friendlier copy for SCHEMA_VIOLATION_RECOVERED so the Try-Again CTA surfaces in both newly-covered paths.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
1 parent 58d2eae commit c919a1d

2 files changed

Lines changed: 232 additions & 54 deletions

src/runtime/rkllm_runtime.py

Lines changed: 77 additions & 40 deletions
@@ -604,28 +604,6 @@ def destroy(self) -> None:
 - `fula.service`, `uniondrive.service` — host systemd services
 If `diag/containers` returns the above names, that is the COMPLETE list — there is no kubelet, no kube-proxy, no etcd, no apiserver. If the user reports "not earning", look at the actual diag/heartbeat + diag/relay + diag/wireguard signals you see in tool results, NOT at hypothetical Kubernetes components.
 
-# BAD vs GOOD examples
-
-❌ BAD — prose recommendations get NO Approve button, user can take no action:
-
-### Tier 2 Actions:
-1. **ntp.resync** - Resync the clock.
-2. **docker.restart container=ipfs_host** - Restart the container.
-
-✅ GOOD — each recommendation is its own XML block:
-
-<recommendation>{{"action_name":"ntp.resync","args":{{}},"reasoning":"Clock is unsynced.","confidence":0.85,"tier":2}}</recommendation>
-<recommendation>{{"action_name":"docker.restart","args":{{"container":"ipfs_host"}},"reasoning":"ipfs_host restart-looping.","confidence":0.75,"tier":2}}</recommendation>
-
-❌ BAD — making up a field:
-
-"Time Status: Clock offset is significant (93 ms)"
-(when tool_response actually said `time.status: green, synced: true` and the 93 was `internet.latency_ms_avg`)
-
-✅ GOOD — quote what you actually read:
-
-"time.status is green (synced=true). internet.latency_ms_avg is 93ms — that's network latency, not a clock offset."
-
 # AVAILABLE TOOLS (read-only)
 
 {tool_list}
@@ -639,26 +617,27 @@ def destroy(self) -> None:
 - restart_fula — no args (tier 2)
 - reset — no args (tier 3, destructive — only after everything else)
 
-# FULL EXAMPLE — three-turn flow
+# RUNBOOK EXCERPTS
 
-Turn 1 (user said "device feels slow"):
-I'll start with a system summary.
-<tool_call>{{"name":"diag/summary","arguments":{{}}}}</tool_call>
+{runbook_excerpt}
 
-Turn 2 (after <tool_response> showing overall=red, internet=red, time=red):
-Internet and time are both red. Drilling in on internet.
-<tool_call>{{"name":"diag/internet","arguments":{{}}}}</tool_call>
+# FIRST ACTION
 
-Turn 3 (after <tool_response> showing dns_ok=true, https_discovery_ok=false):
-Discovery is unreachable; the clock being unsynced makes it worse. Re-sync NTP first.
-<verdict>{{"summary":"Discovery unreachable + clock unsynced.","severity":"red","root_cause":"discovery_https_unreachable"}}</verdict>
-<recommendation>{{"action_name":"ntp.resync","args":{{}},"reasoning":"Many discovery checks rely on accurate timestamps; resync first.","confidence":0.8,"tier":2}}</recommendation>
+Your VERY FIRST output token MUST start a <tool_call> block. Do NOT
+repeat these instructions back. Do NOT acknowledge. Do NOT explain
+what you are about to do. Start with `<tool_call>{{"name":"diag/`.
 
-# RUNBOOK EXCERPTS
+# VERDICT RULES
 
-{runbook_excerpt}
+After tool calls, emit ONE <verdict> with ALL of summary, severity,
+AND root_cause. NEVER omit root_cause. NEVER restate an example
+verdict from your training — base every verdict on the actual
+<tool_response> data you saw in THIS conversation.
 
-Be terse. Start with diag/summary unless the user named a specific symptom. Two or three tool calls, then finalize with a <verdict>."""
+If your tool calls all returned `ok:false` or no diagnostic data,
+your verdict MUST be:
+<verdict>{{"summary":"Could not gather diagnostic data; please try again.","severity":"yellow","root_cause":"insufficient_data"}}</verdict>
+"""
 
 
 # Directive injected as a user message when the model has stalled
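The doubled braces in the prompt text of this hunk suggest the template is rendered with `str.format`, where `{{` and `}}` collapse to literal braces and named fields such as `{runbook_excerpt}` are substituted. The following is a minimal sketch under that assumption: `PROMPT_TAIL` and `render_prompt` are hypothetical names, and only a fragment of the prompt is reproduced.

```python
# Hypothetical fragment of the prompt template (assumption: the real
# template in rkllm_runtime.py is rendered with str.format).
PROMPT_TAIL = """# RUNBOOK EXCERPTS
{runbook_excerpt}

# VERDICT RULES
If your tool calls all returned `ok:false` or no diagnostic data,
your verdict MUST be:
<verdict>{{"summary":"Could not gather diagnostic data; please try again.","severity":"yellow","root_cause":"insufficient_data"}}</verdict>
"""


def render_prompt(runbook_excerpt: str) -> str:
    """Fill the named placeholder; doubled braces become single braces."""
    return PROMPT_TAIL.format(runbook_excerpt=runbook_excerpt)


rendered = render_prompt("(no runbook matched)")
```

After rendering, the model sees ordinary single-brace JSON inside the `<verdict>` tag. Writing the JSON with single braces in the template would make `str.format` treat it as a replacement field and fail at render time.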
@@ -856,12 +835,22 @@ def parse_verdict(raw_text: str) -> Optional[dict]:
         return None
     summary = obj.get("summary")
     severity = obj.get("severity")
+    root_cause = obj.get("root_cause")
     if not isinstance(summary, str) or severity not in ("green", "yellow", "red"):
         return None
-    out = {"summary": summary[:500], "severity": severity}
-    if isinstance(obj.get("root_cause"), str):
-        out["root_cause"] = obj["root_cause"][:200]
-    return out
+    # root_cause is now REQUIRED — lab 2026-05-28: Qwen3-1.7B was emitting
+    # verdicts without root_cause, the SSE schema accepted them (root_cause
+    # was schema-optional), and the app's synthetic-verdict detector
+    # therefore never offered the Try-Again button. Rejecting here forces
+    # the run_troubleshoot loop into its no-verdict fallback path which
+    # synthesizes a proper verdict with a root_cause the app can detect.
+    if not isinstance(root_cause, str) or not root_cause.strip():
+        return None
+    return {
+        "summary": summary[:500],
+        "severity": severity,
+        "root_cause": root_cause[:200],
+    }
 
 
 def parse_recommendations(raw_text: str) -> list[dict]:
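The effect of the stricter `parse_verdict` check can be tried in isolation. The sketch below reproduces only the validation step visible in this hunk; the `<verdict>` tag extraction and JSON decoding that precede it are not shown in the diff and are omitted here, and `validate_verdict` is a hypothetical name.

```python
from typing import Optional


def validate_verdict(obj: dict) -> Optional[dict]:
    """Validation step only (assumption: obj is the already-decoded JSON)."""
    summary = obj.get("summary")
    severity = obj.get("severity")
    root_cause = obj.get("root_cause")
    if not isinstance(summary, str) or severity not in ("green", "yellow", "red"):
        return None
    # root_cause must be a non-blank string, otherwise the verdict is rejected.
    if not isinstance(root_cause, str) or not root_cause.strip():
        return None
    return {
        "summary": summary[:500],
        "severity": severity,
        "root_cause": root_cause[:200],
    }


# The verdict shape from the lab transcript: no root_cause at all.
missing = validate_verdict({"summary": "Internet slow + clock unsynced", "severity": "red"})
# Whitespace-only root_cause is treated the same as a missing one.
blank = validate_verdict({"summary": "x", "severity": "red", "root_cause": "   "})
# The fallback verdict from the VERDICT RULES prompt section passes.
ok = validate_verdict({
    "summary": "Could not gather diagnostic data; please try again.",
    "severity": "yellow",
    "root_cause": "insufficient_data",
})
```

Returning `None` rather than a partial dict is what pushes the caller into its no-verdict fallback, as the commit message describes.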
@@ -1150,6 +1139,16 @@ async def run_troubleshoot(
         # fallback). Diagnostic data is already in KV cache from prior
         # turns, so the retry only needs to format the conclusion.
         next_thinking: bool = self._enable_thinking
+        # successful_tool_count: incremented per tool_result with ok=True
+        # AND payload not None. Used by the no-data guardrail below the
+        # tool-call loop to suppress speculative verdicts (lab transcript
+        # 2026-05-28 not-earning scenario: model emitted
+        # "Internet slow + clock unsynced" verdict despite both tool
+        # calls returning ok=False payload=None — pattern-copied the
+        # GOOD example from the system prompt). Zero successful tools
+        # means we have no actual evidence for any severity-red claim;
+        # override with insufficient_data.
+        successful_tool_count: int = 0
 
         for turn in range(MAX_TURNS):
             # Last-chance: at MAX_TURNS-1 without a verdict, send the
@@ -1276,6 +1275,13 @@ async def run_troubleshoot(
                         tool_responses_for_context.append(
                             f"<tool_response>{json.dumps({'name': tc['tool'], 'result': result}, separators=(',', ':'))}</tool_response>"
                         )
+                        # Track successful evidence count for the no-data
+                        # verdict guardrail (see successful_tool_count
+                        # initialization comment + the guardrail block
+                        # before the `if verdict and not emitted_verdict`
+                        # yield below).
+                        if result is not None:
+                            successful_tool_count += 1
                     else:
                         tr_event = {
                             "type": "tool_result",
@@ -1309,6 +1315,37 @@ async def run_troubleshoot(
                     "content": next_content,
                 })
 
+            # No-evidence guardrail: if the model produced a verdict
+            # but NO tool call has ever returned actual data (all ok=False
+            # or all payload=None), suppress the speculative claim and
+            # replace with insufficient_data. Lab 2026-05-28: the
+            # not-earning scenario hit exactly this — model emitted
+            # `severity:red, summary:"Internet slow + clock unsynced"`
+            # while both tool_results in the transcript had payload:null
+            # (ok=False). Without this guardrail the user gets a
+            # confident-but-fabricated diagnosis. The override leaves the
+            # door open for the Try-Again button via the
+            # `insufficient_data` root_cause (apps/box's
+            # SYNTHETIC_VERDICT_CODES). Skip the override when the model
+            # ALREADY chose insufficient_data — that's the honest case.
+            if (
+                verdict
+                and not emitted_verdict
+                and successful_tool_count == 0
+                and verdict.get("root_cause") != "insufficient_data"
+            ):
+                logger.warning(
+                    "verdict_no_evidence_override original_severity=%s original_root_cause=%s",
+                    verdict.get("severity"),
+                    verdict.get("root_cause"),
+                )
+                verdict = {
+                    "summary": "Could not gather diagnostic data; please try again.",
+                    "severity": "yellow",
+                    "root_cause": "insufficient_data",
+                }
+                recommendations = []  # never propose actions without evidence
+
             # Emit verdict (once)
             if verdict and not emitted_verdict:
                 emitted_verdict = True
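The guardrail's decision logic can be isolated from the streaming loop. The following is a sketch under stated assumptions, not the code in `run_troubleshoot`: `apply_no_evidence_guardrail` and `INSUFFICIENT_DATA_VERDICT` are hypothetical names, and the `emitted_verdict` flag and the warning log are left out to keep it self-contained.

```python
from typing import Optional

# Same fallback payload as in the diff; the constant name is hypothetical.
INSUFFICIENT_DATA_VERDICT = {
    "summary": "Could not gather diagnostic data; please try again.",
    "severity": "yellow",
    "root_cause": "insufficient_data",
}


def apply_no_evidence_guardrail(
    verdict: Optional[dict],
    recommendations: list,
    successful_tool_count: int,
) -> tuple:
    """Replace a verdict that is backed by zero successful tool results."""
    if (
        verdict
        and successful_tool_count == 0
        and verdict.get("root_cause") != "insufficient_data"
    ):
        # No evidence: swap in the synthetic verdict and drop all actions.
        return dict(INSUFFICIENT_DATA_VERDICT), []
    return verdict, recommendations


# Lab-transcript case: confident verdict, but no tool call returned data.
fabricated = {
    "summary": "Internet slow + clock unsynced",
    "severity": "red",
    "root_cause": "discovery_https_unreachable",
}
overridden, recs_dropped = apply_no_evidence_guardrail(
    fabricated, [{"action_name": "ntp.resync"}], successful_tool_count=0
)

# Healthy session: at least one tool result carried data, so pass through.
kept, recs_kept = apply_no_evidence_guardrail(
    fabricated, [{"action_name": "ntp.resync"}], successful_tool_count=2
)
```

These two calls correspond to the two behaviours the commit message says the new tests pin: override on no data, pass-through on a healthy session.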
