Commit 848d50a

mios-dev and claude committed
strip stacked <details> from polish + ban false-success reporting
Operator chat 2026-05-18: "open browser to youtube AND research future
tech" -> polished answer had:

  * TWO stacked <details type="reasoning"> blocks (one the pipe
    wrapped, one the polish model added on its own)
  * Falsely reported both steps complete when only ONE tool call (a
    failing web_extract) actually ran -- no mios-open-url, no
    web_search

Two fixes:

1. _strip_reasoning_leaks now also strips <details>...</details>
   blocks (new _DETAILS_BLOCK_RE, applied before the existing
   <think>/<reasoning> strip). The pipe wraps agent thinking in ITS
   OWN <details type="reasoning"> ABOVE the polished answer; the
   polish model must NEVER emit its own. Unit-tested 6/6: complete
   <details>, bare <details>, <details> with attributes, stacked
   <details>+<think>, plus the regression case ("plain answer no
   details").

2. Polish system prompt gains explicit rules:

   * "NEVER emit <details> in your output. The pipe wraps agent
     thinking in its own block above your answer. Adding another one
     stacks them and the operator sees two expand-arrows."
   * "NEVER report an action as 'successful' / 'completed' /
     'opened' / 'launched' / 'posted' / 'sent' unless RAW OUTPUT
     contains the matching tool_result with success:true. If a
     planned step did not run or did not succeed, SAY SO -- 'Step 2
     (web_search) did not run' or 'Step 1 (mios-open-url) returned
     exit 1: <err>'."

   Quotes the YouTube/web-search false-success as the case study.

mios-owui-install-pipe re-ran -> OWUI db function.content carries the
new polish prompt + strip logic. Live restart confirmed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
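The commit enforces the false-success rule through the polish prompt only. As an illustration of what that rule asks for, here is a minimal sketch that derives per-step status lines from the tool results that actually came back. `ToolResult`, `step_status`, and the field names are hypothetical and do not appear in `mios_agent_pipe.py`; the real shape of RAW OUTPUT is not shown in this commit.

```python
from dataclasses import dataclass


@dataclass
class ToolResult:
    """Hypothetical stand-in for one tool_result entry in RAW OUTPUT."""
    tool: str
    success: bool
    error: str = ""


def step_status(planned: list[str], results: list[ToolResult]) -> list[str]:
    """Report each planned step strictly from the results that exist.

    A step is called successful only if a matching result with
    success=True is present; otherwise the line says it failed or
    did not run.
    """
    by_tool = {r.tool: r for r in results}
    lines = []
    for i, tool in enumerate(planned, start=1):
        result = by_tool.get(tool)
        if result is None:
            lines.append(f"Step {i} ({tool}) did not run")
        elif result.success:
            lines.append(f"Step {i} ({tool}) succeeded")
        else:
            lines.append(f"Step {i} ({tool}) failed: {result.error}")
    return lines


# The case from the commit message: two planned steps, but the only
# call that ran was a failing web_extract (error text is made up).
ran = [ToolResult("web_extract", False, "fetch failed")]
for line in step_status(["mios-open-url", "web_search"], ran):
    print(line)
```

Under these assumptions, neither planned step can be reported as done, because no result for `mios-open-url` or `web_search` exists.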
1 parent f5db8f0 commit 848d50a

1 file changed

Lines changed: 30 additions & 3 deletions

usr/share/mios/owui/pipes/mios_agent_pipe.py
@@ -566,6 +566,19 @@ async def _tail_watcher(
         " numbers, statuses, app names, registry coords, ports, sizes,\n"
         " timestamps, package names. If a field isn't in RAW OUTPUT,\n"
         " don't write it.\n"
+        "- NEVER report an action as 'successful' / 'completed' / 'opened' /\n"
+        " 'launched' / 'posted' / 'sent' unless RAW OUTPUT contains the\n"
+        " matching tool_result with success:true. Operator-flagged\n"
+        " 2026-05-18: polish claimed 'Open YouTube: Successfully opened\n"
+        " YouTube' + 'Web Search: Proceeded with the web search' when\n"
+        " only one (failing) web_extract call had actually run. Reporting\n"
+        " steps that did NOT execute is a defect. If a planned step did\n"
+        " not run or did not succeed, SAY SO -- 'Step 2 (web_search) did\n"
+        " not run' or 'Step 1 (mios-open-url) returned exit 1: <err>'.\n"
+        "- NEVER emit <details> in your output. The pipe wraps agent\n"
+        " thinking in its own <details type=\"reasoning\"> block above\n"
+        " your answer. Adding another one stacks them and the operator\n"
+        " sees two expand-arrows. Plain markdown only.\n"
         "- Strip narration. Phrases like \"Let me\", \"I'll\", \"First\n"
         " I...\", \"Now I'll\", \"Let me check\" are FORBIDDEN in your\n"
         " output. The operator wants the result, not the reasoning.\n"
@@ -855,6 +868,16 @@ async def _polish_via_cpu(
     _LEADING_THOUGHT_RE = re.compile(
         r"^\s*(?:thought|thinking|reasoning)\s*\n+", re.I,
     )
+    # Polish sometimes emits an additional <details type="reasoning">
+    # block in its output, on top of the agent-thinking <details> the
+    # pipe already wrapped. Operator-flagged 2026-05-18: chat showed
+    # two stacked <details> blocks. The polished answer must NEVER
+    # contain a <details>; that wrapper is the pipe's job, not the
+    # polish model's.
+    _DETAILS_BLOCK_RE = re.compile(
+        r"<\s*details[^>]*>.*?<\s*/\s*details\s*>",
+        re.S | re.I,
+    )
 
     def _strip_outer_md_fence(self, text: str) -> str:
         """If the entire response is wrapped in a ```markdown ... ```
@@ -872,9 +895,13 @@ def _strip_outer_md_fence(self, text: str) -> str:
         return inner
 
     def _strip_reasoning_leaks(self, text: str) -> str:
-        """Remove <think>...</think> + sibling reasoning tags that the
-        polish model occasionally emits despite the system-prompt rule
-        against narration. Operator-flagged 2026-05-17."""
+        """Remove <think>/<reasoning>/<details type="reasoning"> tags
+        the polish model occasionally emits despite the system prompt
+        rule against narration. Operator-flagged 2026-05-17 (think)
+        + 2026-05-18 (details). The pipe wraps the AGENT thinking in
+        its own <details> block above the polished answer; the polish
+        model must NEVER emit its own."""
+        text = self._DETAILS_BLOCK_RE.sub("", text)
         text = self._THINK_TAG_RE.sub("", text)
         text = self._LEADING_THOUGHT_RE.sub("", text)
         return text.strip()
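
The strip order in `_strip_reasoning_leaks` can be exercised outside the pipe with a standalone sketch. `_DETAILS_BLOCK_RE` and `_LEADING_THOUGHT_RE` are copied from the diff; `_THINK_TAG_RE` is not shown in this commit, so the pattern below is an assumed stand-in, and the free function `strip_reasoning_leaks` is a hypothetical copy of the method without `self`. The sample inputs are illustrative and are not the project's own unit tests.

```python
import re

# Copied from the diff above.
_DETAILS_BLOCK_RE = re.compile(
    r"<\s*details[^>]*>.*?<\s*/\s*details\s*>",
    re.S | re.I,
)
# ASSUMPTION: the real _THINK_TAG_RE is defined outside this diff;
# this stand-in matches paired <think>/<reasoning> blocks.
_THINK_TAG_RE = re.compile(
    r"<\s*(think|reasoning)\b[^>]*>.*?<\s*/\s*\1\s*>",
    re.S | re.I,
)
# Copied from the diff context.
_LEADING_THOUGHT_RE = re.compile(
    r"^\s*(?:thought|thinking|reasoning)\s*\n+", re.I,
)


def strip_reasoning_leaks(text: str) -> str:
    """Same three substitutions, in the same order as the patched method."""
    text = _DETAILS_BLOCK_RE.sub("", text)
    text = _THINK_TAG_RE.sub("", text)
    text = _LEADING_THOUGHT_RE.sub("", text)
    return text.strip()


samples = [
    "<details>hidden</details>Answer",
    '<details type="reasoning">\nstep 1\n</details>\nAnswer',
    "<details>a</details><think>b</think>Answer",
    "plain answer no details",
]
for sample in samples:
    print(repr(strip_reasoning_leaks(sample)))
```

Because the pattern requires a closing tag, an unclosed `<details>` passes through unchanged, and with nested blocks the non-greedy match stops at the first `</details>`, leaving the outer closing tag behind.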
