strip stacked <details> from polish + ban false-success reporting

mios-dev · claude · mios-dev · commit 848d50acb8e5 · 2026-05-18T00:29:25.000-04:00
Operator chat 2026-05-18: "open browser to youtube AND research
future tech" -> polished answer had:
  * TWO stacked <details type="reasoning"> blocks (one the pipe
    wrapped, one the polish model added on its own)
  * Falsely reported both steps complete when only ONE tool call
    (a failing web_extract) actually ran -- no mios-open-url, no
    web_search

Two fixes:

1. _strip_reasoning_leaks now also strips <details>...</details>
   blocks (new _DETAILS_BLOCK_RE, applied before the existing
   <think>/<reasoning> strip). The pipe wraps agent thinking in
   ITS OWN <details type="reasoning"> ABOVE the polished answer;
   the polish model must NEVER emit its own. Unit-tested 6/6:
   complete <details>, bare <details>, <details> with attributes,
   stacked <details>+<think>, plus the regression case ("plain
   answer no details").

2. Polish system prompt gains explicit rules:
   * "NEVER emit <details> in your output. The pipe wraps agent
     thinking in its own block above your answer. Adding another
     one stacks them and the operator sees two expand-arrows."
   * "NEVER report an action as 'successful' / 'completed' /
     'opened' / 'launched' / 'posted' / 'sent' unless RAW OUTPUT
     contains the matching tool_result with success:true. If a
     planned step did not run or did not succeed, SAY SO -- 'Step
     2 (web_search) did not run' or 'Step 1 (mios-open-url)
     returned exit 1: <err>'."
   Quotes the YouTube/web-search false-success as the case study.
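
A minimal standalone sketch of the strip order in fix 1, for anyone who
wants to try it outside the pipe. DETAILS_BLOCK_RE copies the pattern
from the diff below; THINK_TAG_RE is an assumed stand-in, because the
real _THINK_TAG_RE definition is not part of this commit, and the
function name is illustrative rather than the pipe's actual method.

```python
import re

# Same pattern as _DETAILS_BLOCK_RE in the diff: any <details ...> opening
# tag through the first matching </details>, across newlines, any case.
DETAILS_BLOCK_RE = re.compile(
    r"<\s*details[^>]*>.*?<\s*/\s*details\s*>",
    re.S | re.I,
)

# ASSUMPTION: stand-in for _THINK_TAG_RE, which this commit does not show.
THINK_TAG_RE = re.compile(
    r"<\s*(think|reasoning)\b[^>]*>.*?<\s*/\s*\1\s*>",
    re.S | re.I,
)


def strip_reasoning_leaks(text: str) -> str:
    """Drop <details> blocks first, then <think>/<reasoning> blocks."""
    text = DETAILS_BLOCK_RE.sub("", text)
    text = THINK_TAG_RE.sub("", text)
    return text.strip()


if __name__ == "__main__":
    leaked = '<details type="reasoning">plan\nsteps</details>\nAnswer'
    print(strip_reasoning_leaks(leaked))
```

Because the pattern requires a closing tag, an unclosed <details> is
left in place by this sketch; whether the pipe needs to handle that
case is not stated here.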

mios-owui-install-pipe re-ran -> OWUI db function.content carries
the new polish prompt + strip logic. Live restart confirmed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
diff --git a/usr/share/mios/owui/pipes/mios_agent_pipe.py b/usr/share/mios/owui/pipes/mios_agent_pipe.py
@@ -566,6 +566,19 @@ async def _tail_watcher(
         "  numbers, statuses, app names, registry coords, ports, sizes,\n"
         "  timestamps, package names. If a field isn't in RAW OUTPUT,\n"
         "  don't write it.\n"
+        "- NEVER report an action as 'successful' / 'completed' / 'opened' /\n"
+        "  'launched' / 'posted' / 'sent' unless RAW OUTPUT contains the\n"
+        "  matching tool_result with success:true. Operator-flagged\n"
+        "  2026-05-18: polish claimed 'Open YouTube: Successfully opened\n"
+        "  YouTube' + 'Web Search: Proceeded with the web search' when\n"
+        "  only one (failing) web_extract call had actually run. Reporting\n"
+        "  steps that did NOT execute is a defect. If a planned step did\n"
+        "  not run or did not succeed, SAY SO -- 'Step 2 (web_search) did\n"
+        "  not run' or 'Step 1 (mios-open-url) returned exit 1: <err>'.\n"
+        "- NEVER emit <details> in your output. The pipe wraps agent\n"
+        "  thinking in its own <details type=\"reasoning\"> block above\n"
+        "  your answer. Adding another one stacks them and the operator\n"
+        "  sees two expand-arrows. Plain markdown only.\n"
         "- Strip narration. Phrases like \"Let me\", \"I'll\", \"First\n"
         "  I...\", \"Now I'll\", \"Let me check\" are FORBIDDEN in your\n"
         "  output. The operator wants the result, not the reasoning.\n"
@@ -855,6 +868,16 @@ async def _polish_via_cpu(
     _LEADING_THOUGHT_RE = re.compile(
         r"^\s*(?:thought|thinking|reasoning)\s*\n+", re.I,
     )
+    # Polish sometimes emits an additional <details type="reasoning">
+    # block in its output, on top of the agent-thinking <details> the
+    # pipe already wrapped. Operator-flagged 2026-05-18: chat showed
+    # two stacked <details> blocks. The polished answer must NEVER
+    # contain a <details>; that wrapper is the pipe's job, not the
+    # polish model's.
+    _DETAILS_BLOCK_RE = re.compile(
+        r"<\s*details[^>]*>.*?<\s*/\s*details\s*>",
+        re.S | re.I,
+    )
 
     def _strip_outer_md_fence(self, text: str) -> str:
         """If the entire response is wrapped in a ```markdown ... ```
@@ -872,9 +895,13 @@ def _strip_outer_md_fence(self, text: str) -> str:
         return inner
 
     def _strip_reasoning_leaks(self, text: str) -> str:
-        """Remove <think>...</think> + sibling reasoning tags that the
-        polish model occasionally emits despite the system-prompt rule
-        against narration. Operator-flagged 2026-05-17."""
+        """Remove <think>/<reasoning>/<details type="reasoning"> tags
+        the polish model occasionally emits despite the system prompt
+        rule against narration. Operator-flagged 2026-05-17 (think)
+        + 2026-05-18 (details). The pipe wraps the AGENT thinking in
+        its own <details> block above the polished answer; the polish
+        model must NEVER emit its own."""
+        text = self._DETAILS_BLOCK_RE.sub("", text)
         text = self._THINK_TAG_RE.sub("", text)
         text = self._LEADING_THOUGHT_RE.sub("", text)
         return text.strip()