phase-2 multi-agent: Critic Agent reflexion loop over compose draft

mios-dev · claude · mios-dev · commit 353291b25f95 · 2026-05-18T00:46:03.000-04:00
Continues the migration plan from
usr/share/mios/docs/multi-agent-architecture.md (phase 1 was
structured Compose handoff from Hermes session JSON; phase 2 adds
the actor + critic reflexion loop).

Architecture:
  refine -> hermes -> compose (draft)
                            -> critic (reviews draft vs structured
                                       tool history; verdict JSON)
                            -> [if revise] recompose with critic
                                           feedback fed back
                            -> ship

Critic Agent (NEW):
* CRITIC_ENABLED valve (default True). When ON and the pipe loaded
  structured tool history from the Hermes session, the critic
  passes the {user_ask, structured_history, compose_draft} to a
  small model (qwen3:1.7b on iGPU per operator's "iGPU = micro-LLM
  only" directive). Output: JSON {verdict: approve|revise, issues:
  [...]}.
* Checks for: false success claims (claim "launched X" when the
  tool_result for that call had success=false), fabricated steps
  (claim a step completed when no tool_call for it exists), wrong
  tool attribution, missing critical info that IS in the history,
  fabricated specifics (paths/ids/numbers not anywhere in any
  tool_result).
* Format=json mode (Ollama structured output) so the parse step
  is robust.
* Fail-open: any HTTP/timeout/parse error returns {} -> draft
  ships unchanged. The critic NEVER blocks the response.
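
As a rough illustration of the fail-open parse step, the sketch below
mirrors the fence stripping and JSON parsing in _critic_via_cpu as
standalone functions. The names parse_verdict and should_revise are
hypothetical and do not appear in the pipe; the logic is a
simplified reading of the diff, not a copy of it.

```python
import json
import re

# Built at runtime so the sketch contains no literal fence marker.
FENCE = "`" * 3


def parse_verdict(content: str) -> dict:
    """Parse the critic's reply; return {} on anything unexpected (fail-open)."""
    text = (content or "").strip()
    if not text:
        return {}
    # Strip a Markdown code fence a chatty model might wrap around the JSON.
    text = re.sub(r"^\s*" + FENCE + r"(?:json)?\s*\n?", "", text)
    text = re.sub(r"\n?" + FENCE + r"\s*$", "", text)
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError:
        return {}
    # Only a JSON object counts as a verdict; lists and scalars are ignored.
    return parsed if isinstance(parsed, dict) else {}


def should_revise(verdict: dict) -> bool:
    """Revise only when the verdict is not 'approve' and lists issues."""
    return verdict.get("verdict") != "approve" and bool(verdict.get("issues"))
```

Because every failure path collapses to {}, should_revise({}) is
False and the draft ships unchanged.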

Reflexion loop:
* CRITIC_MAX_ITERATIONS valve (default 1, bounded).
* If critic verdict=revise, _recompose_with_critic_feedback re-runs
  compose with the issues list appended to the system prompt: "Your
  previous draft had these issues -- FIX each one in the revised
  answer". Compose sees the previous draft + the structured history
  + the specific issues -> writes a corrected answer.
* Re-strip fence + reasoning leaks on the revised text.
* CRITIC_MAX_ITERATIONS=0 is meant as "audit-only" (run critic, never
  revise) -- useful for telemetry without behaviour change. As
  committed, the loop guard requires a value > 0, so 0 skips the
  critic call entirely.
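
The bounded loop described above can be sketched with synchronous
stand-ins for the critic and compose calls. This is an illustration
under assumptions, not the pipe's code: reflexion_loop and its
callable parameters are hypothetical, and the sketch implements the
audit-only reading of max_iterations=0 (critic runs once, no
revision) as described in this message.

```python
from typing import Callable, List


def reflexion_loop(
    draft: str,
    critic: Callable[[str], dict],
    recompose: Callable[[str, List[str]], str],
    max_iterations: int = 1,
) -> str:
    """Critique a draft and revise it at most `max_iterations` times."""
    # max(1, ...) keeps the critic running once even when max_iterations
    # is 0, which is the audit-only mode described in the commit message.
    for attempt in range(max(1, max_iterations)):
        verdict = critic(draft)
        issues = verdict.get("issues") or []
        # Approve, an empty verdict (fail-open) or no issues: ship as-is.
        if verdict.get("verdict") == "approve" or not issues:
            break
        if attempt >= max_iterations:
            break  # audit-only: the critic ran, but no revision is allowed
        # An empty recompose result means failure; keep the previous draft.
        draft = recompose(draft, issues) or draft
    return draft
```

With max_iterations=1 the revised draft is returned without a second
critic pass, which matches the "compose, critique, revise, done"
wording of the valve description.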

Status emits (sanitized per the earlier global sweep; no English
narrative, no model names):
  🧑‍⚖️ critic                       -- review started
  🧑‍⚖️ ✓ critic approve              -- draft passes ground truth
  🧑‍⚖️ ✎ critic: <N> issue(s) → revise -- compose re-runs
  🧑‍⚖️ ⚠ critic err → ship draft     -- failure mode (rare)

Why this matters (vs. the regex tower it replaces):
  * _KNOWN_AGENT_ERROR_RE was guessing from text whether the agent
    had really failed; the critic READS the structured success
    field directly. No false positives on real successes.
  * "NEVER report 'launched' unless tool_result success:true" ban
    list in the polish prompt was relying on the model to remember
    a rule; the critic enforces it externally.
  * The pattern maps onto OpenAI-compatible APIs (Chat Completions
    with structured output and tool-call messages), so it is not
    tied to local Ollama. The calls in this commit use Ollama's
    native /api/chat with format=json; another backend needs that
    request body adapted.
  * Day-0 from clone: all valves, methods, and prompts ship in
    /usr/share/mios/owui/pipes/mios_agent_pipe.py (image-immutable).
    First chat with this pipe triggers Ollama to load qwen3:1.7b
    (1.4 GB; lands on iGPU when wsl2-amd.yaml CDI spec is present
    per the earlier iGPU passthrough commit).
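
For the OpenAI-compatible variant mentioned in the list above, the
critic request might be built roughly as follows. This is a sketch
under assumptions: build_critic_request is a hypothetical helper, the
truncation limits are copied from the diff, and response_format with
type json_object is the Chat Completions JSON mode standing in for
Ollama's "format": "json".

```python
def build_critic_request(model: str, system_prompt: str, user_ask: str,
                         tool_history_json: str, draft: str,
                         max_tokens: int = 300) -> dict:
    """Build a Chat Completions style body for the critic call."""
    # Same section layout and truncation limits as the critic prompt in the diff.
    user_msg = (
        f"## OPERATOR ASK\n{user_ask[:1500]}\n\n"
        f"## STRUCTURED TOOL HISTORY (authoritative)\n"
        f"{tool_history_json[:6000]}\n\n"
        f"## DRAFT ANSWER (from Compose Agent)\n{draft[:4000]}\n\n"
        "## VERDICT (JSON only):"
    )
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_msg},
        ],
        "temperature": 0.0,
        "max_tokens": max_tokens,
        # JSON mode on OpenAI-compatible servers; Ollama's native
        # /api/chat takes "format": "json" instead, as in the diff.
        "response_format": {"type": "json_object"},
        "stream": False,
    }
```

The returned dict would be serialized and POSTed to the backend's
chat completions route; the reply text then goes through the same
fail-open JSON parse as before.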

Operator-requested cache wipe (--all) was also run as a prelude:
  * Remaining after the wipe: 0 chats / messages / memories
  * Preserved: 1 admin user, 2 models (with params + meta.knowledge
    bindings), 2 tools (with full specs), 3 functions, the MiOS
    Documentation knowledge collection (33 files including the
    multi-agent research doc).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
diff --git a/usr/share/mios/owui/pipes/mios_agent_pipe.py b/usr/share/mios/owui/pipes/mios_agent_pipe.py
@@ -199,6 +199,32 @@ class Valves(BaseModel):
             default=240,
             description="If the raw agent output is shorter than this and contains no narration markers, pass through unpolished -- no value in spinning up the CPU model for a one-liner result.",
         )
+        # ── Phase-2 Critic loop (see docs/multi-agent-architecture.md)
+        # The Critic Agent reviews the compose draft against the
+        # structured tool history. If the draft claims success on a
+        # tool that failed, claims a step ran when no tool_call for
+        # it exists, or otherwise mismatches the structured truth,
+        # critic returns issues; compose revises once. Bounded loop.
+        CRITIC_ENABLED: bool = Field(
+            default=True,
+            description="After compose drafts an answer, run a small Critic Agent that reviews against the structured tool history. Catches false-success claims, missing steps, fabrications -- replaces the hardcoded KNOWN_AGENT_ERROR_RE rewrite path with a natural multi-agent reflexion loop.",
+        )
+        CRITIC_MODEL: str = Field(
+            default="qwen3:1.7b",
+            description="Small model for the critic pass. Per operator directive 2026-05-17 'iGPU's are ONLY micro-llms' -- micro-LLMs land on the AMD/Intel iGPU CDI lane when present, leaving the dGPU free for big-model work. qwen3:1.7b ~1.4 GB.",
+        )
+        CRITIC_TIMEOUT_S: int = Field(
+            default=45,
+            description="Cap the critic call. Small model + structured input + JSON-only output = sub-10s typical; 45s is the safety ceiling.",
+        )
+        CRITIC_MAX_TOKENS: int = Field(
+            default=300,
+            description="Cap critic output. JSON verdict + issue list fits comfortably.",
+        )
+        CRITIC_MAX_ITERATIONS: int = Field(
+            default=1,
+            description="Max revise cycles. 0 = run critic but never revise (audit-only). 1 = one revision pass (compose, critique, revise, done). Bounded reflexion -- never infinite loop.",
+        )
         AGENT_THINKING_LABEL: str = Field(
             default="🧠 MiOS-Hermes",
             description="The <summary> rendered above the collapsed reasoning block. Per-agent label so the operator can tell which agent (hermes / opencode / etc.) produced the thinking. Kept short + symbol-led so it reads the same across operator locales (operator directive 2026-05-17 GLOBAL SWEEP for hardcoded English).",
@@ -1017,8 +1043,183 @@ async def _polish_via_cpu(
         # Strip <think>...</think> + leading "Thought" leaks.
         polished = self._strip_reasoning_leaks(polished)
 
+        # ── Phase-2 Critic reflexion loop ──────────────────────────
+        # Only runs when we HAVE structured tool history to reason
+        # over (the critic's whole value-add is checking the draft
+        # against ground truth); skipped on the text-blob fallback
+        # path. Bounded by CRITIC_MAX_ITERATIONS.
+        if (self.valves.CRITIC_ENABLED and tool_history_json
+                and int(self.valves.CRITIC_MAX_ITERATIONS) > 0):
+            for attempt in range(int(self.valves.CRITIC_MAX_ITERATIONS)):
+                verdict = await self._critic_via_cpu(
+                    user_text, polished, tool_history_json, emitter,
+                )
+                if verdict.get("verdict") == "approve":
+                    await self._emit(emitter, "🧑‍⚖️ ✓ critic approve")
+                    break
+                issues = verdict.get("issues") or []
+                if not issues:
+                    break
+                await self._emit(emitter,
+                    f"🧑‍⚖️ ✎ critic: {len(issues)} issue(s) → revise")
+                # Re-compose with the critic's issues fed back.
+                polished = await self._recompose_with_critic_feedback(
+                    user_text, raw_output, tool_history_json, polished,
+                    issues, emitter,
+                ) or polished
+                polished = self._strip_outer_md_fence(polished)
+                polished = self._strip_reasoning_leaks(polished)
         return polished
 
+    _CRITIC_SYSTEM = (
+        "You are a Critic Agent. The Compose Agent drafted the answer\n"
+        "below. Your job: check the draft against the structured tool\n"
+        "history (authoritative ground truth). Spot:\n"
+        "  1. Claims of success on tools whose tool_result had success=false\n"
+        "  2. Claims of completion for steps NOT present in the history\n"
+        "  3. Fabricated specifics (paths/ids/numbers not in any result)\n"
+        "  4. Wrong tool attribution (saying tool X was used when Y ran)\n"
+        "  5. Missing critical info that IS in the history\n"
+        "\n"
+        "Output JSON ONLY:\n"
+        "  {\"verdict\": \"approve\" | \"revise\",\n"
+        "   \"issues\": [\"<one-line issue>\", ...]}\n"
+        "\n"
+        "If draft accurately reflects the history, verdict=\"approve\",\n"
+        "issues=[]. Otherwise verdict=\"revise\", issues=[ specific\n"
+        "actionable items the Compose Agent should fix ].\n"
+        "Be terse. NO prose preamble.\n"
+    )
+
+    async def _critic_via_cpu(
+        self,
+        user_text: str,
+        draft: str,
+        tool_history_json: str,
+        emitter: Optional[Callable[..., Awaitable[None]]],
+    ) -> dict:
+        """Critic pass over compose draft. Returns the parsed JSON
+        verdict; returns {} on any error (fail-open: skip revision,
+        ship the draft)."""
+        await self._emit(emitter, "🧑‍⚖️ critic")
+        user_msg = (
+            f"## OPERATOR ASK\n{user_text[:1500]}\n\n"
+            f"## STRUCTURED TOOL HISTORY (authoritative)\n"
+            f"{tool_history_json[:6000]}\n\n"
+            f"## DRAFT ANSWER (from Compose Agent)\n"
+            f"{draft[:4000]}\n\n"
+            "## VERDICT (JSON only):"
+        )
+        body = {
+            "model": self.valves.CRITIC_MODEL,
+            "messages": [
+                {"role": "system", "content": self._CRITIC_SYSTEM},
+                {"role": "user",   "content": user_msg},
+            ],
+            "options": {
+                "num_gpu": 0,
+                "num_predict": int(self.valves.CRITIC_MAX_TOKENS),
+                "temperature": 0.0,
+            },
+            "format": "json",
+            "keep_alive": -1,
+            "stream": False,
+        }
+        try:
+            timeout = aiohttp.ClientTimeout(total=int(self.valves.CRITIC_TIMEOUT_S))
+            async with aiohttp.ClientSession(timeout=timeout) as session:
+                async with session.post(
+                    self.valves.REFINE_ENDPOINT.rstrip("/") + "/api/chat",
+                    data=json.dumps(body).encode(),
+                    headers={"Content-Type": "application/json"},
+                ) as resp:
+                    if resp.status != 200:
+                        return {}
+                    data = await resp.json()
+        except (asyncio.TimeoutError, aiohttp.ClientError):
+            await self._emit(emitter, "🧑‍⚖️ ⚠ critic err → ship draft")
+            return {}
+        except Exception:
+            return {}
+        msg = (data.get("message") or {})
+        content = (msg.get("content") or "").strip()
+        if not content:
+            return {}
+        # Strip code fences a chatty model might add.
+        content = re.sub(r"^\s*```(?:json)?\s*\n?", "", content)
+        content = re.sub(r"\n?```\s*$", "", content)
+        try:
+            parsed = json.loads(content)
+            if isinstance(parsed, dict):
+                return parsed
+        except json.JSONDecodeError:
+            pass
+        return {}
+
+    async def _recompose_with_critic_feedback(
+        self,
+        user_text: str,
+        raw_output: str,
+        tool_history_json: str,
+        prev_draft: str,
+        issues: list,
+        emitter: Optional[Callable[..., Awaitable[None]]],
+    ) -> str:
+        """Re-run compose with the critic's specific issue list fed
+        back in. The compose model sees:
+          * Original system prompt + structured history
+          * The previous draft
+          * The critic's list of issues to fix
+        Returns the revised answer; empty string on any failure (the
+        caller keeps the prev_draft in that case)."""
+        sys_content = self._POLISH_SYSTEM.format(
+            user_prompt=user_text[:2000],
+            raw_output=raw_output[:12000],
+        )
+        sys_content += (
+            "\n\n## STRUCTURED TOOL HISTORY (authoritative)\n"
+            f"{tool_history_json[:6000]}\n"
+            "\n## CRITIC FEEDBACK on your previous draft\n"
+            "Your previous draft had these issues -- FIX each one in\n"
+            "the revised answer. Use the structured history above to\n"
+            "ground every claim. Output the revised final answer only;\n"
+            "no preamble, no 'here is the revised version'.\n\n"
+        )
+        for i, issue in enumerate(issues, 1):
+            sys_content += f"  {i}. {str(issue)[:300]}\n"
+        sys_content += f"\n## PREVIOUS DRAFT\n{prev_draft[:6000]}\n"
+        body = {
+            "model": self.valves.POLISH_MODEL,
+            "messages": [
+                {"role": "system", "content": sys_content},
+                {"role": "user", "content": "Emit the revised final answer."},
+            ],
+            "options": {
+                "num_gpu": 0,
+                "num_predict": int(self.valves.POLISH_MAX_TOKENS),
+                "temperature": 0.0,
+            },
+            "keep_alive": -1,
+            "stream": False,
+        }
+        try:
+            timeout = aiohttp.ClientTimeout(total=int(self.valves.POLISH_TIMEOUT_S))
+            async with aiohttp.ClientSession(timeout=timeout) as session:
+                async with session.post(
+                    self.valves.REFINE_ENDPOINT.rstrip("/") + "/api/chat",
+                    data=json.dumps(body).encode(),
+                    headers={"Content-Type": "application/json"},
+                ) as resp:
+                    if resp.status != 200:
+                        return ""
+                    data = await resp.json()
+        except (asyncio.TimeoutError, aiohttp.ClientError):
+            return ""
+        except Exception:
+            return ""
+        msg = (data.get("message") or {})
+        return (msg.get("content") or msg.get("thinking") or "").strip()
+
     # Match a leading ```markdown / ``` fence. The closing ``` is
     # OPTIONAL because polish sometimes truncates mid-output (token
     # cap hit on a long table) -- in that case there's an open fence