
Commit 6707ed8

mios-dev and claude committed
phase-1 native multi-agent: structured Compose handoff from Hermes session
Operator directives 2026-05-18 stacked into one architectural shift:

* "this all seems HARDCODED!!! HOW WOULD THIS MULTI_AGENTIC_REASONING WORK NATIVELY!??? RESEARCH!"
* "make sure this is ALL ALSO OpenAI API COMPLIANT and COMPLETELY FUnctional on a Bootc Bootable OCI MiOS image"
* "kanban too!!"
* "and shared global scratpad(s) in mutable locations"
* "OWUI Knowledge/Memory enabled and compression/artifacting"

Research findings (full doc at usr/share/mios/docs/multi-agent-architecture.md, also auto-registered into the MiOS Documentation RAG collection at deploy time so the agent itself can cite it):

The 2026 multi-agent best practice is structured tool_call / tool_result handoff (Anthropic tool_use, AWS Strands, Agno reasoning agents, LangGraph supervisor pattern, CrewAI critic-actor loop). The current MiOS pipe did the opposite: it consumed Hermes's SSE TEXT stream and re-prompted polish with the same text blob. Polish then had to GUESS what happened, hence the regex tower (think/details/fence strip + KNOWN_AGENT_ERROR_RE + "NEVER report 'launched' unless..." ban lists).

Phase 1 (this commit) -- structured Compose:

* Pipe.HERMES_SESSIONS_DIR = /var/lib/mios/hermes/sessions/ (shared mutable scratch, image-immutable shim path).
* _load_session_tool_history finds the freshest session JSON whose mtime > the dispatch moment and returns the OpenAI-format messages slice (user + every message after).
* _render_tool_history_for_compose pairs assistant tool_calls with their matching role:tool results by tool_call_id, parses the MiOS-verb `success` JSON field, and renders a compact event list: [{role, tool_call_id, tool, arguments, success, result_preview}, ...]
* _polish_via_cpu takes a new `dispatch_ts` arg; when it is set, the function loads the structured history and APPENDS it to the polish system prompt under a "## STRUCTURED TOOL HISTORY" heading marked authoritative for success/fail reasoning. The Compose model is told to cite events by tool name + success state, and to say explicitly when a planned step did NOT run rather than fabricating.
* Pipe stamps _dispatch_ts = time.time() right before "🧠 → hermes" and threads it through to the polish call.

The text-blob path stays as fallback when no session JSON is loadable (older sessions, race conditions, etc.), so the chain degrades gracefully.

OpenAI compliance: kept the tool_call + role:tool message shape Hermes already records (standard Chat Completions tool_use schema). Any OpenAI-API-compatible model (Claude, GPT-*, local Ollama) can consume the structured input identically.

Bootc / Day-0:

* /usr/share/mios/owui/pipes/mios_agent_pipe.py: image-immutable
* /var/lib/mios/hermes/sessions/: created by mios-hermes-firstboot at first boot (already)
* /usr/share/mios/docs/multi-agent-architecture.md: image-immutable; registered into the MiOS Documentation OWUI knowledge collection at deploy time via mios-knowledge-add --replace
* No /etc/ writes required for the new path.

Acknowledged in the compose system prompt: kanban_* events reflect task state, memory_save / memory_search reflect durable context, skill_view / skill_manage reflect agent self-iteration, and knowledge_search hits reflect the OWUI RAG corpus. All of them surface as events in the structured history with the same {tool, success, result_preview} shape.

Phase 2-4 plan (in the research doc):

2. Explicit Critic Agent (iGPU micro-LLM qwen3:1.7b) reviews the answer against the structured tool history; bounded reflexion loop.
3. Refine emits JSON {intent, plan, success_criteria}, not the INTENT/TOOLS/DELEGATE/PLAN label prose.
4. Drop regex post-processors -- Compose reasons over structure, not text. _KNOWN_AGENT_ERROR_RE / _DETAILS_BLOCK_RE / polish ban lists become unnecessary.

Live verification: research doc auto-registered into the MiOS Documentation collection (id=8c721cc0...), attached to both mios-agent + mios_agent.mios-agent model rows. Pipe re-installed into OWUI db; OWUI restarted.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 848d50a commit 6707ed8

2 files changed

Lines changed: 345 additions & 14 deletions


Lines changed: 150 additions & 0 deletions
@@ -0,0 +1,150 @@
# MiOS multi-agent architecture: research + migration plan

> Operator directive 2026-05-18: "this all seems HARDCODED!!! HOW
> WOULD THIS MULTI_AGENTIC_REASONING WORK NATIVELY!??? RESEARCH!"

This doc captures the 2026 multi-agent patterns research and the
concrete migration plan to move MiOS off regex-driven post-processing
and onto native structured-handoff multi-agent reasoning.

## Current pipe (the thing being replaced)

```
user prompt
→ MiOS-Agent pipe (OWUI function)
→ CPU REFINE (qwen2.5-coder:7b) ← text-out LABELS (INTENT/TOOLS/DELEGATE/PLAN)
→ Hermes orchestrator (qwen3.5:4b) ← native tool_use, OpenAI tool_call schema
→ terminal / web_search / delegate_task / ...
→ CPU POLISH (qwen2.5-coder:7b) ← text-in TEXT BLOB of streamed hermes output
→ operator-facing markdown
```

The break: Hermes streams its raw OpenAI-format text deltas to the
pipe. The pipe wraps them in a `<details>` block and re-prompts polish
with the SAME text blob. Polish can't see which tool_call ran, what
its arguments were, whether it succeeded, or what its `tool_call_id`
was. So polish has to GUESS. That is why a regex post-processing layer
grew up around it:

* `_THINK_TAG_RE` / `_DETAILS_BLOCK_RE` / `_LEADING_THOUGHT_RE` to
  strip model leaks from polish output
* `_KNOWN_AGENT_ERROR_RE` to detect "the agent claimed X failed"
  and substitute a generic rewrite
* `_OUTER_FENCE_RE` to unwrap ```` ```markdown ... ``` ```` wrappers
* `_STRUCTURED_MD_RE` heuristic to skip polish on already-clean
  tabular output
* Polish system-prompt ban lists ("NEVER report 'launched' unless
  RAW OUTPUT contains tool_result success:true...")

Every one of these is downstream pollution from the text-blob
handoff. The model can't reason about structure it doesn't see.

## What 2026 multi-agent best practice looks like

Three patterns dominate (see Sources at bottom):

### Planner → Executor → Critic → Aggregator
[Strands](https://aws.amazon.com/blogs/machine-learning/multi-agent-collaboration-patterns-with-strands-agents-and-amazon-nova/),
[Agno](https://docs.agno.com/reasoning/reasoning-agents),
[LangGraph supervisor pattern](https://www.digitalapplied.com/blog/agent-architecture-patterns-taxonomy-2026).
Each agent has a single responsibility, and hand-offs between agents
follow a JSON schema.

### Actor + Critic reflection loop
Reflexion / Self-Refine. The actor generates, the critic scores the
output against explicit criteria and provides feedback, and the actor
revises. There is no regex post-processing: the critic is an LLM call
over structured input.
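
### Structured tool_result handoff
[Anthropic tool_use](https://platform.claude.com/docs/en/agents-and-tools/tool-use/overview)
and [OpenAI Responses API](https://platform.openai.com). Every tool
invocation becomes a structured record: a call id, the tool name, its
arguments, and its output, with Anthropic also carrying an `is_error`
flag. Roughly, each record looks like `{tool_call_id, name, arguments, output, success}`,
though exact field names differ per API.
Downstream agents read fields directly. No text-mangling.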
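
A small normalizer illustrates the field-level handoff. This is a minimal sketch and not part of MiOS. It assumes two shapes:

* Anthropic's `tool_result` content block, with `tool_use_id`, `content`, and an optional `is_error`.
* The OpenAI Chat Completions `role: "tool"` message, with `tool_call_id` and `content`.

Chat Completions has no error flag, so for that shape the sketch reads an optional boolean `success` from a JSON body. This mirrors how MiOS verbs report success, which is an assumption. `ToolEvent` and `normalize_tool_result` are hypothetical names.

```python
import json
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ToolEvent:
    """Hypothetical API-agnostic record for one tool result."""
    tool_call_id: str
    output: str
    success: Optional[bool]  # None = the result carried no success signal


def _content_to_text(content: Any) -> str:
    # Tool result content may be a plain string or a list of text blocks.
    if isinstance(content, str):
        return content
    if isinstance(content, list):
        return "".join(b.get("text", "") for b in content if isinstance(b, dict))
    return ""


def _success_from_json(text: str) -> Optional[bool]:
    # Assumption (MiOS-specific): the tool's JSON body carries a bool "success".
    try:
        body = json.loads(text)
    except json.JSONDecodeError:
        return None
    value = body.get("success") if isinstance(body, dict) else None
    return value if isinstance(value, bool) else None


def normalize_tool_result(msg: dict) -> ToolEvent:
    """Map an Anthropic tool_result block or an OpenAI tool message to ToolEvent."""
    if msg.get("type") == "tool_result":
        # Anthropic block: tool_use_id + content + optional is_error (default false).
        text = _content_to_text(msg.get("content"))
        return ToolEvent(msg["tool_use_id"], text, not msg.get("is_error", False))
    if msg.get("role") == "tool":
        # OpenAI Chat Completions tool message: tool_call_id + content, no error flag.
        text = _content_to_text(msg.get("content"))
        return ToolEvent(msg["tool_call_id"], text, _success_from_json(text))
    raise ValueError("not a tool result message")
```

When no signal exists, `success` stays `None`. Downstream agents can then tell "failed" apart from "unknown" instead of guessing.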

## The gap for MiOS

* Hermes ALREADY does native tool_use. The session JSON at
  `/var/lib/mios/hermes/sessions/session_*.json` has the structured
  records: `tool_calls: [{function:{name, arguments}, id}]` paired
  with `role: tool` messages containing `tool_call_id` + content +
  `success: true|false`.
* `delegate_task` ALREADY gives Hermes supervisor-worker fan-out.
* `skill_view` / `skill_manage` give the agent introspection +
  self-modification.

What's MISSING:

1. **Structured handoff from Hermes to the compose layer.** The
   pipe consumes Hermes's SSE text stream and loses every
   tool_call boundary. Polish operates on text, so it has to guess.
2. **No explicit Critic agent.** When something goes wrong, there is
   no structured "verdict + failed_steps" object, just the model's
   own self-flagellation in prose.
3. **Refine emits LABELS, not JSON.** INTENT/TOOLS/DELEGATE/PLAN
   are text fields; downstream agents string-parse them.

## Migration plan (incremental)

### Phase 1 (this commit): structured Compose input
* Pipe reads the active Hermes session JSON at end-of-stream.
* Extracts `[{tool_call_id, tool_name, arguments, output, success}]`.
* Passes the LIST plus the user prompt to compose (was: polish) as
  a structured JSON blob in the system prompt.
* Compose system prompt rewritten to reason over structured input
  (no regex; "step 1 used `mios-find` and returned success=true
  with output `<...>`; step 2 ran `web_extract` which returned
  success=false with error `<...>`; therefore the final answer
  reports that step 1 succeeded and step 2 failed").
* The text-blob path stays as fallback if the session JSON is
  unavailable; the structured path takes precedence.
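
The pairing and rendering steps above can be sketched as follows. This is a hypothetical illustration in the spirit of the `_render_tool_history_for_compose` helper named in the commit message, not its actual code. It assumes:

* OpenAI Chat Completions message shapes.
* A MiOS verb's `success` flag sits in the JSON body of the `role: tool` content.
* An arbitrary 200-character preview cap (`PREVIEW_CHARS`).

The event fields are a subset of those the commit lists. An empty event list leaves the prompt unchanged, which mirrors the text-blob fallback.

```python
import json
from typing import Any, Optional

PREVIEW_CHARS = 200  # hypothetical cap on result_preview length


def _parse_success(content: Any) -> Optional[bool]:
    # Assumption: MiOS verbs return a JSON body with a boolean "success" field.
    try:
        body = json.loads(content)
    except (json.JSONDecodeError, TypeError):
        return None
    if isinstance(body, dict) and isinstance(body.get("success"), bool):
        return body["success"]
    return None


def render_tool_history(messages: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Pair assistant tool_calls with role:tool results by tool_call_id."""
    results = {
        m["tool_call_id"]: m.get("content") or ""
        for m in messages
        if m.get("role") == "tool" and "tool_call_id" in m
    }
    events = []
    for m in messages:
        if m.get("role") != "assistant":
            continue
        for call in m.get("tool_calls") or []:
            content = results.get(call["id"])  # None: no result was recorded
            events.append({
                "tool_call_id": call["id"],
                "tool": call["function"]["name"],
                "arguments": call["function"].get("arguments", ""),
                "success": _parse_success(content) if content is not None else None,
                "result_preview": (content or "")[:PREVIEW_CHARS],
            })
    return events


def compose_system_prompt(base_prompt: str, events: list[dict[str, Any]]) -> str:
    """Append the structured history; return the prompt unchanged if there is none."""
    if not events:
        return base_prompt
    return (
        base_prompt
        + "\n\n## STRUCTURED TOOL HISTORY\n"
        + "Authoritative for success/fail reasoning.\n"
        + json.dumps(events, indent=2)
    )
```

A call with no matching `role: tool` message shows up with `success: None` and an empty preview. That is the "planned step did NOT run" case Compose is asked to report explicitly.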

### Phase 2: Critic Agent
* After Compose drafts a final answer, a Critic Agent (qwen3:1.7b on
  the iGPU, per the micro-LLM directive) scores it against the
  structured tool history: do the answer's success/fail claims match
  each tool_result's `success` field? Are all planned steps accounted
  for?
* If the verdict is "revise", the Critic returns specific
  `failed_assertions: [...]`; Compose revises.
* Loop bounded at 2 iterations (compose, critique, revise, done).
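
A minimal sketch of the bounded Critic loop. It reads "2 iterations" as one draft plus at most one revision (compose, critique, revise, done). `ComposeFn`, `CriticFn`, and the verdict dict shape (`{"verdict": "ok" | "revise", "failed_assertions": [...]}`) are assumptions made for illustration. In MiOS the two callables would wrap the Compose model and the qwen3:1.7b Critic.

```python
from typing import Callable

# Hypothetical signatures for the two model calls.
ComposeFn = Callable[[list, list], str]  # (tool events, critic feedback) -> answer
CriticFn = Callable[[str, list], dict]   # (answer, tool events) -> verdict dict

MAX_REVISIONS = 1  # compose, critique, revise, done


def compose_with_critic(events: list, compose: ComposeFn, critic: CriticFn) -> str:
    """Draft an answer, let the critic check it against the tool history,
    and revise at most MAX_REVISIONS times."""
    answer = compose(events, [])
    for _ in range(MAX_REVISIONS):
        verdict = critic(answer, events)
        if verdict.get("verdict") != "revise":
            break
        # Feed the critic's specific failed assertions back into Compose.
        answer = compose(events, verdict.get("failed_assertions", []))
    return answer
```

The revision is not re-critiqued. The worst case is therefore two Compose calls and one Critic call per turn, which keeps latency predictable.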

### Phase 3: Refine emits JSON
* Refine prompt rewritten to output JSON conforming to a schema:
  `{intent: str, plan: [{tool: str, args: dict, success_criteria: str}], delegate: bool}`.
* Compose reads `plan[].success_criteria` to evaluate each
  tool_result against the operator's actual intent (not just
  exit-code success).
* This enables the Critic to give targeted feedback: "step 2's
  success_criteria was 'returns a URL'; tool_result was the
  search-only error -- step did not meet criteria".
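
The pipe could consume the Phase 3 output by parsing and type-checking it with the standard library before anything downstream reads it. The `PlanStep`/`RefinePlan` dataclasses and `parse_refine_output` are hypothetical. A real deployment might instead use JSON Schema validation or a model's structured-output mode.

```python
import json
from dataclasses import dataclass, field


@dataclass
class PlanStep:
    tool: str
    args: dict
    success_criteria: str


@dataclass
class RefinePlan:
    intent: str
    plan: list[PlanStep] = field(default_factory=list)
    delegate: bool = False


def _require(obj: dict, key: str, typ: type):
    # Fetch obj[key] and insist on the expected JSON type.
    value = obj.get(key)
    if not isinstance(value, typ):
        raise ValueError(f"{key!r} must be {typ.__name__}")
    return value


def _parse_step(raw_step) -> PlanStep:
    if not isinstance(raw_step, dict):
        raise ValueError("each plan step must be a JSON object")
    return PlanStep(
        tool=_require(raw_step, "tool", str),
        args=_require(raw_step, "args", dict),
        success_criteria=_require(raw_step, "success_criteria", str),
    )


def parse_refine_output(raw: str) -> RefinePlan:
    """Parse Refine's reply; any schema mismatch raises ValueError."""
    data = json.loads(raw)  # json.JSONDecodeError subclasses ValueError
    if not isinstance(data, dict):
        raise ValueError("top level must be a JSON object")
    return RefinePlan(
        intent=_require(data, "intent", str),
        plan=[_parse_step(s) for s in _require(data, "plan", list)],
        delegate=_require(data, "delegate", bool),
    )
```

A malformed reply is rejected with a `ValueError` rather than repaired with regex. That gives the pipe a single, clear point to re-prompt Refine or fall back.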

### Phase 4: Drop regex post-processors
* `_KNOWN_AGENT_ERROR_RE`, `_DETAILS_BLOCK_RE`, the polish
  ban-list lines, and `_STRUCTURED_MD_RE` all become unnecessary
  because Compose reasons over structure, not text.
* `_strip_outer_md_fence` stays (it's a literal formatter fix,
  not behavioural).
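
For reference, here is a hypothetical reimplementation of a literal formatter like `_strip_outer_md_fence`; it is not the pipe's code. It only unwraps a fence tagged `markdown` or `md` that encloses the entire reply. Untagged or other-language fences are left alone, on the assumption that they are real code blocks.

```python
import re

# A fence tagged markdown/md that wraps the whole reply; surrounding
# whitespace is allowed. `{3} matches the three backticks of the fence.
_OUTER_MD_FENCE = re.compile(
    r"\A\s*`{3}(?:markdown|md)[ \t]*\n(.*)\n`{3}\s*\Z",
    re.DOTALL | re.IGNORECASE,
)


def strip_outer_md_fence(text: str) -> str:
    """Unwrap one outer markdown fence; return the text unchanged otherwise."""
    match = _OUTER_MD_FENCE.match(text)
    return match.group(1) if match else text
```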

## Phase 1 implementation note (shipped alongside this doc)

The Pipe class in `mios_agent_pipe.py` gains `_load_session_tool_history`,
which finds the session JSON matching the current chat by
mtime proximity. The compose call is gated: if structured input
loaded, it uses the new compose-from-structure prompt; otherwise it
falls back to the legacy text-blob polish (so older sessions and
failed session loads degrade gracefully).
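
Here is a minimal sketch of the loader, following the commit message's description:

1. Take the freshest `session_*.json` whose mtime is later than the dispatch timestamp.
2. Return the OpenAI-format messages from the user turn onward.

Two details are assumptions about the Hermes session layout: the top-level `messages` key, and starting the slice at the last `role: user` message. `load_session_tool_history` is a hypothetical stand-in for the pipe's `_load_session_tool_history`. In this sketch, returning `None` is the signal to take the legacy text-blob path.

```python
import json
from pathlib import Path
from typing import Optional

HERMES_SESSIONS_DIR = Path("/var/lib/mios/hermes/sessions")


def load_session_tool_history(
    dispatch_ts: float, sessions_dir: Path = HERMES_SESSIONS_DIR
) -> Optional[list[dict]]:
    """Return messages from the dispatching user turn onward, or None."""
    try:
        # Stat each file once so a file vanishing mid-scan can't raise later.
        stamped = [(p.stat().st_mtime, p) for p in sessions_dir.glob("session_*.json")]
    except OSError:
        return None
    fresh = [(mtime, p) for mtime, p in stamped if mtime > dispatch_ts]
    if not fresh:
        return None  # nothing written since dispatch
    _, newest = max(fresh, key=lambda pair: pair[0])
    try:
        data = json.loads(newest.read_text(encoding="utf-8"))
    except (OSError, json.JSONDecodeError):
        return None  # e.g. a partially written file
    messages = data.get("messages") if isinstance(data, dict) else None
    if not isinstance(messages, list):
        return None
    user_idxs = [i for i, m in enumerate(messages)
                 if isinstance(m, dict) and m.get("role") == "user"]
    return messages[user_idxs[-1]:] if user_idxs else None
```

In the pipe, `dispatch_ts` would be the `time.time()` value stamped just before the handoff to Hermes. A session file last written before that moment is therefore never mistaken for the current one.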

## Sources

- [Multi-Agent Orchestration: Pattern Language 2026](https://www.digitalapplied.com/blog/multi-agent-orchestration-patterns-producer-consumer), Digital Applied
- [Agent Architecture Patterns: 2026 Taxonomy Guide](https://www.digitalapplied.com/blog/agent-architecture-patterns-taxonomy-2026), Digital Applied
- [Multi-Agent System Patterns: A Unified Guide](https://medium.com/@mjgmario/multi-agent-system-patterns-a-unified-guide-to-designing-agentic-architectures-04bb31ab9c41), mjgmario / Medium
- [Multi-Agent collaboration patterns with Strands Agents and Amazon Nova](https://aws.amazon.com/blogs/machine-learning/multi-agent-collaboration-patterns-with-strands-agents-and-amazon-nova/), AWS
- [Hierarchical Planner AI Agent with Open-Source LLMs and Structured Multi-Agent Reasoning](https://www.marktechpost.com/2026/02/27/a-coding-implementation-to-build-a-hierarchical-planner-ai-agent-using-open-source-llms-with-tool-execution-and-structured-multi-agent-reasoning/), MarkTechPost
- [Reasoning Agents](https://docs.agno.com/reasoning/reasoning-agents), Agno docs
- [Tool use with Claude](https://platform.claude.com/docs/en/agents-and-tools/tool-use/overview), Claude API Docs
- [Building Effective AI Agents](https://resources.anthropic.com/building-effective-ai-agents), Anthropic
- [Anthropic introduces "dreaming," learning across sessions](https://venturebeat.com/technology/anthropic-introduces-dreaming-a-system-that-lets-ai-agents-learn-from-their-own-mistakes), VentureBeat
- [AI Trends 2026: Test-Time Reasoning and the Rise of Reflective Agents](https://huggingface.co/blog/aufklarer/ai-trends-2026-test-time-reasoning-reflective-agen), HuggingFace
- [Customize agent workflows with Strands Agents](https://aws.amazon.com/blogs/machine-learning/customize-agent-workflows-with-advanced-orchestration-techniques-using-strands-agents/), AWS
- [LangGraph vs CrewAI vs AutoGen: Complete Multi-Agent Orchestration Guide for 2026](https://pockit.tools/blog/langgraph-crewai-autogen-multi-agent-orchestration-guide/), pockit.tools
