lines.push(` rules:`);
lines.push(` - { when: "task.type == 'classify'", use: "cerebras/llama-3.1-70b" }`);
lines.push(` - { when: "context.tokens > 200000", use: "google/gemini-2.5-pro" }`);
- lines.push(` - { when: "task.type == 'code'", use: "moonshot/kimi-k2.5" }`);
- lines.push(` - { when: "task.explicit_opt_in == 'sonnet'", use: "anthropic/claude-sonnet-4-5" }`);
+ lines.push(` - { when: "task.type == 'code'", use: "moonshot/kimi" }`);
+ lines.push(` - { when: "task.explicit_opt_in == 'sonnet'", use: "anthropic/claude-sonnet" }`);
lines.push(` - { else: true, use: "google/gemini-2.5-flash" }`);
lines.push(``);
}
diff --git a/part12-web-dashboard.md b/part12-web-dashboard.md
index 37b6385..a269d92 100644
--- a/part12-web-dashboard.md
+++ b/part12-web-dashboard.md
@@ -1,6 +1,6 @@
# Part 12: The Local Web Dashboard (Stop Editing YAML)
-*New in Hermes v0.9.0 (2026.4.13). The easiest way to run Hermes — a full browser-based control panel for everything you used to do in the terminal.*
+*Introduced in v0.9 and substantially upgraded through v0.12. The dashboard is now a browser-based control panel with the real Hermes TUI embedded in it, not just a YAML editor.*
---
@@ -8,16 +8,18 @@
Before v0.9, managing Hermes meant: edit `config.yaml`, export env vars, grep through logs, and use the CLI to inspect sessions. Great for power users. Terrible for anyone new.
-The new **web dashboard** (`hermes dashboard`) replaces all of that with a single browser UI:
+The **web dashboard** (`hermes dashboard`) replaces most of that with a single browser UI:
-- Live status of the gateway and all 16 platform adapters
+- Live status of the gateway and all built-in/plugin platform adapters
+- Browser Chat backed by the real `hermes --tui`
- Form-based editor for every config field (all 150+ of them, auto-discovered from `DEFAULT_CONFIG`)
+- Models tab for main + auxiliary model configuration
- API key manager for providers, tools, and platforms
- Full-text search across past sessions (FTS5)
- Log tailer with level/component filters
- Usage and cost analytics (daily token + cost breakdown, per-model)
- Cron job management
-- Skills and toolsets browser with enable/disable toggles
+- Skills, Curator, plugins, and toolsets browser with enable/disable toggles
Everything runs on `127.0.0.1` — no data leaves your machine.
@@ -33,13 +35,13 @@ That's it. It starts a local server and opens `http://127.0.0.1:9119` in your de
### Install the Dependencies (One Time)
-The dashboard uses FastAPI + Uvicorn + a React frontend:
+The dashboard uses FastAPI + Uvicorn + a React frontend. The Chat tab also needs PTY support:
```bash
-pip install hermes-agent[web]
+pip install 'hermes-agent[web,pty]'
```
-If you installed with `hermes-agent[all]`, you're already done. The frontend auto-builds on first launch if `npm` is available.
+If you installed with `hermes-agent[all]`, you're already done. The `web` extra brings FastAPI/Uvicorn; `pty` lets the Chat tab spawn `hermes --tui` behind a pseudo-terminal on Linux/macOS/WSL. The frontend auto-builds on first launch if `npm` is available.
### Options
@@ -48,6 +50,8 @@ If you installed with `hermes-agent[all]`, you're already done. The frontend aut
| `--port` | `9119` | Port to serve on |
| `--host` | `127.0.0.1` | Bind address |
| `--no-open` | — | Don't auto-open the browser |
+| `--insecure` | off | Permit non-localhost binding; dangerous without a proxy/auth |
+| `--tui` | off | Enable the in-browser Chat tab; also available via `HERMES_DASHBOARD_TUI=1` |
```bash
# Custom port
@@ -77,6 +81,18 @@ Live overview that auto-refreshes every 5 seconds:
This is the page you leave open on a second monitor.
+### Chat
+
+The Chat tab embeds the actual `hermes --tui` process through xterm.js. That matters: slash commands, approval prompts, clarify/sudo/secret prompts, skins, markdown streaming, tool-call cards, `/resume`, `/steer`, `/queue`, and TUI fixes appear here automatically because the dashboard is not maintaining a second chat implementation.
+
+Requirements:
+
+- Node.js for the Ink TUI bundle
+- `ptyprocess` via `pip install 'hermes-agent[pty]'`
+- POSIX PTY support: Linux, macOS, or WSL; native Windows Python is not supported for the embedded PTY
+
+Tip: launch from the Sessions page with the play icon to resume a past session directly into `/chat?resume=`.
+
### Config
Form-based editor for `config.yaml`. Fields are auto-discovered from `DEFAULT_CONFIG` and grouped into tabs:
@@ -88,6 +104,8 @@ Form-based editor for `config.yaml`. Fields are auto-discovered from `DEFAULT_CO
- **delegation** — subagent limits, reasoning effort
- **memory** — provider, context injection settings
- **approvals** — dangerous command mode (`ask` / `yolo` / `deny`)
+- **plugins** — enabled/disabled plugin allowlists
+- **curator** — schedule, pruning thresholds, pinned/archived behavior
Dropdowns for known-value fields (terminal backend, skin, approval mode). Toggles for booleans. Text inputs for everything else.
@@ -145,6 +163,17 @@ Usage and cost, computed from session history. Pick a time window (7 / 30 / 90 d
If you're on the Nous Portal Tool Gateway (Part 13), gateway tool usage shows up here too.
+### Models
+
+Use this page before you edit routing YAML by hand. It exposes:
+
+- Main model/provider selection
+- Auxiliary models for compression, vision, title generation, session search, and curator
+- Remote OpenRouter/Nous picker data when available
+- Per-model usage analytics so "cheap default, expensive opt-in" stays honest
+
+This is the fastest way to stop wasting your best model on background summaries.
+
### Cron
Create and manage scheduled agent prompts.
@@ -166,6 +195,23 @@ Browse, search, and toggle every skill and toolset.
- **Toggle** — enable/disable individual skills per session
- **Toolsets** — separate section showing built-in toolsets (file, web, browser), with active/inactive state, setup requirements, and the list of tools each one provides
+### Plugins
+
+Plugins ship disabled. Use the dashboard to review what was discovered from bundled, user, project, pip, and Nix sources before enabling anything with hooks/tools.
+
+Good first enables:
+
+- `observability/langfuse` — trace LLM/tool calls to Langfuse
+- `spotify` — native playback/queue/search tools
+- `google_meet` — join, transcribe, speak, and follow up on Meet calls
+- `hermes-achievements` — dashboard achievements from real session history
+
+Project-local plugins under `.hermes/plugins/` should stay disabled unless you trust the repository.
+
+### Curator
+
+v0.12 adds Curator controls for skill-library hygiene: run dry-runs, inspect proposed archives/merges, pin important skills, and review archived skills before restoring or deleting. See [Part 5](./part5-creating-skills.md#curator-v012-keep-the-skill-library-from-rotting) and [Part 22](./part22-latest-power-moves.md#1-turn-on-curator-before-your-skill-library-becomes-noise).
+
---
## `/reload` — Pick Up `.env` Changes Live
diff --git a/part13-tool-gateway.md b/part13-tool-gateway.md
index d24ec1f..b4dbab2 100644
--- a/part13-tool-gateway.md
+++ b/part13-tool-gateway.md
@@ -1,6 +1,6 @@
# Part 13: The Nous Tool Gateway (One Subscription, Four Tools, Zero Extra Keys)
-*New in Hermes v0.10.0 (2026.4.16). If you have a paid Nous Portal subscription, you already have web search, image generation, text-to-speech, and browser automation — you just haven't turned them on yet.*
+*If you have a paid Nous Portal subscription, you already have web search, image generation, text-to-speech, and browser automation — you just haven't turned them on yet.*
---
diff --git a/part14-fast-mode-watchers.md b/part14-fast-mode-watchers.md
index 909b098..f69fd0c 100644
--- a/part14-fast-mode-watchers.md
+++ b/part14-fast-mode-watchers.md
@@ -1,6 +1,6 @@
# Part 14: Fast Mode & Background Watchers
-*New in Hermes v0.9.0 (2026.4.13). Two small features with outsized impact: priority-tier inference on OpenAI and Anthropic, and real-time pattern matching on background process output.*
+*Priority-tier inference, live background-process events, and the newer TUI controls that keep long sessions steerable instead of stuck.*
---
@@ -10,7 +10,7 @@
Both OpenAI and Anthropic run **priority processing queues** for latency-sensitive traffic. Higher cost per token, but dramatically lower p50 and p99 latency — especially under load on reasoning models.
-`/fast` toggles that priority tier per session. On supported models (GPT-5.4, Codex, Claude Opus 4.6, Claude Sonnet 4), flipping it on injects `service_tier: "priority"` into every outgoing request.
+`/fast` toggles that priority tier per session. On supported OpenAI/Codex and Anthropic models, flipping it on injects `service_tier: "priority"` into outgoing requests.
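
A minimal sketch of what that injection could look like, assuming the outgoing request is a plain dict. The helper name `apply_fast_mode` and the `supported` flag are hypothetical, not Hermes internals; only the `service_tier: "priority"` field comes from the text above.

```python
from typing import Any


def apply_fast_mode(payload: dict[str, Any], fast: bool, supported: bool) -> dict[str, Any]:
    """Return a copy of the request payload, adding the priority tier when enabled.

    Hypothetical helper: `fast` mirrors the per-session /fast toggle and
    `supported` stands in for whatever model check Hermes actually performs.
    """
    request = dict(payload)  # shallow copy so the caller's payload is untouched
    if fast and supported:
        request["service_tier"] = "priority"
    return request


if __name__ == "__main__":
    base = {"model": "example-model", "messages": []}
    print(apply_fast_mode(base, fast=True, supported=True))
    print(apply_fast_mode(base, fast=True, supported=False))
```

Returning a copy keeps the toggle per request: turning `/fast` off simply stops adding the field, with nothing to undo on the original payload.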
### When to Use It
@@ -65,6 +65,29 @@ Priority tier is more expensive per token. Watch the **Analytics** tab in the da
---
+## `/steer`, `/queue`, and Background Turns
+
+The newer TUI makes long-running work much easier to control:
+
+| Command | Use it when | Pattern |
+|---------|-------------|---------|
+| `/steer ` | The agent is mid-run but drifting | "Continue, but don't edit generated files" |
+| `/queue ` | You want the next task to start after the current one | "After tests pass, summarize the risk" |
+| `/background ` | Fire off work without blocking the main chat | "Research alternatives while I keep coding" |
+| `/busy` | You want to inspect what Hermes is doing | Check active runs/subagents |
+| `/indicator` | The spinner/activity feed is too loud or too quiet | Toggle busy indicator style |
+
+Best practice:
+
+1. Use `/steer` for **constraints**, not brand-new goals.
+2. Use `/queue` for dependent follow-ups.
+3. Use `/background` for independent research or monitoring.
+4. If the run touches files, keep follow-up prompts specific enough that Hermes can avoid clobbering its own edits.
+
+This is the practical replacement for repeatedly interrupting and restating the whole task.
+
+---
+
## Background Process Monitoring (`watch_patterns`)
### The Problem This Fixes
diff --git a/part15-new-platforms.md b/part15-new-platforms.md
index 18d96c6..634c6a8 100644
--- a/part15-new-platforms.md
+++ b/part15-new-platforms.md
@@ -1,12 +1,12 @@
-# Part 15: New Messaging Platforms (iMessage, WeChat, Android)
+# Part 15: Messaging Platforms (iMessage, WeChat, QQBot, Yuanbao, Teams, Android)
-*Hermes v0.9.0 (2026.4.13) — the "everywhere" release. Three new surfaces that dramatically expand where Hermes can run and who can talk to it.*
+*Hermes' gateway is now a plugin host. v0.9 made Hermes "everywhere"; v0.11/v0.12 added QQBot and Tencent Yuanbao, plus Microsoft Teams as the first plugin-shipped platform.*
---
-## The 16-Platform Lineup
+## The 18+ Platform Lineup
-As of v0.9, the gateway ships adapters for:
+As of v0.12, the gateway ships built-in adapters plus plugin-shipped platforms:
| Platform | Mode | Notes |
|----------|------|-------|
@@ -17,6 +17,9 @@ As of v0.9, the gateway ships adapters for:
| **iMessage (BlueBubbles)** | Webhook | **New in v0.9** |
| **Weixin (WeChat personal)** | Long-poll | **New in v0.9** |
| **WeCom (Enterprise WeChat)** | Webhook | **New in v0.9** |
+| **QQBot** | WebSocket/Webhook | Added after the original v0.9 platform sweep |
+| **Tencent Yuanbao** | Native gateway | **New in v0.12**, text + media delivery |
+| **Microsoft Teams** | Plugin | **New in v0.12**, first plugin-shipped gateway platform |
| Signal | REST via signal-cli | Self-hosted bridge |
| DingTalk | Webhook | Corporate IM, China/APAC |
| Feishu / Lark | Webhook | Corporate IM, ByteDance |
@@ -33,8 +36,31 @@ All of them respect:
- Tool Gateway routing (Part 13)
- Cron delivery targets
- The shared session database (Part 7)
+- Pre-dispatch plugin hooks
-This part covers the three brand-new adapters plus **Android / Termux** — running the agent itself on a phone.
+This part covers the v0.9 adapters, the newer v0.12 surfaces, and **Android / Termux** — running the agent itself on a phone.
+
+## 2026 Update: QQBot, Yuanbao, and Teams
+
+### QQBot
+
+Use QQBot when your community already lives in QQ and you want the same approval/session model as Telegram or Discord. Treat QQ groups as untrusted input by default: keep allowlists tight, require approval for filesystem/network tools, and use [Part 19](./part19-security-playbook.md) for prompt-injection hardening.
+
+### Tencent Yuanbao
+
+Yuanbao is now a native gateway adapter with text and media delivery. It belongs in the same bucket as Weixin/WeCom: powerful in China/APAC workflows, but operationally different from Western SaaS bots. Verify media size limits and identity mapping before using it for production approvals.
+
+### Microsoft Teams Plugin
+
+Teams proves the v0.12 gateway-plugin architecture: new platforms no longer need to land inside `gateway/platforms/` to be usable. Enable only trusted platform plugins:
+
+```bash
+hermes plugins list
+hermes plugins enable teams
+hermes gateway setup
+```
+
+Keep project-local plugins disabled unless the repository is trusted (`HERMES_ENABLE_PROJECT_PLUGINS=true` is intentionally opt-in).
---
diff --git a/part16-backup-debug.md b/part16-backup-debug.md
index dd6692f..b0db9e2 100644
--- a/part16-backup-debug.md
+++ b/part16-backup-debug.md
@@ -1,6 +1,6 @@
# Part 16: Backup, Import, and `/debug` — Your Recovery Kit
-*New in Hermes v0.9.0 and v0.10.0. Two long-missing features finally shipped: first-class backup/import of your whole Hermes install, and a built-in diagnostic bundler you can share in bug reports.*
+*First-class backup/import, debug bundles, update preflights, and the hardening details you need before you let Hermes run unattended.*
---
@@ -138,12 +138,12 @@ sessions.db
### The New Diagnostic Flow
-When something goes weird, the old flow was: grep through `~/.hermes/logs/`, paste 800 lines into a GitHub issue, hope you got the right ones. The v0.10 flow is:
+When something goes weird, the old flow was: grep through `~/.hermes/logs/`, paste 800 lines into a GitHub issue, hope you got the right ones. The modern flow is:
```text
You → /debug
Collecting diagnostics…
- ✓ Agent version: v0.10.0 (v2026.4.16)
+ ✓ Agent version: v0.12.0 (v2026.4.30)
✓ Platform: Linux 6.8.0 / Python 3.12.3
✓ Gateway: running (3 adapters connected)
✓ Last 200 lines of agent.log
@@ -225,10 +225,39 @@ Preserves detail relevant to the topic and aggressively compresses everything el
---
-## Security Hardening (v0.9 + v0.10 Notes)
+## Security Hardening Notes
A handful of hardening changes landed in the "everywhere" + "gateway" releases worth calling out explicitly:
+### v0.12 hardline blocklist
+
+Hermes now has a hardline blocklist for commands whose damage cannot be undone, so a casual approval prompt cannot wave them through. Keep your own denylist too, and do not rely on "the model will know this is dangerous" for commands that delete home directories, scrape credentials, or hit metadata services.
+
+Useful custom denylist additions:
+
+```yaml
+security:
+ approval:
+ denylist:
+ - 'rm\s+-rf\s+(/|~|\$HOME)'
+ - 'curl\s+.+\|\s*(sh|bash)'
+ - '169\.254\.169\.254'
+ - 'cat\s+~?/?\.?ssh/'
+ - 'aws\s+s3\s+sync\s+.+\s+s3://'
+ - 'ssh-keyscan'
+```
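
Before trusting patterns like these, it can help to run them against sample commands. The sketch below assumes the entries are regular expressions applied as an unanchored search; how Hermes actually evaluates the denylist is not shown here, and `first_match` is a hypothetical helper.

```python
import re
from typing import Optional, Sequence

# Patterns copied from the denylist above.
DENYLIST = (
    r"rm\s+-rf\s+(/|~|\$HOME)",
    r"curl\s+.+\|\s*(sh|bash)",
    r"169\.254\.169\.254",
    r"cat\s+~?/?\.?ssh/",
    r"aws\s+s3\s+sync\s+.+\s+s3://",
    r"ssh-keyscan",
)


def first_match(command: str, patterns: Sequence[str] = DENYLIST) -> Optional[str]:
    """Return the first pattern found anywhere in the command, else None.

    Assumption: unanchored matching via re.search; Hermes may match differently.
    """
    for pattern in patterns:
        if re.search(pattern, command):
            return pattern
    return None


if __name__ == "__main__":
    samples = [
        "rm -rf ~/projects",
        "curl https://example.com/install.sh | bash",
        "cat ~/.ssh/id_rsa",
        "rm -rf ./build",
        "ls -la",
    ]
    for cmd in samples:
        print(f"{cmd!r}: {first_match(cmd) or 'allowed'}")
```

Under this reading, the first pattern also matches any absolute path (for example `rm -rf /tmp/cache`), because nothing anchors what follows the `/`. Tighten it if that is broader than you intend.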
+
+### `hermes update --check` before upgrades
+
+Before a major upgrade:
+
+```bash
+hermes update --check
+hermes backup
+```
+
+The preflight catches obvious incompatibilities and the backup gives you a rollback point for `HERMES_HOME`.
+
### Webhook secrets validated on startup
Every webhook-based adapter (Telegram, BlueBubbles, WeCom, Feishu, WeChat, generic Webhook) now validates its signing secret at gateway startup. A missing/empty/weak secret produces a startup error instead of silently accepting forged requests.
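
A startup check of this kind might look like the sketch below. It is illustrative only: the function name, the 16-character minimum, and the use of `ValueError` are assumptions, since the criteria Hermes uses for a "weak" secret are not stated here.

```python
from typing import Optional


def validate_webhook_secret(adapter: str, secret: Optional[str], min_length: int = 16) -> None:
    """Raise ValueError if an adapter's signing secret is missing, empty, or short.

    Hypothetical check: `min_length` is an illustrative threshold, not Hermes' rule.
    """
    if secret is None or not secret.strip():
        raise ValueError(f"{adapter}: webhook secret is missing or empty")
    if len(secret.strip()) < min_length:
        raise ValueError(f"{adapter}: webhook secret is shorter than {min_length} characters")


if __name__ == "__main__":
    for name, value in [("telegram", "x" * 32), ("wecom", ""), ("feishu", "short")]:
        try:
            validate_webhook_secret(name, value)
            print(f"{name}: ok")
        except ValueError as err:
            print(f"startup error: {err}")
```

Failing at startup rather than on the first request is the point: a misconfigured adapter never begins accepting traffic.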
@@ -280,4 +309,4 @@ You've now seen the full April 2026 feature surface:
- [Part 14 — Fast Mode & Background Watchers](./part14-fast-mode-watchers.md)
- [Part 15 — New Platforms (iMessage, WeChat, Android)](./part15-new-platforms.md)
-If you installed fresh on v0.10.0 and walked through [Part 1](./part1-setup.md) and this series, you're running the most capable Hermes configuration to date.
+If you installed fresh on v0.12.0 and walked through [Part 1](./part1-setup.md) and this series, you're running the most capable Hermes configuration to date.
diff --git a/part17-mcp-servers.md b/part17-mcp-servers.md
index 3e9d289..9501fee 100644
--- a/part17-mcp-servers.md
+++ b/part17-mcp-servers.md
@@ -107,13 +107,15 @@ Without a `tools_allowlist`, every tool the server exposes is available.
These are the ones that pay for themselves within a day:
+> **2026 reality check:** MCP is also a supply-chain boundary. Prefer official servers, pin package versions, restrict filesystem roots, and keep `allow_sampling: false` unless the server genuinely needs to call an LLM.
+
| Server | What it adds | Why you want it |
|--------|--------------|-----------------|
| **@modelcontextprotocol/server-github** | Issues, PRs, repo search, branch diffs | Hermes becomes a code-aware teammate |
| **@modelcontextprotocol/server-filesystem** | Scoped file reads/writes/search | Safer than giving terminal access |
| **@modelcontextprotocol/server-postgres** | Read-only SQL | Answer "what's in the db?" without exposing DSN |
| **@modelcontextprotocol/server-sqlite** | Local SQLite analysis | Great for log files, analytics snapshots |
-| **@modelcontextprotocol/server-puppeteer** | Browser automation | Complement to the Tool Gateway's Browser Use |
+| **@modelcontextprotocol/server-puppeteer** | Browser automation | Complement to the Tool Gateway's Browser Use; sandbox it tightly |
| **@modelcontextprotocol/server-memory** | Knowledge-graph memory | Pairs with [Part 3 LightRAG](./part3-lightrag-setup.md) for redundancy |
| **mcp.mem0.ai** | Hosted long-term memory | Cross-device memory across Hermes + Claude Code |
| **Cloudflare Observability MCP** | Query your Worker logs/analytics | If you run anything on Cloudflare |
@@ -124,7 +126,7 @@ These are the ones that pay for themselves within a day:
| **@browserbase/mcp** | Headless browser-as-a-service | Scraping sites Firecrawl can't handle |
| **@chroma-core/chroma-mcp** | ChromaDB vectors | Works alongside LightRAG |
-For the full catalog, see [modelcontextprotocol.io/servers](https://modelcontextprotocol.io/servers) and the `awesome-mcp-servers` list on GitHub.
+For the full catalog, see the [MCP Registry](https://registry.modelcontextprotocol.io/) and the `awesome-mcp-servers` list on GitHub.
---
diff --git a/part18-coding-agents.md b/part18-coding-agents.md
index b7ad961..812a97d 100644
--- a/part18-coding-agents.md
+++ b/part18-coding-agents.md
@@ -12,7 +12,7 @@ Hermes is excellent at reasoning, memory, conversation, and workflow. It is *not
|-------|-----------|------------|
| **Claude Code** | Strongest at large refactors, test writing, PR reviews | Pro/Max OAuth or `ANTHROPIC_API_KEY` |
| **Codex** (OpenAI) | Fast feedback loop, great at bug hunts, small edits | OAuth via `openai` CLI or `OPENAI_API_KEY` |
-| **Gemini CLI** | 1M context — unbeatable for "read the whole repo" tasks | OAuth via `gemini auth` (free tier generous) |
+| **Gemini CLI** | 1M context — unbeatable for "read the whole repo" tasks | OAuth via `gemini auth`; Hermes' own Gemini OAuth covers normal model-provider use |
| **OpenCode** (anomalyco) | Open-source, routes to GLM/Kimi/MiMo cheaply | Bring any provider key |
| **Aider** | Surgical git-based edits, smallest token footprint | Bring any provider key |
@@ -33,7 +33,7 @@ codex auth login
# Gemini CLI
npm install -g @google/gemini-cli
-gemini auth # Free tier: 1500 req/day
+gemini auth # Only needed when delegating to Gemini CLI itself
# OpenCode (Go variant preferred for Hermes)
curl -fsSL https://opencode.ai/install.sh | bash
@@ -101,7 +101,7 @@ Each specialist has a sweet spot. Let Hermes route:
| Bug reproduction + fix in a single file | Codex | Fast turnaround, cheaper per task |
| "Explain this codebase" | Gemini CLI | 1M context eats any repo whole |
| Bulk surgical edits with deterministic diffs | Aider | Smallest token footprint, git-native |
-| Anything on a budget | OpenCode + GLM 4.6 / Kimi K2 | One-tenth the cost of Claude for ~80% quality |
+| Anything on a budget | OpenCode + GLM / Kimi | Much cheaper than frontier models for routine edits |
A sensible `~/.hermes/config.yaml`:
@@ -119,7 +119,7 @@ delegation:
agent: gemini-cli
- match: { budget: low }
agent: opencode
- model: glm-5.1
+ model: zai/glm
```
---
@@ -128,7 +128,7 @@ delegation:
What you actually want on your phone: a Telegram topic named "Claude Code" where every message lands in a persistent Claude Code session. No re-explaining context. No re-spawning. Just chat with the coding agent directly, with Hermes handling the transport, memory, and voice-to-text.
-This is the feature request tracked in [#5394](https://github.com/NousResearch/hermes-agent/issues/5394) and already landing in bits across v0.9/v0.10. As of v0.10.0 the workflow is:
+This pattern is now practical because v0.11 added orchestrator-role subagents, spawn-depth controls, and file coordination between sibling workers. The workflow:
```bash
# In Telegram, create a topic, then from the CLI or dashboard:
@@ -138,6 +138,8 @@ hermes bind-thread --runtime claude-code --cwd ~/projects/myapp
From that point:
- Every message in the topic goes to a persistent Claude Code session
- File edits happen in `~/projects/myapp` on the Hermes host
+- Orchestrator subagents can spawn their own workers if `max_spawn_depth` allows it
+- Concurrent workers coordinate file state instead of blindly overwriting siblings
- `/unbind` in the topic detaches and reverts to normal Hermes chat
- `/runtime gemini-cli` swaps the runtime without losing the thread
diff --git a/part19-security-playbook.md b/part19-security-playbook.md
index 9c211a6..45533df 100644
--- a/part19-security-playbook.md
+++ b/part19-security-playbook.md
@@ -17,6 +17,8 @@ Hermes is uniquely exposed because it takes input from **many** surfaces and has
| GitHub MCP | PR titles, issue bodies, comments | Comment-and-Control pattern |
| Web-scraped content | Page HTML the agent reads | "Read then act" injections |
| Voice transcript | Whisper transcription | "Say the magic phrase" attacks |
+| MCP/plugin package | Tool schema, stdout, hook behavior | Supply-chain prompt injection / token burn |
+| Dashboard plugin | Browser UI + backend endpoints | Local secret/config exposure |
The goal isn't to eliminate these channels — Hermes is *for* reading them. The goal is to make sure untrusted text can't cross a trust boundary into secrets, writes, or shell.
@@ -82,6 +84,8 @@ security:
- "chmod -R 777 /"
- "curl * | sudo bash"
- ".*/etc/shadow"
+ - "169.254.169.254"
+ - "ssh-keyscan"
approval_channels: # Where the prompt shows up
- telegram_private # Your personal DM, not the group
- cli
@@ -102,6 +106,10 @@ security:
# DO NOT ADD: any subagent that reads Telegram, email, webhooks, or scraped web
```
+### v0.12 Hardline Blocks
+
+Hermes now has hardline command blocking for unrecoverable patterns. Treat it as the seatbelt, not the whole car: keep your own denylist, preserve private approval channels, and never route approvals back into the same untrusted group/chat that triggered the action.
+
---
## Layer 3: Secrets Isolation
diff --git a/part20-observability.md b/part20-observability.md
index 6e0e4cf..4127e0d 100644
--- a/part20-observability.md
+++ b/part20-observability.md
@@ -1,4 +1,4 @@
-# Part 20: Observability & Cost Control — Langfuse, Helicone, /usage, Routing Playbooks
+# Part 20: Observability & Cost Control — Langfuse Plugin, Helicone, /usage, Routing Playbooks
*You can't optimize what you can't see. Hermes tracks tokens, latency, and errors natively, but once you're running across CLI + Telegram + Discord + cron + coding-agent delegations, you want a real tracing stack. This part sets up Langfuse, Helicone, or OpenTelemetry → Phoenix with one config block, then gives you the cost-routing playbook that dropped our test deployment from $34 to $3 per feature implementation.*
@@ -70,7 +70,11 @@ hermes logs export --since 30d --format jsonl \
## Level 3 — Langfuse (Recommended Default)
-Langfuse is the "everything in one place" option: tracing, prompt management, evals, self-hostable. If you're not sure where to start, start here.
+Langfuse is the "everything in one place" option: tracing, prompt management, evals, self-hostable. If you're not sure where to start, start here. In v0.12, Langfuse also ships as a bundled observability plugin, so prefer enabling that over hand-rolled hooks.
+
+```bash
+hermes plugins enable observability/langfuse
+```
### Setup (Hosted Cloud)
@@ -176,36 +180,36 @@ Hermes emits `gen_ai.*` spans following the [OpenInference](https://github.com/A
### Rule 1: Route by Task Complexity, Not Default
-Most Hermes cost bloat comes from using Claude Opus / GPT-5 for tasks Kimi / GLM / MiniMax would handle identically. Set up a **task-aware default**:
+Most Hermes cost bloat comes from using your most expensive frontier model for tasks that Gemini Flash, Kimi/Moonshot, GLM, MiniMax, Cerebras, or a local model would handle just as well. Set up a **task-aware default**:
```yaml
model_routing:
default:
- model: claude-sonnet-4-20250514
+ model: claude-sonnet
provider: anthropic
routes:
- match: { intent: [classification, extraction, triage, sum_under_500_tokens] }
model: gemini-2.5-flash
- provider: openrouter
+ provider: google-gemini-cli
- match: { intent: long_context, tokens_gte: 150000 }
model: gemini-2.5-pro
provider: openrouter
- match: { intent: [write_code, refactor, debug], complexity: medium }
- model: glm-5.1
+ model: zai/glm
provider: zai
- match: { intent: [write_code, refactor, debug], complexity: high }
- model: claude-sonnet-4-20250514
+ model: claude-sonnet
provider: anthropic
- match: { intent: [reasoning, math], complexity: high }
- model: gpt-5.4
+ model: openai-reasoning
provider: openai
```
Hermes classifies intent via a tiny prompt (~100 tokens) and routes accordingly. Empirically:
-| Scenario | Naive default (Sonnet 4.5) | Routed | Savings |
+| Scenario | Naive frontier default | Routed | Savings |
|----------|----------------------------|--------|---------|
-| Feature implementation (100 calls) | ~$34 | ~$3 (mostly Kimi) | 91% |
+| Feature implementation (100 calls) | ~$34 | ~$3 (mostly Kimi/GLM) | 91% |
| Long-doc summarization (10 calls, 200K each) | ~$42 | ~$4 (Gemini 2.5 Pro) | 90% |
| Daily classification triage | ~$18/day | ~$1/day (Flash) | 94% |
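
One way to read the routing block above is as a first-match rule list. The sketch below encodes that reading for two of the routes. The matching semantics are assumptions rather than documented Hermes behavior: every key in `match` must hold, a list means "any of", and `tokens_gte` compares against the task's token count. The names `route_matches` and `pick_model` are hypothetical.

```python
from typing import Any

# Two routes and the default, transcribed from the YAML above.
ROUTES = [
    {
        "match": {"intent": ["classification", "extraction", "triage", "sum_under_500_tokens"]},
        "model": "gemini-2.5-flash",
    },
    {
        "match": {"intent": "long_context", "tokens_gte": 150000},
        "model": "gemini-2.5-pro",
    },
]
DEFAULT_MODEL = "claude-sonnet"


def route_matches(match: dict[str, Any], task: dict[str, Any]) -> bool:
    """Check every condition of one route against a task (assumed semantics)."""
    for key, expected in match.items():
        if key == "tokens_gte":
            if task.get("tokens", 0) < expected:
                return False
        elif isinstance(expected, list):
            if task.get(key) not in expected:
                return False
        elif task.get(key) != expected:
            return False
    return True


def pick_model(task: dict[str, Any]) -> str:
    """Return the model of the first matching route, else the default."""
    for route in ROUTES:
        if route_matches(route["match"], task):
            return route["model"]
    return DEFAULT_MODEL


if __name__ == "__main__":
    print(pick_model({"intent": "triage", "tokens": 300}))
    print(pick_model({"intent": "long_context", "tokens": 200000}))
    print(pick_model({"intent": "chat", "tokens": 50}))
```

With first-match semantics, route order matters: put narrow, cheap routes before broad ones so an expensive model is only reached when nothing cheaper applies.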
@@ -294,8 +298,8 @@ hermes evals dataset create telegram-support-flows
hermes evals dataset add telegram-support-flows ~/.hermes/traces/support/*.json
# Run on every release
-hermes evals run telegram-support-flows --model claude-sonnet-4-20250514
-hermes evals run telegram-support-flows --model glm-5.1 # Check if cheaper model still passes
+hermes evals run telegram-support-flows --model anthropic/claude-sonnet
+hermes evals run telegram-support-flows --model zai/glm # Check if cheaper model still passes
hermes evals compare
```
diff --git a/part21-remote-sandboxes.md b/part21-remote-sandboxes.md
index a7988c2..3687247 100644
--- a/part21-remote-sandboxes.md
+++ b/part21-remote-sandboxes.md
@@ -1,6 +1,6 @@
-# Part 21: Remote Sandboxes & Bulk File Sync — SSH, Modal, Daytona
+# Part 21: Remote Sandboxes & Bulk File Sync — SSH, Modal, Daytona, Vercel
-*Running Hermes on a $5 VPS is great for chat. Running heavy coding work there is not. This part sets up the "phone drives, beefy remote does the work" pattern: Hermes lives on your small VPS, delegates execution to a disposable sandbox on SSH/Modal/Daytona, syncs files both ways, and tears it down when idle. Ships in v0.9+ with the [bulk file sync](https://github.com/NousResearch/hermes-agent/pull/8018) hardening that landed April 17, 2026.*
+*Running Hermes on a $5 VPS is great for chat. Running heavy coding work there is not. This part sets up the "phone drives, beefy remote does the work" pattern: Hermes lives on your small VPS, delegates execution to a disposable sandbox on SSH/Modal/Daytona/Vercel, syncs files both ways, and tears it down when idle.*
---
@@ -30,11 +30,12 @@ Hermes uploads your workspace on task start, delegates work, then downloads only
| **SSH** | Your infra | Whatever your host costs | Homelab / always-on dev box |
| **Modal** | Per-second compute | $0 (hibernate) | Bursty coding tasks, GPU work |
| **Daytona** | Per-second workspace | $0 (hibernate) | Long-lived dev workspaces |
+| **Vercel Sandbox** | Per-run / platform billing | $0 when unused | Webapp builds and isolated `execute_code` tasks |
| **Fly Machines** | Per-second | $0 (stop) | Regional sandboxes near your users |
| **E2B** | Per-second | $0 | Quick throwaway Python sandboxes |
| **Local Docker** | Your hardware | N/A | Testing / development |
-Hermes ships native support for SSH, Modal, and Daytona as of v0.9+. Fly Machines and E2B work via a thin `remote_exec` plugin.
+Hermes ships native support for SSH, Modal, Daytona, and Vercel Sandbox. Fly Machines and E2B work via thin plugins.
---
@@ -89,7 +90,7 @@ Under the hood on teardown:
5. Applies only changed files back to `~/.hermes`, with `fcntl.flock` serialization if another sandbox runs concurrently
6. SIGINT-safe — pressing Ctrl-C during sync rolls back cleanly
-This is what PR [#8018](https://github.com/NousResearch/hermes-agent/pull/8018) (merged April 17) formalized. Before it, you either rsynced everything every time (slow) or lost remote-made edits on teardown.
+This is the hardening that made remote sandboxes safe enough for real coding work. Before diff-based sync-back, you either rsynced everything every time (slow) or lost remote-made edits on teardown.
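
The idea behind diff-based sync-back can be sketched with content hashes: snapshot the workspace before the run, snapshot it again at teardown, and copy back only what differs. This is an illustration under assumptions, not Hermes' implementation; SHA-256 manifests are a stand-in technique, and `manifest` and `changed_files` are hypothetical helpers.

```python
import hashlib
from pathlib import Path


def manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to a SHA-256 digest of its contents."""
    return {
        path.relative_to(root).as_posix(): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def changed_files(before: dict[str, str], after: dict[str, str]) -> list[str]:
    """Return paths that are new or whose digest differs; deletions are ignored here."""
    return sorted(path for path, digest in after.items() if before.get(path) != digest)


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "notes.md").write_text("draft")
        before = manifest(root)
        (root / "notes.md").write_text("edited remotely")
        (root / "new.py").write_text("print('hi')")
        print(changed_files(before, manifest(root)))
```

Comparing digests instead of timestamps avoids false positives when a remote tool rewrites a file with identical contents.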
---
@@ -164,7 +165,32 @@ sandboxes:
pull_on_command: "/sync-home" # Manual sync when you want it
```
-Pair with the [Gemini CLI OAuth provider](./part9-custom-models.md) (merged PR [#11270](https://github.com/NousResearch/hermes-agent/pull/11270), April 16) for free-tier Gemini use inside the sandbox — the 1500 req/day free tier covers most exploratory work.
+Pair with the [Gemini OAuth provider](./part9-custom-models.md#gemini-oauth--free-tier-friendly) for free-tier-friendly long-context reads inside the sandbox.
+
+---
+
+## Vercel Sandbox (Web Builds / Isolated Code Execution)
+
+Vercel Sandbox is now a native backend for `execute_code` and terminal-style runs. Use it when the task is webapp-shaped: install dependencies, run a build, inspect generated output, and throw the environment away.
+
+```yaml
+sandboxes:
+ vercel-web:
+ backend: vercel
+ project: my-webapp
+ timeout: 1800
+ sync:
+ push: ~/projects/my-webapp
+ pull_on_teardown: true
+ pull_paths:
+ - .
+ ignore:
+ - node_modules
+ - .next
+ - dist
+```
+
+It is not a replacement for Daytona if you want a persistent dev workspace. Treat it as a clean execution target for builds, tests, and short isolated scripts.
---
diff --git a/part22-latest-power-moves.md b/part22-latest-power-moves.md
new file mode 100644
index 0000000..d9c6210
--- /dev/null
+++ b/part22-latest-power-moves.md
@@ -0,0 +1,175 @@
+# Part 22: Latest Power Moves — Curator, TUI, Plugins, Context Files
+
+*If you already know Hermes but missed the v0.11/v0.12 wave, read this part first. These are the changes that most improve daily usage.*
+
+---
+
+## 1. Turn On Curator Before Your Skill Library Becomes Noise
+
+Agent-created skills are valuable until the library fills with duplicates, stale CLI flags, and one-off task notes. Curator is the v0.12 maintenance loop for that.
+
+```bash
+hermes curator run --dry-run
+hermes curator run
+hermes curator enable
+```
+
+Use it like this:
+
+- Pin production runbooks and skills you personally rely on.
+- Let Curator archive weak/duplicate agent-created skills.
+- Run a dry-run after upgrades or big workflow changes.
+- Restore archived skills instead of recreating them from memory.
+
+Curator should prune skills, not decide project policy. Put durable project rules in context files.
+
+---
+
+## 2. Use the TUI as Your Daily Driver
+
+`hermes --tui` is now the primary power-user interface. It is not just prettier output; it changes how you steer long runs.
+
+```bash
+hermes --tui
+```
+
+Habits that pay off:
+
+- Use `/steer` when the agent is mid-run but drifting.
+- Use `/queue` for dependent follow-ups.
+- Use `/background` for independent research or monitoring.
+- Use `/resume`, then delete stale sessions from the picker with `d`.
+- Use `/reload` after editing `.env`; avoid restarting the session just to pick up keys.
+- Toggle `/mouse` if your terminal/ConPTY injects phantom mouse events.
+
+If the dashboard Chat tab is enabled, it embeds the same TUI through a PTY, so improving your TUI workflow also improves the browser workflow.
+
+---
+
+## 3. Clean Up Context Files
+
+Hermes now reads common agent instruction files, including `.hermes.md`, `AGENTS.md`, `CLAUDE.md`, `SOUL.md`, and `.cursorrules`.
+
+Use them for different jobs:
+
+| File | Put this there | Avoid |
+|------|----------------|-------|
+| `.hermes.md` | Hermes-specific repo workflow, commands, approval expectations | Generic company policy |
+| `AGENTS.md` | Cross-agent coding instructions | Personal style/personality |
+| `SOUL.md` | Tone, boundaries, durable preferences | Build commands and API docs |
+| `.cursorrules` | Editor/Cursor compatibility | Secrets or credentials |
+
+Best pattern:
+
+1. Keep root instructions short.
+2. Add subdirectory-specific files only where behavior changes.
+3. Store secrets in `.env` or provider auth stores, never context files.
+4. Use skills for procedures, memory for facts, and context files for policy.
+
+---
+
+## 4. Use Plugins for Integrations, Not One-Off Scripts
+
+v0.12 made plugins the right abstraction for tools, hooks, slash commands, dashboard tabs, and gateway platforms.
+
+```bash
+hermes plugins list
+hermes plugins enable observability/langfuse
+hermes plugins enable spotify
+```
+
+Bundled plugins worth reviewing:
+
+| Plugin | Why enable it |
+|--------|---------------|
+| `observability/langfuse` | Trace LLM/tool calls without writing custom hooks |
+| `spotify` | Native playback, queue, search, playlists, devices |
+| `google_meet` | Join calls, transcribe, speak, and generate follow-ups |
+| `hermes-achievements` | Dashboard achievements from session history |
+| image-gen backends | Extra OpenAI/Codex/xAI image routes |
+
+Security posture:
+
+- Plugins are disabled by default; leave the ones you do not need disabled.
+- Enable only trusted bundled/user plugins.
+- Enable project-local plugins only for trusted repos.
+- Treat hooks as code execution, not "just configuration."
+
+---
+
+## 5. Split Main and Auxiliary Models
+
+The dashboard and `hermes model` now expose auxiliary model configuration. Use it.
+
+| Job | Good default |
+|-----|--------------|
+| Main agent | Your preferred coding/reasoning model |
+| Compression | Cheap fast model |
+| Vision | A model with actual image capability |
+| Session search | Cheap summarizer/search-capable model |
+| Title generation | Cheapest reliable model |
+| Curator | Cheap model with enough context for skill review |
+
+This avoids spending premium tokens on titles, compression, and housekeeping.
+
+---
+
+## 6. Chain Cron Jobs Instead of Repeating Context
+
+Cron is no longer just "run this prompt every morning." Use:
+
+- Per-job `workdir` for project-aware jobs.
+- Per-job `enabled_toolsets` to shrink tool/context overhead.
+- `context_from` to feed one job's output into the next.
+- Webhook direct delivery for zero-LLM notifications.
+
+Example pattern:
+
+```yaml
+cron:
+  jobs:
+    collect-build-status:
+      schedule: "*/30 * * * *"
+      workdir: ~/projects/app
+      enabled_toolsets: [terminal]
+      prompt: "Run the build status check and summarize failures only."
+    notify-build-status:
+      schedule: "*/30 * * * *"
+      context_from: collect-build-status
+      deliver: telegram_private
+      prompt: "Notify only if the upstream job found failures."
+```
+
+---
+
+## 7. Upgrade Checklist for Existing Installs
+
+Before moving an older v0.9/v0.10 setup to v0.12:
+
+```bash
+hermes update --check
+hermes backup
+hermes --version
+hermes doctor
+```
+
+Then:
+
+1. Open `hermes dashboard`.
+2. Configure main + auxiliary models.
+3. Enable only the plugins you actually need.
+4. Run `hermes curator run --dry-run`.
+5. Test one gateway message, one tool call, one skill, and one cron job.
+6. Review [Part 19](./part19-security-playbook.md) before enabling broad platform access.
+
+---
+
+## What to Ignore
+
+Some old advice is no longer worth optimizing around:
+
+- Do not install the external Gemini CLI just for Gemini auth; Hermes can do OAuth itself.
+- Do not fork the dashboard for a custom tab; write a dashboard plugin.
+- Do not keep a giant `SOUL.md` full of procedures; use skills and Curator.
+- Do not use one expensive default model for every auxiliary task.
+- Do not expose the dashboard publicly without a real reverse proxy and auth layer.
diff --git a/part4-telegram-setup.md b/part4-telegram-setup.md
index da8e49c..39090ad 100644
--- a/part4-telegram-setup.md
+++ b/part4-telegram-setup.md
@@ -6,7 +6,7 @@
## The 16-Platform Gateway
-As of v0.9.0 (April 2026), the Hermes gateway ships adapters for **16 platforms**. They all share the same session DB, the same `/fast` toggle, the same Tool Gateway plumbing, and the same cron delivery mechanism:
+As of v0.12.0 (April 2026), the Hermes gateway ships adapters/plugins for **18+ platforms**. They all share the same session DB, the same `/fast` toggle, the same Tool Gateway plumbing, and the same cron delivery mechanism:
| Flagship | New in v0.9 | Enterprise / regional | Self-hosted / generic |
|----------|-------------|-----------------------|-----------------------|
diff --git a/part5-creating-skills.md b/part5-creating-skills.md
index b3df145..ac16f04 100644
--- a/part5-creating-skills.md
+++ b/part5-creating-skills.md
@@ -114,6 +114,43 @@ Hermes patches the skill with new information using `skill_manage(action='patch'
---
+## Curator (v0.12): Keep the Skill Library From Rotting
+
+The old skill failure mode was predictable: after a month of "save that as a skill," `~/.hermes/skills/` filled with duplicates, stale commands, and one-off notes that should have been memory. Hermes v0.12 adds **Curator** to clean that up.
+
+Run it manually:
+
+```bash
+hermes curator run --dry-run
+hermes curator run
+```
+
+Or enable the default weekly schedule:
+
+```bash
+hermes curator enable
+hermes curator status
+```
+
+What Curator does:
+
+- **Scores skills** for freshness, usage, clarity, overlap, and safety.
+- **Merges duplicates** instead of letting near-identical workflows compete.
+- **Archives dead skills** without deleting them; restore them if Curator was too aggressive.
+- **Pins important skills** so core workflows survive pruning.
+- **Focuses on agent-created skills** first, not bundled/vendor skills.
+
+Good operating pattern:
+
+1. Pin your production runbooks and irreplaceable workflows.
+2. Run `hermes curator run --dry-run` after major upgrades.
+3. Let it archive one-off skills; facts belong in memory and project rules in context files, not in the skill library.
+4. Ask Hermes to update a skill immediately after a failed run; don't wait for Curator to infer the fix later.
+
+Curator is a librarian, not a teammate. It keeps the shelves useful; you still decide what knowledge is important.
+
+---
+
## Skill Structure
Every skill is a directory with a `SKILL.md` file:
diff --git a/part9-custom-models.md b/part9-custom-models.md
index 8ca0996..d043631 100644
--- a/part9-custom-models.md
+++ b/part9-custom-models.md
@@ -1,33 +1,38 @@
# Part 9: Custom Model Providers (Use Any Model You Want)
-*Hermes supports any OpenAI-compatible API, plus first-class native adapters for Nous Portal, xAI, Xiaomi MiMo, Kimi/Moonshot, z.ai/GLM, MiniMax, Arcee, Hugging Face, Cerebras, Groq, Fireworks, and Ollama. OAuth providers landing post-v0.10 add Gemini CLI (free tier: 1500 req/day), Qwen, and Claude Code Pro/Max. This is the up-to-date (April 17, 2026) cheat sheet.*
+*Hermes supports any OpenAI-compatible API, plus first-class native adapters for Nous Portal, Anthropic, OpenAI/Codex, OpenRouter, AWS Bedrock, Azure AI Foundry, Google Gemini, Gemini OAuth, LM Studio, xAI, Xiaomi MiMo, Kimi/Moonshot, z.ai/GLM, MiniMax, Arcee, GMI Cloud, Tencent TokenHub, Hugging Face, Cerebras, Groq, Fireworks, and Ollama. This is the April 30, 2026 cheat sheet.*
-> **What's new since v0.10.0** — [Gemini CLI OAuth inference provider](https://github.com/NousResearch/hermes-agent/pull/11270) (#11270), [Gemini TTS provider](https://github.com/NousResearch/hermes-agent/pull/10922), [multi-model FAL image gen](https://github.com/NousResearch/hermes-agent/pull/11265), [GLM 5.1 in OpenCode Go catalogs](https://github.com/NousResearch/hermes-agent/pull/11269), [Azure OpenAI GPT-5.x on chat/completions](https://github.com/NousResearch/hermes-agent/pull/10086), plus [TCP keepalives](https://github.com/NousResearch/hermes-agent/pull/11277) that detect dead provider connections before you notice the hang. All shipping on `main`, targeted for v0.11.
+> **What's new since the v0.10 guide refresh** — Gemini OAuth is now built into `hermes model` (no separate CLI install), AWS Bedrock uses the native Converse API, Azure AI Foundry auto-detects OpenAI vs Anthropic transports, LM Studio has `hermes doctor` checks and live `/models`, MiniMax OAuth uses PKCE, and OpenRouter/Nous model pickers update from a remote manifest instead of a hardcoded release snapshot.
---
## Native Adapters vs Generic OpenAI-Compatible
-As of v0.10.0 (April 2026), Hermes ships **native adapters** for a growing list of providers. Native adapters know about provider-specific features that a generic OpenAI-compatible wrapper can't:
+As of v0.12.0 (April 2026), Hermes ships **native adapters** for a large provider set. Native adapters know about provider-specific features that a generic OpenAI-compatible wrapper can't:
| Provider | Native adapter? | Notable feature |
|----------|-----------------|-----------------|
| **Nous Portal** | Yes | Auth via `hermes model` (no bare API key). Unlocks the [Tool Gateway](./part13-tool-gateway.md). |
| **Anthropic** | Yes | Native prompt caching, extended thinking, `/fast` priority tier |
| **OpenAI** | Yes | Native responses API, reasoning effort levels, `/fast` priority tier |
-| **xAI (Grok)** | **Yes, new in v0.10** | Native **live X/Twitter search** as a built-in tool |
-| **Xiaomi MiMo** | **Yes, new in v0.10** | Native reasoning modes (`low`/`medium`/`high`) exposed as config |
-| **Kimi / Moonshot** | Yes | 200K+ context, great for LightRAG entity extraction (see [Part 3](#part-3-lightrag--graph-rag-that-actually-works)) |
-| **z.ai / GLM** | Yes | **GLM 5.1** (added to OpenCode Go catalogs [#11269](https://github.com/NousResearch/hermes-agent/pull/11269)) — currently strongest open-weights model for tool use |
+| **OpenAI Codex OAuth** | Yes | ChatGPT/Codex login through `hermes model`, no API key |
+| **AWS Bedrock** | Yes | Converse API, IAM credentials, cross-region inference profiles, Bedrock Guardrails |
+| **Azure AI Foundry** | Yes | Auto-detects OpenAI-style vs Anthropic-style deployments and context length |
+| **LM Studio** | Yes | Local `/models` discovery, optional auth, reasoning transport, `hermes doctor` checks |
+| **xAI (Grok)** | Yes | Native live X search and xAI image/STT/TTS integrations |
+| **Xiaomi MiMo** | Yes | Native reasoning modes (`low`/`medium`/`high`) exposed as config |
+| **Kimi / Moonshot** | Yes | 200K+ context, great for LightRAG entity extraction (see [Part 3](./README.md#part-3-lightrag--graph-rag-that-actually-works)) |
+| **z.ai / GLM** | Yes | Strong open-weight tool-use models; good cheap fallback for planning/exploration |
| **Google Gemini (direct)** | Yes | 1M context; native prompt caching on Gemini 2.5 Pro |
-| **Google Gemini CLI (OAuth)** | **Yes, new post-v0.10** | OAuth via `gemini auth` — **1500 requests/day free tier**. [#11270](https://github.com/NousResearch/hermes-agent/pull/11270) |
-| **MiniMax** | Yes | M2.7 — balanced speed/quality; native streaming |
+| **Google Gemini (OAuth)** | Yes | Browser PKCE login via `hermes model`; free tier supported; no external `gemini` install |
+| **MiniMax** | Yes | API key or OAuth; native streaming and TTS |
+| **GMI Cloud** | Yes | Hosted open models behind a native provider |
+| **Tencent TokenHub** | Yes | Tencent model routing through TokenHub aliases |
| **Arcee** | Yes | AFM-4.5 function-calling specialist, cheap |
| **Cerebras** | Yes | 2000+ tok/s inference |
| **Groq** | Yes | Fast hosted Llama / Qwen |
-| **Qwen (OAuth)** | Yes | OAuth via portal-request flow, free-tier available |
| **Fireworks** | Yes | Qwen3-Embedding-8B (recommended for LightRAG) |
-| **Azure OpenAI** | Yes | GPT-5.x now via `/chat/completions` (was `/responses` only) [#10086](https://github.com/NousResearch/hermes-agent/pull/10086) |
+| **Vercel AI Gateway** | Yes | Dynamic model discovery, pricing metadata, attribution |
| **Hugging Face** | Yes | Any TGI / TEI endpoint (self-hosted or Inference Endpoints) |
| **OpenRouter** | Yes | Pass-through to 200+ models; respects native adapter quirks when downstream is one |
| **Ollama** (local) | Generic | OpenAI-compatible, zero auth |
@@ -35,30 +40,22 @@ As of v0.10.0 (April 2026), Hermes ships **native adapters** for a growing list
Pick the native adapter when one exists — you get the provider-specific features for free. Fall back to the generic OpenAI-compatible path only for endpoints that don't have a native adapter yet.
-### Flagship Model Cheat Sheet (April 17, 2026)
-
-For the "which model should I pick right now?" question, this is the current state of the world:
-
-| Model | Provider | Input / Output ($/MTok) | Context | Best for |
-|-------|----------|------------------------|---------|----------|
-| **Claude Sonnet 4.5** | Anthropic | $3 / $15 | 200K | Default for coding, refactor, multi-step reasoning |
-| **Claude Opus 4** | Anthropic | $15 / $75 | 200K | The hardest reasoning only; $15/MTok stings fast |
-| **Claude Mythos** (Cyber) | Anthropic | Invite-only | 200K | Security research — vulnerability discovery, malware triage |
-| **GPT-5.4** | OpenAI | $5 / $20 | 256K | Reasoning heavy-lift, agentic long chains |
-| **GPT-5.4-Cyber** | OpenAI | Trusted Access only | 256K | Defensive cybersec workflows, reverse engineering |
-| **GPT-5.4 Mini** | OpenAI | $0.60 / $4.80 | 256K | Cheap reasoning fallback |
-| **Gemini 2.5 Pro** | Google / OpenRouter | $1.25 / $10 | 1M | Long-context, whole-repo reads, research synthesis |
-| **Gemini 3 Flash Preview** | Google / OpenRouter | $0.50 / $3 | 1M | Fast agentic reasoning with 1M window |
-| **Gemini 2.5 Flash** | Google / OpenRouter | $0.30 / $2.50 | 1M | Classification, triage, bulk extraction |
-| **Kimi K2.5** | Moonshot | ~$0.15 / $2.50 | 200K | Best price/quality for coding in 2026 |
-| **GLM 5.1** | z.ai | ~$0.20 / $2 | 128K | Strongest open-weights tool use |
-| **xAI Grok 4** | xAI | $3 / $15 | 256K | Native live-X search; current-events questions |
-| **Xiaomi MiMo** | Xiaomi | $0.50 / $3 | 200K | Three-mode reasoning toggle (low/med/high) |
-| **MiniMax M2.7** | MiniMax | $10/mo flat | 256K | Flat-rate users doing bulk work |
-| **Cerebras Llama 3.3 70B** | Cerebras | $0.60 / $0.60 | 128K | 3000+ tok/s — interactive chat, fast classification |
-| **Local Nemotron 30B** | Ollama | Free | 128K | Privacy, offline, embedding, session search |
-
-> Prices are current per-provider retail as of April 17, 2026. Batch and prompt-caching discounts are not included — stack them via [Part 20](./part20-observability.md#rule-2-prompt-caching-is-free-money).
+### Provider Cheat Sheet (April 30, 2026)
+
+The exact "best model" moves weekly, so treat this as a routing posture rather than a leaderboard. Use `hermes model` for live picker data, then pin only what you need reproducible.
+
+| Need | Start here | Why |
+|------|------------|-----|
+| Default coding / refactors | Anthropic Sonnet or Codex OAuth | Best reliability for patch-heavy work; Codex OAuth avoids API-key churn |
+| Deep reasoning / high stakes | OpenAI reasoning or Anthropic Opus-class | Use explicitly; do not make it the default for cron/bulk tasks |
+| Long-context repo or document reads | Gemini Pro/Flash or OpenRouter equivalent | Huge window, cheap enough for map/reduce and summarization |
+| Cheap daily driver | Gemini OAuth + Kimi/Moonshot + z.ai/GLM | Good quality/cost mix, especially with auxiliary routing |
+| Enterprise / VPC / compliance | AWS Bedrock or Azure AI Foundry | IAM/Azure auth, guardrails, private deployments, audit controls |
+| Local/privacy/offline | LM Studio or Ollama | No cloud egress; great for extraction, embeddings, and drafts |
+| Ultra-fast interactive turns | Cerebras or Groq | Very high tokens/sec; useful for classification and short-form chat |
+| Current-events search | xAI Grok or tool-backed web search | Grok has native live-X search; Tool Gateway can cover broader web |
+
+> Pricing and context windows change too quickly to hardcode. Hermes now pulls OpenRouter and Nous Portal picker lists from a remote manifest, while provider APIs supply pricing/context metadata where available.
---
@@ -73,22 +70,51 @@ hermes model
If you're on a paid subscription, the setup also offers to enable the [Tool Gateway](./part13-tool-gateway.md) — web search, image gen, TTS, and browser automation through your subscription, no extra keys needed.
-### Gemini CLI OAuth — Free 1500 req/day
+### Gemini OAuth — Free-Tier Friendly
+
+If you have a Google account, skip the API key entirely and sign in from Hermes:
+
+```bash
+hermes model
+# Pick "Google Gemini (OAuth)" → complete the browser PKCE flow
+```
+
+Tokens are stored under `~/.hermes/auth/google_oauth.json` with 0600 permissions and automatic refresh. On headless SSH boxes, Hermes falls back to paste-mode auth.
+
+### AWS Bedrock and Azure AI Foundry — Enterprise Routing Without Proxy Glue
+
+Bedrock uses the native Converse API and the normal boto3 credential chain:
+
+```bash
+pip install 'hermes-agent[bedrock]'
+hermes model
+# Choose "AWS Bedrock" → region → model/profile
+```
+
+Use this when you want IAM roles, Bedrock Guardrails, and cross-region inference profiles instead of direct vendor API keys.
-If you have a Google account, skip the API key entirely and sign in with OAuth:
+Azure AI Foundry handles both endpoint styles:
```bash
-npm install -g @google/gemini-cli
-gemini auth
hermes model
-# Pick "Gemini CLI (OAuth)" — Hermes detects the logged-in session
+# Choose "Azure Foundry" → paste endpoint + key
```
-Hermes drives Gemini via the local CLI. You get 1500 requests/day on the free tier — plenty for exploration, classification, and Gemini's killer long-context reads. Merged in [#11270](https://github.com/NousResearch/hermes-agent/pull/11270) (April 16, 2026).
+Hermes probes the endpoint, detects OpenAI-style `/chat/completions` vs Anthropic-style `/messages`, discovers deployments when possible, and stores the right `api_mode` in `config.yaml`.
+
+### Remote Model Catalog: Stop Hardcoding This Week's Winner
+
+OpenRouter and Nous Portal model pickers now fetch:
+
+```text
+https://hermes-agent.nousresearch.com/docs/api/model-catalog.json
+```
-### Gemini TTS — 7th Voice Provider
+The cache lives at `~/.hermes/cache/model_catalog.json`. If the manifest is down, Hermes falls back to the disk cache or the bundled snapshot, so model selection still works offline.
-As of [#10922](https://github.com/NousResearch/hermes-agent/issues/10922) (merged April 16), Gemini joins Edge, ElevenLabs, OpenAI, MiniMax, Mistral, and NeuTTS as a TTS backend:
+### Gemini TTS
+
+Gemini is now one of the practical voice backends alongside Edge, ElevenLabs, OpenAI, MiniMax, Mistral, NeuTTS, and xAI:
```yaml
tts:
@@ -109,7 +135,7 @@ Models are configured in `~/.hermes/config.yaml`:
```yaml
# Default model
-model: claude-sonnet-4-20250514
+model: claude-sonnet
provider: anthropic
# Provider configurations
@@ -120,11 +146,23 @@ providers:
openai:
api_key: ${OPENAI_API_KEY}
- xai: # Native adapter (v0.10+)
+ bedrock:
+ region: us-east-2 # Auth via AWS_PROFILE, env vars, or instance role
+
+ azure-foundry:
+ api_key: ${AZURE_FOUNDRY_API_KEY}
+ base_url: ${AZURE_FOUNDRY_ENDPOINT}
+ api_mode: chat_completions # Or anthropic_messages; wizard auto-detects
+
+ lmstudio:
+ base_url: http://127.0.0.1:1234/v1
+ api_key: ${LM_API_KEY} # Optional if your LM Studio server requires auth
+
+ xai:
api_key: ${XAI_API_KEY}
live_search: true # Grok's live X/Twitter search
- xiaomi: # Native adapter (v0.10+)
+ xiaomi:
api_key: ${XIAOMI_API_KEY}
reasoning_mode: high # low / medium / high
@@ -137,6 +175,12 @@ providers:
minimax:
api_key: ${MINIMAX_API_KEY}
+ gmi:
+ api_key: ${GMI_API_KEY}
+
+ tencent-tokenhub:
+ api_key: ${TOKENHUB_API_KEY}
+
arcee:
api_key: ${ARCEE_API_KEY}
@@ -220,12 +264,12 @@ Use these as opinionated defaults, then tune with [Part 20's cost-routing playbo
| Task | First choice | Fallback (cheaper) | Fallback (fastest) |
|------|--------------|--------------------|--------------------|
-| Daily conversation | Claude Sonnet 4.5 | GLM 5.1 | Cerebras Llama 70B |
-| Coding delegation | Claude Code via Sonnet 4.5 | OpenCode + Kimi K2.5 | OpenCode + Cerebras |
+| Daily conversation | Anthropic Sonnet | Gemini OAuth or z.ai/GLM | Cerebras Llama/Qwen |
+| Coding delegation | Claude Code / Codex OAuth | OpenCode + Kimi/Moonshot | OpenCode + Cerebras |
| Long-context reads (>200K) | Gemini 2.5 Pro | Gemini 2.5 Flash | — |
| Classification / triage | Gemini 2.5 Flash | Cerebras Qwen3 32B | Arcee AFM-4.5 |
-| Reasoning (math, planning) | GPT-5.4 | Claude Opus 4 | GLM 5.1 |
-| Current events / live search | xAI Grok 4 | Gemini with grounding | — |
+| Reasoning (math, planning) | OpenAI reasoning model | Anthropic Opus-class | z.ai/GLM |
+| Current events / live search | xAI Grok | Gemini with grounding | Tool Gateway web search |
| Embeddings (LightRAG) | Qwen3-Embedding-8B (Fireworks) | nomic-embed-text (Ollama) | OpenAI `text-embedding-3-small` |
| TTS (Telegram voice) | OpenAI TTS via Tool Gateway | Gemini 2.5 Flash TTS | Edge TTS (free) |
| Vision | Gemini 2.5 Flash | GPT-4o | Claude Sonnet 4.5 |
diff --git a/skills/dev/release-notes/SKILL.md b/skills/dev/release-notes/SKILL.md
index 76bb61f..1393bee 100644
--- a/skills/dev/release-notes/SKILL.md
+++ b/skills/dev/release-notes/SKILL.md
@@ -58,7 +58,7 @@ Produce a release-notes document following the "What's New / Improvements / Fixe
## 🚀 What's New
- HTTP MCP servers now reconnect automatically with exponential backoff. ([#1234](…))
-- Gemini CLI OAuth is now a first-class provider. ([#1270](…))
+- Gemini OAuth is now a first-class provider. ([#1270](…))
## ⚡ Improvements
- 40% faster skill load via async frontmatter parsing. ([#1205](…))
diff --git a/skills/ops/hermes-weekly/SKILL.md b/skills/ops/hermes-weekly/SKILL.md
index ef56730..a8a0f9b 100644
--- a/skills/ops/hermes-weekly/SKILL.md
+++ b/skills/ops/hermes-weekly/SKILL.md
@@ -27,7 +27,7 @@ model_hint: google/gemini-2.5-flash
# hermes-weekly — Weekly Digest
-Automates the "Cooking on main" section of the guide — but for anyone running Hermes who wants a once-a-week summary of what landed upstream.
+Automates a weekly upstream-change digest for anyone running Hermes who wants a concise summary of what landed.
## Procedure
@@ -68,7 +68,7 @@ Automates the "Cooking on main" section of the guide — but for anyone running
## Why this skill
-- The "Cooking on main" section in the guide's README is curated manually. This lets any Hermes user run it themselves with their own focus.
+- The guide no longer tracks speculative "cooking on main" notes. This skill lets Hermes users make their own upgrade digest from merged upstream work.
- Useful for users who are on a pinned version and want a checklist before upgrading.
- Can be piped into Discord / Telegram channel / newsletter via `notify:` in the cron config.
@@ -86,4 +86,4 @@ cron:
- [release-notes](../../dev/release-notes/SKILL.md) — same pattern but for your own repo
- [weekly-dep-audit](../weekly-dep-audit/SKILL.md) — upgrade-safety check
-- README "[Cooking on main](../../../README.md)" — the manually-curated version
+- [CHANGELOG](../../../CHANGELOG.md) — the manually curated guide history
diff --git a/templates/config/cost-optimized.yaml b/templates/config/cost-optimized.yaml
index 32ad1e1..478db84 100644
--- a/templates/config/cost-optimized.yaml
+++ b/templates/config/cost-optimized.yaml
@@ -3,9 +3,9 @@
# ------------------------------------------------------------
# Target: <$5/mo for personal daily-driver usage.
# - Gemini 2.5 Flash / Pro for 90% of calls
-# - Kimi K2.5 for bulk / background
+# - Kimi/Moonshot for bulk / background
# - Cerebras Llama 70B (free-ish tier) for classification
-# - Gemini CLI OAuth (1500 req/day FREE)
+# - Gemini OAuth free tier
# - Anthropic Sonnet only when `intent: coding` on complex files
# ------------------------------------------------------------
@@ -15,12 +15,12 @@ models:
default: google/gemini-2.5-flash
classification: cerebras/llama-3.1-70b
long_context: google/gemini-2.5-pro
- coding: moonshot/kimi-k2.5 # Fallback to Claude only for hard coding
- coding_complex: anthropic/claude-sonnet-4-5
- reasoning: zai/glm-5.1
+ coding: moonshot/kimi # Fallback to Claude only for hard coding
+ coding_complex: anthropic/claude-sonnet
+ reasoning: zai/glm
providers:
google:
- oauth_enabled: true # <-- this is the free 1500/day tier
+ oauth_enabled: true # Hermes-managed Gemini OAuth free tier
api_key: ${GOOGLE_API_KEY} # Used only when OAuth is unavailable
anthropic:
api_key: ${ANTHROPIC_API_KEY}
@@ -38,13 +38,13 @@ routing:
model: cerebras/llama-3.1-70b
- intent: coding
when: { complexity: high }
- model: anthropic/claude-sonnet-4-5
+ model: anthropic/claude-sonnet
- intent: coding
- model: moonshot/kimi-k2.5
+ model: moonshot/kimi
- intent: long_context
model: google/gemini-2.5-pro
- intent: reasoning
- model: zai/glm-5.1
+ model: zai/glm
prefer_cached: true # Reroute if prompt is >80% cache-hit
context:
diff --git a/templates/config/minimum.yaml b/templates/config/minimum.yaml
index 5e629c9..246d06f 100644
--- a/templates/config/minimum.yaml
+++ b/templates/config/minimum.yaml
@@ -12,7 +12,7 @@
version: 1
models:
- default: anthropic/claude-sonnet-4-5
+ default: anthropic/claude-sonnet
providers:
anthropic:
api_key: ${ANTHROPIC_API_KEY}
diff --git a/templates/config/production.yaml b/templates/config/production.yaml
index 159d3d1..7d3ab79 100644
--- a/templates/config/production.yaml
+++ b/templates/config/production.yaml
@@ -14,12 +14,12 @@
version: 1
models:
- default: anthropic/claude-sonnet-4-5
+ default: anthropic/claude-sonnet
classification: google/gemini-2.5-flash
long_context: google/gemini-2.5-pro
- coding: anthropic/claude-sonnet-4-5
- reasoning: openai/gpt-5.4
- cheap: moonshot/kimi-k2.5
+ coding: anthropic/claude-sonnet
+ reasoning: openai/reasoning
+ cheap: moonshot/kimi
providers:
anthropic:
api_key: ${ANTHROPIC_API_KEY}
@@ -28,7 +28,7 @@ models:
api_key: ${OPENAI_API_KEY}
google:
api_key: ${GOOGLE_API_KEY}
- oauth_enabled: true # Use Gemini CLI OAuth when available
+ oauth_enabled: true # Use Gemini OAuth when available
moonshot:
api_key: ${MOONSHOT_API_KEY}
zai:
@@ -42,15 +42,15 @@ routing:
- intent: classification
model: google/gemini-2.5-flash
- intent: coding
- model: anthropic/claude-sonnet-4-5
+ model: anthropic/claude-sonnet
- intent: long_context
when: { tokens_in: { gt: 200000 } }
model: google/gemini-2.5-pro
- intent: reasoning
when: { needs_deep_reasoning: true }
- model: openai/gpt-5.4
+ model: openai/reasoning
- intent: bulk_data
- model: moonshot/kimi-k2.5
+ model: moonshot/kimi
gateways:
cli: { enabled: true }
diff --git a/templates/config/security-hardened.yaml b/templates/config/security-hardened.yaml
index 3ac37c9..e6a98ce 100644
--- a/templates/config/security-hardened.yaml
+++ b/templates/config/security-hardened.yaml
@@ -29,10 +29,10 @@ profiles:
- { tool: "*", actions: [exec, write, send, create, update, delete] }
trusted:
description: Admin-only. Full capability.
- models: { default: anthropic/claude-sonnet-4-5 }
+ models: { default: anthropic/claude-sonnet }
models:
- default: anthropic/claude-sonnet-4-5
+ default: anthropic/claude-sonnet
providers:
anthropic:
api_key: "${ANTHROPIC_API_KEY}"
diff --git a/templates/config/telegram-bot.yaml b/templates/config/telegram-bot.yaml
index 7268b5c..66d0432 100644
--- a/templates/config/telegram-bot.yaml
+++ b/templates/config/telegram-bot.yaml
@@ -12,7 +12,7 @@
version: 1
models:
- default: anthropic/claude-sonnet-4-5
+ default: anthropic/claude-sonnet
classification: google/gemini-2.5-flash
providers:
anthropic:
From a6838ba23182a4bc2b50e311d96febec051c2863 Mon Sep 17 00:00:00 2001
From: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Date: Thu, 30 Apr 2026 20:20:08 +0000
Subject: [PATCH 2/9] docs: address review feedback
---
docs/wizard/index.html | 8 ++++----
part20-observability.md | 2 +-
part4-telegram-setup.md | 10 +++++-----
3 files changed, 10 insertions(+), 10 deletions(-)
diff --git a/docs/wizard/index.html b/docs/wizard/index.html
index 2c3a82d..4dc90ed 100644
--- a/docs/wizard/index.html
+++ b/docs/wizard/index.html
@@ -425,10 +425,10 @@ Hermes Config Wizard
}
if (mcps.linear) {
lines.push(` linear:`);
- lines.push(` command: npx`);
- lines.push(` args: [-y, "@linear/mcp-server-linear"]`);
- lines.push(` env:`);
- lines.push(` LINEAR_API_KEY: "$\{LINEAR_API_KEY\}"`);
+ lines.push(` url: https://mcp.linear.app/mcp`);
+ lines.push(` # OAuth is completed by the MCP client on first connection.`);
+ lines.push(` trust: trusted`);
+ lines.push(` allow_sampling: false`);
}
if (mcps.filesystem) {
lines.push(` filesystem:`);
diff --git a/part20-observability.md b/part20-observability.md
index 4127e0d..0fa7fc6 100644
--- a/part20-observability.md
+++ b/part20-observability.md
@@ -190,7 +190,7 @@ model_routing:
routes:
- match: { intent: [classification, extraction, triage, sum_under_500_tokens] }
model: gemini-2.5-flash
- provider: google-gemini-cli
+ provider: google
- match: { intent: long_context, tokens_gte: 150000 }
model: gemini-2.5-pro
provider: openrouter
diff --git a/part4-telegram-setup.md b/part4-telegram-setup.md
index 39090ad..d6bb580 100644
--- a/part4-telegram-setup.md
+++ b/part4-telegram-setup.md
@@ -1,10 +1,10 @@
# Part 4: Telegram Setup (Chat From Anywhere)
-*Connect Hermes to Telegram for mobile access, voice memos, group chats, and scheduled task delivery. This is the most battle-tested of the 16 messaging adapters — start here, branch out to the others as needed.*
+*Connect Hermes to Telegram for mobile access, voice memos, group chats, and scheduled task delivery. This is the most battle-tested of the 18+ messaging adapters — start here, branch out to the others as needed.*
---
-## The 16-Platform Gateway
+## The 18+ Platform Gateway
As of v0.12.0 (April 2026), the Hermes gateway ships adapters/plugins for **18+ platforms**. They all share the same session DB, the same `/fast` toggle, the same Tool Gateway plumbing, and the same cron delivery mechanism:
@@ -13,12 +13,12 @@ As of v0.12.0 (April 2026), the Hermes gateway ships adapters/plugins for **18+
| Telegram (this part) | iMessage (BlueBubbles) | DingTalk | Signal |
| Discord | WeChat / Weixin | Feishu / Lark | Matrix |
| Slack | WeCom | Mattermost | SMS (Twilio) |
-| WhatsApp | | | Email (IMAP+SMTP) |
-| | | | Home Assistant |
+| WhatsApp | QQBot | Microsoft Teams | Email (IMAP+SMTP) |
+| | Tencent Yuanbao | | Home Assistant |
| | | | Webhook (generic) |
- For **iMessage, WeChat, and Android/Termux**, see [Part 15](./part15-new-platforms.md).
-- For **gateway crash recovery** and health checks across all 16, see [Part 11](./part11-gateway-recovery.md).
+- For **gateway crash recovery** and health checks across all platforms, see [Part 11](./part11-gateway-recovery.md).
- For the browser UI that manages every platform's state, see [Part 12](./part12-web-dashboard.md).
---
From 710ca218959947ce154d2df0a636a2c201bcedb1 Mon Sep 17 00:00:00 2001
From: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Date: Thu, 30 Apr 2026 20:29:37 +0000
Subject: [PATCH 3/9] docs: fix contributing layout reference
---
CONTRIBUTING.md | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index d2b5117..07bb889 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -37,7 +37,7 @@ This guide is built in public. PRs welcome.
├── ECOSYSTEM.md
├── ROADMAP.md
├── LICENSE
-├── part6-context-compression.md … part22-latest-power-moves.md
+├── part1-setup.md … part22-latest-power-moves.md
├── diagrams/architecture.md
├── skills/
│ ├── README.md
From 2c1f7c3bda76a394f865314c07e0c47606f6efe5 Mon Sep 17 00:00:00 2001
From: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Date: Thu, 30 Apr 2026 20:36:59 +0000
Subject: [PATCH 4/9] docs: update outreach part count
---
docs/outreach/blog-post-long.md | 2 +-
docs/outreach/hacker-news-post.md | 2 +-
docs/outreach/launch-tweet-thread.md | 2 +-
docs/outreach/nous-upstream-pr-body.md | 4 ++--
4 files changed, 5 insertions(+), 5 deletions(-)
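
Note: the part count is repeated across several outreach files, so a stale number is easy to miss. A minimal sketch of a check, assuming plain `grep` is available; `find_stale_counts` is a hypothetical helper, not something the repo ships.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the repo): list lines that still mention an
# outdated count such as "21 parts", "21-part", or "21 chapters".
find_stale_counts() {
  local old_count="$1" dir="$2"
  # -r recurse, -n print line numbers, -E extended regex.
  # The leading group keeps "121 parts" from matching when old_count is 21.
  grep -rnE "(^|[^0-9])${old_count}[- ](part|chapter)" "$dir"
}
```

Running `find_stale_counts 21 docs/outreach` after this patch should print nothing and exit non-zero, since `grep` reports failure when no line matches.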
diff --git a/docs/outreach/blog-post-long.md b/docs/outreach/blog-post-long.md
index 9bed5aa..3b40d79 100644
--- a/docs/outreach/blog-post-long.md
+++ b/docs/outreach/blog-post-long.md
@@ -21,7 +21,7 @@ So I wrote the opposite.
## What "ships code" means
-The [Hermes Optimization Guide](https://github.com/OnlyTerp/hermes-optimization-guide) has 21 chapters of documentation. That's the part that looks like every other guide.
+The [Hermes Optimization Guide](https://github.com/OnlyTerp/hermes-optimization-guide) has 23 parts of documentation. That's the part that looks like every other guide.
But it also has, in the same repo:
diff --git a/docs/outreach/hacker-news-post.md b/docs/outreach/hacker-news-post.md
index d0f0302..52fbbf9 100644
--- a/docs/outreach/hacker-news-post.md
+++ b/docs/outreach/hacker-news-post.md
@@ -14,7 +14,7 @@ Author here. Context on what this is and why:
Hermes (Nous Research, ~94K GH stars) is the agent framework I've been using for a year. Most of the existing community guides explain the architecture but don't give you anything to run — you read 15 parts, still have to write your own `config.yaml`, your own cron skills, your own systemd hardening.
-This guide is the other direction: 21 parts of actual documentation *plus*
+This guide is the other direction: 23 parts of actual documentation *plus*
- **13 installable `SKILL.md` files** (audit-mcp, rotate-secrets, audit-approval-bypass, nightly-backup, weekly-dep-audit, cost-report, telegram-triage, pr-review, release-notes, daily-inbox-triage, hermes-weekly, spam-trap, meeting-prep) — drop them into `~/.hermes/skills/` or symlink them in
- **5 opinionated configs** for the 5 real personas (minimum / telegram-bot / production / cost-optimized / security-hardened) — every non-obvious field commented
diff --git a/docs/outreach/launch-tweet-thread.md b/docs/outreach/launch-tweet-thread.md
index a84e2a4..84adc54 100644
--- a/docs/outreach/launch-tweet-thread.md
+++ b/docs/outreach/launch-tweet-thread.md
@@ -7,7 +7,7 @@
**1/8**
I got tired of Hermes guides that explain the architecture but don't give you anything to run, so I shipped the opposite:
-21 parts of documentation **plus** 13 installable skills, 5 production configs, 4 reference architectures, a VPS bootstrap script, hardened systemd units, a reproducible cost benchmark, and an in-browser config wizard.
+23 parts of documentation **plus** 13 installable skills, 5 production configs, 4 reference architectures, a VPS bootstrap script, hardened systemd units, a reproducible cost benchmark, and an in-browser config wizard.
github.com/OnlyTerp/hermes-optimization-guide
diff --git a/docs/outreach/nous-upstream-pr-body.md b/docs/outreach/nous-upstream-pr-body.md
index cc59c0a..ec6b2e7 100644
--- a/docs/outreach/nous-upstream-pr-body.md
+++ b/docs/outreach/nous-upstream-pr-body.md
@@ -19,7 +19,7 @@ Add a new section to `README.md` (just below "Documentation" or "Quick Start"):
Independent guides written by Hermes users. These are not official, but have been vetted by maintainers for accuracy.
-- [Hermes Optimization Guide](https://github.com/OnlyTerp/hermes-optimization-guide) — 21-part guide covering LightRAG, Telegram deployment, MCP, security hardening, cost routing, observability, and remote sandboxes. Ships installable skills, 5 production configs, a VPS bootstrap script, and reproducible cost benchmarks.
+- [Hermes Optimization Guide](https://github.com/OnlyTerp/hermes-optimization-guide) — 23-part guide covering LightRAG, Telegram deployment, MCP, security hardening, cost routing, observability, and remote sandboxes. Ships installable skills, 5 production configs, a VPS bootstrap script, and reproducible cost benchmarks.
_Maintain your own? Open a PR adding it here._
````
@@ -30,7 +30,7 @@ _Maintain your own? Open a PR adding it here._
>
> I've been writing a community optimization guide since v0.9.0 shipped, and have gotten enough "where should I link this so people can find it?" messages that I wanted to propose an upstream spot: a small **Community Guides** section in the README.
>
-> The guide itself is at https://github.com/OnlyTerp/hermes-optimization-guide — 21 parts of documentation, 13 installable `SKILL.md` files, 5 production configs, 4 reference architectures, a VPS bootstrap script, an in-browser config wizard, and a reproducible cost benchmark. MIT license. CHANGELOG + ROADMAP are real. I cross-check every release note on `main` and update within 72h.
+> The guide itself is at https://github.com/OnlyTerp/hermes-optimization-guide — 23 parts of documentation, 13 installable `SKILL.md` files, 5 production configs, 4 reference architectures, a VPS bootstrap script, an in-browser config wizard, and a reproducible cost benchmark. MIT license. CHANGELOG + ROADMAP are real. I cross-check every release note on `main` and update within 72h.
>
> Totally understand if you'd rather maintain a separate page, or curate more carefully before pointing at third-party content. Happy to iterate on the section copy, add more guides as they show up, or even move the list to `docs/community.md` if that fits better.
>
From 6b0df1efbe83eb9282f1b29974adb7bd5fdd417d Mon Sep 17 00:00:00 2001
From: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Date: Thu, 30 Apr 2026 20:46:43 +0000
Subject: [PATCH 5/9] docs: align quickstart env with telegram template
---
docs/quickstart.md | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/docs/quickstart.md b/docs/quickstart.md
index 5b442e6..8a971bd 100644
--- a/docs/quickstart.md
+++ b/docs/quickstart.md
@@ -6,8 +6,8 @@ From zero to working Telegram bot.
- A Linux, macOS, or WSL machine (anything with bash)
- A Telegram account
-- One provider: Anthropic/OpenAI/OpenRouter API key, Nous Portal login, or Gemini OAuth via `hermes model`
-- (Optional) A Google API key — [aistudio.google.com](https://aistudio.google.com/apikey) if you prefer API-key Gemini over OAuth
+- An Anthropic API key for the default model
+- A Google API key — [aistudio.google.com](https://aistudio.google.com/apikey) for Gemini Flash classification + LightRAG in the Telegram template
## Step 1 — Install Hermes
@@ -41,7 +41,7 @@ Create `~/.hermes/.env`:
```bash
cat > ~/.hermes/.env <<'EOF'
ANTHROPIC_API_KEY=sk-ant-...
-# GOOGLE_API_KEY=AIza... # optional; Gemini OAuth can be configured via `hermes model`
+GOOGLE_API_KEY=AIza... # required by telegram-bot.yaml for Gemini Flash classification + LightRAG
TELEGRAM_ADMIN_BOT_TOKEN=1234567890:ABC...
TELEGRAM_OWNER_ID=1234567 # your numeric ID from @userinfobot
EOF
From 217635009e59f121b5103db4be93231bf7f20faf Mon Sep 17 00:00:00 2001
From: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Date: Thu, 30 Apr 2026 20:57:50 +0000
Subject: [PATCH 6/9] docs: align quickstart lightrag keys
---
README.md | 2 +-
docs/quickstart.md | 6 ++++--
templates/config/telegram-bot.yaml | 3 +++
3 files changed, 8 insertions(+), 3 deletions(-)
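
Note: with this patch the quickstart `.env` needs three provider keys plus the two Telegram values. A minimal sketch of a pre-flight check, assuming the file uses plain `KEY=value` lines as in the quickstart heredoc; `check_env_keys` is a hypothetical helper, not a Hermes command.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report which required keys are missing from an env file.
check_env_keys() {
  local env_file="$1"; shift
  local missing=0 key
  for key in "$@"; do
    # Require an uncommented KEY= assignment with a non-empty value.
    if ! grep -Eq "^${key}=." "$env_file" 2>/dev/null; then
      echo "missing: ${key}"
      missing=1
    fi
  done
  return "$missing"
}
```

For the Telegram template this would be called as `check_env_keys ~/.hermes/.env ANTHROPIC_API_KEY OPENAI_API_KEY GOOGLE_API_KEY TELEGRAM_ADMIN_BOT_TOKEN TELEGRAM_OWNER_ID`. A commented-out key counts as missing, which is what would have caught the old optional `# GOOGLE_API_KEY=` line.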
diff --git a/README.md b/README.md
index 3e36970..4235752 100644
--- a/README.md
+++ b/README.md
@@ -78,7 +78,7 @@ Full set of diagrams: [`diagrams/architecture.md`](./diagrams/architecture.md).
## Pick Your Path
-This guide grew to 23 parts because *Hermes grew*. Parts 1–5 live in this README; Parts 6–22 live as separate files. You don't have to read them all — pick the shortest path to what you need:
+This guide grew to 23 parts because *Hermes grew*. Six sections (Parts 1–5 plus SOUL.md) live in this README; Parts 6–22 live as separate files. You don't have to read them all — pick the shortest path to what you need:
### 🎯 "I just want it working in 10 minutes"
[Part 1: Setup](#part-1-setup-stop-fumbling-with-installation) → [Part 12: Web Dashboard](./part12-web-dashboard.md) → done. Use the dashboard to point-and-click the rest.
diff --git a/docs/quickstart.md b/docs/quickstart.md
index 8a971bd..42cbce8 100644
--- a/docs/quickstart.md
+++ b/docs/quickstart.md
@@ -7,7 +7,8 @@ From zero to working Telegram bot.
- A Linux, macOS, or WSL machine (anything with bash)
- A Telegram account
- An Anthropic API key for the default model
-- A Google API key — [aistudio.google.com](https://aistudio.google.com/apikey) for Gemini Flash classification + LightRAG in the Telegram template
+- A Google API key — [aistudio.google.com](https://aistudio.google.com/apikey) for Gemini Flash classification + LightRAG LLM in the Telegram template
+- An OpenAI API key — [platform.openai.com/api-keys](https://platform.openai.com/api-keys) for LightRAG embeddings in the Telegram template
## Step 1 — Install Hermes
@@ -41,7 +42,8 @@ Create `~/.hermes/.env`:
```bash
cat > ~/.hermes/.env <<'EOF'
ANTHROPIC_API_KEY=sk-ant-...
-GOOGLE_API_KEY=AIza... # required by telegram-bot.yaml for Gemini Flash classification + LightRAG
+OPENAI_API_KEY=sk-... # required by telegram-bot.yaml for LightRAG embeddings
+GOOGLE_API_KEY=AIza... # required by telegram-bot.yaml for Gemini Flash classification + LightRAG LLM
TELEGRAM_ADMIN_BOT_TOKEN=1234567890:ABC...
TELEGRAM_OWNER_ID=1234567 # your numeric ID from @userinfobot
EOF
diff --git a/templates/config/telegram-bot.yaml b/templates/config/telegram-bot.yaml
index 66d0432..9f19cb3 100644
--- a/templates/config/telegram-bot.yaml
+++ b/templates/config/telegram-bot.yaml
@@ -3,6 +3,7 @@
# ------------------------------------------------------------
# Opinionated setup for a personal Telegram assistant:
# - Anthropic primary + Gemini Flash for classification
+# - OpenAI embeddings for LightRAG memory
# - Telegram gateway with a private admin DM + (optional) public bot
# - LightRAG memory backend
# - Sensible approval defaults
@@ -20,6 +21,8 @@ models:
prompt_caching: true
google:
api_key: ${GOOGLE_API_KEY}
+ openai:
+ api_key: ${OPENAI_API_KEY}
gateways:
cli:
From a11efb073db7623e63153482940afa8b6749afdc Mon Sep 17 00:00:00 2001
From: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Date: Thu, 30 Apr 2026 21:04:49 +0000
Subject: [PATCH 7/9] docs: add openai provider to cost template
---
templates/config/cost-optimized.yaml | 2 ++
1 file changed, 2 insertions(+)
diff --git a/templates/config/cost-optimized.yaml b/templates/config/cost-optimized.yaml
index 478db84..0f759e7 100644
--- a/templates/config/cost-optimized.yaml
+++ b/templates/config/cost-optimized.yaml
@@ -25,6 +25,8 @@ models:
anthropic:
api_key: ${ANTHROPIC_API_KEY}
prompt_caching: true # 90% discount on repeat context
+ openai:
+ api_key: ${OPENAI_API_KEY} # Required for LightRAG embeddings below
moonshot:
api_key: ${MOONSHOT_API_KEY}
cerebras:
From a1106375db2af485ead74330b13f101b64ce343f Mon Sep 17 00:00:00 2001
From: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Date: Thu, 30 Apr 2026 21:12:29 +0000
Subject: [PATCH 8/9] docs: normalize routing model identifiers
---
part20-observability.md | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
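
Note: both changes below drop a provider name that was duplicated inside the `model` field. A minimal sketch of that rule, inferred only from the two edits in this patch; `normalize_model_id` is a hypothetical helper and not how Hermes itself resolves model names.

```shell
#!/usr/bin/env bash
# Hypothetical helper: strip a redundant provider prefix from a model id so the
# provider lives only in the separate `provider:` field.
normalize_model_id() {
  local provider="$1" model="$2"
  model="${model#"${provider}/"}"   # zai + zai/glm -> glm
  model="${model#"${provider}-"}"   # openai + openai-reasoning -> reasoning
  printf '%s\n' "$model"
}
```

Ids that carry no provider prefix, such as `claude-sonnet` under `provider: anthropic`, pass through unchanged.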
diff --git a/part20-observability.md b/part20-observability.md
index 0fa7fc6..94ee5d0 100644
--- a/part20-observability.md
+++ b/part20-observability.md
@@ -195,13 +195,13 @@ model_routing:
model: gemini-2.5-pro
provider: openrouter
- match: { intent: [write_code, refactor, debug], complexity: medium }
- model: zai/glm
+ model: glm
provider: zai
- match: { intent: [write_code, refactor, debug], complexity: high }
model: claude-sonnet
provider: anthropic
- match: { intent: [reasoning, math], complexity: high }
- model: openai-reasoning
+ model: reasoning
provider: openai
```
From 2627ba7cc4569fef1f643d53a76a544cbc577010 Mon Sep 17 00:00:00 2001
From: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Date: Thu, 30 Apr 2026 21:22:01 +0000
Subject: [PATCH 9/9] docs: keep github mcp package consistent
---
ECOSYSTEM.md | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/ECOSYSTEM.md b/ECOSYSTEM.md
index 0e86b0c..dbcbc9c 100644
--- a/ECOSYSTEM.md
+++ b/ECOSYSTEM.md
@@ -7,7 +7,7 @@ The canonical "where do I find X for Hermes" directory. Maintained alongside the
## MCP Servers Worth Installing
### Official / reference
-- [`github/github-mcp-server`](https://github.com/github/github-mcp-server) — PRs, issues, code search, Actions
+- [`@modelcontextprotocol/server-github`](https://www.npmjs.com/package/@modelcontextprotocol/server-github) — PRs, issues, code search, Actions
- [`@modelcontextprotocol/server-filesystem`](https://github.com/modelcontextprotocol/servers/tree/main/src/filesystem) — read/write to scoped directories
- [`@modelcontextprotocol/server-postgres`](https://www.npmjs.com/package/@modelcontextprotocol/server-postgres) — read-only SQL
- [`@modelcontextprotocol/server-sqlite`](https://github.com/modelcontextprotocol/servers-archived/tree/main/src/sqlite) — local SQLite