| 1 | +# Hermes-Agent Feature Ports — Design Spec |
| 2 | + |
| 3 | +**Date:** 2026-04-08 |
| 4 | +**Source:** NousResearch/hermes-agent v0.2.0–v0.7.0 release notes |
| 5 | +**Scope:** 7 low-to-medium effort features worth porting into Sofia |
| 6 | + |
| 7 | +--- |
| 8 | + |
| 9 | +## Context |
| 10 | + |
| 11 | +After reviewing all hermes-agent releases, several features stood out as high-value and not yet present in Sofia. Features already implemented in Sofia (stale file detection, pre-exec scanning, approval gate) were excluded. Features requiring major architectural changes (pluggable memory providers, profile system) were deferred. The 7 features below are focused, bounded, and independently implementable. |
| 12 | + |
| 13 | +--- |
| 14 | + |
| 15 | +## Feature 1: Inline Diff Previews |
| 16 | + |
| 17 | +### Problem |
| 18 | +`EditFileTool` and `WriteFileTool` return bare confirmation strings like `"File edited: /path"`. Neither the user nor the LLM can see what actually changed without re-reading the file. |
| 19 | + |
| 20 | +### Design |
| 21 | +- `go-difflib` is already an indirect dependency via testify — promote to direct use. |
| 22 | +- In `EditFileTool.Run()` (`pkg/tools/edit.go`): after applying the replacement, generate a unified diff of original→modified content and append it to the tool result. |
| 23 | +- In `WriteFileTool.Run()` (`pkg/tools/filesystem.go`): read existing file content before overwriting (if file exists), generate diff, append to result. If file is new, return a `+++ (new file)` header. |
| 24 | +- Diff format: standard unified diff with 3 lines of context, capped at 100 lines total (truncated with a `... N more lines omitted` trailer if longer). |
| 25 | +- The diff is returned as part of the tool result string, visible to both the LLM and the user. |
| 26 | + |
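The 100-line cap described above can be sketched independently of how the diff is generated. This is a minimal illustration, not the actual `EditFileTool` code: `capDiff` is a hypothetical helper name, and whether the trailer line counts toward the cap is an open choice (here it does not).

```go
package main

import (
	"fmt"
	"strings"
)

// capDiff (hypothetical helper) keeps at most maxLines lines of a unified
// diff and appends a trailer reporting how many lines were dropped.
// Assumption: the trailer itself is not counted toward maxLines.
func capDiff(diff string, maxLines int) string {
	lines := strings.Split(strings.TrimRight(diff, "\n"), "\n")
	if len(lines) <= maxLines {
		return diff
	}
	omitted := len(lines) - maxLines
	kept := strings.Join(lines[:maxLines], "\n")
	return fmt.Sprintf("%s\n... %d more lines omitted\n", kept, omitted)
}

func main() {
	diff := "--- a/file.txt\n+++ b/file.txt\n@@ -1,2 +1,2 @@\n-old\n+new\n context\n"
	// With a cap of 4, the last two diff lines are replaced by the trailer.
	fmt.Print(capDiff(diff, 4))
}
```

In the tools themselves the input would be the unified diff produced by `go-difflib` with 3 lines of context, and `maxLines` would be 100.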
| 27 | +### Key files |
| 28 | +- `pkg/tools/edit.go` — `EditFileTool.Run()` |
| 29 | +- `pkg/tools/filesystem.go` — `WriteFileTool.Run()` |
| 30 | +- `go.mod` — add `github.com/pmezard/go-difflib` as direct dep |
| 31 | + |
| 32 | +--- |
| 33 | + |
| 34 | +## Feature 2: Session Compression Config |
| 35 | + |
| 36 | +### Problem |
| 37 | +All summarization thresholds in `loop_summarize.go` are hardcoded (75%/90% context triggers, protected head/tail sizes, tool result truncation limit). There is no way to tune these per-agent or globally without editing source. |
| 38 | + |
| 39 | +### Design |
| 40 | +Add a `SummarizationConfig` struct to `pkg/config/config.go`, nested under the agent defaults: |
| 41 | + |
```go
type SummarizationConfig struct {
	ContextTriggerPct       int `json:"context_trigger_pct,omitempty"`        // default 75
	ForceTriggerPct         int `json:"force_trigger_pct,omitempty"`          // default 90
	ProtectHead             int `json:"protect_head,omitempty"`               // default 2
	ProtectTailPct          int `json:"protect_tail_pct,omitempty"`           // default 30
	MinTail                 int `json:"min_tail,omitempty"`                   // default 4
	ToolResultTruncateChars int `json:"tool_result_truncate_chars,omitempty"` // default 200
}
```
| 52 | + |
| 53 | +- Add `Summarization SummarizationConfig` to `AgentDefaults` and `AgentConfig` (per-agent override). |
| 54 | +- In `loop_summarize.go`, replace each hardcoded constant with a lookup: `agent.Summarization.ContextTriggerPct` falling back to the default if zero. |
| 55 | +- `AgentInstance` already has access to agent config — no new wiring needed. |
| 56 | + |
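The zero-means-default lookup could look roughly like the sketch below. The helper names (`orDefault`, `resolveContextTrigger`) and the precedence order (per-agent value, then agent defaults, then built-in constant) are assumptions for illustration; the spec only states that a zero value falls back to the default.

```go
package main

import "fmt"

// SummarizationConfig here is a trimmed copy of the struct above,
// reduced to one field to keep the sketch short.
type SummarizationConfig struct {
	ContextTriggerPct int
}

// Built-in fallback, matching the "default 75" comment in the spec.
const defaultContextTriggerPct = 75

// orDefault (hypothetical helper) treats zero or negative values as "unset".
func orDefault(v, def int) int {
	if v <= 0 {
		return def
	}
	return v
}

// resolveContextTrigger (hypothetical) applies the assumed precedence:
// per-agent override, then agent defaults, then the built-in constant.
func resolveContextTrigger(defaults, agent SummarizationConfig) int {
	return orDefault(agent.ContextTriggerPct,
		orDefault(defaults.ContextTriggerPct, defaultContextTriggerPct))
}

func main() {
	defaults := SummarizationConfig{ContextTriggerPct: 60}
	agent := SummarizationConfig{} // nothing set for this agent
	fmt.Println(resolveContextTrigger(defaults, agent))
}
```

One consequence of using zero as the "unset" marker is that a field such as `ProtectHead` cannot be configured to exactly 0; if that matters, pointer fields would be an alternative.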
| 57 | +### Key files |
| 58 | +- `pkg/config/config.go` — add `SummarizationConfig`, embed in defaults + per-agent |
| 59 | +- `pkg/agent/loop_summarize.go` — replace all hardcoded thresholds |
| 60 | + |
| 61 | +--- |
| 62 | + |
| 63 | +## Feature 3: `/yolo` Approval Toggle |
| 64 | + |
| 65 | +### Problem |
| 66 | +`ApprovalGate` has a rich config structure but there is no runtime way to bypass approval mid-session without restarting. Hermes uses `/yolo` for this. |
| 67 | + |
| 68 | +### Design |
| 69 | +- Add `approvalBypass sync.Map` to `ApprovalGate` in `pkg/agent/approval.go`. |
| 70 | +- At the top of `RequiresApproval()`, look up the session with `approvalBypass.Load(sessionKey)`; if an entry exists and its value is `true`, return `false` (no approval needed). |
| 71 | +- Expose `SetBypass(sessionKey string, on bool)` method on `ApprovalGate`. |
| 72 | +- Add `/yolo [on|off]` to `loop_commands.go` following the same pattern as `/verbose`: |
| 73 | + - No args: report current state. |
| 74 | + - `on`: call `al.approvalGate.SetBypass(sessionKey, true)`, return confirmation. |
| 75 | + - `off`: call `al.approvalGate.SetBypass(sessionKey, false)`, return confirmation. |
| 76 | +- Bypass is session-scoped and non-persistent (resets on session restart). |
| 77 | + |
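A minimal sketch of the bypass mechanics follows. The real `ApprovalGate` has more fields and the actual signature of `RequiresApproval()` is not given in this spec, so the `(sessionKey, toolName)` parameters and the `BypassEnabled` helper are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// ApprovalGate is reduced here to the one field this feature adds.
type ApprovalGate struct {
	approvalBypass sync.Map // sessionKey (string) -> bool
}

// SetBypass turns the per-session bypass on or off.
// Turning it off deletes the entry so the map does not grow unbounded.
func (g *ApprovalGate) SetBypass(sessionKey string, on bool) {
	if on {
		g.approvalBypass.Store(sessionKey, true)
		return
	}
	g.approvalBypass.Delete(sessionKey)
}

// BypassEnabled (hypothetical helper) reports the current state,
// which is what "/yolo" with no arguments would print.
func (g *ApprovalGate) BypassEnabled(sessionKey string) bool {
	v, ok := g.approvalBypass.Load(sessionKey)
	if !ok {
		return false
	}
	on, _ := v.(bool)
	return on
}

// RequiresApproval checks the bypass first. The signature is assumed;
// the final "return true" stands in for the existing policy evaluation.
func (g *ApprovalGate) RequiresApproval(sessionKey, toolName string) bool {
	if g.BypassEnabled(sessionKey) {
		return false
	}
	return true
}

func main() {
	gate := &ApprovalGate{}
	gate.SetBypass("session-1", true)
	fmt.Println(gate.RequiresApproval("session-1", "exec"))
	fmt.Println(gate.RequiresApproval("session-2", "exec"))
}
```

Because the map lives in memory on the gate, the bypass disappears when the process or session restarts, which matches the non-persistent behavior described above.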
| 78 | +### Key files |
| 79 | +- `pkg/agent/approval.go` — `approvalBypass sync.Map`, `SetBypass()`, check in `RequiresApproval()` |
| 80 | +- `pkg/agent/loop_commands.go` — `/yolo` handler |
| 81 | +- `pkg/agent/loop.go` — confirm `AgentLoop` holds a reference to `ApprovalGate` |
| 82 | + |
| 83 | +--- |
| 84 | + |
| 85 | +## Feature 4: `/btw` Ephemeral Questions |
| 86 | + |
| 87 | +### Problem |
| 88 | +Users sometimes want a quick side answer without polluting the session history (e.g., "what does this function do?" mid-task). Currently any message extends the history. |
| 89 | + |
| 90 | +### Design |
| 91 | +- Supported in CLI and Web channels only. In gateway mode, `/btw` is treated as a regular message. |
| 92 | +- Detection: in `handleCommand()` in `loop_commands.go`, intercept `/btw <question>`. Extract the trailing text as the question. |
| 93 | +- Pass an `ephemeral bool` field through `processMessageOpts` (or equivalent options struct) to `runLLMIteration` in `loop_llm.go`. |
| 94 | +- In `runLLMIteration`, when `opts.Ephemeral` is true: |
| 95 | + - Build messages normally (using full session history as context). |
| 96 | + - After the LLM responds, **skip** `agent.Sessions.AddFullMessage()` for both the assistant message and any tool results. |
| 97 | + - Skip the summarization trigger check. |
| 98 | + - Prepend `[btw] ` to the response so the user can distinguish it visually. |
| 99 | +- The session history is unchanged after the exchange. |
| 100 | + |
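The detection and the history-skipping behavior can be sketched as below. Everything here is a stand-in: `parseBtw`, the `session` type, and `respond` are hypothetical names, and the reply string replaces the real LLM call. The sketch only shows that an ephemeral exchange is tagged with `[btw] ` and never reaches the stored history.

```go
package main

import (
	"fmt"
	"strings"
)

// parseBtw (hypothetical) extracts the question from "/btw <question>".
// It returns ok=false for other commands and for "/btw" with no text.
func parseBtw(input string) (question string, ok bool) {
	rest, found := strings.CutPrefix(strings.TrimSpace(input), "/btw")
	if !found {
		return "", false
	}
	// Require whitespace after the command so "/btwx" is not matched.
	if rest != "" && !strings.HasPrefix(rest, " ") && !strings.HasPrefix(rest, "\t") {
		return "", false
	}
	question = strings.TrimSpace(rest)
	return question, question != ""
}

// session is a toy stand-in for the real session store.
type session struct {
	history []string
}

// respond records the exchange unless it is ephemeral, in which case the
// history is left untouched and the reply is prefixed with "[btw] ".
func (s *session) respond(userMsg, reply string, ephemeral bool) string {
	if ephemeral {
		return "[btw] " + reply
	}
	s.history = append(s.history, userMsg, reply)
	return reply
}

func main() {
	s := &session{}
	if q, ok := parseBtw("/btw what is 2+2?"); ok {
		fmt.Println(s.respond(q, "4", true))
	}
	fmt.Println(len(s.history))
}
```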
| 101 | +### Key files |
| 102 | +- `pkg/agent/loop_commands.go` — `/btw` detection and dispatch |
| 103 | +- `pkg/agent/loop_llm.go` — `opts.Ephemeral` flag, conditional `AddFullMessage()` skip |
| 104 | +- `pkg/agent/loop.go` — `processMessageOpts` struct update (or wherever opts are threaded) |
| 105 | + |
| 106 | +--- |
| 107 | + |
| 108 | +## Feature 5: `@file` / `@url` Context Injection |
| 109 | + |
| 110 | +### Problem |
| 111 | +Users have no way to inline file contents or URLs into a message without using a separate tool call. Hermes supports `@/path` and `@https://url` references anywhere in message text. |
| 112 | + |
| 113 | +### Design |
| 114 | +- New function `enrichMessageContent(content string, workspacePath string) string` in a new file `pkg/agent/context_refs.go`. |
| 115 | +- Called in `processMessage()` in `loop_processing.go` after guardrails (secret scrubbing + PII redaction) but before routing — so the enriched content reaches the LLM. |
| 116 | +- **Parsing:** regex `@(\./[^\s]+|/[^\s]+|https?://[^\s]+)` finds all references inline. |
| 117 | +- **File references** (`@/abs/path` or `@./rel/path`): |
| 118 | + - Resolve relative paths against the agent's workspace root. |
| 119 | + - Reject any path that escapes the workspace (path traversal guard) — leave the token as-is. |
| 120 | + - Read up to 50 KB; if larger, truncate with a trailer. |
| 121 | + - Replace the token with a fenced block: a newline, an opening code fence, the file content, a closing code fence, and a trailing newline. |
| 122 | +- **URL references** (`@https://...`): |
| 123 | + - Reuse the existing `WebFetchTool` fetch/markdown conversion logic. |
| 124 | + - Respect the same SSRF blocklist already enforced by the web tools. |
| 125 | + - Cap at 50 KB. |
| 126 | +- **Limits:** max 5 references per message. Extras are left as-is. |
| 127 | +- Errors (file not found, fetch failure) are noted inline: `[could not read @/path: file not found]`. |
| 128 | + |
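The parsing step and the path traversal guard could be sketched as follows. The regex is the one given above; `findRefs` and `resolveInWorkspace` are hypothetical helper names. The guard is purely lexical (it does not resolve symlinks), and the regex will swallow trailing punctuation such as a sentence-ending period, so both are starting points rather than a finished design.

```go
package main

import (
	"fmt"
	"path/filepath"
	"regexp"
	"strings"
)

// refPattern is the reference regex from the spec.
var refPattern = regexp.MustCompile(`@(\./[^\s]+|/[^\s]+|https?://[^\s]+)`)

// maxRefs is the per-message limit from the spec.
const maxRefs = 5

// findRefs (hypothetical) returns up to maxRefs reference targets,
// without the leading "@". Later references are simply not returned.
func findRefs(content string) []string {
	matches := refPattern.FindAllStringSubmatch(content, maxRefs)
	refs := make([]string, 0, len(matches))
	for _, m := range matches {
		refs = append(refs, m[1])
	}
	return refs
}

// resolveInWorkspace (hypothetical) resolves ref against the workspace
// root and reports ok=false when the cleaned path escapes the workspace.
func resolveInWorkspace(workspace, ref string) (string, bool) {
	p := ref
	if !filepath.IsAbs(p) {
		p = filepath.Join(workspace, p)
	}
	p = filepath.Clean(p)
	rel, err := filepath.Rel(workspace, p)
	if err != nil || rel == ".." || strings.HasPrefix(rel, ".."+string(filepath.Separator)) {
		return "", false
	}
	return p, true
}

func main() {
	msg := "summarize @./docs/spec.md and compare with @https://example.com/page"
	for _, ref := range findRefs(msg) {
		fmt.Println(ref)
	}
	_, ok := resolveInWorkspace("/workspace", "./../etc/passwd")
	fmt.Println(ok)
}
```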
| 129 | +### Key files |
| 130 | +- `pkg/agent/context_refs.go` — new file, `enrichMessageContent()` |
| 131 | +- `pkg/agent/loop_processing.go` — call `enrichMessageContent()` after guardrails |
| 132 | +- `pkg/tools/web_fetch.go` — reuse fetch logic (or extract a shared helper) |
| 133 | + |
| 134 | +--- |
| 135 | + |
| 136 | +## Feature 6: Per-Model Output Limits |
| 137 | + |
| 138 | +### Problem |
| 139 | +`max_tokens` is set globally per agent. Model-specific limits (e.g., Anthropic's 128K cap on Opus 4.6, 64K on Sonnet 4.6) are not enforced. A misconfigured `max_tokens` can cause API errors. |
| 140 | + |
| 141 | +### Design |
| 142 | +- `ModelConfig.MaxTokens` already exists in `pkg/config/config_providers.go` but is not wired into the LLM call. |
| 143 | +- In `loop_llm.go`, after building `llmOpts["max_tokens"] = agent.MaxTokens`, check if the resolved model config has a non-zero `MaxTokens` and use it as an override. |
| 144 | +- Extend `pkg/config/provider_defaults.go` (already present in the working tree but untracked in git) with Anthropic-specific defaults: |
| 145 | + - `claude-opus-4` family → 128K output cap |
| 146 | + - `claude-sonnet-4` family → 64K output cap |
| 147 | + - Applied only when `ModelConfig.MaxTokens` is zero (user-set value wins). |
| 148 | +- **Anthropic 429 long-context handling:** when the Anthropic provider receives a 429 with `"long-context-tier"` in the error body, retry the request with `context_window` reduced to 200K. This goes in `pkg/providers/anthropic/provider.go`. |
| 149 | + |
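One possible reading of the resolution order is sketched below. The function and table names are hypothetical, the model ID strings are illustrative, and "128K"/"64K" are taken as 128,000 and 64,000 tokens, which is an assumption. The sketch treats an explicit `ModelConfig.MaxTokens` as an override and the family default as an upper bound on the agent-level value; if the family default is meant to replace the agent value outright, the `min` step would change.

```go
package main

import (
	"fmt"
	"strings"
)

// familyCaps is a hypothetical defaults table keyed by model ID prefix.
// Values follow the spec's "128K" and "64K", assumed to be decimal thousands.
var familyCaps = []struct {
	prefix string
	limit  int
}{
	{"claude-opus-4", 128_000},
	{"claude-sonnet-4", 64_000},
}

// defaultOutputCap returns the family cap for a model, or 0 if unknown.
func defaultOutputCap(model string) int {
	for _, c := range familyCaps {
		if strings.HasPrefix(model, c.prefix) {
			return c.limit
		}
	}
	return 0
}

// resolveMaxTokens (hypothetical) picks the max_tokens value to send.
// agentMax is the agent-level setting; modelMax is ModelConfig.MaxTokens.
func resolveMaxTokens(agentMax, modelMax int, model string) int {
	if modelMax > 0 {
		return modelMax // user-set model value wins
	}
	if limit := defaultOutputCap(model); limit > 0 && agentMax > limit {
		return limit // clamp to the family default
	}
	return agentMax
}

func main() {
	fmt.Println(resolveMaxTokens(200_000, 0, "claude-opus-4-6"))
	fmt.Println(resolveMaxTokens(4_096, 0, "claude-sonnet-4-6"))
}
```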
| 150 | +### Key files |
| 151 | +- `pkg/agent/loop_llm.go` — wire `ModelConfig.MaxTokens` into `llmOpts` |
| 152 | +- `pkg/config/provider_defaults.go` — already exists (untracked), add Anthropic model caps |
| 153 | +- `pkg/providers/anthropic/provider.go` — 429 long-context retry |
| 154 | + |
| 155 | +--- |
| 156 | + |
| 157 | +## Feature 7: Reasoning Block Preservation (Anthropic) |
| 158 | + |
| 159 | +### Problem |
| 160 | +The Anthropic provider's `parseResponse()` silently drops `thinking` content blocks — they are never parsed into `ReasoningContent`. Conversely, `buildParams()` never sends thinking blocks back in subsequent assistant messages. Extended thinking is therefore broken for multi-turn tool-use conversations. |
| 161 | + |
| 162 | +### Design |
| 163 | + |
| 164 | +**In `parseResponse()` (`pkg/providers/anthropic/provider.go`):** |
| 165 | +- Add `case "thinking":` to the content block switch. |
| 166 | +- Use `block.AsThinking()` to get the thinking block. |
| 167 | +- Concatenate into `ReasoningContent` (same as how `content` accumulates text blocks). |
| 168 | + |
| 169 | +**In `buildParams()` (`pkg/providers/anthropic/provider.go`):** |
| 170 | +- For assistant messages where `msg.ReasoningContent != ""`, prepend a `ThinkingBlockParam` content block before the text/tool_use blocks. |
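| 171 | +- The Anthropic SDK is expected to provide something like `anthropic.ThinkingBlockParam{Type: "thinking", ThinkingText: msg.ReasoningContent}`; the field names here are unverified, so check the exact API in the SDK. The Anthropic API also returns a `signature` with each thinking block and expects it to be sent back unchanged, so the signature has to be stored alongside `ReasoningContent`. |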
| 172 | + |
| 173 | +**Enabling extended thinking:** |
| 174 | +- Check `options["thinking_budget"]` (int). If non-zero, set `params.Thinking = anthropic.ThinkingParam{Type: "enabled", BudgetTokens: budget}` on the request params (type and field names unverified; check the exact API in the SDK). |
| 175 | +- Wire `thinking_budget` from agent config into `llmOpts` in `loop_llm.go` (alongside `max_tokens`). |
| 176 | +- Add `ThinkingBudget int` to agent config (defaults to 0 = disabled). |
| 177 | + |
| 178 | +**Note:** Requires the `anthropic-sdk-go` to expose `ThinkingBlockParam` — verify this exists in the current SDK version before implementing. |
| 179 | + |
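The accumulation logic for `parseResponse()` can be illustrated without the SDK. The types below are local stand-ins, not the `anthropic-sdk-go` types, and the field names are assumptions; the point is only that `thinking` blocks get their own case in the switch instead of falling through and being dropped.

```go
package main

import (
	"fmt"
	"strings"
)

// contentBlock is a hypothetical stand-in for the SDK's content block union.
type contentBlock struct {
	Type     string // "text", "thinking", ...
	Text     string // set for text blocks
	Thinking string // set for thinking blocks
}

// parsedResponse mirrors the two fields this feature cares about.
type parsedResponse struct {
	Content          string
	ReasoningContent string
}

// parseBlocks accumulates text and thinking blocks separately.
// Unknown block types are ignored in this sketch.
func parseBlocks(blocks []contentBlock) parsedResponse {
	var content, reasoning strings.Builder
	for _, b := range blocks {
		switch b.Type {
		case "text":
			content.WriteString(b.Text)
		case "thinking":
			reasoning.WriteString(b.Thinking)
		}
	}
	return parsedResponse{
		Content:          content.String(),
		ReasoningContent: reasoning.String(),
	}
}

func main() {
	resp := parseBlocks([]contentBlock{
		{Type: "thinking", Thinking: "The user wants a sum."},
		{Type: "text", Text: "2 + 2 = 4"},
	})
	fmt.Println(resp.ReasoningContent)
	fmt.Println(resp.Content)
}
```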
| 180 | +### Key files |
| 181 | +- `pkg/providers/anthropic/provider.go` — `parseResponse()` and `buildParams()` |
| 182 | +- `pkg/config/config.go` — `ThinkingBudget int` in agent config |
| 183 | +- `pkg/agent/loop_llm.go` — pass `thinking_budget` in `llmOpts` |
| 184 | + |
| 185 | +--- |
| 186 | + |
| 187 | +## Verification |
| 188 | + |
| 189 | +For each feature, test as follows: |
| 190 | + |
| 191 | +1. **Inline diffs** — Edit a file via the `edit_file` tool. Confirm the tool result contains a `---`/`+++` unified diff. Test with a new file (write_file) — confirm `+++ (new file)` header. |
| 192 | + |
| 193 | +2. **Compression config** — Set `context_trigger_pct: 50` in agent config. Send messages until history grows. Confirm summarization triggers earlier than the default 75%. |
| 194 | + |
| 195 | +3. **`/yolo`** — Call `/yolo on`. Perform a tool call that normally requires approval. Confirm it runs without prompting. Call `/yolo off` and confirm approval returns. |
| 196 | + |
| 197 | +4. **`/btw`** — Send `/btw what is 2+2?`. Confirm a response is returned. Check session history — confirm the exchange was not stored. |
| 198 | + |
| 199 | +5. **`@file`/`@url`**: Send `summarize @/path/to/file`. Confirm the file contents are inlined in the user message sent to the LLM. Test with a non-workspace path and confirm the token is left as-is. |
| 200 | + |
| 201 | +6. **Per-model limits**: Set `max_tokens: 200000` for an Opus agent. Add a unit test in `pkg/providers/anthropic/` that confirms the `max_tokens` reaching `buildParams()` is capped at 128K when `ModelConfig.MaxTokens` is unset; an explicitly set `ModelConfig.MaxTokens` should win, per the design above. |
| 202 | + |
| 203 | +7. **Reasoning preservation** — Enable `thinking_budget: 8000` for an Anthropic agent. Run a multi-turn conversation with tool calls. Confirm thinking blocks appear in `/verbose` output on each turn, not just the first. |