
Commit cd66e3e

feat: implement system diagnostic tool and expand agent capabilities with new tool integrations
1 parent 9b7d7fd commit cd66e3e

106 files changed

Lines changed: 10524 additions & 470 deletions

Lines changed: 203 additions & 0 deletions
@@ -0,0 +1,203 @@
1+
# Hermes-Agent Feature Ports — Design Spec
2+
3+
**Date:** 2026-04-08
4+
**Source:** NousResearch/hermes-agent v0.2.0–v0.7.0 release notes
5+
**Scope:** 7 low-to-medium effort features worth porting into Sofia
6+
7+
---
8+
9+
## Context
10+
11+
A review of the hermes-agent releases surfaced several high-value features that are not yet present in Sofia. Features already implemented in Sofia (stale file detection, pre-exec scanning, approval gate) were excluded. Features requiring major architectural changes (pluggable memory providers, profile system) were deferred. The 7 features below are focused, bounded, and independently implementable.
12+
13+
---
14+
15+
## Feature 1: Inline Diff Previews
16+
17+
### Problem
18+
`EditFileTool` and `WriteFileTool` return bare confirmation strings like `"File edited: /path"`. Neither the user nor the LLM can see what actually changed without re-reading the file.
19+
20+
### Design
21+
- `go-difflib` is already an indirect dependency via testify — promote to direct use.
22+
- In `EditFileTool.Run()` (`pkg/tools/edit.go`): after applying the replacement, generate a unified diff of original→modified content and append it to the tool result.
23+
- In `WriteFileTool.Run()` (`pkg/tools/filesystem.go`): read existing file content before overwriting (if file exists), generate diff, append to result. If file is new, return a `+++ (new file)` header.
24+
- Diff format: standard unified diff with 3 lines of context, capped at 100 lines total (truncated with a `... N more lines omitted` trailer if longer).
25+
- The diff is returned as part of the tool result string, visible to both the LLM and the user.
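
The 100-line cap described above can be sketched independently of the diff library. The helper below is a minimal illustration, not the planned implementation: `capDiff` is a hypothetical name, it assumes the unified diff has already been produced (for example by `go-difflib`), and it assumes the trailer does not count toward the cap.

```go
package main

import (
	"fmt"
	"strings"
)

// capDiff keeps at most maxLines lines of an already generated unified diff
// and appends a trailer reporting how many lines were dropped.
// Assumption: the trailer itself is not counted toward maxLines.
func capDiff(diff string, maxLines int) string {
	if maxLines < 0 {
		maxLines = 0
	}
	lines := strings.Split(strings.TrimRight(diff, "\n"), "\n")
	if len(lines) <= maxLines {
		return diff
	}
	omitted := len(lines) - maxLines
	kept := strings.Join(lines[:maxLines], "\n")
	return fmt.Sprintf("%s\n... %d more lines omitted\n", kept, omitted)
}

func main() {
	diff := "--- a/file.txt\n+++ b/file.txt\n@@ -1,2 +1,2 @@\n-old\n+new\n context\n"
	fmt.Print(capDiff(diff, 4))
}
```

Truncating after generation keeps the diff logic and the size policy separate, so the same helper could serve both `EditFileTool` and `WriteFileTool`.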
26+
27+
### Key files
28+
- `pkg/tools/edit.go`: `EditFileTool.Run()`
29+
- `pkg/tools/filesystem.go`: `WriteFileTool.Run()`
30+
- `go.mod`: add `github.com/pmezard/go-difflib` as direct dep
31+
32+
---
33+
34+
## Feature 2: Session Compression Config
35+
36+
### Problem
37+
All summarization thresholds in `loop_summarize.go` are hardcoded (75%/90% context triggers, protected head/tail sizes, tool result truncation limit). There is no way to tune these per-agent or globally without editing source.
38+
39+
### Design
40+
Add a `SummarizationConfig` struct to `pkg/config/config.go`, nested under the agent defaults:
41+
42+
```go
43+
type SummarizationConfig struct {
44+
ContextTriggerPct int `json:"context_trigger_pct,omitempty"` // default 75
45+
ForceTriggerPct int `json:"force_trigger_pct,omitempty"` // default 90
46+
ProtectHead int `json:"protect_head,omitempty"` // default 2
47+
ProtectTailPct int `json:"protect_tail_pct,omitempty"` // default 30
48+
MinTail int `json:"min_tail,omitempty"` // default 4
49+
ToolResultTruncateChars int `json:"tool_result_truncate_chars,omitempty"` // default 200
50+
}
51+
```
52+
53+
- Add `Summarization SummarizationConfig` to `AgentDefaults` and `AgentConfig` (per-agent override).
54+
- In `loop_summarize.go`, replace each hardcoded constant with a lookup: `agent.Summarization.ContextTriggerPct` falling back to the default if zero.
55+
- `AgentInstance` already has access to agent config — no new wiring needed.
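
The zero-value fallback described above can be sketched as follows. This is an illustration under assumptions: `resolve` and `orDefault` are hypothetical helper names, the JSON tags are omitted for brevity, and the layering order (per-agent value, then global default, then built-in default) is one reading of "per-agent override".

```go
package main

import "fmt"

// SummarizationConfig repeats the fields from the spec (JSON tags omitted).
type SummarizationConfig struct {
	ContextTriggerPct       int
	ForceTriggerPct         int
	ProtectHead             int
	ProtectTailPct          int
	MinTail                 int
	ToolResultTruncateChars int
}

// orDefault treats zero as "not set" and returns def in that case.
func orDefault(v, def int) int {
	if v == 0 {
		return def
	}
	return v
}

// resolve is a hypothetical helper: the per-agent value wins, then the
// global default, then the built-in default listed in the spec.
func resolve(global, agent SummarizationConfig) SummarizationConfig {
	pick := func(a, g, def int) int { return orDefault(a, orDefault(g, def)) }
	return SummarizationConfig{
		ContextTriggerPct:       pick(agent.ContextTriggerPct, global.ContextTriggerPct, 75),
		ForceTriggerPct:         pick(agent.ForceTriggerPct, global.ForceTriggerPct, 90),
		ProtectHead:             pick(agent.ProtectHead, global.ProtectHead, 2),
		ProtectTailPct:          pick(agent.ProtectTailPct, global.ProtectTailPct, 30),
		MinTail:                 pick(agent.MinTail, global.MinTail, 4),
		ToolResultTruncateChars: pick(agent.ToolResultTruncateChars, global.ToolResultTruncateChars, 200),
	}
}

func main() {
	global := SummarizationConfig{ContextTriggerPct: 60}
	agent := SummarizationConfig{ForceTriggerPct: 95}
	fmt.Printf("%+v\n", resolve(global, agent))
}
```

One consequence of using zero as the "unset" marker is that a value such as `protect_head: 0` cannot be expressed; if that ever matters, pointer fields would be one way around it.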
56+
57+
### Key files
58+
- `pkg/config/config.go` — add `SummarizationConfig`, embed in defaults + per-agent
59+
- `pkg/agent/loop_summarize.go` — replace all hardcoded thresholds
60+
61+
---
62+
63+
## Feature 3: `/yolo` Approval Toggle
64+
65+
### Problem
66+
`ApprovalGate` has a rich config structure but there is no runtime way to bypass approval mid-session without restarting. Hermes uses `/yolo` for this.
67+
68+
### Design
69+
- Add `approvalBypass sync.Map` to `ApprovalGate` in `pkg/agent/approval.go`.
70+
- At the top of `RequiresApproval()`, check: if `approvalBypass.Load(sessionKey)` finds a stored `true` for the session, return `false`.
71+
- Expose `SetBypass(sessionKey string, on bool)` method on `ApprovalGate`.
72+
- Add `/yolo [on|off]` to `loop_commands.go` following the same pattern as `/verbose`:
73+
- No args: report current state.
74+
- `on`: call `al.approvalGate.SetBypass(sessionKey, true)`, return confirmation.
75+
- `off`: call `al.approvalGate.SetBypass(sessionKey, false)`, return confirmation.
76+
- Bypass is session-scoped and non-persistent (resets on session restart).
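
A minimal sketch of the bypass mechanics, assuming a simplified `ApprovalGate`. The `RequiresApproval` signature and the `Bypassed` helper are hypothetical, and the final `return true` stands in for the existing approval policy. Note that `sync.Map.Load` returns a value and an `ok` flag, so the stored value needs a type assertion.

```go
package main

import (
	"fmt"
	"sync"
)

// ApprovalGate is a stand-in with only the field this feature adds.
type ApprovalGate struct {
	approvalBypass sync.Map // sessionKey (string) -> bool
}

// SetBypass turns the bypass on or off for one session.
// Turning it off deletes the key so the map does not grow with stale sessions.
func (g *ApprovalGate) SetBypass(sessionKey string, on bool) {
	if on {
		g.approvalBypass.Store(sessionKey, true)
		return
	}
	g.approvalBypass.Delete(sessionKey)
}

// Bypassed reports the current state, which a no-argument /yolo can print.
func (g *ApprovalGate) Bypassed(sessionKey string) bool {
	v, ok := g.approvalBypass.Load(sessionKey)
	if !ok {
		return false
	}
	on, _ := v.(bool)
	return on
}

// RequiresApproval has a hypothetical signature; the real method may differ.
func (g *ApprovalGate) RequiresApproval(sessionKey, toolName string) bool {
	if g.Bypassed(sessionKey) {
		return false
	}
	return true // placeholder for the existing policy checks
}

func main() {
	gate := &ApprovalGate{}
	gate.SetBypass("session-1", true)
	fmt.Println(gate.RequiresApproval("session-1", "exec"))
	fmt.Println(gate.RequiresApproval("session-2", "exec"))
}
```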
77+
78+
### Key files
79+
- `pkg/agent/approval.go`: `approvalBypass sync.Map`, `SetBypass()`, check in `RequiresApproval()`
80+
- `pkg/agent/loop_commands.go`: `/yolo` handler
81+
- `pkg/agent/loop.go`: confirm `AgentLoop` holds a reference to `ApprovalGate`
82+
83+
---
84+
85+
## Feature 4: `/btw` Ephemeral Questions
86+
87+
### Problem
88+
Users sometimes want a quick side answer without polluting the session history (e.g., "what does this function do?" mid-task). Currently any message extends the history.
89+
90+
### Design
91+
- Supported in CLI and Web channels only. In gateway mode, `/btw` is treated as a regular message.
92+
- Detection: in `handleCommand()` in `loop_commands.go`, intercept `/btw <question>`. Extract the trailing text as the question.
93+
- Pass an `ephemeral bool` field through `processMessageOpts` (or equivalent options struct) to `runLLMIteration` in `loop_llm.go`.
94+
- In `runLLMIteration`, when `opts.Ephemeral` is true:
95+
- Build messages normally (using full session history as context).
96+
- After the LLM responds, **skip** `agent.Sessions.AddFullMessage()` for both the assistant message and any tool results.
97+
- Skip the summarization trigger check.
98+
- Prepend `[btw] ` to the response so the user can distinguish it visually.
99+
- The session history is unchanged after the exchange.
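
The ephemeral flow above can be illustrated with a toy session. Everything here is a stand-in: `parseBtw`, `session`, and the `ask` callback are hypothetical names, and `ask` replaces the real LLM call. The point is only the invariant that history is read but never appended to on a `/btw` turn.

```go
package main

import (
	"fmt"
	"strings"
)

const btwPrefix = "/btw"

// parseBtw extracts the question from "/btw <question>".
// It reports false for other input and for a bare "/btw".
func parseBtw(input string) (string, bool) {
	trimmed := strings.TrimSpace(input)
	if !strings.HasPrefix(trimmed, btwPrefix) {
		return "", false
	}
	rest := trimmed[len(btwPrefix):]
	if rest != "" && rest[0] != ' ' && rest[0] != '\t' {
		return "", false // e.g. "/btwfoo" is a different token
	}
	question := strings.TrimSpace(rest)
	if question == "" {
		return "", false
	}
	return question, true
}

// session is a toy stand-in for the real session store.
type session struct{ history []string }

// handle answers one message. ask stands in for the LLM call and receives
// the full history as context in both branches.
func (s *session) handle(input string, ask func(history []string, msg string) string) string {
	if q, ok := parseBtw(input); ok {
		// Ephemeral: history is used as context but nothing is stored.
		return "[btw] " + ask(s.history, q)
	}
	answer := ask(s.history, input)
	s.history = append(s.history, input, answer)
	return answer
}

func main() {
	s := &session{}
	echo := func(_ []string, msg string) string { return "re: " + msg }
	fmt.Println(s.handle("/btw what is 2+2?", echo))
	fmt.Println(len(s.history))
}
```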
100+
101+
### Key files
102+
- `pkg/agent/loop_commands.go`: `/btw` detection and dispatch
103+
- `pkg/agent/loop_llm.go`: `opts.Ephemeral` flag, conditional `AddFullMessage()` skip
104+
- `pkg/agent/loop.go`: `processMessageOpts` struct update (or wherever opts are threaded)
105+
106+
---
107+
108+
## Feature 5: `@file` / `@url` Context Injection
109+
110+
### Problem
111+
Users have no way to inline file contents or URLs into a message without using a separate tool call. Hermes supports `@/path` and `@https://url` references anywhere in message text.
112+
113+
### Design
114+
- New function `enrichMessageContent(content string, workspacePath string) string` in a new file `pkg/agent/context_refs.go`.
115+
- Called in `processMessage()` in `loop_processing.go` after guardrails (secret scrubbing + PII redaction) but before routing — so the enriched content reaches the LLM.
116+
- **Parsing:** regex `@(\./[^\s]+|/[^\s]+|https?://[^\s]+)` finds all references inline.
117+
- **File references** (`@/abs/path` or `@./rel/path`):
118+
- Resolve relative paths against the agent's workspace root.
119+
- Reject any path that escapes the workspace (path traversal guard) — leave the token as-is.
120+
- Read up to 50 KB; if larger, truncate with a trailer.
121+
- Replace the token with a fenced block: a newline, an opening triple-backtick fence, `<content>`, a closing triple-backtick fence, and a trailing newline.
122+
- **URL references** (`@https://...`):
123+
- Reuse the existing `WebFetchTool` fetch/markdown conversion logic.
124+
- Respect the same SSRF blocklist already enforced by the web tools.
125+
- Cap at 50 KB.
126+
- **Limits:** max 5 references per message. Extras are left as-is.
127+
- Errors (file not found, fetch failure) are noted inline: `[could not read @/path: file not found]`.
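
Two pieces of the design above lend themselves to a short sketch: reference extraction with the quoted regex and the workspace containment check. `findRefs` and `resolveInWorkspace` are hypothetical helper names, and the containment check is purely lexical, so it assumes symlinks are handled separately (for example with `filepath.EvalSymlinks`).

```go
package main

import (
	"fmt"
	"path/filepath"
	"regexp"
	"strings"
)

// refPattern is the regex quoted in the spec.
var refPattern = regexp.MustCompile(`@(\./[^\s]+|/[^\s]+|https?://[^\s]+)`)

// maxRefs is the per-message limit from the spec.
const maxRefs = 5

// findRefs returns up to maxRefs reference targets without the leading "@".
func findRefs(content string) []string {
	matches := refPattern.FindAllStringSubmatch(content, maxRefs)
	refs := make([]string, 0, len(matches))
	for _, m := range matches {
		refs = append(refs, m[1])
	}
	return refs
}

// resolveInWorkspace maps a file reference to a cleaned absolute path and
// reports whether that path stays inside workspacePath.
// Assumption: the check is lexical only and does not follow symlinks.
func resolveInWorkspace(workspacePath, ref string) (string, bool) {
	target := ref
	if !filepath.IsAbs(target) {
		target = filepath.Join(workspacePath, target)
	}
	target = filepath.Clean(target)
	rel, err := filepath.Rel(filepath.Clean(workspacePath), target)
	if err != nil {
		return "", false
	}
	if rel == ".." || strings.HasPrefix(rel, ".."+string(filepath.Separator)) {
		return "", false // escapes the workspace
	}
	return target, true
}

func main() {
	msg := "compare @./a.go with @/etc/passwd"
	for _, ref := range findRefs(msg) {
		path, ok := resolveInWorkspace("/ws", ref)
		fmt.Println(ref, path, ok)
	}
}
```

Because the pattern consumes every non-whitespace character, a reference followed directly by punctuation (such as a sentence-ending period) would include that character in the path; trimming trailing punctuation before resolving is one possible refinement.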
128+
129+
### Key files
130+
- `pkg/agent/context_refs.go` — new file, `enrichMessageContent()`
131+
- `pkg/agent/loop_processing.go` — call `enrichMessageContent()` after guardrails
132+
- `pkg/tools/web_fetch.go` — reuse fetch logic (or extract a shared helper)
133+
134+
---
135+
136+
## Feature 6: Per-Model Output Limits
137+
138+
### Problem
139+
`max_tokens` is set globally per agent. Model-specific limits (e.g., Anthropic's 128K cap on Opus 4.6, 64K on Sonnet 4.6) are not enforced. A misconfigured `max_tokens` can cause API errors.
140+
141+
### Design
142+
- `ModelConfig.MaxTokens` already exists in `pkg/config/config_providers.go` but is not wired into the LLM call.
143+
- In `loop_llm.go`, after building `llmOpts["max_tokens"] = agent.MaxTokens`, check if the resolved model config has a non-zero `MaxTokens` and use it as an override.
144+
- Add Anthropic-specific defaults to `provider_defaults.go` (the file already exists but is untracked in git):
145+
- `claude-opus-4` family → 128K output cap
146+
- `claude-sonnet-4` family → 64K output cap
147+
- Applied only when `ModelConfig.MaxTokens` is zero (user-set value wins).
148+
- **Anthropic 429 long-context handling:** when the Anthropic provider receives a 429 with `"long-context-tier"` in the error body, retry the request with `context_window` reduced to 200K. This goes in `pkg/providers/anthropic/provider.go`.
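
The precedence rules above could be resolved in one small function. This sketch makes several assumptions: `resolveMaxTokens` and `providerOutputCaps` are hypothetical names, "128K" and "64K" are read as 128,000 and 64,000, and the family default is treated as a clamp on the agent value rather than a replacement, which is only one reading of the design. The model strings in `main` are illustrative.

```go
package main

import (
	"fmt"
	"strings"
)

// providerOutputCaps holds family-level output caps keyed by model prefix.
// Assumption: 128K and 64K are interpreted as 128,000 and 64,000 tokens.
var providerOutputCaps = []struct {
	prefix string
	limit  int
}{
	{"claude-opus-4", 128_000},
	{"claude-sonnet-4", 64_000},
}

// resolveMaxTokens picks the max_tokens value for a request.
// A non-zero modelMax (user-set ModelConfig.MaxTokens) always wins.
// Otherwise a matching family cap clamps the agent-level value.
func resolveMaxTokens(agentMax, modelMax int, model string) int {
	if modelMax != 0 {
		return modelMax
	}
	for _, d := range providerOutputCaps {
		if strings.HasPrefix(model, d.prefix) {
			if agentMax == 0 || agentMax > d.limit {
				return d.limit
			}
			return agentMax
		}
	}
	return agentMax
}

func main() {
	fmt.Println(resolveMaxTokens(200_000, 0, "claude-opus-4-6"))
	fmt.Println(resolveMaxTokens(8_192, 32_000, "claude-opus-4-6"))
}
```

If the intent is instead that the family default replaces the agent value outright, the inner `if` collapses to `return d.limit`.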
149+
150+
### Key files
151+
- `pkg/agent/loop_llm.go` — wire `ModelConfig.MaxTokens` into `llmOpts`
152+
- `pkg/config/provider_defaults.go` — already exists (untracked), add Anthropic model caps
153+
- `pkg/providers/anthropic/provider.go` — 429 long-context retry
154+
155+
---
156+
157+
## Feature 7: Reasoning Block Preservation (Anthropic)
158+
159+
### Problem
160+
The Anthropic provider's `parseResponse()` silently drops `thinking` content blocks — they are never parsed into `ReasoningContent`. Conversely, `buildParams()` never sends thinking blocks back in subsequent assistant messages. Extended thinking is therefore broken for multi-turn tool-use conversations.
161+
162+
### Design
163+
164+
**In `parseResponse()` (`pkg/providers/anthropic/provider.go`):**
165+
- Add `case "thinking":` to the content block switch.
166+
- Use `block.AsThinking()` to get the thinking block.
167+
- Concatenate into `ReasoningContent` (same as how `content` accumulates text blocks).
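
The accumulation step can be shown without the SDK. The types below are local stand-ins, not the `anthropic-sdk-go` types, and `parseBlocks` is a hypothetical name; the real `parseResponse()` would switch on the SDK's content block union instead.

```go
package main

import (
	"fmt"
	"strings"
)

// contentBlock is a local stand-in for an SDK content block.
type contentBlock struct {
	Type     string // "text", "thinking", "tool_use", ...
	Text     string
	Thinking string
}

// parsedResponse holds the two accumulated strings.
type parsedResponse struct {
	Content          string
	ReasoningContent string
}

// parseBlocks concatenates text blocks into Content and thinking blocks
// into ReasoningContent, ignoring other block types.
func parseBlocks(blocks []contentBlock) parsedResponse {
	var content, reasoning strings.Builder
	for _, b := range blocks {
		switch b.Type {
		case "text":
			content.WriteString(b.Text)
		case "thinking":
			reasoning.WriteString(b.Thinking)
		}
	}
	return parsedResponse{
		Content:          content.String(),
		ReasoningContent: reasoning.String(),
	}
}

func main() {
	resp := parseBlocks([]contentBlock{
		{Type: "thinking", Thinking: "The user wants a sum."},
		{Type: "text", Text: "2+2 is 4."},
	})
	fmt.Printf("%q\n%q\n", resp.ReasoningContent, resp.Content)
}
```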
168+
169+
**In `buildParams()` (`pkg/providers/anthropic/provider.go`):**
170+
- For assistant messages where `msg.ReasoningContent != ""`, prepend a `ThinkingBlockParam` content block before the text/tool_use blocks.
171+
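- The Anthropic SDK is expected to provide a thinking block param, sketched here as `anthropic.ThinkingBlockParam{Type: "thinking", ThinkingText: msg.ReasoningContent}`; the field names are unverified and must be checked against the SDK. Anthropic's API also expects a thinking block's `signature` to be returned unmodified with the thinking text, so the signature likely needs to be stored alongside `ReasoningContent`.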
172+
173+
**Enabling extended thinking:**
174+
- Check `options["thinking_budget"]` (int). If non-zero, set `params.Thinking = anthropic.ThinkingParam{Type: "enabled", BudgetTokens: budget}` on the request params.
175+
- Wire `thinking_budget` from agent config into `llmOpts` in `loop_llm.go` (alongside `max_tokens`).
176+
- Add `ThinkingBudget int` to agent config (defaults to 0 = disabled).
177+
178+
**Note:** Requires `anthropic-sdk-go` to expose `ThinkingBlockParam`. Verify that it exists in the current SDK version before implementing.
179+
180+
### Key files
181+
- `pkg/providers/anthropic/provider.go`: `parseResponse()` and `buildParams()`
182+
- `pkg/config/config.go`: `ThinkingBudget int` in agent config
183+
- `pkg/agent/loop_llm.go`: pass `thinking_budget` in `llmOpts`
184+
185+
---
186+
187+
## Verification
188+
189+
For each feature, test as follows:
190+
191+
1. **Inline diffs** — Edit a file via the `edit_file` tool. Confirm the tool result contains a `---`/`+++` unified diff. Test with a new file (write_file) — confirm `+++ (new file)` header.
192+
193+
2. **Compression config** — Set `context_trigger_pct: 50` in agent config. Send messages until history grows. Confirm summarization triggers earlier than the default 75%.
194+
195+
3. **`/yolo`** — Call `/yolo on`. Perform a tool call that normally requires approval. Confirm it runs without prompting. Call `/yolo off` and confirm approval returns.
196+
197+
4. **`/btw`** — Send `/btw what is 2+2?`. Confirm a response is returned. Check session history — confirm the exchange was not stored.
198+
199+
5. **`@file`/`@url`** — Send `summarize @/path/to/file`. Confirm the file contents are inlined in the user message sent to the LLM. Test with a path outside the workspace and confirm the token is left as-is.
200+
201+
6. **Per-model limits** — Set `max_tokens: 200000` for an Opus agent. Add a unit test in `pkg/providers/anthropic/` that confirms the resolved `max_tokens` in `buildParams()` is capped at 128K regardless of the input value.
202+
203+
7. **Reasoning preservation** — Enable `thinking_budget: 8000` for an Anthropic agent. Run a multi-turn conversation with tool calls. Confirm thinking blocks appear in `/verbose` output on each turn, not just the first.

pkg/agent/context.go

Lines changed: 3 additions & 3 deletions
@@ -42,8 +42,8 @@ type ContextBuilder struct {
 
 	// Frozen memory snapshot: captured once per session to preserve prompt cache.
 	// Mid-session memory writes update the database but don't invalidate the cached prompt.
-	frozenMemory string
-	frozenMemoryOnce sync.Once
+	frozenMemory string
+	frozenMemoryOnce sync.Once
 	frozenMemoryVersion int64 // tracks memory version for auto-refresh
 }
 
@@ -239,7 +239,7 @@ When delegating to subagents, tell them which skills to use: "Read workspace/ski
 	// Mid-session memory writes go to disk but don't change the system prompt.
 	// Auto-refreshes if memory version has changed significantly.
 	currentVersion := cb.memory.GetVersion()
-	if cb.frozenMemoryVersion == 0 || currentVersion - cb.frozenMemoryVersion > 5 {
+	if cb.frozenMemoryVersion == 0 || currentVersion-cb.frozenMemoryVersion > 5 {
 		// First call or memory has changed significantly (5+ updates)
 		cb.frozenMemoryOnce = sync.Once{}
 		cb.frozenMemoryOnce.Do(func() {

pkg/agent/context_cache.go

Lines changed: 0 additions & 1 deletion
@@ -239,4 +239,3 @@ func skillFilesModifiedSince(skillsDir string, t time.Time) bool {
 	}
 	return changed
 }
-
