Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
256 changes: 256 additions & 0 deletions docs/session-decomposition.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,256 @@
# Session God-Object Decomposition — Design Sketch

> Status: **DRAFT / NOT YET IMPLEMENTED**
> Author: opencode session
> Date: 2026-06-12
> Scope: `hawk/internal/engine/session.go` (the 35-collaborator `Session` struct)

## Problem

`Session` at `hawk/internal/engine/session.go:42-141` is a god object:

- 35 fields, ~30 of them `*T` pointer collaborators with optional behavior
- Every new feature adds a field (Beliefs, Critic, Trajectory, Steer­ing, etc.)
- Construction requires wiring 30+ fields; unit testing is impractical
- `NewSessionWithClient` (lines 149-187) already has a 35-line constructor body
- The `AgentLoop` 833-line function reads/writes ~25 of these fields, creating a hidden web of dependencies
- New contributors can't tell which field is required vs optional vs diagnostic

## Goal

Break `Session` into ~6 cohesive sub-services, each with:
- a single responsibility
- an interface (so tests can use stubs)
- a clear lifecycle (created once in `NewSessionWithClient`, passed to `agentLoop`)

The `agentLoop` should consume these sub-services as named dependencies — no implicit `s.Beliefs.Size()` reach-throughs.

## Proposed Decomposition

### 1. `ChatService` — owns the LLM transport

**Owns:** `client ChatClient`, `provider string`, `model string`, `apiKeys map[string]string`, `Router *modelPkg.Router`, `DeploymentRouting bool`, `RateLimiter *ratelimit.Limiter`, `OutputSchema string`, `GLMThinkingEnabled *bool`, `ContCfg types.ContinuationConfig`, `ChatOptions{}` builder.

**Methods:**
- `BuildOptions(systemPrompt string, tools []client.EyrieTool, activeModel string, maxTokens int) types.ChatOptions`
- `Chat(ctx, msgs, opts) (*types.EyrieResponse, error)` — non-streaming, used by background goroutines
- `Stream(ctx, msgs, opts) (*types.StreamResult, error)` — with retry, circuit-breaker record, rate-limit wait
- `RecordSuccess(latency time.Duration)` / `RecordFailure(err error)`
- `BuildContinuationConfig() types.ContinuationConfig`

**File:** `hawk/internal/engine/chat_service.go` (~150 LOC)

### 2. `MemoryService` — owns yaad bridge + recall/remember

**Owns:** `Memory MemoryRecaller`, `YaadBridge *memory.YaadBridge`, `EnhancedMemory *memory.EnhancedMemoryManager`, `SkillDistiller *memory.SkillDistiller`, `Sleeptime *memory.SleeptimeAgent`, `Activity *memory.ActivityTracker`, `AgentsAccum *prompts.AgentsAccum`, `FewShotStore *FewShotStore`, `AdaptivePrompt *AdaptivePrompt`.

**Methods:**
- `RecallContext(ctx, lastUserMsg string, budget int) (string, error)` — unifies yaad + few-shot + agents-accum
- `Remember(ctx, content, category string)` — wraps memory.Remember
- `RunSleeptimeConsolidation(ctx, provider ChatService, messages []types.EyrieMessage)` — background
- `RunSkillDistillation(ctx, provider ChatService, ...)` — background
- `RecordFeedback(ctx, content string, action string)` — wraps EnhancedMemory feedback
- `ShouldRemember(text string) bool` — heuristic for auto-remember

**File:** `hawk/internal/engine/memory_service.go` (~200 LOC)

### 3. `ToolService` — owns the registry + tool execution

**Owns:** `registry *tool.Registry`, `ContainerExecutor tool.ContainerExecutor`, `ContainerRequired bool`, `Snapshots SnapshotTracker`, `BackgroundManager *tool.BackgroundAgentManager`, `AllowedDirs []string`, `Protected PathProtector`.

**Methods:**
- `Classify(calls []types.ToolCall) (concurrent, sequential []types.ToolCall)` — uses `tool.IsReadOnly`
- `ExecuteAll(ctx, calls []types.ToolCall, ch chan<- StreamEvent, turn int, intent string) []toolExecResult` — the 15-stage post-call pipeline moved out of `stream_tool_exec.go`
- `EstimateBlastRadius(calls []types.ToolCall) BlastReport`
- `SpawnBackgroundAgent(ctx, prompt string) (id string, err error)`
- `PollBackgroundAgent(id string) (output string, done bool, err error)`

**File:** `hawk/internal/engine/tool_service.go` (~400 LOC — biggest, owns the 15-stage pipeline)

### 4. `PermissionService` — owns the safety layer

**Owns:** `Perm *PermissionEngine`, `Permissions *PermissionMemory`, `AutoMode *permissions.AutoModeState`, `Classifier *permissions.Classifier`, `BypassKill *permissions.BypassKillswitch`, `Mode PermissionMode`, `Autonomy AutonomyLevel`, `PermissionFn func(PermissionRequest)`, `Approval *ApprovalGate`, `MaxBudgetUSD float64`.

**Methods:**
- `CheckTool(ctx, info ToolCallInfo) (granted bool, denyMsg string)` — wraps Perm.CheckTool + Approval.CheckApproval
- `ExceedsBudget(cost float64) bool`
- `RecordToolCall(name string)`
- `RecordCost(cost float64)`
- `SetMode(mode string) error` — validates + applies
- `SetMaxTurns(n int)`, `SetMaxBudgetUSD(usd float64)`

**File:** `hawk/internal/engine/permission_service.go` (~150 LOC)

### 5. `LifecycleService` — owns the self-improvement loop

**Owns:** `Lifecycle *SessionLifecycle`, `Reflector *Reflector`, `Beliefs *BeliefState`, `Backtrack *BacktrackEngine`, `Limits *LimitTracker`, `Critic *Critic`, `Backtrack`, `Trajectory *TrajectoryDistiller`, `Shadow *branching.ShadowWorkspace`, `Cascade *branching.CascadeRouter`, `Steering *SteeringQueue`, `LoopDet *LoopDetector`, `Snowball *branching.SnowballDetector`.

**Methods:**
- `OnSessionStart(ctx, lastUserMsg string) (learnedCtx string)`
- `OnSessionEnd(ctx, success bool, duration time.Duration, messages []types.EyrieMessage)`
- `SelectModel(taskType, currentModel, hint string) string` — wraps Cascade
- `OnToolFailure(ctx, intent string, history []types.EyrieMessage, errOutput string) (*Reflection, error)`
- `RecordDecision(turn int, toolNames string, msgs []types.EyrieMessage)`
- `CheckLimits(turnCount int) (cont bool, reason string)` — wraps Limits
- `RecordTurn(turn int)`, `RecordToolCall(name string)`
- `DetectDoomLoop(steps []ToolStep) bool`
- `InjectSteering(messages []types.EyrieMessage) []types.EyrieMessage`

**File:** `hawk/internal/engine/lifecycle_service.go` (~250 LOC)

### 6. `PersistenceService` — owns checkpoint, session, yaad snapshot

**Owns:** `persistID string`, `checkpointMgr *session.CheckpointManager`, `lastPromptTokens int`, `lastCompletionTokens int`, `ConvoDAG *storage.DAG`, `OnCompaction OnCompaction`, `PinnedMessages int`, `AutoCompactThresholdPct int`, `ContextWindowCached int`, `AutoCompactor *AutoCompactor`, `CompactSplit`, `CompactProviderNative`, `CompactStrategyEngine`, `Files *FileTracker`.

**Methods:**
- `PersistID() string`, `SetPersistID(id string)`
- `ManageContextBeforeTurn(ctx) (strategy string, compacted bool)`
- `Compact()`, `SmartCompact()`, `ShouldAutoCompact()`, `ShouldCompactByBudget()`
- `AddUser(content string)`, `AddAssistant(content string)`, `AddUserWithImage(...)`, `AddUserWithAttachment(...)`
- `AddUserWithDocumentText(content, label, extracted string)`
- `AppendSystemContext(content string)`, `ReplaceSystemContextSection(header, content string)`
- `Messages() []types.EyrieMessage`, `RawMessages()`, `MessageCount()`, `LoadMessages([]types.EyrieMessage)`
- `RemoveLastExchange()`
- `ConvoHead() string`, `ForkConversation(nodeID string) (string, error)`, `SwitchBranch(nodeID string) error`, `ListBranches(nodeID string) ([]*storage.DAGNode, error)`
- `CheckpointOnCompaction(strategy string, before, after int, manual bool)`
- `TokenTracking() (prompt, completion int)`, `RecordAPIUsage(prompt, completion int)`

**File:** `hawk/internal/engine/persistence_service.go` (~300 LOC)

## What stays on `Session`

After the refactor, `Session` becomes a thin facade:

```go
type Session struct {
// Identity (still here)
ID string
Provider string
Model string
SystemPrompt string
// The 6 sub-services
Chat *ChatService
Memory *MemoryService
Tools *ToolService
Permission *PermissionService
Lifecycle *LifecycleService
Persistence *PersistenceService
// Observability pass-through
Tracer *oteltrace.Tracer
Metrics *metrics.Registry
Log *logger.Logger
// Loop control (still on Session because the agent loop reads them every turn)
Sandbox *DiffSandbox
Plan *PlanState
OutputSchema string
Teach TeachConfig
}
```

~15 fields total, down from 35. The 6 sub-services are constructed once in `NewSessionWithClient` and stay for the session's lifetime.

## `agentLoop` becomes a 250-line orchestrator

```go
func (s *Session) agentLoop(ctx context.Context, ch chan<- StreamEvent) {
defer close(ch)
defer s.Lifecycle.OnSessionEnd(ctx, ...)

hooks.ExecuteAsync(ctx, hooks.EventSessionStart, ...)
s.Lifecycle.OnSessionStart(ctx, s.lastUserMessage())

s.Persistence.ManageContextBeforeTurn(ctx)
msgs := s.Persistence.Messages()
sysCtx, err := s.Memory.RecallContext(ctx, s.lastUserMessage(), 2000)
if err == nil && sysCtx != "" {
s.Persistence.AppendSystemContext(sysCtx)
}

activeModel := s.Lifecycle.SelectModel(taskType, s.Model, "")

for turn := 0; ; turn++ {
if !s.Lifecycle.CheckLimits(turn) { return }

opts := s.Chat.BuildOptions(s.system, s.Tools.Registry().EyrieTools(), activeModel, maxTok)
result, err := s.Chat.Stream(ctx, s.Persistence.Messages(), opts)
if err != nil { /* error path */ }
defer result.Close()

// Consume stream: textContent, toolCalls, sawThinking, stopReason
...

if len(toolCalls) == 0 {
s.Persistence.AddAssistant(textContent)
s.Memory.AutoRemember(ctx, textContent)
ch <- StreamEvent{Type: "done"}
return
}
s.Persistence.AddAssistantWithToolUse(textContent, toolCalls)
results := s.Tools.ExecuteAll(ctx, toolCalls, ch, turn, textContent)
s.Persistence.AddToolResults(results)
}
}
```

The loop body shrinks from 800 lines to ~250 because:
- 15-stage tool pipeline moves to `ToolService.ExecuteAll`
- 20+ post-call side effects (beliefs, memory, critic, shadow, sandbox, snapshot, agents-accum) move to sub-service methods
- Stream retry with ctx-cancel moves to `ChatService.Stream`
- Permission/approval flow moves to `PermissionService.CheckTool`

## Migration Plan

1. **Phase 1** (this PR's 4 fixes): just the time.Sleep, ReadOnlyTools, AGENTS.md, self-review fixes. **No Session refactor yet.**
2. **Phase 2** (separate PR): extract `ChatService` first because it has the fewest field touches. Verify `agentLoop` builds and tests pass.
3. **Phase 3**: extract `MemoryService` (most fields, but only consumed in the preamble/postamble of `agentLoop`).
4. **Phase 4**: extract `PermissionService` (smallest surface).
5. **Phase 5**: extract `LifecycleService` (biggest behavioral surface, but mostly reads `messages` and `tools`).
6. **Phase 6**: extract `PersistenceService` (last because it owns the `messages []types.EyrieMessage` field which is touched everywhere).
7. **Phase 7**: extract `ToolService` (depends on 4, 5; do last).
8. **Phase 8**: delete fields from `Session`, add backwards-compatible shims that delegate to the new services, mark for removal in v0.2.0.

Each phase is a separate PR with a focused test suite.

## Testing Strategy

After the refactor, the following tests should be trivial (impossible to write today):

```go
func TestAgentLoopRespectsContextCancellation(t *testing.T) {
chat := &mockChatService{streamErr: ctx.Err()}
session := &Session{Chat: chat, /* no other services wired */}
stream, _ := session.Stream(ctx)
// Assert: loop exits on first event
}

func TestToolExecutionRejectsWithoutPermission(t *testing.T) {
perm := &mockPermissionService{granted: false, denyMsg: "nope"}
session := &Session{Permission: perm}
result := session.Tools.ExecuteAll(ctx, calls, ch, 0, "")
// Assert: all results are isErr=true with "nope" content
}

func TestMemoryRecallReturnsEmptyWhenNoBridge(t *testing.T) {
session := &Session{Memory: &MemoryService{}} // no YaadBridge
ctx, _ := session.Memory.RecallContext(ctx, "test", 100)
assert.Equal(t, "", ctx)
}
```

These tests don't need to construct a `Session` anymore; they can construct just the sub-service under test.

## Open Questions

1. Should `SubSession` (the sub-agent factory at `session.go:207-216`) become a method on `Session` or move to a dedicated `SubAgent` service? Currently it copies the parent's LLM transport — that's a `ChatService` concern, not a `Session` concern. Proposal: move to `SubAgentService`.
2. Should the sub-service interfaces be exported (so external test packages can mock them) or kept internal? The `MockChatService` test above would require either exporting or building test-helper shims.
3. Where do `OutputSchema`, `GLMThinkingEnabled`, `TeachConfig`, `Sandbox`, `Plan` belong? They're all chat/options concerns. Proposal: fold into `ChatService` or extract a `SessionOptions` struct that `ChatService.BuildOptions` reads.

## Estimated Effort

- Phase 1: ✅ done (this PR)
- Phases 2-7: ~6 PRs, ~1500 LOC moved, ~800 LOC of `agentLoop` removed, ~600 LOC of new sub-service files added
- Total: ~5 days of focused refactor work + tests

## Status

**NOT YET IMPLEMENTED.** The above is a design proposal pending review. The 4 concrete fixes (time.Sleep, ReadOnlyTools, AGENTS.md, self-review revert) are merged independently of this refactor.
12 changes: 9 additions & 3 deletions internal/engine/stream.go
Original file line number Diff line number Diff line change
Expand Up @@ -14,6 +14,7 @@ import (
"github.com/GrayCodeAI/hawk/internal/hooks"
"github.com/GrayCodeAI/hawk/internal/observability/oteltrace"
"github.com/GrayCodeAI/hawk/internal/resilience/retry"
"github.com/GrayCodeAI/hawk/internal/tool"
)

// Stream runs the agentic loop: LLM → tool_use → execute → loop.
Expand Down Expand Up @@ -476,7 +477,13 @@ func (s *Session) agentLoop(ctx context.Context, ch chan<- StreamEvent) {
"reason": retryReason,
"error": streamErr.Error(),
})
time.Sleep(time.Duration(streamAttempt+1) * time.Second)
select {
case <-time.After(time.Duration(streamAttempt+1) * time.Second):
case <-ctx.Done():
ch <- StreamEvent{Type: "error", Content: "stream retry cancelled: " + ctx.Err().Error()}
result.Close()
return
}

// Notify consumer to discard previously streamed content for this turn.
ch <- StreamEvent{Type: "retry", Content: fmt.Sprintf("retrying after %s (attempt %d)", retryReason, streamAttempt+2)}
Expand Down Expand Up @@ -713,9 +720,8 @@ func (s *Session) agentLoop(ctx context.Context, ch chan<- StreamEvent) {
// Auto-snapshot after write operations for granular undo
if s.Snapshots != nil && len(toolCalls) > 0 {
var writeNames []string
safeConcurrent := map[string]bool{"Read": true, "Grep": true, "Glob": true, "LS": true, "WebSearch": true, "WebFetch": true, "ToolSearch": true}
for _, tc := range toolCalls {
if !safeConcurrent[tc.Name] {
if !tool.IsReadOnly(tc.Name) {
writeNames = append(writeNames, tc.Name)
}
}
Expand Down
44 changes: 24 additions & 20 deletions internal/engine/stream_tool_exec.go
Original file line number Diff line number Diff line change
Expand Up @@ -26,9 +26,8 @@ type toolExecResult struct {

// classifyToolCalls splits tool calls into concurrent (read-only) and sequential (write) batches.
func classifyToolCalls(calls []types.ToolCall) (concurrent, sequential []types.ToolCall) {
safeConcurrent := map[string]bool{"Read": true, "Grep": true, "Glob": true, "LS": true, "WebSearch": true, "WebFetch": true, "ToolSearch": true}
for _, tc := range calls {
if safeConcurrent[tc.Name] {
if tool.IsReadOnly(tc.Name) {
concurrent = append(concurrent, tc)
} else {
sequential = append(sequential, tc)
Expand Down Expand Up @@ -207,28 +206,33 @@ func (s *Session) executeSingleTool(ctx context.Context, tc types.ToolCall, ch c
if newContent, readErr := readFileContent(preEditPath); readErr == nil && newContent != preEditContent {
reviewResult, reviewErr := ReviewBeforeWrite(ctx, s.client, s.model, intentText, preEditPath, preEditContent, newContent)
if reviewErr == nil && reviewResult != nil && !reviewResult.Approved {
// Revert the file to its original state
// Revert the file to its original state. If revert fails we
// MUST surface that as a hard tool error: silently leaving
// the rejected diff on disk would let a downstream turn
// build on top of code the LLM just said was wrong.
var revertErr error
if preEditContent == "" {
if removeErr := os.Remove(preEditPath); removeErr != nil {
s.log.Warn("failed to remove file during self-review revert", map[string]interface{}{
"path": preEditPath,
"error": removeErr.Error(),
})
}
revertErr = os.Remove(preEditPath)
} else {
if writeErr := os.WriteFile(preEditPath, []byte(preEditContent), 0o644); writeErr != nil {
s.log.Warn("failed to revert file during self-review", map[string]interface{}{
"path": preEditPath,
"error": writeErr.Error(),
})
}
revertErr = os.WriteFile(preEditPath, []byte(preEditContent), 0o644)
}
issueStr := "Self-review found issues: " + strings.Join(reviewResult.Issues, "; ")
if len(reviewResult.Suggestions) > 0 {
issueStr += ". Suggestions: " + strings.Join(reviewResult.Suggestions, "; ")
if revertErr != nil {
s.log.Error("self-review revert failed; rejecting diff loudly", map[string]interface{}{
"path": preEditPath,
"error": revertErr.Error(),
})
output = fmt.Sprintf("Self-review rejected the change AND the revert failed: %s. "+
"Original review issues: %s. Manual intervention required.",
revertErr.Error(), strings.Join(reviewResult.Issues, "; "))
isErr = true
} else {
issueStr := "Self-review found issues: " + strings.Join(reviewResult.Issues, "; ")
if len(reviewResult.Suggestions) > 0 {
issueStr += ". Suggestions: " + strings.Join(reviewResult.Suggestions, "; ")
}
output = issueStr + ". Please fix these issues and try again."
isErr = true
}
output = issueStr + ". Please fix these issues and try again."
isErr = true
} else if reviewErr == nil && reviewResult != nil && reviewResult.Approved {
// Append diff summary to output for TUI display
diffSummary := generateDiffSummary(preEditContent, newContent, preEditPath)
Expand Down
Loading
Loading