docs: add Session god-object decomposition design sketch#33
Merged
Conversation
ai_passage.md was a 53-line, ~1000-word essay on the history and ethics of AI in general — entirely unrelated to the hawk project, no README/AGENTS.md/CHANGELOG.md reference to it. It looks like LLM-generated filler committed in '99261ca Fix CI formatting and toolchain hygiene' to satisfy a 'must have an essay' requirement that no longer applies. Untrack and delete.
…revert failure
Three related fixes in the hawk agent loop:
1. **stream.go Stream retry no longer blocks ctx cancellation.**
Previously: `time.Sleep(time.Duration(streamAttempt+1) * time.Second)` — a
user-initiated cancel during a reasoning-only retry's backoff was ignored for
up to 3s. Now uses a select with ctx.Done() so cancel is observed immediately,
the in-flight stream is closed, and the loop exits with a structured error event.
2. **stream_tool_exec.go Self-review-before-apply now returns a hard error
when the revert itself fails.** Previously: if the LLM said 'this diff is bad'
and we tried to revert (os.Remove or os.WriteFile) and that syscall failed, the
code only logged s.log.Warn and proceeded with the rejected diff still on disk.
Now: capture the revert error, log it at Error level, and surface a hard tool
error ('Self-review rejected the change AND the revert failed: <err>. Manual
intervention required.') so the LLM can't continue building on top of code it
just flagged as broken.
3. **internal/tool/tool.go now exports ReadOnlyTools + IsReadOnly() so the
safe-concurrent allowlist has a single source of truth.** The map was previously
duplicated in stream.go:716 and stream_tool_exec.go:29 with identical content
— drift waiting to happen. Both call sites now go through tool.IsReadOnly(name),
which canonicalises aliases (read/file_read, ls, etc.) before lookup. A new
test (TestIsReadOnly, TestReadOnlyToolsSetContainsExpectedNames) locks the
contract so a future removal of e.g. 'Read' from the allowlist fails CI in
internal/tool/ rather than silently breaking concurrency classification in
internal/engine/.
hawk/internal/engine/session.go has 35 collaborators, 30 of them optional *T pointer fields. The agentLoop is an 833-line function that reads ~25 of these fields, creating a hidden web of dependencies. New contributors can't tell which field is required vs optional vs diagnostic, and unit tests can't construct a Session without wiring all 30 fields. This is a design proposal (NOT yet implemented) for breaking Session into 6 cohesive sub-services: - ChatService (150 LOC) — LLM transport, retry, circuit-breaker - MemoryService (200 LOC) — yaad bridge, recall, sleeptime, skill distill - ToolService (400 LOC) — registry, 15-stage post-call pipeline - PermissionService (150 LOC) — guardian, approval, budget tracking - LifecycleService (250 LOC) — self-improvement, beliefs, doom-loop - PersistenceService (300 LOC) — checkpoint, compaction, branching DAG The proposed 7-PR migration sequence is documented in the doc. Phases 2-7 are each independent and additive; each gets its own focused test suite. The 4 concrete fixes in the current PR (time.Sleep, ReadOnlyTools, AGENTS.md, self-review revert) are independent of this refactor and ship ahead of it. See: docs/session-decomposition.md (this file)
2 tasks
Patel230
added a commit
that referenced
this pull request
Jun 13, 2026
Bump external/eyrie to dce29ff (#33 squash) and external/trace to 6feded7 so all submodule checkouts match their origin/main heads. Co-authored-by: Cursor <cursoragent@cursor.com>
Patel230
added a commit
that referenced
this pull request
Jun 15, 2026
* docs: remove generic LLM boilerplate ai_passage.md
ai_passage.md was a 53-line, ~1000-word essay on the history and
ethics of AI in general — entirely unrelated to the hawk project,
no README/AGENTS.md/CHANGELOG.md reference to it. It looks like
LLM-generated filler committed in '99261ca Fix CI formatting and
toolchain hygiene' to satisfy a 'must have an essay' requirement
that no longer applies. Untrack and delete.
* fix(engine): respect ctx cancel on stream retry, surface self-review revert failure
Three related fixes in the hawk agent loop:
1. **stream.go Stream retry no longer blocks ctx cancellation.**
Previously: `time.Sleep(time.Duration(streamAttempt+1) * time.Second)` — a
user-initiated cancel during a reasoning-only retry's backoff was ignored for
up to 3s. Now uses a select with ctx.Done() so cancel is observed immediately,
the in-flight stream is closed, and the loop exits with a structured error event.
2. **stream_tool_exec.go Self-review-before-apply now returns a hard error
when the revert itself fails.** Previously: if the LLM said 'this diff is bad'
and we tried to revert (os.Remove or os.WriteFile) and that syscall failed, the
code only logged s.log.Warn and proceeded with the rejected diff still on disk.
Now: capture the revert error, log it at Error level, and surface a hard tool
error ('Self-review rejected the change AND the revert failed: <err>. Manual
intervention required.') so the LLM can't continue building on top of code it
just flagged as broken.
3. **internal/tool/tool.go now exports ReadOnlyTools + IsReadOnly() so the
safe-concurrent allowlist has a single source of truth.** The map was previously
duplicated in stream.go:716 and stream_tool_exec.go:29 with identical content
— drift waiting to happen. Both call sites now go through tool.IsReadOnly(name),
which canonicalises aliases (read/file_read, ls, etc.) before lookup. A new
test (TestIsReadOnly, TestReadOnlyToolsSetContainsExpectedNames) locks the
contract so a future removal of e.g. 'Read' from the allowlist fails CI in
internal/tool/ rather than silently breaking concurrency classification in
internal/engine/.
* docs: add Session god-object decomposition design sketch
hawk/internal/engine/session.go has 35 collaborators, 30 of them optional
*T pointer fields. The agentLoop is an 833-line function that reads ~25 of
these fields, creating a hidden web of dependencies. New contributors can't
tell which field is required vs optional vs diagnostic, and unit tests can't
construct a Session without wiring all 30 fields.
This is a design proposal (NOT yet implemented) for breaking Session into 6
cohesive sub-services:
- ChatService (150 LOC) — LLM transport, retry, circuit-breaker
- MemoryService (200 LOC) — yaad bridge, recall, sleeptime, skill distill
- ToolService (400 LOC) — registry, 15-stage post-call pipeline
- PermissionService (150 LOC) — guardian, approval, budget tracking
- LifecycleService (250 LOC) — self-improvement, beliefs, doom-loop
- PersistenceService (300 LOC) — checkpoint, compaction, branching DAG
The proposed 7-PR migration sequence is documented in the doc. Phases 2-7
are each independent and additive; each gets its own focused test suite.
The 4 concrete fixes in the current PR (time.Sleep, ReadOnlyTools, AGENTS.md,
self-review revert) are independent of this refactor and ship ahead of it.
See: docs/session-decomposition.md (this file)
Patel230
added a commit
that referenced
this pull request
Jun 15, 2026
Bump external/eyrie to dce29ff (#33 squash) and external/trace to 6feded7 so all submodule checkouts match their origin/main heads.
Patel230
added a commit
that referenced
this pull request
Jun 18, 2026
* docs: remove generic LLM boilerplate ai_passage.md
ai_passage.md was a 53-line, ~1000-word essay on the history and
ethics of AI in general — entirely unrelated to the hawk project,
no README/AGENTS.md/CHANGELOG.md reference to it. It looks like
LLM-generated filler committed in '99261ca Fix CI formatting and
toolchain hygiene' to satisfy a 'must have an essay' requirement
that no longer applies. Untrack and delete.
* fix(engine): respect ctx cancel on stream retry, surface self-review revert failure
Three related fixes in the hawk agent loop:
1. **stream.go Stream retry no longer blocks ctx cancellation.**
Previously: `time.Sleep(time.Duration(streamAttempt+1) * time.Second)` — a
user-initiated cancel during a reasoning-only retry's backoff was ignored for
up to 3s. Now uses a select with ctx.Done() so cancel is observed immediately,
the in-flight stream is closed, and the loop exits with a structured error event.
2. **stream_tool_exec.go Self-review-before-apply now returns a hard error
when the revert itself fails.** Previously: if the LLM said 'this diff is bad'
and we tried to revert (os.Remove or os.WriteFile) and that syscall failed, the
code only logged s.log.Warn and proceeded with the rejected diff still on disk.
Now: capture the revert error, log it at Error level, and surface a hard tool
error ('Self-review rejected the change AND the revert failed: <err>. Manual
intervention required.') so the LLM can't continue building on top of code it
just flagged as broken.
3. **internal/tool/tool.go now exports ReadOnlyTools + IsReadOnly() so the
safe-concurrent allowlist has a single source of truth.** The map was previously
duplicated in stream.go:716 and stream_tool_exec.go:29 with identical content
— drift waiting to happen. Both call sites now go through tool.IsReadOnly(name),
which canonicalises aliases (read/file_read, ls, etc.) before lookup. A new
test (TestIsReadOnly, TestReadOnlyToolsSetContainsExpectedNames) locks the
contract so a future removal of e.g. 'Read' from the allowlist fails CI in
internal/tool/ rather than silently breaking concurrency classification in
internal/engine/.
* docs: add Session god-object decomposition design sketch
hawk/internal/engine/session.go has 35 collaborators, 30 of them optional
*T pointer fields. The agentLoop is an 833-line function that reads ~25 of
these fields, creating a hidden web of dependencies. New contributors can't
tell which field is required vs optional vs diagnostic, and unit tests can't
construct a Session without wiring all 30 fields.
This is a design proposal (NOT yet implemented) for breaking Session into 6
cohesive sub-services:
- ChatService (150 LOC) — LLM transport, retry, circuit-breaker
- MemoryService (200 LOC) — yaad bridge, recall, sleeptime, skill distill
- ToolService (400 LOC) — registry, 15-stage post-call pipeline
- PermissionService (150 LOC) — guardian, approval, budget tracking
- LifecycleService (250 LOC) — self-improvement, beliefs, doom-loop
- PersistenceService (300 LOC) — checkpoint, compaction, branching DAG
The proposed 7-PR migration sequence is documented in the doc. Phases 2-7
are each independent and additive; each gets its own focused test suite.
The 4 concrete fixes in the current PR (time.Sleep, ReadOnlyTools, AGENTS.md,
self-review revert) are independent of this refactor and ship ahead of it.
See: docs/session-decomposition.md (this file)
Patel230
added a commit
that referenced
this pull request
Jun 18, 2026
Bump external/eyrie to dce29ff (#33 squash) and external/trace to 6feded7 so all submodule checkouts match their origin/main heads.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What
A design proposal (NOT yet implemented) for breaking the 35-collaborator
Sessionstruct athawk/internal/engine/session.go:42-141into 6 cohesive sub-services:ChatService(150 LOC) — LLM transport, retry, circuit-breakerMemoryService(200 LOC) — yaad bridge, recall, sleeptime, skill distillToolService(400 LOC) — registry, 15-stage post-call pipelinePermissionService(150 LOC) — guardian, approval, budget trackingLifecycleService(250 LOC) — self-improvement, beliefs, doom-loopPersistenceService(300 LOC) — checkpoint, compaction, branching DAGWhy
Sessionis a god object.agentLoopis an 833-line function that reads ~25 fields. New contributors can't tell which field is required vs optional vs diagnostic, and unit tests can't construct aSessionwithout wiring all 30 fields. Every new feature adds another field.Status
Design proposal only. No code changes. The 4 concrete fixes in the companion PR
fix/agent-loop-safety-and-tool-classificationship ahead of this refactor and are independent of it.The proposed 7-PR migration sequence (Phases 2-7) is laid out in the doc. Each phase is independent and additive; each gets its own focused test suite. Estimated ~5 days of focused refactor work + tests.
Files
docs/session-decomposition.md(new, 256 lines) — full design with proposed sub-service file paths, method signatures, the slimmed-downSessionshape, the 250-line refactoredagentLoop, testing strategy, open questions, and effort estimate.Seeking
Comments on the decomposition boundary, the proposed sub-service responsibilities, and whether anyone wants to start Phase 2 (
ChatServiceextraction) before or after other work.