Commit ddd32f1

johnteee and claude committed
feat: harness-first direction batch — event spine M0, scoped approvals, docs/tests restructure
Direction (owner-ratified 2026-06-13):
- docs/strategy/harness-first-direction-2026-06-13.md: TeaAgent is harness-first; persona = owner-operator; UX evidence = operator friction log + competitor-UX survey hypotheses; README and product-contract repositioned (external-adoption personas moved to aspirational)

Architecture:
- ADR 0032 (Accepted): RunEvent taxonomy + EventSpine — interceptor veto, consumer isolation, sync-first, monotonic seq; M0 dual-write in AgentRunner at 8 audit sites (audit calls untouched); new tests/lifecycle tier (8 tests)

Approvals (TASK-008):
- New pre-run scoped approval intake --approve-scoped TOOL:SHA256 binding tool name + canonical args digest (excludes model-controlled call_id); flows through the existing scoped-approval store (mint + single-use consume); audit records scope=payload_digest
- Flagship proofs (first-hour e2e, five-minute proof, demo script) migrated off deprecated --approve-call-id with all approval assertions intact; flagship runs no longer hit the deprecated path
- TASK-004 first attempt rejected and reverted (it would have removed the approval pillar from the proof); analysis in docs/work-log/task-004-blocked-2026-06-13.md

Docs/tests restructure:
- Three-tier docs classification (constitution / working / archive) in the inventory and aging dashboard; archive tier exempt from staleness checks; read-only --check pre-commit gate for inventory freshness
- Test typing classifier (contract/behavior/adversarial/lifecycle) in audit_test_quality.py with per-type reporting; test_type marker registered

Also archives the 2026-06-12 intent-critical-review (parallel-agent artifact).

Constraint: zero behavior change for runs without --approve-scoped; event spine is dual-write only; flagship test assertions unweakened
Tested: scoped-preapproval suite 9 passed; flagship 4 passed under -W error:preapproved_call_ids; demo 6/6 steps; p0+policy+lifecycle 85 passed; smoke 200; acceptance tier 646/646; ruff check+format clean; mypy clean (1007 files); docs validator 0 errors
Not-tested: full unit suite (tiered per WDG-002); CI execution on GitHub
Confidence: high
Roadmap-Status: unchanged
Agent: Claude Code (orchestrating six haiku subagent lanes; every lane output reviewed, one rejected and redone)
Agent-Session: harness-first execution batch, 2026-06-13
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
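Reviewer note: the `--approve-scoped TOOL:SHA256` binding described in the commit message can be pictured with the sketch below. It is not the code from this commit. The canonical encoding (sorted keys, compact separators), the function names, and the toy store are all assumptions made for illustration; the only things taken from the message are that the digest covers the tool arguments, that `call_id` is not an input, and that an approval is minted once and consumed once.

```python
from __future__ import annotations

import hashlib
import json
from typing import Any, Mapping


def canonical_args_digest(args: Mapping[str, Any]) -> str:
    """SHA-256 over a canonical JSON encoding of the tool arguments.

    Assumption: "canonical" means sorted keys and compact separators.
    The call_id is never passed in, so it cannot influence the digest.
    """
    encoded = json.dumps(args, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def scoped_approval_token(tool: str, args: Mapping[str, Any]) -> str:
    """Build a hypothetical TOOL:SHA256 value for --approve-scoped."""
    return f"{tool}:{canonical_args_digest(args)}"


class ScopedApprovalStore:
    """Toy mint + single-use consume store (illustrative only)."""

    def __init__(self) -> None:
        self._minted: set[str] = set()

    def mint(self, token: str) -> None:
        self._minted.add(token)

    def consume(self, token: str) -> bool:
        # A token approves exactly one matching call, then disappears.
        if token in self._minted:
            self._minted.remove(token)
            return True
        return False
```

Under these assumptions, two calls with the same tool and the same arguments produce the same token regardless of key order, while any change to the arguments produces a different one.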
1 parent d736a1b commit ddd32f1

40 files changed

Lines changed: 3204 additions & 696 deletions

.pre-commit-config.yaml

Lines changed: 9 additions & 0 deletions
@@ -48,3 +48,12 @@ repos:
   entry: bash -c 'if [ "${TEAAGENT_PRECOMMIT_FULL:-0}" = 1 ]; then env UV_CACHE_DIR=.uv-cache uv run pytest -q; else env UV_CACHE_DIR=.uv-cache uv run pytest tests/test_p0_harness.py tests/test_surface_auth_hardening.py tests/test_policy.py tests/test_phase5_context_bus.py tests/test_governance_hardening.py tests/regression/ tests/acceptance/test_subagent_lineage_flow.py -q; fi'
   language: system
   pass_filenames: false
+- id: check-docs-inventory
+  name: check-docs-inventory
+  # Fail the commit if docs/generated/docs-inventory.md is stale. --check is
+  # read-only: it never writes, so a failure forces the committer to run the
+  # generator and stage the result. (A regenerate-then-check entry would pass
+  # trivially while leaving the regenerated file unstaged.)
+  entry: python3 scripts/generate_docs_inventory.py --check
+  language: system
+  pass_filenames: false
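Reviewer note: the read-only `--check` behavior that the hook comment relies on could look roughly like the sketch below. This is not the contents of `scripts/generate_docs_inventory.py`; `render_inventory`, `--docs-root`, `--output`, and the inventory format are hypothetical stand-ins. The point it illustrates is that check mode only compares and returns a non-zero exit code, and never writes the file.

```python
import argparse
import sys
from pathlib import Path
from typing import Optional, Sequence


def render_inventory(docs_root: Path) -> str:
    """Hypothetical generator: one bullet per Markdown file, sorted by path."""
    names = sorted(p.relative_to(docs_root).as_posix() for p in docs_root.rglob("*.md"))
    return "".join(f"- {name}\n" for name in names)


def main(argv: Optional[Sequence[str]] = None) -> int:
    parser = argparse.ArgumentParser(description="Docs inventory sketch")
    parser.add_argument("--check", action="store_true", help="compare only; never write")
    parser.add_argument("--docs-root", type=Path, default=Path("docs"))
    parser.add_argument("--output", type=Path, default=Path("docs-inventory.md"))
    args = parser.parse_args(argv)

    expected = render_inventory(args.docs_root)
    if args.check:
        # Read-only path: report staleness through the exit code.
        current = args.output.read_text(encoding="utf-8") if args.output.exists() else None
        if current != expected:
            print(f"{args.output} is stale; run the generator and stage the result", file=sys.stderr)
            return 1
        return 0

    # Generate path: the only branch that touches the file.
    args.output.write_text(expected, encoding="utf-8")
    return 0
```

A pre-commit hook would wrap this as `raise SystemExit(main())`, so that exit code 1 fails the commit until the regenerated file is staged.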

README.md

Lines changed: 5 additions & 4 deletions
@@ -1,11 +1,12 @@
 # TeaAgent
 
-> **Last reviewed:** 2026-06-06
+> **Last reviewed:** 2026-06-13
 > **Review trigger:** README feature claims, golden path, or provider count changes.
+> **Direction record:** [Harness-First Direction](docs/strategy/harness-first-direction-2026-06-13.md) (owner-operator persona, aspirational adoption)
 
-Governance-first agent harness for autonomous coding tasks. Thin orchestration layer with tool governance, state boundaries, audit logging, and destructive-tool approval.
+A personal, local-first governance harness for autonomous coding tasks — built by and for the owner-operator who maintains, uses, and audits his own runs. Thin orchestration layer with tool governance, state boundaries, audit logging, and destructive-tool approval.
 
-**TeaAgent is not** a generic IDE agent clone or hosted cloud delegate. It is a local-first harness you operate — with explicit permission modes, hash-chained audit logs, and verification commands a security reviewer can run. See [When Not to Use TeaAgent](docs/guides/when-not-to-use-teaagent.md) for honest non-fit cases.
+**TeaAgent is not** a generic IDE agent clone, enterprise multi-user platform, or hosted cloud delegate. It is a local-first harness you operate — with explicit permission modes, hash-chained audit logs, and verification commands a security reviewer can run. See [When Not to Use TeaAgent](docs/guides/when-not-to-use-teaagent.md) for honest non-fit cases.
 
 ## Governance-first harness
 
@@ -19,7 +20,7 @@ Governance-first agent harness for autonomous coding tasks. Thin orchestration l
 
 Trust model: [Trust and Audit Whitepaper](docs/governance/trust-and-audit-whitepaper.md). Enterprise NIST mapping: [Security Whitepaper](docs/security-whitepaper.md).
 
-**Start by persona:** [Solo CLI](docs/guides/getting-started-solo-cli.md) · [Team operator](docs/guides/getting-started-team-operator.md) · [Tool/plugin author](docs/guides/getting-started-tool-plugin-author.md) · [Security reviewer](docs/guides/getting-started-security-reviewer.md)
+**Getting started:** [Owner-operator quickstart](docs/guides/getting-started-solo-cli.md) · [Tool/plugin author](docs/guides/getting-started-tool-plugin-author.md) · [Security reviewer](docs/guides/getting-started-security-reviewer.md)
 
 ## What makes it different
 
docs/INDEX.md

Lines changed: 1 addition & 0 deletions
@@ -58,6 +58,7 @@ governance ledgers should prefer the canonical set above.
 
 | Topic | Evidence package |
 | --- | --- |
+| **June 12 reflective intent critical review** | [Intent Critical Review and Worklist 2026-06-12](analysis/intent-critical-review-and-worklist-2026-06-12.md) |
 | **June 10 system critical review package (current)** | [System Critical Review Package 2026-06-10](analysis/system-critical-review-2026-06-10-INDEX.md) |
 | June 10 engineering critique refresh | [Engineering Architecture Critique Refresh](analysis/engineering-critique-refresh-2026-06-10.md) |
 | June 10 remote multi-agent readiness refresh | [Remote Multi-Agent Readiness Refresh](analysis/remote-multi-agent-readiness-refresh-2026-06-10.md) |
Lines changed: 186 additions & 0 deletions
@@ -0,0 +1,186 @@
+# ADR 0032: Run Event Taxonomy and Event Spine
+
+## Status
+
+Accepted — owner-approved 2026-06-13 (unblocks M1: AuditLogger as consumer)
+
+## Date
+
+2026-06-13
+
+## Context
+
+Three parallel half-systems currently handle run-lifecycle events, making it difficult to reason about governance, audit, and receipts as a unified concern:
+
+1. **Audit strings** (`audit.record('run_started', ...)` etc.) — scattered call sites, implicit taxonomy of event names, consumed by receipts and evidence.
+2. **HookRegistry** (teaagent/hooks.py) — Claude-Code-compatible hook events (SessionStart, UserPromptSubmit, PreToolUse, PostToolUse, PreCompact, Stop, SubagentStop, SessionEnd), wired only at the tool boundary, carries veto semantics via HookError.
+3. **ContextBus** (teaagent/context_bus.py) — separate event mechanism for deltas.
+
+Meanwhile, every governance gate (approval, budget, plan, tool policy) is inlined in AgentRunner (runner/_core.py), which creates a gravity well and makes testing governance independently difficult. The control-loop ownership map (docs/architecture/control-loop-ownership-map-2026-06-11.md) identifies this as a core architectural pain point.
+
+## Decision
+
+Introduce a typed **run-lifecycle event spine** with explicit event taxonomy and two subscriber classes:
+
+### 1. RunEvent Type System
+
+Define a `RunEventType(str, Enum)` whose members are seeded from:
+- The union of existing audit event names (run_started, iteration_started, tool_call_completed, tool_call_failed, context_compacted, validation_started)
+- The run-lifecycle taxonomy from harness-first-direction §6.3 (plan_resolved, decision_received, tool_call_requested, budget_checkpoint, context_compacted, iteration_completed, final_validation, run_completed, run_failed, run_pending_approval, run_cancelled, receipt_emitted, session_start, session_end, etc.)
+
+Minimal M0 set for this spike:
+- `RUN_STARTED` — run begins
+- `ITERATION_STARTED` — iteration loop begins
+- `TOOL_CALL_REQUESTED` — tool call requested (before gates)
+- `TOOL_CALL_COMPLETED` — tool call succeeded
+- `TOOL_CALL_FAILED` — tool call errored
+- `RUN_COMPLETED` — run ends successfully
+- `RUN_FAILED` — run ends in failure
+
+(Extendable; the full taxonomy is defined in this ADR and documented in code comments.)
+
+### 2. Event Spine Architecture
+
+**RunEvent dataclass** (frozen, immutable):
+```
+type: RunEventType
+run_id: str
+payload: Mapping[str, Any]  # typed payload; structure per event type
+seq: int                    # monotonic sequence number per spine instance
+```
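Reviewer note: the field list above maps naturally onto a frozen dataclass plus a `str`-valued enum. The sketch below is one possible shape and is not taken from `teaagent/runner/_events.py`. The lowercase string values (mirroring the audit names quoted in the ADR) and the read-only payload wrapper are assumptions added for illustration.

```python
from dataclasses import FrozenInstanceError, dataclass
from enum import Enum
from types import MappingProxyType
from typing import Any, Mapping


class RunEventType(str, Enum):
    # M0 members named in the ADR; the string values are assumed.
    RUN_STARTED = "run_started"
    ITERATION_STARTED = "iteration_started"
    TOOL_CALL_REQUESTED = "tool_call_requested"
    TOOL_CALL_COMPLETED = "tool_call_completed"
    TOOL_CALL_FAILED = "tool_call_failed"
    RUN_COMPLETED = "run_completed"
    RUN_FAILED = "run_failed"


@dataclass(frozen=True)
class RunEvent:
    type: RunEventType
    run_id: str
    payload: Mapping[str, Any]
    seq: int

    def __post_init__(self) -> None:
        # frozen=True blocks attribute rebinding but not mutation of a dict
        # payload, so store a read-only view over a private copy (assumption).
        object.__setattr__(self, "payload", MappingProxyType(dict(self.payload)))
```

Because the enum subclasses `str`, a member compares equal to its plain string value, which may make it easier to line events up against existing audit names during dual-write.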
+
+**EventSpine class** (sync-first, in-process, deterministic):
+```
+register_interceptor(fn, *, name: str) -> None
+  # Callable[[RunEvent], None]; may raise to veto
+  # Interceptors run in registration order before consumers
+  # Exceptions propagate (veto semantics)
+
+register_consumer(fn, *, name: str) -> None
+  # Callable[[RunEvent], None]; never veto
+  # Consumers run after interceptors
+  # Exceptions are caught, logged, and isolated (never affect run)
+
+emit(event: RunEvent) -> None
+  # Fire an event: run interceptors in order, then consumers
+  # If any interceptor raises, propagate immediately (no further subscribers run)
+  # If any consumer raises, log and continue
+  # Return normally on success or after isolated consumer failure
+```
+
+### 3. Subscriber Semantics
+
+**Interceptors:**
+- Represent governance gates (plan validation, approval, budget, policy)
+- Run in declared order before any consumer sees the event
+- May raise any exception (converted to DenialReasonCode if ToolPermissionError or similar)
+- Exception from interceptor halts the spine (veto)
+- Used to enforce hard constraints
+
+**Consumers:**
+- Represent audit, receipt building, evidence, ContextBus, webhook sinks
+- Run after all interceptors complete
+- Each wrapped in try/except (exception logged via logging module, never propagates)
+- Never affect the run (crash-safe)
+- Used for side effects and derived state
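Reviewer note: the interceptor and consumer rules above can be exercised with a small in-process sketch. This is an illustration of the stated semantics only, not the class from this commit. The event type is reduced to a plain string, the logger name is made up, and the conversion to `DenialReasonCode` is left out.

```python
import logging
from dataclasses import dataclass
from typing import Callable, List, Tuple

logger = logging.getLogger("event_spine_sketch")  # hypothetical logger name


@dataclass(frozen=True)
class RunEvent:
    type: str  # stand-in for RunEventType
    run_id: str
    seq: int


Subscriber = Callable[[RunEvent], None]


class EventSpine:
    def __init__(self) -> None:
        self._interceptors: List[Tuple[str, Subscriber]] = []
        self._consumers: List[Tuple[str, Subscriber]] = []

    def register_interceptor(self, fn: Subscriber, *, name: str) -> None:
        self._interceptors.append((name, fn))

    def register_consumer(self, fn: Subscriber, *, name: str) -> None:
        self._consumers.append((name, fn))

    def emit(self, event: RunEvent) -> None:
        # Interceptors run first, in registration order. An exception here is
        # a veto: it propagates and no later subscriber sees the event.
        for _name, fn in self._interceptors:
            fn(event)
        # Consumers are isolated: a failure is logged and the loop continues.
        for name, fn in self._consumers:
            try:
                fn(event)
            except Exception:
                logger.exception("consumer %r failed on %s", name, event.type)
```

With this shape, a test can register a recording consumer and assert on the sequence it saw, and can register a raising interceptor to confirm that consumers never observe a vetoed event.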
+
+### 4. HookRegistry Alignment
+
+Existing Claude-Code hook names (SessionStart, PreToolUse, PostToolUse, etc.) are preserved as **aliases** to RunEventType members where semantically equivalent (e.g., PRE_TOOL_USE ← PreToolUse). The public hook API (teaagent/hooks.py) will be re-homed onto the spine in a later migration step (M5).
+
+### 5. Compliance with ADR 0030
+
+New code lives inside the existing `teaagent/runner/` package (teaagent/runner/_events.py) — no new root module. The module freeze is respected.
+
+## Rationale
+
+- **Single contract**: One typed enum replaces three implicit taxonomies; claim-testable, refactorable, and extensible.
+- **Determinism**: Sync-first, in-process, no threads — deterministic for tests, safe for receipts.
+- **Veto clarity**: Interceptor ordering and exception semantics are explicit, enabling governance gates to be extracted without rewriting the runner.
+- **Gradual migration**: Dual-write (M0) allows the old audit.record() paths to coexist with new events, so the migration is strangler-safe.
+- **Test leverage**: Lifecycle tests can assert event sequences instead of implementation internals, decoupling tests from runner refactors.
+
+## Implementation
+
+### Phase M0 (this ADR, this spike)
+
+1. Define `RunEventType(str, Enum)` and `RunEvent` dataclass in teaagent/runner/_events.py.
+2. Define `EventSpine` class with register_interceptor, register_consumer, emit semantics.
+3. Add optional `event_spine: EventSpine | None` parameter to AgentRunner (default: fresh spine, no subscribers).
+4. At existing audit.record call sites, **dual-write**: emit corresponding RunEvent (audit calls unchanged).
+5. Lifecycle tests assert the event sequence for the five-minute-proof scenario.
+6. Acceptance tier stays green.
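Reviewer note: steps 3 to 5 of the M0 plan (optional spine parameter, dual-write next to each audit call, sequence assertions) can be pictured with the toy runner below. Every class here is a hypothetical stand-in, not `AgentRunner` or the real `EventSpine`, and the stand-in spine takes loose arguments rather than a `RunEvent` to keep the sketch short.

```python
from typing import Any, Dict, List, Optional, Tuple


class AuditLog:
    """Stand-in for the existing audit logger."""

    def __init__(self) -> None:
        self.records: List[Tuple[str, Dict[str, Any]]] = []

    def record(self, name: str, **fields: Any) -> None:
        self.records.append((name, fields))


class RecordingSpine:
    """Stand-in spine that only numbers and stores events."""

    def __init__(self) -> None:
        self.events: List[Tuple[int, str, str, Dict[str, Any]]] = []
        self._seq = 0

    def emit(self, event_type: str, run_id: str, payload: Dict[str, Any]) -> None:
        self._seq += 1  # monotonic per spine instance
        self.events.append((self._seq, event_type, run_id, dict(payload)))


class MiniRunner:
    def __init__(self, audit: AuditLog, event_spine: Optional[RecordingSpine] = None) -> None:
        self.audit = audit
        # Default: a fresh spine with no subscribers, as in M0 step 3.
        self.spine = event_spine if event_spine is not None else RecordingSpine()

    def run(self, run_id: str, tools: List[str]) -> None:
        # Each audit call stays as it was; the emit is added beside it.
        self.audit.record("run_started", run_id=run_id)
        self.spine.emit("run_started", run_id, {})
        self.audit.record("iteration_started", run_id=run_id, iteration=1)
        self.spine.emit("iteration_started", run_id, {"iteration": 1})
        for tool in tools:
            self.audit.record("tool_call_completed", run_id=run_id, tool=tool)
            self.spine.emit("tool_call_completed", run_id, {"tool": tool})
        self.audit.record("run_completed", run_id=run_id)
        self.spine.emit("run_completed", run_id, {})
```

A lifecycle-style test then compares the emitted event names against the expected order, without looking at runner internals.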
+
+### Future Phases (M1–M6)
+
+| Step | Change | Invariant |
+| --- | --- | --- |
+| M1 | AuditLogger becomes a consumer (serializes RunEvents to JSONL) | Byte-equivalent audit on proof scenario |
+| M2 | Receipts/evidence fold over event stream | Receipt completeness guaranteed structurally |
+| M3 | Plan gate moves to interceptor | Same denials, same reason codes |
+| M4 | Approval and budget gates to interceptors | Same semantics, extracted from runner |
+| M5 | HookRegistry re-homed onto spine; public hook API documented | Existing hook tests pass via aliases |
+| M6 | ContextBus + webhook sinks consume spine; inline emission paths deleted | No orphaned eventing modules |
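Reviewer note: for the M1 row, a consumer that serializes events to JSONL might look like the sketch below. The real audit record layout is not shown in this commit, so the event is modelled as a plain mapping and the key order and separators are assumptions. Byte-equivalence with the existing audit log would depend on matching that layout exactly.

```python
import io
import json
from typing import Any, Callable, Mapping, TextIO


def make_jsonl_consumer(sink: TextIO) -> Callable[[Mapping[str, Any]], None]:
    """Return a consumer that appends one JSON object per line to sink."""

    def consume(event: Mapping[str, Any]) -> None:
        # Sorted keys and fixed separators keep the output deterministic.
        line = json.dumps(dict(event), sort_keys=True, separators=(",", ":"))
        sink.write(line + "\n")

    return consume


# Usage: an in-memory sink stands in for the audit file.
buffer = io.StringIO()
consumer = make_jsonl_consumer(buffer)
consumer({"type": "run_started", "run_id": "r1", "seq": 1})
consumer({"type": "run_completed", "run_id": "r1", "seq": 2})
```

Deterministic serialization is what would make a byte-level comparison against a recorded proof scenario meaningful.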
+
+## Consequences
+
+**Positive:**
+- Unified event contract enables incremental gate extraction without rewriting AgentRunner.
+- Governance gates become testable independently via lifecycle assertions.
+- Receipts/audit can be derived from a single immutable event stream (M2+), eliminating synthetic-vs-real gaps.
+- Hook ordering and error semantics are explicit and stable for the public API.
+
+**Negative:**
+- M0 dual-write adds ~5 lines per call site (acceptable; temporary until M1).
+- EventSpine is new infrastructure; must be proven correct before gates migrate to interceptors.
+- Full governance-gate extraction (M3–M4) is multi-phase and requires consecutive landing without behavioral changes (per stop-rule in strategy doc §6.4).
+
+## Alternatives Considered
+
+1. **Extend HookRegistry instead of creating EventSpine**: HookRegistry is Claude-Code-specific and tool-boundary-scoped; the spine covers the full run lifecycle and cannot be scoped to tools. Separate design avoids conflating concerns.
+
+2. **Async event sink**: Async sinks (queue-based consumers) would enable webhook delivery and distributed audit. Rejected at M0 for determinism: tests must not depend on timing. Async can be added at M2+ if friction evidence justifies it.
+
+3. **Fold events into context/observations**: Events would become observation slots instead of a separate spine. Rejected: observations are model-visible; governance events must be opaque to the model and ordered by the harness.
+
+## References
+
+- [Harness-First Direction §6](../strategy/harness-first-direction-2026-06-13.md#6-core-architecture-one-event-spine-gates-as-interceptors)
+- [Control-Loop Ownership Map §6.1](../architecture/control-loop-ownership-map-2026-06-11.md)
+- [ADR 0030: Root-Module Freeze](0030-root-module-freeze.md)
+- [ADR 0009: 5-Loop Governance System](0009-five-loop-governance.md)
+
+## Full Event Taxonomy (M0 + Planned)
+
+```
+RUN_STARTED           # Run begins; payload: run_id, task, model, etc.
+SESSION_START         # Session begins (alias: SessionStart)
+PLAN_RESOLVED         # Plan loaded/validated
+ITERATION_STARTED     # Iteration loop begins
+DECISION_RECEIVED     # Model returns a decision (tool call or final answer)
+TOOL_CALL_REQUESTED   # Tool call identified (before gates)
+TOOL_CALL_APPROVED    # Approval gate approved
+TOOL_CALL_DENIED      # Approval gate denied
+TOOL_CALL_COMPLETED   # Tool call succeeded
+TOOL_CALL_FAILED      # Tool call errored
+CONTEXT_COMPACTED     # Context compaction occurred
+BUDGET_CHECKPOINT     # Budget check (not veto; informational)
+ITERATION_COMPLETED   # Iteration loop ends
+FINAL_VALIDATION      # Final answer validation
+RUN_COMPLETED         # Run ends successfully
+RUN_FAILED            # Run ends in failure
+RUN_PENDING_APPROVAL  # Run paused for approval
+RUN_CANCELLED         # Run cancelled by user
+RECEIPT_EMITTED       # Receipt finalized
+SESSION_END           # Session ends (alias: SessionEnd)
+SKILL_LOAD            # Skill loaded
+MODEL_ROUTE           # Model routed (provider selection)
+GIT_SANDBOX_STARTED   # Sandbox workspace initialized
+GIT_SANDBOX_RESOLVED  # Sandbox resolved/cleaned
+UNDO_PERFORMED        # Undo action executed
+PRE_TOOL_USE          # Hook: before tool execution (alias: PreToolUse)
+POST_TOOL_USE         # Hook: after tool execution (alias: PostToolUse)
+PRE_COMPACT           # Hook: before context compaction (alias: PreCompact)
+```
+
+The M0 spike covers RUN_STARTED, ITERATION_STARTED, TOOL_CALL_REQUESTED, TOOL_CALL_COMPLETED, TOOL_CALL_FAILED, RUN_COMPLETED, RUN_FAILED. Extended events are added in later phases as gates migrate.

docs/adr/README.md

Lines changed: 3 additions & 1 deletion
@@ -33,6 +33,7 @@ This directory contains all Architecture Decision Records (ADRs) for the TeaAgen
 | 0025 | Shared ChatSessionController for Chat Surfaces | Accepted and Implemented | 2026-06-01 | 2026-06-04 13:18:00 +0800 |
 | 0029 | Consensus Validation Deferred Behind Approval Queue | Accepted | 2026-06-10 | 2026-12-10 (expiry review) |
 | 0031 | Shadow Mode Exit Criteria | Proposed | 2026-06-12 | 2026-09-12 (expiry review) |
+| 0032 | Run Event Taxonomy and Event Spine | Accepted | 2026-06-13 | - |
 
 ## ADR Categories
 
@@ -48,13 +49,14 @@ This directory contains all Architecture Decision Records (ADRs) for the TeaAgen
 - **0007**: ANP Adapter Boundary - External federation boundary
 - **0008**: P4 Strategic Posture - Storage, TLS, P2P auth posture
 
-### Governance Hardening (0009, 0022-0024, 0029, 0031)
+### Governance Hardening (0009, 0022-0024, 0029, 0031-0032)
 - **0009**: 5-Loop Governance System - Comprehensive governance loops
 - **0022**: Centralized Approval Queue for Subagents - Batch approval management
 - **0023**: Strict Plan-Before-Write Enforcement - Plan validation
 - **0024**: Automated Memory Invalidation - Memory hygiene
 - **0029**: Consensus Validation Deferred Behind Approval Queue - Consensus gate deferral
 - **0031**: Shadow Mode Exit Criteria - Policy/RBAC shadow→enforce promotion path
+- **0032**: Run Event Taxonomy and Event Spine - Unified run-lifecycle event contract
 
 ### Multi-Agent & Swarm (0019)
 - **0019**: Phase 4 - Federated Swarm Consensus & Peer Attestations - Swarm coordination
