Commit 3edc078

docs: expand lifecycle hooks guide with competitive positioning (#3317)
Expands docs/hooks-guide.md with full 29-event taxonomy, side-by-side competitive comparisons, feature matrix, circuit breaker docs, and MCP bidirectional integration explanation. Closes #3316
1 parent 25fb62d commit 3edc078

1 file changed

Lines changed: 251 additions & 43 deletions

docs/hooks-guide.md

@@ -1,20 +1,83 @@
11
# Lifecycle Hooks Guide
22

3-
Aegis captures Claude Code lifecycle events (tool use, permission requests, session stops) via HTTP hooks and exposes them through SSE streams, webhooks, and the REST API. This guide covers how hooks work, how to configure them, and how Aegis's hook system differs from alternatives.
3+
Aegis captures Claude Code lifecycle events via HTTP hooks and enriches them with MCP tool integration, SSE streaming, multi-channel delivery, and enterprise-grade security. This guide covers how hooks work, how to configure them, and how Aegis's architecture compares to alternatives.
4+
5+
> **For positioning context**, see the [Competitive Threat Matrix](./competitive-threat-matrix.md).
46
57
## Overview
68

7-
When Claude Code runs a session, it emits lifecycle events at key points:
9+
When Claude Code runs a session, it emits lifecycle events at key points. Claude Code supports three native hook types:
10+
11+
| Type | Mechanism | Scope |
12+
|------|-----------|-------|
13+
| **Command** | Shell script, receives JSON on stdin | Local machine |
14+
| **HTTP** | POST to a URL with JSON body | Network-accessible |
15+
| **Prompt** | LLM prompt injection | In-process |
16+
17+
Aegis uses **HTTP hooks** exclusively, registering a single endpoint (`POST /v1/hooks/:eventName`) that receives all 29 CC lifecycle events. This gives Aegis a centralized event bus that shell-only or config-only approaches do not provide.
18+
19+
## Complete Event Reference
20+
21+
Aegis handles all Claude Code lifecycle events. Here's the full taxonomy:
22+
23+
### Session Lifecycle
24+
25+
| Event | Trigger | Aegis Action |
26+
|-------|---------|-------------|
27+
| `SessionStart` | Session begins or resumes | Track session state |
28+
| `SessionEnd` | Session terminates | Clean up resources, emit final metrics |
29+
| `Setup` | `--init-only` or `--maintenance` mode | One-time CI preparation |
30+
| `Stop` | Claude finishes responding | Detect waiting-for-input, emit `session.idle` |
31+
| `StopFailure` | Turn ends due to API error | Circuit breaker protection (see below) |
32+
33+
### Tool Lifecycle (Agentic Loop)
834

935
| Event | Trigger | Aegis Action |
1036
|-------|---------|-------------|
11-
| `PreToolUse` | Before a tool executes | Evaluate permission policy, approve or reject |
12-
| `PostToolUse` | After a tool completes | Record tool usage, emit SSE event |
13-
| `PostToolUseFailure` | After a tool fails | Log failure, emit error event |
14-
| `PermissionRequest` | CC asks for user approval | Route to dashboard / Telegram / Slack for human decision |
15-
| `Stop` | Session completes | Clean up resources, emit session.idle event |
37+
| `PreToolUse` | Before a tool call executes | Permission policy evaluation → approve/deny, OTel span start |
38+
| `PostToolUse` | After a tool call succeeds | Record metrics, emit SSE, OTel span close |
39+
| `PostToolUseFailure` | After a tool call fails | Log failure, emit error event, OTel span with error |
40+
| `PostToolBatch` | After a parallel batch resolves | Batch metrics recording |
41+
| `PermissionRequest` | Permission dialog appears | Route to dashboard/Telegram/Slack for human decision |
42+
| `PermissionDenied` | Auto-mode classifier denies tool | Emit denial event for audit |
1643

17-
Aegis registers these hooks automatically when creating a session. You don't need to configure Claude Code hooks manually; Aegis manages the entire lifecycle.
44+
### Agent Orchestration
45+
46+
| Event | Trigger | Aegis Action |
47+
|-------|---------|-------------|
48+
| `SubagentStart` | Subagent spawned | Track active subagents, emit `subagent_start` SSE |
49+
| `SubagentStop` | Subagent finishes | Remove subagent tracking, emit `subagent_stop` SSE |
50+
| `TaskCreated` | Task created via `TaskCreate` | Status → `working` |
51+
| `TaskCompleted` | Task marked complete | Status → `idle` |
52+
| `TeammateIdle` | Agent team teammate goes idle | Status → `idle` |
53+
54+
### Context Management
55+
56+
| Event | Trigger | Aegis Action |
57+
|-------|---------|-------------|
58+
| `PreCompact` | Before context compaction | Update activity timestamp, status → `compacting` |
59+
| `PostCompact` | After context compaction | Update activity timestamp, status → `idle` |
60+
| `UserPromptSubmit` | User submits a prompt | Status → `working` |
61+
| `UserPromptExpansion` | Slash command expands | Informational |
62+
63+
### File & Environment
64+
65+
| Event | Trigger | Aegis Action |
66+
|-------|---------|-------------|
67+
| `FileChanged` | Watched file changes on disk | Informational, forward to SSE |
68+
| `CwdChanged` | Working directory changes | Informational, forward to SSE |
69+
| `ConfigChange` | Configuration file changes | Informational, forward to SSE |
70+
| `InstructionsLoaded` | CLAUDE.md or rules file loaded | Informational, forward to SSE |
71+
| `Notification` | CC sends a notification | Forward to SSE + channels |
72+
73+
### Worktree & MCP
74+
75+
| Event | Trigger | Aegis Action |
76+
|-------|---------|-------------|
77+
| `WorktreeCreate` | Worktree being created | Status → `working`, log |
78+
| `WorktreeRemove` | Worktree being removed | Status → `idle`, log |
79+
| `Elicitation` | MCP server requests user input | Status → `working` |
80+
| `ElicitationResult` | User responds to MCP elicitation | Status → `working` |
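
The status transitions listed in the tables above can be summarized as a lookup. The following is a minimal sketch, not Aegis's actual implementation: the `SessionStatus` type, the `STATUS_BY_EVENT` map, and the `nextStatus` helper are hypothetical names, and only events whose "Aegis Action" column names a status are included.

```typescript
// Hypothetical status type; the three values come from the "Aegis Action" columns.
type SessionStatus = "working" | "idle" | "compacting";

// Events that the tables say change session status.
// Informational events (FileChanged, CwdChanged, ...) are deliberately absent.
const STATUS_BY_EVENT = new Map<string, SessionStatus>([
  ["TaskCreated", "working"],
  ["TaskCompleted", "idle"],
  ["TeammateIdle", "idle"],
  ["PreCompact", "compacting"],
  ["PostCompact", "idle"],
  ["UserPromptSubmit", "working"],
  ["WorktreeCreate", "working"],
  ["WorktreeRemove", "idle"],
  ["Elicitation", "working"],
  ["ElicitationResult", "working"],
]);

// Returns the status after an event; events without a mapping leave it unchanged.
function nextStatus(current: SessionStatus, eventName: string): SessionStatus {
  return STATUS_BY_EVENT.get(eventName) ?? current;
}
```

For example, `nextStatus("idle", "PreCompact")` yields `"compacting"`, while an informational event such as `FileChanged` leaves the status as it was.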
1881

1982
## How It Works
2083

@@ -26,24 +89,32 @@ When Aegis creates a Claude Code session, it registers HTTP hooks pointing to `P
2689
Claude Code → HTTP POST → Aegis /v1/hooks/PreToolUse → Permission policy evaluation → Approve/Reject
2790
```
2891

92+
You don't need to configure Claude Code hooks manually; Aegis manages the entire lifecycle.
93+
2994
### Event Flow
3095

96+
Every hook event passes through a five-stage pipeline:
97+
3198
```
3299
CC Session Event
33100
↓
34-
Aegis Hook Endpoint (/v1/hooks/:eventName)
35-
├── Permission Guard → approve / reject
36-
├── Tool Registry → record metrics
37-
├── SSE Emitter → broadcast to dashboard
38-
├── Channel Manager → fan-out to Telegram/Slack/Email
39-
└── OTel Tracing → create tool spans
101+
┌────────────────────────────────────────────────────────┐
102+
│ 1. VALIDATE      Zod schema check (hookBodySchema)     │
103+
│ 2. AUTHENTICATE  X-Hook-Secret timing-safe comparison  │
104+
│ 3. DECIDE        Permission policy evaluation          │
105+
│ 4. OBSERVE       OTel spans, Prometheus metrics        │
106+
│ 5. BROADCAST     SSE + channels (Telegram/Slack/Email) │
107+
└────────────────────────────────────────────────────────┘
40108
```
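
The first two stages of the pipeline above might look roughly like the sketch below. It makes several assumptions: the `HookRequest` shape, the `session_id` field, and the 400/401 status codes are illustrative, and a hand-written type guard stands in for the real Zod `hookBodySchema`, which this guide does not show. The only library call used is Node's `crypto.timingSafeEqual`.

```typescript
import { timingSafeEqual } from "node:crypto";

// Hypothetical request shape for illustration only.
interface HookRequest {
  eventName: string;
  secretHeader: string | undefined; // value of the X-Hook-Secret header
  body: unknown;
}

type StageResult = { ok: true } | { ok: false; status: number; stage: string };

// Stage 1 stand-in: a hand-written guard in place of the Zod schema.
function validate(body: unknown): body is { session_id: string } {
  return (
    typeof body === "object" &&
    body !== null &&
    typeof (body as Record<string, unknown>).session_id === "string"
  );
}

// Stage 2: constant-time comparison of the provided secret.
function authenticate(provided: string | undefined, expected: string): boolean {
  if (provided === undefined) return false;
  const a = Buffer.from(provided);
  const b = Buffer.from(expected);
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return a.length === b.length && timingSafeEqual(a, b);
}

function runEarlyStages(req: HookRequest, expectedSecret: string): StageResult {
  if (!validate(req.body)) return { ok: false, status: 400, stage: "VALIDATE" };
  if (!authenticate(req.secretHeader, expectedSecret)) {
    return { ok: false, status: 401, stage: "AUTHENTICATE" };
  }
  return { ok: true }; // DECIDE, OBSERVE and BROADCAST would run after this point
}
```

Ordering validation before authentication mirrors the stage numbering in the diagram; rejecting early keeps malformed or unauthenticated requests away from the policy engine and the broadcast channels.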
41109

42-
Every hook event is:
43-
1. **Validated**: checked against `hookBodySchema`
44-
2. **Authenticated**: verified via `X-Hook-Secret`
45-
3. **Acted on**: permission decisions, metric recording, event broadcasting
46-
4. **Traced**: OpenTelemetry spans for observability
110+
### Decision Events
111+
112+
Two hook events require a response body that Claude Code acts on:
113+
114+
- **`PreToolUse`**: Aegis evaluates the tool against the session's permission profile. Returns `allow`, `deny`, or `ask` (escalate to human).
115+
- **`PermissionRequest`**: Aegis checks the session's permission mode. Auto-approve modes (`bypassPermissions`, `dontAsk`, `acceptEdits`, `auto`) respond immediately. Others wait for a human decision via dashboard or chat.
116+
117+
All other events receive `{ ok: true }` and are processed asynchronously.
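
The split between decision events and fire-and-forget events can be sketched as follows. The mode names and the `{ ok: true }` acknowledgement come from the text above; the function names and the `"wait-for-human"` label are hypothetical and chosen only for this illustration.

```typescript
// Mode names come from the guide; everything else here is illustrative.
const AUTO_APPROVE_MODES: ReadonlySet<string> = new Set([
  "bypassPermissions",
  "dontAsk",
  "acceptEdits",
  "auto",
]);

// The two events whose response body Claude Code acts on.
const DECISION_EVENTS: ReadonlySet<string> = new Set(["PreToolUse", "PermissionRequest"]);

type PermissionOutcome = "allow" | "wait-for-human";

function needsDecisionBody(eventName: string): boolean {
  return DECISION_EVENTS.has(eventName);
}

// PermissionRequest: auto-approve modes answer immediately, others escalate.
function resolvePermissionRequest(permissionMode: string): PermissionOutcome {
  return AUTO_APPROVE_MODES.has(permissionMode) ? "allow" : "wait-for-human";
}

// Non-decision events are acknowledged right away; null means a decision is still needed.
function immediateResponse(eventName: string): { ok: true } | null {
  return needsDecisionBody(eventName) ? null : { ok: true };
}
```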
47118

48119
## Configuration
49120

@@ -67,20 +138,39 @@ AEGIS_HOOK_SECRET_HEADER_ONLY=true
67138

68139
This rejects the deprecated `?secret=` query parameter and prevents secret leakage in URLs/logs.
69140

70-
## Security Model
141+
### Circuit Breaker (StopFailure Protection)
142+
143+
When a user-defined Stop hook returns `ok: false`, Claude Code retries in an infinite loop. Aegis detects this and trips a **circuit breaker**:
144+
145+
| Variable | Default | Range | Description |
146+
|---|---|---|---|
147+
| `HOOK_CIRCUIT_BREAKER_MAX` | `5` | 1–100 | Failures before breaker trips |
148+
| `HOOK_CIRCUIT_BREAKER_WINDOW_MS` | `60000` | 1000–3600000 | Sliding window (ms) |
149+
150+
After the threshold is reached, Aegis returns `{ ok: true }` to break the retry loop and emits a `circuit_breaker` SSE event. Once tripped, the breaker stays tripped for the rest of the session unless a successful `Stop` event resets it.
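
A sliding-window breaker of this kind can be sketched in a few lines. This is an illustration under stated assumptions, not the Aegis source: the `StopFailureBreaker` class and its method names are hypothetical, the breaker is assumed to trip on the failure that reaches the threshold, and timestamps are passed in explicitly so the behavior is easy to follow.

```typescript
// Hypothetical sliding-window breaker; defaults mirror the table above.
class StopFailureBreaker {
  private failures: number[] = [];
  private tripped = false;
  private readonly max: number;
  private readonly windowMs: number;

  constructor(max: number = 5, windowMs: number = 60_000) {
    this.max = max;
    this.windowMs = windowMs;
  }

  // Record a failure at time `now` (ms). Returns true once the breaker is tripped.
  recordFailure(now: number): boolean {
    // Drop failures that have slid out of the window, then add the new one.
    this.failures = this.failures.filter((t) => now - t < this.windowMs);
    this.failures.push(now);
    if (this.failures.length >= this.max) this.tripped = true;
    return this.tripped;
  }

  // A successful Stop event clears the history and resets the breaker.
  recordSuccess(): void {
    this.failures = [];
    this.tripped = false;
  }

  isTripped(): boolean {
    return this.tripped;
  }
}
```

With `max = 3` and `windowMs = 1000`, failures at 0 ms, 100 ms and 200 ms trip the breaker, whereas failures at 0 ms, 900 ms and 1500 ms do not, because the first one has left the window by the time the third arrives.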
151+
152+
### Answer Timeout (AskUserQuestion)
153+
154+
When Claude Code asks a question via `AskUserQuestion`, Aegis can intercept and answer from external clients:
155+
156+
| Variable | Default | Range | Description |
157+
|---|---|---|---|
158+
| `ANSWER_TIMEOUT_MS` | `30000` | 1000–600000 | How long to wait for an external answer |
71159

72-
Aegis hooks are designed for production security:
160+
## Security Model
73161

74162
| Feature | Description |
75-
|---------|-------------|
163+
|---------|---------|
76164
| **Secret authentication** | `X-Hook-Secret` header validates inbound hook calls |
77165
| **Header-only mode** | Prevents secret leakage via URL query parameters |
78166
| **Permission policies** | `PreToolUse` hooks evaluate tool access against configurable policies |
79167
| **Audit logging** | Every hook event is recorded in the audit trail with hash chain integrity |
80168
| **Rate limiting** | Per-IP rate limits prevent hook endpoint abuse |
81169
| **Payload validation** | All hook bodies validated against strict Zod schemas |
82-
| **Circuit breaker** | Detects rapid `Stop` hook failures and trips breaker to prevent session death loops |
170+
| **Circuit breaker** | Detects rapid `StopFailure` events and trips breaker to prevent session death loops |
83171
| **Payload truncation protection** | Warns when hook payloads exceed 1.5KB (CC silently truncates at ~2KB) |
172+
| **Session validation** | Rejects non-UUID session IDs before lookup |
173+
| **Event allowlist** | Unknown event names return `400`, which prevents injection |
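
Three of the checks in this table (event allowlist, session ID validation, and the truncation warning) are simple enough to sketch together. The `precheck` function and its return shape are hypothetical; the `400` for unknown events is from the table, while the `400` for a non-UUID session ID and the reading of 1.5KB as 1536 bytes are assumptions of this sketch.

```typescript
// The 29 event names listed in the Complete Event Reference.
const KNOWN_EVENTS: ReadonlySet<string> = new Set([
  "SessionStart", "SessionEnd", "Setup", "Stop", "StopFailure",
  "PreToolUse", "PostToolUse", "PostToolUseFailure", "PostToolBatch",
  "PermissionRequest", "PermissionDenied",
  "SubagentStart", "SubagentStop", "TaskCreated", "TaskCompleted", "TeammateIdle",
  "PreCompact", "PostCompact", "UserPromptSubmit", "UserPromptExpansion",
  "FileChanged", "CwdChanged", "ConfigChange", "InstructionsLoaded", "Notification",
  "WorktreeCreate", "WorktreeRemove", "Elicitation", "ElicitationResult",
]);

const UUID_RE = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;
const TRUNCATION_WARN_BYTES = 1536; // assumption: 1.5KB read as 1.5 * 1024 bytes

// Hypothetical pre-check run before any session lookup.
function precheck(
  eventName: string,
  sessionId: string,
  rawBody: string,
): { status: number; warnings: string[] } {
  const warnings: string[] = [];
  if (!KNOWN_EVENTS.has(eventName)) return { status: 400, warnings };
  if (!UUID_RE.test(sessionId)) return { status: 400, warnings }; // status code assumed
  // Measure the UTF-8 byte length, not the character count.
  const bytes = new TextEncoder().encode(rawBody).length;
  if (bytes > TRUNCATION_WARN_BYTES) {
    warnings.push(`payload is ${bytes} bytes and may be truncated by Claude Code`);
  }
  return { status: 200, warnings };
}
```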
84174

85175
## Observability
86176

@@ -102,7 +192,7 @@ AEGIS_OTEL_ENABLED=true AEGIS_OTEL_OTLP_ENDPOINT=http://localhost:4318 ag
102192
Hook events are broadcast via SSE in real-time:
103193

104194
- `GET /v1/events`: global event stream (requires SSE token)
105-
- `GET /v1/sessions/:id/events`: per-session event stream
195+
- `GET /v1/sessions/:id/sse`: per-session event stream
106196

107197
### Prometheus Metrics
108198

@@ -127,34 +217,152 @@ curl http://localhost:9100/v1/webhooks/dead-letter \
127217
-H "Authorization: Bearer $TOKEN"
128218
```
129219

130-
## Comparison: Aegis vs Shell/HTTP Hook Systems
220+
---
221+
222+
## Competitive Comparison: Why Aegis Hooks Win
223+
224+
> This section is for technical decision-makers evaluating orchestration tools. For broader competitive context, see the [Competitive Threat Matrix](./competitive-threat-matrix.md).
225+
226+
### The Architecture Gap
227+
228+
Claude Code hooks are a **point-to-point mechanism**: CC fires an event, one handler responds. Most orchestration tools use this directly, with a shell script or HTTP callback that makes a binary allow/deny decision.
229+
230+
Aegis layers a **service mesh** on top of that mechanism:
231+
232+
```
233+
Shell-only tools:  CC ──hook──▶ Shell script (allow/deny)
234+
HTTP-only tools:   CC ──hook──▶ HTTP handler (allow/deny)
235+
Aegis:             CC ──hook──▶ Hook endpoint ──┬── Permission policy
236+
                                                ├── OTel tracing
237+
                                                ├── SSE broadcast
238+
                                                ├── Multi-channel fan-out
239+
                                                ├── Audit logging
240+
                                                ├── Circuit breaker
241+
                                                └── Prometheus metrics
242+
```
243+
244+
### Side-by-Side: Permission Control
245+
246+
**cc-connect** (Go binary, TOML config):
247+
248+
```toml
249+
# cc-connect config.toml
250+
[hooks]
251+
allow_tools = ["Read", "Write", "Bash"]
252+
deny_tools = ["RMRF"]
253+
```
254+
255+
Flat allow/deny list. No per-session policies. No conditional rules. No audit trail of which tool was approved by which policy.
256+
257+
**Native Claude Code** (shell hook):
258+
259+
```json
260+
{
261+
"hooks": {
262+
"PreToolUse": [{
263+
"matcher": "Bash",
264+
"hooks": [{
265+
"type": "command",
266+
"command": "/path/to/block-rm.sh"
267+
}]
268+
}]
269+
}
270+
}
271+
```
272+
273+
Runs a shell script on every `Bash` tool call. The script must parse JSON from stdin, make a decision, and print JSON to stdout. No built-in audit, no metrics, no fan-out. Each event spawns a new process.
274+
275+
**Aegis** (HTTP + MCP + policy engine):
276+
277+
```bash
278+
# Create session with a permission profile
279+
curl -X POST http://localhost:9100/v1/sessions \
280+
-H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" \
281+
-d '{
282+
"permissionProfile": {
283+
"rules": [
284+
{ "tool": "Bash", "behavior": "ask", "reason": "Shell commands need approval" },
285+
{ "tool": "Edit", "behavior": "allow" },
286+
{ "tool": "Write", "behavior": "allow", "pattern": "src/**" },
287+
{ "tool": "Write", "behavior": "deny", "pattern": "prod/**" }
288+
]
289+
}
290+
}'
291+
```
292+
293+
Per-session, per-tool, per-path rules. Decisions are audit-logged with hash chain integrity. Metrics track auto-approvals vs escalations. No shell scripts to maintain.
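
To make the rule semantics concrete, here is a minimal sketch of how such a profile could be evaluated. The guide does not state the matching order or the default, so this sketch assumes first-match-wins and falls back to `ask` when no rule applies; `Rule`, `globToRegExp`, and `evaluate` are hypothetical names, and the glob support is limited to `*` and `**`.

```typescript
type Behavior = "allow" | "deny" | "ask";

interface Rule {
  tool: string;
  behavior: Behavior;
  pattern?: string; // optional path glob such as "src/**"
  reason?: string;
}

// Minimal glob support: "**" matches across directories, "*" within one segment.
function globToRegExp(glob: string): RegExp {
  const escapeLiteral = (s: string): string => s.replace(/[.+^${}()|[\]\\?]/g, "\\$&");
  const body = glob
    .split("**")
    .map((part) => escapeLiteral(part).replace(/\*/g, "[^/]*"))
    .join(".*");
  return new RegExp(`^${body}$`);
}

// Assumed semantics: the first matching rule wins; no match falls back to "ask".
function evaluate(rules: readonly Rule[], tool: string, path?: string): Behavior {
  for (const rule of rules) {
    if (rule.tool !== tool) continue;
    if (rule.pattern !== undefined) {
      if (path === undefined || !globToRegExp(rule.pattern).test(path)) continue;
    }
    return rule.behavior;
  }
  return "ask";
}

// The profile from the curl example above.
const exampleRules: Rule[] = [
  { tool: "Bash", behavior: "ask", reason: "Shell commands need approval" },
  { tool: "Edit", behavior: "allow" },
  { tool: "Write", behavior: "allow", pattern: "src/**" },
  { tool: "Write", behavior: "deny", pattern: "prod/**" },
];
```

Under these assumptions, `evaluate(exampleRules, "Write", "src/app/index.ts")` returns `"allow"`, a write under `prod/` returns `"deny"`, and a write anywhere else falls through to `"ask"`.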
294+
295+
### Side-by-Side: Observability
296+
297+
**cc-connect**: Logs to stdout. No structured metrics. No tracing. No real-time event stream.
298+
299+
**OpenACP**: Telegram/Discord notifications. No OTel, no Prometheus, no SSE for external consumers.
300+
301+
**Aegis**: Every hook event generates:
302+
- OpenTelemetry span (`tool.invoke` with `sessionId`, `toolName`, `toolUseId`)
303+
- Prometheus counter (`aegis_tool_calls_total`, `aegis_auto_approvals_total`)
304+
- SSE event (real-time broadcast to dashboard and clients)
305+
- Audit log entry (hash-chained, tamper-evident, queryable via API)
306+
- Channel fan-out (Telegram + Slack + Email simultaneously)
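
The hash-chained audit entry mentioned in the list above relies on a simple idea: each entry stores the hash of its predecessor, so editing any earlier entry invalidates everything after it. The sketch below illustrates that idea with SHA-256 from Node's `crypto` module; the `AuditEntry` shape, the all-zero genesis hash, and the `|`-joined serialization are assumptions, not the Aegis log format.

```typescript
import { createHash } from "node:crypto";

// Hypothetical entry shape for illustration.
interface AuditEntry {
  index: number;
  event: string;
  prevHash: string;
  hash: string;
}

const GENESIS = "0".repeat(64); // assumed placeholder for the first entry's predecessor

function hashEntry(index: number, event: string, prevHash: string): string {
  return createHash("sha256").update(`${index}|${event}|${prevHash}`).digest("hex");
}

// Returns a new log with the event appended and linked to the previous entry.
function append(log: readonly AuditEntry[], event: string): AuditEntry[] {
  const last = log[log.length - 1];
  const prevHash = last ? last.hash : GENESIS;
  const index = log.length;
  return [...log, { index, event, prevHash, hash: hashEntry(index, event, prevHash) }];
}

// Recomputes every hash; any edited, reordered, or removed entry breaks the chain.
function verify(log: readonly AuditEntry[]): boolean {
  let prev = GENESIS;
  for (const entry of log) {
    if (entry.prevHash !== prev) return false;
    if (entry.hash !== hashEntry(entry.index, entry.event, prev)) return false;
    prev = entry.hash;
  }
  return true;
}
```

A chain like this makes tampering detectable rather than impossible: an attacker who can rewrite the whole log can also recompute the hashes, which is why such logs are usually paired with an external anchor or restricted write access.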
307+
308+
### Side-by-Side: Failure Handling
309+
310+
**Native CC hooks**: If a Stop hook fails (returns `ok: false`), Claude Code retries forever. The session burns tokens in an infinite loop. No automatic protection.
311+
312+
**Aegis**: The circuit breaker detects rapid `StopFailure` events and trips automatically, returning `{ ok: true }` to break the loop. The event is emitted as `circuit_breaker` SSE for monitoring. Configurable threshold and window.
313+
314+
```bash
315+
# Trip after 5 failures in 60 seconds
316+
HOOK_CIRCUIT_BREAKER_MAX=5
317+
HOOK_CIRCUIT_BREAKER_WINDOW_MS=60000
318+
```
319+
320+
### Side-by-Side: Multi-Agent Awareness
321+
322+
**cc-connect**: Tracks multiple agent backends but has no subagent lifecycle tracking within a session.
323+
324+
**Aegis**: `SubagentStart`/`SubagentStop` events track active subagents per session. The dashboard shows live subagent counts. `TaskCreated`/`TaskCompleted` events enable pipeline progress tracking. `TeammateIdle` enables agent team coordination.
325+
326+
### Feature Matrix
327+
328+
| Capability | Aegis | cc-connect | ClaudeClaw | OpenACP |
329+
|-----------|-------|-----------|-----------|---------|
330+
| **Hook transport** | HTTP + MCP | HTTP + shell | Shell only | HTTP only |
331+
| **Authentication** | `X-Hook-Secret` (header-only mode) | Basic token | None | None |
332+
| **Permission policies** | Per-session, per-tool, per-path rules | Flat allow/deny list | Allow/block all | Allow/block all |
333+
| **Audit trail** | Hash-chained, immutable, API-queryable | None | None | None |
334+
| **Real-time SSE** | Per-session + global streams | None | None | None |
335+
| **OTel tracing** | `tool.invoke` spans with correlation | None | None | None |
336+
| **Prometheus metrics** | Per-session tool calls, auto-approvals, latency | None | None | None |
337+
| **Circuit breaker** | Automatic StopFailure protection | None | None | None |
338+
| **Multi-channel fan-out** | Dashboard + Telegram + Slack + Email | Single channel | Telegram only | Telegram + Discord |
339+
| **Subagent tracking** | `SubagentStart`/`SubagentStop` per session | No subagent events | No subagent events | No subagent events |
340+
| **Payload validation** | Zod schema on every event | Best-effort | None | None |
341+
| **Rate limiting** | Per-IP + global | None | None | None |
342+
| **Events handled** | 29 lifecycle events | Subset | 3–5 basic events | 5–8 events |
343+
| **AskUserQuestion intercept** | Yes, external answer with timeout | No | No | No |
344+
| **RBAC on hook endpoints** | Role-based access (admin/operator/viewer) | Token-only | None | None |
345+
346+
### Why MCP Integration Matters
131347

132-
Some Claude Code orchestration tools offer simpler hook systems based on shell commands or raw HTTP callbacks. Here's how Aegis's approach differs:
348+
Aegis's hooks don't just receive events; they integrate with the **Model Context Protocol** server. This creates a bidirectional relationship:
133349

134-
| Capability | Aegis (MCP + HTTP) | Shell/HTTP Only |
135-
|-----------|-------------------|-----------------|
136-
| **Authentication** | Secret-based with header-only mode | Often none or basic token |
137-
| **Permission control** | Configurable policies per tool, per session | Allow/block all |
138-
| **Audit trail** | Hash-chained, immutable, queryable | Typically none |
139-
| **Real-time observability** | SSE streams, OTel spans, Prometheus metrics | Limited or custom logging |
140-
| **Multi-channel delivery** | Dashboard + Telegram + Slack + Email + webhooks | Usually single channel |
141-
| **Circuit breaker** | Automatic detection of hook failure loops | Manual intervention |
142-
| **Payload validation** | Strict schema validation (Zod) | Best-effort or none |
143-
| **Rate limiting** | Per-IP + global limits | Often none |
144-
| **Tool-level metrics** | Per-session tool usage, token counts, latency | Aggregate or none |
350+
1. **Inbound** (CC → Aegis): HTTP hooks deliver lifecycle events for monitoring, auditing, and permission decisions.
351+
2. **Outbound** (Aegis → CC): MCP tools let external systems control sessions, for example by sending messages, approving permissions, killing sessions, and inspecting transcripts.
145352

146-
### Why MCP-Based Hooks Matter
353+
This means you can build **full control planes** on top of Aegis:
354+
- A dashboard that watches tool calls via SSE and approves permissions via MCP
355+
- A Telegram bot that receives session alerts and sends corrective instructions
356+
- A CI/CD pipeline that creates sessions, monitors progress, and reviews results
147357

148-
Aegis's hooks integrate with the **Model Context Protocol** server, not just HTTP endpoints. This means:
358+
Shell-only tools can only react. Aegis can **observe AND act**.
149359

150-
1. **Agent-native**: Claude Code interacts with Aegis via MCP tools, not just callbacks
151-
2. **Composable**: Other MCP hosts can use the same tools
152-
3. **Auditable**: Every MCP tool call is logged with parameters and results
153-
4. **RBAC-ready**: Per-tool role-based access control (Phase 4)
360+
---
154361

155362
## See Also
156363

157364
- [API Reference: Webhooks](./api-reference.md#12-webhooks): full endpoint documentation
158365
- [API Reference: Session Hooks](./api-reference.md#session-hooks): circuit breaker and truncation handling
159366
- [Observability Guide](./OBSERVABILITY.md): Prometheus, Grafana, OTel setup
160367
- [Architecture: Channels](./architecture.md#5-notification-channels): channel delivery architecture
368+
- [Competitive Threat Matrix](./competitive-threat-matrix.md): strategic competitive positioning
