Commit 1baf1ac

Follow up PR #265: refine chapters, diagrams, and add S20 (#283)

* feat: s01-s14 docs quality overhaul - tool pipeline, single-agent, knowledge & resilience

  Rewrite code.py and README (zh/en/ja) for s01-s14, each chapter building incrementally on the previous. Key fixes across chapters:
  - s01-s04: agent loop, tool dispatch, permission pipeline, hooks
  - s05-s08: todo write, subagent, skill loading, context compact
  - s09-s11: memory system, system prompt assembly, error recovery
  - s12-s14: task graph, background tasks, cron scheduler

  All chapters CC source-verified. Code inherits fixes forward (PROMPT_SECTIONS, json.dumps cache, real-state context, can_start dep protection, etc.).

* feat: s15-s19 docs quality overhaul - multi-agent platform: teams, protocols, autonomy, worktree, MCP tools

  Rewrite code.py and README (zh/en/ja) for s15-s19, the multi-agent platform chapters. Each chapter inherits all previous fixes and adds one mechanism:
  - s15: agent teams (TeamCreate, teammate threads, shared task list)
  - s16: team protocols (plan approval, shutdown handshake, consume_inbox)
  - s17: autonomous agents (idle polling, auto-claim, consume_lead_inbox)
  - s18: worktree isolation (git worktree, bind_task, cwd switching, safety)
  - s19: MCP tools (MCPClient, normalize_mcp_name, assemble_tool_pool, no cache)

  All appendix source code references verified against CC source. Config priority corrected: claude.ai < plugin < user < project < local.

* fix: 5 regressions across s05-s19 - glob safety, todo validation, memory extraction, protocol types, dep crash
  - s05-s09: glob results now filter with is_relative_to(WORKDIR) (inherited from s02)
  - s06-s08: todo_write validates content/status required fields (inherited from s05)
  - s09: extract_memories uses pre-compression snapshot instead of compacted messages
  - s16: submit_plan docstring clarifies protocol-only (not code-level gate)
  - s17-s19: match_response restores type mismatch validation (from s16)
  - s17-s19: claim_task deps list handles missing dep files without crashing

* fix: s12 Todo V2 logic reversal, s14/s15 cron range validation, s18/s19 worktree name validation
  - s12 README (zh/en/ja): fix Todo V2 direction: interactive defaults to Task, non-interactive/SDK defaults to TodoWrite. Fix env var name to CLAUDE_CODE_ENABLE_TASKS (not TODO_V2).
  - s14/s15: add _validate_cron_field with per-field range checks (minute 0-59, hour 0-23, dom 1-31, month 1-12, dow 0-6), step > 0, range lo <= hi. Replace old try/except validation that only caught exceptions.
  - s18/s19: add validate_worktree_name() to remove_worktree and keep_worktree, not just create_worktree.

* fix: align s16-s19 teaching tool consistency
* fix pr265 chapter diagrams
* Add comprehensive s20 harness chapter
* Fix chapter smoke test regressions
* Clarify README tutorial track transition

---------

Co-authored-by: Haoran <bill-billion@outlook.com>
1 parent c354cf7 commit 1baf1ac

174 files changed

Lines changed: 35837 additions & 357 deletions


.gitignore

Lines changed: 6 additions & 0 deletions
@@ -195,6 +195,12 @@ cython_debug/
 .task_outputs/
 .tasks/
 .teams/
+.mailboxes/
+.worktrees/
+.scheduled_tasks.json
+
+# Accidental root npm lockfile; web/package-lock.json is tracked.
+/package-lock.json

 # Ruff stuff:
 .ruff_cache/

README-ja.md

Lines changed: 157 additions & 75 deletions
Large diffs are not rendered by default.

README-zh.md

Lines changed: 158 additions & 75 deletions
Large diffs are not rendered by default.

README.md

Lines changed: 288 additions & 205 deletions
Large diffs are not rendered by default.

docs/zh/s01-the-agent-loop.md

Lines changed: 1 addition & 1 deletion
@@ -25,7 +25,7 @@

 一个退出条件控制整个流程。循环持续运行, 直到模型不再调用工具。

-## 工作原理
+## 工作原理

 1. 用户 prompt 作为第一条消息。

docs/zh/s03-todo-write.md

Lines changed: 1 addition & 1 deletion
@@ -26,7 +26,7 @@
 | [ ] task A |
 | [>] task B <- doing |
 | [x] task C |
-+-----------------------+
++----------- ------------+
 |
 if rounds_since_todo >= 3:
 inject <reminder> into tool_result

s01_agent_loop/README.en.md

Lines changed: 207 additions & 0 deletions
@@ -0,0 +1,207 @@
1+
# s01: The Agent Loop — One Loop Is All You Need
2+
3+
[中文](README.md) · [English](README.en.md) · [日本語](README.ja.md)
4+
5+
`s01` → [s02](../s02_tool_use/) → s03 → s04 → ... → s20
6+
> *"One loop & Bash is all you need"* — One tool + one loop = one Agent.
7+
>
8+
> **Harness Layer**: The Loop — the first bridge between the model and the real world.
9+
10+
---
11+
12+
## The Problem
13+
14+
You ask the model: "List the files in my directory and run XXX.py."
15+
16+
The model can output a bash command, but once it's done outputting, it stops — it won't execute the command on its own, and it won't keep reasoning based on the result.
17+
18+
You could run it manually, paste the output back into the chat, and let it continue. Then the next command comes out, and you run it and paste the result back again.
19+
20+
Every round-trip, you're the middle layer. Automating that is what this chapter is about.
21+
22+
---
23+
24+
## The Solution
25+
26+
![Agent Loop](images/agent-loop.en.svg)
27+
28+
A `while True` loop: keep going when the model calls a tool, stop when it doesn't. The entire process hinges on two signals:
29+
30+
| Signal | Meaning | Loop Action |
|--------|---------|-------------|
| `stop_reason == "tool_use"` | Model raises hand: "I need a tool" | Execute → feed result back → continue |
| `stop_reason != "tool_use"` | Model says: "I'm done" | Exit loop |
34+
35+
---
36+
37+
## How It Works
38+
39+
Let's translate this process into code. Step by step:
40+
41+
**Step 1**: Start with the user's question as the first message.
42+
43+
```python
44+
messages = [{"role": "user", "content": query}]
45+
```
46+
47+
**Step 2**: Send the messages and tool definitions to the LLM.
48+
49+
```python
response = client.messages.create(
    model=MODEL, system=SYSTEM, messages=messages,
    tools=TOOLS, max_tokens=8000,
)
```
55+
56+
**Step 3**: Append the model's response and check whether it called a tool. No tool call → done.
57+
58+
```python
messages.append({"role": "assistant", "content": response.content})
if response.stop_reason != "tool_use":
    return
```
63+
64+
**Step 4**: Execute each tool call the model requested and collect the results.
65+
66+
```python
results = []
for block in response.content:
    if block.type == "tool_use":
        output = run_bash(block.input["command"])
        results.append({
            "type": "tool_result",
            "tool_use_id": block.id,
            "content": output,
        })
```
77+
78+
**Step 5**: Append the tool results as a new message and go back to Step 2.
79+
80+
```python
81+
messages.append({"role": "user", "content": results})
82+
```
83+
84+
Assembled into a complete function:
85+
86+
```python
def agent_loop(messages):
    while True:
        response = client.messages.create(
            model=MODEL, system=SYSTEM, messages=messages,
            tools=TOOLS, max_tokens=8000,
        )
        messages.append({"role": "assistant", "content": response.content})

        if response.stop_reason != "tool_use":
            return

        results = []
        for block in response.content:
            if block.type == "tool_use":
                output = run_bash(block.input["command"])
                results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": output,
                })
        messages.append({"role": "user", "content": results})
```
109+
110+
Under 30 lines — that's the minimal runnable agent harness kernel. It's not intelligence itself, but the smallest runtime framework that lets the model keep acting. The model decides (whether to call a tool, which one), the harness executes (if called, run it, feed the result back). The next 18 chapters all add mechanisms on top of this loop. The loop itself never changes.
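
The loop calls `run_bash` without showing it. As a rough sketch only, such a helper might look like the following, assuming a `subprocess`-based implementation. The `timeout` and `max_chars` parameters and the `(no output)` marker are illustrative choices, not taken from the chapter's `code.py`.

```python
import subprocess


def run_bash(command: str, timeout: int = 120, max_chars: int = 50_000) -> str:
    """Run a shell command and return its combined stdout/stderr as text.

    Hypothetical helper: parameter names and defaults are assumptions.
    """
    try:
        proc = subprocess.run(
            command, shell=True, capture_output=True, text=True, timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # Report the timeout as text so the model can react to it.
        return f"Error: command timed out after {timeout}s"
    output = (proc.stdout + proc.stderr).strip()
    # Truncate very long output; return an explicit marker when there is none.
    return output[:max_chars] if output else "(no output)"
```

Returning errors as plain strings, rather than raising, keeps the loop simple: whatever happens, the model receives a `tool_result` it can reason about.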
111+
112+
---
113+
114+
## Try It
115+
116+
> **Teaching demo notice**: The code executes shell commands generated by the model. Run it in a temporary test directory to avoid affecting your project files. s03 covers the real permission system.
117+
118+
**Setup** (first run):
119+
120+
```sh
121+
pip install -r requirements.txt
122+
cp .env.example .env
123+
# Edit .env, fill in ANTHROPIC_API_KEY and MODEL_ID
124+
```
125+
126+
**Run**:
127+
128+
```sh
129+
python s01_agent_loop/code.py
130+
```
131+
132+
Try these prompts:
133+
134+
1. `Create a file called hello.py that prints "Hello, World!"`
135+
2. `List all Python files in this directory`
136+
3. `What is the current git branch?`
137+
138+
What to watch for: When does the model call a tool (loop continues), and when does it not (loop ends)?
139+
140+
---
141+
142+
## What's Next
143+
144+
Right now the model only has bash — reading files requires `cat`, writing files requires `echo ... >`, finding files requires `find`. Ugly and error-prone.
145+
146+
→ s02 Tool Use: What happens when we give it 5 proper tools? Will the model call multiple tools at once? Will parallel tool executions step on each other?
147+
148+
<details>
149+
<summary>Dive into CC Source Code</summary>
150+
151+
> The following is based on a review of CC source code `src/query.ts` (1729 lines). The core differences are twofold: CC doesn't rely on the `stop_reason` field to decide whether to continue the loop — instead it checks whether the content contains `tool_use` blocks (because `stop_reason` is unreliable in streaming responses); CC has more exit paths and recovery strategies for production-grade protection.
152+
153+
**The 30-line `while True` from the teaching version IS the core of CC's 1729 lines.** Everything below is a protection mechanism layered on top of that core.
154+
155+
<details>
156+
<summary>1. Loop Structure Differences</summary>
157+
158+
The teaching version checks `response.stop_reason`. CC doesn't use it as the sole signal for loop continuation — in streaming responses, `stop_reason` may not have updated yet even though `tool_use` blocks are already present. CC uses a `needsFollowUp` flag: during streaming message reception (`query.ts:830-834`), it's set to `true` whenever a `tool_use` block is detected. `QueryEngine.ts` captures the real `stop_reason` from `message_delta` for other logic, but the query loop itself relies on `needsFollowUp`.
159+
160+
```typescript
161+
// query.ts:554-558
162+
// stop_reason === 'tool_use' is unreliable.
163+
// Set during streaming whenever a tool_use block arrives.
164+
let needsFollowUp = false
165+
```
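
For comparison, a content-based check can be sketched in the teaching version's Python. This is a hypothetical helper that mirrors the idea behind `needsFollowUp`, not a port of CC's code; the stand-in blocks below only imitate the `.type` attribute of real content blocks.

```python
from types import SimpleNamespace


def needs_follow_up(content) -> bool:
    """Return True if any content block in the response is a tool_use block."""
    return any(getattr(block, "type", None) == "tool_use" for block in content)


# Stand-in blocks for illustration; only the .type attribute matters here.
text_block = SimpleNamespace(type="text", text="Listing files...")
tool_block = SimpleNamespace(type="tool_use", id="t1", input={"command": "ls"})

print(needs_follow_up([text_block, tool_block]))
print(needs_follow_up([text_block]))
```

In the teaching loop, `if not needs_follow_up(response.content): return` could stand in for the `stop_reason` comparison.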
166+
167+
</details>
168+
169+
<details>
170+
<summary>2. State Object — 10 Fields (Teaching Version Only Uses messages)</summary>
171+
172+
| # | Field | Purpose | Chapter |
|---|-------|---------|---------|
| 1 | `messages` | Message array for the current iteration | s01 |
| 2 | `toolUseContext` | Tool, signal, and permission context | s02 |
| 3 | `autoCompactTracking` | Compaction state tracking | s08 |
| 4 | `maxOutputTokensRecoveryCount` | Token recovery attempt count (max 3) | s11 |
| 5 | `hasAttemptedReactiveCompact` | Whether reactive compaction was attempted this round | s08 |
| 6 | `maxOutputTokensOverride` | 8K→64K upgrade override | s11 |
| 7 | `pendingToolUseSummary` | Background Haiku-generated tool use summary | s08 |
| 8 | `stopHookActive` | Whether the stop hook produced a blocking error | s04 |
| 9 | `turnCount` | Turn count (for maxTurns check) | s01 |
| 10 | `transition` | Last continue reason | s11 |
184+
185+
> Note: `taskBudgetRemaining` (`query.ts:291`) is a loop-local variable, not on State. The source comment explicitly says "Loop-local (not on State)".
186+
187+
</details>
188+
189+
<details>
190+
<summary>3. Multiple Exit and Continue Paths</summary>
191+
192+
The teaching version has only 1 exit path (model doesn't call a tool → done). The production version has multiple exit and continue paths, covering blocking limit, prompt too long, model error, abort, hook stop, max turns, token budget continuation, reactive compact retry, and more. Each scenario has a corresponding recovery or exit strategy.
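
To make the idea of multiple exit paths concrete, here is a small sketch that adds two of them (max turns and model error) to the teaching loop. It is an illustration under assumptions, not CC's logic: `guarded_agent_loop`, its `call_model` and `run_tool` parameters, and the returned reason strings are all hypothetical names.

```python
from types import SimpleNamespace


def guarded_agent_loop(messages, call_model, run_tool, max_turns=20):
    """Run the agent loop and return a string naming why it exited."""
    for _ in range(max_turns):
        try:
            response = call_model(messages)
        except Exception as exc:  # exit path: model/API error
            return f"model_error: {exc}"
        messages.append({"role": "assistant", "content": response.content})

        tool_calls = [b for b in response.content if b.type == "tool_use"]
        if not tool_calls:  # exit path: the model is done
            return "completed"

        results = [
            {"type": "tool_result", "tool_use_id": b.id, "content": run_tool(b)}
            for b in tool_calls
        ]
        messages.append({"role": "user", "content": results})
    return "max_turns"  # exit path: turn budget exhausted


# Usage with a stand-in model that always asks for a tool:
fake_call = SimpleNamespace(type="tool_use", id="t1", input={"command": "ls"})
always_tool = lambda msgs: SimpleNamespace(content=[fake_call])
reason = guarded_agent_loop([], always_tool, lambda block: "ok", max_turns=3)
print(reason)
```

Returning a reason instead of a bare `return` lets the caller tell a normal finish from a budget cutoff or a failure.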
193+
194+
</details>
195+
196+
<details>
197+
<summary>4. Streaming Tool Execution and QueryEngine</summary>
198+
199+
CC's `StreamingToolExecutor` (`query.ts:561`) allows tools to begin parallel execution while the model is still generating (concurrency-safe tools run in parallel, others run exclusively). `QueryEngine.ts` adds additional protections for cost overruns, structured output validation failures, and more. The teaching version doesn't implement these — the goal is conceptual clarity, not peak performance.
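
The "safe tools in parallel, others exclusively" policy can be sketched in batch form as follows. This does not stream and is not how `StreamingToolExecutor` is implemented; `run_tool_calls`, `handlers`, and `concurrency_safe` are hypothetical names used for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def run_tool_calls(calls, handlers, concurrency_safe):
    """Execute (name, args) calls; return their outputs in the original order.

    Consecutive concurrency-safe calls run together in a thread pool.
    Any other call runs alone, after the pending batch has finished.
    """
    results = []
    batch = []  # consecutive concurrency-safe calls waiting to run together

    def flush():
        if batch:
            with ThreadPoolExecutor(max_workers=len(batch)) as pool:
                # pool.map yields results in input order.
                results.extend(pool.map(lambda c: handlers[c[0]](c[1]), batch))
            batch.clear()

    for name, args in calls:
        if name in concurrency_safe:
            batch.append((name, args))
        else:
            flush()  # let the parallel batch finish first
            results.append(handlers[name](args))  # exclusive execution
    flush()
    return results


handlers = {"read": lambda a: f"read {a}", "write": lambda a: f"wrote {a}"}
calls = [("read", "a"), ("read", "b"), ("write", "c"), ("read", "d")]
outputs = run_tool_calls(calls, handlers, concurrency_safe={"read"})
```

Here the two leading reads share a batch, the write runs on its own, and the final read starts only after the write returns.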
200+
201+
</details>
202+
203+
**In one sentence**: The core of query.ts's 1729 lines is a 30-line `while True`. All the complex fields and exit paths are protection mechanisms. Understand the core loop first, and everything that follows unfolds naturally.
204+
205+
</details>
206+
207+
<!-- translation-sync: zh@v1, en@v1, ja@v1 -->
