s01 → s02 → s03 → s04 → s05 → s06 → s07 → ... → s20
"An agent without a plan goes wherever the wind blows" — List the steps first, then execute. Complex tasks are less likely to miss steps.
Harness Layer: Planning — Let the Agent think before it acts.
Give the Agent a complex task: "Rename all Python files to snake_case, run tests, and fix failures."
The Agent starts working, renames 3 files, runs a test, finds 2 failures, starts fixing. While fixing, it forgets the original goal was "rename to snake_case", the test failures have consumed all its attention.
The longer the conversation, the worse it gets: tool results keep filling the context, diluting the system prompt's influence. A 10-step refactoring: after steps 1-3, the Agent starts improvising because steps 4-10 have been pushed out of its attention.
The minimal hook structure from the previous chapter is preserved, focusing on the new todo_write tool and reminder mechanism. todo_write does no actual work, can't read files or run commands, it simply lets the Agent organize its thoughts before diving in.
The dispatch mechanism is unchanged; the new tool is still routed through TOOL_HANDLERS[block.name]. However, to demonstrate the todo reminder, a counter was added to the loop: after 3 consecutive rounds without calling todo_write, a reminder is injected.
The todo_write tool accepts a list with statuses, keeps it in the current process memory, and displays progress in the terminal:
CURRENT_TODOS: list[dict] = []
def run_todo_write(todos: list) -> str:
global CURRENT_TODOS
CURRENT_TODOS = todos
lines = ["\n## Current Tasks"]
for t in CURRENT_TODOS:
icon = {"pending": " ", "in_progress": "▸", "completed": "✓"}[t["status"]]
lines.append(f" [{icon}] {t['content']}")
print("\n".join(lines))
return f"Updated {len(CURRENT_TODOS)} tasks"The tool definition joins the other 5 in the dispatch map:
TOOLS = [
{"name": "bash", ...},
{"name": "read_file", ...},
{"name": "write_file", ...},
{"name": "edit_file", ...},
{"name": "glob", ...},
# s05: new entry
{"name": "todo_write", "description": "Create and manage a task list ...",
"input_schema": {
"type": "object",
"properties": {
"todos": {
"type": "array",
"items": {
"type": "object",
"properties": {
"content": {"type": "string"},
"status": {"type": "string", "enum": ["pending", "in_progress", "completed"]},
},
},
},
},
},
},
]
TOOL_HANDLERS["todo_write"] = run_todo_writeNag reminder, when the model hasn't called todo_write for 3 consecutive rounds, a reminder is automatically injected (teaching mechanism; CC source has no fixed round-count logic):
if rounds_since_todo >= 3 and messages:
messages.append({
"role": "user",
"content": "<reminder>Update your todos.</reminder>",
})
rounds_since_todo = 0Typical flow when the Agent receives a task: first call todo_write to list all steps (all pending) → pick one step, set it to in_progress → complete it, set to completed → look at the next pending → continue. After 3 rounds without todo_write, the loop appends a reminder before the next LLM call.
Key insight: todo_write doesn't give the Agent any additional execution capability. What it adds is planning capability.
| Component | Before (s04) | After (s05) |
|---|---|---|
| Tool count | 5 (bash, read, write, edit, glob) | 6 (+todo_write) |
| Planning | None | Stateful TODO list + nag reminder |
| SYSTEM prompt | Generic prompt | Added "plan before executing" guidance |
| Loop | Unchanged | Dispatch unchanged, added rounds_since_todo counter and reminder injection |
cd learn-claude-code
python s05_todo_write/code.pyTry these prompts:
Refactor s05_todo_write/example/hello.py: add type hints, docstrings, and a main guard(should list 3 steps first, then execute)Create a Python package under s05_todo_write/example/demo_pkg with __init__.py, utils.py, and tests/test_utils.pyReview Python files under s05_todo_write/example and fix any style issues
What to watch for: Was the first tool call todo_write? How many TODO steps were listed? Did statuses move from pending to in_progress / completed during execution?
The Agent can plan now. But if a task is too large, say "refactor the entire auth module", a TODO list alone isn't enough. That task is itself a collection of dozens of subtasks that would drown in a single conversation's context.
→ s06 Subagent: Break large tasks into subtasks, each handled by an independent Agent with its own clean context, no cross-contamination.
Dive into CC Source Code
CC has two task systems coexisting (tasks.ts:133-139):
- TodoWrite (V1): A simple list tool, data maintained in memory AppState (
TodoWriteTool.ts:65-103). The teaching version also keeps it in process memory and clears it on exit. - Task System (V2 = s12): File-persisted, dependency graph, concurrency locks, ownership.
The switch is controlled by isTodoV2Enabled(). In the current source: V2 is enabled by default in interactive sessions, V1 in non-interactive (SDK) sessions; setting CLAUDE_CODE_ENABLE_TASKS forces V2 regardless. Note the source comment "Force-enable tasks in non-interactive mode" describes the env var path's purpose, not the default branch's return semantics.
The teaching version omits the activeForm field from the real source (utils/todo/types.ts:8-15). CC uses it for the UI spinner to show "what's being done"; the teaching version only has terminal output and doesn't need this field.
The teaching version's nag reminder (3 rounds without update triggers injection) is an educational mechanism. The CC source has no fixed "3 rounds" logic; the closest is TodoWriteTool.ts:72-107 which appends a verification nudge when 3+ todos are all completed without a verification item.
Core increments of the Task System over TodoWrite:
- File persistence (Claude config directory
tasks/{taskListId}/{taskId}.json) instead of in-memory list blockedBydependency graph instead of flat listproper-lockfileconcurrency safety instead of no locking- Four separate tools (Create/Get/Update/List) instead of one
- TaskCreated / TaskCompleted hooks (
TaskCreateTool.ts:80-129,TaskUpdateTool.ts:231-260) for external system integration