The query pipeline is the core of Claude Code: it handles every user message from input to final response. It is implemented as an async generator state machine in query.ts, wrapped by the stateful, per-conversation QueryEngine in QueryEngine.ts.
- QueryEngine is the stateful wrapper for a conversation session
- submitMessage() is the main entry point: it takes user text, assembles context, and calls query()
- Manages message history persistence, session storage, transcript recording
- Handles file state caching and commit attribution
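To make the wrapper's role concrete, here is a minimal sketch of a per-conversation engine that keeps history on the instance and delegates each message to an injected query function. The Message and StreamEvent shapes, the QueryEngineSketch name, and getHistory() are assumptions made for illustration; persistence, transcripts, and file state caching are left out.

```typescript
// Hypothetical shapes; the real types in QueryEngine.ts are not shown in this document.
type Message = { role: "user" | "assistant"; content: string };
type StreamEvent = { type: "text"; text: string } | { type: "done" };
type QueryFn = (history: readonly Message[]) => AsyncGenerator<StreamEvent, void>;

class QueryEngineSketch {
  // Per-conversation state lives on the instance and survives across calls.
  private history: Message[] = [];
  private readonly query: QueryFn;

  constructor(query: QueryFn) {
    this.query = query;
  }

  // Main entry: record the user text, run the query loop, re-yield its events.
  async *submitMessage(text: string): AsyncGenerator<StreamEvent, void> {
    this.history.push({ role: "user", content: text });
    let reply = "";
    for await (const event of this.query(this.history)) {
      if (event.type === "text") reply += event.text;
      yield event;
    }
    // Store the assistant reply so the next submitMessage() call sees it.
    this.history.push({ role: "assistant", content: reply });
  }

  getHistory(): readonly Message[] {
    return this.history;
  }
}
```

Because submitMessage() is itself an async generator, a UI can consume it with for await and render each event as it arrives.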
The query function is an async generator that yields events. Each iteration is one "turn":
┌─────────────────────────────────┐
│ Setup Phase                     │
│ - Assemble system prompt        │
│ - Initialize tool context       │
│ - Prefetch skill discovery      │
└──────────────┬──────────────────┘
               v
┌─────────────────────────────────┐
│ API Call Phase                  │◄──────────────┐
│ - Normalize messages            │               │
│ - callModel() [streaming]       │               │
│ - Yield StreamEvents            │               │
│ - Accumulate assistant message  │               │
└──────────────┬──────────────────┘               │
               v                                  │
       ┌───────────────┐                          │
       │ Has tool use? │──── No ──► Return        │
       └───────┬───────┘                          │
               │ Yes                              │
               v                                  │
┌─────────────────────────────────┐               │
│ Tool Execution Phase            │               │
│ - Partition: read-only vs write │               │
│ - Permission check per tool     │               │
│ - Execute (concurrent or serial)│               │
│ - Merge results into history    │               │
└──────────────┬──────────────────┘               │
               │                                  │
               └──────────────────────────────────┘
Setup phase:
- System prompt assembled from multiple sources: default sections + memory prompt + appended context
- Tool permission context initialized
- Query state initialized: message history, tool context, budget trackers
- Skill discovery prefetch kicked off asynchronously
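The setup steps can be sketched as follows. Only the three prompt sources and the idea of an un-awaited prefetch come from the text; PromptSources, QueryState, setupQuery, and the discoverSkills parameter are hypothetical names, and the join order and separator are assumptions.

```typescript
// All names here are hypothetical; only the three prompt sources come from the text.
interface PromptSources {
  defaultSections: string[];
  memoryPrompt?: string;
  appendedContext?: string;
}

// Join the sources in a fixed order, skipping any that are absent or empty.
function assembleSystemPrompt(src: PromptSources): string {
  return [...src.defaultSections, src.memoryPrompt, src.appendedContext]
    .filter((part): part is string => typeof part === "string" && part.length > 0)
    .join("\n\n");
}

interface QueryState {
  systemPrompt: string;
  messages: string[];
  tokensUsed: number;
  // Started during setup but not awaited, so setup does not block on it.
  skillPrefetch: Promise<string[]>;
}

function setupQuery(
  src: PromptSources,
  discoverSkills: () => Promise<string[]>,
): QueryState {
  return {
    systemPrompt: assembleSystemPrompt(src),
    messages: [],
    tokensUsed: 0,
    skillPrefetch: discoverSkills(),
  };
}
```

Storing the promise instead of awaiting it lets later phases pick up the prefetch result only when they need it.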
API call phase:
- Messages normalized for API format (compact boundary handling, tool use summaries)
- Streaming call via callModel(), supplied through dependency injection
- Each StreamEvent yielded to caller for real-time terminal rendering
- toolUseContext updated as response accumulates
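A minimal sketch of the yield-while-accumulating pattern described above. The event shapes, AssistantMessage, and streamTurn are assumptions for illustration; the real StreamEvent type carries more detail.

```typescript
// Hypothetical event shape; the real StreamEvent type is not shown here.
type StreamEvent =
  | { type: "text_delta"; text: string }
  | { type: "tool_use"; name: string }
  | { type: "message_stop" };

interface AssistantMessage {
  text: string;
  toolUses: string[];
}

// Forward each event as soon as it arrives, and return the accumulated
// message once the stream ends.
async function* streamTurn(
  callModel: () => AsyncIterable<StreamEvent>,
): AsyncGenerator<StreamEvent, AssistantMessage> {
  const message: AssistantMessage = { text: "", toolUses: [] };
  for await (const event of callModel()) {
    if (event.type === "text_delta") message.text += event.text;
    if (event.type === "tool_use") message.toolUses.push(event.name);
    yield event; // the caller can render this before the response is complete
  }
  return message;
}
```

The generator's return value carries the complete message, so the loop can inspect it for tool use without the caller having to reassemble it.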
Tool execution phase:
- Tools partitioned: read-only tools run concurrently, write tools run serially
- Permission check per tool via canUseTool(), which may block on a user prompt
- Tool invocation via toolOrchestration.runTools()
- Results merged into message history for next turn
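The partitioning idea can be sketched like this. It is not the actual toolOrchestration.runTools() implementation: ToolCall, CanUseTool, and runToolsSketch are hypothetical, and running every read-only call before any write call is a simplification that the real orchestration may not share.

```typescript
// Hypothetical shapes; canUseTool and the tool calls are supplied by the caller.
interface ToolCall {
  name: string;
  readOnly: boolean;
  run: () => Promise<string>;
}
type CanUseTool = (call: ToolCall) => Promise<boolean>;

async function runOne(call: ToolCall, canUseTool: CanUseTool): Promise<string> {
  // The permission check may wait on the user; a denial becomes a result, not an exception.
  if (!(await canUseTool(call))) return `${call.name}: permission denied`;
  return call.run();
}

async function runToolsSketch(
  calls: ToolCall[],
  canUseTool: CanUseTool,
): Promise<string[]> {
  const results = new Array<string>(calls.length);

  // Read-only tools cannot conflict with each other, so start them all at once.
  const reads = calls
    .map((call, index) => ({ call, index }))
    .filter(({ call }) => call.readOnly);
  await Promise.all(
    reads.map(async ({ call, index }) => {
      results[index] = await runOne(call, canUseTool);
    }),
  );

  // Write tools may touch shared state, so run them one at a time, in order.
  for (let index = 0; index < calls.length; index++) {
    const call = calls[index];
    if (call !== undefined && !call.readOnly) {
      results[index] = await runOne(call, canUseTool);
    }
  }
  return results; // same order as the input calls
}
```

Results are written back by index, so the merged history sees them in the order the model requested them regardless of completion order.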
After each assistant message:
- Has tool use? → Loop back (continue turn)
- Stop reason = end_turn? → Evaluate stop hooks, return or resume
- Token budget exceeded? → Trigger compaction, re-enter loop
- No more work? → Return terminal event
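The decision list above can be read as a pure function from turn state to the next transition. The sketch below assumes the checks are evaluated in the order listed; TurnState, Transition, and the field names are hypothetical.

```typescript
// Hypothetical state and transition names, chosen only to mirror the list above.
interface TurnState {
  hasToolUse: boolean;
  stopReason: "end_turn" | "tool_use" | "max_tokens" | null;
  tokensUsed: number;
  tokenBudget: number;
  // Assumed flag: set when a stop hook asks the loop to keep going.
  stopHookWantsResume: boolean;
}

type Transition = "run_tools" | "resume" | "compact" | "return";

function decideNext(s: TurnState): Transition {
  if (s.hasToolUse) return "run_tools";
  if (s.stopReason === "end_turn") {
    return s.stopHookWantsResume ? "resume" : "return";
  }
  if (s.tokensUsed > s.tokenBudget) return "compact";
  return "return"; // no more work
}
```

Keeping the decision separate from the side effects makes each branch easy to unit test without running a model call.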
The pipeline has built-in recovery for common failure modes:
| Error | Recovery Strategy | Max Retries |
|---|---|---|
| max_output_tokens | Resume with "pick up mid-thought" message | 3 |
| prompt_too_long | Reactive compaction (summarize old messages), retry | 1 |
| Rate limit | Yield error event, return (terminal) | 0 |
| Auth error | Yield error event, return (terminal) | 0 |
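One way to encode the table is a lookup keyed by error kind plus a per-kind attempt counter. The strategies and retry caps come from the table; the rate_limit and auth_error identifiers, the Recovery names, and nextRecovery are assumptions for illustration.

```typescript
// rate_limit and auth_error are hypothetical identifiers for the last two table rows.
type ErrorKind = "max_output_tokens" | "prompt_too_long" | "rate_limit" | "auth_error";
type Recovery = "resume" | "compact_and_retry" | "terminal";

// Mirrors the table above: strategy and retry cap per error kind.
const RECOVERY: Record<ErrorKind, { strategy: Recovery; maxRetries: number }> = {
  max_output_tokens: { strategy: "resume", maxRetries: 3 },
  prompt_too_long: { strategy: "compact_and_retry", maxRetries: 1 },
  rate_limit: { strategy: "terminal", maxRetries: 0 },
  auth_error: { strategy: "terminal", maxRetries: 0 },
};

// Returns what to do for this error and counts the attempt against its cap.
// Once the cap is reached, every further error of that kind is terminal.
function nextRecovery(kind: ErrorKind, attempts: Map<ErrorKind, number>): Recovery {
  const used = attempts.get(kind) ?? 0;
  const { strategy, maxRetries } = RECOVERY[kind];
  if (used >= maxRetries) return "terminal";
  attempts.set(kind, used + 1);
  return strategy;
}
```

The attempts map would live in the query loop's local state, so the caps apply per query rather than globally.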
The query loop accepts a QueryDeps object for testability:
- callModel(): Claude API streaming (pluggable for testing)
- parseAndValidateToolInputs(): Zod-based input validation
- Compact/collapse modules: feature-gated, lazy-required
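The following sketch shows why injecting dependencies helps testing: the loop only sees an interface, so a test can replay canned chunks instead of calling the API. This QueryDeps is a simplified, hypothetical version with two members; the hand-written object check stands in for the Zod-based validation, and querySketch and fakeDeps are illustrative names.

```typescript
type Validation =
  | { ok: true; value: Record<string, unknown> }
  | { ok: false; error: string };

// Hypothetical, simplified QueryDeps; the real object has more members.
interface QueryDeps {
  callModel: (prompt: string) => AsyncIterable<string>;
  parseAndValidateToolInputs: (raw: unknown) => Validation;
}

// The loop depends only on the interface, never on a concrete API client.
async function* querySketch(prompt: string, deps: QueryDeps): AsyncGenerator<string, void> {
  for await (const chunk of deps.callModel(prompt)) {
    yield chunk;
  }
}

// A test double: replays canned chunks and uses a plain object check
// in place of schema validation.
function fakeDeps(chunks: string[]): QueryDeps {
  return {
    callModel: async function* () {
      yield* chunks;
    },
    parseAndValidateToolInputs: (raw) =>
      typeof raw === "object" && raw !== null
        ? { ok: true, value: raw as Record<string, unknown> }
        : { ok: false, error: "expected an object" },
  };
}
```

A test can then drive querySketch("hi", fakeDeps(["a", "b"])) and assert on the yielded chunks without any network access.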
The entire pipeline is built on async generators:
- query() yields StreamEvent objects
- Caller (QueryEngine → UI) consumes events for real-time rendering
- Permission prompts pause the stream — user responds — stream resumes
- This enables non-blocking, interactive recovery without callbacks
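To illustrate how a permission prompt can pause the stream, the sketch below awaits a promise inside the generator: the consumer's pending next() call simply does not settle until the user answers. This is one plausible mechanism, not a confirmed description of the real pipeline; PipelineEvent, toolStep, and the canUseTool signature used here are assumptions.

```typescript
// Hypothetical event shapes for a single tool step.
type PipelineEvent =
  | { type: "permission_request"; tool: string }
  | { type: "tool_result"; tool: string; output: string }
  | { type: "denied"; tool: string };

async function* toolStep(
  tool: string,
  canUseTool: (tool: string) => Promise<boolean>,
  run: () => Promise<string>,
): AsyncGenerator<PipelineEvent, void> {
  // Tell the consumer a prompt is needed so the UI can render it.
  yield { type: "permission_request", tool };

  // The generator is suspended here until the user's answer resolves the promise.
  const allowed = await canUseTool(tool);

  if (allowed) {
    yield { type: "tool_result", tool, output: await run() };
  } else {
    yield { type: "denied", tool };
  }
}
```

No state needs to be saved or restored around the prompt: the generator's local variables are still in place when it resumes.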
When the conversation grows too long:
- Token warning threshold triggers compaction check
- Old messages summarized via a separate LLM call
- Summarized history replaces original messages
- Query loop re-enters with compressed context
- Tracking state prevents infinite compaction loops
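The compaction steps can be sketched as a single guarded function. Everything here is illustrative: the four-characters-per-token estimate is a crude stand-in for a real tokenizer, and CompactionState, maybeCompact, keepRecent, and the summarize callback are hypothetical names. The summarize callback represents the separate LLM call.

```typescript
interface CompactionState {
  messages: string[];
  // Set once compaction has run, so a context that is still too large
  // after summarizing does not trigger another attempt.
  compactedThisTurn: boolean;
}

// Crude stand-in for a tokenizer: assume roughly four characters per token.
const estimateTokens = (messages: string[]): number =>
  Math.ceil(messages.join("").length / 4);

async function maybeCompact(
  state: CompactionState,
  threshold: number,
  keepRecent: number,
  summarize: (old: string[]) => Promise<string>,
): Promise<boolean> {
  if (estimateTokens(state.messages) < threshold) return false;
  if (state.compactedThisTurn) return false; // already tried; do not loop forever

  const cut = Math.max(0, state.messages.length - keepRecent);
  const old = state.messages.slice(0, cut);
  if (old.length === 0) return false; // nothing left to summarize

  const summary = await summarize(old);
  // The summary replaces the old messages; recent ones are kept verbatim.
  state.messages = [summary, ...state.messages.slice(cut)];
  state.compactedThisTurn = true;
  return true;
}
```

A caller would reset compactedThisTurn after the next successful model call, so that later growth can trigger compaction again.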
Generator-Based State Machine: Each continue statement in the loop is a state transition. The generator maintains its stack frame across yields, so the entire recovery/retry/compaction logic lives in a single function without callbacks or separate state objects. This is the most distinctive architectural choice in the codebase.
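A compressed sketch of this pattern, assuming a hypothetical callModel that reports its outcome as a tagged result. The retry caps mirror the error table (three resumes, one compaction retry); tool execution and the compaction itself are elided, and LoopEvent, ModelResult, and queryLoopSketch are illustrative names.

```typescript
type LoopEvent =
  | { type: "assistant"; text: string }
  | { type: "compacted" }
  | { type: "error"; reason: string }
  | { type: "done" };

type ModelResult =
  | { kind: "ok"; text: string; toolUse: boolean }
  | { kind: "max_output_tokens"; text: string }
  | { kind: "prompt_too_long" };

async function* queryLoopSketch(
  callModel: (turn: number) => Promise<ModelResult>,
): AsyncGenerator<LoopEvent, void> {
  // These locals are the machine's state; they persist across every yield.
  let turn = 0;
  let resumes = 0;
  let compacted = false;

  while (true) {
    const result = await callModel(turn++);

    if (result.kind === "prompt_too_long") {
      if (compacted) {
        yield { type: "error", reason: result.kind };
        return;
      }
      compacted = true;
      yield { type: "compacted" };
      continue; // transition: retry with compressed context
    }

    yield { type: "assistant", text: result.text };

    if (result.kind === "max_output_tokens") {
      if (resumes >= 3) {
        yield { type: "error", reason: result.kind };
        return;
      }
      resumes++;
      continue; // transition: resume mid-thought
    }

    if (result.toolUse) continue; // transition: run tools, then start the next turn

    yield { type: "done" };
    return;
  }
}
```

Each continue re-enters the loop with updated locals, which is what lets retry, resume, and compaction share one function body.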