Query Pipeline

Overview

The query pipeline is the core of Claude Code: it handles every user message from input to final response. It is implemented as an async generator state machine in query.ts, wrapped by the stateful, per-conversation QueryEngine in QueryEngine.ts.

Architecture

Entry Point: QueryEngine

  • QueryEngine is the stateful wrapper for a conversation session
  • submitMessage() is the main entry point: it takes user text, assembles context, and calls query()
  • Manages message history persistence, session storage, transcript recording
  • Handles file state caching and commit attribution
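
The relationship between the wrapper and the generator can be sketched as follows. This is a minimal illustration, not the actual QueryEngine.ts: the Message and StreamEvent shapes, the getHistory() helper, the choice of a class, and the echoing query() stand-in are all assumptions made for the example.

```typescript
// Hypothetical shapes: the real types in QueryEngine.ts are not shown in this document.
type Message = { role: "user" | "assistant"; text: string };
type StreamEvent = { type: "text"; text: string } | { type: "done" };

// Stand-in for query(): echoes the latest user message as a single text event.
async function* query(history: readonly Message[]): AsyncGenerator<StreamEvent> {
  const last = history[history.length - 1];
  yield { type: "text", text: `echo: ${last?.text ?? ""}` };
  yield { type: "done" };
}

class QueryEngine {
  // Per-conversation state lives on the instance and survives across calls.
  private history: Message[] = [];

  async *submitMessage(text: string): AsyncGenerator<StreamEvent> {
    this.history.push({ role: "user", text });
    let reply = "";
    for await (const event of query(this.history)) {
      if (event.type === "text") reply += event.text;
      yield event; // forward each event to the caller as it arrives
    }
    // Record the assistant reply once the stream has finished.
    this.history.push({ role: "assistant", text: reply });
  }

  getHistory(): readonly Message[] {
    return this.history;
  }
}
```

A caller would consume it with `for await (const event of engine.submitMessage("hi"))`, rendering each event as it arrives.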

Core Loop: query()

The query function is an async generator that yields events. Each iteration is one "turn":

┌─────────────────────────────────┐
│        Setup Phase              │
│  - Assemble system prompt       │
│  - Initialize tool context      │
│  - Prefetch skill discovery     │
└──────────────┬──────────────────┘
               v
┌─────────────────────────────────┐
│        API Call Phase           │◄──────────────┐
│  - Normalize messages           │               │
│  - callModel() [streaming]      │               │
│  - Yield StreamEvents           │               │
│  - Accumulate assistant message │               │
└──────────────┬──────────────────┘               │
               v                                  │
        ┌───────────────┐                         │
        │ Has tool use? │──── No ──► Return       │
        └──────┬────────┘                         │
               │ Yes                              │
               v                                  │
┌─────────────────────────────────┐               │
│     Tool Execution Phase        │               │
│  - Partition: read-only vs write│               │
│  - Permission check per tool    │               │
│  - Execute (concurrent/serial)  │               │
│  - Merge results into history   │               │
└──────────────┬──────────────────┘               │
               │                                  │
               └──────────────────────────────────┘
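
The loop in the diagram can be sketched as a single async generator. This is a simplified illustration under stated assumptions: the LoopDeps type, the runTool() helper, and the string-based history are hypothetical, and permission checks, compaction, and error recovery are left out.

```typescript
// Hypothetical, simplified types; the real ones in query.ts are not shown here.
type ToolUse = { name: string; input: string };
type AssistantMessage = { text: string; toolUses: ToolUse[] };
type StreamEvent =
  | { type: "assistant"; message: AssistantMessage }
  | { type: "tool_result"; name: string; output: string };

type LoopDeps = {
  callModel: (history: string[]) => Promise<AssistantMessage>;
  runTool: (use: ToolUse) => Promise<string>;
};

async function* query(prompt: string, deps: LoopDeps): AsyncGenerator<StreamEvent, string> {
  // Setup phase: runs once, before the first turn.
  const history: string[] = [prompt];

  while (true) {
    // API call phase.
    const message = await deps.callModel(history);
    history.push(message.text);
    yield { type: "assistant", message };

    // Decision: without tool use there is nothing left to do.
    if (message.toolUses.length === 0) return message.text;

    // Tool execution phase: merge results into history, then loop back.
    for (const use of message.toolUses) {
      const output = await deps.runTool(use);
      history.push(`${use.name} -> ${output}`);
      yield { type: "tool_result", name: use.name, output };
    }
  }
}
```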

Phase 1: Setup (~280 lines of logic)

  • System prompt assembly from multiple sources: default sections + memory prompt + appended context
  • Tool permission context initialization
  • Query state initialization: message history, tool context, budget trackers
  • Skill discovery prefetch kicked off asynchronously
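
A rough sketch of the setup phase, assuming the prompt sources are plain strings and that skill discovery is a promise-returning function. The names setup(), assembleSystemPrompt(), and SetupResult are hypothetical; the point is only that discovery is started without being awaited, so it overlaps with the rest of setup.

```typescript
type SetupResult = { systemPrompt: string; skills: Promise<string[]> };

// Joins the prompt sources in order, skipping any that are empty.
function assembleSystemPrompt(defaults: string[], memory: string, appended: string): string {
  return [...defaults, memory, appended].filter((part) => part.length > 0).join("\n\n");
}

function setup(
  sources: { defaults: string[]; memory: string; appended: string },
  discoverSkills: () => Promise<string[]>,
): SetupResult {
  // Start discovery first and do not await it here.
  const skills = discoverSkills();
  const systemPrompt = assembleSystemPrompt(sources.defaults, sources.memory, sources.appended);
  // The caller awaits `skills` later, at the point where the result is needed.
  return { systemPrompt, skills };
}
```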

Phase 2: API Call

  • Messages normalized for API format (compact boundary handling, tool use summaries)
  • Streaming call via callModel() dependency injection
  • Each StreamEvent yielded to caller for real-time terminal rendering
  • toolUseContext updated as response accumulates
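
One way to yield every event while also accumulating the full response is to use the generator's return value together with `yield*`. The sketch below assumes a reduced StreamEvent with only two variants; the event names and the streamTurn() and turn() helpers are illustrative, not taken from query.ts.

```typescript
// Hypothetical, reduced event shape for illustration.
type StreamEvent = { type: "text_delta"; text: string } | { type: "message_stop" };

// Yields every event to the caller and returns the accumulated text when the stream ends.
async function* streamTurn(
  callModel: () => AsyncIterable<StreamEvent>,
): AsyncGenerator<StreamEvent, string> {
  let text = "";
  for await (const event of callModel()) {
    if (event.type === "text_delta") text += event.text;
    yield event; // the UI can render this immediately
  }
  return text;
}

// An outer generator forwards the events and still receives the final text.
async function* turn(
  callModel: () => AsyncIterable<StreamEvent>,
  history: string[],
): AsyncGenerator<StreamEvent, void> {
  const fullText = yield* streamTurn(callModel);
  history.push(fullText);
}
```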

Phase 3: Tool Execution

  • Tools partitioned: read-only tools run concurrently, write tools run serially
  • Permission check per tool via canUseTool() — may block on user prompt
  • Tool invocation via toolOrchestration.runTools()
  • Results merged into message history for next turn
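
The concurrency rule can be illustrated with a small scheduler. This is a sketch only: the ToolCall shape and the `readOnly` flag are assumptions, permission checks are omitted, and the real toolOrchestration.runTools() may batch or order results differently (here, read results are simply returned before write results).

```typescript
// Hypothetical shapes for illustration.
type ToolCall = { id: string; readOnly: boolean; run: () => Promise<string> };
type ToolResult = { id: string; output: string };

async function runTools(calls: ToolCall[]): Promise<ToolResult[]> {
  const reads = calls.filter((c) => c.readOnly);
  const writes = calls.filter((c) => !c.readOnly);

  // Read-only calls do not modify state, so they are all started together.
  const readResults = await Promise.all(
    reads.map(async (c) => ({ id: c.id, output: await c.run() })),
  );

  // Write calls may touch the same files, so each one waits for the previous one.
  const writeResults: ToolResult[] = [];
  for (const c of writes) {
    writeResults.push({ id: c.id, output: await c.run() });
  }

  return [...readResults, ...writeResults];
}
```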

Phase 4: Decision Point

After each assistant message:

  • Has tool use? → Loop back (run the tools, then start the next turn)
  • Stop reason = end_turn? → Evaluate stop hooks, return or resume
  • Token budget exceeded? → Trigger compaction, re-enter loop
  • No more work? → Return terminal event
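
These checks can be written as one pure function. The sketch assumes they are evaluated in the order listed above, which the document does not state explicitly; the TurnOutcome and NextStep types are hypothetical.

```typescript
// Hypothetical summary of what the loop knows after an assistant message.
type TurnOutcome = {
  hasToolUse: boolean;
  stopReason: string;
  overTokenBudget: boolean;
};
type NextStep = "run_tools" | "stop_hooks" | "compact" | "return";

// Assumes the checks run in the order of the list above.
function decide(outcome: TurnOutcome): NextStep {
  if (outcome.hasToolUse) return "run_tools";
  if (outcome.stopReason === "end_turn") return "stop_hooks";
  if (outcome.overTokenBudget) return "compact";
  return "return";
}
```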

Error Recovery

The pipeline has built-in recovery for common failure modes:

| Error | Recovery Strategy | Max Retries |
|---|---|---|
| max_output_tokens | Resume with "pick up mid-thought" message | 3 |
| prompt_too_long | Reactive compaction (summarize old messages), retry | 1 |
| Rate limit | Yield error event, return (terminal) | 0 |
| Auth error | Yield error event, return (terminal) | 0 |
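
The retry limits above lend themselves to a table-driven policy. The following sketch encodes the four rows; the identifiers `rate_limit` and `auth_error`, the RecoveryAction names, and the recover() function are assumptions for illustration.

```typescript
// "rate_limit" and "auth_error" are hypothetical identifiers for the last two rows.
type ErrorKind = "max_output_tokens" | "prompt_too_long" | "rate_limit" | "auth_error";
type RecoveryAction = "resume" | "compact_and_retry" | "terminal";

const POLICY: Record<ErrorKind, { action: RecoveryAction; maxRetries: number }> = {
  max_output_tokens: { action: "resume", maxRetries: 3 },
  prompt_too_long: { action: "compact_and_retry", maxRetries: 1 },
  rate_limit: { action: "terminal", maxRetries: 0 },
  auth_error: { action: "terminal", maxRetries: 0 },
};

// Returns the recovery action, or "terminal" once the retry budget is used up.
function recover(kind: ErrorKind, retriesSoFar: number): RecoveryAction {
  const { action, maxRetries } = POLICY[kind];
  return retriesSoFar < maxRetries ? action : "terminal";
}
```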

Dependency Injection

The query loop accepts a QueryDeps object for testability:

  • callModel() — Claude API streaming (pluggable for testing)
  • parseAndValidateToolInputs() — Zod-based input validation
  • Compact/collapse modules — feature-gated, lazy-required
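
To show why injection helps testing, here is a minimal sketch with a reduced QueryDeps that models only callModel(). The signature shown, along with the collectReply() and fakeDeps() helpers, is assumed for the example and is not the real interface.

```typescript
// Hypothetical, reduced QueryDeps: only callModel() is modelled here.
type QueryDeps = {
  callModel: (messages: string[]) => AsyncIterable<string>;
};

// The code under test depends only on the injected function, never on a network client.
async function collectReply(messages: string[], deps: QueryDeps): Promise<string> {
  let reply = "";
  for await (const chunk of deps.callModel(messages)) reply += chunk;
  return reply;
}

// Test double: replays fixed chunks and records the messages it was called with.
function fakeDeps(chunks: string[]): { deps: QueryDeps; calls: string[][] } {
  const calls: string[][] = [];
  const deps: QueryDeps = {
    async *callModel(messages: string[]) {
      calls.push(messages);
      for (const chunk of chunks) yield chunk;
    },
  };
  return { deps, calls };
}
```

A test can then assert on both the assembled reply and the recorded calls without touching the API.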

Streaming Architecture

The entire pipeline is built on async generators:

  • query() yields StreamEvent objects
  • Caller (QueryEngine → UI) consumes events for real-time rendering
  • Permission prompts pause the stream until the user responds, then the stream resumes
  • This enables non-blocking, interactive recovery without callbacks
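
The pause-and-resume behaviour falls out of awaiting inside a generator. In the sketch below, canUseTool() is assumed to return a promise that the UI resolves when the user answers; the ToolEvent shape, executeTool(), and the `run` parameter are hypothetical.

```typescript
// Hypothetical event and callback shapes for illustration.
type ToolEvent =
  | { type: "tool_result"; tool: string; output: string }
  | { type: "denied"; tool: string };

type CanUseTool = (tool: string) => Promise<boolean>;

async function* executeTool(
  tool: string,
  canUseTool: CanUseTool,
  run: () => Promise<string>,
): AsyncGenerator<ToolEvent> {
  // The generator is suspended on this await, so no events reach the
  // consumer until the permission promise settles.
  const allowed = await canUseTool(tool);
  if (!allowed) {
    yield { type: "denied", tool };
    return;
  }
  yield { type: "tool_result", tool, output: await run() };
}
```

The consumer keeps a plain `for await` loop; it simply receives no new event while the prompt is open.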

Auto-Compaction

When the conversation grows too long:

  1. Token warning threshold triggers compaction check
  2. Old messages summarized via a separate LLM call
  3. Summarized history replaces original messages
  4. Query loop re-enters with compressed context
  5. Tracking state prevents infinite compaction loops
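
The steps above can be sketched as one function with a guard flag. Everything here is an assumption made for illustration: the word-count tokenizer, the KEEP_RECENT constant, the `alreadyCompacted` flag, and the decision to throw when compaction cannot help.

```typescript
type Summarize = (messages: string[]) => Promise<string>;

// Crude stand-in for a tokenizer: counts whitespace-separated words.
function countTokens(messages: string[]): number {
  return messages.reduce((total, m) => total + m.split(/\s+/).filter(Boolean).length, 0);
}

const KEEP_RECENT = 2; // hypothetical: number of trailing messages kept verbatim

async function compactIfNeeded(
  messages: string[],
  threshold: number,
  summarize: Summarize,
  state: { alreadyCompacted: boolean },
): Promise<string[]> {
  if (countTokens(messages) <= threshold) return messages;
  // Guard: if compaction already ran, or there is nothing old enough to
  // summarize, stop instead of compacting again and again.
  if (state.alreadyCompacted || messages.length <= KEEP_RECENT) {
    throw new Error("cannot compact further");
  }
  state.alreadyCompacted = true;
  const old = messages.slice(0, -KEEP_RECENT);
  const recent = messages.slice(-KEEP_RECENT);
  const summary = await summarize(old); // stands in for the separate LLM call
  return [summary, ...recent];
}
```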

Key Design Pattern

Generator-Based State Machine: Each continue statement in the loop is a state transition. The generator maintains its stack frame across yields, so the entire recovery/retry/compaction logic lives in a single function without callbacks or separate state objects. This is the most distinctive architectural choice in the codebase.
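
A small sketch of the pattern, using the max_output_tokens recovery as the example. The ModelResult type and the wording of the resume message are hypothetical; the retry limit of 3 comes from the error recovery table. Note that `resumeCount` is an ordinary local variable that survives every yield.

```typescript
// Hypothetical result shape for illustration.
type ModelResult = { text: string; stopReason: "end_turn" | "max_output_tokens" };

const MAX_RESUMES = 3;

async function* query(
  callModel: (history: string[]) => Promise<ModelResult>,
): AsyncGenerator<string, void> {
  const history: string[] = [];
  let resumeCount = 0; // lives in the generator's frame across yields

  while (true) {
    const result = await callModel(history);
    history.push(result.text);
    yield result.text;

    if (result.stopReason === "max_output_tokens" && resumeCount < MAX_RESUMES) {
      resumeCount++;
      history.push("Continue from where you left off."); // hypothetical resume message
      continue; // state transition: back to the API call phase
    }
    return; // terminal state
  }
}
```

With a model that always stops at the output limit, this loop makes one initial call plus three resumes and then returns.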