02 - Deep Analysis of the Main Loop: The Heart of the Agent


1. The State Type: The Story Behind Each Field

// query.ts:204-217
type State = {
  messages: Message[]
  toolUseContext: ToolUseContext
  autoCompactTracking: AutoCompactTrackingState | undefined
  maxOutputTokensRecoveryCount: number
  hasAttemptedReactiveCompact: boolean
  maxOutputTokensOverride: number | undefined
  pendingToolUseSummary: Promise<ToolUseSummaryMessage | null> | undefined
  stopHookActive: boolean | undefined
  turnCount: number
  transition: Continue | undefined
}

1.1 messages: Message[]

The entire conversation history. On each continue, this array is fully replaced (not pushed to):

state = {
  messages: [...messagesForQuery, ...assistantMessages, ...toolResults],
  // ...
}

Why replacement instead of push? Because messagesForQuery may have already been compacted (snip, microcompact, context collapse), making it different from the original state.messages. Using push would require you to first swap the compacted messages back in, making the logic considerably more complex.
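
A minimal sketch of the difference, assuming a simplified Message shape and a hypothetical compact() helper that stands in for snip, microcompact, and context collapse (neither is taken from query.ts):

```typescript
// Hypothetical, simplified shapes; the real Message type is richer.
type Message = { role: 'user' | 'assistant'; text: string }
type LoopState = { messages: Message[] }

// Stand-in for compaction: keeps only the last `keep` messages.
function compact(messages: Message[], keep: number): Message[] {
  return messages.slice(-keep)
}

function nextState(
  state: LoopState,
  assistantMessages: Message[],
  toolResults: Message[],
): LoopState {
  // After compaction, messagesForQuery is no longer the same as state.messages.
  const messagesForQuery = compact(state.messages, 2)
  // A fresh array makes the compacted view the new history. Pushing onto
  // state.messages would keep the messages that compaction dropped.
  return { messages: [...messagesForQuery, ...assistantMessages, ...toolResults] }
}

const before: LoopState = {
  messages: [
    { role: 'user', text: 'a' },
    { role: 'assistant', text: 'b' },
    { role: 'user', text: 'c' },
  ],
}
const after = nextState(
  before,
  [{ role: 'assistant', text: 'd' }],
  [{ role: 'user', text: 'tool result' }],
)
```

Here after.messages starts from the compacted view (b, c) and before.messages is left untouched, which is the property the replacement style is buying.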

1.2 toolUseContext: ToolUseContext

The shared context for tool execution. It contains:

  • options.tools — the list of available tools
  • options.mainLoopModel — the currently active model (may be swapped during fallback)
  • abortController — the cancellation signal
  • setInProgressToolUseIDs — tracks tools currently executing
  • File caches, session state, and more

Note that this is the only field in State that is mutated within an iteration — all other fields are replaced wholesale at continue sites.

1.3 autoCompactTracking: AutoCompactTrackingState | undefined

Tracks the state of automatic compaction. Contains the token usage from the most recent API call, used to determine whether autocompact should be triggered.

undefined means "no tracking data yet" — this is always undefined on the first iteration.

1.4 maxOutputTokensRecoveryCount: number

When the model's output is truncated (max_output_tokens), Claude Code injects a continuation prompt and retries. This counter records how many retries have occurred, up to a maximum of 3 (MAX_OUTPUT_TOKENS_RECOVERY_LIMIT).

1.5 hasAttemptedReactiveCompact: boolean

Single-fire protection. Reactive compact is an expensive operation (it requires an additional API call to generate a summary). If a 413 (prompt too long) error persists after the first compact, compacting again will not help, because the first attempt already failed to bring the context under the limit. This boolean prevents an infinite compaction loop.

This field is deliberately preserved (not reset to false) at the stop hook blocking continue site, because the team once encountered a real bug:

// Comment from source (lines 1293-1296):
// Preserve the reactive compact guard — if compact already ran and
// couldn't recover from prompt-too-long, retrying after a stop-hook
// blocking error will produce the same result. Resetting to false
// here caused an infinite loop: compact → still too long → error →
// stop hook blocking → compact → … burning thousands of API calls.

This comment documents a real production incident. Resetting this boolean caused an infinite loop that burned through a large number of API calls.
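
The failure mode can be reproduced in a toy simulation. The sketch below is not the real loop: it assumes an API that always reports "prompt too long" and a stop hook that always returns a blocking error, and simulate() is a hypothetical helper that only counts compaction calls.

```typescript
type GuardState = { hasAttemptedReactiveCompact: boolean }

// Returns how many compaction calls happen within `maxIterations` iterations.
function simulate(resetGuardOnStopHook: boolean, maxIterations: number): number {
  let state: GuardState = { hasAttemptedReactiveCompact: false }
  let compactCalls = 0
  for (let i = 0; i < maxIterations; i++) {
    if (!state.hasAttemptedReactiveCompact) {
      compactCalls++ // each compaction costs an extra API call
      state = { hasAttemptedReactiveCompact: true }
      continue // retry after compaction; the prompt is still too long
    }
    // Guard preserved: compaction is not retried and the loop can end.
    if (!resetGuardOnStopHook) break
    // Buggy variant: the stop-hook blocking site resets the guard,
    // so the next iteration compacts again.
    state = { hasAttemptedReactiveCompact: false }
  }
  return compactCalls
}

const withGuardPreserved = simulate(false, 100) // 1
const withGuardReset = simulate(true, 100) // 50
```

With the guard preserved there is exactly one compaction; with the reset, compaction runs on every other iteration until something external stops the loop.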

1.6 maxOutputTokensOverride: number | undefined

When a max_output_tokens escalation is triggered (8k → 64k), this field is set to ESCALATED_MAX_TOKENS. The escalation is one-shot — if 64k is still not enough, the multi-round recovery path is used instead.

1.7 pendingToolUseSummary: Promise<ToolUseSummaryMessage | null> | undefined

This is a clever async optimization. Tool-call summaries are generated by the Haiku model (~1 second), but the main loop does not wait for them — the summary is awaited at the start of the next iteration:

// The previous iteration's summary is consumed here
if (pendingToolUseSummary) {
  const summary = await pendingToolUseSummary
  if (summary) yield summary
}

// This iteration's summary is kicked off here (asynchronously)
state = {
  pendingToolUseSummary: generateToolUseSummary(assistantMessages),
  // ...
}

This means summary generation and the model call overlap — while the main model is handling the next iteration, Haiku is generating the summary for the previous one.
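
The overlap can be sketched with an async generator. Everything here is a stand-in: generateSummary() and callMainModel() are hypothetical functions that only sleep, and the exact point at which the pending promise is awaited relative to the model call is an assumption made so that the overlap is visible.

```typescript
type Summary = { text: string }

const sleep = (ms: number) => new Promise<void>(resolve => setTimeout(resolve, ms))

// Hypothetical stand-in for the small summarizer model.
async function generateSummary(turn: number): Promise<Summary | null> {
  await sleep(20)
  return { text: `summary of turn ${turn}` }
}

// Hypothetical stand-in for the main model call.
async function callMainModel(turn: number): Promise<string> {
  await sleep(20)
  return `response ${turn}`
}

async function* loop(turns: number): AsyncGenerator<string> {
  let pending: Promise<Summary | null> | undefined
  for (let turn = 1; turn <= turns; turn++) {
    // Start the model call first so it runs while the previous summary settles.
    const response = callMainModel(turn)
    if (pending) {
      const summary = await pending
      if (summary) yield summary.text
    }
    yield await response
    // Kick off this turn's summary without awaiting it. In this sketch the
    // final turn's summary is never consumed.
    pending = generateSummary(turn)
  }
}

async function collect(turns: number): Promise<string[]> {
  const out: string[] = []
  for await (const item of loop(turns)) out.push(item)
  return out
}
```

Calling collect(2) yields the first response, then the summary of turn 1, then the second response: the summary surfaces one iteration late and never blocks the iteration that produced it.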

1.8 stopHookActive: boolean | undefined

When a stop hook returns blocking errors, this field is set to true to prevent the same hook from being executed again.

1.9 turnCount: number

The number of loop iterations. Used to enforce the maxTurns limit — when turnCount exceeds maxTurns, the loop is forcibly terminated.

1.10 transition: Continue | undefined

The most subtle field. It is a tagged union type:

type Continue =
  | { reason: 'collapse_drain_retry'; committed: number }
  | { reason: 'reactive_compact_retry' }
  | { reason: 'max_output_tokens_escalate' }
  | { reason: 'max_output_tokens_recovery'; attempt: number }
  | { reason: 'stop_hook_blocking' }
  | { reason: 'token_budget_continuation' }
  | { reason: 'next_turn' }

It serves three purposes:

  1. Loop protection: state.transition?.reason !== 'collapse_drain_retry' prevents duplicate drain attempts
  2. Test assertions — tests can inspect transition.reason to verify which recovery path was taken
  3. Metadata propagation — some continue reasons carry extra information (e.g., the committed count)
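
A small sketch of how the three purposes look in code. The Continue type is repeated from above; mayAttemptDrain() and describe() are hypothetical helpers written for illustration, not functions from the source.

```typescript
type Continue =
  | { reason: 'collapse_drain_retry'; committed: number }
  | { reason: 'reactive_compact_retry' }
  | { reason: 'max_output_tokens_escalate' }
  | { reason: 'max_output_tokens_recovery'; attempt: number }
  | { reason: 'stop_hook_blocking' }
  | { reason: 'token_budget_continuation' }
  | { reason: 'next_turn' }

// Loop protection: the drain guard, isolated into a function.
function mayAttemptDrain(previous: Continue | undefined): boolean {
  return previous?.reason !== 'collapse_drain_retry'
}

// Metadata propagation: narrowing on `reason` gives typed access to the payload.
function describe(t: Continue): string {
  switch (t.reason) {
    case 'collapse_drain_retry':
      return `drain retry, committed=${t.committed}`
    case 'max_output_tokens_recovery':
      return `recovery attempt ${t.attempt}`
    case 'reactive_compact_retry':
    case 'max_output_tokens_escalate':
    case 'stop_hook_blocking':
    case 'token_budget_continuation':
    case 'next_turn':
      return t.reason
    default: {
      // Exhaustiveness check: adding a new reason without a case above
      // turns this assignment into a compile-time error.
      const unreachable: never = t
      return unreachable
    }
  }
}

const label = describe({ reason: 'collapse_drain_retry', committed: 2 })
```

A test assertion in the same spirit would compare transition.reason against the expected literal, which the union keeps spell-checked at compile time.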

2. Complete Analysis of the Seven Continue Sites

Site 1: Model Fallback (Line 950)

// Trigger condition: FallbackTriggeredError
catch (error) {
  if (error instanceof FallbackTriggeredError) {
    // 1. Emit tombstone messages to retract already-streamed content
    yield* yieldTombstoneMessages(assistantMessages)
    
    // 2. Clear intermediate state
    assistantMessages.length = 0
    toolResults.length = 0
    
    // 3. Discard pending results in the StreamingToolExecutor
    streamingToolExecutor.discard()
    
    // 4. Switch models
    toolUseContext.options.mainLoopModel = fallbackModel
    
    // 5. Strip thinking signatures (the fallback model may not support them)
    messagesForQuery = stripSignatureBlocks(messagesForQuery)
    
    // 6. Notify the user
    yield createSystemMessage('Switched to fallback model...')
    
    continue  // Retry with the original messagesForQuery
  }
}

This is the only continue site that does not create a new State object. Model fallback happens inside the catch block of an API call, at which point state has not yet been modified. A bare continue returns to the top of the loop with the existing state, where the retry happens with the new model.

Tombstone handling is the tricky part here — messages that have already been yielded cannot be "taken back"; a tombstone is the only way to tell the UI "forget these messages."
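
To make the tombstone idea concrete, here is a minimal sketch under invented assumptions: the event shapes, the id field, and the render() consumer are all hypothetical, and tombstonesFor() is only a simplified analogue of yieldTombstoneMessages.

```typescript
type AssistantEvent = { type: 'assistant'; id: string; text: string }
type TombstoneEvent = { type: 'tombstone'; targetId: string }
type StreamEvent = AssistantEvent | TombstoneEvent

// Emits one tombstone per message that was already yielded to the consumer.
function* tombstonesFor(messages: AssistantEvent[]): Generator<TombstoneEvent> {
  for (const m of messages) yield { type: 'tombstone', targetId: m.id }
}

function* attemptWithFallback(): Generator<StreamEvent> {
  const streamed: AssistantEvent[] = []
  const partial: AssistantEvent = { type: 'assistant', id: 'a1', text: 'partial answer from primary' }
  streamed.push(partial)
  yield partial
  // The primary model fails here. What was yielded cannot be un-yielded,
  // so the producer retracts it explicitly.
  yield* tombstonesFor(streamed)
  streamed.length = 0
  yield { type: 'assistant', id: 'a2', text: 'answer from fallback' }
}

// Hypothetical consumer: a tombstone removes the message it targets.
function render(events: Iterable<StreamEvent>): string[] {
  const visible = new Map<string, string>()
  for (const e of events) {
    if (e.type === 'assistant') visible.set(e.id, e.text)
    else visible.delete(e.targetId)
  }
  return Array.from(visible.values())
}

const shown = render(attemptWithFallback())
```

Only the fallback answer remains visible, even though the partial answer was streamed first.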

Site 2: Context Collapse Drain (Line 1115)

if (feature('CONTEXT_COLLAPSE') && contextCollapse &&
    state.transition?.reason !== 'collapse_drain_retry') {
  const drained = contextCollapse.recoverFromOverflow(messagesForQuery, querySource)
  if (drained.committed > 0) {
    state = {
      messages: drained.messages,
      // ...
      transition: { reason: 'collapse_drain_retry', committed: drained.committed },
    }
    continue
  }
}

The key guard: state.transition?.reason !== 'collapse_drain_retry' — if the previous iteration already performed a drain, no further attempt is made. This prevents the "drain → still 413 → drain again → ..." infinite loop.

Site 3: Reactive Compaction (Line 1165)

const compacted = await reactiveCompact.tryReactiveCompact({
  hasAttempted: hasAttemptedReactiveCompact,
  messages: messagesForQuery,
  // ...
})
if (compacted) {
  const postCompactMessages = buildPostCompactMessages(compacted)
  state = {
    messages: postCompactMessages,
    hasAttemptedReactiveCompact: true,  // Single-fire protection
    autoCompactTracking: undefined,     // Reset tracking after compaction
    // ...
  }
  continue
}

Note autoCompactTracking: undefined — after compaction the token count changes dramatically, making any previous tracking data meaningless.

Site 4: Max Output Tokens Escalation (Line 1220)

if (capEnabled && maxOutputTokensOverride === undefined &&
    !process.env.CLAUDE_CODE_MAX_OUTPUT_TOKENS) {
  state = {
    messages: messagesForQuery,  // Retry with the same input
    maxOutputTokensOverride: ESCALATED_MAX_TOKENS,  // 8k → 64k
    transition: { reason: 'max_output_tokens_escalate' },
    // ...
  }
  continue
}

Silent escalation — no meta message is injected, so the user sees nothing. The output limit is simply raised from 8k to 64k and the retry uses exactly the same input.

Site 5: Max Output Tokens Recovery (Line 1251)

const recoveryMessage = createUserMessage({
  content: 'Output token limit hit. Resume directly — no apology, ' +
           'no recap of what you were doing. Pick up mid-thought...',
  isMeta: true,
})

state = {
  messages: [...messagesForQuery, ...assistantMessages, recoveryMessage],
  maxOutputTokensRecoveryCount: maxOutputTokensRecoveryCount + 1,
  transition: { reason: 'max_output_tokens_recovery', attempt: maxOutputTokensRecoveryCount + 1 },
  // ...
}
continue

Note that ...assistantMessages is retained — the truncated response remains in the message history. The model will see its own half-finished response followed by the recovery message, allowing it to know exactly where to resume.

isMeta: true marks this message as a "system meta-message"; the UI typically does not display it directly to the user.
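
Sites 4 and 5 together form a two-stage policy: escalate once, then fall back to bounded recovery. The sketch below models only that decision. The constant names come from the text above, but the numeric value for ESCALATED_MAX_TOKENS, the onTruncatedOutput() helper, and the give_up outcome are assumptions for illustration.

```typescript
const MAX_OUTPUT_TOKENS_RECOVERY_LIMIT = 3
const ESCALATED_MAX_TOKENS = 64000 // assumed numeric value for "64k"

type TruncationState = {
  maxOutputTokensOverride: number | undefined
  maxOutputTokensRecoveryCount: number
}

type TruncationDecision =
  | { action: 'escalate'; next: TruncationState }
  | { action: 'recover'; attempt: number; next: TruncationState }
  | { action: 'give_up' }

function onTruncatedOutput(s: TruncationState, capEnabled: boolean): TruncationDecision {
  // Stage 1: one-shot escalation, only while no override is set.
  if (capEnabled && s.maxOutputTokensOverride === undefined) {
    return { action: 'escalate', next: { ...s, maxOutputTokensOverride: ESCALATED_MAX_TOKENS } }
  }
  // Stage 2: bounded multi-round recovery.
  if (s.maxOutputTokensRecoveryCount < MAX_OUTPUT_TOKENS_RECOVERY_LIMIT) {
    const attempt = s.maxOutputTokensRecoveryCount + 1
    return { action: 'recover', attempt, next: { ...s, maxOutputTokensRecoveryCount: attempt } }
  }
  return { action: 'give_up' }
}

// Feeds the policy a response that is truncated every single time.
function simulateTruncations(): string[] {
  let s: TruncationState = { maxOutputTokensOverride: undefined, maxOutputTokensRecoveryCount: 0 }
  const actions: string[] = []
  for (;;) {
    const d = onTruncatedOutput(s, true)
    actions.push(d.action)
    if (d.action === 'give_up') return actions
    s = d.next
  }
}

const actions = simulateTruncations()
```

For a response that keeps getting truncated, the sequence is one escalate, three recover steps, and then give_up, so the worst case is bounded.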

Site 6: Stop Hook Blocking (Line 1305)

if (stopHookResult.blockingErrors.length > 0) {
  state = {
    messages: [...messagesForQuery, ...assistantMessages, ...blockingErrors],
    stopHookActive: true,                     // Prevent the hook from running again
    hasAttemptedReactiveCompact,              // Preserved! Not reset!
    maxOutputTokensRecoveryCount: 0,          // Reset
    // ...
  }
  continue
}

hasAttemptedReactiveCompact is preserved while maxOutputTokensRecoveryCount is reset — two seemingly contradictory decisions driven by different reasoning:

  • hasAttemptedReactiveCompact preserved: prevents the compact infinite loop (the production incident above)
  • maxOutputTokensRecoveryCount reset: after a hook error the model will produce a new response, and that new response deserves the full 3 recovery attempts

Site 7: Token Budget Continuation (Line 1340)

if (decision.action === 'continue') {
  state = {
    messages: [...messagesForQuery, ...assistantMessages,
               createUserMessage({ content: decision.nudgeMessage, isMeta: true })],
    maxOutputTokensRecoveryCount: 0,
    hasAttemptedReactiveCompact: false,   // Reset!
    transition: { reason: 'token_budget_continuation' },
    // ...
  }
  continue
}

This is the only continue site that resets both hasAttemptedReactiveCompact and maxOutputTokensRecoveryCount. The reason: token budget continuation is a "normal continuation" — the model successfully finished its current work and the budget simply allows it to do more. All recovery mechanisms should be reset here, because what follows is a fresh interaction.
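
The contrast between Sites 6 and 7 can be written as two state builders. This is a reduced sketch: RecoveryState keeps only the two fields under discussion, and both function names are hypothetical.

```typescript
type RecoveryState = {
  hasAttemptedReactiveCompact: boolean
  maxOutputTokensRecoveryCount: number
}

// Site 6: the compact guard survives, the truncation counter starts over.
function afterStopHookBlocking(prev: RecoveryState): RecoveryState {
  return {
    hasAttemptedReactiveCompact: prev.hasAttemptedReactiveCompact,
    maxOutputTokensRecoveryCount: 0,
  }
}

// Site 7: a normal continuation, so both recovery mechanisms are re-armed.
function afterTokenBudgetContinuation(_prev: RecoveryState): RecoveryState {
  return {
    hasAttemptedReactiveCompact: false,
    maxOutputTokensRecoveryCount: 0,
  }
}

const midRecovery: RecoveryState = {
  hasAttemptedReactiveCompact: true,
  maxOutputTokensRecoveryCount: 2,
}
const blocked = afterStopHookBlocking(midRecovery)
const continued = afterTokenBudgetContinuation(midRecovery)
```

Because neither field is optional, leaving one out of either object literal is a compile-time error, so each site has to state its reset-or-preserve decision explicitly.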


3. Loop Entry: The Immutable Configuration Snapshot

// query.ts:293-295
const config = buildQueryConfig()

buildQueryConfig() takes a one-time snapshot of all immutable environment configuration at loop entry:

  • Statsig feature flags
  • Environment variables
  • Session configuration

Why snapshot once instead of re-reading on every iteration?

  1. Consistency — the entire loop execution sees the same configuration, preventing inconsistent behavior from mid-run config changes
  2. Performance — feature flag lookups may involve network requests or disk I/O; doing it once avoids repeated overhead
  3. Debuggability — the entire config object can be logged, making it easy to know exactly what configuration was in effect for a given turn
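
The snapshot idea can be sketched in a few lines. This is not buildQueryConfig(): the QueryConfig fields, the DEMO_ variable names, and buildConfigSnapshot() are invented for illustration, and the environment is passed in as a plain object so the example stays self-contained.

```typescript
type QueryConfig = Readonly<{ maxTurns: number; verbose: boolean }>
type Env = Record<string, string | undefined>

// Reads everything once and freezes the result.
function buildConfigSnapshot(env: Env): QueryConfig {
  return Object.freeze({
    maxTurns: Number(env['DEMO_MAX_TURNS'] ?? '10'),
    verbose: env['DEMO_VERBOSE'] === '1',
  })
}

const env: Env = { DEMO_MAX_TURNS: '3' }
const config = buildConfigSnapshot(env)

// A mid-run change to the environment does not reach the running loop.
env['DEMO_MAX_TURNS'] = '99'
const turnsSeenByLoop = config.maxTurns // still 3
```

The frozen object is also cheap to log as a whole, which is what makes the debuggability point practical.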

Note that the source contains a comment explicitly explaining why feature() calls are excluded from config — because feature() is a compile-time constant (dead-code eliminated by the Bun bundler) and does not need a runtime snapshot.


4. taskBudgetRemaining: Budget Tracking Across Compaction Boundaries

// query.ts:282-291
let taskBudgetRemaining: number | undefined = undefined

This variable is declared outside the loop, rather than being placed in State. The source comment explains why:

"Loop-local (not on State) to avoid touching the 7 continue sites."

If it were in State, every continue site would need to forward this value. In practice, it only needs to be updated when compaction occurs — in the vast majority of continue sites it is unchanged.

Placing it outside the loop makes it a cross-iteration closure variable, modified only inside the compaction logic and automatically inherited elsewhere. This is a pragmatic engineering trade-off — reducing boilerplate across 7 continue sites at the cost of slightly less "pure" state management.
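
A reduced sketch of the trade-off, with loud assumptions: the event names, COMPACTION_COST, and the budget arithmetic are invented for illustration and say nothing about how the real budget is computed. Only the placement of the variable mirrors the text.

```typescript
type LoopState = { turnCount: number }
type LoopEvent = 'next_turn' | 'compacted'

const COMPACTION_COST = 10 // invented for illustration

function runLoop(events: LoopEvent[], initialBudget: number) {
  // Declared outside the loop and not part of LoopState.
  let taskBudgetRemaining: number | undefined = undefined
  let state: LoopState = { turnCount: 0 }
  for (const event of events) {
    if (event === 'compacted') {
      // The only place that touches the budget.
      taskBudgetRemaining = (taskBudgetRemaining ?? initialBudget) - COMPACTION_COST
    }
    // A continue site: state is rebuilt without forwarding the budget.
    state = { turnCount: state.turnCount + 1 }
  }
  return { turnCount: state.turnCount, taskBudgetRemaining }
}

const result = runLoop(['next_turn', 'compacted', 'next_turn', 'compacted'], 100)
```

The state literal never mentions the budget, yet the value survives every iteration; had it been a State field, each of the 7 continue sites would need a line to carry it forward.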


5. Memory Prefetch: RAII-Style Resource Management

// query.ts:301-304
using pendingMemoryPrefetch = startRelevantMemoryPrefetch(
  state.messages,
  state.toolUseContext,
)

The using keyword (from the TC39 Explicit Resource Management proposal, supported since TypeScript 5.2) implements the RAII (Resource Acquisition Is Initialization) pattern:

  1. On entering the loop, memory prefetch starts (asynchronously queries relevant memories)
  2. On exiting the loop, it is automatically disposed — whether via a normal return, a throw, or the generator being .return()ed

startRelevantMemoryPrefetch only starts the query on the first iteration — the prompt does not change between iterations, so repeat queries are pointless. Subsequent iterations check settledAt and use the already-cached result directly.
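
A using declaration behaves roughly like a try/finally that calls the resource's [Symbol.dispose]() method when the scope exits. The sketch below spells that out with an explicit try/finally and a plain dispose() method, so it runs without Symbol.dispose support; Prefetch and startPrefetch() are hypothetical stand-ins, not the real prefetch handle.

```typescript
type Prefetch = { disposed: boolean; dispose(): void }

function startPrefetch(): Prefetch {
  const handle: Prefetch = {
    disposed: false,
    dispose() {
      handle.disposed = true
    },
  }
  return handle
}

function* loopWithPrefetch(handle: Prefetch): Generator<number, void, void> {
  try {
    let turn = 0
    while (true) yield ++turn
  } finally {
    // Runs on normal completion, on throw, and when the consumer calls .return().
    handle.dispose()
  }
}

const prefetchHandle = startPrefetch()
const gen = loopWithPrefetch(prefetchHandle)
gen.next()
gen.next()
const disposedWhileRunning = prefetchHandle.disposed
// The consumer abandons the generator early; cleanup still happens.
gen.return(undefined)
const disposedAfterReturn = prefetchHandle.disposed
```

The handle stays live while the generator is suspended and is disposed as soon as the consumer calls .return(), which is the early-exit case that manual cleanup code most often misses.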


6. Summary: The Design Philosophy of the Main Loop

Claude Code's main loop embodies several key design principles:

  1. Wholesale replacement over field-by-field mutation — the TypeScript type system ensures nothing is missed
  2. Single-fire protection — boolean guards prevent infinite loops
  3. Comments as incident records — source comments are carriers of team knowledge
  4. Pragmatic state management — what belongs in State goes in State; what belongs in a closure goes in a closure
  5. Async overlap — summary generation overlaps with the model call; memory prefetch runs in parallel with the loop

This is not an "elegant" loop: it spans 1,488 lines and has 7 continue sites and 10 State fields. But it is a correct loop: every field, every continue, and every comment has an engineering reason for existing.