```typescript
// query.ts:204-217
type State = {
  messages: Message[]
  toolUseContext: ToolUseContext
  autoCompactTracking: AutoCompactTrackingState | undefined
  maxOutputTokensRecoveryCount: number
  hasAttemptedReactiveCompact: boolean
  maxOutputTokensOverride: number | undefined
  pendingToolUseSummary: Promise<ToolUseSummaryMessage | null> | undefined
  stopHookActive: boolean | undefined
  turnCount: number
  transition: Continue | undefined
}
```

`messages` holds the entire conversation history. On each continue, this array is replaced wholesale rather than pushed to:
```typescript
state = {
  messages: [...messagesForQuery, ...assistantMessages, ...toolResults],
  // ...
}
```

Why replacement instead of push? Because `messagesForQuery` may already have been compacted (snip, microcompact, context collapse), so it can differ from the original `state.messages`. Using push would require you to first swap the compacted messages back in, making the logic considerably more complex.
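A minimal sketch of the replacement pattern, under stated assumptions: `Msg`, `LoopState`, and the `compact` helper (which simply keeps the last three messages) are hypothetical simplifications, not Claude Code's types. Each iteration derives a possibly compacted `messagesForQuery` and builds the next state from it, so the old array is never mutated and nothing has to be swapped back in.

```typescript
// Hypothetical, simplified types; not the real Claude Code definitions.
type Msg = { role: 'user' | 'assistant'; text: string }
type LoopState = { messages: Msg[]; turnCount: number }

// Hypothetical compaction: keep only the last `keep` messages.
function compact(messages: Msg[], keep: number): Msg[] {
  return messages.slice(-keep)
}

function runTurns(initial: Msg[], turns: number): LoopState {
  let state: LoopState = { messages: initial, turnCount: 0 }
  while (state.turnCount < turns) {
    // Per-iteration view of the history; may differ from state.messages.
    const messagesForQuery = compact(state.messages, 3)
    const assistant: Msg = { role: 'assistant', text: `reply ${state.turnCount}` }
    const toolResult: Msg = { role: 'user', text: `tool result ${state.turnCount}` }
    // Wholesale replacement: the next history is built from the compacted view,
    // so the old array is never mutated and nothing is swapped back in.
    state = {
      messages: [...messagesForQuery, assistant, toolResult],
      turnCount: state.turnCount + 1,
    }
  }
  return state
}
```

Had the loop pushed onto `state.messages` instead, the compacted view would be thrown away each iteration and the stored history would keep growing.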
`toolUseContext` is the shared context for tool execution. It contains:

- `options.tools`: the list of available tools
- `options.mainLoopModel`: the currently active model (may be swapped during fallback)
- `abortController`: the cancellation signal
- `setInProgressToolUseIDs`: tracks tools currently executing
- File caches, session state, and more

Note that this is the only field in `State` that is mutated within an iteration; all other fields are replaced wholesale at continue sites.
`autoCompactTracking` tracks the state of automatic compaction. It holds the token usage from the most recent API call, which is used to decide whether autocompact should be triggered.

`undefined` means "no tracking data yet"; it is always `undefined` on the first iteration.
`maxOutputTokensRecoveryCount` counts truncation retries. When the model's output is truncated (`max_output_tokens`), Claude Code injects a continuation prompt and retries; this counter records how many retries have occurred, up to a maximum of 3 (`MAX_OUTPUT_TOKENS_RECOVERY_LIMIT`).
`hasAttemptedReactiveCompact` provides single-fire protection. Reactive compact is an expensive operation (it requires an additional API call to generate a summary). If a 413 (prompt too long) error persists after one compaction, the problem is not something compaction can fix, and compacting again won't help. This boolean prevents an infinite compaction loop.

This field is deliberately preserved (not reset to `false`) at the stop-hook blocking continue site, because resetting it once caused a real bug:
```typescript
// Comment from source (lines 1293-1296):
// Preserve the reactive compact guard — if compact already ran and
// couldn't recover from prompt-too-long, retrying after a stop-hook
// blocking error will produce the same result. Resetting to false
// here caused an infinite loop: compact → still too long → error →
// stop hook blocking → compact → … burning thousands of API calls.
```

This comment documents a real production incident: resetting this boolean caused an infinite loop that burned through thousands of API calls.
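The effect of preserving versus resetting the guard can be reproduced in a small simulation. This is a hypothetical worst case, not the real loop: every request returns 413, compaction never helps, and the stop hook blocks whenever it runs. It also assumes, for the sketch only, that the stop-hook continue site sets `stopHookActive` while the compaction site clears it. With the guard preserved, compaction fires once and the loop exits; with the buggy reset, it keeps compacting until an artificial iteration cap stops it.

```typescript
// Hypothetical worst case: every request returns 413, compaction never gets under
// the limit, and the stop hook blocks whenever it runs.
// Assumption for this sketch: only the stop-hook site sets stopHookActive;
// the compaction site builds a state where it is false again.
type SimResult = { compactions: number; iterations: number; hitCap: boolean }

function simulate(resetGuardOnStopHook: boolean, iterationCap = 50): SimResult {
  let hasAttemptedReactiveCompact = false
  let stopHookActive = false
  let compactions = 0
  let iterations = 0

  while (iterations < iterationCap) {
    iterations++
    // The API call fails with 413 on every iteration in this simulation.
    if (!hasAttemptedReactiveCompact) {
      compactions++ // expensive: an extra summarization call
      hasAttemptedReactiveCompact = true // single-fire guard
      stopHookActive = false
      continue // reactive_compact_retry
    }
    // Compaction already ran, so the error is surfaced and the stop hook runs.
    if (!stopHookActive) {
      stopHookActive = true
      if (resetGuardOnStopHook) {
        hasAttemptedReactiveCompact = false // the buggy reset
      }
      continue // stop_hook_blocking
    }
    // The hook has already fired: give up and return the error.
    return { compactions, iterations, hitCap: false }
  }
  return { compactions, iterations, hitCap: true }
}
```

Only the `resetGuardOnStopHook` flag differs between the two runs, which is exactly the one-line difference the source comment warns about.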
`maxOutputTokensOverride`: when a `max_output_tokens` escalation is triggered (8k → 64k), this field is set to `ESCALATED_MAX_TOKENS`. The escalation is one-shot: if 64k is still not enough, the multi-round recovery path is used instead.
`pendingToolUseSummary` enables a clever async optimization. Tool-call summaries are generated by the Haiku model (~1 second), but the main loop does not wait for them; each summary is only awaited during the next iteration:

```typescript
// The previous iteration's summary is consumed here
if (pendingToolUseSummary) {
  const summary = await pendingToolUseSummary
  if (summary) yield summary
}

// This iteration's summary is kicked off here (asynchronously)
state = {
  pendingToolUseSummary: generateToolUseSummary(assistantMessages),
  // ...
}
```

This means summary generation and the model call overlap: while the main model is handling the next iteration, Haiku is generating the summary for the previous one.
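The overlap can be sketched with timers standing in for the main model (50 ms) and for Haiku (30 ms). `generateSummary`, `callModel`, `agentLoop`, and the delays are hypothetical, and placing the `await` after the next model call is an assumption made so the overlap shows up in the event log.

```typescript
// Hypothetical stand-ins: timers simulate the main model (50 ms) and Haiku (30 ms).
const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms))

async function generateSummary(turn: number, log: string[]): Promise<string | null> {
  log.push(`summary ${turn} start`)
  await sleep(30)
  log.push(`summary ${turn} done`)
  return `summary of turn ${turn}`
}

async function callModel(turn: number, log: string[]): Promise<void> {
  log.push(`model ${turn} start`)
  await sleep(50)
  log.push(`model ${turn} done`)
}

async function* agentLoop(turns: number, log: string[]): AsyncGenerator<string> {
  let pendingSummary: Promise<string | null> | undefined
  for (let turn = 0; turn < turns; turn++) {
    // Runs while the previous iteration's summary is still being generated.
    await callModel(turn, log)
    // Consume the previous iteration's summary; usually it has already settled.
    if (pendingSummary) {
      const summary = await pendingSummary
      if (summary) yield summary
    }
    // Kick off this iteration's summary without awaiting it.
    pendingSummary = generateSummary(turn, log)
  }
  // Flush the final summary before the loop ends.
  if (pendingSummary) {
    const summary = await pendingSummary
    if (summary) yield summary
  }
}
```

In the log, `summary 0 done` lands between `model 1 start` and `model 1 done`: the summary's latency is hidden inside the next model call.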
`stopHookActive`: when a stop hook returns blocking errors, this field is set to `true` to prevent the same hook from being executed again.
`turnCount` is the number of loop iterations. It enforces the `maxTurns` limit: when `turnCount` exceeds `maxTurns`, the loop is forcibly terminated.
`transition` is the most subtle field. It is a tagged union type:

```typescript
type Continue =
  | { reason: 'collapse_drain_retry'; committed: number }
  | { reason: 'reactive_compact_retry' }
  | { reason: 'max_output_tokens_escalate' }
  | { reason: 'max_output_tokens_recovery'; attempt: number }
  | { reason: 'stop_hook_blocking' }
  | { reason: 'token_budget_continuation' }
  | { reason: 'next_turn' }
```

It serves three purposes:

- Loop protection: `state.transition?.reason !== 'collapse_drain_retry'` prevents duplicate drain attempts
- Test assertions: tests can inspect `transition.reason` to verify which recovery path was taken
- Metadata propagation: some continue reasons carry extra information (e.g., the `committed` count)
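A sketch of how such a union can be consumed. `mayDrain` is the loop-protection check quoted above, wrapped in a function; `describeTransition` is a hypothetical log formatter showing how narrowing on `reason` exposes variant-specific fields such as `committed` and `attempt`, with a `never` check so the compiler flags any reason that is added but not handled.

```typescript
type Continue =
  | { reason: 'collapse_drain_retry'; committed: number }
  | { reason: 'reactive_compact_retry' }
  | { reason: 'max_output_tokens_escalate' }
  | { reason: 'max_output_tokens_recovery'; attempt: number }
  | { reason: 'stop_hook_blocking' }
  | { reason: 'token_budget_continuation' }
  | { reason: 'next_turn' }

// Loop protection: skip the drain if the previous iteration was itself a drain retry.
function mayDrain(transition: Continue | undefined): boolean {
  return transition?.reason !== 'collapse_drain_retry'
}

// Hypothetical formatter: each case narrows `t` to one variant.
function describeTransition(t: Continue): string {
  switch (t.reason) {
    case 'collapse_drain_retry': return `collapse drain committed ${t.committed}`
    case 'reactive_compact_retry': return 'retry after reactive compact'
    case 'max_output_tokens_escalate': return 'retry with escalated output limit'
    case 'max_output_tokens_recovery': return `recovery attempt ${t.attempt}`
    case 'stop_hook_blocking': return 'stop hook reported blocking errors'
    case 'token_budget_continuation': return 'token budget allows more work'
    case 'next_turn': return 'next turn'
    default: {
      // Compile-time exhaustiveness check: fails if a variant is unhandled.
      const unreachable: never = t
      return unreachable
    }
  }
}
```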
The first continue site handles model fallback:

```typescript
// Trigger condition: FallbackTriggeredError
catch (error) {
  if (error instanceof FallbackTriggeredError) {
    // 1. Emit tombstone messages to retract already-streamed content
    yield* yieldTombstoneMessages(assistantMessages)
    // 2. Clear intermediate state
    assistantMessages.length = 0
    toolResults.length = 0
    // 3. Discard pending results in the StreamingToolExecutor
    streamingToolExecutor.discard()
    // 4. Switch models
    toolUseContext.options.mainLoopModel = fallbackModel
    // 5. Strip thinking signatures (the fallback model may not support them)
    messagesForQuery = stripSignatureBlocks(messagesForQuery)
    // 6. Notify the user
    yield createSystemMessage('Switched to fallback model...')
    continue // Retry with the original messagesForQuery
  }
}
```

This is the only continue site that does not create a new `State` object. Model fallback happens inside the `catch` block of an API call, at which point `state` has not yet been modified. A bare `continue` returns to the top of the loop with the existing state, and the retry happens with the new model.
Tombstone handling is the tricky part here: messages that have already been yielded cannot be "taken back", so a tombstone is the only way to tell the UI to forget them.
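A minimal sketch of the UI side, assuming a hypothetical event shape in which a tombstone names the `uuid` of a previously streamed message (the real Claude Code message types are not shown in the text). `renderTranscript` and `tombstonesFor` are illustrative names only.

```typescript
// Hypothetical event shape; the real Claude Code message types are not shown in the text.
type StreamEvent =
  | { type: 'assistant'; uuid: string; text: string }
  | { type: 'tombstone'; uuid: string }

// UI side: a yielded message is already on screen, so the only way to
// "take it back" is a later tombstone that names its uuid.
function renderTranscript(events: Iterable<StreamEvent>): string[] {
  const visible = new Map<string, string>()
  for (const event of events) {
    if (event.type === 'assistant') {
      visible.set(event.uuid, event.text)
    } else {
      visible.delete(event.uuid)
    }
  }
  return Array.from(visible.values())
}

// Producer side: after a fallback, retract everything the failed attempt streamed.
function* tombstonesFor(streamed: Array<{ uuid: string }>): Generator<StreamEvent> {
  for (const message of streamed) {
    yield { type: 'tombstone', uuid: message.uuid }
  }
}
```

The stream stays append-only: the producer never edits what it has sent, it only sends more events, and the consumer decides what remains visible.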
The context-collapse drain recovers from a prompt that is too long (413):

```typescript
if (feature('CONTEXT_COLLAPSE') && contextCollapse &&
    state.transition?.reason !== 'collapse_drain_retry') {
  const drained = contextCollapse.recoverFromOverflow(messagesForQuery, querySource)
  if (drained.committed > 0) {
    state = {
      messages: drained.messages,
      // ...
      transition: { reason: 'collapse_drain_retry', committed: drained.committed },
    }
    continue
  }
}
```

The key guard is `state.transition?.reason !== 'collapse_drain_retry'`: if the previous iteration already performed a drain, no further attempt is made. This prevents a "drain → still 413 → drain again → ..." infinite loop.
Reactive compact summarizes the history and retries:

```typescript
const compacted = await reactiveCompact.tryReactiveCompact({
  hasAttempted: hasAttemptedReactiveCompact,
  messages: messagesForQuery,
  // ...
})
if (compacted) {
  const postCompactMessages = buildPostCompactMessages(compacted)
  state = {
    messages: postCompactMessages,
    hasAttemptedReactiveCompact: true, // Single-fire protection
    autoCompactTracking: undefined, // Reset tracking after compaction
    // ...
  }
  continue
}
```

Note `autoCompactTracking: undefined`: after compaction the token count changes dramatically, making any previous tracking data meaningless.
The `max_output_tokens` escalation retries with a higher output limit:

```typescript
if (capEnabled && maxOutputTokensOverride === undefined &&
    !process.env.CLAUDE_CODE_MAX_OUTPUT_TOKENS) {
  state = {
    messages: messagesForQuery, // Retry with the same input
    maxOutputTokensOverride: ESCALATED_MAX_TOKENS, // 8k → 64k
    transition: { reason: 'max_output_tokens_escalate' },
    // ...
  }
  continue
}
```

This is a silent escalation: no meta message is injected, so the user sees nothing. The output limit is simply raised from 8k to 64k, and the retry uses exactly the same input.
Once escalation has been used, multi-round recovery takes over:

```typescript
const recoveryMessage = createUserMessage({
  content: 'Output token limit hit. Resume directly — no apology, ' +
    'no recap of what you were doing. Pick up mid-thought...',
  isMeta: true,
})
state = {
  messages: [...messagesForQuery, ...assistantMessages, recoveryMessage],
  maxOutputTokensRecoveryCount: maxOutputTokensRecoveryCount + 1,
  transition: { reason: 'max_output_tokens_recovery', attempt: maxOutputTokensRecoveryCount + 1 },
  // ...
}
continue
```

Note that `...assistantMessages` is retained: the truncated response stays in the message history. The model sees its own half-finished response followed by the recovery message, so it knows exactly where to resume.
isMeta: true marks this message as a "system meta-message"; the UI typically does not display it directly to the user.
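Together, the escalation and recovery sites form a ladder for repeated truncation. The sketch below condenses them into one decision function; `ESCALATED_MAX_TOKENS = 64000`, `onMaxOutputTokens`, and the `TruncationState`/`TruncationDecision` types are assumptions for illustration, while the limit of 3 and the escalation condition come from the text.

```typescript
// Assumed values: the text gives "64k" and a recovery limit of 3.
const ESCALATED_MAX_TOKENS = 64000
const MAX_OUTPUT_TOKENS_RECOVERY_LIMIT = 3

// Hypothetical subset of State relevant to truncation handling.
type TruncationState = {
  maxOutputTokensOverride: number | undefined
  maxOutputTokensRecoveryCount: number
}

type TruncationDecision =
  | { action: 'escalate'; next: TruncationState }
  | { action: 'recover'; attempt: number; next: TruncationState }
  | { action: 'give_up' }

function onMaxOutputTokens(
  s: TruncationState,
  capEnabled: boolean,
  envOverride: string | undefined,
): TruncationDecision {
  // Step 1: one-shot, silent escalation; same input, higher output limit.
  if (capEnabled && s.maxOutputTokensOverride === undefined && !envOverride) {
    return { action: 'escalate', next: { ...s, maxOutputTokensOverride: ESCALATED_MAX_TOKENS } }
  }
  // Step 2: bounded multi-round recovery with a meta "resume" message.
  if (s.maxOutputTokensRecoveryCount < MAX_OUTPUT_TOKENS_RECOVERY_LIMIT) {
    const attempt = s.maxOutputTokensRecoveryCount + 1
    return { action: 'recover', attempt, next: { ...s, maxOutputTokensRecoveryCount: attempt } }
  }
  // Step 3: recovery attempts exhausted; stop retrying.
  return { action: 'give_up' }
}
```

Fed a response that keeps getting truncated, this yields one escalation, three recoveries, and then gives up. A user-set `CLAUDE_CODE_MAX_OUTPUT_TOKENS` skips the escalation step entirely.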
The stop-hook blocking continue site:

```typescript
if (stopHookResult.blockingErrors.length > 0) {
  state = {
    messages: [...messagesForQuery, ...assistantMessages, ...stopHookResult.blockingErrors],
    stopHookActive: true, // Prevent the hook from running again
    hasAttemptedReactiveCompact, // Preserved! Not reset!
    maxOutputTokensRecoveryCount: 0, // Reset
    // ...
  }
  continue
}
```

`hasAttemptedReactiveCompact` is preserved while `maxOutputTokensRecoveryCount` is reset. The two decisions look contradictory but follow different reasoning:

- `hasAttemptedReactiveCompact` preserved: prevents the compact infinite loop (the production incident above)
- `maxOutputTokensRecoveryCount` reset: after a hook error the model will produce a new response, and that new response deserves the full 3 recovery attempts
The token budget continuation site:

```typescript
if (decision.action === 'continue') {
  state = {
    messages: [...messagesForQuery, ...assistantMessages,
      createUserMessage({ content: decision.nudgeMessage, isMeta: true })],
    maxOutputTokensRecoveryCount: 0,
    hasAttemptedReactiveCompact: false, // Reset!
    transition: { reason: 'token_budget_continuation' },
    // ...
  }
  continue
}
```

Among the recovery-related continue sites above, this is the only one that resets both `hasAttemptedReactiveCompact` and `maxOutputTokensRecoveryCount`. The reason: token budget continuation is a "normal continuation". The model successfully finished its current work, and the budget simply allows it to do more. All recovery mechanisms should be reset here, because what follows is a fresh interaction.
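The reset rules of the last two continue sites can be written as two state builders over a hypothetical subset of `State` (`Guards`, `afterStopHookBlocking`, and `afterTokenBudgetContinuation` are illustrative names). Because each builder returns a full object literal, adding a field to `Guards` makes both fail to compile until each one decides what to do with it, which is the safety net wholesale replacement relies on.

```typescript
// Hypothetical subset of State: just the two guards discussed above.
type Guards = {
  hasAttemptedReactiveCompact: boolean
  maxOutputTokensRecoveryCount: number
}

// stop_hook_blocking: keep the compact guard, give the new response a full recovery budget.
function afterStopHookBlocking(g: Guards): Guards {
  return {
    hasAttemptedReactiveCompact: g.hasAttemptedReactiveCompact, // preserved
    maxOutputTokensRecoveryCount: 0, // reset
  }
}

// token_budget_continuation: a normal continuation, so every guard starts fresh.
function afterTokenBudgetContinuation(): Guards {
  return {
    hasAttemptedReactiveCompact: false, // reset
    maxOutputTokensRecoveryCount: 0, // reset
  }
}
```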
```typescript
// query.ts:293-295
const config = buildQueryConfig()
```

`buildQueryConfig()` takes a one-time snapshot of all immutable environment configuration at loop entry:
- Statsig feature flags
- Environment variables
- Session configuration
Why snapshot once instead of re-reading on every iteration?

- Consistency: the entire loop execution sees the same configuration, preventing inconsistent behavior from mid-run config changes
- Performance: feature flag lookups may involve network requests or disk I/O; doing it once avoids repeated overhead
- Debuggability: the entire `config` object can be logged, making it easy to know exactly what configuration was in effect for a given turn
Note that a source comment explicitly explains why `feature()` calls are excluded from `config`: `feature()` is a compile-time constant (dead-code eliminated by the Bun bundler), so it does not need a runtime snapshot.
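A sketch of the snapshot idea with hypothetical names: `QueryConfig`, `readFlag`, `buildConfigSnapshot`, `runIterations`, and the `example_gate` flag are stand-ins, and `env` is a plain record instead of `process.env` to keep the example self-contained. The flag is read once, the snapshot is frozen, and a mid-run environment change is not seen by later iterations.

```typescript
// Hypothetical config shape; the real buildQueryConfig() fields are not listed in the text.
type QueryConfig = Readonly<{
  exampleGateEnabled: boolean
  maxOutputTokensEnv: string | undefined
  sessionId: string
}>

let flagReads = 0

// Stand-in for a flag lookup that may hit the network or disk.
function readFlag(flagName: string): boolean {
  flagReads++
  return flagName === 'example_gate'
}

function buildConfigSnapshot(env: Record<string, string | undefined>, sessionId: string): QueryConfig {
  return Object.freeze({
    exampleGateEnabled: readFlag('example_gate'),
    maxOutputTokensEnv: env['CLAUDE_CODE_MAX_OUTPUT_TOKENS'],
    sessionId,
  })
}

function runIterations(env: Record<string, string | undefined>, iterations: number): QueryConfig[] {
  const config = buildConfigSnapshot(env, 'session-1') // taken once, at loop entry
  const seen: QueryConfig[] = []
  for (let i = 0; i < iterations; i++) {
    // A mid-run environment change; the snapshot does not see it.
    env['CLAUDE_CODE_MAX_OUTPUT_TOKENS'] = String(1000 * (i + 1))
    seen.push(config)
  }
  return seen
}
```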
```typescript
// query.ts:282-291
let taskBudgetRemaining: number | undefined = undefined
```

This variable is declared outside the loop rather than being placed in `State`. The source comment explains why:
"Loop-local (not on State) to avoid touching the 7 continue sites."
If it were in `State`, every continue site would need to forward this value. In practice it only needs to be updated when compaction occurs; at the vast majority of continue sites it is unchanged.

Placing it outside the loop makes it a cross-iteration closure variable: modified only inside the compaction logic and automatically inherited everywhere else. This is a pragmatic engineering trade-off, removing boilerplate from 7 continue sites at the cost of slightly less "pure" state management.
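A hypothetical sketch of the trade-off: the reduced `LoopState` lists only what every continue site forwards, while `taskBudgetRemaining` lives in the enclosing scope and is written only on the compaction path. The accounting rule (charging dropped messages against the budget) and the `runWithTaskBudget` name are invented for illustration; the text does not say how the real value is computed.

```typescript
// Hypothetical, reduced State: only what every continue site must forward.
type LoopState = { messages: string[]; turnCount: number }

function runWithTaskBudget(events: Array<'turn' | 'compact'>, budget: number): number | undefined {
  // Loop-local, not on State: every iteration closes over this one binding.
  let taskBudgetRemaining: number | undefined = undefined
  let state: LoopState = { messages: [], turnCount: 0 }

  for (const step of events) {
    if (step === 'compact') {
      // The only writer. Hypothetical rule: charge dropped messages to the budget.
      const dropped = Math.max(0, state.messages.length - 1)
      taskBudgetRemaining = (taskBudgetRemaining ?? budget) - dropped
      state = { messages: state.messages.slice(-1), turnCount: state.turnCount + 1 }
      continue
    }
    // An ordinary continue site: builds a new State without mentioning the budget.
    state = {
      messages: [...state.messages, `turn ${state.turnCount}`],
      turnCount: state.turnCount + 1,
    }
  }
  return taskBudgetRemaining
}
```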
```typescript
// query.ts:301-304
using pendingMemoryPrefetch = startRelevantMemoryPrefetch(
  state.messages,
  state.toolUseContext,
)
```

The `using` keyword (TC39 explicit resource management, a Stage 3 proposal supported since TypeScript 5.2) implements the RAII (Resource Acquisition Is Initialization) pattern:

- On entering the loop, memory prefetch starts (asynchronously querying relevant memories)
- On exiting the loop, it is disposed automatically, whether via a normal `return`, a `throw`, or the generator being closed with `.return()`
`startRelevantMemoryPrefetch` starts the query only on the first iteration: the prompt does not change between iterations, so repeat queries are pointless. Subsequent iterations check `settledAt` and use the already-cached result directly.
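Since `using` needs TypeScript 5.2+ and a runtime that provides `Symbol.dispose`, the sketch below spells out the same guarantee with an explicit `try`/`finally`, which is what `using` does when its scope exits. `startPrefetch`, the `Prefetch` handle, and `queryLoop` are hypothetical. With `using`, the handle would expose a `[Symbol.dispose]()` method and the body would declare `using prefetch = startPrefetch(log)` instead.

```typescript
// Hypothetical prefetch handle; the real startRelevantMemoryPrefetch is not shown in the text.
type Prefetch = { disposed: boolean; dispose(): void }

function startPrefetch(log: string[]): Prefetch {
  log.push('prefetch started')
  const handle: Prefetch = {
    disposed: false,
    dispose() {
      handle.disposed = true
      log.push('prefetch disposed')
    },
  }
  return handle
}

// The guarantee `using prefetch = startPrefetch(log)` gives, spelled out as try/finally:
// dispose() runs on normal completion, on a throw, and when the consumer calls .return().
function* queryLoop(turns: number, log: string[]): Generator<number> {
  const prefetch = startPrefetch(log)
  try {
    for (let turn = 0; turn < turns; turn++) {
      yield turn
    }
  } finally {
    prefetch.dispose()
  }
}
```

Calling `.return()` on a suspended generator resumes it at the paused `yield` as if a `return` had happened there, so the `finally` block (and, with `using`, the disposer) still runs.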
Claude Code's main loop embodies several key design principles:
- Wholesale replacement over field-by-field mutation: the TypeScript type system ensures nothing is missed
- Single-fire protection: boolean guards prevent infinite loops
- Comments as incident records: source comments carry team knowledge
- Pragmatic state management: what belongs in `State` goes in `State`; what belongs in a closure goes in a closure
- Async overlap: summary generation overlaps with the model call, and memory prefetch runs in parallel with the loop
This is not an "elegant" loop: it spans 1,488 lines, has 7 continue sites, and carries 10 `State` fields. But it is a correct loop. Every field, every `continue`, and every comment has an engineering reason for existing.