| 1 | +# Context compaction: infinite conversations |
| 2 | + |
| 3 | +Our agent has come a long way. It runs commands, reads and writes files, tracks its own work, delegates to subagents, and loads skills on demand — seven tools, one loop. But every one of those capabilities adds to the same growing resource: the messages array. A single `read_file` on a 1,000-line source file costs roughly 4,000 tokens. Load a skill body, and that's another 2,000. After reading 30 files and running 20 bash commands across a long session, the context pushes past 100,000 tokens. At that point, the agent either hits the API's context window limit and errors out, or — more subtly — the model's response quality degrades as the relevant information gets buried in a sea of stale tool results. |
| 4 | + |
| 5 | +This is the threshold that separates a demo from a useful tool. Everything we've built so far assumes the context has room. Once it doesn't, the agent has a hard ceiling on how much work it can do in a single session. That's where context compaction comes in: a three-layer compression strategy that progressively shrinks the messages array by quietly trimming old results, automatically summarizing when a threshold is crossed, and letting the model request compression explicitly. With these three layers working together, the agent can keep working long after a single context window would have filled up. |
| 6 | + |
| 7 | +In this guide, let's build `ContextCompactor` — the type that implements all three layers — and wire it into the agent loop. This is the beginning of Act III in our series: the agent now needs to manage its own memory. |
| 8 | + |
| 9 | +_The complete source code for this stage is available at the [`06-context-compaction`](https://github.com/ivan-magda/swift-claude-code/tree/06-context-compaction/Sources) tag on GitHub. Code blocks below show key excerpts._ |
| 10 | + |
| 11 | +--- |
| 12 | + |
| 13 | +### Three layers, three strategies |
| 14 | + |
| 15 | +The compression strategy works in layers, each more aggressive than the last. Layer 1, **micro-compact**, runs silently before every API call. It scans the messages array for old tool results (anything beyond the three most recent) and replaces their content with a short placeholder like `"[Previous: used read_file]"`. The model still sees that a tool was called and which one, but the actual output (the 500-line file, the verbose bash output) is gone. This is the quiet housekeeping layer: it needs no API call, it discards only output the model rarely needs again, and it runs every single turn. |
| 16 | + |
| 17 | +Layer 2, **auto-compact**, triggers when the estimated token count crosses a threshold (50,000 by default). This is the dramatic one: the agent saves the entire conversation transcript to disk as a JSONL file, then asks the LLM itself to summarize the conversation. The summary replaces the entire messages array, so every prior turn collapses into two messages: a user message containing the compressed summary and an assistant acknowledgment. The conversation continues from there with a clean slate and a condensed record of what happened. |
| 18 | + |
| 19 | +Layer 3 — the **compact tool** — is the same summarization as layer 2, but triggered deliberately. The model calls `compact` when it decides compression would help, optionally specifying a `focus` parameter to guide what the summary should preserve. It's the difference between automatic garbage collection and an explicit `free()` — sometimes the model knows best when to compress. |
| 20 | + |
| 21 | +--- |
| 22 | + |
| 23 | +### The ContextCompactor type |
| 24 | + |
| 25 | +Let's start with the type that owns all three layers. `ContextCompactor` holds two configuration values — the path where transcripts are saved and the token threshold that triggers auto-compaction — and exposes methods for each layer: |
| 26 | + |
| 27 | +```swift |
| 28 | +// Sources/Core/ContextCompactor.swift |
| 29 | +public struct ContextCompactor: Sendable { |
| 30 | + public static let keepRecent = 3 |
| 31 | + public static let minContentLength = 100 |
| 32 | + |
| 33 | + public let transcriptDirectory: String |
| 34 | + public let tokenThreshold: Int |
| 35 | + |
| 36 | + public init( |
| 37 | + transcriptDirectory: String, |
| 38 | + tokenThreshold: Int = Limits.defaultTokenThreshold |
| 39 | + ) { |
| 40 | + self.transcriptDirectory = transcriptDirectory |
| 41 | + self.tokenThreshold = tokenThreshold |
| 42 | + } |
| 43 | +} |
| 44 | +``` |
| 45 | + |
| 46 | +The `keepRecent` and `minContentLength` constants control micro-compact's behavior: keep the three most recent tool results untouched, and only replace results longer than 100 characters. Anything shorter isn't worth compacting. |
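As a rough usage sketch, here is how the type might be constructed. The `Limits` enum below is a stand-in written for this snippet (its value mirrors the 50,000-token default quoted earlier), and the struct is a trimmed copy so the example compiles on its own:

```swift
import Foundation

// Stand-in for the project's `Limits` namespace (an assumption for this
// sketch); the value mirrors the 50,000-token default quoted in this guide.
enum Limits {
    static let defaultTokenThreshold = 50_000
}

// Trimmed copy of the struct above so the snippet is self-contained.
struct ContextCompactor: Sendable {
    static let keepRecent = 3
    static let minContentLength = 100

    let transcriptDirectory: String
    let tokenThreshold: Int

    init(
        transcriptDirectory: String,
        tokenThreshold: Int = Limits.defaultTokenThreshold
    ) {
        self.transcriptDirectory = transcriptDirectory
        self.tokenThreshold = tokenThreshold
    }
}

// Default threshold for normal sessions.
let compactor = ContextCompactor(transcriptDirectory: ".transcripts")

// A deliberately low threshold makes auto-compact easy to trigger while testing.
let eagerCompactor = ContextCompactor(
    transcriptDirectory: ".transcripts",
    tokenThreshold: 2_000
)

print(compactor.tokenThreshold, eagerCompactor.tokenThreshold)
```

Keeping the threshold injectable is what makes the auto-compact path practical to exercise without first filling a real 50,000-token context.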
| 47 | + |
| 48 | +--- |
| 49 | + |
| 50 | +### Micro-compact: the quiet layer |
| 51 | + |
| 52 | +The `microCompact` method scans the messages array for every `.toolResult` content block, identifies which ones are old enough to compress, and replaces their content with a placeholder. One thing to keep in mind here is that `Message.content` is a `let` property — we can't mutate a content block in place. Instead, we reconstruct entire `Message` values with new content arrays: |
| 53 | + |
| 54 | +```swift |
| 55 | +public func microCompact(messages: inout [Message]) { |
| 56 | + let toolResultLocations = findToolResultLocations(in: messages) |
| 57 | + guard toolResultLocations.count > Self.keepRecent else { |
| 58 | + return |
| 59 | + } |
| 60 | + |
| 61 | + let toolNameMap = buildToolNameMap(from: messages) |
| 62 | + let oldResults = toolResultLocations.dropLast(Self.keepRecent) |
| 63 | + var modifiedContents: [Int: [ContentBlock]] = [:] |
| 64 | + |
| 65 | + for (msgIdx, contentIdx) in oldResults { |
| 66 | + guard |
| 67 | + case .toolResult(let toolUseId, let content, let isError) = messages[msgIdx].content[contentIdx], |
| 68 | + content.count > Self.minContentLength |
| 69 | + else { |
| 70 | + continue |
| 71 | + } |
| 72 | + |
| 73 | + let toolName = toolNameMap[toolUseId] ?? "unknown" |
| 74 | + let replacement = ContentBlock.toolResult( |
| 75 | + toolUseId: toolUseId, |
| 76 | + content: "[Previous: used \(toolName)]", |
| 77 | + isError: isError |
| 78 | + ) |
| 79 | + |
| 80 | + if modifiedContents[msgIdx] == nil { |
| 81 | + modifiedContents[msgIdx] = messages[msgIdx].content |
| 82 | + } |
| 83 | + modifiedContents[msgIdx]![contentIdx] = replacement |
| 84 | + } |
| 85 | + |
| 86 | + for (msgIdx, newContent) in modifiedContents { |
| 87 | + messages[msgIdx] = Message(role: messages[msgIdx].role, content: newContent) |
| 88 | + } |
| 89 | +} |
| 90 | +``` |
| 91 | + |
| 92 | +The method is intentionally synchronous — it's pure data transformation with no reason to await anything. Two private helpers do the scanning: `findToolResultLocations` collects every `toolResult` position in the array, and `buildToolNameMap` walks assistant messages to map each `toolUseId` back to its tool name — bridging a gap in the API's data model where `toolResult` blocks carry an ID but no name. |
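The two helpers are not shown in the excerpt, so the following is only one plausible shape for them, written as free functions. `Role`, `ContentBlock`, and `Message` are simplified stand-ins invented for this sketch; the project's real types carry more cases and fields:

```swift
import Foundation

// Simplified stand-ins for the project's types (assumptions for this sketch).
enum Role { case user, assistant }

enum ContentBlock {
    case text(String)
    case toolUse(id: String, name: String)
    case toolResult(toolUseId: String, content: String, isError: Bool)
}

struct Message {
    let role: Role
    let content: [ContentBlock]
}

// Collects (message index, content index) for every tool result, oldest first.
func findToolResultLocations(in messages: [Message]) -> [(Int, Int)] {
    var locations: [(Int, Int)] = []
    for (msgIdx, message) in messages.enumerated() {
        for (contentIdx, block) in message.content.enumerated() {
            if case .toolResult = block {
                locations.append((msgIdx, contentIdx))
            }
        }
    }
    return locations
}

// Maps each tool-use ID to the tool name recorded in assistant messages.
func buildToolNameMap(from messages: [Message]) -> [String: String] {
    var names: [String: String] = [:]
    for message in messages where message.role == .assistant {
        for block in message.content {
            if case .toolUse(let id, let name) = block {
                names[id] = name
            }
        }
    }
    return names
}

let history = [
    Message(role: .assistant, content: [.text("Reading."), .toolUse(id: "t1", name: "read_file")]),
    Message(role: .user, content: [.toolResult(toolUseId: "t1", content: "file body", isError: false)]),
    Message(role: .assistant, content: [.toolUse(id: "t2", name: "bash")]),
    Message(role: .user, content: [.toolResult(toolUseId: "t2", content: "ok", isError: false)])
]

let locations = findToolResultLocations(in: history)
let toolNames = buildToolNameMap(from: history)
```

Because locations are collected in array order, `dropLast(keepRecent)` in `microCompact` naturally leaves the newest results alone.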
| 93 | + |
| 94 | +--- |
| 95 | + |
| 96 | +### Auto-compact: threshold-triggered summarization |
| 97 | + |
| 98 | +Layer 2 needs to answer a question before it can act: how many tokens are we using? The API doesn't tell us the context size mid-conversation, so we estimate: |
| 99 | + |
| 100 | +```swift |
| 101 | +public func estimateTokens(from messages: [Message]) -> Int { |
| 102 | + let data = (try? JSONEncoder().encode(messages)) ?? Data() |
| 103 | + return data.count / 4 |
| 104 | +} |
| 105 | +``` |
| 106 | + |
| 107 | +The divide-by-four heuristic is rough, but it's close enough for a threshold check — and JSON encoding closely matches the actual API payload size, which is what we care about. |
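To get a feel for the numbers, here is a small worked example. It uses a generic variant of the estimator (an adaptation for this sketch, so that it runs without the project's `Message` type) and feeds it a single 4,000-character string:

```swift
import Foundation

// Generic variant of the estimator: roughly four bytes of JSON per token.
func estimateTokens<T: Encodable>(from value: T) -> Int {
    let data = (try? JSONEncoder().encode(value)) ?? Data()
    return data.count / 4
}

// 4,000 ASCII characters plus the surrounding `["` and `"]`
// encode to 4,004 bytes of JSON.
let fileBody = String(repeating: "a", count: 4_000)
let estimate = estimateTokens(from: [fileBody])

print(estimate) // 1001
```

This measures payload bytes, not what a real tokenizer would produce, so treat the result as a trigger signal rather than an exact count. One consequence of the `?? Data()` fallback is that an encoding failure yields an estimate of zero, in which case the threshold check simply would not fire.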
| 108 | + |
| 109 | +When the estimate crosses the threshold, `autoCompact` takes over. It saves the full transcript to disk first — nothing is truly lost — then asks the LLM to summarize: |
| 110 | + |
| 111 | +```swift |
| 112 | +public func autoCompact( |
| 113 | + messages: [Message], |
| 114 | + using apiClient: APIClientProtocol, |
| 115 | + model: String, |
| 116 | + focus: String? |
| 117 | +) async -> [Message] { |
| 118 | + do { |
| 119 | + let path = try saveTranscript(messages) |
| 120 | + |
| 121 | + let encoder = JSONEncoder() |
| 122 | + let data = (try? encoder.encode(messages)) ?? Data() |
| 123 | + |
| 124 | + var transcript = String(data: data, encoding: .utf8) ?? "[]" |
| 125 | + if transcript.count > Self.maxSummaryInputLength { |
| 126 | + transcript = String(transcript.prefix(Self.maxSummaryInputLength)) + "\n[truncated]" |
| 127 | + } |
| 128 | + |
| 129 | + var prompt = "" |
| 130 | + if let focus, !focus.isEmpty { |
| 131 | + prompt += "Focus on: \(focus). " |
| 132 | + } |
| 133 | + prompt += """ |
| 134 | + Summarize this conversation for continuity. Include: \ |
| 135 | + 1) What was accomplished, 2) Current state, 3) Key decisions made. \ |
| 136 | + Be concise but preserve critical details. |
| 137 | + |
| 138 | + \(transcript) |
| 139 | + """ |
| 140 | + |
| 141 | + let request = APIRequest( |
| 142 | + model: model, |
| 143 | + maxTokens: 2000, |
| 144 | + messages: [.user(prompt)] |
| 145 | + ) |
| 146 | + let response = try await apiClient.createMessage(request: request) |
| 147 | + let summary = response.content.textContent |
| 148 | + |
| 149 | + return [ |
| 150 | + .user("[Conversation compressed. Transcript: \(path)]\n\n\(summary)"), |
| 151 | + .assistant("Understood. I have the context from the summary. Continuing.") |
| 152 | + ] |
| 153 | + } catch { |
| 154 | + print("[warning] Auto-compact failed: \(error). Keeping original messages.") |
| 155 | + return messages |
| 156 | + } |
| 157 | +} |
| 158 | +``` |
| 159 | + |
| 160 | +The `do/catch` wrapping the entire method body is a deliberate safety net — compaction failure should never crash the agent loop. If the API call fails or the transcript can't be written, the method prints a warning and returns the original messages unchanged. The agent continues with a full context rather than no context. |
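The fallback behavior can be illustrated in isolation. The sketch below is a simplified, synchronous stand-in: messages are plain strings, and the `summarize` closure plays the role of the API call. None of these names come from the project:

```swift
import Foundation

struct SummaryError: Error {}

// A successful summary collapses history to two messages;
// any thrown error leaves the history untouched.
func compactOrKeep(
    _ messages: [String],
    summarize: ([String]) throws -> String
) -> [String] {
    do {
        let summary = try summarize(messages)
        return [
            "[Conversation compressed]\n\n\(summary)",
            "Understood. I have the context from the summary. Continuing."
        ]
    } catch {
        print("[warning] Compaction failed: \(error). Keeping original messages.")
        return messages
    }
}

let history = ["read main.swift", "ran swift build", "fixed a typo"]

// Happy path: three turns become two messages.
let compacted = compactOrKeep(history) { turns in "Summary of \(turns.count) turns." }

// Failure path: the original history comes back unchanged.
let untouched = compactOrKeep(history) { _ in throw SummaryError() }
```

Returning the input on failure keeps the caller's code path identical in both cases, which is why the loop can assign the result back to `messages` unconditionally.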
| 161 | + |
| 162 | +The `saveTranscript` method writes each message as a single JSON line to a file in the `.transcripts/` directory. An early version used a bare Unix timestamp for the filename, which caused collisions when two compactions happened in the same second. The fix appends the first eight characters of a fresh UUID to the timestamp: |
| 163 | + |
| 164 | +```swift |
| 165 | +let timestamp = Int(Date().timeIntervalSince1970) |
| 166 | +let unique = UUID().uuidString.prefix(8) |
| 167 | +let path = "\(transcriptDirectory)/transcript_\(timestamp)_\(unique).jsonl" |
| 168 | +``` |
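The rest of `saveTranscript` is not shown in the excerpt. A minimal sketch of how such a method could write JSONL follows; the `Message` type here is a hypothetical two-field stand-in, and the directory is passed as a parameter so the snippet does not depend on `ContextCompactor`:

```swift
import Foundation

// Hypothetical stand-in for the project's Codable `Message` type.
struct Message: Codable {
    let role: String
    let text: String
}

// Writes one JSON object per line and returns the path of the new file.
func saveTranscript(_ messages: [Message], to directory: String) throws -> String {
    try FileManager.default.createDirectory(
        atPath: directory,
        withIntermediateDirectories: true,
        attributes: nil
    )

    let timestamp = Int(Date().timeIntervalSince1970)
    let unique = UUID().uuidString.prefix(8)
    let path = "\(directory)/transcript_\(timestamp)_\(unique).jsonl"

    let encoder = JSONEncoder()
    let lines = try messages.map { message -> String in
        let data = try encoder.encode(message)
        return String(decoding: data, as: UTF8.self)
    }
    try lines.joined(separator: "\n").write(toFile: path, atomically: true, encoding: .utf8)
    return path
}

// Write a two-message transcript to a temporary directory and read it back.
let directory = FileManager.default.temporaryDirectory
    .appendingPathComponent("transcripts-demo").path
let path = try saveTranscript(
    [Message(role: "user", text: "hi"), Message(role: "assistant", text: "hello")],
    to: directory
)
let saved = try String(contentsOfFile: path, encoding: .utf8)
```

The default `JSONEncoder` output contains no raw newlines, so joining the encoded messages with `"\n"` is enough to keep one message per line.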
| 169 | + |
| 170 | +--- |
| 171 | + |
| 172 | +### The compact tool and two-phase dispatch |
| 173 | + |
| 174 | +Layer 3 gives the model direct control over compression. The `compact` tool definition includes an optional `focus` parameter that lets the model specify what the summary should preserve: |
| 175 | + |
| 176 | +```swift |
| 177 | +ToolDefinition( |
| 178 | + name: "compact", |
| 179 | + description: "Compress conversation history to free context space. Use when working on long tasks.", |
| 180 | + inputSchema: .object([ |
| 181 | + "type": "object", |
| 182 | + "properties": .object([ |
| 183 | + "focus": .object([ |
| 184 | + "type": "string", |
| 185 | + "description": "What to preserve in the summary (e.g., 'file paths edited', 'current task progress')" |
| 186 | + ]) |
| 187 | + ]), |
| 188 | + "required": .array([]) |
| 189 | + ]) |
| 190 | +) |
| 191 | +``` |
| 192 | + |
| 193 | +The handler, though, is surprising — it doesn't actually compact anything: |
| 194 | + |
| 195 | +```swift |
| 196 | +private func executeCompact(_ input: JSONValue) async -> Result<String, ToolError> { |
| 197 | + .success("Compressing...") |
| 198 | +} |
| 199 | +``` |
| 200 | + |
| 201 | +This is the two-phase dispatch pattern. The `compact` tool can't perform the actual compaction because tool handlers return `Result<String, ToolError>` — they don't have access to the messages array. The real work needs to happen in the loop, where `messages` is a local `var`. So the handler returns a marker string, and `processToolUses` captures the focus parameter as a signal: |
| 202 | + |
| 203 | +```swift |
| 204 | +struct ToolProcessingResult { |
| 205 | + let results: [ContentBlock] |
| 206 | + let didUseTodo: Bool |
| 207 | + let compactFocus: String? |
| 208 | +} |
| 209 | +``` |
| 210 | + |
| 211 | +The `compactFocus` field is `nil` when compact wasn't called, and holds the focus value (or an empty string for no focus) when it was. This replaces the growing tuple that `processToolUses` previously returned — a named struct with a clear `nil`-vs-present semantic is easier to reason about than a third tuple element. |
| 212 | + |
| 213 | +Inside `processToolUses`, the compact detection is a simple check alongside the existing `didUseTodo` tracking: |
| 214 | + |
| 215 | +```swift |
| 216 | +if name == "compact" { |
| 217 | + compactFocus = input["focus"]?.stringValue ?? "" |
| 218 | +} |
| 219 | +``` |
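Put together, the two phases might look like the sketch below. `ToolUse` is a hypothetical flattened view of a tool call (the project passes a JSON value instead), results are plain strings, and `didUseTodo` is omitted for brevity:

```swift
import Foundation

// Hypothetical, flattened view of a tool call (an assumption for this sketch).
struct ToolUse {
    let name: String
    let input: [String: String]
}

struct ToolProcessingResult {
    let results: [String]
    let compactFocus: String?
}

// Phase one: run every handler and note whether `compact` was requested.
func processToolUses(_ toolUses: [ToolUse]) -> ToolProcessingResult {
    var results: [String] = []
    var compactFocus: String? = nil

    for use in toolUses {
        if use.name == "compact" {
            // Marker only; no compaction happens inside the handler.
            results.append("Compressing...")
            compactFocus = use.input["focus"] ?? ""
        } else {
            results.append("ran \(use.name)")
        }
    }
    return ToolProcessingResult(results: results, compactFocus: compactFocus)
}

let plain = processToolUses([ToolUse(name: "bash", input: ["command": "ls"])])
let unfocused = processToolUses([ToolUse(name: "compact", input: [:])])
let focused = processToolUses([ToolUse(name: "compact", input: ["focus": "file paths edited"])])

// Phase two belongs to the loop, which owns `messages` and checks the signal.
for outcome in [plain, unfocused, focused] {
    if let focus = outcome.compactFocus {
        print("compact requested, focus: \"\(focus)\"")
    } else {
        print("no compaction requested")
    }
}
```

The three calls cover the cases the optional distinguishes: `nil` (tool not called), an empty string (called without a focus), and a non-empty focus.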
| 220 | + |
| 221 | +--- |
| 222 | + |
| 223 | +### Wiring into the agent loop |
| 224 | + |
| 225 | +With all three layers built, let's connect them. The `applyCompaction` helper runs layers 1 and 2 in sequence: |
| 226 | + |
| 227 | +```swift |
| 228 | +private func applyCompaction(_ messages: [Message]) async -> [Message] { |
| 229 | + var compacted = messages |
| 230 | + contextCompactor.microCompact(messages: &compacted) |
| 231 | + |
| 232 | + if contextCompactor.estimateTokens(from: compacted) > contextCompactor.tokenThreshold { |
| 233 | + print("[auto_compact triggered]") |
| 234 | + return await contextCompactor.autoCompact( |
| 235 | + messages: compacted, using: apiClient, model: model, focus: nil |
| 236 | + ) |
| 237 | + } |
| 238 | + |
| 239 | + return compacted |
| 240 | +} |
| 241 | +``` |
| 242 | + |
| 243 | +Micro-compact runs first (every turn), then the threshold check determines whether auto-compact fires. The method takes messages by value and returns a new array — the same pure-value pattern we've used since extracting `agentLoop` for subagents. |
| 244 | + |
| 245 | +In the loop itself, `applyCompaction` runs before each API call, and manual compaction runs after tool results are appended: |
| 246 | + |
| 247 | +```swift |
| 248 | +while true { |
| 249 | + try Task.checkCancellation() |
| 250 | + |
| 251 | + iteration += 1 |
| 252 | + if iteration > config.maxIterations { |
| 253 | + return (lastAssistantText + "\n(\(config.label) reached iteration limit)", messages) |
| 254 | + } |
| 255 | + |
| 256 | + messages = await applyCompaction(messages) |
| 257 | + |
| 258 | + let request = APIRequest( |
| 259 | + model: model, maxTokens: Limits.defaultMaxTokens, |
| 260 | + system: systemPrompt, messages: messages, tools: config.tools |
| 261 | + ) |
| 262 | + |
| 263 | + let response = try await apiClient.createMessage(request: request) |
| 264 | + messages.append(Message(role: .assistant, content: response.content)) |
| 265 | + // ... print, check stop reason, process tools ... |
| 266 | + |
| 267 | + messages.append(Message(role: .user, content: toolResults)) |
| 268 | + |
| 269 | + if let compactFocus = toolProcessing.compactFocus { |
| 270 | + print("[manual compact]") |
| 271 | + messages = await contextCompactor.autoCompact( |
| 272 | + messages: messages, using: apiClient, model: model, focus: compactFocus |
| 273 | + ) |
| 274 | + } |
| 275 | +} |
| 276 | +``` |
| 277 | + |
| 278 | +The placement matters. Micro-compact and auto-compact run _before_ the API call, so the request always goes out with a trimmed context. Manual compact runs _after_ tool results are appended, so the summary includes the compact tool call itself — the model's explicit decision to compress is preserved in the transcript. |
| 279 | + |
| 280 | +The `compact` tool is excluded from `LoopConfig.subagent` alongside `agent` and `todo` — a subagent shouldn't be able to compress the parent's history. But micro-compact and auto-compact _do_ run in subagent loops, since subagents share the same `agentLoop` code path. A subagent making heavy `read_file` calls across its 30-iteration limit can benefit from the quiet cleanup. |
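The exclusion itself amounts to a filter over tool names. The snippet below is only an illustration: the tool list is made up for the example, and only the three excluded names (`agent`, `todo`, `compact`) come from this guide. How `LoopConfig.subagent` actually stores its tools is not shown here.

```swift
import Foundation

// Illustrative tool list; not the project's actual registry.
let parentTools = ["bash", "read_file", "write_file", "agent", "todo", "compact"]

// Tools that only the parent loop may call.
let parentOnly: Set<String> = ["agent", "todo", "compact"]

// Subagents receive every tool except the parent-only ones.
let subagentTools = parentTools.filter { !parentOnly.contains($0) }

print(subagentTools) // ["bash", "read_file", "write_file"]
```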
| 281 | + |
| 282 | +With that in place, we now have an agent that manages its own memory. Three layers of compression, one new type, and two injection points in the loop — before the API call and after tool processing. |
| 283 | + |
| 284 | +--- |
| 285 | + |
| 286 | +### Taking it for a spin |
| 287 | + |
| 288 | +Let's build and run: |
| 289 | + |
| 290 | +```bash |
| 291 | +swift build && swift run claude |
| 292 | +``` |
| 293 | + |
| 294 | +Try: `Read every Swift file in the Sources/ directory one by one.` Watch the terminal — after the first few files, earlier tool results in the context will start appearing as `"[Previous: used read_file]"` in subsequent API requests. That's micro-compact doing its work silently. |
| 295 | + |
| 296 | +For a more dramatic demonstration, keep reading files or ask the agent to explore a large codebase. When the estimated token count crosses 50,000, auto-compact triggers: the agent saves a full transcript to `.transcripts/`, asks the LLM for a summary, and continues with a fresh two-message context. Check the `.transcripts/` directory afterward — the full conversation history is preserved as JSONL. |
| 297 | + |
| 298 | +To see layer 3 in action, try: `Use the compact tool to compress this conversation, focusing on what files we've read.` The model calls `compact` with a focus parameter, the loop triggers summarization, and the conversation continues with a targeted summary. |
| 299 | + |
| 300 | +--- |
| 301 | + |
| 302 | +### What we've built and where it breaks |
| 303 | + |
| 304 | +We now have an agent that is no longer capped by a single context window. Micro-compact quietly trims old tool results every turn. Auto-compact summarizes the full conversation when the context gets large. The `compact` tool gives the model deliberate control. Transcripts on disk mean the raw history is never deleted, only moved out of active context. |
| 305 | + |
| 306 | +The limitation is that compression is lossy. When auto-compact fires, the model loses access to the exact content of files it read, the precise error messages it encountered, the specific commands it ran. The summary preserves the _gist_ — what was accomplished, the current state, key decisions — but not the details. For a long-running task with dozens of steps, the model might forget exactly which files it edited or which approach it tried and abandoned. The loop is still the invariant; tools are still the variable. But now one of those tools can reshape the loop's own working memory — the first time in our series that the agent isn't just acting on the world, but acting on itself. In the next guide, we'll address the lossy-compression problem directly: a file-based task system that gives the agent durable state that survives compaction. Thanks for reading! |