# Context compaction: infinite conversations

Our agent has come a long way. It runs commands, reads and writes files, tracks its own work, delegates to subagents, and loads skills on demand (seven tools, one loop). But every one of those capabilities adds to the same growing resource: the messages array. A single `read_file` on a 1,000-line source file costs roughly 4,000 tokens. Load a skill body, and that's another 2,000. After reading 30 files and running 20 bash commands across a long session, the context pushes past 100,000 tokens. At that point, the agent either hits the API's context window limit and errors out or, more subtly, the model's response quality degrades as the relevant information gets buried in a sea of stale tool results.

This is the threshold that separates a demo from a useful tool. Everything we've built so far assumes the context has room. Once it doesn't, the agent has a hard ceiling on how much work it can do in a single session. That's where context compaction comes in: a three-layer compression strategy that progressively shrinks the messages array by quietly trimming old results, automatically summarizing when a threshold is crossed, and letting the model request compression explicitly. With these three layers working together, the agent can run indefinitely.

In this guide, let's build `ContextCompactor`, the type that implements all three layers, and wire it into the agent loop. This is the beginning of Act III in our series: the agent now needs to manage its own memory.

_The complete source code for this stage is available at the [`06-context-compaction`](https://github.com/ivan-magda/swift-claude-code/tree/06-context-compaction/Sources) tag on GitHub. Code blocks below show key excerpts._

---

### Three layers, three strategies

The compression strategy works in layers, each more aggressive than the last. Layer 1, **micro-compact**, runs silently before every API call. It scans the messages array for old tool results (anything beyond the three most recent and longer than 100 characters) and replaces their content with a short placeholder like `"[Previous: used read_file]"`. The model still sees that a tool was called and what kind it was, but the actual output (the 500-line file, the verbose bash output) is gone. This is the quiet housekeeping layer: no API call required, no information loss that the model would typically need, and it runs every single turn.

Layer 2, **auto-compact**, triggers when the estimated token count crosses a threshold (50,000 by default). This is the dramatic one: the agent saves the entire conversation transcript to disk as a JSONL file, then asks the LLM itself to summarize the conversation. The summary replaces the entire messages array, so every prior turn collapses into two messages: a user message containing the compressed summary and an assistant acknowledgment. The conversation continues from there with a clean slate and a summary of what happened.

Layer 3, the **compact tool**, is the same summarization as layer 2, but triggered deliberately. The model calls `compact` when it decides compression would help, optionally specifying a `focus` parameter to guide what the summary should preserve. It's the difference between automatic garbage collection and an explicit `free()`: sometimes the model knows best when to compress.

---

### The ContextCompactor type

Let's start with the type that owns all three layers. `ContextCompactor` holds two configuration values (the path where transcripts are saved and the token threshold that triggers auto-compaction) and exposes methods for each layer:

```swift
// Sources/Core/ContextCompactor.swift
public struct ContextCompactor: Sendable {
    public static let keepRecent = 3
    public static let minContentLength = 100

    public let transcriptDirectory: String
    public let tokenThreshold: Int

    public init(
        transcriptDirectory: String,
        tokenThreshold: Int = Limits.defaultTokenThreshold
    ) {
        self.transcriptDirectory = transcriptDirectory
        self.tokenThreshold = tokenThreshold
    }
}
```

The `keepRecent` and `minContentLength` constants control micro-compact's behavior: keep the three most recent tool results untouched, and only replace results longer than 100 characters. Anything shorter isn't worth compacting.
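To make the selection rule concrete, here is a minimal sketch that applies the same two constants to plain strings instead of `ContentBlock` values. The `results` array and its contents are made up for illustration; only the "drop the last three, then check the length" logic mirrors the excerpt above.

```swift
// Simplified stand-in: each string is one tool result's content, oldest first.
let keepRecent = 3
let minContentLength = 100

let results = [
    String(repeating: "a", count: 500),  // old and long: would be compacted
    "ok",                                // old but short: left alone
    String(repeating: "b", count: 200),  // old and long: would be compacted
    String(repeating: "c", count: 300),  // one of the three most recent: kept
    String(repeating: "d", count: 400),  // one of the three most recent: kept
    String(repeating: "e", count: 150),  // one of the three most recent: kept
]

// Indices that fall outside the "recent" window and exceed the length floor.
let compactable = results.indices
    .dropLast(keepRecent)
    .filter { results[$0].count > minContentLength }

print(compactable) // [0, 2]
```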

---

### Micro-compact: the quiet layer

The `microCompact` method scans the messages array for every `.toolResult` content block, identifies which ones are old enough to compress, and replaces their content with a placeholder. One thing to keep in mind here is that `Message.content` is a `let` property, so we can't mutate a content block in place. Instead, we reconstruct entire `Message` values with new content arrays:

```swift
public func microCompact(messages: inout [Message]) {
    let toolResultLocations = findToolResultLocations(in: messages)
    guard toolResultLocations.count > Self.keepRecent else {
        return
    }

    let toolNameMap = buildToolNameMap(from: messages)
    // Everything except the last `keepRecent` results is a candidate.
    let oldResults = toolResultLocations.dropLast(Self.keepRecent)
    var modifiedContents: [Int: [ContentBlock]] = [:]

    for (msgIdx, contentIdx) in oldResults {
        guard
            case .toolResult(let toolUseId, let content, let isError) = messages[msgIdx].content[contentIdx],
            content.count > Self.minContentLength
        else {
            continue
        }

        let toolName = toolNameMap[toolUseId] ?? "unknown"
        let replacement = ContentBlock.toolResult(
            toolUseId: toolUseId,
            content: "[Previous: used \(toolName)]",
            isError: isError
        )

        // Copy the message's content once, then patch blocks in the copy.
        if modifiedContents[msgIdx] == nil {
            modifiedContents[msgIdx] = messages[msgIdx].content
        }
        modifiedContents[msgIdx]![contentIdx] = replacement
    }

    // `Message.content` is immutable, so rebuild each touched message.
    for (msgIdx, newContent) in modifiedContents {
        messages[msgIdx] = Message(role: messages[msgIdx].role, content: newContent)
    }
}
```

The method is intentionally synchronous: it's pure data transformation with no reason to await anything. Two private helpers do the scanning. `findToolResultLocations` collects every `toolResult` position in the array, and `buildToolNameMap` walks assistant messages to map each `toolUseId` back to its tool name, bridging a gap in the API's data model where `toolResult` blocks carry an ID but no name.
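The excerpt doesn't show the two helpers, so the following is only a guess at their shape. It is written against simplified stand-ins for `Role`, `ContentBlock`, and `Message` (the real types in the repository likely carry more cases and fields), and the tuple-based return type is an assumption inferred from how `microCompact` destructures the locations.

```swift
// Simplified, hypothetical stand-ins for the series' model types.
enum Role { case user, assistant }

enum ContentBlock {
    case text(String)
    case toolUse(id: String, name: String)
    case toolResult(toolUseId: String, content: String, isError: Bool)
}

struct Message {
    let role: Role
    let content: [ContentBlock]
}

// Collect (message index, content index) for every tool result, oldest first.
func findToolResultLocations(in messages: [Message]) -> [(Int, Int)] {
    var locations: [(Int, Int)] = []
    for (msgIdx, message) in messages.enumerated() {
        for (contentIdx, block) in message.content.enumerated() {
            if case .toolResult = block {
                locations.append((msgIdx, contentIdx))
            }
        }
    }
    return locations
}

// Map each tool-use ID to the tool name recorded in assistant messages.
func buildToolNameMap(from messages: [Message]) -> [String: String] {
    var map: [String: String] = [:]
    for message in messages where message.role == .assistant {
        for case .toolUse(let id, let name) in message.content {
            map[id] = name
        }
    }
    return map
}

let sample = [
    Message(role: .assistant, content: [.toolUse(id: "t1", name: "read_file")]),
    Message(role: .user, content: [.toolResult(toolUseId: "t1", content: "file body", isError: false)]),
]

print(findToolResultLocations(in: sample).map { "\($0.0):\($0.1)" }) // ["1:0"]
print(buildToolNameMap(from: sample)["t1"] ?? "unknown")             // read_file
```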

---

### Auto-compact: threshold-triggered summarization

Layer 2 needs to answer a question before it can act: how many tokens are we using? The agent needs that number before it sends the next request, so rather than relying on the API we estimate it locally:

```swift
public func estimateTokens(from messages: [Message]) -> Int {
    let data = (try? JSONEncoder().encode(messages)) ?? Data()
    return data.count / 4
}
```

The divide-by-four heuristic (about four bytes of encoded JSON per token) is rough, but it's close enough for a threshold check, and JSON encoding closely matches the actual API payload size, which is what we care about.
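As a rough worked example of the heuristic, the sketch below runs the same byte-count-divided-by-four calculation on a single message holding 4,000 characters of file content. `SimpleMessage` is a hypothetical stand-in for the series' `Message` type, so the exact JSON overhead will differ from the real payload.

```swift
import Foundation

// Hypothetical, flattened message type used only for this illustration.
struct SimpleMessage: Codable {
    let role: String
    let content: String
}

func estimateTokens(from messages: [SimpleMessage]) -> Int {
    // Encoded size in bytes, divided by four, as in the article's estimator.
    let data = (try? JSONEncoder().encode(messages)) ?? Data()
    return data.count / 4
}

let fileContents = String(repeating: "x", count: 4_000)
let estimate = estimateTokens(from: [SimpleMessage(role: "user", content: fileContents)])

// 4,000 bytes of content plus a few dozen bytes of JSON keys and punctuation,
// so the estimate lands slightly above 1,000 tokens.
print(estimate)
```

At this rate, the default 50,000-token threshold corresponds to roughly 200 KB of encoded conversation.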

When the estimate crosses the threshold, `autoCompact` takes over. It saves the full transcript to disk first, so nothing is truly lost, then asks the LLM to summarize:

```swift
public func autoCompact(
    messages: [Message],
    using apiClient: APIClientProtocol,
    model: String,
    focus: String?
) async -> [Message] {
    do {
        // Persist the full history before anything is discarded.
        let path = try saveTranscript(messages)

        let encoder = JSONEncoder()
        let data = (try? encoder.encode(messages)) ?? Data()

        // Cap the summarizer's input so the summary request itself fits.
        var transcript = String(data: data, encoding: .utf8) ?? "[]"
        if transcript.count > Self.maxSummaryInputLength {
            transcript = String(transcript.prefix(Self.maxSummaryInputLength)) + "\n[truncated]"
        }

        var prompt = ""
        if let focus, !focus.isEmpty {
            prompt += "Focus on: \(focus). "
        }
        prompt += """
            Summarize this conversation for continuity. Include: \
            1) What was accomplished, 2) Current state, 3) Key decisions made. \
            Be concise but preserve critical details.

            \(transcript)
            """

        let request = APIRequest(
            model: model,
            maxTokens: 2000,
            messages: [.user(prompt)]
        )
        let response = try await apiClient.createMessage(request: request)
        let summary = response.content.textContent

        // The whole history collapses into a summary plus an acknowledgment.
        return [
            .user("[Conversation compressed. Transcript: \(path)]\n\n\(summary)"),
            .assistant("Understood. I have the context from the summary. Continuing.")
        ]
    } catch {
        print("[warning] Auto-compact failed: \(error). Keeping original messages.")
        return messages
    }
}
```

The `do`/`catch` wrapping the entire method body is a deliberate safety net: compaction failure should never crash the agent loop. If the API call fails or the transcript can't be written, the method prints a warning and returns the original messages unchanged. The agent continues with a full context rather than no context.
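The fallback behavior can be isolated in a small synchronous sketch. Everything here is hypothetical: `compactOrKeep`, the `SummaryUnavailable` error, and the use of plain strings as messages are stand-ins, and the `summarize` closure plays the role of the API call.

```swift
// Hypothetical error standing in for an API or file-system failure.
struct SummaryUnavailable: Error {}

// Returns a two-message summary on success, or the untouched input on failure.
func compactOrKeep(
    _ messages: [String],
    summarize: ([String]) throws -> String
) -> [String] {
    do {
        let summary = try summarize(messages)
        return ["[Conversation compressed]\n\n\(summary)", "Understood. Continuing."]
    } catch {
        print("[warning] Compaction failed: \(error). Keeping original messages.")
        return messages
    }
}

let history = ["user: read main.swift", "assistant: calling read_file", "user: <file body>"]

let compacted = compactOrKeep(history) { _ in "Read main.swift; nothing edited yet." }
print(compacted.count) // 2

let untouched = compactOrKeep(history) { _ in throw SummaryUnavailable() }
print(untouched == history) // true
```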

The `saveTranscript` method writes each message as a single JSON line to a `.transcripts/` directory. One early version used a bare Unix timestamp for the filename, which created collisions when two compactions happened in the same second. The fix appends the first eight characters of a fresh UUID to the timestamp:

```swift
let timestamp = Int(Date().timeIntervalSince1970)
let unique = UUID().uuidString.prefix(8)
let path = "\(transcriptDirectory)/transcript_\(timestamp)_\(unique).jsonl"
```
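The rest of `saveTranscript` isn't shown in the excerpt. One way it could look, assuming a flattened `SimpleMessage` type in place of the real `Message` and a `to directory:` parameter in place of the stored `transcriptDirectory` property (both are hypothetical), is the following sketch:

```swift
import Foundation

// Hypothetical, flattened message type used only for this illustration.
struct SimpleMessage: Codable {
    let role: String
    let content: String
}

func saveTranscript(_ messages: [SimpleMessage], to directory: String) throws -> String {
    try FileManager.default.createDirectory(
        atPath: directory, withIntermediateDirectories: true
    )

    // JSONEncoder's default output has no line breaks, so one message
    // becomes exactly one line of the JSONL file.
    let encoder = JSONEncoder()
    let lines = try messages.map { message -> String in
        String(decoding: try encoder.encode(message), as: UTF8.self)
    }

    let timestamp = Int(Date().timeIntervalSince1970)
    let unique = UUID().uuidString.prefix(8)
    let path = "\(directory)/transcript_\(timestamp)_\(unique).jsonl"

    try lines.joined(separator: "\n").write(toFile: path, atomically: true, encoding: .utf8)
    return path
}

// Usage: write two messages into a temporary directory.
let directory = FileManager.default.temporaryDirectory
    .appendingPathComponent("transcripts-demo").path
let savedPath = try saveTranscript(
    [
        SimpleMessage(role: "user", content: "Read main.swift"),
        SimpleMessage(role: "assistant", content: "Done.\nAnything else?"),
    ],
    to: directory
)
print(savedPath)
```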

---

### The compact tool and two-phase dispatch

Layer 3 gives the model direct control over compression. The `compact` tool definition includes an optional `focus` parameter that lets the model specify what the summary should preserve:

```swift
ToolDefinition(
    name: "compact",
    description: "Compress conversation history to free context space. Use when working on long tasks.",
    inputSchema: .object([
        "type": "object",
        "properties": .object([
            "focus": .object([
                "type": "string",
                "description": "What to preserve in the summary (e.g., 'file paths edited', 'current task progress')"
            ])
        ]),
        "required": .array([])
    ])
)
```

The handler, though, is surprising, because it doesn't actually compact anything:

```swift
private func executeCompact(_ input: JSONValue) async -> Result<String, ToolError> {
    .success("Compressing...")
}
```

This is the two-phase dispatch pattern. The `compact` tool can't perform the actual compaction because tool handlers return `Result<String, ToolError>` and don't have access to the messages array. The real work needs to happen in the loop, where `messages` is a local `var`. So the handler returns a marker string, and `processToolUses` captures the focus parameter as a signal:

```swift
struct ToolProcessingResult {
    let results: [ContentBlock]
    let didUseTodo: Bool
    let compactFocus: String?
}
```

The `compactFocus` field is `nil` when compact wasn't called, and holds the focus value (or an empty string for no focus) when it was. This replaces the growing tuple that `processToolUses` previously returned. A named struct with a clear `nil`-vs-present semantic is easier to reason about than a third tuple element.

Inside `processToolUses`, the compact detection is a simple check alongside the existing `didUseTodo` tracking:

```swift
if name == "compact" {
    compactFocus = input["focus"]?.stringValue ?? ""
}
```
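The three states that `compactFocus` can be in are easy to mix up, so here is a small sketch that spells them out. `describeCompaction` is a hypothetical helper, not part of the series' source; it only mirrors the `nil` check in the loop and the `!focus.isEmpty` check inside `autoCompact`.

```swift
// Hypothetical helper that names the three states of `compactFocus`.
func describeCompaction(_ compactFocus: String?) -> String {
    // nil: the model did not call the compact tool this turn.
    guard let focus = compactFocus else {
        return "no compaction requested"
    }
    // Empty string: compact was called without a focus argument.
    return focus.isEmpty
        ? "compact with the default summary"
        : "compact, focusing on: \(focus)"
}

print(describeCompaction(nil))            // no compaction requested
print(describeCompaction(""))             // compact with the default summary
print(describeCompaction("files edited")) // compact, focusing on: files edited
```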

---

### Wiring into the agent loop

With all three layers built, let's connect them. The `applyCompaction` helper runs layers 1 and 2 in sequence:

```swift
private func applyCompaction(_ messages: [Message]) async -> [Message] {
    var compacted = messages
    contextCompactor.microCompact(messages: &compacted)

    if contextCompactor.estimateTokens(from: compacted) > contextCompactor.tokenThreshold {
        print("[auto_compact triggered]")
        return await contextCompactor.autoCompact(
            messages: compacted, using: apiClient, model: model, focus: nil
        )
    }

    return compacted
}
```

Micro-compact runs first (every turn), then the threshold check determines whether auto-compact fires. The method takes messages by value and returns a new array, the same pure-value pattern we've used since extracting `agentLoop` for subagents.

In the loop itself, `applyCompaction` runs before each API call, and manual compaction runs after tool results are appended:

```swift
while true {
    try Task.checkCancellation()

    iteration += 1
    if iteration > config.maxIterations {
        return (lastAssistantText + "\n(\(config.label) reached iteration limit)", messages)
    }

    messages = await applyCompaction(messages)

    let request = APIRequest(
        model: model, maxTokens: Limits.defaultMaxTokens,
        system: systemPrompt, messages: messages, tools: config.tools
    )

    let response = try await apiClient.createMessage(request: request)
    messages.append(Message(role: .assistant, content: response.content))
    // ... print, check stop reason, process tools ...

    messages.append(Message(role: .user, content: toolResults))

    if let compactFocus = toolProcessing.compactFocus {
        print("[manual compact]")
        messages = await contextCompactor.autoCompact(
            messages: messages, using: apiClient, model: model, focus: compactFocus
        )
    }
}
```

The placement matters. Micro-compact and auto-compact run _before_ the API call, so the request always goes out with a trimmed context. Manual compact runs _after_ tool results are appended, so the summary includes the compact tool call itself, and the model's explicit decision to compress is preserved in the transcript.

The `compact` tool is excluded from `LoopConfig.subagent` alongside `agent` and `todo`: a subagent shouldn't be able to compress the parent's history. But micro-compact and auto-compact _do_ run in subagent loops, since subagents share the same `agentLoop` code path. A subagent making heavy `read_file` calls across its 30-iteration limit can benefit from the quiet cleanup.

With that in place, we now have an agent that manages its own memory. Three layers of compression, one new type, and two injection points in the loop: before the API call and after tool processing.

---

### Taking it for a spin

Let's build and run:

```bash
swift build && swift run claude
```

Try: `Read every Swift file in the Sources/ directory one by one.` After the first few files, earlier tool results in the context are replaced with `"[Previous: used read_file]"` in subsequent API requests. That's micro-compact doing its work silently.

For a more dramatic demonstration, keep reading files or ask the agent to explore a large codebase. When the estimated token count crosses 50,000, auto-compact triggers and the loop prints `[auto_compact triggered]`: the agent saves a full transcript to `.transcripts/`, asks the LLM for a summary, and continues with a fresh two-message context. Check the `.transcripts/` directory afterward, where the full conversation history is preserved as JSONL.

To see layer 3 in action, try: `Use the compact tool to compress this conversation, focusing on what files we've read.` The model calls `compact` with a focus parameter, the loop prints `[manual compact]` and triggers summarization, and the conversation continues with a targeted summary.

---

### What we've built and where it breaks

We now have an agent that can work indefinitely. Micro-compact quietly trims old tool results every turn. Auto-compact summarizes the full conversation when the context gets large. The `compact` tool gives the model deliberate control. Transcripts on disk mean nothing is truly lost, just moved out of active context.

The limitation is that compression is lossy. When auto-compact fires, the model loses access to the exact content of files it read, the precise error messages it encountered, the specific commands it ran. The summary preserves the _gist_ (what was accomplished, the current state, key decisions) but not the details. For a long-running task with dozens of steps, the model might forget exactly which files it edited or which approach it tried and abandoned. The loop is still the invariant; tools are still the variable. But now one of those tools can reshape the loop's own working memory, the first time in our series that the agent isn't just acting on the world, but acting on itself. In the next guide, we'll address the lossy-compression problem directly: a file-based task system that gives the agent durable state that survives compaction. Thanks for reading!
