Commit a2ddeba

Merge pull request #9177 from Kilo-Org/docs/context-condensing-accurate-defaults
docs: document actual compaction defaults and triggers
2 parents b24baf9 + 70b38bc commit a2ddeba

1 file changed

Lines changed: 119 additions & 63 deletions

File tree

packages/kilo-docs/pages/customize/context/context-condensing.md

@@ -11,7 +11,7 @@ When working on complex tasks, conversations with Kilo Code can grow long and co

 ## The Problem: Context Window Limits

-Every AI model has a maximum context window - a limit on how much text it can process at once. As your conversation grows with code snippets, file contents, and back-and-forth discussions, you may approach this limit. When this happens, you might experience:
+Every AI model has a maximum context window — a limit on how much text it can process at once. As your conversation grows with code snippets, file contents, and back-and-forth discussions, you may approach this limit. When this happens, you might experience:

 - Slower responses as the model processes more tokens
 - Higher API costs due to increased token usage
@@ -22,130 +22,177 @@ Every AI model has a maximum context window - a limit on how much text it can pr

 ## The Solution: Auto-Compaction

-The new platform uses a **Compaction** system to manage context automatically. When your conversation approaches the token limit, compaction kicks in and produces a structured summary that captures:
+Kilo Code uses a **Compaction** system to manage context automatically. When your conversation approaches the token limit, compaction kicks in and produces a structured summary that captures:

 - The overall goal of the session
-- Key discoveries made along the way
+- Instructions given along the way
+- Key discoveries made
 - What has been accomplished so far
-- Files that were modified
+- Relevant files and directories

 This summary replaces the earlier conversation history, freeing up context window space while maintaining continuity in your work.

-## How Compaction Works
+## How Compaction Triggers

-### Automatic Compaction
+### Automatic trigger

-Compaction triggers automatically when the conversation reaches the `usableWindow` token threshold. The full conversation history is sent to a dedicated **compaction agent**, which produces a structured summary. This happens in the background without interrupting your workflow.
+Kilo tracks the total token count for the session — input, output, and cached reads and writes — and compares it to the model's context window. Compaction runs when the total fills the window minus a reserved buffer of headroom kept free for the next turn.

-### Context Pruning
+How the buffer is chosen depends on what the model declares. When the model advertises a separate input limit, the buffer defaults to 20,000 tokens (or the model's maximum output size, whichever is smaller). When the model only declares a single context window, Kilo instead reserves the model's full output cap — up to 32,000 tokens.

-In addition to compaction, the system can **prune** old tool outputs to reclaim context space incrementally. Tool results older than a 40,000-token recency window are replaced with `"[Old tool result content cleared]"`. This is a lighter-weight mechanism that runs alongside full compaction.
+Custom models that do not declare a context window are not tracked, and auto-compaction does not run for them.
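The trigger rule and buffer selection described in the added lines above can be sketched as follows. This is an illustrative simplification, not Kilo's actual internals — the function names and parameters are hypothetical:

```python
def reserved_buffer(context_window, max_output, has_input_limit,
                    default_reserve=20_000, output_ceiling=32_000):
    """Headroom kept free for the next turn (illustrative sketch)."""
    if has_input_limit:
        # Model advertises a separate input limit: reserve the smaller
        # of the 20K default and the model's maximum output size.
        return min(default_reserve, max_output)
    # Single context window: reserve the full output cap, up to 32K.
    return min(max_output, output_ceiling)

def should_compact(total_tokens, context_window, reserve):
    # Compaction runs when the running total (input + output + cached
    # reads/writes) fills the window minus the reserved buffer.
    return total_tokens >= context_window - reserve

# Example: 200K window, 8K max output, separate input limit declared.
reserve = reserved_buffer(200_000, 8_000, has_input_limit=True)  # 8,000
print(should_compact(150_000, 200_000, reserve))                 # False
print(should_compact(193_000, 200_000, reserve))                 # True
```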

-### Manual Compaction
+### Context Pruning

-You can also trigger compaction manually:
+Between turns, Kilo also runs a lighter **prune** pass. It walks completed tool outputs outside a 40,000-token recency window and replaces them with `"[Old tool result content cleared]"`. Pruning runs incrementally so large tool outputs don't consume space forever, even before full compaction is needed.
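The prune pass in the added paragraph can be sketched roughly like this — a simplification under assumed data shapes, not the real traversal or token accounting:

```python
RECENCY_WINDOW = 40_000          # tokens of recent tool output kept intact
CLEARED = "[Old tool result content cleared]"

def prune(tool_results):
    """tool_results: list of (token_count, content), newest last.
    Keep a 40K-token recency window walking backwards; clear older outputs."""
    kept = 0
    pruned = []
    for tokens, content in reversed(tool_results):
        if kept + tokens <= RECENCY_WINDOW:
            kept += tokens
            pruned.append((tokens, content))
        else:
            pruned.append((0, CLEARED))  # reclaim the space incrementally
    pruned.reverse()
    return pruned

results = [(30_000, "old grep output"), (25_000, "recent file read")]
print(prune(results)[0][1])  # [Old tool result content cleared]
```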

-- **CLI TUI**: Press `<leader>c` to compact the current session
-- **Extension Webview**: Send a `CompactRequest` message to trigger compaction
+### Manual Compaction

-{% callout type="info" %}
-There is no `/condense` chat command on the new platform. Use the keybinding or message-based invocation instead.
-{% /callout %}
+You can trigger compaction at any time:

-### The Compaction Process
+- **Slash command**: type `/compact` in chat (also findable by typing `smol` or `condense`)
+- **Task header button**: click the compact icon in the active task header
+- **Settings**: toggle auto-compaction in **Settings → Context**

-When compaction is triggered:
+## Defaults

-1. **Threshold Check**: The system detects that context usage has reached the `usableWindow` limit
-2. **Agent Summarization**: The full conversation history is sent to a dedicated compaction agent
-3. **Structured Summary**: The agent produces a summary covering the goal, discoveries, accomplishments, and modified files
-4. **Replacement**: The detailed history is replaced with the compacted summary
-5. **Continuation**: You continue working with the freed-up context space
+| Setting               | Default                                | Effect                                                                                 |
+| --------------------- | -------------------------------------- | -------------------------------------------------------------------------------------- |
+| `compaction.auto`     | `true`                                 | Automatically compact when the usable window is reached                                 |
+| `compaction.prune`    | `true`                                 | Clear old tool outputs beyond the 40K recency window                                    |
+| `compaction.reserved` | `min(20,000, model_max_output_tokens)` | Token headroom kept free for the next turn — also defines the compaction trigger point  |

-## Configuration Options
+## Configuration

 Compaction is configured in your `kilo.jsonc` file:

 ```jsonc
 {
   "compaction": {
     "auto": true, // Enable or disable automatic compaction
-    "reserved": 4096, // Number of tokens to reserve (keep free) after compaction
     "prune": true, // Enable pruning of old tool outputs beyond the recency window
+    "reserved": 20000, // Token buffer kept free; smaller = later trigger, larger = earlier trigger
+  },
+}
+```
+
+| Option                | Type    | Default                        | Description |
+| --------------------- | ------- | ------------------------------ | ----------- |
+| `compaction.auto`     | boolean | `true`                         | Enable or disable automatic compaction when the usable window is reached |
+| `compaction.prune`    | boolean | `true`                         | Enable pruning of old tool outputs outside the 40K token recency window |
+| `compaction.reserved` | number  | `min(20000, model_max_output)` | Token headroom reserved for the next turn. Applies only to models that advertise a separate input limit; models with a single context window use their full output cap as the reserve instead. |
+
+### Use a different model for compaction
+
+Summarization can use a cheaper or larger-context model than your main agent. Configure a dedicated compaction agent:
+
+```jsonc
+{
+  "agent": {
+    "compaction": {
+      "model": "anthropic/claude-haiku-4-5",
+    },
   },
 }
 ```

-| Option                | Type    | Description                                                              |
-| --------------------- | ------- | ------------------------------------------------------------------------ |
-| `compaction.auto`     | boolean | Enable or disable automatic compaction when the context threshold is hit |
-| `compaction.reserved` | number  | Number of tokens to reserve after compaction                             |
-| `compaction.prune`    | boolean | Enable pruning of old tool outputs outside the 40K token recency window  |
+If no compaction agent is set, the current session's model is used.
+
+### Environment overrides
+
+| Variable                             | Effect                                            |
+| ------------------------------------ | ------------------------------------------------- |
+| `KILO_DISABLE_AUTOCOMPACT=1`         | Forces `compaction.auto = false`                  |
+| `KILO_DISABLE_PRUNE=1`               | Forces `compaction.prune = false`                 |
+| `KILO_EXPERIMENTAL_OUTPUT_TOKEN_MAX` | Overrides the 32,000 default output-token ceiling |
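The way these variables sit on top of the `kilo.jsonc` values can be sketched like this — an assumed resolution order, not Kilo's actual code:

```python
import os

def effective_settings(config):
    """Apply the documented environment overrides on top of kilo.jsonc
    values. Illustrative sketch; the real resolution lives in Kilo."""
    settings = dict(config)
    if os.environ.get("KILO_DISABLE_AUTOCOMPACT") == "1":
        settings["auto"] = False   # forces compaction.auto = false
    if os.environ.get("KILO_DISABLE_PRUNE") == "1":
        settings["prune"] = False  # forces compaction.prune = false
    ceiling = os.environ.get("KILO_EXPERIMENTAL_OUTPUT_TOKEN_MAX")
    settings["output_token_max"] = int(ceiling) if ceiling else 32_000
    return settings

os.environ["KILO_DISABLE_AUTOCOMPACT"] = "1"
print(effective_settings({"auto": True, "prune": True}))
```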

 {% /tab %}
 {% tab label="CLI" %}

 ## The Solution: Auto-Compaction

-The new platform uses a **Compaction** system to manage context automatically. When your conversation approaches the token limit, compaction kicks in and produces a structured summary that captures:
+Kilo CLI uses a **Compaction** system to manage context automatically. When your conversation approaches the token limit, compaction kicks in and produces a structured summary that captures:

 - The overall goal of the session
-- Key discoveries made along the way
+- Instructions given along the way
+- Key discoveries made
 - What has been accomplished so far
-- Files that were modified
+- Relevant files and directories

 This summary replaces the earlier conversation history, freeing up context window space while maintaining continuity in your work.

-## How Compaction Works
+## How Compaction Triggers

-### Automatic Compaction
+### Automatic trigger

-Compaction triggers automatically when the conversation reaches the `usableWindow` token threshold. The full conversation history is sent to a dedicated **compaction agent**, which produces a structured summary. This happens in the background without interrupting your workflow.
+Kilo tracks the total token count for the session — input, output, and cached reads and writes — and compares it to the model's context window. Compaction runs when the total fills the window minus a reserved buffer of headroom kept free for the next turn.

-### Context Pruning
+How the buffer is chosen depends on what the model declares. When the model advertises a separate input limit, the buffer defaults to 20,000 tokens (or the model's maximum output size, whichever is smaller). When the model only declares a single context window, Kilo instead reserves the model's full output cap — up to 32,000 tokens.

-In addition to compaction, the system can **prune** old tool outputs to reclaim context space incrementally. Tool results older than a 40,000-token recency window are replaced with `"[Old tool result content cleared]"`. This is a lighter-weight mechanism that runs alongside full compaction.
+[Custom models](/docs/code-with-ai/agents/custom-models) that do not declare a context window are not tracked, and auto-compaction does not run for them.

-### Manual Compaction
+### Context Pruning

-You can also trigger compaction manually:
+Between turns, Kilo also runs a lighter **prune** pass. It walks completed tool outputs outside a 40,000-token recency window and replaces them with `"[Old tool result content cleared]"`. Pruning runs incrementally so large tool outputs don't consume space forever, even before full compaction is needed.

-- **CLI TUI**: Press `<leader>c` to compact the current session
-- **Extension Webview**: Send a `CompactRequest` message to trigger compaction
+### Manual Compaction

-{% callout type="info" %}
-There is no `/condense` chat command on the new platform. Use the keybinding or message-based invocation instead.
-{% /callout %}
+You can trigger compaction at any time:

-### The Compaction Process
+- **Slash command**: type `/compact` in the TUI (alias: `/summarize`)
+- **Keybinding**: press `<leader>c` in the TUI

-When compaction is triggered:
+## Defaults

-1. **Threshold Check**: The system detects that context usage has reached the `usableWindow` limit
-2. **Agent Summarization**: The full conversation history is sent to a dedicated compaction agent
-3. **Structured Summary**: The agent produces a summary covering the goal, discoveries, accomplishments, and modified files
-4. **Replacement**: The detailed history is replaced with the compacted summary
-5. **Continuation**: You continue working with the freed-up context space
+| Setting               | Default                                | Effect                                                                                 |
+| --------------------- | -------------------------------------- | -------------------------------------------------------------------------------------- |
+| `compaction.auto`     | `true`                                 | Automatically compact when the usable window is reached                                 |
+| `compaction.prune`    | `true`                                 | Clear old tool outputs beyond the 40K recency window                                    |
+| `compaction.reserved` | `min(20,000, model_max_output_tokens)` | Token headroom kept free for the next turn — also defines the compaction trigger point  |

-## Configuration Options
+## Configuration

 Compaction is configured in your `kilo.jsonc` file:

 ```jsonc
 {
   "compaction": {
     "auto": true, // Enable or disable automatic compaction
-    "reserved": 4096, // Number of tokens to reserve (keep free) after compaction
     "prune": true, // Enable pruning of old tool outputs beyond the recency window
+    "reserved": 20000, // Token buffer kept free; smaller = later trigger, larger = earlier trigger
+  },
+}
+```
+
+| Option                | Type    | Default                        | Description |
+| --------------------- | ------- | ------------------------------ | ----------- |
+| `compaction.auto`     | boolean | `true`                         | Enable or disable automatic compaction when the usable window is reached |
+| `compaction.prune`    | boolean | `true`                         | Enable pruning of old tool outputs outside the 40K token recency window |
+| `compaction.reserved` | number  | `min(20000, model_max_output)` | Token headroom reserved for the next turn. Applies only to models that advertise a separate input limit; models with a single context window use their full output cap as the reserve instead. |
+
+### Use a different model for compaction
+
+Summarization can use a cheaper or larger-context model than your main agent. Configure a dedicated compaction agent:
+
+```jsonc
+{
+  "agent": {
+    "compaction": {
+      "model": "anthropic/claude-haiku-4-5",
+    },
   },
 }
 ```

-| Option                | Type    | Description                                                              |
-| --------------------- | ------- | ------------------------------------------------------------------------ |
-| `compaction.auto`     | boolean | Enable or disable automatic compaction when the context threshold is hit |
-| `compaction.reserved` | number  | Number of tokens to reserve after compaction                             |
-| `compaction.prune`    | boolean | Enable pruning of old tool outputs outside the 40K token recency window  |
+If no compaction agent is set, the current session's model is used.
+
+### Environment overrides
+
+| Variable                             | Effect                                            |
+| ------------------------------------ | ------------------------------------------------- |
+| `KILO_DISABLE_AUTOCOMPACT=1`         | Forces `compaction.auto = false`                  |
+| `KILO_DISABLE_PRUNE=1`               | Forces `compaction.prune = false`                 |
+| `KILO_EXPERIMENTAL_OUTPUT_TOKEN_MAX` | Overrides the 32,000 default output-token ceiling |

 {% /tab %}
 {% tab label="VSCode (Legacy)" %}
@@ -219,17 +266,26 @@ If the condensed summary doesn't capture important details:

 ## Best Practices

-### When to Condense
+### When to Compact

 - **Long sessions**: If you've been working for an extended period on a complex task
 - **Before major transitions**: When switching to a different aspect of your project
-- **When prompted**: When Kilo Code suggests condensing or compaction due to context limits
+- **When approaching limits**: Run `/compact` manually before hitting the automatic trigger if you want control over _when_ the summary is produced
+
+### Tuning `compaction.reserved`
+
+On models that advertise a separate input limit, the `reserved` value is a trade-off:
+
+- **Lower value** (e.g. `10000`) → compaction triggers later, you get more turns out of the raw window, but you risk a mid-turn context overflow if a single response is larger than the buffer.
+- **Higher value** (e.g. `40000`) → compaction triggers earlier, fewer overflow errors, but shorter effective conversations between summaries.
+
+The default of `~20K` is tuned to leave room for a full-size assistant response plus tool output. The setting has no effect on models with a single context window, which always reserve their full output cap instead.
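The trade-off in the added lines is easy to see with concrete numbers — assuming a 200,000-token context window purely for illustration:

```python
def trigger_point(window, reserved):
    # Compaction fires when the running total reaches window - reserved.
    return window - reserved

window = 200_000  # illustrative context window
for reserved in (10_000, 20_000, 40_000):
    print(f"reserved={reserved:,} -> triggers at "
          f"{trigger_point(window, reserved):,} tokens")
```

A lower reserve moves the trigger closer to the full window; a higher reserve fires sooner but leaves more room for the next response.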

 ### Maintaining Context Quality

 - **Be specific in your initial task**: A clear task description helps create better summaries
-- **Use AGENTS.md**: Combine with [AGENTS.md](/docs/customize/agents-md) for persistent project context that doesn't need to be condensed
-- **Review the summary**: After condensing or compaction, the summary is visible in your chat history
+- **Use AGENTS.md**: Combine with [AGENTS.md](/docs/customize/agents-md) for persistent project context that doesn't need to be compacted
+- **Review the summary**: After compaction, the summary is visible in your chat history

 ## Related Features