## The Problem: Context Window Limits
Every AI model has a maximum context window — a limit on how much text it can process at once. As your conversation grows with code snippets, file contents, and back-and-forth discussions, you may approach this limit. When this happens, you might experience:
- Slower responses as the model processes more tokens
- Higher API costs due to increased token usage
## The Solution: Auto-Compaction
Kilo Code uses a **Compaction** system to manage context automatically. When your conversation approaches the token limit, compaction kicks in and produces a structured summary that captures:
- The overall goal of the session
- Instructions given along the way
- Key discoveries made
- What has been accomplished so far
- Relevant files and directories
This summary replaces the earlier conversation history, freeing up context window space while maintaining continuity in your work.
## How Compaction Triggers
### Automatic Trigger
Kilo tracks the total token count for the session — input, output, and cached reads and writes — and compares it to the model's context window. Compaction runs when the total reaches the context window minus a reserved buffer of headroom kept free for the next turn.
How the buffer is chosen depends on what the model declares. When the model advertises a separate input limit, the buffer defaults to 20,000 tokens (or the model's maximum output size, whichever is smaller). When the model only declares a single context window, Kilo instead reserves the model's full output cap — up to 32,000 tokens.
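
The trigger rule above can be sketched as follows (a minimal illustration; the `ModelLimits` shape and function names are assumptions, not Kilo's actual internals):

```typescript
type ModelLimits = {
  contextWindow: number;   // total window the model declares
  maxOutputTokens: number; // model's output cap
  inputLimit?: number;     // separate input limit, if the model declares one
};

const OUTPUT_CAP_CEILING = 32_000; // default ceiling when only a single window is declared

function reservedBuffer(model: ModelLimits): number {
  if (model.inputLimit !== undefined) {
    // Separate input limit: default 20K, bounded by the model's output size.
    return Math.min(20_000, model.maxOutputTokens);
  }
  // Single context window: reserve the full output cap, up to 32K.
  return Math.min(model.maxOutputTokens, OUTPUT_CAP_CEILING);
}

function shouldCompact(totalTokens: number, model: ModelLimits): boolean {
  // Compact once the running total reaches the window minus the buffer.
  return totalTokens >= model.contextWindow - reservedBuffer(model);
}
```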
Custom models that do not declare a context window are not tracked, and auto-compaction does not run for them.
### Context Pruning
Between turns, Kilo also runs a lighter **prune** pass. It walks completed tool outputs outside a 40,000-token recency window and replaces them with `"[Old tool result content cleared]"`. Pruning runs incrementally so large tool outputs don't consume space forever, even before full compaction is needed.
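
The prune pass can be sketched like this (simplified; the `ToolResult` shape is an assumption, and real bookkeeping tracks token counts per message):

```typescript
const RECENCY_WINDOW = 40_000;
const CLEARED = "[Old tool result content cleared]";

type ToolResult = { content: string; tokens: number };

// Walk tool results from newest to oldest; results whose cumulative size
// falls outside the 40K recency window get their content cleared.
function pruneOldToolResults(results: ToolResult[]): ToolResult[] {
  let tokensSeen = 0;
  const newestFirst = [...results].reverse().map((r) => {
    tokensSeen += r.tokens;
    return tokensSeen > RECENCY_WINDOW ? { ...r, content: CLEARED } : r;
  });
  return newestFirst.reverse();
}
```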
### Manual Compaction
You can trigger compaction at any time:
- **Slash command**: type `/compact` in chat (also findable by typing `smol` or `condense`)
- **Task header button**: click the compact icon in the active task header
- **Settings**: toggle auto-compaction in **Settings → Context**
## Defaults
| Setting | Default | Description |
| --- | --- | --- |
| `compaction.auto` | `true` | Automatically compact when the usable window is reached |
| `compaction.prune` | `true` | Clear old tool outputs beyond the 40K recency window |
| `compaction.reserved` | `min(20,000, model_max_output_tokens)` | Token headroom kept free for the next turn — also defines the compaction trigger point |
## Configuration
Compaction is configured in your `kilo.jsonc` file:
```jsonc
{
  "compaction": {
    "auto": true, // Enable or disable automatic compaction
    "prune": true // Enable pruning of old tool outputs beyond the recency window
  }
}
```

| Setting | Type | Default | Description |
| --- | --- | --- | --- |
| `compaction.auto` | boolean | `true` | Enable or disable automatic compaction when the usable window is reached |
| `compaction.prune` | boolean | `true` | Enable pruning of old tool outputs outside the 40K token recency window |
| `compaction.reserved` | number | `min(20000, model_max_output)` | Token headroom reserved for the next turn. Applies only to models that advertise a separate input limit; models with a single context window use their full output cap as the reserve instead. |
### Use a different model for compaction
Summarization can use a cheaper or larger-context model than your main agent. Configure a dedicated compaction agent:

The environment variable `KILO_EXPERIMENTAL_OUTPUT_TOKEN_MAX` overrides the 32,000 default output-token ceiling.
{% /tab %}
{% tab label="CLI" %}
## The Solution: Auto-Compaction
Kilo CLI uses a **Compaction** system to manage context automatically. When your conversation approaches the token limit, compaction kicks in and produces a structured summary that captures:
- The overall goal of the session
- Instructions given along the way
- Key discoveries made
- What has been accomplished so far
- Relevant files and directories
This summary replaces the earlier conversation history, freeing up context window space while maintaining continuity in your work.
## How Compaction Triggers
### Automatic Trigger
Kilo tracks the total token count for the session — input, output, and cached reads and writes — and compares it to the model's context window. Compaction runs when the total reaches the context window minus a reserved buffer of headroom kept free for the next turn.
How the buffer is chosen depends on what the model declares. When the model advertises a separate input limit, the buffer defaults to 20,000 tokens (or the model's maximum output size, whichever is smaller). When the model only declares a single context window, Kilo instead reserves the model's full output cap — up to 32,000 tokens.
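
The trigger rule above can be sketched as follows (a minimal illustration; the `ModelLimits` shape and function names are assumptions, not Kilo's actual internals):

```typescript
type ModelLimits = {
  contextWindow: number;   // total window the model declares
  maxOutputTokens: number; // model's output cap
  inputLimit?: number;     // separate input limit, if the model declares one
};

const OUTPUT_CAP_CEILING = 32_000; // default ceiling when only a single window is declared

function reservedBuffer(model: ModelLimits): number {
  if (model.inputLimit !== undefined) {
    // Separate input limit: default 20K, bounded by the model's output size.
    return Math.min(20_000, model.maxOutputTokens);
  }
  // Single context window: reserve the full output cap, up to 32K.
  return Math.min(model.maxOutputTokens, OUTPUT_CAP_CEILING);
}

function shouldCompact(totalTokens: number, model: ModelLimits): boolean {
  // Compact once the running total reaches the window minus the buffer.
  return totalTokens >= model.contextWindow - reservedBuffer(model);
}
```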
[Custom models](/docs/code-with-ai/agents/custom-models) that do not declare a context window are not tracked, and auto-compaction does not run for them.
### Context Pruning
Between turns, Kilo also runs a lighter **prune** pass. It walks completed tool outputs outside a 40,000-token recency window and replaces them with `"[Old tool result content cleared]"`. Pruning runs incrementally so large tool outputs don't consume space forever, even before full compaction is needed.
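
The prune pass can be sketched like this (simplified; the `ToolResult` shape is an assumption, and real bookkeeping tracks token counts per message):

```typescript
const RECENCY_WINDOW = 40_000;
const CLEARED = "[Old tool result content cleared]";

type ToolResult = { content: string; tokens: number };

// Walk tool results from newest to oldest; results whose cumulative size
// falls outside the 40K recency window get their content cleared.
function pruneOldToolResults(results: ToolResult[]): ToolResult[] {
  let tokensSeen = 0;
  const newestFirst = [...results].reverse().map((r) => {
    tokensSeen += r.tokens;
    return tokensSeen > RECENCY_WINDOW ? { ...r, content: CLEARED } : r;
  });
  return newestFirst.reverse();
}
```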
### Manual Compaction
You can trigger compaction at any time:
- **Slash command**: type `/compact` in the TUI (alias: `/summarize`)
- **Keybinding**: press `<leader>c` in the TUI
## Defaults
| Setting | Default | Description |
| --- | --- | --- |
| `compaction.auto` | `true` | Automatically compact when the usable window is reached |
| `compaction.prune` | `true` | Clear old tool outputs beyond the 40K recency window |
| `compaction.reserved` | `min(20,000, model_max_output_tokens)` | Token headroom kept free for the next turn — also defines the compaction trigger point |
## Configuration
Compaction is configured in your `kilo.jsonc` file:
```jsonc
{
  "compaction": {
    "auto": true, // Enable or disable automatic compaction
    "prune": true // Enable pruning of old tool outputs beyond the recency window
  }
}
```

| Setting | Type | Default | Description |
| --- | --- | --- | --- |
| `compaction.auto` | boolean | `true` | Enable or disable automatic compaction when the usable window is reached |
| `compaction.prune` | boolean | `true` | Enable pruning of old tool outputs outside the 40K token recency window |
| `compaction.reserved` | number | `min(20000, model_max_output)` | Token headroom reserved for the next turn. Applies only to models that advertise a separate input limit; models with a single context window use their full output cap as the reserve instead. |
### Use a different model for compaction
Summarization can use a cheaper or larger-context model than your main agent. Configure a dedicated compaction agent:

The environment variable `KILO_EXPERIMENTAL_OUTPUT_TOKEN_MAX` overrides the 32,000 default output-token ceiling.
{% /tab %}
{% tab label="VSCode (Legacy)" %}
## Best Practices
### When to Compact
- **Long sessions**: If you've been working for an extended period on a complex task
- **Before major transitions**: When switching to a different aspect of your project
- **When approaching limits**: Run `/compact` manually before hitting the automatic trigger if you want control over _when_ the summary is produced
### Tuning `compaction.reserved`
On models that advertise a separate input limit, the `reserved` value is a trade-off:
- **Lower value** (e.g. `10000`) → compaction triggers later, you get more turns out of the raw window, but you risk a mid-turn context overflow if a single response is larger than the buffer.
- **Higher value** (e.g. `40000`) → compaction triggers earlier, fewer overflow errors, but shorter effective conversations between summaries.
The default of `~20K` is tuned to leave room for a full-size assistant response plus tool output. The setting has no effect on models with a single context window, which always reserve their full output cap instead.
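
As a worked example of the trade-off, assuming a hypothetical model with a 200,000-token context window and a separate input limit:

```typescript
// How `reserved` moves the automatic trigger point (trigger = window - reserved).
const contextWindow = 200_000;
const triggerPoint = (reserved: number) => contextWindow - reserved;

console.log(triggerPoint(10_000)); // 190000: compaction triggers later
console.log(triggerPoint(20_000)); // 180000: the default
console.log(triggerPoint(40_000)); // 160000: compaction triggers earlier
```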
### Maintaining Context Quality
- **Be specific in your initial task**: A clear task description helps create better summaries
- **Use AGENTS.md**: Combine with [AGENTS.md](/docs/customize/agents-md) for persistent project context that doesn't need to be compacted
- **Review the summary**: After compaction, the summary is visible in your chat history