
Commit c60b2a0

anandgupta42 and claude committed
feat: context management improvements — overflow recovery, observation masking, token estimation
Fix critical bugs in context overflow detection and add production hardening for long-running agent sessions:

- Fix `NamedError.isInstance(null)` crash that would kill the agent if a provider returned a null error object
- Fix `isOverflow()` headroom gap when `limit.input` is set — compaction now correctly reserves space for output tokens on models with separate input/output limits (fixes upstream bugs #10634, #8089, #11086)
- Add compaction loop protection (max 3 attempts) to prevent infinite compact→overflow→compact cycles
- Replace the generic "[Old tool result content cleared]" with observation masks that preserve tool name, args, output size, and a first-line fingerprint for better model continuity after pruning
- Content-aware token estimation (code: 3.0, JSON: 3.2, SQL: 3.5, text: 4.0 chars/token) replacing the flat chars/4 heuristic
- Add Azure OpenAI overflow detection patterns
- DE-aware compaction template preserving warehouse, schema, dbt, lineage, and FinOps context during summarization
- Guard against empty observation masks sending an empty tool_result
- Add a context management documentation page

153 tests across 4 test files, 1332 total suite passing.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent c05720a commit c60b2a0

16 files changed

Lines changed: 1491 additions & 15 deletions


docs/docs/configure/config.md

Lines changed: 2 additions & 2 deletions
```diff
@@ -57,7 +57,7 @@ Configuration is loaded from multiple sources, with later sources overriding ear
 | `skills` | `object` | Skill paths and URLs |
 | `plugin` | `string[]` | Plugin specifiers |
 | `instructions` | `string[]` | Glob patterns for instruction files |
-| `compaction` | `object` | Context compaction settings |
+| `compaction` | `object` | Context compaction settings (see [Context Management](context-management.md)) |
 | `experimental` | `object` | Experimental feature flags |

 ## Value Substitution
@@ -132,4 +132,4 @@ Control how context is managed when conversations grow long:
 | `reserved` || Token buffer to reserve |

 !!! info
-    Compaction automatically summarizes older messages to free up context window space, allowing longer conversations without losing important context.
+    Compaction automatically summarizes older messages to free up context window space, allowing longer conversations without losing important context. See [Context Management](context-management.md) for full details.
```
Lines changed: 140 additions & 0 deletions
# Context Management

altimate-code automatically manages conversation context so you can work through long sessions without hitting model limits. When a conversation grows large, the CLI summarizes older messages, prunes stale tool outputs, and recovers from provider overflow errors — all without losing the important details of your work.

## How It Works

Every LLM has a finite context window. As you work, each message, tool call, and tool result adds tokens to the conversation. When the conversation approaches the model's limit, altimate-code takes action:

1. **Prune** — Old tool outputs (file reads, command results, query results) are replaced with compact summaries
2. **Compact** — The entire conversation history is summarized into a continuation prompt
3. **Continue** — The agent picks up where it left off using the summary

This happens automatically by default. You do not need to manually manage context.

## Auto-Compaction

When enabled (the default), altimate-code monitors token usage after each model response. If the conversation is approaching the context limit, it triggers compaction automatically.

During compaction:

- A dedicated compaction agent summarizes the full conversation
- The summary captures goals, progress, discoveries, relevant files, and next steps
- The original messages are retained in session history but the model continues from the summary
- After compaction, the agent automatically continues working if there are clear next steps

You will see a compaction indicator in the TUI when this happens. The conversation continues seamlessly.

!!! tip
    If you notice compaction happening frequently, consider using a model with a larger context window or breaking your task into smaller sessions.

## Observation Masking (Pruning)

Before compaction, altimate-code prunes old tool outputs to reclaim context space. This is called "observation masking."

When a tool output is pruned, it is replaced with a brief fingerprint:

```
[Tool output cleared — read_file(file: src/main.ts) returned 42 lines, 1.2 KB — "import { App } from './app'"]
```

This tells the model what tool was called, what arguments were used, how much output it produced, and the first line of the result — enough to maintain continuity without consuming tokens.

**Pruning rules:**

- Only tool outputs older than the most recent 2 turns are eligible
- The most recent ~40,000 tokens of tool outputs are always preserved
- Pruning only fires when at least 20,000 tokens can be reclaimed
- `skill` tool outputs are never pruned (they contain critical session context)
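The rules above can be sketched as a small eligibility check. This is an illustrative model, not the shipped implementation: the `selectPrunable` name, the `ToolOutput` shape, and the newest-first walk are assumptions; only the thresholds (2 turns, ~40,000 preserved, 20,000 minimum) come from the rules listed.

```typescript
// Sketch of the pruning gate described above. Names are illustrative,
// not the actual altimate-code API.
interface ToolOutput {
  turn: number // conversation turn the output belongs to
  tokens: number // estimated token count of the output
  tool: string // tool name, e.g. "read_file" or "skill"
}

const RECENT_TURNS = 2 // the newest 2 turns are never pruned
const PRESERVE_TOKENS = 40_000 // always keep the most recent ~40k tokens
const PRUNE_MINIMUM = 20_000 // only prune if at least this much is reclaimable

function selectPrunable(outputs: ToolOutput[], currentTurn: number): ToolOutput[] {
  // Walk newest-to-oldest, accumulating the tokens we have kept so far.
  const sorted = [...outputs].sort((a, b) => b.turn - a.turn)
  let preserved = 0
  const candidates: ToolOutput[] = []
  for (const out of sorted) {
    const tooRecent = out.turn > currentTurn - RECENT_TURNS
    preserved += out.tokens
    if (tooRecent || preserved <= PRESERVE_TOKENS) continue
    if (out.tool === "skill") continue // skill outputs carry critical session context
    candidates.push(out)
  }
  // Only fire if the reclaimable total clears the minimum threshold.
  const reclaimable = candidates.reduce((sum, o) => sum + o.tokens, 0)
  return reclaimable >= PRUNE_MINIMUM ? candidates : []
}
```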
## Data Engineering Context

Compaction is aware of data engineering workflows. When summarizing a conversation, the compaction prompt preserves:

- **Warehouse connections** — which databases or warehouses are connected
- **Schema context** — discovered tables, columns, and relationships
- **dbt project state** — models, sources, tests, and project structure
- **Lineage findings** — upstream and downstream dependencies
- **Query patterns** — SQL dialects, anti-patterns, and optimization opportunities
- **FinOps context** — cost findings and warehouse sizing recommendations

This means you can run a long data exploration session and compaction will not lose track of what schemas you discovered, what dbt models you were working with, or what cost optimizations you identified.

## Provider Overflow Detection

If compaction does not trigger in time and the model returns a context overflow error, altimate-code detects it and automatically compacts the conversation.

Overflow detection works with all major providers:

| Provider | Detection |
|----------|-----------|
| Anthropic | "prompt is too long" |
| OpenAI | "exceeds the context window" |
| AWS Bedrock | "input is too long for requested model" |
| Google Gemini | "input token count exceeds the maximum" |
| Azure OpenAI | "the request was too long" |
| Groq | "reduce the length of the messages" |
| OpenRouter / DeepSeek | "maximum context length is N tokens" |
| xAI (Grok) | "maximum prompt length is N" |
| GitHub Copilot | "exceeds the limit of N" |
| Ollama / llama.cpp / LM Studio | Various local server messages |

When an overflow is detected, the CLI automatically compacts and retries. No action is needed on your part.

!!! note
    Some providers (such as z.ai) may accept oversized inputs silently. For these, the automatic token-based compaction trigger is the primary safeguard.
## Configuration

Control context management behavior in `altimate-code.json`:

```json
{
  "compaction": {
    "auto": true,
    "prune": true,
    "reserved": 4096
  }
}
```

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `auto` | `boolean` | `true` | Automatically compact when the context window is nearly full |
| `prune` | `boolean` | `true` | Prune old tool outputs before compaction |
| `reserved` | `number` | `20000` | Token buffer to reserve below the context limit. Increase this if you see frequent overflow errors |

### Disabling Auto-Compaction

If you prefer to manage context manually (for example, by starting new sessions), disable auto-compaction:

```json
{
  "compaction": {
    "auto": false
  }
}
```

!!! warning
    With auto-compaction disabled, you may hit context overflow errors during long sessions. The CLI will still detect and recover from these, but the experience will be less smooth.

### Manual Compaction

You can trigger compaction at any time from the TUI by pressing `leader` + `c`, or by using the `/compact` command in conversation. This is useful when you want to create a checkpoint before switching tasks.
## Token Estimation

altimate-code uses content-aware heuristics to estimate token counts without calling a tokenizer. This keeps overhead low while maintaining accuracy.

The estimator detects content type and adjusts its ratio:

| Content Type | Characters per Token | Detection |
|--------------|---------------------|-----------|
| Code | ~3.0 | High density of `{}();=` characters |
| JSON | ~3.2 | Starts with `{` or `[`, high density of `{}[]:,"` |
| SQL | ~3.5 | Contains SQL keywords (`SELECT`, `FROM`, `JOIN`, etc.) |
| Plain text | ~4.0 | Default for prose and markdown |
| Mixed | ~3.7 | Fallback for content that does not match a specific type |

These ratios are tuned against the cl100k_base tokenizer used by Claude and GPT-4 models. The estimator samples the first 500 characters of content to classify it, so the overhead is negligible.
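A minimal sketch of how such a classifier might look, using the published ratios. The `estimateTokens` name and the density thresholds are assumptions, and the mixed 3.7 fallback is omitted for brevity; the real heuristics in `token.ts` may differ in detail.

```typescript
// Content-aware chars-per-token estimation, sketched from the table above.
// Detection thresholds are illustrative assumptions, not the shipped values.
function estimateTokens(content: string): number {
  const sample = content.slice(0, 500) // classify from a cheap prefix
  let ratio = 4.0 // plain text default

  // Fraction of sampled characters matching a character class.
  const density = (chars: RegExp) => (sample.match(chars) ?? []).length / Math.max(sample.length, 1)

  const trimmed = sample.trimStart()
  if ((trimmed.startsWith("{") || trimmed.startsWith("[")) && density(/[{}\[\]:,"]/g) > 0.15) {
    ratio = 3.2 // JSON
  } else if (/\b(SELECT|FROM|JOIN|WHERE|GROUP BY)\b/i.test(sample)) {
    ratio = 3.5 // SQL
  } else if (density(/[{}();=]/g) > 0.05) {
    ratio = 3.0 // code
  }

  return Math.ceil(content.length / ratio)
}
```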

docs/mkdocs.yml

Lines changed: 1 addition & 0 deletions
```diff
@@ -91,6 +91,7 @@ nav:
 - Behavior:
   - Rules: configure/rules.md
   - Permissions: configure/permissions.md
+  - Context Management: configure/context-management.md
   - Formatters: configure/formatters.md
 - Appearance:
   - Themes: configure/themes.md
```

packages/altimate-code/src/agent/prompt/compaction.txt

Lines changed: 7 additions & 0 deletions
```diff
@@ -9,6 +9,13 @@ Focus on information that would be helpful for continuing the conversation, incl
 - Key user requests, constraints, or preferences that should persist
 - Important technical decisions and why they were made

+For data engineering conversations, also preserve:
+- Warehouse connections and discovered schemas/tables
+- dbt project context (models, sources, tests)
+- Lineage findings and query patterns
+- SQL dialects and translation contexts
+- FinOps findings (costs, warehouse sizing)
+
 Your summary should be comprehensive enough to provide context but concise enough to be quickly understood.

 Do not respond to any questions in the conversation, only output the summary.
```

packages/altimate-code/src/provider/error.ts

Lines changed: 2 additions & 0 deletions
```diff
@@ -18,6 +18,8 @@ export namespace ProviderError {
     /greater than the context length/i, // LM Studio
     /context window exceeds limit/i, // MiniMax
     /exceeded model token limit/i, // Kimi For Coding, Moonshot
+    /the request was too long/i, // Azure OpenAI
+    /maximum tokens for requested operation/i, // Azure OpenAI
     /context[_ ]length[_ ]exceeded/i, // Generic fallback
   ]
```
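As a sketch, matching a provider error message against this list reduces to a `some()` over regexes. The `isOverflowMessage` helper name is illustrative, and the patterns shown are only a subset of the real list in `error.ts`.

```typescript
// Subset of the overflow patterns from error.ts, including the two
// Azure OpenAI patterns added in this commit.
const OVERFLOW_PATTERNS = [
  /prompt is too long/i, // Anthropic
  /exceeds the context window/i, // OpenAI
  /the request was too long/i, // Azure OpenAI
  /maximum tokens for requested operation/i, // Azure OpenAI
  /context[_ ]length[_ ]exceeded/i, // Generic fallback
]

// Returns true when a provider error message looks like a context overflow.
function isOverflowMessage(message: string): boolean {
  return OVERFLOW_PATTERNS.some((re) => re.test(message))
}
```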

Lines changed: 69 additions & 0 deletions
# Paid Context Management Features

These features are planned for implementation in altimate-core (Rust) and gated behind license key verification.

## 1. Precise Token Counting

**Bridge method:** `context.count_tokens(text, model_family) -> number`

Uses tiktoken-rs in altimate-core for exact model-specific token counts. Replaces the heuristic estimation in `token.ts`. Supports cl100k_base (GPT-4/Claude), o200k_base (GPT-4o), and future tokenizers.

**Benefits:**

- Eliminates 20-30% estimation error
- Precise compaction triggering — no late/early compaction
- Accurate token budget allocation

## 2. Smart Context Scoring

**Bridge method:** `context.score_relevance(items[], query) -> scored_items[]`

Embedding-based relevance scoring for context items. Used before compaction to drop lowest-scoring items first, preserving the most relevant conversation history. Uses a local embeddings model (no external API calls required).

**Benefits:**

- Drops irrelevant context before compaction
- Preserves high-value conversation segments
- Reduces unnecessary compaction cycles

## 3. Schema Compression

**Bridge method:** `context.compress_schema(schema_ddl, token_budget) -> compressed_schema`

Schemonic-style ILP (Integer Linear Programming) optimization. Extends the existing `altimate_core_optimize_context` tool. Achieves ~2x token reduction on schema DDL without accuracy loss by intelligently abbreviating column names, removing redundant constraints, and merging similar table definitions.

**Benefits:**

- Fits 2x more schema context in the same token budget
- No accuracy loss on downstream SQL generation
- Works with all warehouse dialects

## 4. Lineage-Aware Context Selection

**Bridge method:** `context.select_by_lineage(model_name, manifest, hops) -> relevant_tables[]`

Uses dbt DAG / lineage graph to scope relevant tables. PageRank-style relevance scoring weights tables by proximity and importance in the dependency graph. Configurable hop distance for breadth of context.

**Benefits:**

- Only includes tables relevant to the current model/query
- Reduces schema context by 60-80% for large warehouses
- Leverages existing dbt manifest parsing

## 5. Semantic Schema Catalog

**Bridge method:** `context.generate_catalog(schema, sample_data) -> yaml_catalog`

YAML-based semantic views (similar to Snowflake Cortex Analyst). Auto-generates business descriptions, data types, and relationships from schema + sample data. Serves as a compressed, human-readable schema representation.

**Benefits:**

- Business-friendly context for the LLM
- More token-efficient than raw DDL
- Auto-generates from existing schema metadata

## 6. Context Budget Allocator

**Bridge method:** `context.allocate_budget(model_limit, task_type) -> { system, schema, conversation, output }`

Explicit token allocation across categories. Dynamic adjustment based on task type (query writing vs. debugging vs. optimization). Prevents any single category from consuming the entire context window.

**Benefits:**

- Prevents schema from crowding out conversation history
- Task-appropriate allocation (more schema for query writing, more conversation for debugging)
- Works with the compaction system to respect budgets
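Since the budget allocator is only planned, here is a hypothetical sketch of what a weight-based version could look like. The task-type weights are invented for illustration; the real altimate-core implementation does not exist yet and may allocate differently.

```typescript
// Hypothetical sketch of the planned budget allocator (feature 6 above).
// Category weights are illustrative, not a spec.
type TaskType = "query-writing" | "debugging" | "optimization"

function allocateBudget(modelLimit: number, task: TaskType) {
  // More schema budget for query writing, more conversation history
  // for debugging, as the benefits list above suggests.
  const weights = {
    "query-writing": { system: 0.05, schema: 0.45, conversation: 0.3, output: 0.2 },
    debugging: { system: 0.05, schema: 0.2, conversation: 0.55, output: 0.2 },
    optimization: { system: 0.05, schema: 0.35, conversation: 0.4, output: 0.2 },
  }[task]
  return {
    system: Math.floor(modelLimit * weights.system),
    schema: Math.floor(modelLimit * weights.schema),
    conversation: Math.floor(modelLimit * weights.conversation),
    output: Math.floor(modelLimit * weights.output),
  }
}
```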

packages/altimate-code/src/session/compaction.ts

Lines changed: 60 additions & 4 deletions
```diff
@@ -18,6 +18,48 @@ import { ProviderTransform } from "@/provider/transform"
 export namespace SessionCompaction {
   const log = Log.create({ service: "session.compaction" })

+  function formatBytes(bytes: number): string {
+    if (bytes < 1024) return `${bytes} B`
+    if (bytes < 1024 * 1024) return `${(bytes / 1024).toFixed(1)} KB`
+    return `${(bytes / (1024 * 1024)).toFixed(1)} MB`
+  }
+
+  function truncateArgs(input: Record<string, any> | null | undefined, maxLen: number): string {
+    if (!input || typeof input !== "object") return ""
+    let str: string
+    try {
+      str = Object.entries(input)
+        .map(([k, v]) => `${k}: ${JSON.stringify(v)}`)
+        .join(", ")
+    } catch {
+      return "[unserializable]"
+    }
+    if (str.length <= maxLen) return str
+    // Avoid slicing mid-surrogate pair by finding a safe boundary
+    let end = maxLen
+    const code = str.charCodeAt(end - 1)
+    if (code >= 0xd800 && code <= 0xdbff) end--
+    return str.slice(0, end) + "…"
+  }
+
+  export function createObservationMask(part: MessageV2.ToolPart): string {
+    const output = (part.state.status === "completed" ? part.state.output : "") || ""
+    const lines = output.split("\n").length
+    const bytes = Buffer.byteLength(output, "utf8")
+    const args = truncateArgs(
+      part.state.status === "completed" || part.state.status === "running" || part.state.status === "error"
+        ? part.state.input
+        : {},
+      80,
+    )
+    const firstLine = output.split("\n")[0]?.slice(0, 80) || ""
+    const fingerprint = firstLine ? ` — "${firstLine}"` : ""
+    return `[Tool output cleared — ${part.tool}(${args}) returned ${lines} lines, ${formatBytes(bytes)}${fingerprint}]`
+  }
+
   export const Event = {
     Compacted: BusEvent.define(
       "session.compacted",
```
```diff
@@ -39,11 +81,11 @@
       input.tokens.total ||
       input.tokens.input + input.tokens.output + input.tokens.cache.read + input.tokens.cache.write

-    const reserved =
-      config.compaction?.reserved ?? Math.min(COMPACTION_BUFFER, ProviderTransform.maxOutputTokens(input.model))
+    const maxOutput = ProviderTransform.maxOutputTokens(input.model)
+    const reserved = config.compaction?.reserved ?? COMPACTION_BUFFER
     const usable = input.model.limit.input
-      ? input.model.limit.input - reserved
-      : context - ProviderTransform.maxOutputTokens(input.model)
+      ? input.model.limit.input - Math.max(reserved, maxOutput)
+      : context - maxOutput - reserved
     return count >= usable
   }
```
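The corrected headroom math from this hunk can be illustrated as a standalone function. Names are illustrative; the real code reads these values from `config.compaction`, `model.limit`, and `ProviderTransform.maxOutputTokens()`.

```typescript
// Sketch of the fixed isOverflow() budget computation: with a separate
// input limit, reserve whichever is larger (buffer or output allowance);
// otherwise subtract both from the shared window.
function usableTokens(opts: {
  contextLimit: number // total context window
  inputLimit?: number // separate input limit, if the model declares one
  maxOutput: number // model's maximum output tokens
  reserved: number // configured compaction buffer
}): number {
  return opts.inputLimit
    ? opts.inputLimit - Math.max(opts.reserved, opts.maxOutput)
    : opts.contextLimit - opts.maxOutput - opts.reserved
}
```

For example, with a 180,000-token input limit, 32,000 max output, and a 20,000-token reserve, the usable budget is 180,000 - max(20,000, 32,000) = 148,000 tokens; the old code subtracted only `reserved`, leaving no headroom for the response.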
```diff
@@ -90,7 +132,12 @@
       if (pruned > PRUNE_MINIMUM) {
         for (const part of toPrune) {
           if (part.state.status === "completed") {
+            const mask = createObservationMask(part)
             part.state.time.compacted = Date.now()
+            part.state.metadata = {
+              ...part.state.metadata,
+              observation_mask: mask,
+            }
             await Session.updatePart(part)
           }
         }
```
```diff
@@ -163,6 +210,15 @@ When constructing the summary, try to stick to this template:
 - [What important instructions did the user give you that are relevant]
 - [If there is a plan or spec, include information about it so next agent can continue using it]

+## Data Context
+
+- [What warehouse(s) or database(s) are we connected to?]
+- [What schemas, tables, or columns were discovered or are relevant?]
+- [What dbt models, sources, or tests are involved?]
+- [Any lineage findings (upstream/downstream dependencies)?]
+- [Any query patterns, anti-patterns, or optimization opportunities found?]
+- [Skip this section entirely if the task is not data-engineering related]
+
 ## Discoveries

 [What notable things were learned during this conversation that would be useful for the next agent to know when continuing the work]
```

packages/altimate-code/src/session/message-v2.ts

Lines changed: 4 additions & 1 deletion
```diff
@@ -617,7 +617,10 @@ export namespace MessageV2 {
       if (part.type === "tool") {
         toolNames.add(part.tool)
         if (part.state.status === "completed") {
-          const outputText = part.state.time.compacted ? "[Old tool result content cleared]" : part.state.output
+          const mask = part.state.metadata?.observation_mask
+          const outputText = part.state.time.compacted
+            ? (typeof mask === "string" && mask.length > 0 ? mask : "[Old tool result content cleared]")
+            : part.state.output
           const attachments = part.state.time.compacted ? [] : (part.state.attachments ?? [])

           // For providers that don't support media in tool results, extract media files
```

packages/altimate-code/src/session/processor.ts

Lines changed: 4 additions & 1 deletion
```diff
@@ -354,7 +354,10 @@ export namespace SessionProcessor {
         })
         const error = MessageV2.fromError(e, { providerID: input.model.providerID })
         if (MessageV2.ContextOverflowError.isInstance(error)) {
-          // TODO: Handle context overflow error
+          log.info("context overflow detected, triggering compaction")
+          needsCompaction = true
+          // TODO: Telemetry — emit context_overflow_recovered event
+          break
         }
         const retry = SessionRetry.retryable(error)
         if (retry !== undefined) {
```
