4 changes: 2 additions & 2 deletions docs/docs/configure/config.md
@@ -57,7 +57,7 @@ Configuration is loaded from multiple sources, with later sources overriding ear
| `skills` | `object` | Skill paths and URLs |
| `plugin` | `string[]` | Plugin specifiers |
| `instructions` | `string[]` | Glob patterns for instruction files |
| `compaction` | `object` | Context compaction settings |
| `compaction` | `object` | Context compaction settings (see [Context Management](context-management.md)) |
| `experimental` | `object` | Experimental feature flags |

## Value Substitution
@@ -132,4 +132,4 @@ Control how context is managed when conversations grow long:
| `reserved` | — | Token buffer to reserve |

!!! info
Compaction automatically summarizes older messages to free up context window space, allowing longer conversations without losing important context.
Compaction automatically summarizes older messages to free up context window space, allowing longer conversations without losing important context. See [Context Management](context-management.md) for full details.
147 changes: 147 additions & 0 deletions docs/docs/configure/context-management.md
@@ -0,0 +1,147 @@
# Context Management

altimate-code automatically manages conversation context so you can work through long sessions without hitting model limits. When a conversation grows large, the CLI summarizes older messages, prunes stale tool outputs, and recovers from provider overflow errors — all without losing the important details of your work.

## How It Works

Every LLM has a finite context window. As you work, each message, tool call, and tool result adds tokens to the conversation. When the conversation approaches the model's limit, altimate-code takes action:

1. **Prune** — Old tool outputs (file reads, command results, query results) are replaced with compact summaries
2. **Compact** — The entire conversation history is summarized into a continuation prompt
3. **Continue** — The agent picks up where it left off using the summary

This happens automatically by default. You do not need to manually manage context.
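
The decision logic behind this pipeline can be pictured roughly as follows. This is a minimal sketch: the helper names, shapes, and thresholds are illustrative, not the actual altimate-code internals.

```typescript
// A sketch only: helper names, shapes, and thresholds are illustrative,
// not the actual altimate-code internals.
interface ModelInfo { contextLimit: number; maxOutputTokens: number }

const estimateTokens = (msgs: string[]) =>
  Math.ceil(msgs.join("\n").length / 4) // crude stand-in for the real estimator

async function manageContext(
  messages: string[],
  model: ModelInfo,
  summarize: (msgs: string[]) => Promise<string>,
  reserved = 20_000,
): Promise<string[]> {
  const headroom = Math.max(reserved, model.maxOutputTokens)
  if (estimateTokens(messages) < model.contextLimit - headroom) return messages

  // 1. Prune: mask stale tool outputs (modeled here as a simple rewrite)
  const pruned = messages.map((m) =>
    m.startsWith("[tool-output]") ? "[Tool output cleared]" : m,
  )
  if (estimateTokens(pruned) < model.contextLimit - headroom) return pruned

  // 2. Compact: summarize the whole history into a continuation prompt
  const summary = await summarize(pruned)
  // 3. Continue: the agent resumes from the summary alone
  return [summary]
}
```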

## Auto-Compaction

When enabled (the default), altimate-code monitors token usage after each model response. If the conversation is approaching the context limit, it triggers compaction automatically.

During compaction:

- A dedicated compaction agent summarizes the full conversation
- The summary captures goals, progress, discoveries, relevant files, and next steps
- The original messages are retained in session history but the model continues from the summary
- After compaction, the agent automatically continues working if there are clear next steps

You will see a compaction indicator in the TUI when this happens. The conversation continues seamlessly.

!!! tip
If you notice compaction happening frequently, consider using a model with a larger context window or breaking your task into smaller sessions.

## Observation Masking (Pruning)

Before compaction, altimate-code prunes old tool outputs to reclaim context space. This is called "observation masking."

When a tool output is pruned, it is replaced with a brief fingerprint:

```
[Tool output cleared — read_file(file: src/main.ts) returned 42 lines, 1.2 KB — "import { App } from './app'"]
```

This tells the model what tool was called, what arguments were used, how much output it produced, and the first line of the result: enough to maintain continuity at a small fraction of the original token cost.

**Pruning rules** (see the sketch after this list):

- Only tool outputs older than the most recent 2 turns are eligible
- The most recent ~40,000 tokens of tool outputs are always preserved
- Pruning only fires when at least 20,000 tokens can be reclaimed
- `skill` tool outputs are never pruned (they contain critical session context)
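
Taken together, the eligibility logic might look like the following sketch. The constants mirror the documented thresholds; the data shapes and function name are illustrative.

```typescript
// Sketch of pruning eligibility. Constants mirror the documented thresholds;
// the data shapes are simplified for illustration.
const PROTECTED_TURNS = 2        // never touch the most recent 2 turns
const PROTECTED_TOKENS = 40_000  // always keep the newest ~40k tokens of output
const PRUNE_MINIMUM = 20_000     // only fire if at least this much is reclaimed

interface ToolOutput { turn: number; tokens: number; tool: string }

function selectPrunable(outputs: ToolOutput[], currentTurn: number): ToolOutput[] {
  let protectedBudget = PROTECTED_TOKENS
  const candidates: ToolOutput[] = []
  // Walk newest-first so recent outputs consume the protected budget first.
  for (const out of [...outputs].sort((a, b) => b.turn - a.turn)) {
    if (out.turn > currentTurn - PROTECTED_TURNS) continue // too recent
    if (out.tool === "skill") continue                     // never pruned
    if (protectedBudget > 0) {
      protectedBudget -= out.tokens
      continue
    }
    candidates.push(out)
  }
  const reclaimable = candidates.reduce((sum, o) => sum + o.tokens, 0)
  return reclaimable >= PRUNE_MINIMUM ? candidates : [] // all-or-nothing
}
```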

## Data Engineering Context

Compaction is aware of data engineering workflows. When summarizing a conversation, the compaction prompt preserves:

- **Warehouse connections** — which databases or warehouses are connected
- **Schema context** — discovered tables, columns, and relationships
- **dbt project state** — models, sources, tests, and project structure
- **Lineage findings** — upstream and downstream dependencies
- **Query patterns** — SQL dialects, anti-patterns, and optimization opportunities
- **FinOps context** — cost findings and warehouse sizing recommendations

This means you can run a long data exploration session and compaction will not lose track of what schemas you discovered, what dbt models you were working with, or what cost optimizations you identified.
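
As an illustration, the data-context portion of a compaction summary might look something like this (all warehouse, schema, and model names below are invented for the example):

```
## Data Context

- Connected to the `analytics` Snowflake warehouse
- Discovered schema `raw_shopify`: tables `orders`, `customers`, `line_items`
- Working on dbt model `fct_orders` (upstream: `stg_shopify__orders`)
- Query pattern flagged: `SELECT *` in `stg_shopify__orders`, optimization pending
```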

## Provider Overflow Detection

If compaction does not trigger in time and the model returns a context overflow error, altimate-code detects it and automatically compacts the conversation.

Overflow detection works with all major providers:

| Provider | Detection |
|----------|-----------|
| Anthropic | "prompt is too long" |
| OpenAI | "exceeds the context window" |
| AWS Bedrock | "input is too long for requested model" |
| Google Gemini | "input token count exceeds the maximum" |
| Azure OpenAI | "the request was too long" |
| Groq | "reduce the length of the messages" |
| OpenRouter / DeepSeek | "maximum context length is N tokens" |
| xAI (Grok) | "maximum prompt length is N" |
| GitHub Copilot | "exceeds the limit of N" |
| Ollama / llama.cpp / LM Studio | Various local server messages |

When an overflow is detected, the CLI automatically compacts and retries. No action is needed on your part.
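
Detection is pattern-based: the provider's error message is tested against a list of known overflow phrases, roughly as in this sketch (the function name is illustrative; the pattern list is abridged from `provider/error.ts`):

```typescript
// Sketch: pattern-based overflow detection (list abridged from the table above;
// the full list lives in packages/altimate-code/src/provider/error.ts).
const OVERFLOW_PATTERNS: RegExp[] = [
  /prompt is too long/i,                    // Anthropic
  /exceeds the context window/i,            // OpenAI
  /input is too long for requested model/i, // AWS Bedrock
  /the request was too long/i,              // Azure OpenAI
  /context[_ ]length[_ ]exceeded/i,         // generic fallback
]

function isContextOverflow(error: Error): boolean {
  return OVERFLOW_PATTERNS.some((pattern) => pattern.test(error.message))
}
```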

### Loop Protection

If compaction fails to reduce context sufficiently and overflow keeps recurring, altimate-code stops after 3 consecutive compaction attempts within the same turn. You will see a message asking you to start a new conversation. The counter resets after each successful processing step, so compactions spread across different turns do not count against the limit.
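
A minimal sketch of the guard (names illustrative, not the implementation):

```typescript
// Sketch of the loop guard; names are illustrative, not the implementation.
const MAX_OVERFLOW_COMPACTIONS = 3
let overflowCompactions = 0

function onOverflowDetected(): void {
  overflowCompactions += 1
  if (overflowCompactions > MAX_OVERFLOW_COMPACTIONS) {
    throw new Error("Context overflow persists; please start a new conversation")
  }
  // otherwise: compact the conversation and retry the request
}

function onSuccessfulStep(): void {
  overflowCompactions = 0 // forward progress resets the counter
}
```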

!!! note
Some providers (such as z.ai) may accept oversized inputs silently. For these, the automatic token-based compaction trigger is the primary safeguard.

## Configuration

Control context management behavior in `altimate-code.json`:

```json
{
"compaction": {
"auto": true,
"prune": true,
"reserved": 20000
}
}
```

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `auto` | `boolean` | `true` | Automatically compact when the context window is nearly full |
| `prune` | `boolean` | `true` | Prune old tool outputs before compaction |
| `reserved` | `number` | `20000` | Token buffer to reserve below the context limit. The actual headroom is `max(reserved, model_max_output)`, so this value only takes effect when it exceeds the model's output token limit. Increase it if you see frequent overflow errors |
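
Put differently, compaction triggers once usage crosses `context_limit - max(reserved, model_max_output)`. A small sketch of the arithmetic (the function name and the numbers in the example are illustrative):

```typescript
// Worked example of the headroom rule from the table above.
function shouldCompact(
  usedTokens: number,
  contextLimit: number,
  maxOutputTokens: number,
  reserved = 20_000,
): boolean {
  const headroom = Math.max(reserved, maxOutputTokens)
  return usedTokens >= contextLimit - headroom
}

// A 200k-context model with a 32k output limit: reserved (20000) is below
// 32000, so the output limit wins and compaction triggers at 168,000 tokens.
shouldCompact(168_000, 200_000, 32_000) // => true
```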

### Disabling Auto-Compaction

If you prefer to manage context manually (for example, by starting new sessions), disable auto-compaction:

```json
{
"compaction": {
"auto": false
}
}
```

!!! warning
With auto-compaction disabled, you may hit context overflow errors during long sessions. The CLI will still detect and recover from these, but the experience will be less smooth.

### Manual Compaction

You can trigger compaction at any time from the TUI by pressing `leader` + `c`, or by using the `/compact` command in conversation. This is useful when you want to create a checkpoint before switching tasks.

## Token Estimation

altimate-code uses content-aware heuristics to estimate token counts without calling a tokenizer. This keeps overhead low while staying accurate enough to drive compaction decisions.

The estimator detects content type and adjusts its ratio:

| Content Type | Characters per Token | Detection |
|--------------|---------------------|-----------|
| Code | ~3.0 | High density of `{}();=` characters |
| JSON | ~3.2 | Starts with `{` or `[`, high density of `{}[]:,"` |
| SQL | ~3.5 | Contains SQL keywords (`SELECT`, `FROM`, `JOIN`, etc.) |
| Plain text | ~4.0 | Default for prose and markdown |
| Mixed | ~3.7 | Fallback for content that does not match a specific type |

These ratios are tuned against the cl100k_base tokenizer used by GPT-4-family OpenAI models; Claude uses its own tokenizer, for which they serve as an approximation. The estimator samples the first 500 characters of content to classify it, so the overhead is negligible.
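
A sketch of how such a classifier might look. The ratios come from the table above; the detection thresholds are invented for illustration, and the mixed fallback (3.7) is omitted for brevity.

```typescript
// Sketch of the content-aware estimator. Ratios come from the table above;
// detection thresholds are invented for illustration.
function estimateTokens(text: string): number {
  if (text.length === 0) return 0
  const sample = text.slice(0, 500) // classify on a cheap prefix
  const density = (re: RegExp) => (sample.match(re) ?? []).length / sample.length

  let charsPerToken = 4.0 // plain-text default
  if (/^\s*[{[]/.test(sample) && density(/[{}[\]:,"]/g) > 0.15) {
    charsPerToken = 3.2 // JSON
  } else if (/\b(SELECT|FROM|JOIN|WHERE|GROUP BY)\b/i.test(sample)) {
    charsPerToken = 3.5 // SQL
  } else if (density(/[{}();=]/g) > 0.05) {
    charsPerToken = 3.0 // code
  }
  return Math.ceil(text.length / charsPerToken)
}
```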

!!! note "Limitations"
The heuristic uses JavaScript string length (UTF-16 code units), which under-estimates tokens for emoji and CJK text: an emoji is 2 code units but roughly 1-2 tokens, and a CJK character is a single code unit but often a full token or more. For precise token counting, a future update will integrate a native tokenizer.
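
A quick illustration of the UTF-16 effect described in the note:

```typescript
// UTF-16 code units vs. code points: why length-based estimates skew
// for emoji and CJK text.
const emoji = "👍"
console.log(emoji.length)      // 2 (UTF-16 code units)
console.log([...emoji].length) // 1 code point, roughly 1-2 tokens in practice
```
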
1 change: 1 addition & 0 deletions docs/mkdocs.yml
@@ -91,6 +91,7 @@ nav:
- Behavior:
- Rules: configure/rules.md
- Permissions: configure/permissions.md
- Context Management: configure/context-management.md
- Formatters: configure/formatters.md
- Appearance:
- Themes: configure/themes.md
7 changes: 7 additions & 0 deletions packages/altimate-code/src/agent/prompt/compaction.txt
@@ -9,6 +9,13 @@ Focus on information that would be helpful for continuing the conversation, incl
- Key user requests, constraints, or preferences that should persist
- Important technical decisions and why they were made

For data engineering conversations, also preserve:
- Warehouse connections and discovered schemas/tables
- dbt project context (models, sources, tests)
- Lineage findings and query patterns
- SQL dialects and translation contexts
- FinOps findings (costs, warehouse sizing)

Your summary should be comprehensive enough to provide context but concise enough to be quickly understood.

Do not respond to any questions in the conversation, only output the summary.
2 changes: 2 additions & 0 deletions packages/altimate-code/src/provider/error.ts
@@ -18,6 +18,8 @@ export namespace ProviderError {
/greater than the context length/i, // LM Studio
/context window exceeds limit/i, // MiniMax
/exceeded model token limit/i, // Kimi For Coding, Moonshot
/the request was too long/i, // Azure OpenAI
/maximum tokens for requested operation/i, // Azure OpenAI
/context[_ ]length[_ ]exceeded/i, // Generic fallback
]

69 changes: 69 additions & 0 deletions packages/altimate-code/src/session/PAID_CONTEXT_FEATURES.md
@@ -0,0 +1,69 @@
# Paid Context Management Features

These features are planned for implementation in altimate-core (Rust) and gated behind license key verification.

## 1. Precise Token Counting

**Bridge method:** `context.count_tokens(text, model_family) -> number`

Uses tiktoken-rs in altimate-core for exact model-specific token counts, replacing the heuristic estimation in `token.ts`. Supports cl100k_base (GPT-4 family, and as an approximation for Claude), o200k_base (GPT-4o), and future tokenizers.

**Benefits:**
- Eliminates 20-30% estimation error
- Precise compaction triggering — no late/early compaction
- Accurate token budget allocation
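
A hypothetical call through the bridge. The API is planned, not shipped, so everything below is illustrative; the declaration mirrors the signature sketched above.

```typescript
// Hypothetical usage of the planned bridge method; nothing here is shipped.
declare const bridge: {
  context: { count_tokens(text: string, modelFamily: string): Promise<number> }
}

async function measureDrift(text: string): Promise<number> {
  const exact = await bridge.context.count_tokens(text, "o200k_base")
  const heuristic = Math.ceil(text.length / 4) // current estimator's default ratio
  return (100 * (heuristic - exact)) / exact   // estimation drift in percent
}
```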

## 2. Smart Context Scoring

**Bridge method:** `context.score_relevance(items[], query) -> scored_items[]`

Embedding-based relevance scoring for context items. Used before compaction to drop lowest-scoring items first, preserving the most relevant conversation history. Uses a local embeddings model (no external API calls required).

**Benefits:**
- Drops irrelevant context before compaction
- Preserves high-value conversation segments
- Reduces unnecessary compaction cycles

## 3. Schema Compression

**Bridge method:** `context.compress_schema(schema_ddl, token_budget) -> compressed_schema`

Schemonic-style ILP (Integer Linear Programming) optimization. Extends the existing `altimate_core_optimize_context` tool. Achieves ~2x token reduction on schema DDL without accuracy loss by intelligently abbreviating column names, removing redundant constraints, and merging similar table definitions.

**Benefits:**
- Fits 2x more schema context in the same token budget
- No accuracy loss on downstream SQL generation
- Works with all warehouse dialects

## 4. Lineage-Aware Context Selection

**Bridge method:** `context.select_by_lineage(model_name, manifest, hops) -> relevant_tables[]`

Uses dbt DAG / lineage graph to scope relevant tables. PageRank-style relevance scoring weights tables by proximity and importance in the dependency graph. Configurable hop distance for breadth of context.

**Benefits:**
- Only includes tables relevant to the current model/query
- Reduces schema context by 60-80% for large warehouses
- Leverages existing dbt manifest parsing

## 5. Semantic Schema Catalog

**Bridge method:** `context.generate_catalog(schema, sample_data) -> yaml_catalog`

YAML-based semantic views (similar to Snowflake Cortex Analyst). Auto-generates business descriptions, data types, and relationships from schema + sample data. Serves as a compressed, human-readable schema representation.

**Benefits:**
- Business-friendly context for the LLM
- More token-efficient than raw DDL
- Auto-generates from existing schema metadata

## 6. Context Budget Allocator

**Bridge method:** `context.allocate_budget(model_limit, task_type) -> { system, schema, conversation, output }`

Explicit token allocation across categories. Dynamic adjustment based on task type (query writing vs. debugging vs. optimization). Prevents any single category from consuming the entire context window.

**Benefits:**
- Prevents schema from crowding out conversation history
- Task-appropriate allocation (more schema for query writing, more conversation for debugging)
- Works with the compaction system to respect budgets
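
A sketch of what the allocator might return. Every proportion below is invented for illustration; only the return shape comes from the planned signature above.

```typescript
// Hypothetical allocation; proportions are illustrative, not planned defaults.
interface ContextBudget {
  system: number
  schema: number
  conversation: number
  output: number
}

function allocateBudget(
  modelLimit: number,
  taskType: "query_writing" | "debugging",
): ContextBudget {
  const output = Math.min(32_000, Math.floor(modelLimit * 0.15))
  const remaining = modelLimit - output
  const schemaShare = taskType === "query_writing" ? 0.4 : 0.15
  return {
    system: Math.floor(remaining * 0.1),
    schema: Math.floor(remaining * schemaShare),
    conversation: Math.floor(remaining * (0.9 - schemaShare)), // rest is history
    output,
  }
}
```
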
76 changes: 70 additions & 6 deletions packages/altimate-code/src/session/compaction.ts
@@ -14,10 +14,53 @@ import { Agent } from "@/agent/agent"
import { Plugin } from "@/plugin"
import { Config } from "@/config/config"
import { ProviderTransform } from "@/provider/transform"
import { Telemetry } from "@/telemetry"

export namespace SessionCompaction {
const log = Log.create({ service: "session.compaction" })

function formatBytes(bytes: number): string {
if (bytes < 1024) return `${bytes} B`
if (bytes < 1024 * 1024) return `${(bytes / 1024).toFixed(1)} KB`
return `${(bytes / (1024 * 1024)).toFixed(1)} MB`
}

function truncateArgs(input: Record<string, any> | null | undefined, maxLen: number): string {
if (!input || typeof input !== "object") return ""
let str: string
try {
str = Object.entries(input)
.map(([k, v]) => `${k}: ${JSON.stringify(v)}`)
.join(", ")
} catch {
return "[unserializable]"
}
if (str.length <= maxLen) return str
// Avoid slicing mid-surrogate pair by finding a safe boundary
let end = maxLen
const code = str.charCodeAt(end - 1)
if (code >= 0xd800 && code <= 0xdbff) end--
return str.slice(0, end) + "…"
}

export function createObservationMask(part: MessageV2.ToolPart): string {
const output =
(part.state.status === "completed" ? part.state.output : "") || ""
const lines = output.split("\n").length
const bytes = Buffer.byteLength(output, "utf8")
const args = truncateArgs(
part.state.status === "completed" ||
part.state.status === "running" ||
part.state.status === "error"
? part.state.input
: {},
80,
)
const firstLine = output.split("\n")[0]?.slice(0, 80) || ""
const fingerprint = firstLine ? ` — "${firstLine}"` : ""
return `[Tool output cleared — ${part.tool}(${args}) returned ${lines} lines, ${formatBytes(bytes)}${fingerprint}]`
}

export const Event = {
Compacted: BusEvent.define(
"session.compacted",
@@ -39,12 +82,12 @@
input.tokens.total ||
input.tokens.input + input.tokens.output + input.tokens.cache.read + input.tokens.cache.write

const reserved =
config.compaction?.reserved ?? Math.min(COMPACTION_BUFFER, ProviderTransform.maxOutputTokens(input.model))
const usable = input.model.limit.input
? input.model.limit.input - reserved
: context - ProviderTransform.maxOutputTokens(input.model)
return count >= usable
const maxOutput = ProviderTransform.maxOutputTokens(input.model)
const reserved = config.compaction?.reserved ?? COMPACTION_BUFFER
// Headroom is the larger of the reserved buffer and the model's max output,
// so a full-length response always fits below the input limit.
const headroom = Math.max(reserved, maxOutput)
// Fall back to total context size when the model reports no input limit.
const base = input.model.limit.input ?? context
if (base <= headroom) return false // degenerate: headroom swallows the window
return count >= base - headroom
}

export const PRUNE_MINIMUM = 20_000
@@ -90,11 +133,23 @@
if (pruned > PRUNE_MINIMUM) {
for (const part of toPrune) {
if (part.state.status === "completed") {
const mask = createObservationMask(part)
part.state.time.compacted = Date.now()
part.state.metadata = {
...part.state.metadata,
observation_mask: mask,
}
await Session.updatePart(part)
}
}
log.info("pruned", { count: toPrune.length })
Telemetry.track({
type: "tool_outputs_pruned",
timestamp: Date.now(),
session_id: input.sessionID,
count: toPrune.length,
tokens_pruned: pruned,
})
}
}

@@ -163,6 +218,15 @@ When constructing the summary, try to stick to this template:
- [What important instructions did the user give you that are relevant]
- [If there is a plan or spec, include information about it so next agent can continue using it]

## Data Context

- [What warehouse(s) or database(s) are we connected to?]
- [What schemas, tables, or columns were discovered or are relevant?]
- [What dbt models, sources, or tests are involved?]
- [Any lineage findings (upstream/downstream dependencies)?]
- [Any query patterns, anti-patterns, or optimization opportunities found?]
- [Skip this section entirely if the task is not data-engineering related]

## Discoveries

[What notable things were learned during this conversation that would be useful for the next agent to know when continuing the work]
5 changes: 4 additions & 1 deletion packages/altimate-code/src/session/message-v2.ts
@@ -617,7 +617,10 @@ export namespace MessageV2 {
if (part.type === "tool") {
toolNames.add(part.tool)
if (part.state.status === "completed") {
const outputText = part.state.time.compacted ? "[Old tool result content cleared]" : part.state.output
const mask = part.state.metadata?.observation_mask
const outputText = part.state.time.compacted
? (typeof mask === "string" && mask.length > 0 ? mask : "[Old tool result content cleared]")
: part.state.output
const attachments = part.state.time.compacted ? [] : (part.state.attachments ?? [])

// For providers that don't support media in tool results, extract media files