# Context Management

altimate-code automatically manages conversation context so you can work through long sessions without hitting model limits. When a conversation grows large, the CLI summarizes older messages, prunes stale tool outputs, and recovers from provider overflow errors — all without losing the important details of your work.

## How It Works

Every LLM has a finite context window. As you work, each message, tool call, and tool result adds tokens to the conversation. When the conversation approaches the model's limit, altimate-code takes action:

1. **Prune** — Old tool outputs (file reads, command results, query results) are replaced with compact summaries
2. **Compact** — The entire conversation history is summarized into a continuation prompt
3. **Continue** — The agent picks up where it left off using the summary

This happens automatically by default. You do not need to manually manage context.
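
The three steps above can be sketched as a single decision function. Everything here (the `Message` shape, the pruning cutoff, the `summarize` callback) is illustrative, not altimate-code's actual internals:

```typescript
// Illustrative sketch of the prune → compact → continue loop.
type Message = { role: string; content: string; tokens: number };

function manageContext(
  history: Message[],
  contextLimit: number,
  summarize: (msgs: Message[]) => Message,
): Message[] {
  const used = history.reduce((n, m) => n + m.tokens, 0);
  if (used <= contextLimit) return history; // plenty of room: no action

  // 1. Prune: replace older tool outputs with compact placeholders
  //    (here, anything outside the last few messages — a stand-in
  //    for the real recency rules described later on this page)
  const pruned = history.map((m, i) =>
    m.role === "tool" && i < history.length - 4
      ? { ...m, content: "[Tool output cleared]", tokens: 8 }
      : m,
  );
  const afterPrune = pruned.reduce((n, m) => n + m.tokens, 0);
  if (afterPrune <= contextLimit) return pruned;

  // 2. Compact: summarize the whole conversation into one message.
  // 3. Continue: the agent resumes from that summary.
  return [summarize(pruned)];
}
```

Pruning is tried first because it is cheap and lossless-ish; compaction only runs when pruning alone cannot get the conversation back under the limit.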

## Auto-Compaction

When enabled (the default), altimate-code monitors token usage after each model response. If the conversation is approaching the context limit, it triggers compaction automatically.

During compaction:

- A dedicated compaction agent summarizes the full conversation
- The summary captures goals, progress, discoveries, relevant files, and next steps
- The original messages are retained in session history, but the model continues from the summary
- After compaction, the agent automatically continues working if there are clear next steps

You will see a compaction indicator in the TUI when this happens. The conversation continues seamlessly.

!!! tip
    If you notice compaction happening frequently, consider using a model with a larger context window or breaking your task into smaller sessions.

## Observation Masking (Pruning)

Before compaction, altimate-code prunes old tool outputs to reclaim context space. This is called "observation masking."

When a tool output is pruned, it is replaced with a brief fingerprint:

```
[Tool output cleared — read_file(file: src/main.ts) returned 42 lines, 1.2 KB — "import { App } from './app'"]
```

This tells the model which tool was called, what arguments were used, how much output it produced, and the first line of the result — enough to maintain continuity while consuming only a handful of tokens.
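
A fingerprint like the one above could be assembled as follows. The `ToolCall` shape and `fingerprint` helper are hypothetical; only the output format is taken from the example:

```typescript
// Sketch of building a pruning fingerprint (hypothetical helper).
interface ToolCall {
  name: string;                  // e.g. "read_file"
  args: Record<string, string>;  // e.g. { file: "src/main.ts" }
  output: string;                // the full tool output being cleared
}

function fingerprint(call: ToolCall): string {
  const args = Object.entries(call.args)
    .map(([k, v]) => `${k}: ${v}`)
    .join(", ");
  const lines = call.output.split("\n");
  const bytes = new TextEncoder().encode(call.output).length;
  const size = bytes >= 1024 ? `${(bytes / 1024).toFixed(1)} KB` : `${bytes} B`;
  return (
    `[Tool output cleared — ${call.name}(${args}) returned ` +
    `${lines.length} lines, ${size} — "${lines[0]}"]`
  );
}
```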

**Pruning rules:**

- Only tool outputs older than the most recent 2 turns are eligible
- The most recent ~40,000 tokens of tool outputs are always preserved
- Pruning only fires when at least 20,000 tokens can be reclaimed
- `skill` tool outputs are never pruned (they contain critical session context)
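
The four rules combine into a single selection pass over past tool outputs. The thresholds come from the list above; the data shapes and function are illustrative, not the CLI's real code:

```typescript
// Sketch of the pruning rules. Thresholds are from the docs;
// the ToolOutput shape is hypothetical.
interface ToolOutput {
  tool: string;   // tool name, e.g. "read_file" or "skill"
  turn: number;   // conversation turn the output belongs to
  tokens: number; // estimated token cost
}

const KEEP_RECENT_TURNS = 2;    // recent turns are never touched
const PRESERVE_TOKENS = 40_000; // newest tool-output tokens always kept
const MIN_RECLAIM = 20_000;     // don't bother pruning for less

function selectPrunable(outputs: ToolOutput[], currentTurn: number): ToolOutput[] {
  // Walk newest-first, protecting the most recent ~40k tokens of outputs.
  let preserved = 0;
  const prunable: ToolOutput[] = [];
  for (const out of [...outputs].reverse()) {
    const recent = out.turn > currentTurn - KEEP_RECENT_TURNS;
    if (out.tool === "skill" || recent || preserved < PRESERVE_TOKENS) {
      preserved += out.tokens;
      continue;
    }
    prunable.push(out);
  }
  // All-or-nothing: only prune if the reclaim threshold is met.
  const reclaimable = prunable.reduce((n, o) => n + o.tokens, 0);
  return reclaimable >= MIN_RECLAIM ? prunable : [];
}
```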

## Data Engineering Context

Compaction is aware of data engineering workflows. When summarizing a conversation, the compaction prompt preserves:

- **Warehouse connections** — which databases or warehouses are connected
- **Schema context** — discovered tables, columns, and relationships
- **dbt project state** — models, sources, tests, and project structure
- **Lineage findings** — upstream and downstream dependencies
- **Query patterns** — SQL dialects, anti-patterns, and optimization opportunities
- **FinOps context** — cost findings and warehouse sizing recommendations

This means you can run a long data exploration session and compaction will not lose track of what schemas you discovered, what dbt models you were working with, or what cost optimizations you identified.

## Provider Overflow Detection

If compaction does not trigger in time and the model returns a context overflow error, altimate-code detects it and automatically compacts the conversation.

Overflow detection works with all major providers:

| Provider | Detection |
|----------|-----------|
| Anthropic | "prompt is too long" |
| OpenAI | "exceeds the context window" |
| AWS Bedrock | "input is too long for requested model" |
| Google Gemini | "input token count exceeds the maximum" |
| Azure OpenAI | "the request was too long" |
| Groq | "reduce the length of the messages" |
| OpenRouter / DeepSeek | "maximum context length is N tokens" |
| xAI (Grok) | "maximum prompt length is N" |
| GitHub Copilot | "exceeds the limit of N" |
| Ollama / llama.cpp / LM Studio | Various local server messages |

When an overflow is detected, the CLI automatically compacts and retries. No action is needed on your part.
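
Detection amounts to matching these provider strings against the error message. The patterns below are taken from the table; the function itself is a sketch, not the CLI's actual code:

```typescript
// Illustrative overflow matcher built from the provider table above.
const OVERFLOW_PATTERNS: RegExp[] = [
  /prompt is too long/i,                     // Anthropic
  /exceeds the context window/i,             // OpenAI
  /input is too long for requested model/i,  // AWS Bedrock
  /input token count exceeds the maximum/i,  // Google Gemini
  /the request was too long/i,               // Azure OpenAI
  /reduce the length of the messages/i,      // Groq
  /maximum context length is \d+ tokens/i,   // OpenRouter / DeepSeek
  /maximum prompt length is \d+/i,           // xAI (Grok)
  /exceeds the limit of \d+/i,               // GitHub Copilot
];

function isContextOverflow(errorMessage: string): boolean {
  return OVERFLOW_PATTERNS.some((pattern) => pattern.test(errorMessage));
}
```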

### Loop Protection

If compaction fails to reduce context sufficiently and overflow keeps recurring, altimate-code stops after 3 consecutive compaction attempts within the same turn. You will see a message asking you to start a new conversation. The counter resets after each successful processing step, so compactions spread across different turns do not count against the limit.
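
The counter logic can be sketched as a small guard (illustrative, not the real implementation; here the 4th consecutive attempt is the one refused):

```typescript
// Hypothetical guard: stop after 3 consecutive compaction attempts.
class CompactionGuard {
  static readonly MAX_ATTEMPTS = 3;
  private attempts = 0;

  // Called each time overflow forces another compaction in the same turn.
  recordCompaction(): void {
    this.attempts += 1;
    if (this.attempts > CompactionGuard.MAX_ATTEMPTS) {
      throw new Error("Context still overflowing — please start a new conversation.");
    }
  }

  // Called after a processing step succeeds, so attempts in
  // different turns never accumulate.
  reset(): void {
    this.attempts = 0;
  }
}
```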

!!! note
    Some providers (such as z.ai) may accept oversized inputs silently. For these, the automatic token-based compaction trigger is the primary safeguard.

## Configuration

Control context management behavior in `altimate-code.json`:

```json
{
  "compaction": {
    "auto": true,
    "prune": true,
    "reserved": 20000
  }
}
```

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `auto` | `boolean` | `true` | Automatically compact when the context window is nearly full |
| `prune` | `boolean` | `true` | Prune old tool outputs before compaction |
| `reserved` | `number` | `20000` | Token buffer to reserve below the context limit. The actual headroom is `max(reserved, model_max_output)`, so this value only takes effect when it exceeds the model's output token limit. Increase it if you see frequent overflow errors |
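
Under the `reserved` semantics above, the compaction trigger reduces to a one-line headroom check. A sketch, with all names hypothetical:

```typescript
// Does the conversation need compaction? Headroom follows the documented
// rule: max(reserved, model's max output tokens).
function shouldCompact(
  usedTokens: number,
  contextLimit: number,
  reserved: number,        // the `compaction.reserved` setting
  modelMaxOutput: number,  // the model's maximum output tokens
): boolean {
  const headroom = Math.max(reserved, modelMaxOutput);
  return usedTokens > contextLimit - headroom;
}
```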

### Disabling Auto-Compaction

If you prefer to manage context manually (for example, by starting new sessions), disable auto-compaction:

```json
{
  "compaction": {
    "auto": false
  }
}
```

!!! warning
    With auto-compaction disabled, you may hit context overflow errors during long sessions. The CLI will still detect and recover from these, but the experience will be less smooth.

### Manual Compaction

You can trigger compaction at any time from the TUI by pressing `leader` + `c`, or by using the `/compact` command in conversation. This is useful when you want to create a checkpoint before switching tasks.

## Token Estimation

altimate-code uses content-aware heuristics to estimate token counts without calling a tokenizer. This keeps overhead low while maintaining accuracy.

The estimator detects the content type and adjusts its ratio:

| Content Type | Characters per Token | Detection |
|--------------|---------------------|-----------|
| Code | ~3.0 | High density of `{}();=` characters |
| JSON | ~3.2 | Starts with `{` or `[`, high density of `{}[]:,"` |
| SQL | ~3.5 | Contains SQL keywords (`SELECT`, `FROM`, `JOIN`, etc.) |
| Plain text | ~4.0 | Default for prose and markdown |
| Mixed | ~3.7 | Fallback for content that does not match a specific type |

These ratios are tuned against OpenAI's cl100k_base tokenizer (used by GPT-4-era models); other models' tokenizers differ somewhat, so treat the estimates as approximate. The estimator samples the first 500 characters of content to classify it, so the overhead is negligible.
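
The table translates into a small classifier plus a length division. This is a sketch of the described heuristic, not the shipped estimator (the mixed 3.7 fallback and exact density thresholds are illustrative):

```typescript
// Content-aware token estimation sketch: pick a chars-per-token ratio
// from the content type, then divide. Ratios are from the table above.
const SQL_KEYWORDS = /\b(SELECT|FROM|JOIN|WHERE|GROUP BY)\b/i;

function classify(sample: string): number {
  const density = (re: RegExp) =>
    (sample.match(re) ?? []).length / Math.max(sample.length, 1);
  const trimmed = sample.trimStart();
  if (
    (trimmed.startsWith("{") || trimmed.startsWith("[")) &&
    density(/[{}\[\]:,"]/g) > 0.15
  ) {
    return 3.2; // JSON
  }
  if (density(/[{}();=]/g) > 0.05) return 3.0; // code
  if (SQL_KEYWORDS.test(sample)) return 3.5;   // SQL
  return 4.0;                                  // plain text
}

function estimateTokens(content: string): number {
  const sample = content.slice(0, 500); // classify on a prefix only
  return Math.ceil(content.length / classify(sample));
}
```

Because only a 500-character prefix is classified, a long file that starts with prose but ends in code is estimated at the prose ratio — one reason these numbers are heuristics rather than exact counts.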

!!! note "Limitations"
    The heuristic uses JavaScript string length (UTF-16 code units), which over-estimates tokens for emoji (2 code units but ~1-2 tokens) and CJK characters. For precise token counting, a future update will integrate a native tokenizer.