# feat: context management improvements — overflow recovery, observation masking, token estimation #35

**Merged** · 6 commits

## Commits

- `64c004a` feat: context management improvements — overflow recovery, observatio… (anandgupta42)
- `5206379` fix: reset compactionAttempts counter after successful processing step (anandgupta42)
- `599537e` test: add compaction loop protection tests (anandgupta42)
- `5aab4bc` fix: remove double-deduction in isOverflow for non-limit.input models (anandgupta42)
- `3df53b5` fix: address multi-model code review findings on PR #35 (anandgupta42)
- `5a46b9f` docs: update context management docs for review findings (anandgupta42)

---

# Context Management

altimate-code automatically manages conversation context so you can work through long sessions without hitting model limits. When a conversation grows large, the CLI summarizes older messages, prunes stale tool outputs, and recovers from provider overflow errors — all without losing the important details of your work.

## How It Works

Every LLM has a finite context window. As you work, each message, tool call, and tool result adds tokens to the conversation. When the conversation approaches the model's limit, altimate-code takes action:

1. **Prune** — Old tool outputs (file reads, command results, query results) are replaced with compact summaries
2. **Compact** — The entire conversation history is summarized into a continuation prompt
3. **Continue** — The agent picks up where it left off using the summary

This happens automatically by default. You do not need to manually manage context. A rough sketch of the flow is shown below.
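
The following sketch illustrates that flow with hypothetical helper names (`estimateTokens`, `pruneOldToolOutputs`, `compactConversation`); it is an assumption about structure, not the actual implementation:

```typescript
interface Message {
  role: "user" | "assistant" | "tool";
  content: string;
}

// Stubs standing in for the real session-layer functions.
declare function pruneOldToolOutputs(messages: Message[]): Message[];
declare function compactConversation(messages: Message[]): Promise<Message[]>;

// Placeholder estimator; the real one is content-aware (see "Token Estimation").
function estimateTokens(messages: Message[]): number {
  return messages.reduce((sum, m) => sum + Math.ceil(m.content.length / 4), 0);
}

async function ensureContextFits(
  messages: Message[],
  contextLimit: number,
  reserved: number,
): Promise<Message[]> {
  if (estimateTokens(messages) < contextLimit - reserved) return messages;

  // Step 1 (Prune): replace old tool outputs with compact fingerprints.
  let next = pruneOldToolOutputs(messages);

  // Step 2 (Compact): if pruning was not enough, summarize the whole
  // history into a continuation prompt.
  if (estimateTokens(next) >= contextLimit - reserved) {
    next = await compactConversation(next);
  }

  // Step 3 (Continue): the agent loop resumes with the smaller history.
  return next;
}
```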

## Auto-Compaction

When enabled (the default), altimate-code monitors token usage after each model response. If the conversation is approaching the context limit, it triggers compaction automatically.

During compaction:

- A dedicated compaction agent summarizes the full conversation
- The summary captures goals, progress, discoveries, relevant files, and next steps
- The original messages are retained in session history, but the model continues from the summary
- After compaction, the agent automatically continues working if there are clear next steps

You will see a compaction indicator in the TUI when this happens. The conversation continues seamlessly.

!!! tip
    If you notice compaction happening frequently, consider using a model with a larger context window or breaking your task into smaller sessions.

## Observation Masking (Pruning)

Before compaction, altimate-code prunes old tool outputs to reclaim context space. This is called "observation masking."

When a tool output is pruned, it is replaced with a brief fingerprint:

```
[Tool output cleared — read_file(file: src/main.ts) returned 42 lines, 1.2 KB — "import { App } from './app'"]
```

This tells the model which tool was called, what arguments were used, how much output it produced, and the first line of the result — enough to maintain continuity without consuming tokens.

**Pruning rules** (a sketch of the fingerprint format follows this list):

- Only tool outputs older than the most recent 2 turns are eligible
- The most recent ~40,000 tokens of tool outputs are always preserved
- Pruning only fires when at least 20,000 tokens can be reclaimed
- `skill` tool outputs are never pruned (they contain critical session context)
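
A minimal sketch of how such a fingerprint could be built; the shape follows the example above, while helper names and the byte-size formatting are assumptions:

```typescript
interface ToolOutput {
  tool: string;                 // e.g. "read_file"
  args: Record<string, string>; // e.g. { file: "src/main.ts" }
  text: string;                 // the full output being cleared
}

// Build a one-line fingerprint for a cleared tool output.
function fingerprint(o: ToolOutput): string {
  const args = Object.entries(o.args)
    .map(([key, value]) => `${key}: ${value}`)
    .join(", ");
  const lines = o.text.split("\n");
  const bytes = new TextEncoder().encode(o.text).length;
  const size = bytes >= 1024 ? `${(bytes / 1024).toFixed(1)} KB` : `${bytes} B`;
  return `[Tool output cleared — ${o.tool}(${args}) returned ${lines.length} lines, ${size} — "${lines[0] ?? ""}"]`;
}

function maybePrune(o: ToolOutput, isRecent: boolean): string {
  // Recent outputs and `skill` outputs are always kept verbatim.
  if (isRecent || o.tool === "skill") return o.text;
  return fingerprint(o);
}
```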

## Data Engineering Context

Compaction is aware of data engineering workflows. When summarizing a conversation, the compaction prompt preserves:

- **Warehouse connections** — which databases or warehouses are connected
- **Schema context** — discovered tables, columns, and relationships
- **dbt project state** — models, sources, tests, and project structure
- **Lineage findings** — upstream and downstream dependencies
- **Query patterns** — SQL dialects, anti-patterns, and optimization opportunities
- **FinOps context** — cost findings and warehouse sizing recommendations

This means you can run a long data exploration session and compaction will not lose track of which schemas you discovered, which dbt models you were working on, or which cost optimizations you identified.

## Provider Overflow Detection

If compaction does not trigger in time and the model returns a context overflow error, altimate-code detects it and automatically compacts the conversation.

Overflow detection works with all major providers:

| Provider | Detection |
|----------|-----------|
| Anthropic | "prompt is too long" |
| OpenAI | "exceeds the context window" |
| AWS Bedrock | "input is too long for requested model" |
| Google Gemini | "input token count exceeds the maximum" |
| Azure OpenAI | "the request was too long" |
| Groq | "reduce the length of the messages" |
| OpenRouter / DeepSeek | "maximum context length is N tokens" |
| xAI (Grok) | "maximum prompt length is N" |
| GitHub Copilot | "exceeds the limit of N" |
| Ollama / llama.cpp / LM Studio | Various local server messages |

When an overflow is detected, the CLI automatically compacts and retries. No action is needed on your part. A sketch of the matching logic follows.
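
Conceptually, detection is a match of the provider's error message against known phrases. The sketch below uses the phrases from the table; the actual matching logic in the CLI may differ:

```typescript
// Known overflow phrases, one per provider family (see table above).
const OVERFLOW_PATTERNS: RegExp[] = [
  /prompt is too long/i,                    // Anthropic
  /exceeds the context window/i,            // OpenAI
  /input is too long for requested model/i, // AWS Bedrock
  /input token count exceeds the maximum/i, // Google Gemini
  /the request was too long/i,              // Azure OpenAI
  /reduce the length of the messages/i,     // Groq
  /maximum context length is \d+ tokens/i,  // OpenRouter / DeepSeek
  /maximum prompt length is \d+/i,          // xAI (Grok)
  /exceeds the limit of \d+/i,              // GitHub Copilot
];

function isContextOverflowError(errorMessage: string): boolean {
  return OVERFLOW_PATTERNS.some((pattern) => pattern.test(errorMessage));
}
```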

!!! note
    Some providers (such as z.ai) may accept oversized inputs silently. For these, the automatic token-based compaction trigger is the primary safeguard.

## Configuration

Control context management behavior in `altimate-code.json`:

```json
{
  "compaction": {
    "auto": true,
    "prune": true,
    "reserved": 4096
  }
}
```

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `auto` | `boolean` | `true` | Automatically compact when the context window is nearly full |
| `prune` | `boolean` | `true` | Prune old tool outputs before compaction |
| `reserved` | `number` | `20000` | Token buffer to reserve below the context limit. Increase this if you see frequent overflow errors |
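
Per the review discussion on this PR, the `reserved` buffer combines with the model's maximum output tokens into a single headroom deduction. A sketch of that trigger condition (function and parameter names are illustrative):

```typescript
// Auto-compaction trigger: compact once usage eats into the headroom,
// where headroom is the larger of `reserved` and the model's max output.
function shouldCompact(
  usedTokens: number,
  contextLimit: number,
  reserved: number,  // compaction.reserved (default 20000)
  maxOutput: number, // model's max output tokens (often ~32K)
): boolean {
  const headroom = Math.max(reserved, maxOutput);
  return usedTokens >= contextLimit - headroom;
}
```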

### Disabling Auto-Compaction

If you prefer to manage context manually (for example, by starting new sessions), disable auto-compaction:

```json
{
  "compaction": {
    "auto": false
  }
}
```

!!! warning
    With auto-compaction disabled, you may hit context overflow errors during long sessions. The CLI will still detect and recover from these, but the experience will be less smooth.

### Manual Compaction

You can trigger compaction at any time from the TUI by pressing `leader` + `c`, or by using the `/compact` command in conversation. This is useful when you want to create a checkpoint before switching tasks.

## Token Estimation

altimate-code uses content-aware heuristics to estimate token counts without calling a tokenizer. This keeps overhead low while maintaining accuracy.

The estimator detects content type and adjusts its ratio:

| Content Type | Characters per Token | Detection |
|--------------|---------------------|-----------|
| Code | ~3.0 | High density of `{}();=` characters |
| JSON | ~3.2 | Starts with `{` or `[`, high density of `{}[]:,"` |
| SQL | ~3.5 | Contains SQL keywords (`SELECT`, `FROM`, `JOIN`, etc.) |
| Plain text | ~4.0 | Default for prose and markdown |
| Mixed | ~3.7 | Fallback for content that does not match a specific type |

These ratios are tuned against the cl100k_base tokenizer used by Claude and GPT-4 models. The estimator samples the first 500 characters of content to classify it, so the overhead is negligible.
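
A simplified sketch of this estimator using the table's ratios; the density thresholds here are invented for illustration, since the real classifier's cutoffs are not documented:

```typescript
const SQL_KEYWORDS = /\b(SELECT|FROM|JOIN|WHERE|GROUP BY|INSERT|UPDATE)\b/i;

// Classify a sample and return the chars-per-token ratio from the table.
function charsPerToken(sample: string): number {
  if (sample.length === 0) return 4.0;
  const density = (re: RegExp) =>
    (sample.match(re) ?? []).length / sample.length;
  const trimmed = sample.trimStart();
  const looksLikeJson = trimmed.startsWith("{") || trimmed.startsWith("[");
  if (looksLikeJson && density(/[{}\[\]:,"]/g) > 0.15) return 3.2; // JSON
  if (density(/[{}();=]/g) > 0.05) return 3.0; // code
  if (SQL_KEYWORDS.test(sample)) return 3.5; // SQL
  return 4.0; // plain text (a mixed-content fallback of ~3.7 also exists)
}

function estimateTokenCount(text: string): number {
  // Classify from the first 500 characters only, as described above.
  const ratio = charsPerToken(text.slice(0, 500));
  return Math.ceil(text.length / ratio);
}
```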

---

## `packages/altimate-code/src/session/PAID_CONTEXT_FEATURES.md` (69 additions, 0 deletions)

# Paid Context Management Features

These features are planned for implementation in altimate-core (Rust) and gated behind license key verification.

## 1. Precise Token Counting

**Bridge method:** `context.count_tokens(text, model_family) -> number`

Uses tiktoken-rs in altimate-core for exact model-specific token counts. Replaces the heuristic estimation in `token.ts`. Supports cl100k_base (GPT-4/Claude), o200k_base (GPT-4o), and future tokenizers.

**Benefits:**

- Eliminates 20-30% estimation error
- Precise compaction triggering — no late or early compaction
- Accurate token budget allocation

## 2. Smart Context Scoring

**Bridge method:** `context.score_relevance(items[], query) -> scored_items[]`

Embedding-based relevance scoring for context items. Used before compaction to drop the lowest-scoring items first, preserving the most relevant conversation history. Uses a local embeddings model (no external API calls required).

**Benefits:**

- Drops irrelevant context before compaction
- Preserves high-value conversation segments
- Reduces unnecessary compaction cycles

## 3. Schema Compression

**Bridge method:** `context.compress_schema(schema_ddl, token_budget) -> compressed_schema`

Schemonic-style ILP (Integer Linear Programming) optimization. Extends the existing `altimate_core_optimize_context` tool. Achieves ~2x token reduction on schema DDL without accuracy loss by intelligently abbreviating column names, removing redundant constraints, and merging similar table definitions.

**Benefits:**

- Fits 2x more schema context in the same token budget
- No accuracy loss on downstream SQL generation
- Works with all warehouse dialects

## 4. Lineage-Aware Context Selection

**Bridge method:** `context.select_by_lineage(model_name, manifest, hops) -> relevant_tables[]`

Uses the dbt DAG / lineage graph to scope relevant tables. PageRank-style relevance scoring weights tables by proximity and importance in the dependency graph. A configurable hop distance controls the breadth of context.

**Benefits:**

- Only includes tables relevant to the current model or query
- Reduces schema context by 60-80% for large warehouses
- Leverages existing dbt manifest parsing

## 5. Semantic Schema Catalog

**Bridge method:** `context.generate_catalog(schema, sample_data) -> yaml_catalog`

YAML-based semantic views (similar to Snowflake Cortex Analyst). Auto-generates business descriptions, data types, and relationships from schema plus sample data. Serves as a compressed, human-readable schema representation.

**Benefits:**

- Business-friendly context for the LLM
- More token-efficient than raw DDL
- Auto-generates from existing schema metadata

## 6. Context Budget Allocator

**Bridge method:** `context.allocate_budget(model_limit, task_type) -> { system, schema, conversation, output }`

Explicit token allocation across categories, with dynamic adjustment based on task type (query writing vs. debugging vs. optimization). Prevents any single category from consuming the entire context window.

**Benefits:**

- Prevents schema from crowding out conversation history
- Task-appropriate allocation (more schema for query writing, more conversation for debugging)
- Works with the compaction system to respect budgets
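
Taken together, the planned bridge surface could be typed on the TypeScript side roughly as follows. These typings are an assumption inferred from the signatures above, not a published API; the `task_type` labels in particular are guessed:

```typescript
// Hypothetical typings for the planned altimate-core context bridge.
interface ScoredItem {
  id: string;
  text: string;
  score: number; // relevance score assigned by the local embeddings model
}

interface ContextBridge {
  count_tokens(text: string, modelFamily: string): number;
  score_relevance(items: { id: string; text: string }[], query: string): ScoredItem[];
  compress_schema(schemaDdl: string, tokenBudget: number): string;
  select_by_lineage(modelName: string, manifest: unknown, hops: number): string[];
  generate_catalog(schema: string, sampleData: unknown): string; // YAML text
  allocate_budget(
    modelLimit: number,
    taskType: "query_writing" | "debugging" | "optimization", // assumed labels
  ): { system: number; schema: number; conversation: number; output: number };
}
```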

---

## Review Comment

Valid finding — fixed. The non-`limit.input` path was incorrectly subtracting both `maxOutput` and `reserved` (a double deduction of 20K tokens). Simplified both paths to use a single `headroom = Math.max(reserved, maxOutput)` deduction, which matches the original behavior for default configs (`maxOutput` typically dominates at 32K) while still respecting a custom `reserved` config when set higher.
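
A worked illustration of the fix, using assumed values (a 200K context limit is hypothetical; only the headroom formula comes from the comment above):

```typescript
// Non-limit.input path, before vs. after the fix.
const contextLimit = 200_000; // assumed model context window
const maxOutput = 32_000;     // typical max output tokens
const reserved = 20_000;      // default compaction.reserved

// Before: both values subtracted, understating the usable window
// by min(reserved, maxOutput) = 20K tokens.
const usableBefore = contextLimit - maxOutput - reserved; // 148,000

// After: a single headroom deduction, consistent with the limit.input path.
const usableAfter = contextLimit - Math.max(reserved, maxOutput); // 168,000

const isOverflow = (usedTokens: number): boolean => usedTokens >= usableAfter;
```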