feat: context management improvements — overflow recovery, observation masking, token estimation (#35)
Conversation
```ts
let structuredOutput: unknown | undefined

// …

let step = 0
let compactionAttempts = 0
const MAX_COMPACTION_ATTEMPTS = 3
const session = await Session.get(sessionID)
while (true) {
  SessionStatus.set(sessionID, { type: "busy" })
```
Fixed. The `compactionAttempts` counter now resets to 0 on `result === "continue"`. This preserves loop protection within a single turn while preventing accumulation across unrelated turns.
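The reset semantics described above can be sketched as a small state machine. This is a minimal illustration, not the PR's actual code: `nextAttempts`, the `StepResult` union, and the error message are assumptions; only the increment-on-compact / reset-on-continue behavior is taken from the discussion.

```typescript
// Hypothetical sketch of the per-turn compaction counter logic.
type StepResult = "continue" | "compact" | "done"

const MAX_COMPACTION_ATTEMPTS = 3

function nextAttempts(result: StepResult, attempts: number): number {
  switch (result) {
    case "compact":
      // A compaction step counts toward the per-turn limit.
      if (attempts + 1 > MAX_COMPACTION_ATTEMPTS) {
        throw new Error("max compaction attempts exceeded")
      }
      return attempts + 1
    case "continue":
      // Successful non-compaction step: reset, so unrelated turns
      // later in the session don't accumulate toward the limit.
      return 0
    case "done":
      return attempts
  }
}
```

The key point is that the counter only grows across *consecutive* compaction steps; any successful non-compaction step zeroes it.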
feat: context management improvements — overflow recovery, observation masking, token estimation

Fix critical bugs in context overflow detection and add production hardening for long-running agent sessions:

- Fix NamedError.isInstance(null) crash that would kill the agent if a provider returned a null error object
- Fix isOverflow() headroom gap when limit.input is set — compaction now correctly reserves space for output tokens on models with separate input/output limits (fixes upstream bugs #10634, #8089, #11086)
- Add compaction loop protection (max 3 attempts) to prevent infinite compact→overflow→compact cycles
- Replace generic "[Old tool result content cleared]" with observation masks that preserve tool name, args, output size, and first-line fingerprint for better model continuity after pruning
- Content-aware token estimation (code: 3.0, JSON: 3.2, SQL: 3.5, text: 4.0) replacing the flat chars/4 heuristic
- Add Azure OpenAI overflow detection patterns
- DE-aware compaction template preserving warehouse, schema, dbt, lineage, and FinOps context during summarization
- Guard against empty observation masks sending an empty tool_result
- Add context management documentation page

153 tests across 4 test files, 1332 total suite passing.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Force-pushed from c60b2a0 to 64c004a
```ts
? input.model.limit.input - Math.max(reserved, maxOutput)
: context - maxOutput - reserved
```
Valid finding — fixed. The non-limit.input path was incorrectly subtracting both maxOutput and reserved (double-deduction of 20K tokens). Simplified both paths to use a single headroom = Math.max(reserved, maxOutput) deduction, which matches the original behavior for default configs (maxOutput typically dominates at 32K) while still respecting custom reserved config when set higher.
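The single-deduction rule (and the related negative-usable guard fixed later in this PR) can be sketched as follows. The `ModelLimits` shape, function names, and signatures are assumptions for illustration; the arithmetic — one `Math.max(reserved, maxOutput)` headroom deducted from either `limit.input` or the context window — is what the comment above describes.

```typescript
// Sketch of the corrected headroom computation (not the PR's actual code).
interface ModelLimits {
  context: number  // total context window
  input?: number   // separate input limit, when the provider has one
}

function usableTokens(limits: ModelLimits, maxOutput: number, reserved: number): number {
  // Single deduction: whichever of the reserved buffer or the output
  // budget is larger — never both (the double-deduction bug).
  const headroom = Math.max(reserved, maxOutput)
  const base = limits.input ?? limits.context
  const usable = base - headroom
  // Guard: a headroom larger than the window must not yield a negative
  // budget that would trigger compaction on every turn.
  return usable > 0 ? usable : 0
}

function isOverflow(used: number, limits: ModelLimits, maxOutput: number, reserved: number): boolean {
  const usable = usableTokens(limits, maxOutput, reserved)
  return usable > 0 && used > usable
}
```

With the defaults quoted in the thread (maxOutput 32K, reserved 20K buffer), a 200K-context model gets 168K usable rather than the old path's 148K.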
fix: reset compactionAttempts counter after successful processing step

The counter was only incremented but never reset, causing it to accumulate across unrelated user turns within a session. After 3 successful compactions spread across many turns, the 4th would incorrectly trigger the "max attempts" error. Now resets to 0 after each successful non-compaction step.

Fixes Sentry review comment on PR #35.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
jontsai left a comment

LGTM
Solid context management improvements. The P0 fixes are correct and well-tested:
isOverflow headroom fix — The old code with limit.input only subtracted reserved without accounting for maxOutputTokens, meaning compaction triggered too late. The new Math.max(reserved, maxOutput) properly reserves space for the model's response. Good catch — this explains the upstream overflow bugs.
NamedError null guard — Simple and correct. input != null before typeof prevents the crash path.
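The ordering the reviewer calls out matters because `typeof null === "object"` in JavaScript. A minimal sketch of the pattern — `NamedError`'s real shape is an assumption; only the `input != null` check coming first is the point:

```typescript
// Hypothetical sketch of the null guard pattern in NamedError.isInstance.
function isInstance(input: unknown, name: string): boolean {
  // `typeof null === "object"`, so the null check must come first to
  // avoid the crash path when a provider returns a null error object.
  return (
    input != null &&
    typeof input === "object" &&
    "name" in input &&
    (input as { name: unknown }).name === name
  )
}
```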
Compaction loop protection — Max 3 attempts with proper reset on successful non-compaction steps is a clean design. The duplicate code block in prompt.ts (overflow_detection vs error_recovery paths) could be extracted into a helper, but that's a nit.
Observation masks — Nice fingerprinting approach. Surrogate pair safety in truncateArgs is a good detail. The fallback to generic message in message-v2.ts when no mask exists maintains backward compat.
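A sketch of what building such a mask might look like, following the format shown in the PR summary (`[Tool output cleared — bash(command: "ls") returned 42 lines, 1.2 KB — "file1.txt"]`). `buildMask` and its parameters are assumptions; the real code additionally handles surrogate-pair-safe argument truncation, which is omitted here.

```typescript
// Hypothetical sketch of an observation mask builder.
function buildMask(tool: string, args: string, output: string): string {
  const lines = output.split("\n").length
  const kb = (output.length / 1024).toFixed(1)
  // First-line fingerprint lets the model re-identify the pruned result.
  const fingerprint = output.split("\n")[0].slice(0, 80)
  return `[Tool output cleared — ${tool}(${args}) returned ${lines} lines, ${kb} KB — "${fingerprint}"]`
}
```

Compared with a generic placeholder, the mask preserves enough identity (tool, args, size, first line) that the model can decide whether it needs to re-run the tool.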
Token estimation — Content-aware ratios are a meaningful improvement over flat chars/4. The 500-char sample limit for classification is pragmatic.
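The ratios quoted in this review (code 3.0, JSON 3.2, SQL 3.5, prose 4.0 chars per token) and the 500-char sample limit can be illustrated as below. The classification heuristics themselves are assumptions — the PR's actual detection rules are not shown in this thread.

```typescript
// Hypothetical sketch of content-aware token estimation.
function estimateTokens(text: string): number {
  // Classify on a bounded sample so huge outputs stay cheap to estimate.
  const sample = text.slice(0, 500)
  let ratio = 4.0 // prose default (the old flat chars/4 heuristic)
  if (/^\s*[[{"]/.test(sample)) {
    ratio = 3.2 // JSON-ish: starts with a bracket, brace, or quote
  } else if (/\b(SELECT|INSERT|CREATE|FROM)\b/i.test(sample)) {
    ratio = 3.5 // SQL keywords
  } else if (/[;{}=]|=>|\bfunction\b|\bdef\b/.test(sample)) {
    ratio = 3.0 // code-like punctuation density
  }
  return Math.ceil(text.length / ratio)
}
```

Denser tokenization for code and SQL means their estimates come out higher than chars/4, which makes overflow detection trigger earlier on tool-heavy transcripts.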
✓ All source changes reviewed
✓ Telemetry events well-typed
✓ Test coverage is thorough (95+ new tests)
✓ Docs and DE-aware compaction template are good additions
✓ PAID_CONTEXT_FEATURES.md is useful roadmap documentation
Ship it! 🚢
test: add compaction loop protection tests

32 tests covering the compactionAttempts counter state machine:

- Basic counter increment/reset behavior
- Sentry fix validation: counter resets between successful turns
- Overflow detection and compact result paths share the counter
- MAX_COMPACTION_ATTEMPTS (3) loop protection
- Realistic multi-turn session scenarios
- isOverflow boundary conditions and config edge cases

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Multi-Model Code Review — PR #35: feat/context-management-improvements
Verdict: REQUEST CHANGES
🔴 Critical (1)
fix: remove double-deduction in isOverflow for non-limit.input models

Sentry correctly flagged that the non-limit.input path was subtracting both maxOutput AND reserved (20K buffer), causing compaction to trigger 20K tokens too early for most production models. Simplified both paths to use a single headroom = Math.max(reserved, maxOutput). For default configs (maxOutput=32K > buffer=20K), this matches the original upstream behavior while preserving the P0 fix for limit.input models.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
fix: address multi-model code review findings on PR #35

- Fix text-end overwriting time.start (processor.ts) — use spread to preserve the original start timestamp, matching the reasoning-end handler
- Guard negative usable in isOverflow (compaction.ts) — return false when headroom exceeds base instead of producing a negative usable that would trigger compaction on every turn
- Remove dead "manual" trigger type from telemetry union
- Add tests for negative usable edge cases

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
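The text-end fix in the first bullet is a common spread-vs-overwrite bug. A minimal sketch under assumed shapes — the `TimedPart` interface and `onTextEnd` name are illustrations, not the real processor.ts API:

```typescript
// Hypothetical sketch of the timestamp-preserving spread fix.
interface TimedPart {
  time: { start: number; end?: number }
}

function onTextEnd(part: TimedPart, now: number): TimedPart {
  // Spreading the existing time object preserves the original `start`;
  // the bug was assigning a fresh `{ end: now }`, which dropped it.
  return { ...part, time: { ...part.time, end: now } }
}
```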
Thanks for the thorough multi-model review! Here's the triage and resolution:

- Fixed (3df53b5)
- Not a real bug
- Out of scope (pre-existing / other PRs)

All 1366 tests pass.
docs: update context management docs for review findings

- Document loop protection (max 3 attempts, reset between turns)
- Fix reserved config example (was 4096, should be 20000)
- Clarify that the reserved field uses max(reserved, max_output) as headroom
- Add CJK/emoji token estimation limitation note

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
feat: context management improvements — overflow recovery, observation masking, token estimation (#35)

Squashed commits (full messages above):

* feat: context management improvements — overflow recovery, observation masking, token estimation
* fix: reset compactionAttempts counter after successful processing step
* test: add compaction loop protection tests
* fix: remove double-deduction in isOverflow for non-limit.input models
* fix: address multi-model code review findings on PR #35
* docs: update context management docs for review findings

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Summary
- NamedError.isInstance null guard (packages/util/src/error.ts)
- Models with a separate limit.input (e.g., Claude with prompt caching) now correctly reserve space for output tokens, triggering compaction before overflow instead of after (fixes upstream bugs #10634, #8089, #11086)
- Compaction loop protection (prompt.ts)
- Observation masks: [Tool output cleared — bash(command: "ls") returned 42 lines, 1.2 KB — "file1.txt"] instead of the generic [Old tool result content cleared]
- Guard against sending empty tool_result content to model APIs

Files changed (16 files, +1491 / -15)
Source (8 files)
- packages/util/src/error.ts — NamedError.isInstance null guard
- src/session/compaction.ts — isOverflow headroom fix + observation masks + DE template
- src/session/processor.ts
- src/session/prompt.ts
- src/session/message-v2.ts
- src/provider/error.ts
- src/util/token.ts
- src/agent/prompt/compaction.txt

Tests (4 files)
- test/util/token.test.ts
- test/session/compaction.test.ts — isOverflow bug repros, surrogate pairs
- test/session/context-overflow.test.ts — fromError edge cases
- test/session/message-v2.test.ts
Docs (3 files)

- docs/docs/configure/context-management.md
- docs/docs/configure/config.md
- docs/mkdocs.yml

Competitive comparison
Test plan
🤖 Generated with Claude Code