feat(core)!: redesign auto-compaction thresholds with three-tier ladder#4168
LaZzyMan wants to merge 4 commits into
📋 Review Summary

This PR redesigns qwen-code's auto-compaction threshold system from a single 70% proportional threshold to a three-tier ladder (warn/auto/hard) combining proportional fallback with absolute reservation. The implementation aligns with claude-code's design, recovers significant wasted context on large windows (~267K on 1M models), and bundles several related improvements: a 3-strike failure circuit breaker, local token estimation for accurate threshold gating, and predictable output budget control via a capped maxOutputTokens.

🔍 General Feedback
🎯 Specific Feedback

🟡 High Priority Issues
🟢 Medium Priority Issues
🔵 Low Priority Suggestions
✅ Highlights
Replaces the single 70% proportional threshold with a three-tier ladder (warn/auto/hard) that combines proportional fallback with absolute reservation. Large-window models (>=128K) now reserve ~33K instead of 30% of the window, freeing tens of thousands of context tokens that the old formula wasted.

Other improvements bundled in the same redesign:

- Compression sideQuery now disables thinking and caps maxOutputTokens at 20K, matching claude-code so the buffer math is predictable across providers (Anthropic/OpenAI/Gemini handle thinking budgets inconsistently)
- Failure handling upgraded from a one-shot permanent lock to a 3-strike circuit breaker; reactive overflow still latches immediately
- New estimatePromptTokens helper closes the lag-by-one-turn and first-send-is-0 gaps in lastPromptTokenCount
- Hard-tier rescue pulls reactive overflow recovery forward to before the API call, saving an oversized round-trip
- /context command displays the three-tier ladder + current tier
- tipRegistry's context-* tips track the new thresholds instead of fixed 50/80/95 percentages

BREAKING CHANGE: the chatCompression.contextPercentageThreshold setting is removed. Settings files containing the field log a one-line deprecation warning at startup and the value is ignored; behaviour is now controlled by built-in thresholds via the new computeThresholds() function.

Design: docs/design/auto-compaction-threshold-redesign.md
Plan: docs/plans/2026-05-14-auto-compaction-threshold-redesign.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
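The ladder described above can be sketched as follows. This is an illustrative assumption, not the committed code: the ~33K reservation and the >=128K cutoff come from the PR description, but the warn/auto ratios and the 70% small-window fallback factor are placeholder values; the authoritative implementation is `computeThresholds()` in `chatCompressionService.ts`.

```typescript
// Hypothetical three-tier threshold ladder: absolute reservation on
// large windows, proportional fallback on small ones. Constants here
// are illustrative except HARD_RESERVE (~33K) and LARGE_WINDOW (128K),
// which the PR description states.
interface Thresholds {
  warn: number;
  auto: number;
  hard: number;
}

const HARD_RESERVE = 33_000;  // absolute tokens kept free on large windows
const LARGE_WINDOW = 128_000; // cutoff between absolute and proportional modes

function computeThresholds(contextWindow: number): Thresholds {
  const hard =
    contextWindow >= LARGE_WINDOW
      ? contextWindow - HARD_RESERVE      // e.g. 1M window keeps ~967K usable
      : Math.floor(contextWindow * 0.7);  // proportional fallback (placeholder factor)
  // warn and auto sit below hard so compaction can run before overflow.
  return {
    warn: Math.floor(hard * 0.8),
    auto: Math.floor(hard * 0.9),
    hard,
  };
}
```

On a 1M window this reserves a flat 33K, whereas the old 30% proportional rule would have reserved 300K, which is the ~267K recovery the description cites.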
…ss test

A pre-existing test case at chatCompressionService.test.ts:678 still passed `hasFailedCompressionAttempt: false` in the CompressOptions shape; rebasing onto current main surfaced this as a typecheck error because the field was renamed to `consecutiveFailures` (Task 7 of the three-tier ladder migration). Updated to `consecutiveFailures: 0` — semantically equivalent: the test asserts the side-query is called when `force: true`; no other behaviour change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1dcef8c to d270af0
Adds a defensive guard in ChatCompressionService.compress() that detects when the side-query summary hit COMPACT_MAX_OUTPUT_TOKENS (20K). In that case the summary is likely truncated mid-content, so we drop it and return NOOP rather than persist a half-summary. The next send re-tries; reactive overflow still catches the catastrophic case where the API rejects the next request as too large.

Documented in the design doc as risk #2; the bot reviewer on PR #4168 correctly pushed for it to land alongside the threshold redesign rather than as a follow-up, since the new 20K cap is what makes truncation likely in the first place.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
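The shape of that guard might look roughly like this minimal sketch. It assumes a two-variant result type for illustration; the real `compress()` returns a richer result and reads the output token count from the side-query response.

```typescript
// Sketch of the truncation guard: if the summary consumed the entire
// output budget, it was most likely cut off mid-content, so we drop it
// and let the next send retry rather than persist a half-summary.
const COMPACT_MAX_OUTPUT_TOKENS = 20_000;

type CompressResult =
  | { kind: 'COMPRESSED'; summary: string }
  | { kind: 'NOOP' };

function guardTruncatedSummary(
  summary: string,
  compressionOutputTokenCount: number,
): CompressResult {
  if (compressionOutputTokenCount >= COMPACT_MAX_OUTPUT_TOKENS) {
    // Summary likely truncated mid-content; discard and retry later.
    return { kind: 'NOOP' };
  }
  return { kind: 'COMPRESSED', summary };
}
```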
Review response — commit
| # | Outcome | Notes |
|---|---|---|
| 🟡 H1 — MAX_TOKENS guard | ✅ Fixed | Added a defensive check in compress() that NOOPs when compressionOutputTokenCount >= COMPACT_MAX_OUTPUT_TOKENS so a truncated summary isn't persisted. Includes a unit test asserting the path. Commit 6ce81e73c. |
| 🟡 H2 — estimateContentTokens footgun | ❌ Declined (false positive) | The "MUST pass precomputedCharCounts" warning the comment references is on findCompressSplitPoint, not estimateContentTokens. The latter's imageTokenEstimate parameter has a benign default (DEFAULT_IMAGE_TOKEN_ESTIMATE = 1600) that matches the splitter's default, keeping the two estimators in sync. Different functions, different contracts. |
| 🟢 M1 — Hide TOOL_ROUND_RETAIN_COUNT | ❌ Declined (out of PR scope) | TOOL_ROUND_RETAIN_COUNT was already exported in chatCompressionService.ts before this PR. Reducing export visibility is a separate cleanup that I don't want to bundle into a threshold-redesign change. |
| 🟢 M2 — contextCommand.ts:177-183 still references the deprecated field | ❌ Declined (stale read) | This was actually rewritten by Task 11 of the redesign — contextCommand.ts now imports computeThresholds (line 28) and uses computeThresholds(contextWindowSize) (line 190). Grepping for contextPercentageThreshold in packages/cli/src/ui/commands/contextCommand.ts comes up empty. |
| 🟢 M3 — consecutiveFailures across --continue | ❌ Declined (works as intended) | consecutiveFailures is a private field on GeminiChat initialized to 0. --continue constructs a fresh GeminiChat (history is restored separately), so the counter naturally resets — which is the correct semantics: a restarted session should get a fresh 3-strike budget rather than inheriting a latched breaker from a previous run. |
| 🔵 L1 — JSDoc @example with threshold table | ❌ Declined (filter 3) | The same table lives in docs/design/auto-compaction-threshold-redesign.md (committed in this PR). Duplicating it in JSDoc creates two sources of truth that can drift independently when the constants are tuned. |
| 🔵 L2 — Missing BYTES_PER_TOKEN_JSON = 2 | ❌ Declined (not in code) | BYTES_PER_TOKEN_JSON doesn't exist in tokenEstimation.ts. The design doc only mentions it as a future possibility for JSON-dense content; the implementation deliberately uses a single BYTES_PER_TOKEN = 4 ratio (matching claude-code's approach). |
| 🔵 L3 / L4 — Verify files in PR | N/A | Both packages/cli/src/services/tips/tipRegistry.ts and packages/core/src/index.ts are in this PR (commit 28eb867a8); please re-check the changed-files view. |
Net: 1 fix accepted, 5 declined, 2 hallucinations dismissed. Force-pushed earlier (rebase onto main + the consecutiveFailures test fixup d270af030); this comment lands on top of the new merge-conflict-free branch tip 6ce81e73c.
🤖 Drafted with Claude Code using the review-response skill.
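The 3-strike semantics defended under M3 above could be sketched like this. The class name and shape are hypothetical (the real counter is a private field on GeminiChat, not a standalone class); the behaviour shown is what the review response describes: three consecutive failures latch the breaker, reactive overflow latches immediately, and a fresh instance (as on `--continue`) starts with a clean budget.

```typescript
// Illustrative 3-strike circuit breaker for auto-compaction failures.
// Names are hypothetical; in the PR the counter lives on GeminiChat.
const MAX_CONSECUTIVE_FAILURES = 3;

class CompressionBreaker {
  private consecutiveFailures = 0;
  private latched = false;

  recordFailure(reactiveOverflow = false): void {
    this.consecutiveFailures += 1;
    // Reactive overflow (API rejected the request as too large) latches
    // immediately; ordinary failures only latch on the third strike.
    if (reactiveOverflow || this.consecutiveFailures >= MAX_CONSECUTIVE_FAILURES) {
      this.latched = true;
    }
  }

  recordSuccess(): void {
    // A successful compaction resets the strike count; the latch itself
    // persists for the rest of the session once tripped.
    this.consecutiveFailures = 0;
  }

  get isOpen(): boolean {
    return this.latched;
  }
}
```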
Code Coverage Summary
CLI Package - Full Text Report
Core Package - Full Text Report

For detailed HTML reports, please see the 'coverage-reports-22.x-ubuntu-latest' artifact from the main CI run.
The Task 11 redesign updated the non-interactive text formatter
(formatContextUsageText) but left ContextUsage.tsx — the interactive
React component that real /context users see — unchanged. As a result
the TUI still showed the old single "Autocompact buffer" line and none
of the new warn/auto/hard ladder.
Adds a "Compaction thresholds" section after the per-category breakdown:
- Effective window
- Warn / Auto / Hard threshold rows with a ▶ marker on the row the
current usage has crossed
- Current tier label coloured by severity (safe→green, warn/auto→
yellow, hard→red)
The existing progress bar legend (Used / Free / Autocompact buffer)
is preserved because it's tied to the three-segment progress bar
visualisation; the new section adds the absolute numbers + tier badge
on top of that.
Caught by the tmux e2e test (PR #4168 ci-monitor follow-up). Before
the fix, the asserted string 'Compaction thresholds' was missing
entirely from the TUI; after the fix, the new section renders correctly
for fresh and live sessions on 1M / 200K / 128K windows.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
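The tier derivation and severity colouring this commit describes (safe→green, warn/auto→yellow, hard→red, with a ▶ marker on the crossed row) could be sketched as below. Function and type names are illustrative, not the component's actual API.

```typescript
// Hypothetical helpers for the /context TUI section: derive the current
// tier from usage vs. the ladder, then map tier to badge colour.
type Tier = 'safe' | 'warn' | 'auto' | 'hard';

function currentTier(used: number, warn: number, auto: number, hard: number): Tier {
  if (used >= hard) return 'hard';
  if (used >= auto) return 'auto';
  if (used >= warn) return 'warn';
  return 'safe';
}

// Severity scheme from the commit message: safe is green, warn and auto
// share yellow, hard is red.
function tierColor(tier: Tier): 'green' | 'yellow' | 'red' {
  switch (tier) {
    case 'safe':
      return 'green';
    case 'warn':
    case 'auto':
      return 'yellow';
    case 'hard':
      return 'red';
  }
}
```

The ▶ marker then goes on the highest threshold row that `used` has crossed, i.e. the row matching `currentTier`.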
E2E test report

Following up on review/CI feedback, several additional rounds of E2E verification were run against the core functionality. Final coverage matrix:

✅ Real-model E2E (most important). Configuration:

Measured 97.7% context reduction; all 6 assertions passed:

✅ TUI
Summary
Replaces the single 70% proportional threshold with a three-tier ladder (warn/auto/hard), caps maxOutputTokens on the compression sideQuery, upgrades failure handling from a one-shot lock to a 3-strike circuit breaker, adds a local token estimator for the cheap-gate, plumbs a hard-tier rescue into sendMessageStream, rewires the /context command and tipRegistry tips around the new thresholds, and removes the chatCompression.contextPercentageThreshold setting.

Among the benefits: --continue coverage (lastPromptTokenCount = 0 previously bypassed all gates) and predictable buffer math across providers (thinking budget semantics vary).

Key files:

- packages/core/src/services/chatCompressionService.ts — new computeThresholds(), tier constants, cheap-gate
- packages/core/src/services/tokenEstimation.ts — local char/4 estimator
- packages/core/src/core/geminiChat.ts — hard-tier rescue + consecutiveFailures breaker
- packages/cli/src/ui/commands/contextCommand.ts — /context display

Validation
```bash
npm run typecheck # clean (4 workspaces)
npm run lint # clean (project files; pre-existing e2e-testing/scripts/*.js untouched)
cd packages/core && npx vitest run src/services src/core # 1930/1930 pass
cd packages/cli && npx vitest run # 5995/5995 pass + 9 skipped
```
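The local char/4 estimator mentioned in the summary (tokenEstimation.ts) might look roughly like the following sketch. The function and constant names mirror those discussed in the review thread, but the signature is an assumption; only the chars/4 ratio and the 1600-token default image estimate are stated in this PR.

```typescript
// Sketch of a local token estimator, assuming the chars/4 heuristic
// (BYTES_PER_TOKEN = 4) plus a flat per-image estimate. Used as a cheap
// gate before committing to a real API token count.
const BYTES_PER_TOKEN = 4;
const DEFAULT_IMAGE_TOKEN_ESTIMATE = 1600;

function estimateContentTokens(
  text: string,
  imageCount = 0,
  imageTokenEstimate = DEFAULT_IMAGE_TOKEN_ESTIMATE,
): number {
  // Text contributes ~1 token per 4 characters; images a flat estimate each.
  return Math.ceil(text.length / BYTES_PER_TOKEN) + imageCount * imageTokenEstimate;
}
```

An estimator like this closes the lag-by-one-turn and first-send-is-0 gaps the description attributes to relying solely on lastPromptTokenCount, since it can be computed locally before the first send.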
Threshold table across windows (matches design doc):
Scope / Risk
Testing Matrix
Testing matrix notes:
Design references
Both are committed in this PR so the rationale is visible alongside the code.
🤖 Generated with Claude Code