Skip to content

Commit 1f856ba

Browse files
committed
fix(core,cli): address PR #4345 round-2 review feedback
- geminiChat: remove pre-call consecutiveFailures reset in hard-rescue. force=true already bypasses the breaker check in chatCompressionService; the pre-reset was redundant on success (post-call L614 already handles it) and *broke* the breaker on failure paths — hard-rescue failures don't increment via tryCompress (force=true skips that branch), only the reactive overflow path at L992 explicitly increments. With the pre-reset the counter oscillated 0↔1 every send and MAX_CONSECUTIVE_FAILURES=3 was unreachable. Wrote a RED test asserting the forwarded counter is the latched value, not zero; the test failed against the old code and passes with the reset removed. - geminiChat: log hard-tier-rescue triggers via debugLogger.warn including effectiveTokens, hard, and the current consecutiveFailures so operators debugging "compaction stopped working" have a breadcrumb. - chatCompressionService: clamp effectiveWindow to >= 0 in computeThresholds so the value surfaced in /context stays meaningful for tiny windows (window < SUMMARY_RESERVE). auto/warn/hard outputs are unaffected because each is Math.max(proportional, absolute) and the proportional branch dominates whenever the absolute branch goes negative. - turn.ts: rewrite COMPRESSION_FAILED_OUTPUT_TRUNCATED docstring. Drop the misleading "compression succeeded" framing (the summary is dropped and isCompressionFailureStatus returns true) and reference the full enum name COMPRESSION_FAILED_EMPTY_SUMMARY instead of the abbreviation. - contextCommand.test.ts: reword the no-API-data-session test comment. collectContextData classifies estimated sessions against rawOverhead; with default fixtures rawOverhead lands in `safe`, but heavy system-prompt / skill / MCP loads can push it into warn/auto/hard. - design doc Background: prepend a blockquote clarifying the section describes pre-redesign behavior and that the inline file:line references point at code before PR #4345 (which removes them). - ui/types: replace the duplicated ContextThresholds interface with a type alias to the core's CompactionThresholds. Field-by-field copy in contextCommand.ts becomes a direct spread. ContextUsage.tsx keeps its CompactionThresholds React component name — the alias avoids the collision a direct import would have caused. - contextCommand: interpolate the actual reserve value into the "(window − 20K reserve)" annotation so SUMMARY_RESERVE retuning doesn't leave the text stale.
1 parent f91b7e4 commit 1f856ba

8 files changed

Lines changed: 85 additions & 46 deletions

File tree

docs/design/auto-compaction-threshold-redesign.md

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -4,6 +4,8 @@
44

55
## 背景
66

7+
> 本节描述本 PR 落地**之前**的状态(pre-redesign behavior)。下文出现的 `COMPRESSION_TOKEN_THRESHOLD``thinkingConfig.includeThoughts = true``hasFailedCompressionAttempt`、以及具体的 file:line 引用都对应 PR #4345 合入前的代码——合入后这些符号 / 行号会不再有效。
8+
79
当前 qwen-code 的自动压缩仅使用单一比例阈值 `COMPRESSION_TOKEN_THRESHOLD = 0.7``chatCompressionService.ts:33`),所有窗口大小共用同一比例。对比 claude-code 的「绝对 token 梯子」(autoCompact.ts:62-65),qwen-code 存在三个具体问题:
810

911
1. **大窗口下预留过多**:1M 模型 70% 阈值在 700K 触发,剩余 300K 远超摘要 + 输出实际所需的 ~33K

packages/cli/src/ui/commands/contextCommand.test.ts

Lines changed: 8 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -148,10 +148,14 @@ describe('/context shows three-tier thresholds', () => {
148148
});
149149

150150
it('treats no-API-data sessions as safe and omits the threshold section from text', async () => {
151-
// lastPromptTokenCount = 0 → collectContextData uses the estimated branch:
152-
// currentTier should be `safe` regardless of overhead size, and
153-
// formatContextUsageText must NOT emit the "Compaction thresholds" section
154-
// because the estimated path renders a different layout.
151+
// lastPromptTokenCount = 0 → collectContextData uses the estimated branch
152+
// (classifies against `rawOverhead`, not apiTotalTokens). With these
153+
// default fixtures rawOverhead lands well below `warn`, so currentTier
154+
// resolves to `safe`. On heavy system-prompt / skill / MCP loads the
155+
// estimated branch can return warn/auto/hard — this test only covers
156+
// the default-fixture safe case. formatContextUsageText must NOT emit
157+
// the "Compaction thresholds" section because the estimated path
158+
// renders a different layout.
155159
mockGetLastPromptTokenCount.mockReturnValue(0);
156160
const data = await collectContextData(makeMockConfig(200_000), false);
157161
expect(data.breakdown.currentTier).toBe('safe');

packages/cli/src/ui/commands/contextCommand.ts

Lines changed: 2 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -332,12 +332,7 @@ export async function collectContextData(
332332
messages: messagesTokens,
333333
freeSpace,
334334
autocompactBuffer,
335-
thresholds: {
336-
effectiveWindow: thresholds.effectiveWindow,
337-
warn: thresholds.warn,
338-
auto: thresholds.auto,
339-
hard: thresholds.hard,
340-
},
335+
thresholds,
341336
currentTier: currentTier(tierTokens, thresholds),
342337
};
343338

@@ -428,7 +423,7 @@ export function formatContextUsageText(data: HistoryItemContextUsage): string {
428423
lines.push('');
429424
lines.push('**Compaction thresholds**');
430425
lines.push(
431-
` Effective window: ${formatNum(breakdown.thresholds.effectiveWindow)} (window − 20K reserve)`,
426+
` Effective window: ${formatNum(breakdown.thresholds.effectiveWindow)} (window − ${formatNum(contextWindowSize - breakdown.thresholds.effectiveWindow)} reserve)`,
432427
);
433428
lines.push(` Warn threshold: ${formatNum(breakdown.thresholds.warn)}`);
434429
lines.push(` Auto threshold: ${formatNum(breakdown.thresholds.auto)}`);

packages/cli/src/ui/types.ts

Lines changed: 9 additions & 10 deletions
Original file line numberDiff line numberDiff line change
@@ -5,6 +5,7 @@
55
*/
66

77
import type {
8+
CompactionThresholds,
89
CompressionStatus,
910
MCPServerConfig,
1011
ThoughtSummary,
@@ -344,16 +345,14 @@ export type HistoryItemMcpStatus = HistoryItemBase & {
344345

345346
export type ContextTier = 'safe' | 'warn' | 'auto' | 'hard';
346347

347-
export interface ContextThresholds {
348-
/** Window minus 20K summary reserve — the budget available for input + summary. */
349-
effectiveWindow: number;
350-
/** Token count at which the warn tier triggers. */
351-
warn: number;
352-
/** Token count at which auto-compaction triggers. */
353-
auto: number;
354-
/** Token count at which auto-compaction is forced (resets failure counter). */
355-
hard: number;
356-
}
348+
/**
349+
* Alias for the core compaction-thresholds shape. Re-exported under the
350+
* CLI-friendly name so consumers in this package don't pull on the core
351+
* module path; structurally identical to `CompactionThresholds`. The
352+
* `readonly` modifiers on the core type are immaterial for UI rendering,
353+
* but kept implicitly through the alias.
354+
*/
355+
export type ContextThresholds = CompactionThresholds;
357356

358357
export interface ContextCategoryBreakdown {
359358
systemPrompt: number;

packages/core/src/core/geminiChat.test.ts

Lines changed: 29 additions & 13 deletions
Original file line numberDiff line numberDiff line change
@@ -2067,16 +2067,30 @@ describe('GeminiChat', async () => {
20672067
).toBe(true);
20682068
});
20692069

2070-
it('resets consecutiveFailures before forcing when hard threshold crossed', async () => {
2071-
// Pre-latch the breaker by failing the unforced cheap-gate
2072-
// MAX_CONSECUTIVE_FAILURES times below the hard threshold.
2070+
it('forwards latched consecutiveFailures into hard-rescue (no pre-call reset); success recovers via the post-call branch', async () => {
2071+
// Hard-rescue uses force=true, which already bypasses the
2072+
// chatCompressionService breaker (chatCompressionService.ts:339
2073+
// checks `!force`) regardless of the counter value — so a pre-call
2074+
// reset is unnecessary for "let the latched breaker recover".
2075+
//
2076+
// Pre-resetting would in fact DEFEAT the breaker on
2077+
// persistent-failure sessions: hard-rescue failures don't increment
2078+
// via tryCompress (force=true skips `if (!force)` at L627), and
2079+
// only the reactive overflow path at L992 explicitly increments.
2080+
// If hard-rescue zeroed the counter on every send, the L992
2081+
// increment would be wiped next send and the counter would
2082+
// oscillate 0↔1 indefinitely.
2083+
//
2084+
// Correct behavior asserted here: hard-rescue forwards the existing
2085+
// counter value as-is; on COMPRESSED success the post-call branch
2086+
// at geminiChat.ts:614 resets to 0 (recovering a latched session).
20732087
const compressSpy = vi.spyOn(
20742088
ChatCompressionService.prototype,
20752089
'compress',
20762090
);
20772091

2078-
// The latching sends never touch the hard tier; lastPromptTokenCount is
2079-
// small enough that effective < hard, so force stays false on each.
2092+
// Step 1: latch the breaker via MAX_CONSECUTIVE_FAILURES below-hard
2093+
// failures (cheap-gate path, force=false).
20802094
compressSpy.mockResolvedValue({
20812095
newHistory: null,
20822096
info: {
@@ -2101,14 +2115,15 @@ describe('GeminiChat', async () => {
21012115
}
21022116
expect(compressSpy.mock.calls[i][1].force).toBe(false);
21032117
}
2104-
// The counter is now at MAX_CONSECUTIVE_FAILURES (latched).
2118+
// Pre-increment semantic: i-th call sees i; counter on chat is now
2119+
// MAX_CONSECUTIVE_FAILURES (latched).
21052120
expect(compressSpy.mock.calls.at(-1)![1].consecutiveFailures).toBe(
21062121
MAX_CONSECUTIVE_FAILURES - 1,
21072122
);
21082123

2109-
// Now bump lastPromptTokenCount into hard tier and send again. The
2110-
// hard-tier rescue must reset the counter and force=true on the call —
2111-
// not short-circuit on the latched breaker.
2124+
// Step 2: bump lastPromptTokenCount into hard tier and send again.
2125+
// Hard-rescue fires (force=true) and the COMPRESSED result triggers
2126+
// the post-call reset at geminiChat.ts:614.
21122127
compressSpy.mockClear();
21132128
compressSpy.mockResolvedValueOnce({
21142129
newHistory: [
@@ -2125,17 +2140,18 @@ describe('GeminiChat', async () => {
21252140
const rescueStream = await chat.sendMessageStream(
21262141
'test-model',
21272142
{ message: 'rescue me' },
2128-
'prompt-hard-rescue-reset',
2143+
'prompt-hard-rescue-no-prereset',
21292144
);
21302145
for await (const _ of rescueStream) {
21312146
/* consume */
21322147
}
21332148

21342149
expect(compressSpy).toHaveBeenCalledTimes(1);
2135-
// Counter forwarded to the service must be 0 (reset before the call),
2136-
// not MAX_CONSECUTIVE_FAILURES (which would gate the cheap-gate).
2137-
expect(compressSpy.mock.calls[0][1].consecutiveFailures).toBe(0);
21382150
expect(compressSpy.mock.calls[0][1].force).toBe(true);
2151+
// Counter forwarded as-is — the LATCHED value, NOT zero.
2152+
expect(compressSpy.mock.calls[0][1].consecutiveFailures).toBe(
2153+
MAX_CONSECUTIVE_FAILURES,
2154+
);
21392155
});
21402156

21412157
it('does not force when tokens are below hard threshold (normal auto path)', async () => {

packages/core/src/core/geminiChat.ts

Lines changed: 20 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -736,9 +736,7 @@ export class GeminiChat {
736736
// Hard-tier rescue: when the estimated prompt size is at or above the
737737
// hard threshold (effectiveWindow - HARD_BUFFER), force compaction in
738738
// this send instead of waiting for the API to reject the request as too
739-
// large. This also resets the consecutive-failure counter so a session
740-
// that previously latched the breaker can recover — hard implies the
741-
// next API call would very likely overflow without compaction.
739+
// large.
742740
//
743741
// We compute `effectiveTokens` ONCE here and pass it through to
744742
// tryCompress → service.compress so the cheap-gate doesn't redo the
@@ -747,6 +745,17 @@ export class GeminiChat {
747745
// hard-tier rescue used the default imageTokenEstimate while the
748746
// cheap-gate inside tryCompress used the user's resolved value.
749747
// (review #4168 R1.3 + R1.4)
748+
//
749+
// The consecutive-failure counter is NOT pre-reset here. force=true
750+
// already bypasses the breaker (chatCompressionService.ts:339 checks
751+
// `!force`), so a latched session can still attempt hard-rescue;
752+
// pre-resetting would defeat the breaker entirely because hard-rescue
753+
// failures don't increment via tryCompress (force=true skips the
754+
// `if (!force)` increment), and only reactive overflow at line 992
755+
// increments explicitly. With a pre-reset the counter would oscillate
756+
// 0↔1 across sends and never trip. On COMPRESSED success the post-call
757+
// branch at line 614 still resets to 0, which is the correct recovery
758+
// path for a previously-latched session.
750759
const contextLimit =
751760
this.config.getContentGeneratorConfig()?.contextWindowSize ??
752761
DEFAULT_TOKEN_LIMIT;
@@ -758,6 +767,11 @@ export class GeminiChat {
758767
// API-authoritative count + a tiny estimate of just the new user
759768
// message — it does NOT touch the history at all in that branch, so
760769
// skip the costly `getHistory(true)` clone on the steady-state path.
770+
// The lastPromptTokenCount=0 branch (first send after --continue
771+
// restore / subagent inheritance) walks history with a char/4
772+
// heuristic that can under-count by ~15-20K tokens; the reactive
773+
// overflow path at line 944 is the documented safety net when this
774+
// under-count causes hard-rescue to miss.
761775
const effectiveTokens = estimatePromptTokens(
762776
this.lastPromptTokenCount > 0 ? [] : this.getHistory(true),
763777
userContent,
@@ -766,7 +780,9 @@ export class GeminiChat {
766780
);
767781
const shouldForceFromHard = effectiveTokens >= hard;
768782
if (shouldForceFromHard) {
769-
this.consecutiveFailures = 0;
783+
debugLogger.warn(
784+
`[compaction] hard-tier rescue triggered: effectiveTokens=${effectiveTokens}, hard=${hard}, consecutiveFailures=${this.consecutiveFailures}.`,
785+
);
770786
}
771787

772788
compressionInfo = await this.tryCompress(

packages/core/src/core/turn.ts

Lines changed: 9 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -173,13 +173,15 @@ export enum CompressionStatus {
173173
NOOP,
174174

175175
/**
176-
* The compression succeeded but the summary output hit
177-
* COMPACT_MAX_OUTPUT_TOKENS, suggesting truncation. Distinct from
178-
* `EMPTY_SUMMARY` so telemetry can separate prompt-quality failures
179-
* (empty / nonsensical summary) from capacity failures (output cap
180-
* hit, may need a higher cap or finer-grained splitter).
181-
* `isCompressionFailureStatus` treats this as a failure so it counts
182-
* toward the per-chat circuit breaker. (R5.2)
176+
* The compression call produced a summary, but the output hit
177+
* COMPACT_MAX_OUTPUT_TOKENS, indicating likely truncation. The summary
178+
* is dropped (newHistory=null) and the attempt is treated as a failure:
179+
* `isCompressionFailureStatus` returns true so it counts toward the
180+
* per-chat circuit breaker. Kept distinct from
181+
* `COMPRESSION_FAILED_EMPTY_SUMMARY` so telemetry can separate
182+
* prompt-quality failures (empty / nonsensical summary) from capacity
183+
* failures (output cap hit, may need a higher cap or finer-grained
184+
* splitter). (R5.2)
183185
*/
184186
COMPRESSION_FAILED_OUTPUT_TRUNCATED,
185187
}

packages/core/src/services/chatCompressionService.ts

Lines changed: 6 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -128,7 +128,12 @@ export interface CompactionThresholds {
128128
* Pure function — no I/O, no shared state — safe to call repeatedly.
129129
*/
130130
export function computeThresholds(window: number): CompactionThresholds {
131-
const effectiveWindow = window - SUMMARY_RESERVE;
131+
// Clamp to 0 for tiny windows (window < SUMMARY_RESERVE) so the surfaced
132+
// value in `/context` stays meaningful. The Math.max guards on auto/warn/hard
133+
// below absorb the floor — clamping does not shift those outputs because
134+
// each is `max(proportional, absolute)` and the proportional branch
135+
// dominates whenever the absolute branch goes negative.
136+
const effectiveWindow = Math.max(0, window - SUMMARY_RESERVE);
132137

133138
const absAuto = effectiveWindow - AUTOCOMPACT_BUFFER;
134139
const auto = Math.max(DEFAULT_PCT * window, absAuto);

0 commit comments

Comments
 (0)