fix(core,cli): address PR #4345 round-2 review feedback

LaZzyMan · LaZzyMan · commit 1f856ba3776d · 2026-05-21T10:29:00.000+08:00
- geminiChat: remove pre-call consecutiveFailures reset in hard-rescue. force=true already bypasses the breaker check in chatCompressionService; the pre-reset was redundant on success (post-call L614 already handles it) and *broke* the breaker on failure paths: hard-rescue failures don't increment via tryCompress (force=true skips that branch), only the reactive overflow path at L992 explicitly increments. With the pre-reset the counter oscillated 0↔1 every send and MAX_CONSECUTIVE_FAILURES=3 was unreachable. Wrote a RED test asserting the forwarded counter is the latched value, not zero; the test failed against the old code and passes with the reset removed.
- geminiChat: log hard-tier-rescue triggers via debugLogger.warn including effectiveTokens, hard, and the current consecutiveFailures so operators debugging "compaction stopped working" have a breadcrumb.
- chatCompressionService: clamp effectiveWindow to >= 0 in computeThresholds so the value surfaced in /context stays meaningful for tiny windows (window < SUMMARY_RESERVE). auto/warn/hard outputs are unaffected because each is Math.max(proportional, absolute) and the proportional branch dominates whenever the absolute branch goes negative.
- turn.ts: rewrite COMPRESSION_FAILED_OUTPUT_TRUNCATED docstring. Drop the misleading "compression succeeded" framing (the summary is dropped and isCompressionFailureStatus returns true) and reference the full enum name COMPRESSION_FAILED_EMPTY_SUMMARY instead of the abbreviation.
- contextCommand.test.ts: reword the no-API-data-session test comment. collectContextData classifies estimated sessions against rawOverhead; with default fixtures rawOverhead lands in `safe`, but heavy system-prompt / skill / MCP loads can push it into warn/auto/hard.
- design doc Background: prepend a blockquote clarifying the section describes pre-redesign behavior and that the inline file:line references point at code before PR #4345 (which removes them).
- ui/types: replace the duplicated ContextThresholds interface with a type alias to the core's CompactionThresholds. Field-by-field copy in contextCommand.ts becomes a direct spread. ContextUsage.tsx keeps its CompactionThresholds React component name; the alias avoids the collision a direct import would have caused.
- contextCommand: interpolate the actual reserve value into the "(window − 20K reserve)" annotation so SUMMARY_RESERVE retuning doesn't leave the text stale.
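
The counter oscillation described in the first bullet can be illustrated with a small, self-contained sketch. This is a simplified model under stated assumptions, not the real GeminiChat code: `simulateFailingSends` and `BreakerTrace` are hypothetical names, and the model assumes every send hits the hard tier, the forced compression fails without incrementing, and the reactive overflow path then increments once.

```typescript
// Simplified, hypothetical model of the per-chat breaker counter.
// It is NOT the real GeminiChat implementation; it only mirrors the
// sequence of counter updates described in the commit message.
const MAX_CONSECUTIVE_FAILURES = 3;

interface BreakerTrace {
  /** Highest counter value observed across all sends. */
  peak: number;
  /** Whether the counter ever reached MAX_CONSECUTIVE_FAILURES. */
  tripped: boolean;
}

function simulateFailingSends(sends: number, preReset: boolean): BreakerTrace {
  let consecutiveFailures = 0;
  let peak = 0;
  for (let i = 0; i < sends; i++) {
    // Hard-tier rescue: the old code zeroed the counter before forcing.
    if (preReset) {
      consecutiveFailures = 0;
    }
    // Assumption: the forced compression fails. With force=true the
    // failure is not counted here, so the counter is left untouched.

    // Assumption: the request then overflows and the reactive overflow
    // path increments the counter explicitly.
    consecutiveFailures += 1;
    peak = Math.max(peak, consecutiveFailures);
  }
  return { peak, tripped: peak >= MAX_CONSECUTIVE_FAILURES };
}

const withPreReset = simulateFailingSends(5, true);
const withoutPreReset = simulateFailingSends(5, false);
console.log(withPreReset, withoutPreReset);
```

In this model the pre-reset variant never gets past 1 however many sends fail, while the variant without the pre-reset reaches the threshold on the third failing send.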
diff --git a/docs/design/auto-compaction-threshold-redesign.md b/docs/design/auto-compaction-threshold-redesign.md
@@ -4,6 +4,8 @@
 
 ## Background
 
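+> This section describes the state **before** this PR landed (pre-redesign behavior). The `COMPRESSION_TOKEN_THRESHOLD`, `thinkingConfig.includeThoughts = true`, and `hasFailedCompressionAttempt` symbols that appear below, along with the specific file:line references, all correspond to the code before PR #4345 was merged; after the merge these symbols and line numbers are no longer valid.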
+
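 Today, qwen-code's auto-compaction uses only a single proportional threshold, `COMPRESSION_TOKEN_THRESHOLD = 0.7` (`chatCompressionService.ts:33`), and every window size shares that same ratio. Compared with claude-code's "absolute token ladder" (autoCompact.ts:62-65), qwen-code has three concrete problems: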
 
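 1. **Too much headroom reserved on large windows**: on a 1M model the 70% threshold triggers at 700K, and the remaining 300K far exceeds the ~33K actually needed for summary + output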
diff --git a/packages/cli/src/ui/commands/contextCommand.test.ts b/packages/cli/src/ui/commands/contextCommand.test.ts
@@ -148,10 +148,14 @@ describe('/context shows three-tier thresholds', () => {
   });
 
   it('treats no-API-data sessions as safe and omits the threshold section from text', async () => {
-    // lastPromptTokenCount = 0 → collectContextData uses the estimated branch:
-    //   currentTier should be `safe` regardless of overhead size, and
-    //   formatContextUsageText must NOT emit the "Compaction thresholds" section
-    //   because the estimated path renders a different layout.
+    // lastPromptTokenCount = 0 → collectContextData uses the estimated branch
+    // (classifies against `rawOverhead`, not apiTotalTokens). With these
+    // default fixtures rawOverhead lands well below `warn`, so currentTier
+    // resolves to `safe`. On heavy system-prompt / skill / MCP loads the
+    // estimated branch can return warn/auto/hard — this test only covers
+    // the default-fixture safe case. formatContextUsageText must NOT emit
+    // the "Compaction thresholds" section because the estimated path
+    // renders a different layout.
     mockGetLastPromptTokenCount.mockReturnValue(0);
     const data = await collectContextData(makeMockConfig(200_000), false);
     expect(data.breakdown.currentTier).toBe('safe');
diff --git a/packages/cli/src/ui/commands/contextCommand.ts b/packages/cli/src/ui/commands/contextCommand.ts
@@ -332,12 +332,7 @@ export async function collectContextData(
     messages: messagesTokens,
     freeSpace,
     autocompactBuffer,
-    thresholds: {
-      effectiveWindow: thresholds.effectiveWindow,
-      warn: thresholds.warn,
-      auto: thresholds.auto,
-      hard: thresholds.hard,
-    },
+    thresholds,
     currentTier: currentTier(tierTokens, thresholds),
   };
 
@@ -428,7 +423,7 @@ export function formatContextUsageText(data: HistoryItemContextUsage): string {
     lines.push('');
     lines.push('**Compaction thresholds**');
     lines.push(
-      `  Effective window:   ${formatNum(breakdown.thresholds.effectiveWindow)}  (window − 20K reserve)`,
+      `  Effective window:   ${formatNum(breakdown.thresholds.effectiveWindow)}  (window − ${formatNum(contextWindowSize - breakdown.thresholds.effectiveWindow)} reserve)`,
     );
     lines.push(`  Warn threshold:     ${formatNum(breakdown.thresholds.warn)}`);
     lines.push(`  Auto threshold:     ${formatNum(breakdown.thresholds.auto)}`);
diff --git a/packages/cli/src/ui/types.ts b/packages/cli/src/ui/types.ts
@@ -5,6 +5,7 @@
  */
 
 import type {
+  CompactionThresholds,
   CompressionStatus,
   MCPServerConfig,
   ThoughtSummary,
@@ -344,16 +345,14 @@ export type HistoryItemMcpStatus = HistoryItemBase & {
 
 export type ContextTier = 'safe' | 'warn' | 'auto' | 'hard';
 
-export interface ContextThresholds {
-  /** Window minus 20K summary reserve — the budget available for input + summary. */
-  effectiveWindow: number;
-  /** Token count at which the warn tier triggers. */
-  warn: number;
-  /** Token count at which auto-compaction triggers. */
-  auto: number;
-  /** Token count at which auto-compaction is forced (resets failure counter). */
-  hard: number;
-}
+/**
+ * Alias for the core compaction-thresholds shape. Re-exported under the
+ * CLI-friendly name so consumers in this package don't pull on the core
+ * module path; structurally identical to `CompactionThresholds`. The
+ * `readonly` modifiers on the core type are immaterial for UI rendering,
+ * but kept implicitly through the alias.
+ */
+export type ContextThresholds = CompactionThresholds;
 
 export interface ContextCategoryBreakdown {
   systemPrompt: number;
diff --git a/packages/core/src/core/geminiChat.test.ts b/packages/core/src/core/geminiChat.test.ts
@@ -2067,16 +2067,30 @@ describe('GeminiChat', async () => {
       ).toBe(true);
     });
 
-    it('resets consecutiveFailures before forcing when hard threshold crossed', async () => {
-      // Pre-latch the breaker by failing the unforced cheap-gate
-      // MAX_CONSECUTIVE_FAILURES times below the hard threshold.
+    it('forwards latched consecutiveFailures into hard-rescue (no pre-call reset); success recovers via the post-call branch', async () => {
+      // Hard-rescue uses force=true, which already bypasses the
+      // chatCompressionService breaker (chatCompressionService.ts:339
+      // checks `!force`) regardless of the counter value — so a pre-call
+      // reset is unnecessary for "let the latched breaker recover".
+      //
+      // Pre-resetting would in fact DEFEAT the breaker on
+      // persistent-failure sessions: hard-rescue failures don't increment
+      // via tryCompress (force=true skips `if (!force)` at L627), and
+      // only the reactive overflow path at L992 explicitly increments.
+      // If hard-rescue zeroed the counter on every send, the L992
+      // increment would be wiped next send and the counter would
+      // oscillate 0↔1 indefinitely.
+      //
+      // Correct behavior asserted here: hard-rescue forwards the existing
+      // counter value as-is; on COMPRESSED success the post-call branch
+      // at geminiChat.ts:614 resets to 0 (recovering a latched session).
       const compressSpy = vi.spyOn(
         ChatCompressionService.prototype,
         'compress',
       );
 
-      // The latching sends never touch the hard tier; lastPromptTokenCount is
-      // small enough that effective < hard, so force stays false on each.
+      // Step 1: latch the breaker via MAX_CONSECUTIVE_FAILURES below-hard
+      // failures (cheap-gate path, force=false).
       compressSpy.mockResolvedValue({
         newHistory: null,
         info: {
@@ -2101,14 +2115,15 @@ describe('GeminiChat', async () => {
         }
         expect(compressSpy.mock.calls[i][1].force).toBe(false);
       }
-      // The counter is now at MAX_CONSECUTIVE_FAILURES (latched).
+      // Pre-increment semantic: i-th call sees i; counter on chat is now
+      // MAX_CONSECUTIVE_FAILURES (latched).
       expect(compressSpy.mock.calls.at(-1)![1].consecutiveFailures).toBe(
         MAX_CONSECUTIVE_FAILURES - 1,
       );
 
-      // Now bump lastPromptTokenCount into hard tier and send again. The
-      // hard-tier rescue must reset the counter and force=true on the call —
-      // not short-circuit on the latched breaker.
+      // Step 2: bump lastPromptTokenCount into hard tier and send again.
+      // Hard-rescue fires (force=true) and the COMPRESSED result triggers
+      // the post-call reset at geminiChat.ts:614.
       compressSpy.mockClear();
       compressSpy.mockResolvedValueOnce({
         newHistory: [
@@ -2125,17 +2140,18 @@ describe('GeminiChat', async () => {
       const rescueStream = await chat.sendMessageStream(
         'test-model',
         { message: 'rescue me' },
-        'prompt-hard-rescue-reset',
+        'prompt-hard-rescue-no-prereset',
       );
       for await (const _ of rescueStream) {
         /* consume */
       }
 
       expect(compressSpy).toHaveBeenCalledTimes(1);
-      // Counter forwarded to the service must be 0 (reset before the call),
-      // not MAX_CONSECUTIVE_FAILURES (which would gate the cheap-gate).
-      expect(compressSpy.mock.calls[0][1].consecutiveFailures).toBe(0);
       expect(compressSpy.mock.calls[0][1].force).toBe(true);
+      // Counter forwarded as-is — the LATCHED value, NOT zero.
+      expect(compressSpy.mock.calls[0][1].consecutiveFailures).toBe(
+        MAX_CONSECUTIVE_FAILURES,
+      );
     });
 
     it('does not force when tokens are below hard threshold (normal auto path)', async () => {
diff --git a/packages/core/src/core/geminiChat.ts b/packages/core/src/core/geminiChat.ts
@@ -736,9 +736,7 @@ export class GeminiChat {
       // Hard-tier rescue: when the estimated prompt size is at or above the
       // hard threshold (effectiveWindow - HARD_BUFFER), force compaction in
       // this send instead of waiting for the API to reject the request as too
-      // large. This also resets the consecutive-failure counter so a session
-      // that previously latched the breaker can recover — hard implies the
-      // next API call would very likely overflow without compaction.
+      // large.
       //
       // We compute `effectiveTokens` ONCE here and pass it through to
       // tryCompress → service.compress so the cheap-gate doesn't redo the
@@ -747,6 +745,17 @@ export class GeminiChat {
       // hard-tier rescue used the default imageTokenEstimate while the
       // cheap-gate inside tryCompress used the user's resolved value.
       // (review #4168 R1.3 + R1.4)
+      //
+      // The consecutive-failure counter is NOT pre-reset here. force=true
+      // already bypasses the breaker (chatCompressionService.ts:339 checks
+      // `!force`), so a latched session can still attempt hard-rescue;
+      // pre-resetting would defeat the breaker entirely because hard-rescue
+      // failures don't increment via tryCompress (force=true skips the
+      // `if (!force)` increment), and only reactive overflow at line 992
+      // increments explicitly. With a pre-reset the counter would oscillate
+      // 0↔1 across sends and never trip. On COMPRESSED success the post-call
+      // branch at line 614 still resets to 0, which is the correct recovery
+      // path for a previously-latched session.
       const contextLimit =
         this.config.getContentGeneratorConfig()?.contextWindowSize ??
         DEFAULT_TOKEN_LIMIT;
@@ -758,6 +767,11 @@ export class GeminiChat {
       // API-authoritative count + a tiny estimate of just the new user
       // message — it does NOT touch the history at all in that branch, so
       // skip the costly `getHistory(true)` clone on the steady-state path.
+      // The lastPromptTokenCount=0 branch (first send after --continue
+      // restore / subagent inheritance) walks history with a char/4
+      // heuristic that can under-count by ~15-20K tokens; the reactive
+      // overflow path at line 944 is the documented safety net when this
+      // under-count causes hard-rescue to miss.
       const effectiveTokens = estimatePromptTokens(
         this.lastPromptTokenCount > 0 ? [] : this.getHistory(true),
         userContent,
@@ -766,7 +780,9 @@ export class GeminiChat {
       );
       const shouldForceFromHard = effectiveTokens >= hard;
       if (shouldForceFromHard) {
-        this.consecutiveFailures = 0;
+        debugLogger.warn(
+          `[compaction] hard-tier rescue triggered: effectiveTokens=${effectiveTokens}, hard=${hard}, consecutiveFailures=${this.consecutiveFailures}.`,
+        );
       }
 
       compressionInfo = await this.tryCompress(
diff --git a/packages/core/src/core/turn.ts b/packages/core/src/core/turn.ts
@@ -173,13 +173,15 @@ export enum CompressionStatus {
   NOOP,
 
   /**
-   * The compression succeeded but the summary output hit
-   * COMPACT_MAX_OUTPUT_TOKENS, suggesting truncation. Distinct from
-   * `EMPTY_SUMMARY` so telemetry can separate prompt-quality failures
-   * (empty / nonsensical summary) from capacity failures (output cap
-   * hit, may need a higher cap or finer-grained splitter).
-   * `isCompressionFailureStatus` treats this as a failure so it counts
-   * toward the per-chat circuit breaker. (R5.2)
+   * The compression call produced a summary, but the output hit
+   * COMPACT_MAX_OUTPUT_TOKENS, indicating likely truncation. The summary
+   * is dropped (newHistory=null) and the attempt is treated as a failure:
+   * `isCompressionFailureStatus` returns true so it counts toward the
+   * per-chat circuit breaker. Kept distinct from
+   * `COMPRESSION_FAILED_EMPTY_SUMMARY` so telemetry can separate
+   * prompt-quality failures (empty / nonsensical summary) from capacity
+   * failures (output cap hit, may need a higher cap or finer-grained
+   * splitter). (R5.2)
    */
   COMPRESSION_FAILED_OUTPUT_TRUNCATED,
 }
diff --git a/packages/core/src/services/chatCompressionService.ts b/packages/core/src/services/chatCompressionService.ts
@@ -128,7 +128,12 @@ export interface CompactionThresholds {
  * Pure function — no I/O, no shared state — safe to call repeatedly.
  */
 export function computeThresholds(window: number): CompactionThresholds {
-  const effectiveWindow = window - SUMMARY_RESERVE;
+  // Clamp to 0 for tiny windows (window < SUMMARY_RESERVE) so the surfaced
+  // value in `/context` stays meaningful. The Math.max guards on auto/warn/hard
+  // below absorb the floor — clamping does not shift those outputs because
+  // each is `max(proportional, absolute)` and the proportional branch
+  // dominates whenever the absolute branch goes negative.
+  const effectiveWindow = Math.max(0, window - SUMMARY_RESERVE);
 
   const absAuto = effectiveWindow - AUTOCOMPACT_BUFFER;
   const auto = Math.max(DEFAULT_PCT * window, absAuto);