fix(core): add heap-pressure auto-compaction safety net #4186
Conversation
📋 Review Summary

This PR introduces a temporary heap-pressure safety net for chat compression, allowing auto-compaction to trigger based on V8 heap usage (≥70% of `heap_size_limit`).

🔍 General Feedback

🎯 Specific Feedback

🟡 High

🟢 Medium

🔵 Low

✅ Highlights
Additional context on why this PR is intentionally small: Memory/OOM has become an active area of investigation in this repo this week. Several contributors are already working on related but different slices, and I want to make the relationship between those efforts and this PR explicit.

Related work already in progress

My own related PRs / prior work:
Other contributors' related work:
Why this PR exists separately

The larger memory-related efforts above are valuable, but they are either diagnostic-only, broader in scope, or touch many files and therefore need more review time. Meanwhile, recent reports show users are already hitting V8 heap OOMs in long sessions. Based on this week's investigation and my local reproduction of the heap OOM pattern, the immediate failure mode is that token/context-based compaction can be too late. On large context-window models, especially 1M-context models, the 70% token threshold may not be reached before the V8 heap is already near its limit. At that point, extra allocations from history cloning, compression setup, large tool outputs, UI/logging state, stream processing, or error handling can push the process into an out-of-memory crash.

This PR is therefore intentionally a temporary mitigation rather than the final memory architecture.

What this PR does

The narrow fix here is to trigger auto-compaction when V8 heap usage reaches ≥70% of `heap_size_limit`, even before the token threshold is hit, by skipping only the token gate via a new `bypassTokenThreshold` service option; a sketch of that guard follows the next section.
What this PR does not try to solve

This PR does not implement:
Those should continue in the existing follow-up PRs/issues. The intent of this PR is to provide a quick, low-conflict safety net that reduces current long-session OOM risk while the more complete memory work continues.
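To make the guard concrete, here is a minimal sketch of the shape described above. It is illustrative only: `tryCompress` is declared as a stand-in for the real call in `geminiChat.ts`, and the constant name mirrors the snippets quoted in the review comments below.

```typescript
import { getHeapStatistics } from 'node:v8';

// Ratio from the PR description: compact once V8 heap usage reaches 70% of
// heap_size_limit, even if the token/context threshold has not been hit yet.
const HEAP_PRESSURE_COMPRESSION_RATIO = 0.7;

function isHeapPressureHigh(): boolean {
  const heapLimit = getHeapStatistics().heap_size_limit;
  if (!Number.isFinite(heapLimit) || heapLimit <= 0) return false;
  return process.memoryUsage().heapUsed / heapLimit >= HEAP_PRESSURE_COMPRESSION_RATIO;
}

// Stand-in for the real service call; only the option shape matters here.
declare function tryCompress(options: {
  force: boolean;
  bypassTokenThreshold: boolean;
}): Promise<void>;

async function maybeCompactUnderHeapPressure(): Promise<void> {
  if (isHeapPressureHigh()) {
    // Never pass force=true: that carries manual /compress semantics.
    await tryCompress({ force: false, bypassTokenThreshold: true });
  }
}
```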
```diff
  // steady-state path on every send; we want to exit before paying for the
  // full `getHistory(true)` clone below.
- if (!force) {
+ if (!force && !bypassTokenThreshold) {
```
[Critical] bypassTokenThreshold correctly bypasses the token threshold check here, but the early gate at line 223 above, if (threshold <= 0 || (hasFailedCompressionAttempt && !force)), returns NOOP before bypassTokenThreshold is ever read.

Scenario: 1) heap pressure triggers compression with bypassTokenThreshold=true; 2) the compression fails (API rate limiting, etc.) and hasFailedCompressionAttempt is set to true; 3) from then on, every heap-pressure compression is silently intercepted as NOOP by the line 223 gate, while tryCompress still logs the misleading warning "Heap pressure is high; attempting auto-compaction".

This means the heap-pressure safety net is permanently disabled after a single failure. bypassTokenThreshold needs to be folded into the early gate:

```diff
- if (!force && !bypassTokenThreshold) {
+ if (threshold <= 0 || (hasFailedCompressionAttempt && !force && !bypassTokenThreshold)) {
```

Also consider logging a debug/warn message when returning NOOP because of hasFailedCompressionAttempt, to avoid failing silently.
— DeepSeek/deepseek-v4-pro via Qwen Code /review
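Putting the two halves of that comment together, the widened gate with a non-silent NOOP might look roughly like this. This is a sketch under stated assumptions: the names mirror the snippets above, and the return shape plus the warn callback are simplified stand-ins for the service's real result types and debugLogger.

```typescript
// Early-gate sketch: let the heap-pressure path (bypassTokenThreshold) through
// even after a prior failed attempt, and log instead of returning silently.
interface NoopInfo {
  originalTokenCount: number;
  newTokenCount: number;
  compressionStatus: 'NOOP';
}

function earlyGate(
  threshold: number,
  force: boolean,
  bypassTokenThreshold: boolean,
  hasFailedCompressionAttempt: boolean,
  originalTokenCount: number,
  warn: (message: string) => void,
): NoopInfo | null {
  if (threshold <= 0 || (hasFailedCompressionAttempt && !force && !bypassTokenThreshold)) {
    if (hasFailedCompressionAttempt) {
      // Surface the skip so a latched failure stays visible in logs.
      warn('Skipping auto-compaction: a previous compression attempt failed (NOOP).');
    }
    return { originalTokenCount, newTokenCount: originalTokenCount, compressionStatus: 'NOOP' };
  }
  return null; // fall through to the token-threshold gate
}
```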
```typescript
// Temporary safety net: token-based compaction can be too late for
// large-context sessions because JS heap pressure may hit first.
// Do not use force=true here because that carries manual /compress
// semantics in ChatCompressionService.
```
[Suggestion] The heap-pressure warning log omits a key diagnostic value: the actual heap usage ratio. When investigating an OOM after the fact, there is no way to tell whether the pressure was 0.71 or 0.99. Consider including the ratio:

```diff
  // semantics in ChatCompressionService.
+ debugLogger.warn(
+   `Heap pressure at ${(ratio * 100).toFixed(1)}%; attempting auto-compaction before token threshold.`,
+ );
```
— DeepSeek/deepseek-v4-pro via Qwen Code /review
```diff
@@ -544,6 +558,17 @@ export class GeminiChat {
    return info;
```
[Suggestion] isHeapPressureHigh() calls getHeapStatistics() on every invocation, but heap_size_limit is static for the lifetime of the process (fixed at V8 startup), so it can be cached. Also, heapUsed covers only the V8-managed heap and excludes external memory (ArrayBuffers, Buffer), so OOMs driven by large tool responses may go undetected. That is acceptable for a temporary safety net; consider folding in process.memoryUsage().external in a later iteration.
— DeepSeek/deepseek-v4-pro via Qwen Code /review
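One possible follow-up shape combining both points. This is a heuristic sketch, not the PR's code: external memory does not count against V8's heap_size_limit, so folding it into this ratio only approximates total pressure.

```typescript
import { getHeapStatistics } from 'node:v8';

// heap_size_limit is fixed when V8 starts, so read it once and cache it.
const HEAP_SIZE_LIMIT = getHeapStatistics().heap_size_limit;
const HEAP_PRESSURE_COMPRESSION_RATIO = 0.7;

function isHeapPressureHigh(): boolean {
  if (!Number.isFinite(HEAP_SIZE_LIMIT) || HEAP_SIZE_LIMIT <= 0) return false;
  // Fold in external allocations (Buffers, ArrayBuffers); they live outside
  // the V8-managed heap but still contribute to process memory pressure.
  const { heapUsed, external } = process.memoryUsage();
  return (heapUsed + external) / HEAP_SIZE_LIMIT >= HEAP_PRESSURE_COMPRESSION_RATIO;
}
```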
```typescript
// but as a safeguard:
if (comprehensiveHistory.length === 0) {
const lastComprehensiveMessage = chat.getLastHistoryEntry();
// If comprehensive history is empty, there is no last message to check.
```
[Suggestion] getLastHistoryEntry() reads comprehensive (raw) history, which is critical here — the function_response check below must see tool-result messages that curated history would strip. However, this dependency on raw vs. curated semantics is implicit.
The same function calls chat.getHistory(true) (curated) for the LLM prompt just below. The two data sources serve different purposes but the distinction is easy to miss during future refactors. If someone changes getLastHistoryEntry() to return curated entries, the isFunctionResponse short-circuit silently stops matching and the model falls through to an unnecessary LLM call on every turn.
```diff
  // If comprehensive history is empty, there is no last message to check.
+ // getLastHistoryEntry() reads comprehensive (raw) history by design —
+ // we must see function_response messages that curated history would strip.
  const lastComprehensiveMessage = chat.getLastHistoryEntry();
```
— glm-5.1 via Qwen Code /review
```diff
  // steady-state path on every send; we want to exit before paying for the
  // full `getHistory(true)` clone below.
- if (!force) {
+ if (!force && !bypassTokenThreshold) {
```
[Suggestion] After bypassTokenThreshold skips this token gate, the code still proceeds to chat.getHistory(true) (a full structuredClone of the entire history) and an LLM compression call. If the heap pressure originates from non-history sources (large tool outputs, file caches, external allocations), this clone can worsen memory pressure rather than relieve it.

Consider adding a cheap pre-check using chat.getHistoryLength() before paying for the full clone. If the history is small, compression won't shrink it meaningfully and the clone is wasteful:
```diff
  if (!force && !bypassTokenThreshold) {
+ if (bypassTokenThreshold && chat.getHistoryLength() < MIN_COMPRESSIBLE_HISTORY_LENGTH) {
+   return {
+     newHistory: null,
+     info: {
+       originalTokenCount,
+       newTokenCount: originalTokenCount,
+       compressionStatus: CompressionStatus.NOOP,
+     },
+   };
+ }
```
— glm-5.1 via Qwen Code /review
doudouOUC left a comment

Code Review
Overall assessment: the PR's design is sound. The semantic distinction between bypassTokenThreshold and force=true is clear, and the nextSpeakerChecker optimization is clean with no regressions. A few issues deserve attention:

🟠 High priority: the hasFailedCompressionAttempt latch turns the safety net into a one-shot mechanism

Location: the interaction between the early gate around line 216 of chatCompressionService.ts and the new bypassTokenThreshold.

When heap pressure triggers compression (bypassTokenThreshold=true) but the compression fails, hasFailedCompressionAttempt is set to true. On subsequent messages, even if heap pressure keeps rising, the service exits early at the front gate (NOOP) and never reaches the token-threshold gate:

```typescript
// Line 216: the hasFailedCompressionAttempt gate, unaffected by bypassTokenThreshold
if (threshold <= 0 || (hasFailedCompressionAttempt && !force)) {
  return { newHistory: null, info: { ... NOOP } };
}

// Line 234: bypassTokenThreshold only bypasses this
if (!force && !bypassTokenThreshold) { ... }
```

Impact: the safety net is effectively one-shot; after a single compression failure it is permanently disabled. If the heap keeps growing, the OOM still happens.

Suggested fixes:

- Fold bypassTokenThreshold into the line 216 gate as well: (hasFailedCompressionAttempt && !force && !bypassTokenThreshold)
- Or do not set hasFailedCompressionAttempt = true when compression was triggered by heap pressure
🟡 Medium priority: isHeapPressureHigh() makes two native calls

getHeapStatistics() already provides used_heap_size (the same V8 metric as process.memoryUsage().heapUsed). A single call yields both numerator and denominator, avoiding a redundant synchronous native call on the hot path and a small TOCTOU inconsistency:

```typescript
private isHeapPressureHigh(): boolean {
  const stats = getHeapStatistics();
  const heapLimit = stats.heap_size_limit;
  if (!Number.isFinite(heapLimit) || heapLimit <= 0) return false;
  return stats.used_heap_size / heapLimit >= HEAP_PRESSURE_COMPRESSION_RATIO;
}
```

🟡 Medium priority: missing test for a key interaction scenario

The current tests cover the happy path where bypassTokenThreshold=true bypasses the token gate, but not the hasFailedCompressionAttempt=true && bypassTokenThreshold=true scenario. Consider adding a test for this boundary case to confirm the behavior matches expectations, along the lines of the sketch below.
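A sketch of that boundary test, assuming the repo's vitest setup. The harness, the service shape, and the compress method name are hypothetical placeholders; a real test should reuse the existing chatCompressionService test utilities.

```typescript
import { describe, expect, it } from 'vitest';

// Hypothetical helper: builds a service whose hasFailedCompressionAttempt
// latch is already set, plus a chat with compressible history.
declare function makeServiceWithFailedAttempt(): {
  service: {
    compress(
      chat: unknown,
      options: { force: boolean; bypassTokenThreshold: boolean },
    ): Promise<{ info: { compressionStatus: string } }>;
  };
  chat: unknown;
};

describe('failed-attempt latch vs. heap-pressure bypass', () => {
  it('still attempts compression when bypassTokenThreshold is set', async () => {
    const { service, chat } = makeServiceWithFailedAttempt();
    const result = await service.compress(chat, {
      force: false,
      bypassTokenThreshold: true,
    });
    // With the proposed gate fix, heap-pressure compaction must not be
    // silently swallowed by the failed-attempt latch.
    expect(result.info.compressionStatus).not.toBe('NOOP');
  });
});
```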
🔵 Low priority: the structuredClone in getLastHistoryEntry()

This is a marked improvement over the previous full clone. In nextSpeakerChecker the result is only inspected for role and whether it is a function response (purely read-only operations), so if profiling ever identifies this as a bottleneck, a copy-free read-only variant could be provided. The current implementation is correct and safe; no urgent action needed.
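If such a copy-free variant ever becomes worthwhile, one possible shape is sketched below. The names and structural types are hypothetical stand-ins for the real history entry type; this is not the PR's code.

```typescript
// Copy-free view exposing only the fields nextSpeakerChecker actually reads,
// so callers cannot mutate live history through the returned value.
interface HistoryEntry {
  role?: string;
  parts?: Array<{ functionResponse?: unknown }>;
}

interface LastEntryView {
  readonly role: string | undefined;
  readonly isFunctionResponse: boolean;
}

function peekLastHistoryEntry(history: readonly HistoryEntry[]): LastEntryView | null {
  const last = history.at(-1);
  if (!last) return null;
  return {
    role: last.role,
    isFunctionResponse: !!last.parts?.some((p) => p.functionResponse !== undefined),
  };
}
```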
🤖 Generated with Qwen Code
Summary
- `GeminiChat.tryCompress()`: when V8 `heapUsed / heap_size_limit >= 70%`, the chat attempts auto-compaction even if the token/context threshold has not been reached yet.
- Add `bypassTokenThreshold` to `ChatCompressionService` so heap-pressure compaction can skip only the token gate while preserving automatic compaction semantics. It deliberately does not use `force=true`, because `force=true` carries manual `/compress` behavior.
- Add `GeminiChat.getLastHistoryEntry()` and use it in `nextSpeakerChecker` to avoid cloning the full comprehensive history just to inspect the last turn.
- `packages/core/src/core/geminiChat.ts`: heap-pressure guard and the decision to keep `force=false`.
- `packages/core/src/services/chatCompressionService.ts`: `bypassTokenThreshold` skips only the token threshold gate and still respects the existing failed-compression latch.
- `packages/core/src/utils/nextSpeakerChecker.ts`: O(1) last-history inspection instead of a full comprehensive-history clone.

Validation
- Manual `/compress` still uses `force=true`.
- The heap-pressure path never forces compression; only `bypassTokenThreshold` is set.
- `nextSpeakerChecker` calls `getHistory(true)` only once and uses `getLastHistoryEntry()` for comprehensive-history last-turn checks.
- `cd packages/core && npm run typecheck` passed.
- `npm run build && npm run typecheck` passed. The build emitted existing non-blocking warnings for stale Browserslist data and a `packages/vscode-ide-companion/src/utils/editorGroupUtils.ts` curly lint warning, with 0 errors.
- Tests cover `bypassTokenThreshold`, the service-level threshold bypass, defensive last-history clone behavior, and avoiding the second full-history clone in `nextSpeakerChecker`.

Scope / Risk
- The heap-pressure path avoids `force=true` specifically to sidestep manual `/compress` semantics and to keep the change narrow.
- No changes to `/doctor memory` diagnostics.
- `bypassTokenThreshold` is an internal optional service option.

Coordination with related memory work
This PR is intentionally small and transitional because several broader memory efforts are already open or being discussed:
- `/doctor memory` diagnostics PRs.
- `/resume` memory behavior.

The goal here is to reduce default long-session OOM risk without taking ownership of the full memory-diagnostics/tool-offload redesign.
Testing Matrix
Testing matrix notes:
Linked Issues / Bugs