
fix(core): memory-based chat compression to prevent heap OOM #4127

Open

Dinsmoor wants to merge 1 commit into QwenLM:main from Dinsmoor:fix/memory-leak-clean

Conversation

@Dinsmoor
Contributor

Long-lived interactive sessions (80+ minutes) can accumulate enough conversation history to hit Node's 4 GB heap limit. The existing 70% token compaction threshold can fail permanently when large file reads or shell outputs create a few huge entries.

This replaces entry-count caps with memory-based monitoring:

  • geminiChat.ts: force chat compaction when heapUsed exceeds 2 GB (a hard safety net independent of the 70% token threshold).

  • agent-core.ts: prune oldest agent messages when heapUsed exceeds 1.5 GB (pruning ~20% of oldest messages per round).

Both mechanisms check actual heap pressure rather than an arbitrary proxy (message count), catching the root cause regardless of whether memory ballooned from many small entries or few huge ones.
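For reviewers who want the shape without opening the diff, both guards reduce to a comparison against process.memoryUsage().heapUsed. A minimal standalone sketch (class context and the compression call are elided; only the threshold constant matches the diff verbatim):

```ts
import process from 'node:process';

// Hard safety net independent of the 70% token threshold.
// agent-core.ts uses the same check with a 1.5 GB threshold.
const HEAP_MEMORY_THRESHOLD = 2 * 1024 * 1024 * 1024; // 2 GB in bytes

// Measures actual heap pressure rather than a proxy (entry count), so it
// fires whether memory came from many small entries or a few huge ones.
function shouldForceCompaction(force: boolean): boolean {
  return !force && process.memoryUsage().heapUsed > HEAP_MEMORY_THRESHOLD;
}
```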


Summary

  • What changed: Two memory-based safety nets added: (1) tryCompress() in geminiChat.ts now forces compression when heapUsed exceeds 2 GB, (2) pruneMessages() in agent-core.ts trims ~20% of oldest messages when heapUsed exceeds 1.5 GB.
  • Why it changed: Long sessions (80+ min) accumulated unbounded history, hitting Node's 4 GB heap limit. The existing 70% token threshold can fail permanently when large file reads create huge individual entries.
  • Reviewer focus: Verify the 2 GB threshold triggers before the 4 GB kill boundary; confirm prune chunk size (~20%) provides proportional relief.

Validation

  • Commands run:
    npx tsc --noEmit --project packages/core/tsconfig.json
  • Prompts / inputs used: None (no behavioral change observable through CLI prompts; this is infrastructure code).
  • Expected result: Clean TypeScript compilation; no runtime behavior for normal sessions (thresholds are only hit in 80+ min sessions).
  • Observed result: TypeScript compilation passes with no errors. git diff shows 48 lines added across 3 files with only new logic (no deletions).
  • Quickest reviewer verification path: Read the if (!force && process.memoryUsage().heapUsed > threshold) guards in both files — they are simple numeric comparisons with clear comments explaining the thresholds.
  • Evidence: git diff HEAD^..HEAD --stat shows 3 files, +48 lines. No existing test files reference process.memoryUsage or heap thresholds, so no test modifications needed for this structural addition.

Scope / Risk

  • Main risk or tradeoff: The 2 GB threshold is hardcoded. If Node's default heap limit changes (e.g., newer runtime, container limits), the magic number may need adjustment. The 1.5 GB threshold for agent message pruning is more conservative — it triggers earlier but only during the existing round loop.
  • Not covered / not validated: Actual heap growth measurements in a real 80+ minute session. The thresholds were derived from the observed crash at ~81 minutes but not empirically verified with continuous monitoring; a throwaway probe for doing so is sketched after this list.
  • Breaking changes / migration notes: None. This adds new state and guards only. Backward compatible — old clients won't see the thresholds, new ones benefit automatically.
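For anyone who wants to close the monitoring gap noted above, a probe along these lines could log heap growth during a long soak session. This is a hypothetical snippet, not part of this PR's diff:

```ts
import process from 'node:process';

// Hypothetical soak-test probe: sample heapUsed once a minute so that
// threshold crossings can be correlated with session length.
const GB = 1024 ** 3;
const probe = setInterval(() => {
  const { heapUsed } = process.memoryUsage();
  console.error(
    `[heap-probe] ${(heapUsed / GB).toFixed(2)} GB used` +
      (heapUsed > 1.5 * GB ? ' (above prune threshold)' : ''),
  );
}, 60_000);
probe.unref(); // do not keep the process alive just for the probe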

Testing Matrix

|          | 🍏  | 🪟  | 🐧  |
| -------- | --- | --- | --- |
| npm run  |     |     | ✅  |
| npx      |     |     |     |
| Docker   |     |     |     |
| Podman   | N/A | N/A | N/A |
| Seatbelt | N/A | N/A | N/A |

Testing matrix notes:

  • npm run build & typecheck: ✅ tested on Linux (native dev environment)
  • npx / Docker / Seatbelt: not tested — this is an internal core package change, not a build-artifact or OS-specific feature. A downstream user running via npx or Docker would benefit automatically.

Linked Issues / Bugs

Fixes the crash observed during long interactive sessions (80+ minutes of continuous use).

```ts
 * for one more GC cycle before the process is killed.
 */
private static readonly HEAP_MEMORY_THRESHOLD = 2 * 1024 * 1024 * 1024; // 2 GB
```

Collaborator

[Critical] Hardcoded 2GB threshold is a no-op on small-memory containers, and the safety net has zero observability.

In Docker/K8s with --max-old-space-size=512, heapUsed can never reach 2GB — the process is OOM-killed first. This also applies to the 1.5GB threshold in agent-core.ts. Derive thresholds from the actual heap limit:

Suggested change

```ts
// Assumes `import v8 from 'node:v8'` at the top of the file.
private static readonly HEAP_MEMORY_THRESHOLD = (() => {
  try {
    // Math.floor, not `| 0`: bitwise ops wrap at 2^31, which would
    // corrupt limits above ~2.1 GB.
    return Math.floor(v8.getHeapStatistics().heap_size_limit * 0.7);
  } catch {
    return 2 * 1024 * 1024 * 1024; // fallback: 2 GB
  }
})();
```

Additionally, neither this heap trigger nor pruneMessages() produces any log output. Add debugLogger.warn(...) when these triggers activate so operators can tell if the safety nets fired.

— DeepSeek/deepseek-v4-pro via Qwen Code /review
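Taken together, the two asks in the comment above (a limit-relative threshold plus a log line when the net fires) could look roughly like this. A sketch only, with debugLogger stubbed since the project's real logger is not shown here:

```ts
import v8 from 'node:v8';
import process from 'node:process';

// Stand-in for the project's logger; the real debugLogger is assumed.
const debugLogger = { warn: (msg: string) => console.warn(msg) };

// Derive the trigger from the actual heap ceiling so the safety net
// still fires under --max-old-space-size=512 or container limits.
const HEAP_TRIGGER = Math.floor(v8.getHeapStatistics().heap_size_limit * 0.7);

function heapSafetyNetFired(): boolean {
  const { heapUsed } = process.memoryUsage();
  if (heapUsed <= HEAP_TRIGGER) return false;
  // Observability: record that the net fired, and at what pressure.
  debugLogger.warn(
    `heap safety net fired: ${(heapUsed / 1024 ** 3).toFixed(2)} GB used, ` +
      `trigger at ${(HEAP_TRIGGER / 1024 ** 3).toFixed(2)} GB`,
  );
  return true;
}
```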

```ts
// AND few huge entries (large file reads, shell outputs).
if (
  !force &&
  process.memoryUsage().heapUsed > GeminiChat.HEAP_MEMORY_THRESHOLD
```
Collaborator

[Critical] Heap-based force races with compression's own memory allocation (OOM risk), AND bypasses the hasFailedCompressionAttempt circuit breaker (API cost amplification).

(1) OOM race: heapUsed is read BEFORE service.compress(). If heap is 1.95GB (below 2GB), force is NOT set and compression proceeds. During compression, serializing chat history + response parsing can push heap past 4GB before GC — killing the process. Lower the threshold to 1.5GB for genuine headroom.

(2) API retry loop: force=true bypasses hasFailedCompressionAttempt. If compression fails, the flag is set — but the next call still triggers force=true because heap remains >2GB. Result: unlimited compression API calls. Even successful compression may not reduce heap (if pressure is from cached files), causing redundant re-compression.

Suggested change

```diff
- process.memoryUsage().heapUsed > GeminiChat.HEAP_MEMORY_THRESHOLD
+ if (
+   !force &&
+   !this.hasFailedHeapCompressionAttempt &&
+   process.memoryUsage().heapUsed > 1.5 * 1024 * 1024 * 1024
+ ) {
+   force = true;
+   this.hasFailedHeapCompressionAttempt = true;
+   debugLogger.warn(
+     `Heap pressure (${(process.memoryUsage().heapUsed / 1024 ** 3).toFixed(1)} GB) — forcing compression`,
+   );
+ }
```

— DeepSeek/deepseek-v4-pro via Qwen Code /review

```ts
 * a hard count cap. This catches the actual root cause (heap usage)
 * regardless of how the memory ballooned (many small entries vs few huge).
 */
pruneMessages(): number {
```
Collaborator

[Critical] pruneMessages() has two structural issues:

  1. Never called for headless agents. The sole call site is agent-interactive.ts:129. All subagents, background tasks, and AgentHeadless usage never prune — exactly the long-running scenarios that need it most.
  2. Drops system prompts indiscriminately. Removing the oldest 20% of messages also removes system prompts and tool definitions (at the front of history). After enough rounds, the agent loses its identity and tools.
Suggested change

```diff
- pruneMessages(): number {
+ pruneMessages(): number {
+   const threshold = 1.5 * 1024 * 1024 * 1024;
+   if (process.memoryUsage().heapUsed < threshold) {
+     return 0;
+   }
+   const toPrune = Math.max(1, Math.ceil(this.messages.length * 0.2));
+   // Protect system/tool messages at the head of history.
+   const pruneFrom = Math.min(2, this.messages.length);
+   const actualPrune = Math.min(toPrune, this.messages.length - pruneFrom);
+   this.messages.splice(pruneFrom, actualPrune);
+   return actualPrune;
+ }
```

Also add this.core.pruneMessages() in AgentHeadless.execute() after each reasoning round.

— DeepSeek/deepseek-v4-pro via Qwen Code /review

```ts
// Prune old messages to prevent unbounded memory growth in
// long-lived interactive sessions (81+ minute sessions with
// hundreds of rounds can hit 4 GB without pruning).
this.core.pruneMessages();
```
Collaborator

[Critical] pruneMessages() runs AFTER memory-intensive operations — fails to prevent OOM in the exact scenario it targets.

The sequence is: addMessage → runOneRound (model API + large tool outputs) → pruneMessages. If runOneRound pushes heap past 4GB, the process OOMs before pruneMessages executes. The safety net only helps if heap is 1.5–4GB BEFORE the round — but the dangerous case is a round that itself triggers the OOM.

Suggested change

```diff
- this.core.pruneMessages();
+ while (message !== null && !this.masterAbortController.signal.aborted) {
+   // Prune BEFORE the round to ensure headroom for tool outputs.
+   this.core.pruneMessages();
+   this.addMessage('user', message);
+   await this.runOneRound(message);
+   message = this.queue.dequeue();
+ }
```

— DeepSeek/deepseek-v4-pro via Qwen Code /review

@Dinsmoor
Contributor Author

Sorry, I don't know every intricacy of qwen code, but I figured it'd be better to at least get the ball rolling, since this has crashed for me a couple of times for this reason.

Output for you if it's helpful

```text
  ⠸ The truth is in here... somewhere... (7m 16s · ↑ 684 tokens · esc to cancel)
────────────────────────────────────────────────────────────────────────────────
>   Type your message or @path/to/file
────────────────────────────────────────────────────────────────────────────────
  auto-accept edits (shift + tab to cycle)

<--- Last few GCs --->

[264032:0x17bca930]  4645507 ms: Mark-Compact 4020.7 (4134.3) -> 4005.9 (4130.8) MB, 759.48 / 0.75 ms  (average mu = 0.233, current mu = 0.042) allocation failure; scavenge might not succeed
[264032:0x17bca930]  4646214 ms: Mark-Compact 4022.3 (4131.1) -> 4006.2 (4130.3) MB, 670.26 / 0.89 ms  (average mu = 0.152, current mu = 0.051) allocation failure; scavenge might not succeed


<--- JS stacktrace --->

FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory
----- Native stack trace -----

 1: 0xb76db1 node::OOMErrorHandler(char const*, v8::OOMDetails const&) [/home/tyler/.nvm/versions/node/v20.20.0/bin/node]
 2: 0xee62f0 v8::Utils::ReportOOMFailure(v8::internal::Isolate*, char const*, v8::OOMDetails const&) [/home/tyler/.nvm/versions/node/v20.20.0/bin/node]
 3: 0xee65d7 v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, v8::OOMDetails const&) [/home/tyler/.nvm/versions/node/v20.20.0/bin/node]
 4: 0x10f82d5  [/home/tyler/.nvm/versions/node/v20.20.0/bin/node]
 5: 0x10f8864 v8::internal::Heap::RecomputeLimits(v8::internal::GarbageCollector) [/home/tyler/.nvm/versions/node/v20.20.0/bin/node]
 6: 0x110f754 v8::internal::Heap::PerformGarbageCollection(v8::internal::GarbageCollector, v8::internal::GarbageCollectionReason, char const*) [/home/tyler/.nvm/versions/node/v20.20.0/bin/node]
 7: 0x110ff6c v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) [/home/tyler/.nvm/versions/node/v20.20.0/bin/node]
 8: 0x10e6271 v8::internal::HeapAllocator::AllocateRawWithLightRetrySlowPath(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [/home/tyler/.nvm/versions/node/v20.20.0/bin/node]
 9: 0x10e7405 v8::internal::HeapAllocator::AllocateRawWithRetryOrFailSlowPath(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [/home/tyler/.nvm/versions/node/v20.20.0/bin/node]
10: 0x10c4a56 v8::internal::Factory::NewFillerObject(int, v8::internal::AllocationAlignment, v8::internal::AllocationType, v8::internal::AllocationOrigin) [/home/tyler/.nvm/versions/node/v20.20.0/bin/node]
11: 0x1520596 v8::internal::Runtime_AllocateInYoungGeneration(int, unsigned long*, v8::internal::Isolate*) [/home/tyler/.nvm/versions/node/v20.20.0/bin/node]
12: 0x1959ef6  [/home/tyler/.nvm/versions/node/v20.20.0/bin/node]
```

@LaZzyMan added the type/bug (Something isn't working as expected) label on May 14, 2026
```ts
if (process.memoryUsage().heapUsed < threshold) {
  return 0;
}
// Remove ~20% of the oldest messages to relieve heap pressure.
```
Collaborator

[Critical] pruneMessages() operates on AgentCore.messages (UI display log, AgentMessage[]), but the actual memory-heavy data structure is GeminiChat.history (Content[] with full Part[] arrays). These are two independent arrays — pruneMessages() never touches GeminiChat.history, so the pruning is completely ineffective at reducing heap pressure. The method will return positive counts while the real memory consumer continues to grow unchecked.

Suggested change

```diff
- // Remove ~20% of the oldest messages to relieve heap pressure.
+ pruneMessages(): number {
+   const threshold = 1.5 * 1024 * 1024 * 1024;
+   if (process.memoryUsage().heapUsed < threshold) {
+     return 0;
+   }
+   const toPrune = Math.max(1, Math.ceil(this.messages.length * 0.2));
+   // Also prune the corresponding GeminiChat history entries to actually
+   // reduce heap pressure. Without this, only the UI log shrinks while
+   // the real memory hog (Content[] with Part[]) keeps growing.
+   this.chat?.truncateHistory(toPrune);
+   this.messages.splice(0, toPrune);
+   return toPrune;
+ }
```

— DeepSeek/deepseek-v4-pro via Qwen Code /review

```ts
// regardless of `hasFailedCompressionAttempt`. This is a memory
// safety net — catches the actual root cause (heap pressure) rather
// than a proxy (entry count). Protects against both many small entries
// AND few huge entries (large file reads, shell outputs).
```
Collaborator

[Critical] Force compression death loop: when heap > 2GB, force is elevated to true here. But the hasFailedCompressionAttempt latch (line ~530) is only set when !force. Since the heap check already changed force to true, the latch is never set on failure — every subsequent sendMessageStream call sees heap > 2GB, elevates force again, compresses again, fails again, and the latch is skipped again. Each failed compression burns a model API call. This loops indefinitely until heap pressure subsides or the user's API quota is exhausted.

Suggested change

```diff
- // AND few huge entries (large file reads, shell outputs).
+ const originalForce = force;
+ if (
+   !force &&
+   process.memoryUsage().heapUsed > GeminiChat.HEAP_MEMORY_THRESHOLD
+ ) {
+   force = true;
+ }
+ // ... later, in the failure path, use originalForce instead of
+ // force for the latch decision:
+ if (!originalForce) {
+   this.hasFailedCompressionAttempt = true;
+ }
```

— DeepSeek/deepseek-v4-pro via Qwen Code /review

```ts
// Prune old messages to prevent unbounded memory growth in
// long-lived interactive sessions (81+ minute sessions with
// hundreds of rounds can hit 4 GB without pruning).
this.core.pruneMessages();
```
Collaborator

[Critical] 12 tests fail because createMockCore() is missing the new pruneMessages method. The call this.core.pruneMessages() added at this line throws TypeError: this.core.pruneMessages is not a function in all agent-interactive tests.

Suggested change (in the test file's createMockCore()):

```ts
pruneMessages: vi.fn().mockReturnValue(0),
```

— DeepSeek/deepseek-v4-pro via Qwen Code /review

```ts
// regardless of `hasFailedCompressionAttempt`. This is a memory
// safety net — catches the actual root cause (heap pressure) rather
// than a proxy (entry count). Protects against both many small entries
// AND few huge entries (large file reads, shell outputs).
```
Collaborator

[Suggestion] Heap-triggered force=true leaves trigger as undefined. This causes ChatCompressionService.compress() to classify the compression as 'manual' (user-initiated) via trigger ?? (force ? 'manual' : 'auto'), when it's actually automated memory-pressure response. Downstream hooks and telemetry will be misclassified.

Suggested change

```diff
- // AND few huge entries (large file reads, shell outputs).
+ if (
+   !force &&
+   process.memoryUsage().heapUsed > GeminiChat.HEAP_MEMORY_THRESHOLD
+ ) {
+   force = true;
+   options = { ...options, trigger: 'auto' };
+ }
```

— DeepSeek/deepseek-v4-pro via Qwen Code /review

```ts
if (process.memoryUsage().heapUsed < threshold) {
  return 0;
}
// Remove ~20% of the oldest messages to relieve heap pressure.
```
Collaborator

[Suggestion] pruneMessages() silently splices messages from the UI transcript array (getMessages()) without emitting any event, log, or user notification. Messages are permanently removed with no recovery path and no indication that context was trimmed. Combined with the fact that this pruning doesn't reduce heap (see Critical issue above), the only effect is silently degrading the user's scrollback history.

Suggested change

```diff
- // Remove ~20% of the oldest messages to relieve heap pressure.
+ pruneMessages(): number {
+   // ... existing threshold check ...
+   const toPrune = Math.max(1, Math.ceil(this.messages.length * 0.2));
+   this.messages.splice(0, toPrune);
+   // Notify the UI so users understand context was trimmed.
+   this.pushMessage?.({
+     role: 'info',
+     content: `Trimmed ${toPrune} oldest messages to manage memory.`,
+   });
+   return toPrune;
+ }
```

— DeepSeek/deepseek-v4-pro via Qwen Code /review

