22 changes: 22 additions & 0 deletions packages/core/src/agents/runtime/agent-core.ts
@@ -1349,6 +1349,28 @@ export class AgentCore {
return this.messages;
}

/**
* Prune oldest messages from the message history when `heapUsed` exceeds
* 1.5 GB. Returns the number of messages pruned (0 if no pruning was
* needed). Call this after each reasoning round to prevent unbounded
* memory growth in long-lived interactive sessions.
*
* Messages are pruned in chunks — removing ~20% of the oldest messages
* each time, which naturally scales with memory pressure rather than
* a hard count cap. This catches the actual root cause (heap usage)
* regardless of how the memory ballooned (many small entries vs few huge).
*/
pruneMessages(): number {

[Critical] pruneMessages() has two structural issues:

  1. Never called for headless agents. The sole call site is agent-interactive.ts:129. All subagents, background tasks, and AgentHeadless usage never prune — exactly the long-running scenarios that need it most.
  2. Drops system prompts indiscriminately. Removing the oldest 20% of messages also removes system prompts and tool definitions (at the front of history). After enough rounds, the agent loses its identity and tools.
Suggested change:
pruneMessages(): number {
  const threshold = 1.5 * 1024 * 1024 * 1024;
  if (process.memoryUsage().heapUsed < threshold) {
    return 0;
  }
  const toPrune = Math.max(1, Math.ceil(this.messages.length * 0.2));
  // Protect system/tool messages at the head of history.
  const pruneFrom = Math.min(2, this.messages.length);
  const actualPrune = Math.min(toPrune, this.messages.length - pruneFrom);
  this.messages.splice(pruneFrom, actualPrune);
  return actualPrune;
}

Also add this.core.pruneMessages() in AgentHeadless.execute() after each reasoning round.

— DeepSeek/deepseek-v4-pro via Qwen Code /review

const threshold = 1.5 * 1024 * 1024 * 1024; // 1.5 GB
if (process.memoryUsage().heapUsed < threshold) {
return 0;
}
// Remove ~20% of the oldest messages to relieve heap pressure.

[Critical] pruneMessages() operates on AgentCore.messages (UI display log, AgentMessage[]), but the actual memory-heavy data structure is GeminiChat.history (Content[] with full Part[] arrays). These are two independent arrays — pruneMessages() never touches GeminiChat.history, so the pruning is completely ineffective at reducing heap pressure. The method will return positive counts while the real memory consumer continues to grow unchecked.

Suggested change:
pruneMessages(): number {
  const threshold = 1.5 * 1024 * 1024 * 1024;
  if (process.memoryUsage().heapUsed < threshold) {
    return 0;
  }
  const toPrune = Math.max(1, Math.ceil(this.messages.length * 0.2));
  // Also prune the corresponding GeminiChat history entries to actually
  // reduce heap pressure. Without this, only the UI log shrinks while
  // the real memory hog (Content[] with Part[]) keeps growing.
  this.chat?.truncateHistory(toPrune);
  this.messages.splice(0, toPrune);
  return toPrune;
}

— DeepSeek/deepseek-v4-pro via Qwen Code /review


[Suggestion] pruneMessages() silently splices messages from the UI transcript array (getMessages()) without emitting any event, log, or user notification. Messages are permanently removed with no recovery path and no indication that context was trimmed. Combined with the fact that this pruning doesn't reduce heap (see Critical issue above), the only effect is silently degrading the user's scrollback history.

Suggested change:
pruneMessages(): number {
  // ... existing threshold check ...
  const toPrune = Math.max(1, Math.ceil(this.messages.length * 0.2));
  this.messages.splice(0, toPrune);
  // Notify the UI so users understand context was trimmed
  this.pushMessage?.({
    role: 'info',
    content: `Trimmed ${toPrune} oldest messages to manage memory.`,
  });
  return toPrune;
}

— DeepSeek/deepseek-v4-pro via Qwen Code /review

const toPrune = Math.max(1, Math.ceil(this.messages.length * 0.2));
this.messages.splice(0, toPrune);
return toPrune;
}
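Read in isolation, the chunked-pruning arithmetic in the hunk above is easy to sanity-check; a minimal standalone sketch (plain array, not the class method, helper name hypothetical):

```typescript
// Standalone illustration of the chunked-pruning arithmetic used by
// pruneMessages(): remove ~20% of the oldest entries in one splice.
function pruneOldest<T>(messages: T[]): number {
  if (messages.length === 0) return 0;
  // Math.ceil guarantees at least one entry is removed once we decide
  // to prune, even for very short histories.
  const toPrune = Math.max(1, Math.ceil(messages.length * 0.2));
  messages.splice(0, toPrune); // drop the oldest entries in place
  return toPrune;
}

const history = Array.from({ length: 10 }, (_, i) => `msg-${i}`);
const pruned = pruneOldest(history); // 20% of 10 entries
console.log(pruned, history.length); // 2 8
```

Because the chunk size is proportional to the current length, repeated calls shrink the history geometrically rather than by a fixed count, which is the scaling behavior the doc comment describes.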

/**
* Tool calls currently awaiting user approval. Mutated by
* AgentInteractive's TOOL_WAITING_APPROVAL handler; headless agents
Expand Down
4 changes: 4 additions & 0 deletions packages/core/src/agents/runtime/agent-interactive.ts
@@ -123,6 +123,10 @@ export class AgentInteractive {
while (message !== null && !this.masterAbortController.signal.aborted) {
this.addMessage('user', message);
await this.runOneRound(message);
// Prune old messages to prevent unbounded memory growth in
// long-lived interactive sessions (81+ minute sessions with
// hundreds of rounds can hit 4 GB without pruning).
this.core.pruneMessages();

[Critical] pruneMessages() runs AFTER memory-intensive operations — fails to prevent OOM in the exact scenario it targets.

The sequence is: addMessage → runOneRound (model API + large tool outputs) → pruneMessages. If runOneRound pushes heap past 4GB, the process OOMs before pruneMessages executes. The safety net only helps if heap is 1.5–4GB BEFORE the round — but the dangerous case is a round that itself triggers the OOM.

Suggested change:
while (message !== null && !this.masterAbortController.signal.aborted) {
  // Prune BEFORE the round to ensure headroom for tool outputs.
  this.core.pruneMessages();
  this.addMessage('user', message);
  await this.runOneRound(message);
  message = this.queue.dequeue();
}

— DeepSeek/deepseek-v4-pro via Qwen Code /review


[Critical] 12 tests fail because createMockCore() is missing the new pruneMessages method. The call this.core.pruneMessages() added at this line throws TypeError: this.core.pruneMessages is not a function in all agent-interactive tests.

Suggested change:
// In the test file's createMockCore():
pruneMessages: vi.fn().mockReturnValue(0),

— DeepSeek/deepseek-v4-pro via Qwen Code /review

message = this.queue.dequeue();
}

Expand Down
25 changes: 25 additions & 0 deletions packages/core/src/core/geminiChat.ts
@@ -388,6 +388,19 @@ export class GeminiChat {
// model.
private sendPromise: Promise<void> = Promise.resolve();

/**
* Heap memory threshold (2 GB). When `heapUsed` exceeds this, chat
* compression is forced regardless of `hasFailedCompressionAttempt`.
* This is a memory safety net independent of the 70% token compaction
* threshold — catching the actual root cause (heap pressure) rather than
* a proxy (entry count). Protects against both many small entries AND
* few huge entries (large file reads, shell outputs).
*
* Note: set below Node's default 4 GB heap limit so there is headroom
* for one more GC cycle before the process is killed.
*/
private static readonly HEAP_MEMORY_THRESHOLD = 2 * 1024 * 1024 * 1024; // 2 GB


[Critical] Hardcoded 2GB threshold is a no-op on small-memory containers, and the safety net has zero observability.

In Docker/K8s with --max-old-space-size=512, heapUsed can never reach 2GB — the process is OOM-killed first. This also applies to the 1.5GB threshold in agent-core.ts. Derive thresholds from the actual heap limit:

Suggested change:
private static readonly HEAP_MEMORY_THRESHOLD = (() => {
  try {
    // Math.floor, not `| 0`: 70% of a multi-gigabyte limit exceeds the
    // 32-bit range that bitwise coercion wraps into.
    return Math.floor(v8.getHeapStatistics().heap_size_limit * 0.7);
  } catch {
    return 2 * 1024 * 1024 * 1024;
  }
})();

Additionally, neither this heap trigger nor pruneMessages() produces any log output. Add debugLogger.warn(...) when these triggers activate so operators can tell if the safety nets fired.

— DeepSeek/deepseek-v4-pro via Qwen Code /review
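The limit-derived threshold the comment proposes can be exercised as a standalone helper (sketch; `heapThresholdBytes` is a hypothetical name, the 0.7 fraction and 2 GB fallback follow the suggestion, and `Math.floor` is used because a 32-bit `| 0` coercion overflows on limits above ~3 GB):

```typescript
import v8 from 'node:v8';

// Derive a heap-pressure threshold from the actual V8 heap limit
// rather than hardcoding 2 GB, so small-memory containers still get
// a reachable threshold. Falls back to 2 GB if stats are unavailable.
function heapThresholdBytes(fraction = 0.7): number {
  try {
    return Math.floor(v8.getHeapStatistics().heap_size_limit * fraction);
  } catch {
    return 2 * 1024 * 1024 * 1024; // 2 GB fallback
  }
}

const threshold = heapThresholdBytes();
console.log(`threshold: ${(threshold / 1024 ** 3).toFixed(2)} GB`);
```

On a container started with --max-old-space-size=512, this yields roughly 0.35 GB instead of an unreachable 2 GB, which is the point of the review comment.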

/**
* Per-chat last-prompt-token-count, populated from `usageMetadata` on each
* model response. Used by the compaction threshold check so that subagents
@@ -464,6 +477,18 @@
signal?: AbortSignal,
options?: TryCompressOptions,
): Promise<ChatCompressionInfo> {
// Force compression when heapUsed exceeds the memory threshold,
// regardless of `hasFailedCompressionAttempt`. This is a memory
// safety net — catches the actual root cause (heap pressure) rather
// than a proxy (entry count). Protects against both many small entries
// AND few huge entries (large file reads, shell outputs).

[Critical] Force compression death loop: when heap > 2GB, force is elevated to true here. But the hasFailedCompressionAttempt latch (line ~530) is only set when !force. Since the heap check already changed force to true, the latch is never set on failure — every subsequent sendMessageStream call sees heap > 2GB, elevates force again, compresses again, fails again, and the latch is skipped again. Each failed compression burns a model API call. This loops indefinitely until heap pressure subsides or the user's API quota is exhausted.

Suggested change:
const originalForce = force;
if (
  !force &&
  process.memoryUsage().heapUsed > GeminiChat.HEAP_MEMORY_THRESHOLD
) {
  force = true;
}
// ... later, in the failure path:
// Use originalForce instead of force for the latch decision.
if (!originalForce) {
  this.hasFailedCompressionAttempt = true;
}

— DeepSeek/deepseek-v4-pro via Qwen Code /review


[Suggestion] Heap-triggered force=true leaves trigger as undefined. This causes ChatCompressionService.compress() to classify the compression as 'manual' (user-initiated) via trigger ?? (force ? 'manual' : 'auto'), when it's actually automated memory-pressure response. Downstream hooks and telemetry will be misclassified.

Suggested change:
if (
  !force &&
  process.memoryUsage().heapUsed > GeminiChat.HEAP_MEMORY_THRESHOLD
) {
  force = true;
  options = { ...options, trigger: 'auto' };
}

— DeepSeek/deepseek-v4-pro via Qwen Code /review
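The misclassification described above stems from a fallback of the form `trigger ?? (force ? 'manual' : 'auto')`; a toy reproduction (function and type names hypothetical, not the real ChatCompressionService API):

```typescript
type CompressionTrigger = 'manual' | 'auto';

// Hypothetical reproduction of the classification fallback quoted in
// the review comment: when `trigger` is undefined, a force-compression
// is labeled 'manual' even if it was an automated heap-pressure response.
function classifyTrigger(
  force: boolean,
  trigger?: CompressionTrigger,
): CompressionTrigger {
  return trigger ?? (force ? 'manual' : 'auto');
}

console.log(classifyTrigger(true)); // 'manual', misclassified when heap-triggered
console.log(classifyTrigger(true, 'auto')); // 'auto', fixed by passing trigger explicitly
```

Passing `trigger: 'auto'` alongside the heap-driven `force = true`, as the suggestion does, short-circuits the `??` fallback and keeps telemetry accurate.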

if (
!force &&
process.memoryUsage().heapUsed > GeminiChat.HEAP_MEMORY_THRESHOLD

[Critical] Heap-based force races with compression's own memory allocation (OOM risk), AND bypasses the hasFailedCompressionAttempt circuit breaker (API cost amplification).

(1) OOM race: heapUsed is read BEFORE service.compress(). If heap is 1.95GB (below 2GB), force is NOT set and compression proceeds. During compression, serializing chat history + response parsing can push heap past 4GB before GC — killing the process. Lower the threshold to 1.5GB for genuine headroom.

(2) API retry loop: force=true bypasses hasFailedCompressionAttempt. If compression fails, the flag is set — but the next call still triggers force=true because heap remains >2GB. Result: unlimited compression API calls. Even successful compression may not reduce heap (if pressure is from cached files), causing redundant re-compression.

Suggested change:
if (
  !force &&
  !this.hasFailedHeapCompressionAttempt &&
  process.memoryUsage().heapUsed > 1.5 * 1024 * 1024 * 1024
) {
  force = true;
  this.hasFailedHeapCompressionAttempt = true;
  debugLogger.warn(
    `Heap pressure (${(process.memoryUsage().heapUsed / 1024**3).toFixed(1)} GB) — forcing compression`,
  );
}

— DeepSeek/deepseek-v4-pro via Qwen Code /review

) {
force = true;
}

const service = new ChatCompressionService();
const { newHistory, info } = await service.compress(this, {
promptId,
Expand Down