Commit 4ebc43b

feat(test-utils): add memory usage integration test harness (#24876)
1 parent 34b4f1c commit 4ebc43b

18 files changed (1021 additions & 3 deletions in total)
Lines changed: 33 additions & 0 deletions
@@ -0,0 +1,33 @@
name: 'Memory Tests: Nightly'

on:
  schedule:
    - cron: '0 2 * * *' # Runs at 2 AM every day
  workflow_dispatch: # Allow manual trigger

permissions:
  contents: 'read'

jobs:
  memory-test:
    name: 'Run Memory Usage Tests'
    runs-on: 'gemini-cli-ubuntu-16-core'
    if: "github.repository == 'google-gemini/gemini-cli'"
    steps:
      - name: 'Checkout'
        uses: 'actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8' # ratchet:actions/checkout@v5

      - name: 'Set up Node.js'
        uses: 'actions/setup-node@49933ea5288caeca8642d1e84afbd3f7d6820020' # ratchet:actions/setup-node@v4
        with:
          node-version-file: '.nvmrc'
          cache: 'npm'

      - name: 'Install dependencies'
        run: 'npm ci'

      - name: 'Build project'
        run: 'npm run build'

      - name: 'Run Memory Tests'
        run: 'npm run test:memory'

GEMINI.md

Lines changed: 2 additions & 0 deletions
@@ -44,6 +44,8 @@ powerful tool for developers.
 - **Test Commands:**
 - **Unit (All):** `npm run test`
 - **Integration (E2E):** `npm run test:e2e`
+- **Memory (Nightly):** `npm run test:memory` (Runs memory regression tests
+against baselines. Excluded from `preflight`, run nightly.)
 - **Workspace-Specific:** `npm test -w <pkg> -- <path>` (Note: `<path>` must
 be relative to the workspace root, e.g.,
 `-w @google/gemini-cli-core -- src/routing/modelRouterService.test.ts`)

docs/integration-tests.md

Lines changed: 40 additions & 0 deletions
@@ -117,6 +117,46 @@ npm run test:integration:sandbox:docker
 npm run test:integration:sandbox:podman
 ```
 
+## Memory regression tests
+
+Memory regression tests are designed to detect heap growth and leaks across key
+CLI scenarios. They are located in the `memory-tests` directory.
+
+These tests are distinct from standard integration tests because they measure
+memory usage and compare it against committed baselines.
+
+### Running memory tests
+
+Memory tests are not run as part of the default `npm run test` or
+`npm run test:e2e` commands. They are run nightly in CI but can be run manually:
+
+```bash
+npm run test:memory
+```
+
+### Updating baselines
+
+If you intentionally change behavior that affects memory usage, you may need to
+update the baselines. Set the `UPDATE_MEMORY_BASELINES` environment variable to
+`true`:
+
+```bash
+UPDATE_MEMORY_BASELINES=true npm run test:memory
+```
+
+This will run the tests, take median snapshots, and overwrite
+`memory-tests/baselines.json`. You should review the changes and commit the
+updated baseline file.
+
+### How it works
+
+The harness (`MemoryTestHarness` in `packages/test-utils`):
+
+- Forces garbage collection multiple times to reduce noise.
+- Takes median snapshots to filter spikes.
+- Compares against baselines with a 10% tolerance.
+- Can analyze sustained leaks across 3 snapshots using `analyzeSnapshots()`.
+
 ## Diagnostics
 
 The integration test runner provides several options for diagnostics to help
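
The first two "How it works" bullets (repeated forced GC, median snapshots) can be illustrated with a small sampling loop. This is a minimal sketch under stated assumptions, not the `MemoryTestHarness` implementation: `takeMedianSnapshot` and `median` are hypothetical names, the defaults mirror the `gcCycles: 3`, `gcDelayMs: 100`, `sampleCount: 3` options passed in `memory-usage.test.ts`, and forced GC only happens when Node is started with `--expose-gc`.

```typescript
import { setTimeout as sleep } from 'node:timers/promises';

// Hypothetical sample shape; fields come from process.memoryUsage().
interface MemorySample {
  heapUsed: number;
  heapTotal: number;
  rss: number;
}

// Median of a non-empty list; averages the two middle values for even lengths.
export function median(values: number[]): number {
  if (values.length === 0) throw new Error('median of empty array');
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]!
    : (sorted[mid - 1]! + sorted[mid]!) / 2;
}

// Hypothetical sketch: run GC a few times, sample memory, repeat, and report
// the per-field median so one noisy sample cannot skew the result.
export async function takeMedianSnapshot(
  gcCycles = 3,
  gcDelayMs = 100,
  sampleCount = 3,
): Promise<MemorySample> {
  // globalThis.gc only exists when Node runs with --expose-gc.
  const gc = (globalThis as unknown as { gc?: () => void }).gc;
  const samples: MemorySample[] = [];
  for (let i = 0; i < sampleCount; i++) {
    for (let c = 0; c < gcCycles; c++) {
      gc?.(); // no-op when gc is not exposed
      await sleep(gcDelayMs);
    }
    const { heapUsed, heapTotal, rss } = process.memoryUsage();
    samples.push({ heapUsed, heapTotal, rss });
  }
  return {
    heapUsed: median(samples.map((s) => s.heapUsed)),
    heapTotal: median(samples.map((s) => s.heapTotal)),
    rss: median(samples.map((s) => s.rss)),
  };
}
```

Taking the median per field, rather than the mean, keeps a single outlier sample (for example, one taken just before a GC finished) from moving the reported value.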

memory-tests/baselines.json

Lines changed: 30 additions & 0 deletions
@@ -0,0 +1,30 @@
{
  "version": 1,
  "updatedAt": "2026-04-08T01:21:58.770Z",
  "scenarios": {
    "multi-turn-conversation": {
      "heapUsedBytes": 120082704,
      "heapTotalBytes": 177586176,
      "rssBytes": 269172736,
      "timestamp": "2026-04-08T01:21:57.127Z"
    },
    "multi-function-call-repo-search": {
      "heapUsedBytes": 104644984,
      "heapTotalBytes": 111575040,
      "rssBytes": 204079104,
      "timestamp": "2026-04-08T01:21:58.770Z"
    },
    "idle-session-startup": {
      "heapUsedBytes": 119813672,
      "heapTotalBytes": 177061888,
      "rssBytes": 267943936,
      "timestamp": "2026-04-08T01:21:53.855Z"
    },
    "simple-prompt-response": {
      "heapUsedBytes": 119722064,
      "heapTotalBytes": 177324032,
      "rssBytes": 268812288,
      "timestamp": "2026-04-08T01:21:55.491Z"
    }
  }
}
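
Each scenario in `baselines.json` records `heapUsedBytes`, `heapTotalBytes`, and `rssBytes`. A sketch of the 10% tolerance comparison follows. It assumes, without confirmation from this commit, that only `heapUsedBytes` is checked and that only growth past the baseline fails; `checkHeapAgainstBaseline` and the types are hypothetical, shaped after the JSON above.

```typescript
// Hypothetical types mirroring the fields visible in baselines.json.
interface ScenarioBaseline {
  heapUsedBytes: number;
  heapTotalBytes: number;
  rssBytes: number;
  timestamp: string;
}

interface BaselineFile {
  version: number;
  updatedAt: string;
  scenarios: Record<string, ScenarioBaseline>;
}

// Returns null when the measured heapUsed is within tolerance of the
// baseline, otherwise a failure message. Assumption: only growth fails.
export function checkHeapAgainstBaseline(
  file: BaselineFile,
  scenario: string,
  measuredHeapUsed: number,
  tolerancePercent = 10,
): string | null {
  const baseline = file.scenarios[scenario];
  if (!baseline) return `No baseline recorded for "${scenario}"`;
  const limit = baseline.heapUsedBytes * (1 + tolerancePercent / 100);
  if (measuredHeapUsed <= limit) return null;
  const mb = (n: number) => (n / (1024 * 1024)).toFixed(1);
  return (
    `${scenario}: heapUsed ${mb(measuredHeapUsed)} MB exceeds ` +
    `${mb(limit)} MB (baseline ${mb(baseline.heapUsedBytes)} MB + ${tolerancePercent}%)`
  );
}
```

With the `multi-turn-conversation` baseline of 120082704 bytes, the limit works out to about 132090974 bytes, so under this sketch a run measuring 130000000 bytes passes and one measuring 133000000 bytes fails.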

memory-tests/globalSetup.ts

Lines changed: 71 additions & 0 deletions
@@ -0,0 +1,71 @@
/**
 * @license
 * Copyright 2026 Google LLC
 * SPDX-License-Identifier: Apache-2.0
 */

import { mkdir, readdir, rm } from 'node:fs/promises';
import { join, dirname } from 'node:path';
import { fileURLToPath } from 'node:url';
import { canUseRipgrep } from '../packages/core/src/tools/ripGrep.js';

const __dirname = dirname(fileURLToPath(import.meta.url));
const rootDir = join(__dirname, '..');
const memoryTestsDir = join(rootDir, '.memory-tests');
let runDir = '';

export async function setup() {
  runDir = join(memoryTestsDir, `${Date.now()}`);
  await mkdir(runDir, { recursive: true });

  // Set the home directory to the test run directory to avoid conflicts
  // with the user's local config.
  process.env['HOME'] = runDir;
  if (process.platform === 'win32') {
    process.env['USERPROFILE'] = runDir;
  }
  process.env['GEMINI_CONFIG_DIR'] = join(runDir, '.gemini');

  // Download ripgrep to avoid race conditions
  const available = await canUseRipgrep();
  if (!available) {
    throw new Error('Failed to download ripgrep binary');
  }

  // Clean up old test runs, keeping the latest few for debugging
  try {
    const testRuns = await readdir(memoryTestsDir);
    if (testRuns.length > 3) {
      const oldRuns = testRuns.sort().slice(0, testRuns.length - 3);
      await Promise.all(
        oldRuns.map((oldRun) =>
          rm(join(memoryTestsDir, oldRun), {
            recursive: true,
            force: true,
          }),
        ),
      );
    }
  } catch (e) {
    console.error('Error cleaning up old memory test runs:', e);
  }

  process.env['INTEGRATION_TEST_FILE_DIR'] = runDir;
  process.env['GEMINI_CLI_INTEGRATION_TEST'] = 'true';
  process.env['GEMINI_FORCE_FILE_STORAGE'] = 'true';
  process.env['TELEMETRY_LOG_FILE'] = join(runDir, 'telemetry.log');
  process.env['VERBOSE'] = process.env['VERBOSE'] ?? 'false';

  console.log(`\nMemory test output directory: ${runDir}`);
}

export async function teardown() {
  // Cleanup unless KEEP_OUTPUT is set
  if (process.env['KEEP_OUTPUT'] !== 'true' && runDir) {
    try {
      await rm(runDir, { recursive: true, force: true });
    } catch (e) {
      console.warn('Failed to clean up memory test directory:', e);
    }
  }
}
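
`setup()` names each run directory with `Date.now()` and prunes with the default `sort()`, which compares strings. That orders runs correctly as long as every name has the same number of digits (13 for `Date.now()` values between 2001 and 2286). Because the new run directory is created before pruning, it counts among the three kept. A numeric-sort variant of the selection step is sketched below; `selectRunsToPrune` is hypothetical, and skipping non-numeric entries is an assumption that differs from `globalSetup.ts`, which considers every entry `readdir` returns.

```typescript
// Hypothetical helper: choose which run directories to delete so that only
// the newest `keep` remain. Names are assumed to be Date.now() timestamps;
// entries that are not purely digits are left alone (an assumption).
export function selectRunsToPrune(names: string[], keep = 3): string[] {
  const runs = names.filter((n) => /^\d+$/.test(n));
  if (runs.length <= keep) return [];
  // Numeric comparison does not depend on the names having equal width.
  runs.sort((a, b) => Number(a) - Number(b));
  return runs.slice(0, runs.length - keep);
}
```

For `['5', '100', '20', '3']` with `keep = 3`, the numeric sort prunes `'3'`, whereas a lexicographic `sort()` would have ordered `'100'` first and pruned it.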

memory-tests/memory-usage.test.ts

Lines changed: 185 additions & 0 deletions
@@ -0,0 +1,185 @@
/**
 * @license
 * Copyright 2026 Google LLC
 * SPDX-License-Identifier: Apache-2.0
 */

import { describe, it, beforeAll, afterAll, afterEach } from 'vitest';
import { TestRig, MemoryTestHarness } from '@google/gemini-cli-test-utils';
import { join, dirname } from 'node:path';
import { fileURLToPath } from 'node:url';

const __dirname = dirname(fileURLToPath(import.meta.url));
const BASELINES_PATH = join(__dirname, 'baselines.json');
const UPDATE_BASELINES = process.env['UPDATE_MEMORY_BASELINES'] === 'true';
const TOLERANCE_PERCENT = 10;

// Fake API key for tests using fake responses
const TEST_ENV = { GEMINI_API_KEY: 'fake-memory-test-key' };

describe('Memory Usage Tests', () => {
  let harness: MemoryTestHarness;
  let rig: TestRig;

  beforeAll(() => {
    harness = new MemoryTestHarness({
      baselinesPath: BASELINES_PATH,
      defaultTolerancePercent: TOLERANCE_PERCENT,
      gcCycles: 3,
      gcDelayMs: 100,
      sampleCount: 3,
    });
  });

  afterEach(async () => {
    await rig.cleanup();
  });

  afterAll(async () => {
    // Generate the summary report after all tests
    await harness.generateReport();
  });

  it('idle-session-startup: memory usage within baseline', async () => {
    rig = new TestRig();
    rig.setup('memory-idle-startup', {
      fakeResponsesPath: join(__dirname, 'memory.idle-startup.responses'),
    });

    const result = await harness.runScenario(
      'idle-session-startup',
      async (recordSnapshot) => {
        await rig.run({
          args: ['hello'],
          timeout: 120000,
          env: TEST_ENV,
        });

        await recordSnapshot('after-startup');
      },
    );

    if (UPDATE_BASELINES) {
      harness.updateScenarioBaseline(result);
      console.log(
        `Updated baseline for idle-session-startup: ${(result.finalHeapUsed / (1024 * 1024)).toFixed(1)} MB`,
      );
    } else {
      harness.assertWithinBaseline(result);
    }
  });

  it('simple-prompt-response: memory usage within baseline', async () => {
    rig = new TestRig();
    rig.setup('memory-simple-prompt', {
      fakeResponsesPath: join(__dirname, 'memory.simple-prompt.responses'),
    });

    const result = await harness.runScenario(
      'simple-prompt-response',
      async (recordSnapshot) => {
        await rig.run({
          args: ['What is the capital of France?'],
          timeout: 120000,
          env: TEST_ENV,
        });

        await recordSnapshot('after-response');
      },
    );

    if (UPDATE_BASELINES) {
      harness.updateScenarioBaseline(result);
      console.log(
        `Updated baseline for simple-prompt-response: ${(result.finalHeapUsed / (1024 * 1024)).toFixed(1)} MB`,
      );
    } else {
      harness.assertWithinBaseline(result);
    }
  });

  it('multi-turn-conversation: memory remains stable over turns', async () => {
    rig = new TestRig();
    rig.setup('memory-multi-turn', {
      fakeResponsesPath: join(__dirname, 'memory.multi-turn.responses'),
    });

    const prompts = [
      'Hello, what can you help me with?',
      'Tell me about JavaScript',
      'How is TypeScript different?',
      'Can you write a simple TypeScript function?',
      'What are some TypeScript best practices?',
    ];

    const result = await harness.runScenario(
      'multi-turn-conversation',
      async (recordSnapshot) => {
        // Run through all turns as a piped sequence
        const stdinContent = prompts.join('\n');
        await rig.run({
          stdin: stdinContent,
          timeout: 120000,
          env: TEST_ENV,
        });

        // Take snapshots after the conversation completes
        await recordSnapshot('after-all-turns');
      },
    );

    if (UPDATE_BASELINES) {
      harness.updateScenarioBaseline(result);
      console.log(
        `Updated baseline for multi-turn-conversation: ${(result.finalHeapUsed / (1024 * 1024)).toFixed(1)} MB`,
      );
    } else {
      harness.assertWithinBaseline(result);
    }
  });

  it('multi-function-call-repo-search: memory after tool use', async () => {
    rig = new TestRig();
    rig.setup('memory-multi-func-call', {
      fakeResponsesPath: join(
        __dirname,
        'memory.multi-function-call.responses',
      ),
    });

    // Create directories first, then files in the workspace so the tools have targets
    rig.mkdir('packages/core/src/telemetry');
    rig.createFile(
      'packages/core/src/telemetry/memory-monitor.ts',
      'export class MemoryMonitor { constructor() {} }',
    );
    rig.createFile(
      'packages/core/src/telemetry/metrics.ts',
      'export function recordMemoryUsage() {}',
    );

    const result = await harness.runScenario(
      'multi-function-call-repo-search',
      async (recordSnapshot) => {
        await rig.run({
          args: [
            'Search this repository for MemoryMonitor and tell me what it does',
          ],
          timeout: 120000,
          env: TEST_ENV,
        });

        await recordSnapshot('after-tool-calls');
      },
    );

    if (UPDATE_BASELINES) {
      harness.updateScenarioBaseline(result);
      console.log(
        `Updated baseline for multi-function-call-repo-search: ${(result.finalHeapUsed / (1024 * 1024)).toFixed(1)} MB`,
      );
    } else {
      harness.assertWithinBaseline(result);
    }
  });
});
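
All four tests end with the same branch: update the baseline when `UPDATE_MEMORY_BASELINES=true`, otherwise assert. If more scenarios are added, that branch could be factored into a helper like the one below. This is a hypothetical refactor, not part of the commit: `updateOrAssert` is an invented name, the harness is typed structurally with only the two methods the tests call, and the scenario name is passed in because only `finalHeapUsed` is visible on the result object here.

```typescript
// Only the field the tests above read from a scenario result.
interface ScenarioResultLike {
  finalHeapUsed: number;
}

// The two harness methods the tests above call, typed structurally so the
// sketch does not depend on the real MemoryTestHarness class.
interface BaselineHarnessLike<R extends ScenarioResultLike> {
  updateScenarioBaseline(result: R): void;
  assertWithinBaseline(result: R): void;
}

// Hypothetical helper: update and log when `update` is true, else assert.
export function updateOrAssert<R extends ScenarioResultLike>(
  harness: BaselineHarnessLike<R>,
  scenarioName: string,
  result: R,
  update: boolean,
  log: (message: string) => void = console.log,
): void {
  if (update) {
    harness.updateScenarioBaseline(result);
    const mb = (result.finalHeapUsed / (1024 * 1024)).toFixed(1);
    log(`Updated baseline for ${scenarioName}: ${mb} MB`);
  } else {
    harness.assertWithinBaseline(result);
  }
}
```

A test body would then end with `updateOrAssert(harness, 'idle-session-startup', result, UPDATE_BASELINES);`.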
Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
{"method":"generateContent","response":{"candidates":[{"content":{"parts":[{"text":"0"}],"role":"model"},"finishReason":"STOP","index":0}]}}
{"method":"generateContentStream","response":[{"candidates":[{"content":{"parts":[{"text":"Hello! I'm ready to help. What would you like to work on?"}],"role":"model"},"finishReason":"STOP","index":0}],"usageMetadata":{"promptTokenCount":5,"candidatesTokenCount":12,"totalTokenCount":17,"promptTokensDetails":[{"modality":"TEXT","tokenCount":5}]}}]}

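The fake-responses files are JSON Lines: one object per line with a `method` (here `generateContent` or `generateContentStream`) and a `response` payload. How `TestRig` loads them is not shown in this commit, so the parser below is only a sketch that validates the shape visible in these files; `parseFakeResponses` and `countByMethod` are hypothetical names.

```typescript
// Fields visible in every line of these files; `response` is left opaque.
interface FakeResponseEntry {
  method: string;
  response: unknown;
}

// Parses one JSON object per non-blank line and checks both fields exist.
export function parseFakeResponses(text: string): FakeResponseEntry[] {
  const entries: FakeResponseEntry[] = [];
  for (const [index, raw] of text.split('\n').entries()) {
    const line = raw.trim();
    if (line === '') continue;
    const parsed: unknown = JSON.parse(line);
    if (
      typeof parsed !== 'object' ||
      parsed === null ||
      typeof (parsed as { method?: unknown }).method !== 'string' ||
      !('response' in parsed)
    ) {
      throw new Error(`line ${index + 1}: expected "method" and "response"`);
    }
    const { method, response } = parsed as FakeResponseEntry;
    entries.push({ method, response });
  }
  return entries;
}

// Tallies entries per method name.
export function countByMethod(entries: FakeResponseEntry[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const { method } of entries) {
    counts.set(method, (counts.get(method) ?? 0) + 1);
  }
  return counts;
}
```

For the two-line file above, `countByMethod` would report one `generateContent` entry and one `generateContentStream` entry.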
Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
{"method":"generateContent","response":{"candidates":[{"content":{"parts":[{"text":"0"}],"role":"model"},"finishReason":"STOP","index":0}]}}
{"method":"generateContentStream","response":[{"candidates":[{"content":{"parts":[{"text":"I'll search for MemoryMonitor in the repository and analyze what it does."}],"role":"model"},"finishReason":"STOP","index":0}],"usageMetadata":{"promptTokenCount":30,"candidatesTokenCount":15,"totalTokenCount":45,"promptTokensDetails":[{"modality":"TEXT","tokenCount":30}]}}]}
{"method":"generateContentStream","response":[{"candidates":[{"content":{"parts":[{"functionCall":{"name":"grep_search","args":{"pattern":"MemoryMonitor","path":".","include_pattern":"*.ts"}}},{"functionCall":{"name":"list_directory","args":{"path":"packages/core/src/telemetry"}}},{"functionCall":{"name":"read_file","args":{"file_path":"packages/core/src/telemetry/memory-monitor.ts"}}}],"role":"model"},"finishReason":"STOP","index":0}],"usageMetadata":{"promptTokenCount":30,"candidatesTokenCount":80,"totalTokenCount":110,"promptTokensDetails":[{"modality":"TEXT","tokenCount":30}]}}]}
{"method":"generateContentStream","response":[{"candidates":[{"content":{"parts":[{"text":"I found the memory monitoring code. Here's a summary:\n\nThe `MemoryMonitor` class in `packages/core/src/telemetry/memory-monitor.ts` provides:\n\n1. **Continuous monitoring** via `start()`/`stop()` with configurable intervals\n2. **V8 heap snapshots** using `v8.getHeapStatistics()` and `process.memoryUsage()`\n3. **High-water mark tracking** to detect significant memory growth\n4. **Rate-limited recording** to avoid metric flood\n5. **Activity detection** — only records when user is active\n\nThe class uses a singleton pattern via `initializeMemoryMonitor()` for global access."}],"role":"model"},"finishReason":"STOP","index":0}],"usageMetadata":{"promptTokenCount":500,"candidatesTokenCount":120,"totalTokenCount":620,"promptTokensDetails":[{"modality":"TEXT","tokenCount":500}]}}]}

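The third entry in this file scripts a single streamed model turn containing three `functionCall` parts. A sketch that lists the requested tool names follows; the types are narrowed to the fields visible here and are illustrative shapes, not the Gemini SDK's own types, and `functionCallNames` is a hypothetical name.

```typescript
// Illustrative shapes covering only the fields visible in this file.
interface Part {
  text?: string;
  functionCall?: { name: string; args?: Record<string, unknown> };
}
interface Candidate {
  content?: { parts?: Part[]; role?: string };
}
interface StreamChunk {
  candidates?: Candidate[];
}

// Collects the name of every functionCall part, in order, across all chunks
// and candidates of one scripted generateContentStream response.
export function functionCallNames(chunks: StreamChunk[]): string[] {
  const names: string[] = [];
  for (const chunk of chunks) {
    for (const candidate of chunk.candidates ?? []) {
      for (const part of candidate.content?.parts ?? []) {
        if (part.functionCall) names.push(part.functionCall.name);
      }
    }
  }
  return names;
}
```

Applied to that entry's `response` array, it yields `grep_search`, `list_directory`, and `read_file`, which is why the test creates `packages/core/src/telemetry` and two files under it before running.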
Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
{"method":"generateContent","response":{"candidates":[{"content":{"parts":[{"text":"0"}],"role":"model"},"finishReason":"STOP","index":0}]}}
{"method":"generateContentStream","response":[{"candidates":[{"content":{"parts":[{"text":"Hello! I'm ready to help you with your coding tasks. What would you like to work on today?"}],"role":"model"},"finishReason":"STOP","index":0}],"usageMetadata":{"promptTokenCount":5,"candidatesTokenCount":18,"totalTokenCount":23,"promptTokensDetails":[{"modality":"TEXT","tokenCount":5}]}}]}
{"method":"generateContent","response":{"candidates":[{"content":{"parts":[{"text":"0"}],"role":"model"},"finishReason":"STOP","index":0}]}}
{"method":"generateContentStream","response":[{"candidates":[{"content":{"parts":[{"text":"JavaScript is a high-level, interpreted programming language. It was originally designed for adding interactivity to web pages."}],"role":"model"},"finishReason":"STOP","index":0}],"usageMetadata":{"promptTokenCount":25,"candidatesTokenCount":60,"totalTokenCount":85,"promptTokensDetails":[{"modality":"TEXT","tokenCount":25}]}}]}
{"method":"generateContent","response":{"candidates":[{"content":{"parts":[{"text":"0"}],"role":"model"},"finishReason":"STOP","index":0}]}}
{"method":"generateContentStream","response":[{"candidates":[{"content":{"parts":[{"text":"TypeScript is a typed superset of JavaScript developed by Microsoft. The main differences from JavaScript are static typing and better tooling."}],"role":"model"},"finishReason":"STOP","index":0}],"usageMetadata":{"promptTokenCount":45,"candidatesTokenCount":80,"totalTokenCount":125,"promptTokensDetails":[{"modality":"TEXT","tokenCount":45}]}}]}
{"method":"generateContent","response":{"candidates":[{"content":{"parts":[{"text":"0"}],"role":"model"},"finishReason":"STOP","index":0}]}}
{"method":"generateContentStream","response":[{"candidates":[{"content":{"parts":[{"text":"Here is a simple TypeScript function:\n\nfunction greet(name: string): string { return `Hello, ${name}!`; }"}],"role":"model"},"finishReason":"STOP","index":0}],"usageMetadata":{"promptTokenCount":60,"candidatesTokenCount":55,"totalTokenCount":115,"promptTokensDetails":[{"modality":"TEXT","tokenCount":60}]}}]}
{"method":"generateContent","response":{"candidates":[{"content":{"parts":[{"text":"0"}],"role":"model"},"finishReason":"STOP","index":0}]}}
{"method":"generateContentStream","response":[{"candidates":[{"content":{"parts":[{"text":"Here are 5 key TypeScript best practices: Enable strict mode, prefer interfaces, use union types, leverage type inference, and use readonly."}],"role":"model"},"finishReason":"STOP","index":0}],"usageMetadata":{"promptTokenCount":75,"candidatesTokenCount":70,"totalTokenCount":145,"promptTokensDetails":[{"modality":"TEXT","tokenCount":75}]}}]}
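
This file holds ten entries that alternate `generateContent` and `generateContentStream`: five pairs, matching the five prompts in the `multi-turn-conversation` test. A sketch of a consistency check for that layout follows. The pairing rule is read off this file and the two-line file earlier in this commit, not taken from a documented format requirement, and the four-line tool-call file does not follow it; `countTurnPairs` is a hypothetical name.

```typescript
// Hypothetical check: confirm the method sequence is a run of
// generateContent -> generateContentStream pairs and return the pair count.
export function countTurnPairs(methods: string[]): number {
  if (methods.length % 2 !== 0) {
    throw new Error(`expected an even number of entries, got ${methods.length}`);
  }
  for (let i = 0; i < methods.length; i += 2) {
    if (methods[i] !== 'generateContent' || methods[i + 1] !== 'generateContentStream') {
      throw new Error(
        `entries ${i + 1}-${i + 2} are not a generateContent/generateContentStream pair`,
      );
    }
  }
  return methods.length / 2;
}
```

Run over the methods of this file, the check returns 5; run over the tool-call file, it throws at entries 3-4, which is expected because that scenario streams several responses within one turn.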
