Commit 1bb24d5

committed: clean
1 parent 4d8aad2 · commit 1bb24d5
2 files changed: 62 additions & 35 deletions

.github/skills/chat-perf/SKILL.md

Lines changed: 26 additions & 21 deletions

```diff
@@ -1,6 +1,9 @@
-# Chat Performance Testing
+---
+name: chat-perf
+description: Run chat perf benchmarks and memory leak checks against the local dev build or any published VS Code version. Use when investigating chat rendering regressions, validating perf-sensitive changes to chat UI, or checking for memory leaks in the chat response pipeline.
+---
 
-Run chat perf benchmarks and memory leak checks against the local dev build or any published VS Code version. Use when investigating chat rendering regressions, validating perf-sensitive changes to chat UI, or checking for memory leaks in the chat response pipeline.
+# Chat Performance Testing
 
 ## When to use
 
```

```diff
@@ -38,12 +41,16 @@ Launches VS Code via Playwright Electron, opens the chat panel, sends a message
 | Flag | Default | Description |
 |---|---|---|
 | `--runs <n>` | `5` | Runs per scenario. More = more stable. Use 5+ for CI. |
-| `--scenario <id>` | all | Scenario to test (repeatable). See scenarios below. |
-| `--build <path\|ver>` | local dev | Build to test. Accepts path or version (`1.110.0`, `insiders`). |
-| `--baseline-build <ver>` | `1.115.0` | Version to download and compare against. |
+| `--scenario <id>` / `-s` | all | Scenario to test (repeatable). See `common/perf-scenarios.js`. |
+| `--build <path\|ver>` / `-b` | local dev | Build to test. Accepts path or version (`1.110.0`, `insiders`, commit hash). |
+| `--baseline <path>` || Compare against a previously saved baseline JSON file. |
+| `--baseline-build <ver>` | `1.115.0` | Version to download and benchmark as baseline. |
 | `--no-baseline` || Skip baseline comparison entirely. |
+| `--save-baseline` || Save results as the new baseline (requires `--baseline <path>`). |
 | `--resume <path>` || Resume a previous run, adding more iterations to increase confidence. |
 | `--threshold <frac>` | `0.2` | Regression threshold (0.2 = flag if 20% slower). |
+| `--no-cache` || Ignore cached baseline data, always run fresh. |
+| `--ci` || CI mode: write Markdown summary to `ci-summary.md` (implies `--no-cache`). |
 | `--verbose` || Print per-run details including response content. |
 
 ### Comparing two remote builds
```
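
The new flags compose with the existing ones. As an illustrative composition of the documented flags (not an invocation quoted from the skill itself), `npm run perf:chat -- -b insiders --baseline-build 1.115.0 --runs 5 --ci` would benchmark Insiders against a freshly downloaded 1.115.0 baseline and write `ci-summary.md`, while `--baseline <path>` together with `--save-baseline` records the current results as a reusable baseline file for later comparisons.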

```diff
@@ -87,18 +94,13 @@ Confidence levels reported: `high` (p < 0.01), `medium` (p < 0.05), `low` (p < 0.
 
 ### Scenarios
 
-| ID | What it stresses |
-|---|---|
-| `text-only` | Baseline — plain text response |
-| `large-codeblock` | Single TypeScript block with syntax highlighting |
-| `many-codeblocks` | 10 fenced code blocks (~600 lines) |
-| `many-small-chunks` | 200 small SSE chunks |
-| `mixed-content` | Markdown with headers, code blocks, prose |
-| `long-prose` | ~3000 words across 15 sections |
-| `rich-markdown` | Nested lists, bold, italic, links, blockquotes |
-| `giant-codeblock` | Single 200-line TypeScript block |
-| `rapid-stream` | 1000 tiny SSE chunks |
-| `file-links` | 32 file URI references with line anchors |
+Scenarios are defined in `scripts/chat-simulation/common/perf-scenarios.js` and registered via `registerPerfScenarios()`. There are three categories:
+
+- **Content-only** — plain streaming responses (e.g. `text-only`, `large-codeblock`, `rapid-stream`)
+- **Tool-call** — multi-turn scenarios with tool invocations (e.g. `tool-read-file`, `tool-edit-file`)
+- **Multi-turn user** — multi-turn conversations with user follow-ups, thinking blocks (e.g. `thinking-response`, `multi-turn-user`, `long-conversation`)
+
+Run `npm run perf:chat -- --help` to see the full list of registered scenario IDs.
 
 ### Metrics collected
 
```
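
Because `-s` is repeatable, a run can target only the new tool-call scenarios, for example `npm run perf:chat -- -s tool-read-file -s tool-edit-file --runs 5` (an illustrative composition of the flags and scenario IDs above).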

```diff
@@ -121,8 +123,8 @@ Launches one VS Code session, sends N messages sequentially, forces GC between e
 
 | Flag | Default | Description |
 |---|---|---|
-| `--messages <n>` | `10` | Number of messages to send. More = more accurate slope. |
-| `--build <path\|ver>` | local dev | Build to test. |
+| `--messages <n>` / `-n` | `10` | Number of messages to send. More = more accurate slope. |
+| `--build <path\|ver>` / `-b` | local dev | Build to test. |
 | `--threshold <MB>` | `2` | Max per-message heap growth in MB. |
 | `--verbose` || Print per-message heap/DOM counts. |
 
```
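
Assuming the memory-leak script is launched directly with Node (its npm alias is not shown in this diff), an invocation such as `node scripts/chat-simulation/test-chat-mem-leaks.js -n 20 -b insiders --threshold 2` would send 20 messages to the Insiders build and flag any per-message heap growth above 2 MB.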

````diff
@@ -144,7 +146,10 @@ Launches one VS Code session, sends N messages sequentially, forces GC between e
 scripts/chat-simulation/
 ├── common/
 │   ├── mock-llm-server.js          # Mock CAPI server matching @vscode/copilot-api URL structure
+│   ├── perf-scenarios.js           # Built-in scenario definitions (content, tool-call, multi-turn)
 │   └── utils.js                    # Shared: paths, env setup, stats, launch helpers
+├── config.jsonc                    # Default config (baseline version, runs, thresholds)
+├── fixtures/                       # TypeScript fixture files used by tool-call scenarios
 ├── test-chat-perf-regression.js
 └── test-chat-mem-leaks.js
 ```
````

```diff
@@ -163,6 +168,6 @@ The copilot extension connects to this server via `IS_SCENARIO_AUTOMATION=1` mod
 
 ### Adding a scenario
 
-1. Add a new entry to the `SCENARIOS` object in `common/mock-llm-server.js` — an array of string chunks that will be streamed as SSE
-2. Add the scenario ID to the `SCENARIOS` array in `common/utils.js`
+1. Add a new entry to the appropriate object (`CONTENT_SCENARIOS`, `TOOL_CALL_SCENARIOS`, or `MULTI_TURN_SCENARIOS`) in `common/perf-scenarios.js` using the `ScenarioBuilder` API from `common/mock-llm-server.js`
+2. The scenario is auto-registered by `registerPerfScenarios()` — no manual ID list to update
 3. Run: `npm run perf:chat -- --scenario your-new-scenario --runs 1 --no-baseline --verbose`
```
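
The commit does not show the `ScenarioBuilder` surface itself, so the following is only a hypothetical sketch of step 1; the entry shape and the `chunk()`/`build()` method names are assumptions rather than the real API in `common/mock-llm-server.js`.

```js
// HYPOTHETICAL sketch of step 1: the real ScenarioBuilder API may differ.
// `chunk()` and `build()` are assumed method names, not taken from this diff.
const { ScenarioBuilder } = require('./mock-llm-server');

const CONTENT_SCENARIOS = {
	// ...existing content-only entries...
	'your-new-scenario': new ScenarioBuilder()
		.chunk('First streamed SSE chunk. ')  // assumed: append one SSE chunk to the mock response
		.chunk('Second streamed SSE chunk.')
		.build(),
};
```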

scripts/chat-simulation/test-chat-perf-regression.js

Lines changed: 36 additions & 14 deletions

```diff
@@ -289,25 +289,43 @@ async function runOnce(electronPath, scenario, mockServer, verbose, runIndex, ru
 		}
 	});
 
-	// Start polling for code/chat/* perf marks inside the renderer.
-	// The marks are emitted during the request and cleared immediately
-	// after RequestComplete in the same microtask. We poll rapidly from
-	// the page context to capture them before they're cleared.
+	// Use a PerformanceObserver to capture code/chat/* marks as they're
+	// emitted. This is event-driven (no polling) and captures marks
+	// even if they're cleared immediately after emission.
 	await window.evaluate(() => {
 		// @ts-ignore
 		globalThis._chatPerfCapture = [];
-		// @ts-ignore
-		globalThis._chatPerfPollId = setInterval(() => {
+		try {
+			// @ts-ignore
+			globalThis._chatPerfObserver = new PerformanceObserver((list) => {
+				for (const entry of list.getEntries()) {
+					if (entry.name.startsWith('code/chat/')) {
+						const timeOrigin = performance.timeOrigin ?? 0;
+						// @ts-ignore
+						globalThis._chatPerfCapture.push({
+							name: entry.name,
+							startTime: Math.round(timeOrigin + entry.startTime),
+						});
+					}
+				}
+			});
 			// @ts-ignore
-			const marks = globalThis.MonacoPerformanceMarks?.getMarks() ?? [];
-			for (const m of marks) {
+			globalThis._chatPerfObserver.observe({ type: 'mark', buffered: false });
+		} catch {
+			// PerformanceObserver not available — fall back to polling
+			// @ts-ignore
+			globalThis._chatPerfPollId = setInterval(() => {
 				// @ts-ignore
-				if (m.name.startsWith('code/chat/') && !globalThis._chatPerfCapture.some(c => c.name === m.name)) {
+				const marks = globalThis.MonacoPerformanceMarks?.getMarks() ?? [];
+				for (const m of marks) {
 					// @ts-ignore
-					globalThis._chatPerfCapture.push({ name: m.name, startTime: m.startTime });
+					if (m.name.startsWith('code/chat/') && !globalThis._chatPerfCapture.some(c => c.name === m.name)) {
+						// @ts-ignore
+						globalThis._chatPerfCapture.push({ name: m.name, startTime: m.startTime });
+					}
 				}
-			}
-		}, 16); // poll every frame (~60fps)
+			}, 16);
+		}
 	});
 
 	// Submit
```
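
The new comment claims the observer still sees marks that are cleared right after emission; that is a property of the standard `PerformanceObserver` API and can be checked in isolation. A minimal Node.js sketch using the built-in `perf_hooks` module (independent of this script; the mark names here are illustrative):

```js
// Minimal demonstration of why observer-based capture survives clearMarks():
// the entry is queued to the observer when the mark is created, so clearing
// the timeline buffer afterwards does not lose it.
const { performance, PerformanceObserver } = require('perf_hooks');

const captured = [];
const obs = new PerformanceObserver((list) => {
	for (const entry of list.getEntries()) {
		if (entry.name.startsWith('code/chat/')) {
			captured.push({ name: entry.name, startTime: entry.startTime });
		}
	}
});
obs.observe({ type: 'mark', buffered: false });

performance.mark('code/chat/request-start');
performance.mark('code/chat/request-complete');
performance.clearMarks(); // cleared immediately, as the renderer does

// Observer callbacks are delivered asynchronously, so check on a later tick.
setTimeout(() => {
	console.log(captured.map(m => m.name));
	// => [ 'code/chat/request-start', 'code/chat/request-complete' ]
	obs.disconnect();
}, 100);
```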

```diff
@@ -425,15 +443,19 @@ async function runOnce(electronPath, scenario, mockServer, verbose, runIndex, ru
 		console.log(`  [debug] Client-side timing: firstResponse=${firstResponseTime - submitTime}ms, complete=${responseCompleteTime - submitTime}ms`);
 	}
 
-	// Collect perf marks from our polling capture and stop the poll
+	// Collect perf marks and tear down the observer/poll
 	const chatMarks = await window.evaluate(() => {
 		// @ts-ignore
-		clearInterval(globalThis._chatPerfPollId);
+		if (globalThis._chatPerfObserver) { globalThis._chatPerfObserver.disconnect(); }
+		// @ts-ignore
+		if (globalThis._chatPerfPollId) { clearInterval(globalThis._chatPerfPollId); }
 		// @ts-ignore
 		const marks = globalThis._chatPerfCapture ?? [];
 		// @ts-ignore
 		delete globalThis._chatPerfCapture;
 		// @ts-ignore
+		delete globalThis._chatPerfObserver;
+		// @ts-ignore
 		delete globalThis._chatPerfPollId;
 		return marks;
 	});
```
