Skip to content

Commit fa68455

Browse files
authored
fix(test): unbreak qwen serve integration suites after daemon batch merge (#5041)
Three integration tests have failed every nightly Release and E2E run since the daemon-mode feature batch (#4490) merged, because these suites only run post-merge: - routes: resync the capabilities envelope baseline with the features the batch added (verified against a live daemon), and strip the env toggles that flip conditional tags so the exact-equality assertion is hermetic on dev machines. - baseline: the 2xN MCP grandchildren tripwire fired as designed — the workspace MCP pool eliminated the bootstrap/session duplicate discovery. Assert exactly N pooled children and cross-check the pool's per-server accounting against pgrep. - streaming: the permission test could finish with its turn still blocked on a second permission request nobody would ever answer; the abandoned request wedges the shared session's prompt FIFO and the downstream Last-Event-ID resume test times out waiting for a turn_complete that never comes (reproduced empirically). Pin the session to default approval mode (hermetic vs host user settings) and cancel the possibly-in-flight turn before finishing. The daemon-side wedge (abandoned permission request blocks the FIFO until an explicit cancel) is real beyond tests and tracked separately.
1 parent e07d069 commit fa68455

3 files changed

Lines changed: 113 additions & 42 deletions

File tree

integration-tests/cli/qwen-serve-baseline.test.ts

Lines changed: 42 additions & 41 deletions
Original file line numberDiff line numberDiff line change
@@ -443,41 +443,35 @@ async function measureRssAtSessionCount(sessionCount: number): Promise<{
443443
}, 120_000);
444444

445445
// PR 14b cross-check: validate the daemon's in-process MCP
446-
// accounting on `GET /workspace/mcp` (`clientCount`, the field
447-
// SDK consumers and dashboards see, and the same source the
448-
// push-event channel — `mcp_budget_warning` /
449-
// `mcp_child_refused_batch` — reads) against external `pgrep -P`
446+
// accounting on `GET /workspace/mcp` against external `pgrep -P`
450447
// measurement.
451448
//
452-
// Architectural note (PR 22a): a `qwen serve` ACP child runs
453-
// two `Config` objects, each carrying its own
454-
// `McpClientManager`. The bootstrap Config (`runAcpAgent` →
455-
// `config.initialize`) discovers MCP servers when the child
456-
// starts, and `/workspace/mcp` reads its manager via
457-
// `buildWorkspaceMcpStatus(this.config)` (`acpAgent.ts:1399`).
458-
// The per-session Config (`newSessionConfig` →
459-
// `config.initialize`) spawns a SECOND set of MCP children for
460-
// the SAME servers — its accounting is NOT what the
461-
// workspace-level snapshot reflects. So pgrep observes
462-
// `(1 + sessionCount) * MCP_SERVERS_CONFIGURED` grandchildren
463-
// while `clientCount` stays at `MCP_SERVERS_CONFIGURED`.
449+
// Architectural note (F2 workspace pool): the daemon hosts a
450+
// workspace-shared MCP transport pool (`QwenAgent.mcpPool`).
451+
// All sessions of a workspace share ONE transport per configured
452+
// server, so pgrep observes exactly `MCP_SERVERS_CONFIGURED`
453+
// grandchildren regardless of session count. (Pre-F2, bootstrap
454+
// + per-session Configs each ran their own `McpClientManager`,
455+
// and this test asserted the historical 2×N duplication.)
456+
// Pool accounting surfaces per server cell as `entryCount` /
457+
// `entrySummary`; the top-level `clientCount` field reflects the
458+
// workspace budget controller's reserved count — 0 when budgets
459+
// are off (this suite), NOT the live transport count.
464460
//
465461
// What this test validates:
466-
// 1. `clientCount` is exactly the configured server count
467-
// (bootstrap manager accounting is honest).
468-
// 2. pgrep observes the architectural 2×N grandchildren after
469-
// one session is created — encoded literally so a future
470-
// refactor that unifies bootstrap + session managers (#4175
471-
// follow-up to drop the duplicate discovery) fails this
472-
// assertion and forces a deliberate test update.
462+
// 1. pgrep observes exactly N grandchildren after a session is
463+
// created — encoded literally so a refactor that reintroduces
464+
// per-session MCP children fails this assertion and forces a
465+
// deliberate test update (same tripwire spirit as the pre-F2
466+
// 2×N assertion this replaces).
467+
// 2. Pool accounting is honest: per-server `entryCount` sums to
468+
// the observed pgrep count (no amplification slack at idle —
469+
// the fixtures are stdio-only).
473470
// 3. `clientCount` NEVER exceeds the observed pgrep count —
474471
// the original "snapshot must never over-report" guard.
475472
//
476-
// Skip-gated like the parent describe (POSIX, non-sandbox);
477-
// idle MCP fixtures are stdio-only so the relationship between
478-
// `clientCount` and pgrep is exact (no amplification slack
479-
// required at idle).
480-
it('clientCount matches external pgrep observation', async () => {
473+
// Skip-gated like the parent describe (POSIX, non-sandbox).
474+
it('pool accounting matches external pgrep observation', async () => {
481475
const ws = makeTempWorkspace('mcp-counter');
482476
let daemon: SpawnedDaemon | undefined;
483477
try {
@@ -490,28 +484,35 @@ async function measureRssAtSessionCount(sessionCount: number): Promise<{
490484
daemon = await spawnDaemon({ workspaceCwd: ws });
491485
await daemon.client.createOrAttachSession({ workspaceCwd: ws });
492486

493-
// Wait until the OS sees the FULL post-session set
494-
// (`MCP_SERVERS_CONFIGURED * 2` grandchildren — see the
487+
// Wait until the OS sees the full pooled set
488+
// (`MCP_SERVERS_CONFIGURED` grandchildren — see the
495489
// architectural note above), then read the snapshot.
496490
// pgrep first to lock the comparison floor; snapshot
497491
// second so the daemon can't sneak in a new connect
498492
// between the two reads.
499-
const expectedGrandchildren = MCP_SERVERS_CONFIGURED * 2;
500493
const observed = await waitForMcpGrandchildren(
501494
daemon.daemon.pid!,
502-
expectedGrandchildren,
495+
MCP_SERVERS_CONFIGURED,
503496
);
504497
const snapshot = await daemon.client.workspaceMcp();
505498

506-
// (1) Bootstrap manager accounting is honest.
507-
expect(snapshot.clientCount).toBe(MCP_SERVERS_CONFIGURED);
508-
// (2) pgrep observes both managers' children. If a future
509-
// refactor unifies them, change this to
510-
// `MCP_SERVERS_CONFIGURED` (and update the architectural
511-
// note above).
512-
expect(observed.mcpGrandchildren.length).toBe(expectedGrandchildren);
513-
// (3) Snapshot never over-reports OS reality. Holds under
514-
// both the current 2× regime and the unified 1× future.
499+
// (1) One pooled transport per configured server — no
500+
// per-session amplification. If this fails with MORE
501+
// children, per-session MCP spawning has been reintroduced;
502+
// update the architectural note above deliberately.
503+
expect(observed.mcpGrandchildren.length).toBe(MCP_SERVERS_CONFIGURED);
504+
// (2) Pool accounting is honest: entryCount sums to the
505+
// observed process count. Structural narrowing: the daemon
506+
// emits `entryCount` on pool-backed cells but the SDK's
507+
// `DaemonWorkspaceMcpServerStatus` doesn't carry the F2
508+
// pool fields yet.
509+
const pooledEntries = snapshot.servers.reduce(
510+
(sum, server) =>
511+
sum + ((server as { entryCount?: number }).entryCount ?? 0),
512+
0,
513+
);
514+
expect(pooledEntries).toBe(observed.mcpGrandchildren.length);
515+
// (3) Snapshot never over-reports OS reality.
515516
expect(snapshot.clientCount).toBeLessThanOrEqual(
516517
observed.mcpGrandchildren.length,
517518
);

integration-tests/cli/qwen-serve-routes.test.ts

Lines changed: 49 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -70,7 +70,26 @@ beforeAll(async () => {
7070
'--workspace',
7171
REPO_ROOT,
7272
],
73-
{ stdio: ['ignore', 'pipe', 'pipe'] },
73+
{
74+
stdio: ['ignore', 'pipe', 'pipe'],
75+
// Strip the env toggles that flip conditional capability tags
76+
// (`prompt_absolute_deadline`, `writer_idle_timeout`,
77+
// `rate_limit`, and the pool tags via the kill switch). The
78+
// capabilities baseline below assumes their default state; a
79+
// dev machine exporting any of these would otherwise fail the
80+
// exact-equality assertion.
81+
env: Object.fromEntries(
82+
Object.entries(process.env).filter(
83+
([k]) =>
84+
![
85+
'QWEN_SERVE_PROMPT_DEADLINE_MS',
86+
'QWEN_SERVE_WRITER_IDLE_TIMEOUT_MS',
87+
'QWEN_SERVE_RATE_LIMIT',
88+
'QWEN_SERVE_NO_MCP_POOL',
89+
].includes(k),
90+
),
91+
),
92+
},
7493
);
7594
// Read stdout until we see the listening line + parse the port.
7695
port = await new Promise<number>((resolve, reject) => {
@@ -187,6 +206,15 @@ describe('qwen serve — capabilities envelope', () => {
187206
// Order must match `SERVE_CAPABILITY_REGISTRY` in
188207
// `packages/cli/src/serve/capabilities.ts` and the unit-level
189208
// baseline features in `packages/cli/src/serve/server.test.ts`.
209+
//
210+
// Conditional tags absent under this suite's spawn flags (no
211+
// `--require-auth` / `--allow-origin` / deadline env vars /
212+
// rate-limit opt-in): `require_auth`, `allow_origin`,
213+
// `prompt_absolute_deadline`, `writer_idle_timeout`, `rate_limit`.
214+
// Pool tags (`mcp_workspace_pool`, `mcp_pool_restart`) ARE present
215+
// because the workspace MCP pool is on by default, as are
216+
// `workspace_settings` / `workspace_reload` (the CLI serve path
217+
// always wires `persistSetting` and the workspace service).
190218
expect(caps.features).toEqual([
191219
'health',
192220
'capabilities',
@@ -209,25 +237,45 @@ describe('qwen serve — capabilities envelope', () => {
209237
'workspace_mcp',
210238
'workspace_skills',
211239
'workspace_providers',
240+
'auth_provider_install',
212241
'workspace_memory',
213242
'workspace_agents',
243+
'workspace_agent_generate',
214244
'workspace_env',
215245
'workspace_preflight',
216246
'session_context',
247+
'session_context_usage',
217248
'session_supported_commands',
218249
'session_tasks',
250+
'session_stats',
219251
'session_close',
220252
'session_metadata',
221253
'mcp_guardrails',
254+
'workspace_mcp_manage',
222255
'mcp_guardrail_events',
256+
'mcp_server_runtime_mutation',
223257
'workspace_file_read',
224258
'workspace_file_bytes',
225259
'workspace_file_write',
226260
'session_approval_mode_control',
227261
'workspace_tool_toggle',
262+
'workspace_settings',
228263
'workspace_init',
229264
'workspace_mcp_restart',
265+
'session_recap',
266+
'session_btw',
267+
'mcp_workspace_pool',
268+
'mcp_pool_restart',
230269
'auth_device_flow',
270+
'permission_mediation',
271+
'non_blocking_prompt',
272+
'session_language',
273+
'session_rewind',
274+
'workspace_hooks',
275+
'session_hooks',
276+
'workspace_extensions',
277+
'session_branch',
278+
'workspace_reload',
231279
]);
232280
});
233281
});

integration-tests/cli/qwen-serve-streaming.test.ts

Lines changed: 22 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -226,6 +226,13 @@ describeLLM('qwen serve — multi-client first-responder permission', () => {
226226
workspaceCwd: REPO_ROOT,
227227
});
228228

229+
// Pin the session to `default` approval mode. The ACP child
230+
// inherits the host's user-level settings — a developer machine
231+
// with `approvalMode: yolo` auto-approves the write below, no
232+
// permission_request ever fires, and this test fails only
233+
// locally. CI passes because its HOME has no user settings.
234+
await client.setSessionApprovalMode(session.sessionId, 'default');
235+
229236
const ac1 = new AbortController();
230237
const ac2 = new AbortController();
231238
const seen1: DaemonEvent[] = [];
@@ -315,6 +322,21 @@ describeLLM('qwen serve — multi-client first-responder permission', () => {
315322
promptTask.catch(() => undefined),
316323
new Promise((r) => setTimeout(r, 30_000)),
317324
]);
325+
// The race above tolerates the turn still running (slow model).
326+
// But ABANDONING an in-flight turn wedges the shared session: if
327+
// the model asks for a SECOND permission after the allow_once
328+
// vote, nobody is left to answer it, the pending request blocks
329+
// the turn forever, and the per-session prompt FIFO holds every
330+
// later prompt behind it — the Last-Event-ID resume test below
331+
// then times out waiting for a turn_complete that never comes
332+
// (the exact 60s × 3-retry hang from the 2026-06-12 nightly).
333+
// Cancel the active prompt so the session is clean for the next
334+
// test; harmless when the turn already finished.
335+
await client.cancel(session.sessionId).catch(() => undefined);
336+
await Promise.race([
337+
promptTask.catch(() => undefined),
338+
new Promise((r) => setTimeout(r, 5_000)),
339+
]);
318340
ac1.abort();
319341
ac2.abort();
320342
await Promise.all([sub1, sub2]);

0 commit comments

Comments
 (0)