feat(serve): MCP guardrail push events + hysteresis (#4175 Wave 3 PR 14b)#4271
Conversation
…14b) Adds typed SSE push events for the MCP budget surface introduced by PR 14 (#4247). Operators dashboarding `/workspace/mcp` snapshot already see budget pressure via `budgets[0].status` + per-server `disabledReason: 'budget'`; PR 14b layers a real-time push channel on top so SDK clients can react without polling. Two new event types on `GET /session/:id/events`: - `mcp_budget_warning` — fires once on the upward 75% crossing of `reservedSlots.size / clientBudget`; re-arms only after the ratio drops below 37.5% (`MCP_BUDGET_REARM_FRACTION`). Mirrors PR 10's `slow_client_warning` hysteresis but at the manager level rather than the per-subscriber backlog level. Fires under both `warn` and `enforce` modes. - `mcp_child_refused_batch` — fires at end of each `discoverAllMcpTools*` pass when ≥1 server was refused, AND as a length-1 batch on the `readResource` lazy-spawn refusal path. `mode` is the literal `'enforce'` since `warn` mode never refuses. Wired end-to-end: - Core `McpClientManager` gains `evaluateBudgetState()` + `emitRefusedBatchIfAny()`, a `pendingRefusalNames` queue distinct from the snapshot-visible `lastRefusedServerNames` (PR 14 contract: refusals survive between passes for the snapshot — the emit log is independent), and a `setOnBudgetEvent(cb)` setter so acpAgent can register the callback after `loadCliConfig` returns but before discovery fires. - ACP child→bridge transport: `connection.extNotification` with method `qwen/notify/session/mcp-budget-event` (mirrors the `authenticate/update` precedent at `acpAgent.ts:355`). Payload carries `v: 1` envelope + sessionId for routing. - BridgeClient gains `extNotification` handler; resolves session via `byId.get(sessionId)` and republishes as a session-scoped SSE frame. Unknown methods, unknown kinds, and missing sessionIds drop silently for forward-compat. - Capability tag `mcp_guardrail_events` (always-on, distinct from PR 14's `mcp_guardrails` snapshot tag). - SDK typed-event registry: `KnownDaemonEvent` gains `DaemonMcpGuardrailEvent` union member; reducer adds 4 fields (`mcpBudgetWarningCount`, `lastMcpBudgetWarning`, `mcpRefusedBatchCount`, `lastMcpRefusedBatch`) on `DaemonSessionViewState`. Predicate guards reject `mode: 'warn'` on refused-batch (literal-`'enforce'` invariant) + unknown transport families (forward-compat: a daemon emitting a new transport speaks newer wire than this SDK). Tests: - 11 new manager tests (state machine arms/fires/rearms, refused batch coalescing, transport preservation, off-mode no-op, `readResource` length-1 batch, stop-resets-state). - 2 new acpAgent tests (extNotification dispatch with sessionId closure; defensive no-op when `getMcpClientManager` absent). - 3 new bridge tests (warning + refused-batch publish; drop unknown methods / kinds / sessionIds). - 5 new SDK tests (predicate accepts good shapes, rejects `thresholdRatio !== 0.75`, rejects `mode: 'warn'` on refused-batch, rejects unknown transport; reducer counters increment + last* capture; safety-net routes malformed payloads through `unrecognizedKnownEventCount`). - 1 new integration test: `pgrep -P` × `subprocessCount` cross-check validates the in-process counter (PR 14b's event source) against external observation. Skip-gated like PR 1 (POSIX, non-sandbox). - 1 new capability shape test for `mcp_guardrail_events`. - 1 added EXPECTED_STAGE1_FEATURES entry. Backward compat: every new field optional; reducer state initialized to zero defaults; old daemons advertise `mcp_guardrails` only and SDK consumers fall back to snapshot polling per the existing PR 14 contract. `EVENT_SCHEMA_VERSION` stays at `1` — push events are an additive, narrowable extension, not a wire bump. Verified: 865/865 tests pass across 30 files (manager, serve, acp-integration, sdk-typescript); typecheck clean across 4 workspaces; lint clean on 11 touched files. Refs #4175.
📋 Review SummaryThis PR (14b) adds real-time SSE push events for MCP budget guardrails on top of the snapshot surface from PR 14 (#4247). It introduces two new event types ( 🔍 General Feedback
🎯 Specific Feedback🟡 High Priority Issues
🟢 Medium Priority Improvements
🔵 Low Priority Suggestions
✅ Highlights
|
There was a problem hiding this comment.
Pull request overview
Adds MCP guardrail push events (SSE) on top of the existing /workspace/mcp snapshot surface so SDK/clients can react to MCP budget pressure and refusals without polling, and wires child→bridge notifications to publish session-scoped frames.
Changes:
- Core: add budget-warning hysteresis + refused-batch coalescing and an
onBudgetEventcallback hook inMcpClientManager. - CLI/serve bridge: forward budget events via ACP
extNotificationand publish them as typed SSE frames; advertise newmcp_guardrail_eventscapability. - SDK: extend known event typing + predicates + reducer view-state counters for the new frame types; add tests/docs/integration coverage.
Reviewed changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| packages/core/src/tools/mcp-client-manager.ts | Adds hysteresis budget-warning evaluation, refused-batch coalescing, and callback wiring for push events. |
| packages/core/src/tools/mcp-client-manager.test.ts | Adds unit coverage for budget-warning hysteresis and refused-batch emission behavior. |
| packages/cli/src/acp-integration/acpAgent.ts | Wires manager budget events into child→bridge ACP extNotification messages. |
| packages/cli/src/acp-integration/acpAgent.test.ts | Tests setOnBudgetEvent wiring and defensive no-op behavior. |
| packages/cli/src/serve/httpAcpBridge.ts | Implements extNotification handling to translate budget events into SSE frames. |
| packages/cli/src/serve/httpAcpBridge.test.ts | Tests SSE publishing and silent drops for unknown/invalid notifications. |
| packages/cli/src/serve/capabilities.ts | Adds mcp_guardrail_events capability registration. |
| packages/cli/src/serve/server.test.ts | Asserts the new capability tag is present in the registry/baseline. |
| packages/sdk-typescript/src/daemon/events.ts | Adds typed event unions, predicates, and reducer state for MCP guardrail events. |
| packages/sdk-typescript/test/unit/daemonEvents.test.ts | Adds validation + reducer tests for the new MCP guardrail frames. |
| integration-tests/cli/qwen-serve-baseline.test.ts | Adds a baseline cross-check test intended to validate daemon accounting vs pgrep. |
| docs/users/qwen-serve.md | Documents the new push-event behavior and feature-detection guidance. |
| docs/developers/qwen-serve-protocol.md | Extends protocol docs with the new capability tag and event shapes. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
wenshao
left a comment
There was a problem hiding this comment.
Follow-up audit on commit 22ac97b. The two earlier critical threads (r3257175080 incremental-batch coalesce, r3257175089 re-arm gap on slot release) are still open in this commit. Adding one new inline below — the hysteresis test currently masks the re-arm gap, so once r3257175089 is fixed, the synthetic re-discover trigger should be removable.
Also noting alongside Copilot's r3257030112: the integration test claims to cross-check the event source (liveCount / subprocessCount, i.e. getMcpClientAccounting().total) against pgrep, but the assertions only read snapshot.clientCount. The push-event invariant (events' liveCount matches OS truth) isn't actually guarded — please assert on the field the test description names.
wenshao
left a comment
There was a problem hiding this comment.
[Critical] disconnectServer (L1209), removeServer (L1731), and the weReservedSlot catch path in discoverMcpToolsForServerInternal (L1115) all call this.reservedSlots.delete(serverName) but do NOT call this.evaluateBudgetState(). After a budget_warning fires (warnArmed = false), if the operator disconnects/removes servers until the ratio drops below 37.5% (MCP_BUDGET_REARM_FRACTION), warnArmed stays false permanently — no re-arm. Only stop() (next full discovery pass) or a successful reconnect can unstick it. These lines are unchanged code in the diff, so they can't be inline comments.
Fix: Add this.evaluateBudgetState() after each this.reservedSlots.delete(serverName) call in all three locations.
See inline comments below for additional findings.
— DeepSeek/deepseek-v4-pro via Qwen Code /review
Codex review round 1 of #4271 flagged four P2 issues; this fixup commit addresses all of them with bounded changes: #1 Bridge early-event buffer (httpAcpBridge.ts + `BridgeClient.earlyEvents` Map / `bufferEarlyEvent` / `sweepExpiredEarlyEvents` / `drainEarlyEvents`): Budget events fired during the child's `newSession` handler reach `BridgeClient.extNotification` before `byId.set` populates the session entry — pre-fix those frames hit `resolveEntry → undefined` and got dropped, so SSE subscribers never saw the events that crossed during session creation. Now we buffer per-sessionId with triple bounds (64 sessions × 32 frames × 60s TTL ≈ 400 KB worst case) and drain on `createSessionEntry`'s `byId.set`. Lazy TTL sweep + bounded capacity defends against malicious / buggy children spamming bogus sessionIds. #2 Pre-init callback registration (config.ts new `setMcpBudgetEventCallback` shim + acpAgent moves registration): Pre-fix acpAgent registered `setOnBudgetEvent` AFTER `config.initialize()` returned. In `QWEN_CODE_LEGACY_MCP_BLOCKING=1` mode discovery completes synchronously inside `initialize`, so end-of-pass events for the first pass were lost 100%; in default progressive mode there was a real race window. The Config-level shim stashes the callback and applies it inside `createToolRegistry` right after manager construction, so the manager has its callback wired BEFORE either `discoverAllTools` (legacy) or `startMcpDiscoveryInBackground` (default) fires. #3 Bulk-pass refused-batch coalescing (mcp-client-manager.ts new `bulkPassDepth` counter + `emitRefusedBatchIfAny` early-return guard): Pre-fix `discoverAllMcpToolsIncremental` walked N new servers and the per-server `discoverMcpToolsForServerInternal` refusal path emitted N length-1 batches inline, breaking the documented "one batch per `discoverAllMcpTools*` pass" contract. Now the bulk entry/exit increments/decrements `bulkPassDepth` in try/finally; while > 0, `emitRefusedBatchIfAny` no-ops. The bulk-pass `finally` decrements to 0 BEFORE its own emit, so the terminal call drains the queue once as a coalesced length-N batch. #4 Hysteresis re-arm on slot release (mcp-client-manager.ts new `releaseSlotName` helper + inline `evaluateBudgetState` in `tryReserveSlot`): Pre-fix the state machine only ran on bulk-pass end-of-pass + per-server success, so slot-release paths (`disconnectServer`, `removeServer`, `runWithDiscoveryTimeout`, connect-failure catch blocks) deleted from `reservedSlots` without touching `warnArmed`. Operator scenario "4/4 fire → drop to 1/4 → up to 4/4" never fired the second warning. Now every slot mutation drives the state machine via a single helper pair — `tryReserveSlot` calls evaluate after `add`, `releaseSlotName` wraps `delete + evaluate`. All 6 `reservedSlots.delete` sites migrated; redundant standalone `evaluateBudgetState` calls dropped from end-of-pass + per-server success + readResource late-reserve paths (the helper covers them). Tests: - `mcp-client-manager.test.ts`: +2 new tests (`discoverAllMcpToolsIncremental coalesces multi-server refusals into ONE batch`, `disconnectServer drives the hysteresis re-arm path`); 1 existing test's `reservedCount` assertion updated from 4→3 to reflect inline-at-crossing semantics. - `acpAgent.test.ts`: pre-init wiring test rewritten to assert the strict ordering invariant `setMcpBudgetEventCallback` → `initialize` via `callOrder` array; defensive test now exercises the new `Config.setMcpBudgetEventCallback` absence path. - `httpAcpBridge.test.ts`: +1 new test (`buffers events for a not-yet-registered sessionId, drains them on registration`) exercising the buffer + drain mechanism via thread-scope pre-buffering for a future session id. Verified: 868/868 tests pass across 30 files (manager 71 incl. 14 PR 14b + 2 fixup; serve 137 incl. 4 PR 14b + 1 fixup; acpAgent 55 incl. 2 PR 14b updated; sdk 27 PR 14b unchanged); typecheck clean across 4 workspaces; lint clean on 7 touched files.
…ail-events # Conflicts: # packages/cli/src/serve/capabilities.ts # packages/cli/src/serve/server.test.ts # packages/sdk-typescript/src/daemon/events.ts # packages/sdk-typescript/test/unit/daemonEvents.test.ts
wenshao
left a comment
There was a problem hiding this comment.
One additional typecheck failure could not be mapped to a changed diff line: packages/sdk-typescript/test/unit/daemonEvents.test.ts:667 fails with TS2345 for the existing slow_client_warning fixture because the object literal is inferred with widened type: string. Please apply the same literal-preserving fix there as well.
— mimo-v2.5-pro via Qwen Code /review
Code Coverage Summary
CLI Package - Full Text ReportCore Package - Full Text ReportFor detailed HTML reports, please see the 'coverage-reports-22.x-ubuntu-latest' artifact from the main CI run. |
Codex review round 2 of #4271 flagged two P2 issues on the SDK surface; both folded in: #1 Seed SDK replay for startup guardrail events (DaemonSessionClient.ts:78-94) PR 14b's `mcp_budget_warning` / `mcp_child_refused_batch` events fire during the child's `newSession` handler and are buffered on `BridgeClient.earlyEvents` until `byId.set(sessionId, entry)` runs; the bridge drains them onto the per-session bus before `spawnOrAttach` returns, so they live in the replay ring with ids. But the SDK's `DaemonSessionClient.createOrAttach` only seeded `Last-Event-ID: 0` when `req.modelServiceId` was set — every other new session started its first subscription LIVE, missing the startup-window guardrail events the new feature advertises. Unified rule: when `session.attached === false` (newly-created session) OR `req.modelServiceId` is set, seed `lastEventId: 0`. The old `modelServiceId`-only branch is preserved for re-attached sessions that need attach-time switch-event replay; the new `!attached` branch covers PR 14b's startup window. Re-attached sessions without `modelServiceId` still start live (caller may have its own event cursor it doesn't want to reset). The daemon already treats `Last-Event-ID: 0` as "replay from the beginning of the bounded ring"; if older events have been evicted from the ring (subscriber connected past the ring's coverage), the client receives the retained suffix and continues live from there — no behavior regression for late subscribers. #2 Re-export new guardrail event types from SDK barrels (daemon/index.ts + src/index.ts) PR 14b added 6 public types to `events.ts` but didn't thread them through the package's barrel exports. Consumers importing from `@qwen-code/sdk` could narrow `DaemonClientEvictedEvent` / `DaemonSlowClientWarningEvent` (older event types) but not `DaemonMcpBudgetWarningEvent` etc., breaking the SDK's encapsulation contract for the `mcp_guardrail_events` capability tag. Fix: add the 6 missing types to both `daemon/index.ts` and `src/index.ts`: - `DaemonMcpBudgetWarningData` / `DaemonMcpBudgetWarningEvent` - `DaemonMcpRefusedServer` - `DaemonMcpChildRefusedBatchData` / `DaemonMcpChildRefusedBatchEvent` - `DaemonMcpGuardrailEvent` (the union) Tests: 1 new SDK test (`replays from id 0 on freshly-created sessions so startup-window guardrail events are observable`) pins the `attached: false` → `Last-Event-ID: 0` invariant; existing 18 DaemonSessionClient tests stay green (the existing `modelServiceId`-driven replay test still passes because the new rule is OR-merged with the old one, and the `starts live when ... no model service replay need` test uses `attached: true` which bypasses the new branch). Verified: 1309/1309 tests pass across 43 files (manager, serve, acp-integration, sdk all green); typecheck clean across 4 workspaces; lint clean on 4 touched files.
Codex review round 3 of #4271 surfaced 6 actionable items across DeepSeek / mimo / copilot agents; all 6 folded in. Two suggestions explicitly declined as documented below. Adopted: #1 Debug logging for budget events (DeepSeek Critical) - mcp-client-manager.ts: evaluateBudgetState emits `debugLogger.info` on warning fire AND on re-arm. Pre-fix the manager had ZERO log output for budget events, so oncall could not distinguish "events emitted but dropped downstream" from "events never emitted." Refusal side already had stderr via `refuseAndLog`; warning + re-arm now have parity. - acpAgent.ts: changed `.catch(() => {})` to `.catch((err) => debugLogger.debug(...))`. Pre-fix every extNotification failure was silently swallowed — including real errors (serialization bugs, protocol violations), not just the expected ACP-channel-closed case. `debug` level keeps production quiet but gives oncall a debuggable trail when they flip debug on. #2 Rename mcpRefusedBatchCount → mcpChildRefusedBatchCount (DeepSeek) ViewState fields dropped "child" from the source event name `mcp_child_refused_batch`, breaking the `slow_client_warning → slowClientWarningCount` convention. Now matches: `mcpChildRefusedBatchCount` / `lastMcpChildRefusedBatch`. PR not merged → public API rename is zero-cost. Updated 4 sites (events.ts, daemonEvents.test.ts, capabilities.ts comment, and qwen-serve-protocol.md). #3 `satisfies DaemonEvent` on test fixtures (mimo "Critical") Adds explicit type annotation to mcp_budget_warning + refused- batch test fixtures so literal discriminators (`v: 1`, `type: '...'`) stay narrow. Note: the sdk package's `tsconfig.json` currently scopes `tsc --noEmit` to `src/**/*.ts` — tests aren't gated yet — so the original "TS2345" claim was overstated. The fixture is still better typed for when tests are eventually included. #4 emitRefusedBatchIfAny docstring fix (copilot) Pre-fix the docstring claimed it "Clears both the names list and the transport map after firing", but the implementation only clears `pendingRefusalNames`. The other two are reset at pass-start / `stop()` / `dropRefusalEntry` per the PR 14 contract (snapshot-visible refusals survive between passes). Updated the docstring + the `lastRefusedTransports` field comment to describe reality. #5 Rename pgrep cross-check test (copilot) Test name claimed "in-process subprocessCount matches external pgrep observation" but the assertion uses `clientCount` (the snapshot's exposed field; `subprocessCount` is manager-internal). For stdio-only fixtures the two are numerically equal, so the assertion was correct — but the test name now matches the field. Comment block reworked to make the subprocessCount/clientCount equivalence explicit. #6 emitBudgetEvent try/catch helper (mimo) `evaluateBudgetState` and `emitRefusedBatchIfAny` now route through a private `emitBudgetEvent(event)` helper that try/catches around `this.onBudgetEvent` and debug-logs failures. Pre-fix a synchronously-throwing callback would propagate the exception up through MCP discovery / `readResource` / `disconnectServer` paths and abort unrelated work — budget push events are best-effort telemetry, never critical-path. Production ACP adapter is async + already had its own .catch, but the manager-side guard makes the contract robust against future test fixtures and adapters. Declined (documented in review-thread replies): - Bridge structural validation of extNotification payloads (wenshao Suggestion / DeepSeek): SDK predicate-based validation already routes malformed payloads through `unrecognizedKnownEventCount`; bridge-side validation duplicates SDK logic without protecting against real attacks (the daemon-child is a trust boundary). - Relaxing `thresholdRatio: 0.75` literal (wenshao Suggestion / DeepSeek): premature widening for a hypothetical second threshold (0.5 critical). Better design when actually needed: discriminator union (`{ kind: 'warning_75' } | { kind: 'critical_50' }`) rather than relaxing the literal. Verified: 1309/1309 tests pass across 43 files; typecheck clean across 4 workspaces; lint clean on 6 touched files.
Codex round 4 of #4271 surfaced 1 actionable item from wenshao gpt-5.5: the `Config.setMcpBudgetEventCallback → pendingMcpBudgetCallback → createToolRegistry → registry.getMcpClientManager().setOnBudgetEvent` integration boundary had NO test coverage. The acpAgent test stubs the setter (proves QwenAgent calls it pre-`initialize()`); the manager tests bypass Config by passing `onBudgetEvent` directly to `McpClientManager`. Neither covered the actual stash + apply path inside Config — and that path is the safety net that prevents startup-window MCP guardrail events from being dropped under legacy blocking discovery + closes the progressive-mode race window. Adopted: extend the existing `vi.mock('../tools/tool-registry')` setup so each ToolRegistry instance stamps a per-instance `__mcpManagerMock` with `setOnBudgetEvent` + a stubbed `discoverAllMcpToolsIncremental` (the latter so `Config.startMcpDiscoveryInBackground` doesn't crash on missing method during `initialize`). Added 2 new tests under a new `setMcpBudgetEventCallback handoff to McpClientManager` describe: - `applies pending callback when registry is created during initialize()`: setter called BEFORE `initialize` → callback stashed in `pendingMcpBudgetCallback` → `createToolRegistry`'s apply branch forwards it to the freshly-constructed manager BEFORE discovery fires. Asserts manager's `setOnBudgetEvent` was called exactly once with the same callback function. - `applies callback directly to existing manager when called after initialize()`: setter called AFTER `initialize` → dispatches DIRECTLY to the existing manager via the `if (this.toolRegistry)` branch (the late-call path adapters use when they discover the manager only after Config is up). Also covers `cb: undefined` clearing the registration. Verified: 161/161 Config tests pass; 1470/1470 tests pass across 44 files; typecheck clean across 4 workspaces; lint clean on the 1 touched file. Closes the last open review thread on #4271.
…ail-events # Conflicts: # packages/sdk-typescript/src/daemon/events.ts
Codex round 5 of #4271 surfaced 1 Critical correctness/leak finding from wenshao gpt-5.5: pre-fix, `BridgeClient.extNotification` buffered guardrail events for ANY unknown sessionId. `resolveEntry(sessionId)` returns undefined for both newly-creating sessions (the original buffer target) AND for closed/killed sessions. A late `extNotification` from a dying child for an old sessionId would therefore land in `earlyEvents`. If the SAME sessionId came back via `session/load` or `session/resume` within the 60s TTL, `createSessionEntry`'s `drainEarlyEvents` call would replay stale prior-session telemetry onto the NEW subscriber — false budget warnings, refused server names from the OLD session leaking into a different consumer's view. Fix: tombstone Map for closed/killed session ids on `BridgeClient`. - Marked when the bridge removes a sessionId from `byId` (3 sites: channel.exited handler, closeSession, killSession). - Concurrently purges any in-flight `earlyEvents[id]` so a buffered-but-undrained frame can't leak either. - `bufferEarlyEvent` rejects tombstoned ids (the dying child's late notification just gets dropped). - `drainEarlyEvents` clears the tombstone — a fresh `createSessionEntry` for the same id is the legitimate "load/resume of a persisted session id" case, and at that point any stale event has already been rejected at buffer time. - TTL = `EARLY_EVENT_TTL_MS` (60s) — same as the early-event buffer, so by the time a tombstone expires there can be no stale frame for that id anywhere in the system. - Lazy `sweepExpiredTombstones` keeps the Map bounded. 3 byId.delete sites covered: - channel.exited handler (line ~2216): `info.client.markSessionClosed` - closeSession (line ~3367): `ci?.client.markSessionClosed` - killSession (line ~3787): `ci?.client.markSessionClosed` Test: new `tombstones closed sessionIds so late notifications cannot leak into a future load of the same id` exercises the full close-then-stale-notification → load-same-id-no-leak invariant via FakeAgent's loadSessionImpl returning the requested id verbatim. Verified: 1555/1555 tests pass across 46 files; typecheck clean across 4 workspaces; lint clean.
…ail-events # Conflicts: # docs/developers/qwen-serve-protocol.md
Codex round 6 of #4271 surfaced 5 actionable items: 1 Critical regression introduced by round 5 + 4 observability/hardening suggestions. All 5 adopted. #1 (Critical) Tombstone-vs-restore window — `httpAcpBridge.ts` The round 5 tombstone (60s post-close) was rejecting LEGITIMATE restore-time guardrail events. `markSessionClosed` sets the tombstone; `drainEarlyEvents` clears it — but `drainEarlyEvents` only runs AFTER the ACP `loadSession` / `unstable_resumeSession` returns. The restored child's MCP discovery firing during that ACP call window had its budget events buffered → tombstone-rejected → lost. Same close → load sequence within 60s lost the entire startup window of the re-opened session. Fix: new `inFlightRestoreIds: Set<string>` allow-list on BridgeClient. `markRestoreInFlight(sid)` called BEFORE the ACP restore call (in the IIFE that wraps `connection.loadSession` / `unstable_resumeSession`). `clearRestoreInFlight(sid)` paired in the IIFE's `finally` so success and failure both release. `bufferEarlyEvent` skips the tombstone check for ids in the allow-list. Idempotent (Set semantics) so coalesced restores work correctly. #2 (Suggestion) `thresholdRatio` predicate relaxed — `packages/sdk-typescript/src/daemon/events.ts` Pre-fix `isMcpBudgetWarningData` required `thresholdRatio === 0.75` exactly; daemon emits `MCP_BUDGET_WARN_FRACTION` from a separate package. A daemon-side bump (e.g. 0.80) would silently route every warning through `unrecognizedKnownEventCount` — a cross-package coordination hazard with no operator-visible failure mode. Relaxed to `isFiniteNumber` with a comment explaining the SDK's role is wire-shape validation, not threshold enforcement (semantics owned by daemon constant + protocol docs). NaN/Infinity rejection preserved. Test updated to assert that 0.5 is now accepted (forward-compat) and NaN is rejected (still wire- sane). #3 (Suggestion) Early-event buffer drops now log — `httpAcpBridge.ts` Pre-fix the 3 drop paths in `bufferEarlyEvent` (tombstone reject, session-cap reached, per-session-cap reached) silently `return`'d while every other drop site in this PR has stderr/debug breadcrumbs. Now each drop emits `writeStderrLine` (not `debugLogger.debug` — these drops can indicate operator-actionable abuse / fanout, so they're at the visible level matching `refuseAndLog`'s pattern). #4 (Suggestion) `emitRefusedBatchIfAny` defensive branches log — `packages/core/src/tools/mcp-client-manager.ts` Two "should never happen" branches (mode/budget invariant violation, names/queue sync gap) silently `clear` and `return` pre-fix. Now each emits `debugLogger.warn` so a future regression that reaches either branch leaves a diagnostic trail. #5 (Suggestion) Clear `pendingMcpBudgetCallback` after apply — `packages/core/src/config/config.ts` Pre-fix the field persisted for Config's lifetime. Subagent override paths (`createApprovalModeOverride` / `buildSubagentContextOverride`) call `createToolRegistry` again and would re-apply the parent session's callback to a fresh manager — routing subagent telemetry through the wrong ACP session. Now cleared to `undefined` after the apply. Late-call setter (`setMcpBudgetEventCallback` after initialize) unaffected — it dispatches directly to the existing manager. Verified: 1580/1580 tests pass across 47 files; typecheck clean across 4 workspaces; lint clean on 5 touched files.
wenshao
left a comment
There was a problem hiding this comment.
[Critical] [typecheck] packages/sdk-typescript/test/unit/daemonEvents.test.ts:670
tsc --noEmit reports TS2345: the pre-existing slow_client_warning fixture at line 664 is inferred with widened literal types (v: number, type: string) and fails the asKnownDaemonEvent parameter check. The same type-widening pattern was fixed for the new PR 14b fixtures at lines 776 and 914 (satisfies DaemonEvent), but this older sibling fixture was missed.
const warning = {
v: 1,
type: 'slow_client_warning',
data: { queueSize: 192, maxQueued: 256, lastEventId: 42 },
} satisfies DaemonEvent;
(This finding cannot be posted as an inline comment because line 670 is unchanged code outside the PR diff — the regression is caused by the PR's type additions in events.ts which widen TypeScript's inference for existing object literals.)
— qwen-latest-series-invite-beta-v28 via Qwen Code /review
Codex round 7 of #4271 surfaced 3 lifecycle issues — 1 Critical subagent isolation hole + 1 P2 stale-frame leak on restore failure + 1 P3 unbounded tombstone map. All 3 adopted. #1 (P2) Late-call setMcpBudgetEventCallback should not stash — `packages/core/src/config/config.ts` Pre-fix: the late-call branch (after `initialize()`) assigned to `pendingMcpBudgetCallback` BEFORE applying directly to the existing manager. Round 6 #5 cleared the field inside `createToolRegistry`, but the late-call setter re-set it — so a subsequent `createToolRegistry` (subagent override) inherited the parent session's callback and routed subagent telemetry through the wrong ACP session. Fix: restructured to two paths. - Late-call (`toolRegistry` exists): apply directly + clear `pendingMcpBudgetCallback`. Never stash on this path. - Pre-init (no registry yet): stash for `createToolRegistry` to consume — the only way to reach a manager that doesn't exist yet. Test: new `does NOT stash the callback when called after initialize()` asserts that after a late-call setter + a subsequent `config.createToolRegistry({ forSubAgent: true })`, the subagent manager's `setOnBudgetEvent` was never called. #2 (P2) Drop buffered guardrail events on restore failure — `packages/cli/src/serve/httpAcpBridge.ts` Pre-fix: round 6 #1 added `markRestoreInFlight` so `bufferEarlyEvent` accepts frames during restore. Failure path called `clearRestoreInFlight` but did NOT purge `earlyEvents[id]`. A subsequent successful retry (`session/load` of the same id within 60s) would `drainEarlyEvents` those stale frames into the new session — exactly the leak round 5's tombstone was meant to prevent. Fix: in the restore IIFE's `finally` block, when `registeredEntry` is undefined (failure path), call `markSessionClosed(req.sessionId)`. That helper already does both required actions: refresh tombstone + delete `earlyEvents[id]`. Test: new `purges buffered guardrail events when restore fails` — spawn + close + load (fails after child queues guardrail event) + load again (succeeds) → assert no stale `mcp_budget_warning` in the ring. #3 (P3) Sweep tombstones in markSessionClosed — `packages/cli/src/serve/httpAcpBridge.ts` Pre-fix: `sweepExpiredTombstones` was only called inside `bufferEarlyEvent`. On a daemon with high session-churn but few extNotifications (the common production pattern when MCP guardrail mode is `off`), the tombstone map grew monotonically — the documented 60s TTL didn't bound memory. Fix: sweep at the top of `markSessionClosed`. Cheap (one integer compare per entry); under any realistic workload the map stays small. Verified: 1582/1582 tests pass across 47 files; typecheck clean across 4 workspaces; lint clean on 4 touched files.
…ail-events # Conflicts: # docs/developers/qwen-serve-protocol.md # packages/core/src/tools/mcp-client-manager.ts
Codex round 8 flagged `slow_client_warning` (line 670, this file) as the sibling of PR 14b's `mcp_budget_warning` / `mcp_child_refused_batch` fixtures (rounds 3 #3) — same widening shape (`v: 1` → `number`, `type: '…'` → `string`), same describe block, same `asKnownDaemonEvent` call site. Note: the reviewer's claim that the issue was "caused by the PR's type additions in events.ts which widen TypeScript's inference for existing object literals" is **inaccurate**. The widening is inherent to TypeScript's literal-inference rules for `let`-style object bindings and predates PR 14b. Verified by running explicit `tsc --noEmit test/unit/daemonEvents.test.ts`: 18 TS2345 errors across the file, ~16 of which involve `model_switched` / `permission_request` / `session_update` / `client_evicted` / `stream_error` fixtures that PR 14b never touched. Why fix only one: `slow_client_warning` is the closest topical sibling to PR 14b's two fixtures (the round 3 #3 contract explicitly was "PR 14b's own fixtures"). Extending to the other 17 widening sites would (a) bloat the PR diff with PR 4 / PR 10 / PR 11 era test debt unrelated to PR 14b's review focus, and (b) still leave the underlying root cause (sdk `tsconfig.json` excludes the test directory from typecheck) — `npm run typecheck` passes clean before AND after this commit because tests aren't in scope. A future PR can opt tests into the typecheck scope and fix all 17 in one batch with the actual gating change. Verified: 46/46 sdk daemonEvents tests pass; 1656/1656 across the sweep; typecheck clean across 4 workspaces; lint clean.
wenshao
left a comment
There was a problem hiding this comment.
No review findings. Downgraded from Approve to Comment: CI failing: Post Coverage Comment (ubuntu-latest, 22.x), Test (windows-latest, Node 22.x), Test (macos-latest, Node 22.x), Test (ubuntu-latest, Node 22.x). — qwen-latest-series-invite-beta-v28 via Qwen Code /review
Local validation reportVerified head Build & static checks
Unit tests (1654 passed)
Capability advertisement
Live HTTP smoke under real budget pressureDaemon launched with
Why
|
wenshao
left a comment
There was a problem hiding this comment.
Approving based on the local validation pass posted above — state machine + extNotification + bridge republish + SDK predicate/reducer chain holds end-to-end, PR 14 snapshot remains correct, and the same SSE wire that delivered mcp_server_restarted / mcp_server_restart_refused in real conditions will deliver mcp_budget_warning / mcp_child_refused_batch on subsequent state transitions. The one non-blocking note (a docs sentence pointing initial-state consumers at /workspace/mcp before the first push event fires at boot) can land as a fold-in or follow-up doc PR.
wenshao
left a comment
There was a problem hiding this comment.
No issues found. LGTM! ✅ — qwen-latest-series-invite-beta-v28 via Qwen Code /review
…#4175 PR 22a) (#4295) * refactor(acp-bridge): create skeleton + lift zero-coupling primitives (#4175 PR 22a) First slice of #4175 Wave 5 PR 22 (`refactor(serve): extract acp bridge primitives`). Lifts the three primitives from `packages/cli/src/serve/` that have zero coupling to the rest of `serve/`, so PR 22a can land ahead of PR 17 (#4282) and PR 14b (#4271) without merge-conflict risk on `httpAcpBridge.ts`. Also seeds the `PermissionMediator` interface contract that PR 24 will implement (4 strategies — first-responder / designated / consensus / local-only — replacing the inline first-responder voting in `BridgeClient.requestPermission`). What moves - `cli/src/serve/eventBus.ts` (578 LOC, no internal imports) → `packages/acp-bridge/src/eventBus.ts` - `cli/src/serve/inMemoryChannel.ts` (73 LOC, only depends on `@agentclientprotocol/sdk`) → `packages/acp-bridge/src/inMemoryChannel.ts` - `httpAcpBridge.ts:638-677` `AcpChannel` / `AcpChannelExitInfo` / `ChannelFactory` types → `packages/acp-bridge/src/channel.ts` - Both test files moved alongside their sources What's new - `packages/acp-bridge/` package (`@qwen-code/acp-bridge`, internal, not published to npm yet) - `packages/acp-bridge/src/permission.ts` — type-only `PermissionMediator` / `PermissionPolicy` / `PermissionResolution` / `PermissionRequestRecord` / `PermissionVote` / `PermissionVoteOutcome`. No implementation; PR 24 fills it in. - `packages/cli/src/serve/eventBus.ts` and `inMemoryChannel.ts` are now one-line re-export wrappers for backward compat. - `httpAcpBridge.ts:638-677` is now a one-line `import type` + `export type` re-export. Backward compatibility - All existing relative imports (`./eventBus.js`, `./inMemoryChannel.js`, `../serve/eventBus.js`) keep resolving via the wrappers — no churn for the 8 importer sites in `cli/src/serve/` plus `cli/src/commands/serve.ts:14`. - `httpAcpBridge.ts` continues to export `AcpChannel` / `AcpChannelExitInfo` / `ChannelFactory` so any external consumer is unaffected. - Zero `/capabilities` payload changes, zero HTTP route changes, zero SDK behavior changes, zero spawn-site behavior changes. What's not in this slice (deferred to PR 22b after PR 17/14b land) - `BridgeClient` + `createHttpAcpBridge` factory + `defaultSpawnChannelFactory` (~3000 LOC, lines 1082-3770 + 4106-4307 of `httpAcpBridge.ts`). - Channel and VSCode-IDE-companion own-spawn migrations. Tests - `npm test --workspace @qwen-code/acp-bridge` — 28/28 pass (eventBus 20 + inMemoryChannel 8) at the new location. - `cd packages/cli && npx vitest run src/serve/` — 567/567 pass (no regressions). - `cd packages/cli && npx tsc --noEmit` clean. References - #4175 Wave 5 PR 22 row - #3803 chiga0's "Stage 1.5-prereq AcpChannel lift" - `httpAcpBridge.ts:679-696` FIXME for the four `PermissionMediator` strategies this slice declares * fix(acp-bridge): minimize package-lock.json diff to acp-bridge entries only The previous commit ran `npm install --ignore-scripts`, which npm 11 treated as license to renormalize peer-dependency markers across the entire lockfile and resolve `@types/react@18.3.28`, `@types/react-dom@18.3.7`, and `@types/prop-types@15.7.15` away from their pinned versions. CI's `npm ci` then refused to install because the manifests no longer matched the lockfile. Restored package-lock.json to origin/main and surgically added only the three entries the new package actually requires: - `node_modules/@qwen-code/acp-bridge` workspace symlink - `packages/acp-bridge` workspace manifest snapshot - `@qwen-code/acp-bridge` listed under `packages/cli`'s dependencies `npm ci --no-audit --ignore-scripts` now succeeds (1453 packages, no warnings about stale lockfile entries). Re-verified `acp-bridge` package tests (28/28 pass) and `tsc --build` clean. * fix(acp-bridge): address PR 22a review feedback Three review-driven fixes: 1. **wenshao**: removed `src/**/*.test.ts` from `tsconfig.json` exclude so the moved `eventBus.test.ts` and `inMemoryChannel.test.ts` regain typecheck coverage. Pre-fix, a future change to `BridgeEvent` / `SubscribeOptions` shape would only fail at runtime; post-fix `tsc --noEmit` catches the mismatch. Matches `packages/core/tsconfig.json`'s pattern (no test exclude; emitted test artefacts in dist/ are accepted convention). 2. **Copilot**: corrected the FIXME line citation in `permission.ts` and `README.md` from `1144-1154` to the actual range `1096-1106` (the four-strategy FIXME inside `BridgeClient.requestPermission`). Verified via grep against current `httpAcpBridge.ts`. 3. **Copilot review summary**: added a "Imports — root vs subpaths" section to README.md explaining when to use the barrel root (`@qwen-code/acp-bridge`) vs per-module subpaths (`/eventBus`, `/inMemoryChannel`, `/channel`, `/permission`), and added `@see` JSDoc pointers in `cli/src/serve/eventBus.ts` and `inMemoryChannel.ts` wrappers to the implementation files for the design-rationale comments that moved to acp-bridge. Verification: 28/28 acp-bridge tests + 567/567 cli serve tests pass; `tsc --noEmit` clean across both packages including the moved test files. Declined (replied on the PR): - Move `@agentclientprotocol/sdk` to `peerDependencies` — sound advice in general but not yet relevant; package is internal (`files: ["dist"]`, no npm publish), so dedupe is automatic through monorepo file: links. Will revisit during PR 28 (npm alpha publish).
Summary
PR 14b layers a real-time push channel on top of the snapshot surface introduced by PR 14 (#4247 / #4175 Wave 3). Operators dashboarding
/workspace/mcpalready see budget pressure viabudgets[0].statusand per-serverdisabledReason: 'budget'; this PR adds typed SSE events so SDK clients can react without polling.Two new event types on
GET /session/:id/events:mcp_budget_warning— fires once on the upward 75% crossing ofreservedSlots.size / clientBudget. Re-arms only after the ratio drops below 37.5% (MCP_BUDGET_REARM_FRACTION). Mirrors PR 10'sslow_client_warninghysteresis but at the manager level rather than the per-subscriber backlog level. Fires under bothwarnandenforcemodes;offskips the state machine entirely.mcp_child_refused_batch— coalesced once perdiscoverAllMcpTools*pass when ≥1 server was refused, AND as a length-1 batch on thereadResourcelazy-spawn refusal path.modeis the literal'enforce'sincewarnmode never refuses.What's wired
McpClientManager): newevaluateBudgetState()+emitRefusedBatchIfAny()methods,pendingRefusalNamesqueue distinct from snapshot-visiblelastRefusedServerNames(PR 14 contract preserved), andsetOnBudgetEvent(cb)setter so acpAgent registers the callback afterloadCliConfigreturns but before discovery starts.connection.extNotificationwith methodqwen/notify/session/mcp-budget-event(mirrors theauthenticate/updateprecedent atacpAgent.ts:355). Payload carriesv: 1envelope + sessionId for routing.BridgeClient.extNotificationresolves session viabyId.get(sessionId)and republishes as a session-scoped SSE frame. Unknown methods, unknown kinds, and missing sessionIds drop silently for forward-compat.mcp_guardrail_events(always-on, distinct from PR 14'smcp_guardrailssnapshot tag).KnownDaemonEventgainsDaemonMcpGuardrailEventunion member; reducer adds 4 fields onDaemonSessionViewState(mcpBudgetWarningCount,lastMcpBudgetWarning,mcpRefusedBatchCount,lastMcpRefusedBatch). Predicate guards rejectmode: 'warn'on refused-batch (literal-'enforce'invariant) and unknown transport families (forward-compat: a daemon emitting a new transport speaks newer wire than this SDK).Backward compatibility
Every new field is optional; reducer state defaults to zeros; old daemons advertise
mcp_guardrailsonly and SDK consumers fall back to snapshot polling per the existing PR 14 contract.EVENT_SCHEMA_VERSIONstays at1— push events are an additive narrowable extension, not a wire bump. Off-mode constructor stripsonBudgetEvent(defense in depth: even a stray internal call can't fire).Test plan
readResourcelength-1 batch;stop()re-arms state).getMcpClientManagerabsent on stubbed test ToolRegistries).thresholdRatio !== 0.75; rejectsmode: 'warn'on refused-batch; rejects unknown transports; reducer counters increment + last* capture; safety-net routes malformed payloads throughunrecognizedKnownEventCount).pgrep -P×subprocessCountcross-check validates the in-process counter (PR 14b's event source) against external observation. Skip-gated like PR 1 (POSIX, non-sandbox).mcp_guardrail_events.Files changed
13 files, +1828 / -16 lines:
packages/core/src/tools/mcp-client-manager.ts(+test)packages/cli/src/acp-integration/acpAgent.ts(+test)setOnBudgetEventregistration; ext-notification dispatchpackages/cli/src/serve/httpAcpBridge.ts(+test)extNotificationhandler on BridgeClientpackages/cli/src/serve/capabilities.ts(+server.test.ts)mcp_guardrail_eventstag + shape assertionpackages/sdk-typescript/src/daemon/events.ts(+test)integration-tests/cli/qwen-serve-baseline.test.tspgrep -P×subprocessCountcross-checkdocs/users/qwen-serve.md+docs/developers/qwen-serve-protocol.mdRefs #4175.
🤖 Generated with Qwen Code