Feature Branch: 048-tray-refetch-elimination
Created: 2026-05-08
Status: Draft
Input: User description: "Need to delete remaining tray polls of /api/v1/servers — proper fix for the paths spec 047 left out of scope."
Spec 047 (PR #450, v0.29.4) eliminated the burst-storm refetch path: each servers.changed SSE event used to trigger a GET /api/v1/servers round trip from the macOS tray, and during upstream retry-storms the rate could exceed 10 fetches/second. The servers.changed handler now consumes the embedded payload directly. CPU dropped 92% on the tray-only steady state.
What remains is a smaller residual: the Swift tray has five other call sites that still hit /api/v1/servers. Live verification with the user's real config showed ~8 fetches in an 18-second window even with no SSE storms, traced to:
| # | File:line | Trigger | Cadence under load |
|---|---|---|---|
| 1 | CoreProcessManager.swift:514-515 |
case "status": SSE handler — refetches when upstream_stats.connected_servers changes between successive status events |
Every 1-3 s during retry storms (one flap per status event) |
| 2 | CoreProcessManager.swift:608 |
startPeriodicRefresh — 30 s Timer.scheduledTimer calling refreshState (which calls refreshServers) |
Every 30 s |
| 3 | CoreProcessManager.swift:640 |
refreshSecurityStatus — falls through to apiClient.servers() when the Docker health check returns false but configured servers exist |
Every 30 s (called from refreshState) |
| 4 | MCPProxyApp.swift:101-110 |
10 s Timer.publish that always calls client.servers() |
Every 10 s |
| 5 | MCPProxyApp.swift:172-174 |
menuWillOpen — refetches each time the user clicks the tray icon |
Per user click |
appState.servers is already kept fresh by spec 047's SSE handler. These five paths refetch the same data that's already in memory.
- Cut steady-state
/api/v1/serversfetches from the macOS tray to ≤ 1 per minute at idle (only a long-cadence safety-net refresh remains). - Tray UI reactivity (menu state, dashboard, badges) stays as snappy as today — within ≤ 50 ms of an SSE event.
- No behavior change for the user; this is pure cost removal.
- No changes to the core. Pure Swift-side refactor.
- Changing the core SSE contract (
servers.changedpayload,statusevent shape). - Touching the Web UI (its handler is already covered by spec 047).
- Refactoring the unrelated
refreshActivity,refreshSessions,refreshTokenMetrics, orrefreshSecurityStatusDocker/diagnostics paths beyond the one specificapiClient.servers()fall-through. - Changing
apiClient.servers()itself or the underlying transport. The function stays available for the safety-net refresh and any future explicit-refresh UI.
None visible. Tray menu, badge counts, and dashboard remain the same. Clicking the tray icon still opens a menu reflecting current state — but the data comes from in-memory appState.servers instead of a fresh fetch on every click.
Site 1 — case "status": SSE handler (CoreProcessManager.swift:500-522)
The handler currently calls refreshServers() when connected_count changes. With spec 047, servers.changed events already deliver the full server list whenever upstream state actually changes. The status event's connected_count is just a derived stat. Replace the refreshServers() call with: update appState.totalServers and appState.totalTools from the inline stats only. The servers.changed event will deliver the matching server-list update within ~50 ms (coalescer window).
Site 2 — periodic 30 s refresh (CoreProcessManager.swift:600-613)
Drop refreshServers from refreshState. Keep the other periodic refreshes (refreshActivity, refreshSessions, refreshTokenMetrics, refreshSecurityStatus) since they cover data SSE doesn't carry. Lower the cadence of the server-list-only safety-net to a separate 5-minute timer that fires refreshServers once — guards against a missed SSE event, doesn't add measurable cost.
Site 3 — refreshSecurityStatus Docker fallthrough (CoreProcessManager.swift:640)
Replace try? await apiClient.servers() with a synchronous read of appState.servers on the main actor. The check is "do we have any connected stdio servers?" — that's available in-memory.
Site 4 — 10 s MCPProxyApp.swift timer (:101-110)
Remove the timer entirely. Its purpose ("keep health/action data current") is exactly what appState.servers updated by SSE provides. Replace with a comment block explaining the SSE-driven model.
Site 5 — menuWillOpen refetch (MCPProxyApp.swift:168-181)
Remove the in-line client.servers() call. The menu rebuilds from appState.servers which SSE keeps current. The rebuildMenu() call below it stays.
A single new long-cadence Timer in MCPProxyApp.swift calls coreManager?.refreshServers() once every 5 minutes (configurable via the existing config service if needed). This is purely a defense-in-depth measure: if an SSE event was dropped (which the buffer + coalescer already work hard to prevent), the worst-case staleness window is bounded.
| Failure | Behavior |
|---|---|
| SSE stream drops | Existing reconnect logic in CoreProcessManager reconnects; on reconnect the initial status event delivers fresh stats. The 5-minute safety-net catches any gap. |
appState.servers is empty (e.g., immediately after launch before first SSE event) |
Pre-existing behavior unchanged; the existing initial fetch in coreManager.connect populates state once at startup. |
Core publishes a notify-only servers.changed (older core, downgrade scenario) |
Spec 047's existing fallback (if !consumedFromPayload { await refreshServers() }) still triggers a single targeted refetch. |
Per CLAUDE.md TDD rule, every site change is preceded by a failing XCTest.
- Site 1:
testStatusEventDoesNotRefetchOnConnectedCountChange— fake SSE client emits twostatusevents with differentconnected_count; assertapiClient.servers()was not called. - Site 2:
testPeriodicRefreshDoesNotCallServers— fast-forward the periodic refresh tick; assert noservers()call. - Site 3:
testRefreshSecurityStatusReadsAppStateNotAPI— pre-populateappState.serverswith a connected stdio server; force the Docker fallback path; assert noservers()call anddockerAvailable=true. - Site 4:
testNoTimerDrivenServersFetch— install a counter onapiClient.servers(); wait > 10 s; assert counter == 0. - Site 5:
testMenuWillOpenDoesNotRefetch— invokemenuWillOpendirectly; assert noservers()call. - Safety-net:
testSafetyNetTimerRefetchesAt5min— fast-forward time by 5 minutes; assert exactly oneservers()call.
Replicate the spec 047 verification harness:
- Build the Go binary + Swift tray; swap into a clone of
/Applications/MCPProxy.app. - Launch with the user's real 30-server config.
- With the tray running and only the tray polling, count
/api/v1/serversGETs in~/Library/Logs/mcpproxy/http.logover a 60-second window.
Acceptance: ≤ 1 GET per 60 seconds at idle. Ad-hoc clicks on the tray icon should not generate a GET. SSE-driven state changes (toggle a server) should not generate a GET.
- Replacing the 30 s
refreshActivityperiodic with anactivitySSE-driven model — separate spec; needs alast-Nhistory bootstrap. - Replacing
refreshTokenMetricsandrefreshSessionsperiodics — same shape as activity. - Investigating whether
refreshSecurityStatusDocker check could be SSE-driven entirely (currently independent of server list). - A user-visible "refresh now" button — out of scope for this PR; the SSE-driven model already feels instant.
| Risk | Likelihood | Mitigation |
|---|---|---|
| Removing the 10 s timer hides a stale-state bug we don't know about today | Low | Keep the 5-minute safety-net. Verify in live tests that SSE actually delivers all the state the UI cares about. |
menuWillOpen removal causes the menu to render with stale data when the tray was just relaunched |
Low | The existing initial fetch on connect populates state before SSE catches up. If observed in QA, add a "first-open-after-connect" exception. |
status event handler skips the refetch but connected_count divergence persists |
Medium | Live verification step explicitly toggles servers and asserts UI reflects within 50 ms. If divergence shows up, document the gap and consider keeping the refetch behind a 1 Hz rate limit. |
- All five identified call sites stop refetching
/api/v1/serversin their normal path. - One new 5-minute safety-net timer exists in
MCPProxyApp.swift. - New XCTests pass; existing Swift tests stay green.
- Live verification artifact (
http.logGET count over 60 s idle) committed underspecs/048-tray-refetch-elimination/verification/. - Spec, plan, tasks, research, contracts (none — no API change), quickstart all committed.
None. Sites are well-identified and the fix shape is mechanical.