fix(app): keep V2 transport alive when manager health probe fails #15

Open
Catafal wants to merge 1 commit into lunel-dev:main from Catafal:fix/v2-disconnect-loop

Catafal commented Apr 10, 2026

Summary

Fixes #13 — the CLI reconnect loop (1006) triggered by manager /health probe failures.

Root cause: handleConnectivityLost() called v2TransportRef.current?.close() whenever the manager health probe timed out or returned non-2xx. But the manager and the proxy are separate services, so manager reachability says nothing about proxy session health: the proxy maintains existing session sockets independently (5-minute stale auth cache). Closing the transport on every probe failure created a tight 3-second reconnect loop that never resolved while the manager was degraded.

Why 1006 specifically: Closing the app transport caused the proxy to send peer_disconnected to the CLI, resetting its handshake state. When the app reconnected, the probe fired again within 3s and killed the new transport, and the cycle repeated indefinitely.
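To make the loop concrete, here is a toy model rather than app code. It assumes the app always manages to reconnect before the next probe fires; `ProbePolicy` and `countPeerDisconnects` are hypothetical names used only for illustration.

```typescript
// Toy model (not app code): how many peer_disconnected events the CLI would
// see while the manager stays degraded for `failedProbes` consecutive probes.
type ProbePolicy = "close_on_probe_failure" | "keep_transport";

function countPeerDisconnects(policy: ProbePolicy, failedProbes: number): number {
  let disconnects = 0;
  let transportOpen = true;
  for (let probe = 0; probe < failedProbes; probe++) {
    if (policy === "close_on_probe_failure" && transportOpen) {
      transportOpen = false; // app closes the V2 transport
      disconnects++; // proxy sends peer_disconnected to the CLI
    }
    if (!transportOpen) {
      transportOpen = true; // assumption: app reconnects before the next probe
    }
  }
  return disconnects;
}
```

Under these assumptions, five failed probes at 3-second spacing produce five CLI disconnects with the old behaviour and none with the new one.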

Changes

app/contexts/ConnectionContext.tsx

  • handleConnectivityLost(): removed the 2 lines that closed the V2 transport. The transport manages its own lifecycle via ws.onclose / ws.onerror, which already trigger runReconnectLoop('transport_closed') for real proxy connection failures.
  • handleConnectivityRestored(): added stale transport cleanup before triggering runReconnectLoop. While the manager was unreachable, the proxy may have expired the session — the reconnect loop needs a clean slate, and the old WebSocket should be properly closed rather than orphaned.

cleanupSockets() (used for explicit user disconnects) is unchanged.
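A minimal sketch of the two handlers after this change, for reviewers who want the shape without opening the diff. Only handleConnectivityLost, handleConnectivityRestored, v2TransportRef and runReconnectLoop come from the PR; the `Transport` and `Ref` types, the factory function, the `"connectivity_restored"` reason and the reset of the ref to null are assumptions made so the snippet is self-contained.

```typescript
// Hypothetical stand-in for the V2 transport; only close() is assumed.
interface Transport {
  close(): void;
}

// Mutable ref shape, similar to what React's useRef returns.
interface Ref<T> {
  current: T | null;
}

// "transport_closed" appears in the PR; "connectivity_restored" is an assumption.
type ReconnectReason = "transport_closed" | "connectivity_restored";

function createConnectivityHandlers(
  v2TransportRef: Ref<Transport>,
  runReconnectLoop: (reason: ReconnectReason) => void,
) {
  return {
    // Manager probe failed: the proxy session is left untouched.
    handleConnectivityLost(): void {
      // Intentionally no v2TransportRef.current?.close() here.
    },

    // Manager probe recovered: close any stale transport, then reconnect.
    handleConnectivityRestored(): void {
      v2TransportRef.current?.close();
      v2TransportRef.current = null; // assumption: the ref is reset here
      runReconnectLoop("connectivity_restored");
    },
  };
}
```

The real handlers live inside ConnectionContext.tsx and touch more state than this; the sketch only shows where the close() call moved.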

Testing

  • GIVEN active V2 session; WHEN manager probe fails; THEN CLI receives no 1006, session continues
  • GIVEN manager was unreachable; WHEN probe recovers; THEN stale transport closed before reconnect, no WebSocket leak
  • GIVEN user explicitly disconnects; WHEN cleanupSockets(true) runs; THEN transport still closed correctly (unchanged path)
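The three scenarios can be approximated with a small socket-counting model. This is a sketch, not the app's test suite: `SessionModel` and all of its members are hypothetical, and `explicitDisconnect()` merely stands in for the cleanupSockets(true) path. The `closeStale` flag exists only to show the leak the second scenario guards against.

```typescript
// Hypothetical model that tracks how many sockets are open at any time.
class SessionModel {
  openSockets = 0;
  private socket: { close(): void } | null = null;

  constructor(private readonly closeStale: boolean) {}

  connect(): void {
    let closed = false;
    this.openSockets++;
    this.socket = {
      close: () => {
        // Idempotent close so double-closing never skews the count.
        if (!closed) {
          closed = true;
          this.openSockets--;
        }
      },
    };
  }

  // Scenario 1: a failed manager probe leaves the session socket alone.
  onProbeFailed(): void {}

  // Scenario 2: on recovery, close the stale socket before reconnecting.
  onProbeRecovered(): void {
    if (this.closeStale) {
      this.socket?.close();
    }
    this.connect();
  }

  // Scenario 3: stand-in for the explicit user disconnect path.
  explicitDisconnect(): void {
    this.socket?.close();
    this.socket = null;
  }
}
```

With `closeStale` set to true the model never holds more than one open socket; with it set to false, each recovery leaves one orphaned socket behind.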

Manager reachability and session transport health are separate concerns.
The proxy keeps sessions alive independently (5-min stale auth cache),
so closing the V2 transport on every /health probe failure was causing
a tight 1006 reconnect loop whenever the manager was temporarily degraded.

- Remove transport teardown from handleConnectivityLost (probe failure
  should not affect the live session with the proxy)
- Add stale transport cleanup in handleConnectivityRestored before
  triggering runReconnectLoop, to avoid leaking an orphaned WebSocket
  if the proxy expired the session while the manager was unreachable

Fixes lunel-dev#13

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Development

Successfully merging this pull request may close these issues.

App closes active V2 session when manager /health probe fails, causing a CLI disconnect loop (1006)