fix(app): keep V2 transport alive when manager health probe fails #15
Open
Catafal wants to merge 1 commit into
Manager reachability and session transport health are separate concerns. The proxy keeps sessions alive independently (5-min stale auth cache), so closing the V2 transport on every /health probe failure was causing a tight 1006 reconnect loop whenever the manager was temporarily degraded.

- Remove transport teardown from handleConnectivityLost (probe failure should not affect the live session with the proxy)
- Add stale transport cleanup in handleConnectivityRestored before triggering runReconnectLoop, to avoid leaking an orphaned WebSocket if the proxy expired the session while the manager was unreachable

Fixes lunel-dev#13

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Summary
Fixes #13: the CLI reconnect loop (1006) triggered by manager /health probe failures.

Root cause: handleConnectivityLost() was calling v2TransportRef.current?.close() whenever the manager health probe timed out or returned non-2xx. But the manager and the proxy are completely separate services, so manager reachability says nothing about proxy session health: the proxy maintains existing session sockets independently (5-minute stale auth cache). Closing the transport on every probe failure created a tight 3-second reconnect loop that never resolved while the manager was degraded.

Why 1006 specifically: Closing the app transport caused the proxy to send peer_disconnected to the CLI, resetting its handshake state. When the app reconnected, the probe fired again within 3s and killed the new transport, and the cycle repeated indefinitely.

Changes
app/contexts/ConnectionContext.tsx
- handleConnectivityLost(): removed the 2 lines that closed the V2 transport. The transport manages its own lifecycle via ws.onclose / ws.onerror, which already trigger runReconnectLoop('transport_closed') for real proxy connection failures.
- handleConnectivityRestored(): added stale transport cleanup before triggering runReconnectLoop. While the manager was unreachable, the proxy may have expired the session; the reconnect loop needs a clean slate, and the old WebSocket should be properly closed rather than orphaned.
- cleanupSockets() (used for explicit user disconnects) is unchanged.

Testing
cleanupSockets(true) runs; THEN transport still closed correctly (unchanged path)
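
For readers without the diff at hand, the two handler changes described under Changes could look roughly like the sketch below. It is not the code from app/contexts/ConnectionContext.tsx. The Transport and Ref types, the createConnectivityHandlers factory, and the 'connectivity_restored' reason string are hypothetical stand-ins introduced for illustration. Only the handler names, v2TransportRef, runReconnectLoop, and the 'transport_closed' reason come from the PR text.

```typescript
// Hypothetical minimal types; the real transport and React ref are richer.
interface Transport {
  close(): void;
}

interface Ref<T> {
  current: T | null;
}

// 'transport_closed' appears in the PR text; 'connectivity_restored' is an
// assumed name for the reason passed after the manager becomes reachable.
type ReconnectReason = "transport_closed" | "connectivity_restored";

function createConnectivityHandlers(
  v2TransportRef: Ref<Transport>,
  runReconnectLoop: (reason: ReconnectReason) => void,
) {
  return {
    // Manager /health probe failed. The proxy session is a separate concern,
    // so the live transport is deliberately left untouched here.
    handleConnectivityLost(): void {
      // No v2TransportRef.current?.close() call: real proxy failures are
      // expected to surface through the transport's own close/error events.
    },

    // Manager is reachable again. The proxy may have expired the session in
    // the meantime, so drop any stale transport before reconnecting.
    handleConnectivityRestored(): void {
      const stale = v2TransportRef.current;
      if (stale !== null) {
        // Clear the ref first so nothing reuses the socket being closed.
        v2TransportRef.current = null;
        stale.close();
      }
      runReconnectLoop("connectivity_restored");
    },
  };
}
```

Clearing the ref before calling close() is a choice made in this sketch, so that any close callback sees no current transport. The actual change may order these steps differently.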