fix(browse): daemon resilience on loaded machines — Bun ConnectionRefused, stop/restart response flush, startup + git-root timeouts #1732
Open
mplatts wants to merge 2 commits into
…esponse

The compiled CLI runs on Bun, whose fetch reports a refused/dropped socket as err.code 'ConnectionRefused'/'ConnectionClosed' (message "Unable to connect..."), not Node's ECONNREFUSED/ECONNRESET. The crash-retry catch in sendCommand only knew the Node codes, so daemon crashes (and `browse restart`) leaked the raw Bun error and exited 1 instead of restarting.

The stop/restart meta-command handlers did `await shutdown(); return ...`, but shutdown() calls process.exit() inline; the response never flushed, so the CLI saw a dropped socket. Defer shutdown one tick (setTimeout 100ms) so the 200 flushes first; the daemon then exits and the next command cold-starts.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
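The broadened crash-retry guard described in this commit could look roughly like the sketch below. It is an illustration only: `isDaemonGone` and `DAEMON_GONE_CODES` are hypothetical names, and the actual check inside `sendCommand` may be structured differently. The error codes and the message fragment are the ones quoted in the commit message.

```typescript
// Hypothetical helper; the real guard lives in sendCommand (cli.ts) and may differ.
type FetchError = { code?: string; message?: string };

// Codes named in the PR: Node's ECONNREFUSED/ECONNRESET and
// Bun's ConnectionRefused/ConnectionClosed.
const DAEMON_GONE_CODES = new Set<string>([
  "ECONNREFUSED",
  "ECONNRESET",
  "ConnectionRefused",
  "ConnectionClosed",
]);

function isDaemonGone(err: unknown): boolean {
  if (typeof err !== "object" || err === null) return false;
  const { code, message } = err as FetchError;
  if (typeof code === "string" && DAEMON_GONE_CODES.has(code)) return true;
  // Fall back to the Bun message text quoted in the PR.
  return typeof message === "string" && message.includes("Unable to connect");
}
```

A caller would restart the daemon and retry when `isDaemonGone(err)` is true, and rethrow otherwise.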
Both timeouts were tuned for an idle machine and lose under sustained load (10+ dev servers): cold Chromium launch measured ~5.7s at load 10 and exceeds the 8s start budget at load 12+, and `git rev-parse` spikes past the 2s git-root timeout ~30% of the time. The latter falls back to cwd, scattering per-cwd state files so `goto` and `url` hit different daemons (about:blank).

- MAX_START_WAIT: macOS/Linux 8s -> 15s (matches the existing Windows budget). Poll loop returns the instant the daemon is healthy, so this only costs time on a genuine failure.
- getGitRoot timeout: 2s -> 8s. Still bounds a broken .git from hanging.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
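To illustrate why a larger start budget only costs time on a genuine failure, here is a minimal sketch of a poll loop that returns as soon as the daemon is healthy. `waitForDaemon`, the `isHealthy` probe, and the 100ms poll interval are assumptions made for the example, not the code in this PR; only the 15s budget comes from the commit message.

```typescript
// Hypothetical sketch: `isHealthy` stands in for whatever health probe the CLI uses.
const MAX_START_WAIT_MS = 15_000; // budget from the PR (previously 8s on macOS/Linux)

async function waitForDaemon(
  isHealthy: () => Promise<boolean>,
  maxWaitMs: number = MAX_START_WAIT_MS,
  pollIntervalMs: number = 100, // assumed interval, not taken from the PR
): Promise<boolean> {
  const deadline = Date.now() + maxWaitMs;
  while (Date.now() < deadline) {
    // Return as soon as the probe succeeds; a probe error counts as "not ready yet".
    if (await isHealthy().catch(() => false)) return true;
    await new Promise<void>((resolve) => setTimeout(resolve, pollIntervalMs));
  }
  // The full budget is spent only when the daemon never became healthy.
  return false;
}
```

With this shape, a daemon that is ready after 6s is picked up roughly 6s in, whether the budget is 8s or 15s; the extra 7s is spent only when startup actually fails.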
Problem
On a dev machine under sustained load (think 10 local servers + compiles, load avg 12+), the `browse` daemon was unreliable in three distinct ways:

- `browse restart` and `browse stop` printed `Unable to connect. Is the computer able to access the url?` and exited 1 instead of doing their job.
- `browse goto <url>` intermittently failed with `Server failed to start within 8s`, even though the daemon came up a second later.
- `goto` would report a 200 but `url` returned `about:blank`: the two commands were talking to different daemons.

All three are timeout/error-handling assumptions that hold on an idle machine and break under load. Root causes, with measurements from a load-~12 box:
Root causes + fixes
1. Bun reports refused/dropped sockets differently than Node. The compiled CLI runs on Bun, whose `fetch` throws `err.code === 'ConnectionRefused'` / `'ConnectionClosed'` (message `"Unable to connect..."`), not Node's `ECONNREFUSED` / `ECONNRESET`. The crash-retry guard in `sendCommand` (`cli.ts`) only matched the Node codes, so a mid-command daemon drop leaked the raw Bun error and exited 1 instead of restarting and retrying. Broadened the guard to match both. (The repo's own e2e helpers already check for both `'ConnectionRefused'` and `'Unable to connect'`, so this shape was known elsewhere.)

2. `stop`/`restart` never flushed their HTTP response. Both handlers did `await shutdown(); return '...'`, but `shutdown()` calls `process.exit()` inline, so the `return` was dead code and the response never reached the CLI. The CLI saw a dropped socket; combined with (1), that surfaced as the raw error. Deferred `shutdown()` by 100ms (`setTimeout(..., 100)`) so the 200 flushes first, then the daemon exits and the next command lazily cold-starts.

3. Startup + git-root timeouts too tight under load.
   - `MAX_START_WAIT` was 8s on macOS/Linux. A cold Chromium launch measured ~5.7s at load 10 and exceeds 8s at load 12+, so the CLI abandoned a daemon that was still booting. Raised to 15s (matches the existing Windows budget). The poll loop returns the instant the daemon is healthy, so this only costs time in a genuine failure.
   - `getGitRoot()`'s `git rev-parse` timeout was 2s; under load the call spikes past that (~6s observed), `getGitRoot()` returns null, and `resolveConfig` falls back to `process.cwd()`. That scatters `.gstack/browse.json` across cwds, so `goto` and `url` hit different daemons. Raised to 8s (still bounds a genuinely broken `.git`).

Test plan
- `browse restart` → exits 0, prints a clear message, next command cold-starts a fresh daemon
- `browse stop` → exits 0, daemon and Chromium fully torn down (0 leftover processes)
- `browse goto` / `url` / `screenshot` → green, single daemon, no stray state files
- `bun build --compile browse/src/cli.ts` + daemon reload from source, verified end-to-end on macOS/arm64, Bun 1.2.18

Changes are source-only across `browse/src/{cli,config,meta-commands}.ts`. No dependency or build-script changes.
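
The deferred-shutdown fix from root cause 2 can be sketched as follows. This is a simplified illustration under stated assumptions: `handleStop`, the `Shutdown` type, and the `"Daemon stopping"` message are hypothetical, and the injected `shutdown` callback stands in for the daemon's real teardown, which according to the description ends in `process.exit()`. Only the 100ms delay comes from the PR.

```typescript
// Hypothetical handler shape; `shutdown` stands in for the daemon's real teardown.
type Shutdown = () => void | Promise<void>;

const SHUTDOWN_DELAY_MS = 100; // delay used in the PR

function handleStop(shutdown: Shutdown, delayMs: number = SHUTDOWN_DELAY_MS): string {
  // Schedule teardown instead of awaiting it, so this function returns first
  // and the server gets a chance to write the 200 response before the exit.
  setTimeout(() => {
    void shutdown();
  }, delayMs);
  return "Daemon stopping"; // hypothetical response body
}
```

A fixed delay gives the response time to flush but does not strictly guarantee it; it trades a small shutdown latency for a much lower chance of the CLI seeing a dropped socket.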