feat: make brev exec fast with SSH multiplexing#319
Conversation
Skip upfront API status checks — just fire SSH immediately with a 5s connect timeout. If SSH succeeds (especially via a reused multiplexed connection), return instantly. If it fails, then check instance status and give helpful error messages suggesting brev ls. Add ControlMaster/ControlPath/ControlPersist to all SSH config templates so subsequent connections reuse the master socket (~0.6s vs ~3s). Suppress noisy SSH output (Agent pid, PTY warnings, known hosts warnings).
| err := runSSHWithTimeout(sshName, command, 5) | ||
| if err == nil { | ||
| // Success — fire analytics in background and return | ||
| go trackExecAnalytics(sstore, workspaceNameOrID) |
There was a problem hiding this comment.
- Goroutine leak / silent failure in trackExecAnalytics
go trackExecAnalytics(sstore, workspaceNameOrID)
trackExecAnalytics makes API calls (GetUserWorkspaceByNameOrIDErr, GetCurrentUser, TrackEvent). If these block or panic, the goroutine will leak or crash silently. The
function also silently swallows all errors.
Suggestion: Consider adding a timeout context or recover() in the goroutine. The silent error swallowing is acceptable for analytics, but a panic would take down the
process.
| ForwardAgent yes | ||
| AddKeysToAgent yes | ||
| ControlMaster auto | ||
| ControlPath ~/.ssh/brev-control-%r@%h:%p |
There was a problem hiding this comment.
- ControlPath with colon may cause issues on some systems
ControlPath ~/.ssh/brev-control-%r@%h:%p
The : between %h and %p can cause issues on filesystems that don't support colons in filenames (e.g., SMB mounts, some macOS edge cases). While ~/.ssh is almost always on
a local filesystem, this is a common gotcha.
Suggestion: Use - instead of : as separator: ~/.ssh/brev-control-%r@%h-%p
| cmd := fmt.Sprintf("ssh %s '%s'", sshAlias, escapedCmd) | ||
| // -T disables pseudo-terminal allocation (no "Pseudo-terminal will not be allocated" warning) | ||
| // ssh-agent stdout is suppressed to /dev/null to hide "Agent pid NNNNN" | ||
| cmd := fmt.Sprintf("eval $(ssh-agent -s) > /dev/null && ssh -T -o ConnectTimeout=%d -o LogLevel=ERROR %s '%s'", connectTimeoutSecs, sshAlias, escapedCmd) |
There was a problem hiding this comment.
not sure if this new ssh-agent behavior is on purpose, but thought I would put it here just in case
- New ssh-agent spawned on every exec
cmd := fmt.Sprintf("eval $(ssh-agent -s) > /dev/null && ssh -T ...")
Every brev exec call spawns a new ssh-agent process. With the multiplexing optimization encouraging frequent rapid calls, this could lead to many orphaned ssh-agent
processes. The agent is started but never killed.
Suggestion: Check if SSH_AUTH_SOCK is already set before spawning a new agent, or use ssh-agent only when needed.
- Use `-` instead of `:` in ControlPath to avoid issues on filesystems that don't support colons (SMB mounts, WSL edge cases) - Only spawn ssh-agent if SSH_AUTH_SOCK is not already set, preventing orphaned agent processes on rapid repeated exec calls
Summary
brev execnow runs SSH immediately with a 5s connect timeout instead of making API calls to check instance status first. If SSH succeeds (especially via multiplexed connection), returns instantly with zero API overhead.ControlMaster auto/ControlPath/ControlPersist 10mto all SSH config templates. First connection ~3s, subsequent connections ~0.6s (5x speedup).brev ls.What changed
Before
Every
brev execcall would:brev refreshto sync SSH configEven if the instance was already running and SSH was ready, every single exec paid the cost of multiple API round-trips before doing anything.
After
With SSH multiplexing enabled, the first connection to an instance takes ~3s (normal SSH handshake). Every subsequent
brev execwithin 10 minutes reuses the master connection and completes in ~0.6s.SSH multiplexing
Added to all three SSH config templates (
ssh.go,sshconfigurer.goV2 and V3):ControlMaster auto— reuse an existing master connection if available, otherwise create oneControlPath— socket file stored per user/host/port comboControlPersist 10m— master stays alive 10 minutes after last session disconnectsSuppressed noisy output
eval $(ssh-agent -s) > /dev/null— hides "Agent pid 12345"ssh -T— disables PTY allocation, hides "Pseudo-terminal will not be allocated because stdin is not a terminal"-o LogLevel=ERROR— hides "Warning: Permanently added '1.2.3.4' (ED25519) to the list of known hosts"Benchmark
Test plan
brev exec <running-instance> "echo ok"— should succeed immediatelybrev exec <stopped-instance> "echo ok"— should fail fast (5s), then start instance and retrybrev exec <nonexistent-instance> "echo ok"— should fail with helpful message