
feat: make brev exec fast with SSH multiplexing #319

Merged
theFong merged 3 commits into main from feat/fast-exec-multiplexing
Mar 9, 2026

Conversation

Member

@theFong theFong commented Mar 8, 2026

Summary

  • Fire-first approach: brev exec now runs SSH immediately with a 5s connect timeout instead of making API calls to check instance status first. If SSH succeeds (especially via multiplexed connection), returns instantly with zero API overhead.
  • SSH multiplexing: Added ControlMaster auto / ControlPath / ControlPersist 10m to all SSH config templates. First connection ~3s, subsequent connections ~0.6s (5x speedup).
  • Graceful fallback: If SSH fails, checks instance status and either starts a stopped instance, waits for a booting instance, or gives a helpful error pointing to brev ls.
  • Clean output: Suppressed noisy SSH messages (Agent pid, PTY warnings, known hosts warnings).

What changed

Before

Every brev exec call would:

  1. API call to look up workspace by name/ID
  2. Check if stopped → start it if so
  3. Poll API every 5s until status is RUNNING (up to 10 min)
  4. Run brev refresh to sync SSH config
  5. Another API call to re-verify status is RUNNING
  6. Poll SSH connection up to 40 times waiting for it to be available
  7. Finally run the SSH command

Even if the instance was already running and SSH was ready, every single exec paid the cost of multiple API round-trips before doing anything.

After

  1. Immediately fire SSH with a 5-second connect timeout
  2. If it works → done (zero API calls)
  3. If it fails → then check instance status and fall back to the old start/poll/wait behavior
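
The three steps above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `execDeps`, `execFireFirst`, and `waitUntilReady` are hypothetical names, the second SSH attempt reusing the 5-second timeout is an assumption, and the sketch treats every SSH error as "instance not ready", which a real implementation would likely refine.

```go
package main

import (
	"errors"
	"fmt"
)

// errConnect simulates an SSH connect failure in the demo below.
var errConnect = errors.New("ssh connect failed")

// execDeps holds hypothetical seams for the two paths described above.
// The names are illustrative and are not the PR's actual functions.
type execDeps struct {
	// runSSH runs the command over SSH with the given connect timeout.
	runSSH func(alias, command string, connectTimeoutSecs int) error
	// waitUntilReady stands in for the old start/poll/wait path (API calls).
	waitUntilReady func(nameOrID string) error
}

// execFireFirst tries SSH first and only touches the API if that attempt fails.
func execFireFirst(d execDeps, alias, nameOrID, command string) error {
	// Fast path: a 5-second connect timeout and no API calls.
	if err := d.runSSH(alias, command, 5); err == nil {
		return nil
	}
	// Slow path: SSH failed, so check the instance and start or wait as needed.
	if err := d.waitUntilReady(nameOrID); err != nil {
		return fmt.Errorf("instance %q is not reachable, run `brev ls` to check it: %w", nameOrID, err)
	}
	// Retry once the instance is ready (reusing 5s here is an assumption).
	return d.runSSH(alias, command, 5)
}

func main() {
	attempts, apiCalls := 0, 0
	d := execDeps{
		runSSH: func(alias, command string, connectTimeoutSecs int) error {
			attempts++
			if attempts == 1 {
				return errConnect // first try fails, as with a stopped instance
			}
			return nil
		},
		waitUntilReady: func(nameOrID string) error {
			apiCalls++
			return nil
		},
	}
	err := execFireFirst(d, "my-instance", "my-instance", "echo ok")
	fmt.Println(err, attempts, apiCalls) // prints: <nil> 2 1
}
```

Injecting the two paths as function fields keeps the ordering testable without a real instance: when the first `runSSH` succeeds, `waitUntilReady` is never called.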

With SSH multiplexing enabled, the first connection to an instance takes ~3s (normal SSH handshake). Every subsequent brev exec within 10 minutes reuses the master connection and completes in ~0.6s.

SSH multiplexing

Added to all three SSH config templates (ssh.go, sshconfigurer.go V2 and V3):

ControlMaster auto
ControlPath ~/.ssh/brev-control-%r@%h:%p
ControlPersist 10m
  • ControlMaster auto — reuse an existing master connection if available, otherwise create one
  • ControlPath — socket file stored per user/host/port combo
  • ControlPersist 10m — master stays alive 10 minutes after last session disconnects
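
For illustration, here is a minimal sketch of how a host block with these three options could be rendered from a Go template. The `hostEntry` type, its fields, and `renderHost` are hypothetical and do not mirror the real templates in ssh.go or sshconfigurer.go; only the three Control* lines are copied from the description above.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// hostEntry is a hypothetical subset of what a real SSH config template needs.
type hostEntry struct {
	Alias    string
	HostName string
	User     string
	Port     int
}

// hostTemplate ends with the three multiplexing options listed above.
const hostTemplate = `Host {{.Alias}}
  HostName {{.HostName}}
  User {{.User}}
  Port {{.Port}}
  ControlMaster auto
  ControlPath ~/.ssh/brev-control-%r@%h:%p
  ControlPersist 10m
`

// renderHost fills the template for a single host entry.
func renderHost(e hostEntry) (string, error) {
	tmpl, err := template.New("host").Parse(hostTemplate)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := tmpl.Execute(&sb, e); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	out, err := renderHost(hostEntry{Alias: "my-instance", HostName: "1.2.3.4", User: "ubuntu", Port: 22})
	if err != nil {
		panic(err)
	}
	fmt.Print(out)
}
```

The `%r`, `%h`, and `%p` tokens pass through `text/template` untouched; ssh expands them to the remote user, host, and port when it creates the control socket.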

Suppressed noisy output

  • eval $(ssh-agent -s) > /dev/null — hides "Agent pid 12345"
  • ssh -T — disables PTY allocation, hides "Pseudo-terminal will not be allocated because stdin is not a terminal"
  • -o LogLevel=ERROR — hides "Warning: Permanently added '1.2.3.4' (ED25519) to the list of known hosts"
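
As a rough sketch, the two ssh flags above plus the connect timeout can also be expressed as an argument vector for `os/exec`. The PR itself builds a shell command string, so this shows an alternative shape rather than the actual change, and `sshArgs` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"os/exec"
)

// sshArgs returns the ssh arguments for a quiet, non-interactive exec.
func sshArgs(alias, command string, connectTimeoutSecs int) []string {
	return []string{
		"-T", // disable pseudo-terminal allocation
		"-o", fmt.Sprintf("ConnectTimeout=%d", connectTimeoutSecs),
		"-o", "LogLevel=ERROR", // hide warnings such as "Permanently added ..."
		alias,
		command,
	}
}

func main() {
	// Build the command without running it, just to show the final argv.
	cmd := exec.Command("ssh", sshArgs("my-instance", "echo ok", 5)...)
	fmt.Println(cmd.Args) // prints: [ssh -T -o ConnectTimeout=5 -o LogLevel=ERROR my-instance echo ok]
}
```

Passing arguments as a slice avoids local shell quoting of the remote command, at the cost of not being able to chain `eval $(ssh-agent -s)` in the same invocation.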

Benchmark

WITHOUT multiplexing:  ~3.0s every call
WITH multiplexing:     ~3.0s first call, ~0.6s subsequent calls

Test plan

  • brev exec <running-instance> "echo ok" — should succeed immediately
  • Run it twice in a row — second call should be noticeably faster
  • brev exec <stopped-instance> "echo ok" — should fail fast (5s), then start instance and retry
  • brev exec <nonexistent-instance> "echo ok" — should fail with helpful message
  • Verify no "Agent pid", "Pseudo-terminal", or "Permanently added" messages in output

Skip upfront API status checks — just fire SSH immediately with a 5s
connect timeout. If SSH succeeds (especially via a reused multiplexed
connection), return instantly. If it fails, then check instance status
and give helpful error messages suggesting brev ls.

Add ControlMaster/ControlPath/ControlPersist to all SSH config templates
so subsequent connections reuse the master socket (~0.6s vs ~3s).

Suppress noisy SSH output (Agent pid, PTY warnings, known hosts warnings).
@theFong theFong requested a review from a team as a code owner March 8, 2026 22:34
Comment thread pkg/cmd/exec/exec.go
err := runSSHWithTimeout(sshName, command, 5)
if err == nil {
// Success — fire analytics in background and return
go trackExecAnalytics(sstore, workspaceNameOrID)
Contributor

  1. Goroutine leak / silent failure in trackExecAnalytics

go trackExecAnalytics(sstore, workspaceNameOrID)

trackExecAnalytics makes API calls (GetUserWorkspaceByNameOrIDErr, GetCurrentUser, TrackEvent). If these block, the goroutine leaks; if they panic, the whole process crashes. The function also silently swallows all errors.

Suggestion: Consider adding a timeout context or recover() in the goroutine. The silent error swallowing is acceptable for analytics, but a panic would take down the process.
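
A minimal sketch of that suggestion, assuming the analytics function can be changed to accept a `context.Context` (the PR's `trackExecAnalytics` takes `sstore` and `workspaceNameOrID` instead). `trackInBackground`, `panickyTrack`, and `slowTrack` are hypothetical names used only for this example.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// trackInBackground runs track in a goroutine with a deadline and panic
// recovery. The returned channel is closed when the goroutine has finished.
func trackInBackground(track func(ctx context.Context) error, timeout time.Duration) <-chan struct{} {
	done := make(chan struct{})
	go func() {
		defer close(done)
		defer func() {
			// Analytics must never crash the CLI, so a panic is dropped here.
			_ = recover()
		}()
		ctx, cancel := context.WithTimeout(context.Background(), timeout)
		defer cancel()
		_ = track(ctx) // errors are intentionally ignored for analytics
	}()
	return done
}

// panickyTrack simulates a bug inside the analytics path.
func panickyTrack(ctx context.Context) error {
	panic("analytics client bug")
}

// slowTrack simulates an API call that hangs until the deadline expires.
func slowTrack(ctx context.Context) error {
	<-ctx.Done()
	return ctx.Err()
}

func main() {
	<-trackInBackground(panickyTrack, 50*time.Millisecond)
	<-trackInBackground(slowTrack, 50*time.Millisecond)
	fmt.Println("exec still alive") // prints: exec still alive
}
```

The timeout only helps if the underlying API calls honor the context. The `done` channel is optional for callers, but a bare `go` statement followed by an immediate return can let the process exit before the event is sent.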

Comment thread pkg/ssh/ssh.go Outdated
ForwardAgent yes
AddKeysToAgent yes
ControlMaster auto
ControlPath ~/.ssh/brev-control-%r@%h:%p
Contributor

  1. ControlPath with colon may cause issues on some systems

ControlPath ~/.ssh/brev-control-%r@%h:%p

The : between %h and %p can cause issues on filesystems that don't support colons in filenames (e.g., SMB mounts, some macOS edge cases). While ~/.ssh is almost always on a local filesystem, this is a common gotcha.

Suggestion: Use - instead of : as separator: ~/.ssh/brev-control-%r@%h-%p

Comment thread pkg/cmd/exec/exec.go Outdated
- cmd := fmt.Sprintf("ssh %s '%s'", sshAlias, escapedCmd)
+ // -T disables pseudo-terminal allocation (no "Pseudo-terminal will not be allocated" warning)
+ // ssh-agent stdout is suppressed to /dev/null to hide "Agent pid NNNNN"
+ cmd := fmt.Sprintf("eval $(ssh-agent -s) > /dev/null && ssh -T -o ConnectTimeout=%d -o LogLevel=ERROR %s '%s'", connectTimeoutSecs, sshAlias, escapedCmd)
Contributor

@patelspratik patelspratik Mar 9, 2026

not sure if this new ssh-agent behavior is on purpose, but thought I would put it here just in case


  1. New ssh-agent spawned on every exec

cmd := fmt.Sprintf("eval $(ssh-agent -s) > /dev/null && ssh -T ...")

Every brev exec call spawns a new ssh-agent process. With the multiplexing optimization encouraging frequent rapid calls, this could lead to many orphaned ssh-agent processes. The agent is started but never killed.

Suggestion: Check if SSH_AUTH_SOCK is already set before spawning a new agent, or use ssh-agent only when needed.
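
One way that check could look, as a hedged sketch: `buildExecCommand` is a hypothetical helper, and the single-quote escaping shown is an assumption, since the diff does not show how `escapedCmd` is produced.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// buildExecCommand returns the shell command for one exec call. It only
// prepends the ssh-agent startup when no agent socket is already configured.
func buildExecCommand(authSock, sshAlias, command string, connectTimeoutSecs int) string {
	// Close the quote, emit an escaped quote, reopen: ' becomes '\''
	escaped := strings.ReplaceAll(command, "'", `'\''`)
	ssh := fmt.Sprintf("ssh -T -o ConnectTimeout=%d -o LogLevel=ERROR %s '%s'",
		connectTimeoutSecs, sshAlias, escaped)
	if authSock != "" {
		return ssh // an agent is already available, reuse it
	}
	return "eval $(ssh-agent -s) > /dev/null && " + ssh
}

func main() {
	fmt.Println(buildExecCommand(os.Getenv("SSH_AUTH_SOCK"), "my-instance", "echo ok", 5))
}
```

This avoids spawning an agent on every call when one is already running. It does not clean up an agent it did start, so a first call without SSH_AUTH_SOCK still leaves one behind.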

patelspratik previously approved these changes Mar 9, 2026
- Use `-` instead of `:` in ControlPath to avoid issues on filesystems
  that don't support colons (SMB mounts, WSL edge cases)
- Only spawn ssh-agent if SSH_AUTH_SOCK is not already set, preventing
  orphaned agent processes on rapid repeated exec calls
@theFong theFong merged commit ce5403b into main Mar 9, 2026
9 checks passed
@theFong theFong deleted the feat/fast-exec-multiplexing branch March 9, 2026 21:52
2 participants