Skip to content

Commit 60a2570

Browse files
authored
🤖 perf: shard OpenSSH exec paths and simplify SSH project sync (#3125)
## Summary This keeps the SSH scaling wins that matter in practice — sharded OpenSSH exec paths, serialized same-project sync/import work, hashed remote project layout, and persisted SSH workspace roots where they still match a known checkout shape — while deleting the explicit master-pool/coordinator machinery and replacing most mock-heavy runtime unit suites with higher-level SSH integration coverage. ## Background The original branch tackled real SSH bottlenecks: one implicit control socket per host, duplicate same-project bundle imports, and basename-derived remote roots that could collide. But the first implementation grew into an app-managed OpenSSH master scheduler with a large amount of lifecycle, compatibility, and mock-test surface area. This revision deliberately simplifies the design so the branch still improves SSH scalability without carrying most of that maintenance burden. ## Implementation - shard short-lived OpenSSH exec/file traffic across a small deterministic set of `ControlPath`s via `ControlMaster=auto`, keep `sshConnectionPool` as the single bootstrap/backoff layer, re-check requested shard readiness after waiting on another inflight probe, and cap any follow-up shard probe to the remaining acquire budget - simplify `SSHRuntime` to one hashed remote project layout, one serialized per-project sync path, one current-snapshot marker, and ref-manifest validation before snapshot reuse - keep bundle imports in `refs/mux-bundle/*` and UUID bundle temp paths, but remove the explicit `openSshMasterPool`, `projectSyncCoordinator`, remote branch-metadata persistence, and hot-path legacy layout auto-detection layers - preserve persisted SSH workspace roots only when config still points at a known canonical or legacy checkout shape, including sibling multi-project path hints for repo operations and resolving an upgraded worktree's actual common git dir before rename/delete worktree commands - replace the runtime-internal unit suites with focused Docker-backed SSH integration coverage for concurrent exec bursts, rename/delete lifecycle, snapshot recovery, and checked-out branch collision cases, plus targeted `sshConnectionPool` and legacy-worktree regressions ## Validation - `make typecheck` - `make static-check` - `bun test ./src/node/runtime/runtimeHelpers.test.ts` - `bun test ./src/node/runtime/sshConnectionPool.test.ts` - `bun test src/node/services/workspaceProjectRepos.test.ts src/node/services/workspaceService.test.ts src/node/services/workspaceService.multiProject.test.ts` - `TEST_INTEGRATION=1 bun x jest tests/runtime/runtime.test.ts --runInBand --testNamePattern='SSHRuntime'` - `TEST_INTEGRATION=1 bun x jest tests/runtime/runtime.test.ts --runInBand --testNamePattern='legacy base repo for upgraded SSH worktrees'` ## Risks Moderate. The simplified transport no longer has the same theoretical ceiling as the previous explicit master-pool design, but it still improves on the original baseline by removing the single implicit host-wide control path from short-lived SSH execs. The main compatibility edge is that SSH workspaces rooted outside the known canonical or legacy checkout shapes now fall back to the canonical hashed layout and may need re-init instead of transparent path inference. ## Pains The original explicit-pool direction accumulated a lot of policy and mock-test code. The cleanup work here was mostly about deleting that surface area without giving back the core scaling fixes, which in turn meant shifting confidence into the real Docker-backed SSH integration harness. --- _Generated with `mux` • Model: `openai:gpt-5.4` • Thinking: `xhigh` • Cost: `$269.00`_ <!-- mux-attribution: model=openai:gpt-5.4 thinking=xhigh costs=269.00 -->
1 parent 9331413 commit 60a2570

27 files changed

+1881
-1501
lines changed

‎src/node/runtime/DockerRuntime.ts‎

Lines changed: 4 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -353,7 +353,10 @@ export class DockerRuntime extends RemoteRuntime {
353353
return `cd ${shescape.quote(cwd)}`;
354354
}
355355

356-
protected spawnRemoteProcess(fullCommand: string, _options: ExecOptions): Promise<SpawnResult> {
356+
protected spawnRemoteProcess(
357+
fullCommand: string,
358+
_options: ExecOptions & { deadlineMs?: number }
359+
): Promise<SpawnResult> {
357360
// Verify container name is available
358361
if (!this.containerName) {
359362
throw new RuntimeError(

‎src/node/runtime/RemoteRuntime.ts‎

Lines changed: 31 additions & 30 deletions
Original file line numberDiff line numberDiff line change
@@ -11,7 +11,6 @@
1111
* - spawnRemoteProcess() - how to spawn the external process (ssh/docker)
1212
* - getBasePath() - base directory for workspace operations
1313
* - quoteForRemote() - path quoting strategy
14-
* - onExitCode() - optional exit code handling (SSH connection pool)
1514
*/
1615

1716
import type { ChildProcess } from "child_process";
@@ -45,8 +44,12 @@ import { getAtomicWriteTempPath } from "./atomicWriteTempPath";
4544
export interface SpawnResult {
4645
/** The spawned child process */
4746
process: ChildProcess;
48-
/** Optional async work to do before exec (e.g., acquire connection) */
49-
preExec?: Promise<void>;
47+
/** Optional transport-scoped exit handling (e.g., master-pool health accounting). */
48+
onExit?: (exitCode: number, stderr: string) => void;
49+
/** Optional close handling that must run even for synthetic abort/timeout exits. */
50+
onClose?: () => void;
51+
/** Optional transport-scoped spawn error handling. */
52+
onError?: (error: Error) => void;
5053
}
5154

5255
/**
@@ -59,11 +62,11 @@ export abstract class RemoteRuntime implements Runtime {
5962
*
6063
* @param fullCommand The full shell command to execute (already wrapped in bash -c)
6164
* @param options Original exec options
62-
* @returns The spawned process and optional pre-exec work
65+
* @returns The spawned process and optional transport lifecycle hooks
6366
*/
6467
protected abstract spawnRemoteProcess(
6568
fullCommand: string,
66-
options: ExecOptions
69+
options: ExecOptions & { deadlineMs?: number }
6770
): Promise<SpawnResult>;
6871

6972
/**
@@ -84,15 +87,6 @@ export abstract class RemoteRuntime implements Runtime {
8487
*/
8588
protected abstract cdCommand(cwd: string): string;
8689

87-
/**
88-
* Called when exec completes with an exit code.
89-
* Subclasses can use this for connection pool health tracking.
90-
* @param stderr - Captured stderr for error reporting (e.g., SSH connection failures)
91-
*/
92-
protected onExitCode(_exitCode: number, _options: ExecOptions, _stderr: string): void {
93-
// Default: no-op. SSH overrides to report to connection pool.
94-
}
95-
9690
/**
9791
* Command prefix (e.g., "SSH" or "Docker") for logging.
9892
*/
@@ -138,8 +132,13 @@ export abstract class RemoteRuntime implements Runtime {
138132
}
139133

140134
// Spawn the remote process (SSH or Docker)
141-
// For SSH, this awaits connection pool backoff before spawning
142-
const { process: childProcess } = await this.spawnRemoteProcess(fullCommand, options);
135+
const timeoutMs = options.timeout !== undefined ? options.timeout * 1000 : undefined;
136+
const deadlineMs = timeoutMs !== undefined ? Date.now() + timeoutMs : undefined;
137+
const spawnResult = await this.spawnRemoteProcess(fullCommand, {
138+
...options,
139+
deadlineMs,
140+
});
141+
const { process: childProcess } = spawnResult;
143142

144143
// Short-lived commands can close stdin before writes/close complete.
145144
if (childProcess.stdin) {
@@ -163,23 +162,23 @@ export abstract class RemoteRuntime implements Runtime {
163162
// Create promises for exit code and duration immediately.
164163
const exitCode = new Promise<number>((resolve, reject) => {
165164
childProcess.on("close", (code, signal) => {
166-
if (aborted || options.abortSignal?.aborted) {
167-
resolve(EXIT_CODE_ABORTED);
168-
return;
169-
}
170-
if (timedOut) {
171-
resolve(EXIT_CODE_TIMEOUT);
172-
return;
165+
const finalExitCode =
166+
aborted || options.abortSignal?.aborted
167+
? EXIT_CODE_ABORTED
168+
: timedOut
169+
? EXIT_CODE_TIMEOUT
170+
: (code ?? (signal ? -1 : 0));
171+
172+
if (finalExitCode !== EXIT_CODE_ABORTED && finalExitCode !== EXIT_CODE_TIMEOUT) {
173+
spawnResult.onExit?.(finalExitCode, stderrForErrorReporting);
173174
}
174-
const finalExitCode = code ?? (signal ? -1 : 0);
175-
176-
// Let subclass handle exit code (e.g., SSH connection pool)
177-
this.onExitCode(finalExitCode, options, stderrForErrorReporting);
175+
spawnResult.onClose?.();
178176

179177
resolve(finalExitCode);
180178
});
181179

182180
childProcess.on("error", (err) => {
181+
spawnResult.onError?.(err);
183182
reject(
184183
new RuntimeError(
185184
`Failed to execute ${this.commandPrefix} command: ${err.message}`,
@@ -226,8 +225,10 @@ export abstract class RemoteRuntime implements Runtime {
226225
void exitCode.finally(() => abortSignal.removeEventListener("abort", onAbort));
227226
}
228227

229-
// Handle timeout
230-
if (options.timeout !== undefined) {
228+
// Handle timeout. Include connection acquisition time in the local deadline so
229+
// user-configured timeouts do not silently stretch while the runtime waits for SSH capacity.
230+
if (timeoutMs !== undefined) {
231+
const remainingTimeoutMs = Math.max(0, (deadlineMs ?? Date.now()) - Date.now());
231232
const timeoutHandle = setTimeout(() => {
232233
timedOut = true;
233234

@@ -245,7 +246,7 @@ export abstract class RemoteRuntime implements Runtime {
245246
disposable[Symbol.dispose]();
246247
}, 1000);
247248
hardKillHandle.unref();
248-
}, options.timeout * 1000);
249+
}, remainingTimeoutMs);
249250

250251
void exitCode.finally(() => clearTimeout(timeoutHandle));
251252
}

0 commit comments

Comments
 (0)