e2e: cross-OS packaged-desktop targets (macOS/Linux/Windows), filmed #1184
RhysSullivan wants to merge 1 commit into
Deploying with

| Status | Name | Latest Commit | Updated (UTC) |
|---|---|---|---|
| ❌ Deployment failed View logs | executor-cloud | 259e9d0 | Jun 28 2026, 10:49 PM |

Deploying with

| Status | Name | Latest Commit | Preview URL | Updated (UTC) |
|---|---|---|---|---|
| ✅ Deployment successful! View logs | executor-marketing | 259e9d0 | Commit Preview URL, Branch Preview URL | Jun 28 2026, 10:49 PM |

Cloudflare preview
Sign-in is Cloudflare Access (one-time PIN to an allowed email). The preview has its own database and encryption key; it is destroyed when this PR closes.
@executor-js/cli
@executor-js/config
@executor-js/execution
@executor-js/sdk
@executor-js/plugin-file-secrets
@executor-js/plugin-graphql
@executor-js/plugin-keychain
@executor-js/plugin-mcp
@executor-js/plugin-onepassword
@executor-js/plugin-openapi
@executor-js/codemode-core
@executor-js/runtime-quickjs
executor
commit: |
Greptile Summary
This PR adds cross-OS packaged-desktop e2e targets that boot a real VM (tart for macOS/Linux, dockur for Windows), push the electron-builder bundle, launch it with
Confidence Score: 5/5
Additive test infrastructure only; cannot affect the default test chain or production code. All three OS paths have been run end-to-end with filmed recordings. Drive-by fixes are small and targeted. New code is isolated entirely to the e2e harness. The CdpPage class in e2e/src/vm/desktop.ts and the waitForText fallback in e2e/desktop-vm/console-renders.test.ts would benefit from tighter error handling, but neither issue can cause silent test passes.
Important Files Changed
Sequence Diagram
%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
participant V as vitest runner
participant G as globalsetup (per-OS)
participant VM as Guest VM
participant T as desktop-vm scenario
V->>G: run globalsetup
G->>G: ensureBundle() (build if needed)
G->>VM: tart clone + run --no-graphics (macOS/Linux)
G->>VM: pushDirAsTar (app bundle)
G->>VM: start daemon (supervised)
G->>VM: "start app --remote-debugging-port=9222"
G->>VM: waitGuestPageTarget (poll /json/list via SSH)
G->>V: "guestTunnel(ip, 9222) -> localPort"
G->>V: set E2E_DESKTOP_CDP_PORT, E2E_DESKTOP_VM_IP
V->>T: run scenario
T->>T: "pageWsUrl(localPort) -> ws URL"
T->>VM: CdpPage.connect (WebSocket via SSH tunnel)
T->>VM: Runtime.enable, Page.enable
T-->>VM: recordGuestScreen (concurrent)
T->>VM: waitForText Integrations
T->>VM: Page.captureScreenshot
T->>VM: Runtime.evaluate body.innerText
T->>T: expect toContain Integrations
T->>T: await recording session.mp4
T->>V: pass or skip
V->>G: teardown
G->>VM: forward.close()
G->>VM: vm.discard() tart delete
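The `pageWsUrl(localPort) -> ws URL` step in the diagram can be sketched roughly as follows. This is an illustration only, not the PR's code: `CdpTarget` and `pickPageWsUrl` are hypothetical names, and the sketch assumes the usual DevTools `/json/list` response, in which each target carries a `type` and (when debuggable) a `webSocketDebuggerUrl`.

```typescript
// Subset of the fields found on one entry of the DevTools /json/list response.
interface CdpTarget {
  type: string;
  url: string;
  webSocketDebuggerUrl?: string;
}

// Hypothetical helper: pick the first "page" target that exposes a debugger URL.
// Throws instead of returning undefined so a missing page fails loudly.
function pickPageWsUrl(targets: CdpTarget[]): string {
  const page = targets.find(
    (t) => t.type === "page" && t.webSocketDebuggerUrl !== undefined,
  );
  if (page === undefined || page.webSocketDebuggerUrl === undefined) {
    throw new Error("no page target with a webSocketDebuggerUrl in /json/list");
  }
  return page.webSocketDebuggerUrl;
}
```

A caller would fetch `http://127.0.0.1:<localPort>/json/list` through the tunnel, parse the JSON array, and pass it to `pickPageWsUrl` before opening the WebSocket.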
Reviews (2): Last reviewed commit: "e2e: cross-OS packaged-desktop targets (..."
    case "run": {
      const result = await execFileAsync("sh", ["-c", withBase(step.command, ctx.baseUrl)]).catch(
        (error: { stdout?: string; stderr?: string }) => ({
          stdout: error.stdout ?? "",
          stderr: error.stderr ?? String(error),
        }),
      );
      const output = `${result.stdout}${result.stderr}`;
      if (step.contains !== undefined && !output.includes(step.contains)) {
        throw new Error(
          `\`run\` output did not contain ${JSON.stringify(step.contains)}\n${output.slice(0, 1000)}`,
        );
      }
      return output.trim().slice(0, 2000);
run step: live execution checks stdout+stderr; generated test checks only stdout
executeStep assembles output = stdout + stderr and checks step.contains against both streams. codegenStep emits const { stdout } = await execFileAsync(...) and asserts stdout.toContain(...). A command that writes the expected text only to stderr (e.g. many CLIs write progress/status there) passes during browse --contains exploration but produces a failing assertion in the promoted test — silently breaking the "live behavior and the test match" invariant the file header promises.
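One way to keep the live step and the generated test aligned is to route both through the same output-merging helper. The sketch below is an assumption about how that could look, not code from this PR: `RunResult`, `combinedOutput`, and `outputContains` are hypothetical names.

```typescript
// Result shape shared by a resolved execFile call and its rejection error.
interface RunResult {
  stdout?: string;
  stderr?: string;
}

// Hypothetical helper: build the text that `contains` is checked against,
// the same way the live `run` step does (stdout followed by stderr).
function combinedOutput(result: RunResult): string {
  return `${result.stdout ?? ""}${result.stderr ?? ""}`;
}

// Hypothetical helper: a single check usable by both the live step and
// the generated test, so the two cannot drift apart.
function outputContains(result: RunResult, needle: string): boolean {
  return combinedOutput(result).includes(needle);
}
```

With this shape, the generated test would destructure both `stdout` and `stderr` from `execFileAsync(...)` and assert on `combinedOutput({ stdout, stderr })`, so text written only to stderr passes in both places.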
    for (let i = 0; i < 40; i++) {
      const ok = await new Promise<boolean>((resolve) => {
        const sock = net.connect({ host: "127.0.0.1", port: localPort }, () => {
          sock.destroy();
          resolve(true);
        });
        sock.on("error", () => resolve(false));
        sock.setTimeout(1000, () => {
          sock.destroy();
          resolve(false);
        });
      });
      if (ok) break;
      await sleep(500);
    }
    return { localPort, close: () => child.kill() };
guestTunnel never throws, silently bypassing "skip honestly" degradation
The polling loop exits without error after 40 attempts (~20 s). attachOrProvision in desktop-vm.ts wraps the call in try/catch and degrades to a skip only if an exception is thrown. When guestTunnel returns quietly with a port that was never successfully forwarded (e.g. SSH bound the local port but exited after ConnectTimeout=8 because the guest was unreachable), E2E_DESKTOP_CDP_PORT is still set to the dead port, so the scenario runs instead of skipping. Compare tart.ts's waitLocalPort, which explicitly throws after exhausting its attempts.
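A polling loop that fails loudly, in the spirit of the `waitLocalPort` behavior described above, might look like the sketch below. The helper name `waitUntilReady` and its parameters are hypothetical; the real probe would be the `net.connect` check from the excerpt.

```typescript
// Hypothetical helper: poll `probe` until it reports ready, and throw if it
// never does, so a caller's try/catch can degrade to a skip.
async function waitUntilReady(
  probe: () => Promise<boolean>,
  attempts: number,
  delayMs: number,
  label: string,
): Promise<void> {
  for (let i = 0; i < attempts; i++) {
    if (await probe()) return;
    // Wait before the next attempt (this also runs after the final failed attempt).
    await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
  }
  throw new Error(`${label} not ready after ${attempts} attempts`);
}
```

Used inside `guestTunnel`, the function would kill the SSH child and rethrow when `waitUntilReady` rejects, rather than returning a `localPort` that was never forwarded.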
    const remote =
      `S=${storage}; rm -rf "$S/frames"; mkdir -p "$S/frames"; ` +
      `docker exec ${container} python3 -c "import base64;exec(base64.b64decode('${b64}'))"; ` +
      `ffmpeg -y -framerate 4 -i "$S/frames/f%03d.ppm" -pix_fmt yuv420p -movflags +faststart "$S/win.mp4" >/dev/null 2>&1`;
Windows recording passes container and storage straight from env into a shell command string sent over SSH. A value like exec-win; rm -rf /storage would execute on the remote host. These are internal developer-controlled env vars so the blast radius is limited, but quoting them eliminates the risk entirely.
    - const remote =
    -   `S=${storage}; rm -rf "$S/frames"; mkdir -p "$S/frames"; ` +
    -   `docker exec ${container} python3 -c "import base64;exec(base64.b64decode('${b64}'))"; ` +
    -   `ffmpeg -y -framerate 4 -i "$S/frames/f%03d.ppm" -pix_fmt yuv420p -movflags +faststart "$S/win.mp4" >/dev/null 2>&1`;
    + const quoteSh = (v: string): string => `'${v.replaceAll("'", "'\\''")}'`;
    + const remote =
    +   `S=${quoteSh(storage)}; rm -rf "$S/frames"; mkdir -p "$S/frames"; ` +
    +   `docker exec ${quoteSh(container)} python3 -c "import base64;exec(base64.b64decode('${b64}'))"; ` +
    +   `ffmpeg -y -framerate 4 -i "$S/frames/f%03d.ppm" -pix_fmt yuv420p -movflags +faststart "$S/win.mp4" >/dev/null 2>&1`;
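To see what the quoting buys, here is a small standalone sketch of the same single-quote escaping. It uses a global regex rather than `replaceAll` purely so it compiles for older targets; `storageAssignment` is a hypothetical helper added only for illustration.

```typescript
// Same idea as the suggested quoteSh: wrap the value in single quotes and
// rewrite each embedded single quote as '\'' (close, escaped quote, reopen).
function quoteSh(value: string): string {
  return `'${value.replace(/'/g, "'\\''")}'`;
}

// Hypothetical builder for the first fragment of the remote command.
function storageAssignment(storage: string): string {
  return `S=${quoteSh(storage)}`;
}
```

With this in place, a hostile value such as `exec-win; rm -rf /storage` is sent as the single word `'exec-win; rm -rf /storage'`, so the remote shell treats the `;` as data rather than a command separator.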
Run the real electron-builder desktop bundle inside a guest VM on macOS, Linux,
and Windows, driven over a CDP tunnel and filmed. One shared scenario
(desktop-vm/console-renders.test.ts) and driver (src/vm/desktop.ts); each target
lands test.ts + session.mp4 + step screenshots in runs/<target>/. Only launch and
capture differ per OS:
macOS: autologin Aqua session, launchctl asuser, screencapture
linux: Xvfb + openbox, xdotool window resize, ffmpeg x11grab
windows: dockur (QEMU) interactive session, QEMU screendump
macOS and Linux auto-provision a tart guest and build the bundle locally (the
executor binary cross-compiles via BUN_TARGET); Windows attaches to a dockur host
via E2E_DESKTOP_WIN_* env. Not in the default test chain; skips honestly without a
guest, like desktop-packaged skips without a display.
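The per-OS split described above can be summarized as data. The sketch below is illustrative only: `GuestOs`, `TargetNotes`, `TARGETS`, and `projectName` are hypothetical names, and the strings simply restate the launch and capture mechanisms listed in this message; the `desktop-<os>` project naming follows the `vitest run --project desktop-<os>` usage in the PR description.

```typescript
type GuestOs = "macos" | "linux" | "windows";

// What differs per guest OS: how the app is launched and how the screen is filmed.
interface TargetNotes {
  launch: string;
  capture: string;
}

// Restates the per-OS list from the commit message as a lookup table.
const TARGETS: Record<GuestOs, TargetNotes> = {
  macos: { launch: "autologin Aqua session, launchctl asuser", capture: "screencapture" },
  linux: { launch: "Xvfb + openbox, xdotool window resize", capture: "ffmpeg x11grab" },
  windows: { launch: "dockur (QEMU) interactive session", capture: "QEMU screendump" },
};

// Hypothetical helper: the vitest project name for a guest OS.
function projectName(os: GuestOs): string {
  return `desktop-${os}`;
}
```

Everything outside this table (the scenario and the CDP driver) is shared across the three targets.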
Also:
- Force tart SSH to password-only (PubkeyAuthentication=no, IdentitiesOnly=yes) so
a loaded SSH agent does not exhaust the guest's MaxAuthTries, an intermittent
failure the existing cli-{os} lanes also hit.
- build-sidecar keys the executable-bit chmod on the build target, not the host,
so a windows-target cross-build no longer ENOENTs on a unix executor binary.
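The two fixes above could be sketched as below. Both helper names are hypothetical, and the target strings assume Bun-style names such as `bun-windows-x64`; the actual code in `build-sidecar` and the tart SSH wrapper may be structured differently.

```typescript
// Hypothetical helper: ssh options that keep authentication password-only,
// so keys offered by a loaded agent do not count against the guest's MaxAuthTries.
function passwordOnlySshOptions(): string[] {
  return ["-o", "PubkeyAuthentication=no", "-o", "IdentitiesOnly=yes"];
}

// Hypothetical helper: decide whether to chmod from the build target rather
// than the host platform. Assumes Bun-style target names (an assumption).
function needsExecutableBit(buildTarget: string): boolean {
  return !buildTarget.includes("windows");
}
```

A unix host cross-building for Windows would then skip the chmod entirely, instead of looking for a unix `executor` binary that the Windows build never produced.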
What
Cross-OS packaged-desktop e2e targets: the real desktop app on macOS, Linux, and Windows in VMs, driven over CDP and filmed. One shared scenario (`desktop-vm/console-renders.test.ts`) and driver (`src/vm/desktop.ts`); switching OS is a flag (`vitest run --project desktop-<os>`).

Recordings
The same shared scenario, filmed on each guest OS. Read the test, watch the run:

macOS (tart guest, autologin Aqua, `screencapture`)


Linux (tart guest, Xvfb + openbox, ffmpeg `x11grab`)


Windows (dockur/QEMU guest, `screendump`)


How it works
The real electron-builder bundle, inside a guest VM, driven over a CDP tunnel, lands `test.ts` + `session.mp4` + step screenshots in `runs/<target>/`. One shared scenario and driver; only launch and capture differ per OS:

| OS | Launch | Capture |
|---|---|---|
| macOS | autologin Aqua session, `launchctl asuser` | `screencapture` |
| Linux | Xvfb + openbox, `xdotool` resize | ffmpeg `x11grab` |
| Windows | dockur (QEMU) interactive session | QEMU `screendump` |

macOS and Linux auto-provision a `tart` guest and build the bundle locally (the `executor` binary cross-compiles via `BUN_TARGET`); Windows attaches to a dockur host via `E2E_DESKTOP_WIN_*` env. Not in the default `bun run test` chain; skips honestly without a guest, like `desktop-packaged` skips without a display.

Drive-by fixes
- `tart` SSH forced to password-only (`PubkeyAuthentication=no`, `IdentitiesOnly=yes`) so a loaded SSH agent does not exhaust the guest's `MaxAuthTries`. Also removes an intermittent failure the existing `cli-{os}` lanes hit.
- `build-sidecar` keys the executable-bit chmod on the build target, not the host, so a windows-target cross-build no longer ENOENTs on a unix `executor` binary.

Verified
`vitest run --project desktop-macos` / `--project desktop-linux` each auto-provision a guest end to end and produce a playable `session.mp4` of the real console, then discard the guest. `desktop-windows` films via the dockur path. Gates green (format, lint, typecheck).