e2e: cross-OS packaged-desktop targets (macOS/Linux/Windows), filmed #1184

Open
RhysSullivan wants to merge 1 commit into main from e2e/authoring-loop-and-cross-os-desktop

RhysSullivan (Owner) commented Jun 28, 2026

What

Cross-OS packaged-desktop e2e targets: the real desktop app on macOS, Linux, and Windows in VMs, driven over CDP and filmed. One shared scenario (desktop-vm/console-renders.test.ts) and driver (src/vm/desktop.ts); switching OS is a flag (vitest run --project desktop-<os>).

Recordings

The same shared scenario, filmed on each guest OS. Read the test, watch the run:

macOS (tart guest, autologin Aqua, screencapture)
desktop-macos

Linux (tart guest, Xvfb + openbox, ffmpeg x11grab)
desktop-linux

Windows (dockur/QEMU guest, screendump)
desktop-windows

How it works

The real electron-builder bundle runs inside a guest VM and is driven over a CDP tunnel; each run lands test.ts + session.mp4 + step screenshots in runs/<target>/. One shared scenario and driver; only launch and capture differ per OS:

| OS | display / launch | capture |
| --- | --- | --- |
| macOS | autologin Aqua, launchctl asuser | screencapture |
| linux | Xvfb + openbox, xdotool resize | ffmpeg x11grab |
| windows | dockur (QEMU) interactive session | QEMU screendump |

macOS and Linux auto-provision a tart guest and build the bundle locally (the executor binary cross-compiles via BUN_TARGET); Windows attaches to a dockur host via E2E_DESKTOP_WIN_* env. Not in the default bun run test chain; skips honestly without a guest, like desktop-packaged skips without a display.

Drive-by fixes

  • tart SSH forced to password-only (PubkeyAuthentication=no, IdentitiesOnly=yes) so a loaded SSH agent does not exhaust the guest's MaxAuthTries. Also removes an intermittent failure the existing cli-{os} lanes hit.
  • build-sidecar keys the executable-bit chmod on the build target, not the host, so a windows-target cross-build no longer ENOENTs on a unix executor binary.
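
The two fixes above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the helper names `passwordOnlySshArgs` and `needsExecutableBit` are hypothetical, and the Bun-style target strings (for example `bun-windows-x64`) are an assumption about what BUN_TARGET holds.

```typescript
// Sketch only: helper names and the target-string format are assumptions.

// SSH flags for the password-only setup described in the first bullet.
// Both -o settings are standard OpenSSH client options.
function passwordOnlySshArgs(): string[] {
  return ["-o", "PubkeyAuthentication=no", "-o", "IdentitiesOnly=yes"];
}

// Decide whether to chmod +x from the build target, not the host platform.
// Assumes Bun-style target strings such as "bun-windows-x64".
function needsExecutableBit(buildTarget: string): boolean {
  return !buildTarget.includes("windows");
}
```

Keying the decision on the target string means a macOS or Linux host building for Windows never tries to chmod a unix binary that was not produced.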

Verified

vitest run --project desktop-macos / --project desktop-linux each auto-provision a guest end to end and produce a playable session.mp4 of the real console, then discard the guest. desktop-windows films via the dockur path. Gates green (format, lint, typecheck).

cloudflare-workers-and-pages Bot commented Jun 28, 2026

Deploying with Cloudflare Workers

The latest updates on your project. Learn more about integrating Git with Workers.

| Status | Name | Latest Commit | Updated (UTC) |
| --- | --- | --- | --- |
| ❌ Deployment failed | executor-cloud | 259e9d0 | Jun 28 2026, 10:49 PM |

cloudflare-workers-and-pages Bot commented Jun 28, 2026

Deploying with Cloudflare Workers

The latest updates on your project. Learn more about integrating Git with Workers.

| Status | Name | Latest Commit | Preview URL | Updated (UTC) |
| --- | --- | --- | --- | --- |
| ✅ Deployment successful! | executor-marketing | 259e9d0 | Commit Preview URL, Branch Preview URL | Jun 28 2026, 10:49 PM |

github-actions Bot (Contributor) commented Jun 28, 2026

Cloudflare preview

Console https://executor-preview-pr-1184.executor-e2e.workers.dev
MCP https://executor-preview-pr-1184.executor-e2e.workers.dev/mcp
Deployed commit 259e9d0

Sign-in is Cloudflare Access (one-time PIN to an allowed email). The preview has its own database and encryption key; it is destroyed when this PR closes.

pkg-pr-new Bot commented Jun 28, 2026

@executor-js/cli

npm i https://pkg.pr.new/@executor-js/cli@1184

@executor-js/config

npm i https://pkg.pr.new/@executor-js/config@1184

@executor-js/execution

npm i https://pkg.pr.new/@executor-js/execution@1184

@executor-js/sdk

npm i https://pkg.pr.new/@executor-js/sdk@1184

@executor-js/plugin-file-secrets

npm i https://pkg.pr.new/@executor-js/plugin-file-secrets@1184

@executor-js/plugin-graphql

npm i https://pkg.pr.new/@executor-js/plugin-graphql@1184

@executor-js/plugin-keychain

npm i https://pkg.pr.new/@executor-js/plugin-keychain@1184

@executor-js/plugin-mcp

npm i https://pkg.pr.new/@executor-js/plugin-mcp@1184

@executor-js/plugin-onepassword

npm i https://pkg.pr.new/@executor-js/plugin-onepassword@1184

@executor-js/plugin-openapi

npm i https://pkg.pr.new/@executor-js/plugin-openapi@1184

@executor-js/codemode-core

npm i https://pkg.pr.new/@executor-js/codemode-core@1184

@executor-js/runtime-quickjs

npm i https://pkg.pr.new/@executor-js/runtime-quickjs@1184

executor

npm i https://pkg.pr.new/executor@1184

commit: 259e9d0

greptile-apps Bot commented Jun 28, 2026

Greptile Summary

This PR adds cross-OS packaged-desktop e2e targets that boot a real VM (tart for macOS/Linux, dockur for Windows), push the electron-builder bundle, launch it with --remote-debugging-port, forward CDP to the host, drive it with a shared scenario, and film the session — one runs/<target>/ bucket per OS. It also fixes a Windows-target cross-build ENOENT (chmod keyed on build target, not host) and an SSH MaxAuthTries exhaustion bug in the existing cli-{os} tart lanes.

  • New targets (desktop-macos, desktop-linux, desktop-windows): one shared scenario and CDP driver; per-OS globalsetups handle guest provisioning, display setup, and filming.
  • Drive-by fixes: build-sidecar.ts now checks the build target before chmod; tart.ts SSH options add PubkeyAuthentication=no/IdentitiesOnly=yes to prevent agent key exhaustion.
  • Architecture: tart processes run detached and unref()'d, so they do not keep the test runner alive; the guest keeps running until discard() is called.

Confidence Score: 5/5

Additive test infrastructure only; cannot affect the default test chain or production code.

All three OS paths have been run end-to-end with filmed recordings. Drive-by fixes are small and targeted. New code is isolated entirely to the e2e harness.

The CdpPage class in e2e/src/vm/desktop.ts and the waitForText fallback in e2e/desktop-vm/console-renders.test.ts would benefit from tighter error handling, but neither issue can cause silent test passes.
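
One way the CdpPage concern could be addressed is a small registry of in-flight commands that rejects on error frames and is drained when the socket closes. The sketch below is an assumption about shape, not the PR's implementation: `PendingCommands` and `CdpFrame` are hypothetical names, and only the CDP response layout (`id` plus either `result` or `error`) is taken from the protocol.

```typescript
// Sketch only: PendingCommands and CdpFrame are hypothetical names,
// not the PR's CdpPage implementation.

interface CdpFrame {
  id?: number;
  result?: unknown;
  error?: { code?: number; message: string };
}

type Waiter = { resolve: (value: unknown) => void; reject: (error: Error) => void };

class PendingCommands {
  private waiters = new Map<number, Waiter>();

  get size(): number {
    return this.waiters.size;
  }

  // Register a command id and get a promise for its response.
  register(id: number): Promise<unknown> {
    return new Promise((resolve, reject) => {
      this.waiters.set(id, { resolve, reject });
    });
  }

  // Route a response frame: error frames reject instead of resolving undefined.
  handleFrame(frame: CdpFrame): void {
    if (frame.id === undefined) return; // an event, not a command response
    const waiter = this.waiters.get(frame.id);
    if (!waiter) return;
    this.waiters.delete(frame.id);
    if (frame.error) waiter.reject(new Error(`CDP error: ${frame.error.message}`));
    else waiter.resolve(frame.result);
  }

  // Call from the WebSocket close handler so no caller waits forever.
  drain(reason: string): void {
    this.waiters.forEach((waiter) => waiter.reject(new Error(reason)));
    this.waiters.clear();
  }
}
```

With this shape, a dropped tunnel surfaces as a rejected command rather than a hang that only ends at the hook timeout.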

Important Files Changed

| Filename | Overview |
| --- | --- |
| e2e/src/vm/desktop.ts | New shared CDP driver + SSH helpers. CdpPage.command resolves undefined on error frames and lacks a WebSocket close handler to drain pending promises. |
| e2e/desktop-vm/console-renders.test.ts | New shared VM scenario. Broad .catch() on waitForText swallows connection errors, not just timeouts. |
| e2e/setup/desktop-vm.ts | Shared attach-or-provision logic. guestTunnel never throws on port-bind failure, so the fallback-to-skip path is unreachable. |
| e2e/setup/desktop-macos.globalsetup.ts | macOS tart globalsetup: builds bundle, pushes via tar-stream, ad-hoc codesigns, starts daemon + app via launchctl asuser. |
| e2e/setup/desktop-linux.globalsetup.ts | Linux tart globalsetup: brings up Xvfb + openbox, launches app with --no-sandbox, xdotool-resizes window for good x11grab coverage. |
| e2e/setup/desktop-windows.globalsetup.ts | Windows dockur globalsetup: SSH jump-tunnel approach is clean. Recording interpolates env vars into remote shell commands (flagged in previous review). |
| e2e/src/vm/tart.ts | Adds PubkeyAuthentication=no/IdentitiesOnly=yes to prevent SSH agent key exhaustion, and detached+unref on the tart process. |
| apps/desktop/scripts/build-sidecar.ts | Drive-by fix: keys the executable-bit chmod on the build TARGET to avoid ENOENT on Windows-target cross-builds. |
| e2e/vitest.config.ts | Adds desktop-macos/linux/windows projects with generous hookTimeout (900 s). Not in the default test chain. |
| e2e/targets/registry.ts | Registers three new desktop-<os> target names so each OS gets its own runs/<target>/ bucket. |
| e2e/targets/desktop.ts | Uses E2E_TARGET env var as the target name so each OS project lands in its own run bucket. |
| e2e/AGENTS.md | Documents the three desktop VM targets, guest image requirements, and Xvfb + openbox rationale. |

Sequence Diagram

%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
    participant V as vitest runner
    participant G as globalsetup (per-OS)
    participant VM as Guest VM
    participant T as desktop-vm scenario

    V->>G: run globalsetup
    G->>G: ensureBundle() (build if needed)
    G->>VM: tart clone + run --no-graphics (macOS/Linux)
    G->>VM: pushDirAsTar (app bundle)
    G->>VM: start daemon (supervised)
    G->>VM: "start app --remote-debugging-port=9222"
    G->>VM: waitGuestPageTarget (poll /json/list via SSH)
    G->>V: "guestTunnel(ip, 9222) -> localPort"
    G->>V: set E2E_DESKTOP_CDP_PORT, E2E_DESKTOP_VM_IP

    V->>T: run scenario
    T->>T: "pageWsUrl(localPort) -> ws URL"
    T->>VM: CdpPage.connect (WebSocket via SSH tunnel)
    T->>VM: Runtime.enable, Page.enable
    T-->>VM: recordGuestScreen (concurrent)
    T->>VM: waitForText Integrations
    T->>VM: Page.captureScreenshot
    T->>VM: Runtime.evaluate body.innerText
    T->>T: expect toContain Integrations
    T->>T: await recording session.mp4
    T->>V: pass or skip

    V->>G: teardown
    G->>VM: forward.close()
    G->>VM: vm.discard() tart delete
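
The `pageWsUrl(localPort) -> ws URL` step in the diagram can be pictured as a lookup against the CDP HTTP endpoint `/json/list`, which returns the debuggable targets. The sketch below is an assumption about how that step might look: the function names mirror the diagram, but the bodies and the `CdpTarget` type are illustrative, not the code in src/vm/desktop.ts.

```typescript
// Sketch only: names mirror the diagram, the bodies are assumptions.

interface CdpTarget {
  type: string;
  url: string;
  webSocketDebuggerUrl?: string;
}

// Pick the first debuggable page target from a /json/list response.
function pickPageTarget(targets: CdpTarget[]): CdpTarget | undefined {
  return targets.find((t) => t.type === "page" && t.webSocketDebuggerUrl !== undefined);
}

// Resolve the page's WebSocket URL through the locally forwarded CDP port.
async function pageWsUrl(localPort: number): Promise<string> {
  const res = await fetch(`http://127.0.0.1:${localPort}/json/list`);
  if (!res.ok) throw new Error(`CDP /json/list returned ${res.status}`);
  const target = pickPageTarget((await res.json()) as CdpTarget[]);
  if (!target?.webSocketDebuggerUrl) throw new Error("no page target exposed over CDP");
  return target.webSocketDebuggerUrl;
}
```

Filtering on `type === "page"` skips workers and other non-window targets that a packaged Electron app may also expose.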

Reviews (2): Last reviewed commit: "e2e: cross-OS packaged-desktop targets (..." | Re-trigger Greptile

Comment thread e2e/src/journey/steps.ts Outdated
Comment on lines +148 to +161
case "run": {
  const result = await execFileAsync("sh", ["-c", withBase(step.command, ctx.baseUrl)]).catch(
    (error: { stdout?: string; stderr?: string }) => ({
      stdout: error.stdout ?? "",
      stderr: error.stderr ?? String(error),
    }),
  );
  const output = `${result.stdout}${result.stderr}`;
  if (step.contains !== undefined && !output.includes(step.contains)) {
    throw new Error(
      `\`run\` output did not contain ${JSON.stringify(step.contains)}\n${output.slice(0, 1000)}`,
    );
  }
  return output.trim().slice(0, 2000);

P1 run step: live execution checks stdout+stderr; generated test checks only stdout

executeStep assembles output = stdout + stderr and checks step.contains against both streams. codegenStep emits const { stdout } = await execFileAsync(...) and asserts stdout.toContain(...). A command that writes the expected text only to stderr (e.g. many CLIs write progress/status there) passes during browse --contains exploration but produces a failing assertion in the promoted test — silently breaking the "live behavior and the test match" invariant the file header promises.
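
One possible way to keep the two paths in agreement is for the live step and the generated test to call the same check. The sketch below is hypothetical: `combinedOutput` and `assertRunContains` are illustrative names, not functions from steps.ts, and the error text simply reuses the message from the excerpt above.

```typescript
// Sketch only: helper names are hypothetical; the point is one shared check.

interface RunResult {
  stdout: string;
  stderr: string;
}

// Same stream combination the live `run` step uses.
function combinedOutput(result: RunResult): string {
  return `${result.stdout}${result.stderr}`;
}

// Throws when the expected text appears in neither stream.
function assertRunContains(result: RunResult, contains: string): void {
  const output = combinedOutput(result);
  if (!output.includes(contains)) {
    throw new Error(
      `\`run\` output did not contain ${JSON.stringify(contains)}\n${output.slice(0, 1000)}`,
    );
  }
}
```

If the generated test called such a helper instead of asserting on stdout alone, a command that reports only on stderr would pass or fail identically in both places.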

Comment thread e2e/src/vm/desktop.ts
Comment on lines +120 to +135
for (let i = 0; i < 40; i++) {
  const ok = await new Promise<boolean>((resolve) => {
    const sock = net.connect({ host: "127.0.0.1", port: localPort }, () => {
      sock.destroy();
      resolve(true);
    });
    sock.on("error", () => resolve(false));
    sock.setTimeout(1000, () => {
      sock.destroy();
      resolve(false);
    });
  });
  if (ok) break;
  await sleep(500);
}
return { localPort, close: () => child.kill() };

P1 guestTunnel never throws, silently bypassing "skip honestly" degradation

The polling loop exits without error after 40 attempts (~20 s). attachOrProvision in desktop-vm.ts wraps the call in try/catch and degrades to a skip only if an exception is thrown. When guestTunnel returns quietly with a port that was never successfully forwarded (e.g. SSH bound the local port but exited after ConnectTimeout=8 because the guest was unreachable), E2E_DESKTOP_CDP_PORT is still set to the dead port, so the scenario runs instead of skipping. Compare tart.ts's waitLocalPort, which explicitly throws after exhausting its attempts.
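
A throwing variant of the polling loop might look like the sketch below. It is an illustration under assumptions, not the PR's fix: `waitForForward` and `tcpProbe` are hypothetical names, and the probe is injected so the retry logic can be exercised without a real tunnel.

```typescript
import * as net from "node:net";

// Sketch only: waitForForward and tcpProbe are hypothetical names.

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// One TCP connect attempt against the forwarded local port.
function tcpProbe(port: number): Promise<boolean> {
  return new Promise<boolean>((resolve) => {
    const sock = net.connect({ host: "127.0.0.1", port }, () => {
      sock.destroy();
      resolve(true);
    });
    sock.on("error", () => resolve(false));
    sock.setTimeout(1000, () => {
      sock.destroy();
      resolve(false);
    });
  });
}

// Poll until the probe succeeds; throw instead of returning quietly.
async function waitForForward(
  probe: () => Promise<boolean>,
  attempts = 40,
  delayMs = 500,
): Promise<void> {
  for (let i = 0; i < attempts; i++) {
    if (await probe()) return;
    await sleep(delayMs);
  }
  throw new Error(`forward never became reachable after ${attempts} attempts`);
}
```

A caller would use it as `await waitForForward(() => tcpProbe(localPort))`, so an unreachable guest raises inside the existing try/catch and the target degrades to a skip.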

Comment thread e2e/src/vm/desktop.ts
Comment on lines +176 to +179
const remote =
  `S=${storage}; rm -rf "$S/frames"; mkdir -p "$S/frames"; ` +
  `docker exec ${container} python3 -c "import base64;exec(base64.b64decode('${b64}'))"; ` +
  `ffmpeg -y -framerate 4 -i "$S/frames/f%03d.ppm" -pix_fmt yuv420p -movflags +faststart "$S/win.mp4" >/dev/null 2>&1`;

P2 Windows recording passes container and storage straight from env into a shell command string sent over SSH. A value like exec-win; rm -rf /storage would execute on the remote host. These are internal developer-controlled env vars so the blast radius is limited, but quoting them eliminates the risk entirely.

Suggested change
- const remote =
-   `S=${storage}; rm -rf "$S/frames"; mkdir -p "$S/frames"; ` +
-   `docker exec ${container} python3 -c "import base64;exec(base64.b64decode('${b64}'))"; ` +
-   `ffmpeg -y -framerate 4 -i "$S/frames/f%03d.ppm" -pix_fmt yuv420p -movflags +faststart "$S/win.mp4" >/dev/null 2>&1`;
+ const quoteSh = (v: string): string => `'${v.replaceAll("'", "'\\''")}'`;
+ const remote =
+   `S=${quoteSh(storage)}; rm -rf "$S/frames"; mkdir -p "$S/frames"; ` +
+   `docker exec ${quoteSh(container)} python3 -c "import base64;exec(base64.b64decode('${b64}'))"; ` +
+   `ffmpeg -y -framerate 4 -i "$S/frames/f%03d.ppm" -pix_fmt yuv420p -movflags +faststart "$S/win.mp4" >/dev/null 2>&1`;
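
To see what single-quote wrapping does to the hostile value from the comment, here is a small standalone sketch. It uses the same quoting idea as the suggested `quoteSh`, written with a global regex instead of `replaceAll` (an assumption made only so it runs on older compile targets); the `docker exec ... python3 --version` command line is illustrative.

```typescript
// Sketch only: same quoting idea as the suggested quoteSh, using a global
// regex in place of replaceAll.
function quoteSh(value: string): string {
  // Close the quote, emit an escaped single quote, reopen the quote.
  return `'${value.replace(/'/g, "'\\''")}'`;
}

// The hostile value from the review comment stays one inert shell word.
const quoted = quoteSh("exec-win; rm -rf /storage");
console.log(`docker exec ${quoted} python3 --version`);
// prints: docker exec 'exec-win; rm -rf /storage' python3 --version
```

Inside single quotes a POSIX shell treats `;` and spaces literally, so the value reaches `docker exec` as one argument instead of starting a second command.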

RhysSullivan changed the title from "e2e: browse/promote authoring loop and cross-OS packaged-desktop targets" to "e2e: cross-OS packaged-desktop targets (macOS/Linux/Windows), filmed" on Jun 28, 2026
Run the real electron-builder desktop bundle inside a guest VM on macOS, Linux,
and Windows, driven over a CDP tunnel and filmed. One shared scenario
(desktop-vm/console-renders.test.ts) and driver (src/vm/desktop.ts); each target
lands test.ts + session.mp4 + step screenshots in runs/<target>/. Only launch and
capture differ per OS:
  macOS:   autologin Aqua session, launchctl asuser, screencapture
  linux:   Xvfb + openbox, xdotool window resize, ffmpeg x11grab
  windows: dockur (QEMU) interactive session, QEMU screendump

macOS and Linux auto-provision a tart guest and build the bundle locally (the
executor binary cross-compiles via BUN_TARGET); Windows attaches to a dockur host
via E2E_DESKTOP_WIN_* env. Not in the default test chain; skips honestly without a
guest, like desktop-packaged skips without a display.

Also:
- Force tart SSH to password-only (PubkeyAuthentication=no, IdentitiesOnly=yes) so
  a loaded SSH agent does not exhaust the guest's MaxAuthTries, an intermittent
  failure the existing cli-{os} lanes also hit.
- build-sidecar keys the executable-bit chmod on the build target, not the host,
  so a windows-target cross-build no longer ENOENTs on a unix executor binary.
RhysSullivan force-pushed the e2e/authoring-loop-and-cross-os-desktop branch from 746f289 to 259e9d0 on June 28, 2026 22:47