Skip to content

Commit 768123d

Browse files
committed
Stop reporting clean sidecar shutdowns as desktop crashes
A post-boot sidecar exit was only treated as expected when the desktop itself called stopSidecar (which sends SIGTERM). A SIGINT reaching the child any other way — most often a process-group signal racing ahead of teardown — exited the daemon and fell through to reportSidecarCrash plus the in-window crash screen. That surfaced as a spurious 'Sidecar exited unexpectedly (code=130 signal=null)' crash report on a healthy shutdown. Classify the post-boot exit on its exit code/signal alone: - Latch quit intent in before-quit (markAppQuitting), synchronously and before stopConnection's await, so any sidecar exit during teardown is treated as shutdown even if a signal beats the SIGTERM. - Treat a graceful or signal-driven exit (code 0/130/143, or a raw SIGINT/SIGTERM) as a clean shutdown: still surface the recovery screen, but don't report it. Abnormal exits (other non-zero codes, SIGKILL, a null code) still report. stderr is not consulted — it is a rolling session tail, a poor proxy for 'errored at exit'; a genuine internal fault is reported with a real stack by the sidecar's own reporter. Proven black-box: e2e/desktop/sidecar-clean-shutdown-not-reported.test.ts launches the real Electron app pointed at a local Sentry envelope sink (via a non-packaged EXECUTOR_DESKTOP_SENTRY_DSN seam) and asserts a SIGKILL IS reported while a SIGINT clean shutdown shows recovery but is not — with a second SIGKILL as a FIFO ordering fence so the negative is conclusive.
1 parent dcb121d commit 768123d

5 files changed

Lines changed: 314 additions & 5 deletions

File tree

apps/desktop/src/main/diagnostics.ts

Lines changed: 7 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -25,7 +25,13 @@ import log from "electron-log/main.js";
2525
import * as Sentry from "@sentry/electron/main";
2626
import { getServerSettings } from "./settings";
2727

28-
const sentryDsn = __EXECUTOR_SENTRY_DSN__;
28+
// Packaged builds always use the DSN baked in at build time. The non-packaged
29+
// override is a test seam: the desktop e2e points crash reporting at a local
30+
// sink so it can assert what is (and isn't) reported. It can never redirect a
31+
// real user's reports — production is always packaged, where the override is
32+
// dead and the baked DSN wins.
33+
const sentryDsn =
34+
(app.isPackaged ? undefined : process.env.EXECUTOR_DESKTOP_SENTRY_DSN) || __EXECUTOR_SENTRY_DSN__;
2935

3036
// The informal cross-tool opt-out (consoledonottrack.com). Checked before
3137
// any SDK initializes, and it covers all three processes because the

apps/desktop/src/main/index.ts

Lines changed: 6 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -23,6 +23,7 @@ import {
2323
startSidecar,
2424
stopSidecar,
2525
onUnexpectedSidecarExit,
26+
markAppQuitting,
2627
SidecarPortInUseError,
2728
type SidecarConnection,
2829
} from "./sidecar";
@@ -888,6 +889,11 @@ if (ensureSingleInstance()) {
888889
});
889890

890891
app.on("before-quit", async (event) => {
892+
// Latch the quit intent first, synchronously: from here on, a sidecar exit
893+
// is part of shutdown. A process-group SIGINT can race ahead of the SIGTERM
894+
// stopConnection sends below, and without this it would be misreported as a
895+
// crash (the Sentry "Sidecar exited unexpectedly (code=130)" false positive).
896+
markAppQuitting();
891897
if (!connection) return;
892898
// A supervised daemon must keep serving after the app quits — don't stop it,
893899
// and don't block the quit on teardown we don't need to do.
Lines changed: 58 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,58 @@
1+
/**
2+
* Pure classification of a sidecar's *post-boot* exit (the child died after it
3+
* had already reported ready). Kept free of electron/log imports so the rule is
4+
* unit-testable in isolation — `sidecar.ts` owns the I/O around it.
5+
*
6+
* The trap this guards against: the desktop only marks an exit "expected" when
7+
* it stopped the child itself via `stopSidecar` (which sends SIGTERM). A signal
8+
* that arrives any other way — most commonly a process-group SIGINT racing
9+
* ahead of teardown — would otherwise be filed as a crash, painting the crash
10+
* screen over (and Sentry-reporting) what was really a clean shutdown.
11+
*
12+
* Classification is on the exit code/signal alone — the OS's verdict on whether
13+
* the process ended normally. We deliberately do NOT inspect stderr: the buffer
14+
* is a rolling tail of the whole session (a startup log line would make it
15+
* non-empty), so it is a poor proxy for "errored at exit". A genuine internal
16+
* fault in the daemon is reported with a real stack by the sidecar's own
17+
* crash reporter (@sentry/bun, see sidecar/server.ts); this main-process path
18+
* is only the coarse "the child vanished" net, so it should fire on abnormal
19+
* exits and stay quiet on clean ones.
20+
*/
21+
22+
// Exit codes a process reports when it shuts down through its own handler:
23+
// 0 is a graceful success; 130 = 128 + SIGINT(2) and 143 = 128 + SIGTERM(15)
24+
// are the conventional codes an interrupted program exits with (an Effect
25+
// `runMain` program interrupted by SIGINT exits 130).
26+
const CLEAN_SHUTDOWN_EXIT_CODES: ReadonlySet<number> = new Set([0, 130, 143]);
27+
28+
// When no handler runs, the process dies on the raw signal instead and Node
29+
// reports the signal name with a null code. SIGINT/SIGTERM are orderly stop
30+
// requests; SIGKILL/SIGSEGV/etc. are not and fall through to "crash".
31+
const CLEAN_SHUTDOWN_SIGNALS: ReadonlySet<string> = new Set(["SIGINT", "SIGTERM"]);
32+
33+
export type PostBootSidecarExitDecision =
34+
/** We asked it to stop, or the app is quitting — say nothing, do nothing. */
35+
| { readonly kind: "expected" }
36+
/**
37+
* A graceful or signal-driven exit (code 0/130/143, or a raw SIGINT/SIGTERM)
38+
* that bypassed `stopSidecar`. The web UI is dead so the recovery screen still
39+
* has to show, but this is a healthy shutdown and must not be reported.
40+
*/
41+
| { readonly kind: "recover" }
42+
/** A genuine fault (abnormal exit / hard kill): surface recovery UI *and* report. */
43+
| { readonly kind: "crash" };
44+
45+
export const classifyPostBootSidecarExit = (input: {
46+
/** The child was passed to `stopSidecar` (we initiated this stop). */
47+
readonly expected: boolean;
48+
/** `before-quit` has fired — the whole app is tearing down. */
49+
readonly appQuitting: boolean;
50+
readonly code: number | null;
51+
readonly signal: string | null;
52+
}): PostBootSidecarExitDecision => {
53+
if (input.expected || input.appQuitting) return { kind: "expected" };
54+
const cleanShutdown =
55+
(input.code !== null && CLEAN_SHUTDOWN_EXIT_CODES.has(input.code)) ||
56+
(input.signal !== null && CLEAN_SHUTDOWN_SIGNALS.has(input.signal));
57+
return cleanShutdown ? { kind: "recover" } : { kind: "crash" };
58+
};

apps/desktop/src/main/sidecar.ts

Lines changed: 30 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -24,6 +24,7 @@ import {
2424
import { loadOrMintLocalAuthToken } from "./local-auth";
2525
import { getServerSettings } from "./settings";
2626
import { reportSidecarCrash, sidecarCrashReportingEnv } from "./diagnostics";
27+
import { classifyPostBootSidecarExit } from "./sidecar-exit";
2728
import { SERVER_SETTINGS_USERNAME, type DesktopServerSettings } from "../shared/server-settings";
2829

2930
// Sidecar output is echoed to the terminal (visible when Electron is run
@@ -39,6 +40,15 @@ const STDERR_TAIL_LIMIT = 8 * 1024;
3940
// their exits are expected and must not be reported as crashes.
4041
const expectedExits = new WeakSet<ChildProcess>();
4142

43+
// Flipped by main/index.ts the moment the app begins quitting. A sidecar exit
44+
// during teardown is part of shutdown even when a process-group signal (SIGINT)
45+
// races ahead of stopSidecar's SIGTERM — so once this is set, no post-boot exit
46+
// is reported as a crash. One-way latch: the process is on its way out.
47+
let appQuitting = false;
48+
export const markAppQuitting = (): void => {
49+
appQuitting = true;
50+
};
51+
4252
// Main/index.ts subscribes to swap the dead web UI for the in-window crash
4353
// screen. A callback (not an import) keeps this module free of window
4454
// concerns.
@@ -409,13 +419,29 @@ export async function startSidecar(options: StartOptions = {}): Promise<SidecarC
409419

410420
const onExit = (code: number | null, signal: NodeJS.Signals | null) => {
411421
if (resolved) {
412-
// Post-boot exit: expected when we stopped it ourselves (quit,
413-
// restart, update); anything else is a sidecar crash under a live
414-
// window — log it and report upstream with the stderr tail.
415-
if (expectedExits.has(child)) {
422+
// Post-boot exit. Expected when we stopped it ourselves (quit, restart,
423+
// update) or the app is mid-quit; a clean signal-driven exit that raced
424+
// past stopSidecar still needs the recovery screen but isn't a crash;
425+
// anything else is a real fault — report it with the stderr tail.
426+
const decision = classifyPostBootSidecarExit({
427+
expected: expectedExits.has(child),
428+
appQuitting,
429+
code,
430+
signal,
431+
});
432+
if (decision.kind === "expected") {
416433
sidecarLog.info(`exited (code=${code} signal=${signal})`);
417434
return;
418435
}
436+
if (decision.kind === "recover") {
437+
// A clean exit (code 0/130/143 or a raw SIGINT/SIGTERM) that bypassed
438+
// stopSidecar — e.g. a process-group signal during teardown. The web
439+
// UI is dead, so surface the recovery screen, but don't file a healthy
440+
// shutdown as a crash.
441+
sidecarLog.warn(`exited via clean shutdown (code=${code} signal=${signal})`);
442+
unexpectedExitListener?.();
443+
return;
444+
}
419445
const message = `Sidecar exited unexpectedly (code=${code} signal=${signal})`;
420446
sidecarLog.error(message);
421447
reportSidecarCrash(message, stderrBuffer);
Lines changed: 213 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,213 @@
1+
// Desktop-only: proves the *telemetry* contract around a dying sidecar — the
2+
// distinction the on-screen crash flow can't show, because a clean shutdown and
3+
// a hard kill paint the SAME recovery screen. The only observable difference is
4+
// what reaches Sentry, so this launches the real Electron app pointed at a local
5+
// Sentry envelope sink (via the non-packaged EXECUTOR_DESKTOP_SENTRY_DSN seam)
6+
// and asserts:
7+
// - SIGKILL (hard kill) → a "Sidecar exited unexpectedly" crash IS reported
8+
// - SIGINT (clean stop) → the recovery screen shows but NOTHING is reported
9+
//
10+
// The negative is made conclusive with a fence: a second SIGKILL after the
11+
// SIGINT. Sentry's transport is FIFO per client, so once the fence's crash
12+
// envelope has arrived, any envelope the SIGINT would have produced (enqueued
13+
// earlier) must already be present too — its absence is real, not just slow.
14+
import { execFile } from "node:child_process";
15+
import { existsSync, mkdtempSync, readFileSync, rmSync } from "node:fs";
16+
import { createServer, type Server } from "node:http";
17+
import { createRequire } from "node:module";
18+
import type { AddressInfo } from "node:net";
19+
import { tmpdir } from "node:os";
20+
import { join } from "node:path";
21+
import { fileURLToPath } from "node:url";
22+
import { promisify } from "node:util";
23+
import { gunzipSync } from "node:zlib";
24+
25+
import { expect } from "@effect/vitest";
26+
import { Effect } from "effect";
27+
import { _electron } from "playwright";
28+
29+
import { scenario } from "../src/scenario";
30+
import { RunDir } from "../src/services";
31+
32+
const appDir = fileURLToPath(new URL("../../apps/desktop/", import.meta.url));
33+
const electronBinary = createRequire(join(appDir, "package.json"))("electron") as string;
34+
35+
const SIDECAR_CRASH_MESSAGE = "Sidecar exited unexpectedly";
36+
37+
scenario(
38+
"Desktop · a clean sidecar shutdown recovers without a crash report",
39+
{ timeout: 300_000 },
40+
Effect.gen(function* () {
41+
const runDir = yield* RunDir;
42+
yield* Effect.promise(() => run(runDir));
43+
}),
44+
);
45+
46+
/** Minimal Sentry ingest: every POSTed envelope body (gunzipped) is buffered. */
47+
const startSentrySink = async (): Promise<{
48+
server: Server;
49+
dsn: string;
50+
envelopes: string[];
51+
}> => {
52+
const envelopes: string[] = [];
53+
const server = createServer((req, res) => {
54+
if (req.method !== "POST") {
55+
res.writeHead(200);
56+
res.end("ok");
57+
return;
58+
}
59+
const chunks: Buffer[] = [];
60+
req.on("data", (chunk: Buffer) => chunks.push(chunk));
61+
req.on("end", () => {
62+
let body = Buffer.concat(chunks);
63+
if (req.headers["content-encoding"] === "gzip") {
64+
// oxlint-disable-next-line executor/no-try-catch-or-throw -- boundary: tolerate a non-gzip body rather than dropping the envelope
65+
try {
66+
body = gunzipSync(body);
67+
} catch {
68+
// fall through with the raw bytes
69+
}
70+
}
71+
envelopes.push(body.toString("utf8"));
72+
res.writeHead(200, { "content-type": "application/json" });
73+
res.end(JSON.stringify({ id: "e2e" }));
74+
});
75+
});
76+
await new Promise<void>((resolve) => server.listen(0, "127.0.0.1", () => resolve()));
77+
const { port } = server.address() as AddressInfo;
78+
// DSN shape: http://<publicKey>@<host>/<projectId> — the SDK POSTs envelopes
79+
// to http://<host>/api/<projectId>/envelope/, which the sink accepts wholesale.
80+
return { server, dsn: `http://e2e@127.0.0.1:${port}/1`, envelopes };
81+
};
82+
83+
const run = async (runDir: string) => {
84+
const { server: sink, dsn, envelopes } = await startSentrySink();
85+
const home = mkdtempSync(join(tmpdir(), "executor-desktop-sentry-e2e-"));
86+
const videoTmp = join(runDir, ".video-tmp");
87+
let stepIndex = 0;
88+
89+
const crashReports = () => envelopes.filter((e) => e.includes(SIDECAR_CRASH_MESSAGE));
90+
const sigkillReports = () =>
91+
crashReports().filter((e) => e.includes(`${SIDECAR_CRASH_MESSAGE} (code=null signal=SIGKILL)`));
92+
93+
const waitFor = async (predicate: () => boolean, label: string, timeoutMs = 60_000) => {
94+
const start = Date.now();
95+
while (!predicate()) {
96+
if (Date.now() - start > timeoutMs) {
97+
throw new Error(`timed out waiting for ${label}`);
98+
}
99+
await new Promise((resolve) => setTimeout(resolve, 200));
100+
}
101+
};
102+
103+
const app = await _electron.launch({
104+
executablePath: electronBinary,
105+
args: [appDir],
106+
cwd: appDir,
107+
// EXECUTOR_DESKTOP_SENTRY_DSN turns on main-process crash reporting in the
108+
// non-packaged build and routes it at the sink. HOME isolates a fresh data
109+
// dir from any real install on this machine.
110+
env: { ...process.env, HOME: home, EXECUTOR_DESKTOP_SENTRY_DSN: dsn },
111+
recordVideo: { dir: videoTmp, size: { width: 1280, height: 800 } },
112+
timeout: 120_000,
113+
});
114+
115+
const manifestPath = join(home, ".executor/server-control/server.json");
116+
const sidecarPid = (): number => {
117+
const manifest = JSON.parse(readFileSync(manifestPath, "utf8")) as { pid: number };
118+
expect(manifest.pid, "sidecar pid recorded in the server manifest").toBeGreaterThan(0);
119+
return manifest.pid;
120+
};
121+
122+
try {
123+
const page = await app.firstWindow({ timeout: 120_000 });
124+
const step = async (label: string, body: () => Promise<void>) => {
125+
await body();
126+
stepIndex += 1;
127+
const slug = label.toLowerCase().replace(/[^a-z0-9]+/g, "-");
128+
await page.screenshot({
129+
path: join(runDir, `${String(stepIndex).padStart(2, "0")}-${slug}.png`),
130+
});
131+
};
132+
133+
await step("app boots into the web console", async () => {
134+
await page.getByText("Settings").first().waitFor({ timeout: 120_000 });
135+
});
136+
137+
// SIGKILL #1 — the report path. Establishes that the sink works and a hard
138+
// kill IS reported, which also calibrates that envelopes arrive in time.
139+
await step("a hard kill (SIGKILL) is reported as a crash", async () => {
140+
process.kill(sidecarPid(), "SIGKILL");
141+
await page.getByText("stopped unexpectedly").waitFor({ timeout: 30_000 });
142+
await waitFor(() => sigkillReports().length >= 1, "the first SIGKILL crash report");
143+
});
144+
145+
await step("restart heals the app", async () => {
146+
await page.locator("#restart").click();
147+
await page.getByText("Settings").first().waitFor({ timeout: 120_000 });
148+
});
149+
150+
// SIGINT — the clean-shutdown path. The dev sidecar handles SIGINT and exits
151+
// 0; the app surfaces the SAME recovery screen, but this must NOT be reported.
152+
await step("a clean stop (SIGINT) shows recovery but is not reported", async () => {
153+
process.kill(sidecarPid(), "SIGINT");
154+
await page.getByText("stopped unexpectedly").waitFor({ timeout: 30_000 });
155+
});
156+
157+
await step("restart heals the app again", async () => {
158+
await page.locator("#restart").click();
159+
await page.getByText("Settings").first().waitFor({ timeout: 120_000 });
160+
});
161+
162+
// SIGKILL #2 — the fence. Once its crash envelope has landed, FIFO ordering
163+
// guarantees any envelope the SIGINT would have produced is already here.
164+
await step("fence: a second hard kill is reported", async () => {
165+
process.kill(sidecarPid(), "SIGKILL");
166+
await page.getByText("stopped unexpectedly").waitFor({ timeout: 30_000 });
167+
await waitFor(() => sigkillReports().length >= 2, "the fence SIGKILL crash report");
168+
});
169+
170+
// The verdict: every sidecar-exit crash reported was a SIGKILL — the SIGINT
171+
// in between contributed nothing.
172+
const reports = crashReports();
173+
expect(
174+
reports.length,
175+
`only the two SIGKILLs were reported, got: ${JSON.stringify(reports.map(firstLine))}`,
176+
).toBe(2);
177+
expect(
178+
reports.every((e) => e.includes("signal=SIGKILL")),
179+
"no clean SIGINT shutdown (code=0 / signal=null) was reported as a crash",
180+
).toBe(true);
181+
} finally {
182+
const page = app.windows()[0];
183+
const video = page?.video();
184+
await app.close().catch(() => {});
185+
await new Promise<void>((resolve) => sink.close(() => resolve()));
186+
const recordedPath = await video?.path().catch(() => undefined);
187+
if (recordedPath && existsSync(recordedPath)) {
188+
await promisify(execFile)("ffmpeg", [
189+
"-y",
190+
"-i",
191+
recordedPath,
192+
"-c:v",
193+
"libx264",
194+
"-preset",
195+
"veryfast",
196+
"-crf",
197+
"26",
198+
"-pix_fmt",
199+
"yuv420p",
200+
"-movflags",
201+
"+faststart",
202+
join(runDir, "session.mp4"),
203+
]).catch(() => {});
204+
}
205+
rmSync(videoTmp, { recursive: true, force: true });
206+
rmSync(home, { recursive: true, force: true });
207+
}
208+
};
209+
210+
const firstLine = (envelope: string): string => {
211+
const match = envelope.match(new RegExp(`${SIDECAR_CRASH_MESSAGE}[^"\\\\]*`));
212+
return match ? match[0] : envelope.slice(0, 80);
213+
};

0 commit comments

Comments
 (0)