Skip to content

Commit aa79a34

Browse files
JohnMcLearclaude
andcommitted
test(ci): heartbeat + running-test pointer in backend test diagnostics
Backend tests silently die with code 255 mid-suite ~22% of the time on develop (most often Windows-with-plugins, Node 24). Each kill lands 300±50 ms after the previous test's clean ✔ teardown line and produces no failing-test marker, no error, no Mocha summary, and — despite the unconditional handlers in `diagnostics.ts` — none of the JS-level death events fire either. Recent example: run 26311025244 (`Windows with Plugins (24)`); both attempts crashed at completely different "last test" locations, so the dying test itself isn't to blame. The existing diagnostics only set lastSeenTest in afterEach, so if the kill lands during the NEXT test's setup or body — which is exactly the ~300ms gap we observe — the pointer reads as the previous (passing) test. That hides whether we're between tests or inside one, and which one. Two changes: 1. Track currentTest in beforeEach as well as lastFinishedTest in afterEach. Every diag line now carries both, so the death point is bracketable regardless of which lifecycle phase the kill interrupts. 2. Add a 1Hz heartbeat that writeSyncs the running-test name plus `process.memoryUsage()` (rss, heap) and the active-handle and active-request counts. The interval is unref'd so it never holds the event loop open by itself. Cost is roughly one extra log line per second of mocha runtime (~60-120 lines per CI run). When the next failure fires, the last heartbeat narrows the kill window to ≤1s, the running pointer names the test on the rails at that moment, and the handle/memory trace gives a sparkline that exposes sudden spikes — a leaked socket, an unref'd timer, a runaway map — that would otherwise be invisible at the runner-log level. No behavior change on successful runs. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent d106ed9 commit aa79a34

1 file changed

Lines changed: 51 additions & 12 deletions

File tree

src/tests/backend/diagnostics.ts

Lines changed: 51 additions & 12 deletions
Original file line numberDiff line numberDiff line change
@@ -10,6 +10,14 @@
1010
// in a way that bypassed JS handlers — SIGKILL, OOM, or a fatal native
1111
// error — OR mocha itself called process.exit before the handlers ran.
1212
//
13+
// Subsequent runs on develop (e.g. 26311025244, Windows with plugins,
14+
// Node 24) reproduced the same fingerprint: `[diag +0ms] diagnostics
15+
// loaded` lands but no other diag line appears before the silent
16+
// ELIFECYCLE. Death lands 300±50 ms after the previous test's teardown
17+
// and the lastSeenTest pointer (only updated in afterEach) tells us
18+
// nothing about whether the next test had started or whether mocha was
19+
// between tests at the kill moment.
20+
//
1321
// This file:
1422
// 1. Registers handlers UNCONDITIONALLY at mocha startup (common.ts is
1523
// only imported by ~27 of 47 specs, so its handlers may register
@@ -18,9 +26,17 @@
1826
// complete before the kernel returns from the syscall, so the line
1927
// lands in the runner log even if the process is killed
2028
// milliseconds later.
21-
// 3. Tracks the last-seen test via a mocha root afterEach hook so the
22-
// death point is identified.
23-
// 4. Logs exit-related events so we can discriminate:
29+
// 3. Tracks BOTH the currently-running test (set in beforeEach) and the
30+
// last-finished test (set in afterEach), so the death point can be
31+
// bracketed even when the kill lands inside the next test's setup or
32+
// body, before its own afterEach has a chance to update the pointer.
33+
// 4. Emits a 1Hz heartbeat carrying the running-test name, RSS / heap
34+
// usage, and active handle / request counts. If the process dies
35+
// without firing any JS handler, the last heartbeat narrows the
36+
// kill window to <=1s and the handle-count trace exposes leaks
37+
// (sockets, timers, native bindings) that would otherwise be
38+
// invisible at the runner-log level.
39+
// 5. Logs exit-related events so we can discriminate:
2440
// beforeExit + exit -> clean event-loop drain (Linux CI, local)
2541
// only exit -> process.exit() called — expected when mocha
2642
// is launched with --exit (the Windows CI
@@ -35,7 +51,8 @@
3551
import {writeSync} from 'node:fs';
3652

3753
const t0 = Date.now();
38-
let lastSeenTest = '<no test seen yet>';
54+
let currentTest = '<no test running>';
55+
let lastFinishedTest = '<no test finished yet>';
3956

4057
const diag = (msg: string): void => {
4158
const line = `[diag +${Date.now() - t0}ms] ${msg}\n`;
@@ -48,18 +65,33 @@ const diag = (msg: string): void => {
4865

4966
diag('diagnostics loaded');
5067

68+
// Heartbeat. unref()'d so it never holds the event loop open by itself —
69+
// it only fires if mocha is otherwise alive. The interval cadence (1Hz) is
70+
// the trade-off between log noise (~60-120 extra lines per run) and how
71+
// tightly we can bracket the kill timestamp.
72+
const heartbeat = setInterval(() => {
73+
const mem = process.memoryUsage();
74+
const handles = (process as any)._getActiveHandles?.().length ?? -1;
75+
const requests = (process as any)._getActiveRequests?.().length ?? -1;
76+
diag(`hb running="${currentTest}" lastFinished="${lastFinishedTest}" ` +
77+
`rss=${Math.round(mem.rss / 1024 / 1024)}M ` +
78+
`heap=${Math.round(mem.heapUsed / 1024 / 1024)}M ` +
79+
`handles=${handles} requests=${requests}`);
80+
}, 1000);
81+
heartbeat.unref();
82+
5183
process.on('unhandledRejection', (reason: any) => {
5284
diag(`unhandledRejection: ${
5385
reason && reason.stack ? reason.stack : String(reason)
54-
} (lastTest="${lastSeenTest}")`);
86+
} (running="${currentTest}", lastFinished="${lastFinishedTest}")`);
5587
// Re-throw so existing common.ts handlers / mocha behavior is preserved.
5688
throw reason;
5789
});
5890

5991
process.on('uncaughtException', (err: any) => {
6092
diag(`uncaughtException: ${
6193
err && err.stack ? err.stack : String(err)
62-
} (lastTest="${lastSeenTest}")`);
94+
} (running="${currentTest}", lastFinished="${lastFinishedTest}")`);
6395
// Force fail-fast. Specs that don't import common.ts only have THIS handler,
6496
// and Node won't exit on its own once an uncaughtException listener is
6597
// registered. Without the explicit exit a fatal error would be swallowed.
@@ -69,18 +101,18 @@ process.on('uncaughtException', (err: any) => {
69101

70102
process.on('beforeExit', (code: number) => {
71103
diag(`beforeExit code=${code} exitCode=${process.exitCode} ` +
72-
`lastTest="${lastSeenTest}"`);
104+
`running="${currentTest}" lastFinished="${lastFinishedTest}"`);
73105
});
74106

75107
process.on('exit', (code: number) => {
76-
diag(`exit code=${code} lastTest="${lastSeenTest}"`);
108+
diag(`exit code=${code} running="${currentTest}" lastFinished="${lastFinishedTest}"`);
77109
});
78110

79111
for (const sig of ['SIGINT', 'SIGTERM', 'SIGHUP', 'SIGBREAK'] as const) {
80112
// SIGHUP / SIGBREAK don't exist on every platform; ignore registration errors.
81113
try {
82114
process.on(sig as any, () => {
83-
diag(`received ${sig} (lastTest="${lastSeenTest}")`);
115+
diag(`received ${sig} (running="${currentTest}", lastFinished="${lastFinishedTest}")`);
84116
// Let the default behavior (exit) happen.
85117
process.exit(128);
86118
});
@@ -89,12 +121,19 @@ for (const sig of ['SIGINT', 'SIGTERM', 'SIGHUP', 'SIGBREAK'] as const) {
89121
}
90122
}
91123

92-
// Mocha root hook — only registered if mocha picks up this file via --require.
93-
// We track the most recently-finished test so the death point is visible.
124+
// Mocha root hooks — only registered if mocha picks up this file via --require.
125+
// beforeEach sets the running pointer so a mid-test kill is attributable to a
126+
// specific test, not just the previous one that successfully finished.
94127
export const mochaHooks = {
128+
beforeEach(this: any) {
129+
if (this.currentTest) {
130+
currentTest = this.currentTest.fullTitle();
131+
}
132+
},
95133
afterEach(this: any) {
96134
if (this.currentTest) {
97-
lastSeenTest = this.currentTest.fullTitle();
135+
lastFinishedTest = this.currentTest.fullTitle();
136+
currentTest = '<no test running>';
98137
}
99138
},
100139
};

0 commit comments

Comments
 (0)