test(ci): schedule a mid-test snapshot 150ms after every beforeEach#7842
Conversation
Run 26401801404 (PR #7841 after merging develop) captured the dying test's beforeEach node-report — be-0258, written 75ms after socketio.ts > "Pad-wide settings creator gate different browsers" entered — but no further state. The kill landed 321ms into the test body, between 1 Hz heartbeat ticks, and the 100ms boundary throttle prevented further beforeEach writes inside the same test. The report we have shows only the listening server socket; the connections that the test body creates (and that presumably precede the kill) never get snapshotted. Schedule an unref'd setTimeout from beforeEach that fires 150ms after the test entered. If it's still the running test at fire time (i.e. slow enough that the death window applies), capture a node-report from INSIDE the test body — the moment when the real TCP / socket.io activity is in flight. Fast tests (<150ms) skip the write because afterEach has already cleared currentTest by the time the timer fires. Result on the next reproduction of the death pattern: - be-NNNN report at +75ms (beforeEach, body not yet started) - mt-MMMM report at +150ms (mid-test, body in flight, before kill at +320ms) - kill, no further reports Cost: only slow tests (>150ms) generate an mt report, so the artifact size growth is bounded by the count of tests that take longer than 150ms — typically a small minority. Locally verified against a 3-test probe: 2 fast tests skipped, 1 300ms test produced the expected mt-NNNN snapshot. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Qodo reviews are paused for this user.Troubleshooting steps vary by plan Learn more → On a Teams plan? Using GitHub Enterprise Server, GitLab Self-Managed, or Bitbucket Data Center? |
Review Summary by QodoAdd mid-test diagnostic snapshot 150ms after test start
WalkthroughsDescription• Schedule mid-test snapshot 150ms after beforeEach • Capture node-report from inside test body during execution • Diagnose slow tests with TCP/socket.io activity • Fast tests skip write, only slow tests generate reports Diagramflowchart LR
BE["beforeEach hook"] -->|schedule unref'd timer| TIMER["setTimeout 150ms"]
TIMER -->|fire at 150ms| CHECK{"currentTest still running?"}
CHECK -->|yes, slow test| REPORT["Write mt-NNNN report"]
CHECK -->|no, fast test| SKIP["Skip write"]
REPORT --> DIAG["Capture TCP/socket.io state"]
File Changes1. src/tests/backend/diagnostics.ts
|
Code Review by Qodo
1.
|
Run 26402211271 produced be-0207 (the dying test's beforeEach snapshot) but no mid-test snapshot, even though setTimeout(150ms) was scheduled and the test body lived another 280 ms after that deadline. setTimeout under Windows-runner load is being starved past the deadline — we already saw the previous test's mt fire at +252 ms (102 ms late) when the deadline was 150 ms, so the dying test's timer was likely scheduled to fire well after the kill at +425 ms. setInterval has fired reliably throughout the investigation (every heartbeat in every run lands within ~1 s of schedule, even when setTimeout misses). Bump heartbeat to 200 ms (5 Hz) so any death window ≥200 ms is sampled inside the test body, independent of how starved setTimeout is. Cost on the Windows runner: the existing log shows writeReport completes in <1 ms (from "Writing Node.js report" to "Node.js report completed" timestamps), so 5 Hz adds ~5 ms/s of overhead. Artifact growth: ~500 reports for a 100 s test phase (~25 MB raw, ~5 MB compressed). The setTimeout mid-test snapshot stays — it's belt-and-suspenders cheap and fires for slow tests where the heartbeat alone might not align with the death window. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…Report Qodo flagged two real issues on #7842: 1. The single shared `lastReportT` made `mt` writes poison the `be` throttle window. Slow tests trigger an `mt` write at +150 ms, then the test ends a few ms later, and the NEXT test's `beforeEach` landed within the 100 ms throttle from the `mt` write — so its own `be` snapshot was suppressed. That's the exact boundary coverage the throttle is supposed to PROTECT. Local repro with a 180 ms slow test followed by a fast one confirmed: the fast test's `be-0004` is now captured instead of swallowed. Fix: split into `lastBoundaryT` used and updated only by `be` writes. `hb` and `mt` pass `updateThrottle=false` and never advance the boundary timestamp. 2. `setTimeout` was being scheduled in `beforeEach` for every test even when `canWriteReport` is false (Linux backend matrix, local dev). That's a wasted timer per test for no possible diagnostic output. Gate the schedule itself on `canWriteReport`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Follow-up to #7838. Schedule an unref'd
setTimeoutfrombeforeEachthat fires 150 ms after a test enters: if it's still the running test at fire time (i.e. slow enough that the death window applies), capture a node-report from INSIDE the test body.Why
Run 26401801404 captured the dying test's
beforeEachnode-report —be-0258, written 75 ms aftersocketio.ts > Pad-wide settings creator gate different browsersentered — but no further state. The kill landed 321 ms into the test body, between 1 Hz heartbeat ticks, and the 100 ms boundary throttle prevented furtherbeforeEachwrites inside the same test. The report we have shows only the listening server socket; the connections that the test body creates (and that presumably precede the kill) never get snapshotted.What this changes
Pure diagnostic. No behavior change on successful runs.
mochaHooks.beforeEachschedulessetTimeout(150ms)and capturescurrentTestin closure.unref()-d so it never holds the loop open.currentTest === enteredTest, write a node-report (prefixmt-). Fast tests (<150 ms) have already moved on — afterEach cleared the pointer — so the write is skipped.Expected result on next reproduction of the kill pattern
The mt-MMMM snapshot is the V8 stack + libuv handle snapshot we've been chasing — the one with the open TCP / socket.io connections that the kill correlates with.
Cost
Only slow tests (>150 ms) generate an
mt-report, so artifact growth is bounded by the count of slow tests — typically a small minority. Most tests fire-and-skip.Locally verified
3-test probe (10 ms, 300 ms, 5 ms tests):
🤖 Generated with Claude Code