Skip to content

test(ci): schedule a mid-test snapshot 150ms after every beforeEach#7842

Merged
JohnMcLear merged 3 commits into
developfrom
test-ci-diag-mid-test-snapshot
May 25, 2026
Merged

test(ci): schedule a mid-test snapshot 150ms after every beforeEach#7842
JohnMcLear merged 3 commits into
developfrom
test-ci-diag-mid-test-snapshot

Conversation

@JohnMcLear
Copy link
Copy Markdown
Member

Summary

Follow-up to #7838. Schedule an unref'd setTimeout from beforeEach that fires 150 ms after a test enters: if it's still the running test at fire time (i.e. slow enough that the death window applies), capture a node-report from INSIDE the test body.

Why

Run 26401801404 captured the dying test's beforeEach node-report — be-0258, written 75 ms after socketio.ts > Pad-wide settings creator gate different browsers entered — but no further state. The kill landed 321 ms into the test body, between 1 Hz heartbeat ticks, and the 100 ms boundary throttle prevented further beforeEach writes inside the same test. The report we have shows only the listening server socket; the connections that the test body creates (and that presumably precede the kill) never get snapshotted.

What this changes

Pure diagnostic. No behavior change on successful runs.

  • mochaHooks.beforeEach schedules setTimeout(150ms) and captures currentTest in closure.
  • Timer is unref()-d so it never holds the loop open.
  • On fire, if currentTest === enteredTest, write a node-report (prefix mt-). Fast tests (<150 ms) have already moved on — afterEach cleared the pointer — so the write is skipped.

Expected result on next reproduction of the kill pattern

be-NNNN report  +75 ms   (beforeEach, body not yet started)
mt-MMMM report  +150 ms  (mid-test, body in flight, before kill at +320 ms)
kill            +320 ms  (no further reports)

The mt-MMMM snapshot is the V8 stack + libuv handle snapshot we've been chasing — the one with the open TCP / socket.io connections that the kill correlates with.

Cost

Only slow tests (>150 ms) generate an mt- report, so artifact growth is bounded by the count of slow tests — typically a small minority. Most tests fire-and-skip.

Locally verified

3-test probe (10 ms, 300 ms, 5 ms tests):

[diag +9ms]   test start: fast (10ms)
[diag +46ms]  test start: slow (300ms)
[diag +347ms] test start: fast 2

--- files ---
be-0001-fast_10ms.json
mt-0002-slow_300ms.json   ← only the slow test triggered the mid-test write
be-0003-fast_2.json

🤖 Generated with Claude Code

Run 26401801404 (PR #7841 after merging develop) captured the dying
test's beforeEach node-report — be-0258, written 75ms after
socketio.ts > "Pad-wide settings creator gate different browsers"
entered — but no further state. The kill landed 321ms into the test
body, between 1 Hz heartbeat ticks, and the 100ms boundary throttle
prevented further beforeEach writes inside the same test. The report
we have shows only the listening server socket; the connections that
the test body creates (and that presumably precede the kill) never
get snapshotted.

Schedule an unref'd setTimeout from beforeEach that fires 150ms after
the test entered. If it's still the running test at fire time (i.e.
slow enough that the death window applies), capture a node-report
from INSIDE the test body — the moment when the real TCP / socket.io
activity is in flight. Fast tests (<150ms) skip the write because
afterEach has already cleared currentTest by the time the timer
fires.

Result on the next reproduction of the death pattern:
  - be-NNNN report at +75ms (beforeEach, body not yet started)
  - mt-MMMM report at +150ms (mid-test, body in flight, before kill
    at +320ms)
  - kill, no further reports

Cost: only slow tests (>150ms) generate an mt report, so the
artifact size growth is bounded by the count of tests that take
longer than 150ms — typically a small minority. Locally verified
against a 3-test probe: 2 fast tests skipped, 1 300ms test produced
the expected mt-NNNN snapshot.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@qodo-code-review
Copy link
Copy Markdown

Qodo reviews are paused for this user.

Troubleshooting steps vary by plan Learn more →

On a Teams plan?
Reviews resume once this user has a paid seat and their Git account is linked in Qodo.
Link Git account →

Using GitHub Enterprise Server, GitLab Self-Managed, or Bitbucket Data Center?
These require an Enterprise plan - Contact us
Contact us →

@qodo-free-for-open-source-projects
Copy link
Copy Markdown

Review Summary by Qodo

Add mid-test diagnostic snapshot 150ms after test start

🧪 Tests

Grey Divider

Walkthroughs

Description
• Schedule mid-test snapshot 150ms after beforeEach
• Capture node-report from inside test body during execution
• Diagnose slow tests with TCP/socket.io activity
• Fast tests skip write, only slow tests generate reports
Diagram
flowchart LR
  BE["beforeEach hook"] -->|schedule unref'd timer| TIMER["setTimeout 150ms"]
  TIMER -->|fire at 150ms| CHECK{"currentTest still running?"}
  CHECK -->|yes, slow test| REPORT["Write mt-NNNN report"]
  CHECK -->|no, fast test| SKIP["Skip write"]
  REPORT --> DIAG["Capture TCP/socket.io state"]

Loading

File Changes

1. src/tests/backend/diagnostics.ts 🧪 Tests +18/-0

Add mid-test diagnostic snapshot scheduling

• Added mid-test snapshot scheduling in beforeEach hook
• Creates unref'd setTimeout that fires 150ms after test entry
• Checks if test is still running before writing mt- prefixed report
• Captures node-report from inside test body for slow tests only

src/tests/backend/diagnostics.ts


Grey Divider

Qodo Logo

@qodo-free-for-open-source-projects
Copy link
Copy Markdown

qodo-free-for-open-source-projects Bot commented May 25, 2026

Code Review by Qodo

🐞 Bugs (0) 📘 Rule violations (0) 📎 Requirement gaps (0)

Grey Divider


Remediation recommended

1. Mid-snapshot suppresses boundary report ✓ Resolved 🐞 Bug ◔ Observability
Description
The new tryWriteReport('mt', 0) updates the shared lastReportT timestamp, which can cause the next
test's tryWriteReport('be', 100) to be throttled and skipped if it starts within 100ms after the
mid-test snapshot. This reduces boundary report coverage specifically after slow tests (the exact
tests most likely to generate mt reports).
Code

src/tests/backend/diagnostics.ts[R220-222]

Evidence
The mt write is added in beforeEach and calls tryWriteReport('mt', 0). tryWriteReport enforces
throttling via a single shared lastReportT and always sets lastReportT=now on any successful write,
so an mt write can move lastReportT forward and make a subsequent be write fall within the 100ms
suppression window.

src/tests/backend/diagnostics.ts[97-112]
src/tests/backend/diagnostics.ts[192-224]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`tryWriteReport()` uses a single global `lastReportT` for all report types. The newly added mid-test report (`mt`) updates `lastReportT`, which can inadvertently throttle and suppress subsequent boundary reports (`be`) in the next test’s `beforeEach`.
## Issue Context
`beforeEach` always calls `tryWriteReport('be', 100)` and now additionally schedules a mid-test `tryWriteReport('mt', 0)`.
## Fix Focus Areas
- src/tests/backend/diagnostics.ts[97-121]
- src/tests/backend/diagnostics.ts[192-224]
### Suggested fix direction
Introduce separate throttle state for boundary vs. mid-test reports (e.g., `lastBoundaryReportT` for `be` and a different timestamp for `hb/mt`), or add an option so `mt` writes do not advance the timestamp used for `be` throttling.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools



Advisory comments

2. Unconditional mid-test timer overhead ✓ Resolved 🐞 Bug ➹ Performance
Description
beforeEach schedules a setTimeout for every test even when canWriteReport is false, so the callback
work is guaranteed to be wasted on runs without node-report enabled. This adds unnecessary
timer/callback churn across the whole suite for no possible diagnostic output.
Code

src/tests/backend/diagnostics.ts[R218-224]

Evidence
canWriteReport is the global gate for report writing and tryWriteReport returns immediately when it
is false, but the new code schedules setTimeout regardless, meaning on canWriteReport=false runs the
timer always fires and does only the compare + no-op call path.

src/tests/backend/diagnostics.ts[97-110]
src/tests/backend/diagnostics.ts[192-224]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
The new mid-test snapshot timer is scheduled unconditionally, but the callback can never produce a report when `canWriteReport` is false because `tryWriteReport()` returns immediately.
## Issue Context
`canWriteReport` already gates report writing, but the timer is still created per test.
## Fix Focus Areas
- src/tests/backend/diagnostics.ts[97-110]
- src/tests/backend/diagnostics.ts[192-224]
### Suggested fix direction
Only schedule the 150ms `setTimeout` when `canWriteReport` is true (or inline-check before scheduling). This preserves the diagnostic behavior in CI configurations that enable node-report while avoiding per-test timers in runs where reports are disabled.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


Grey Divider

Qodo Logo

Run 26402211271 produced be-0207 (the dying test's beforeEach
snapshot) but no mid-test snapshot, even though setTimeout(150ms)
was scheduled and the test body lived another 280 ms after that
deadline. setTimeout under Windows-runner load is being starved
past the deadline — we already saw the previous test's mt fire at
+252 ms (102 ms late) when the deadline was 150 ms, so the dying
test's timer was likely scheduled to fire well after the kill at
+425 ms.

setInterval has fired reliably throughout the investigation
(every heartbeat in every run lands within ~1 s of schedule, even
when setTimeout misses). Bump heartbeat to 200 ms (5 Hz) so any
death window ≥200 ms is sampled inside the test body, independent
of how starved setTimeout is.

Cost on the Windows runner: the existing log shows writeReport
completes in <1 ms (from "Writing Node.js report" to "Node.js
report completed" timestamps), so 5 Hz adds ~5 ms/s of overhead.
Artifact growth: ~500 reports for a 100 s test phase (~25 MB raw,
~5 MB compressed). The setTimeout mid-test snapshot stays — it's
belt-and-suspenders cheap and fires for slow tests where the
heartbeat alone might not align with the death window.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…Report

Qodo flagged two real issues on #7842:

1. The single shared `lastReportT` made `mt` writes poison the `be`
   throttle window. Slow tests trigger an `mt` write at +150 ms, then
   the test ends a few ms later, and the NEXT test's `beforeEach`
   landed within the 100 ms throttle from the `mt` write — so its
   own `be` snapshot was suppressed. That's the exact boundary
   coverage the throttle is supposed to PROTECT. Local repro with a
   180 ms slow test followed by a fast one confirmed: the fast
   test's `be-0004` is now captured instead of swallowed.

   Fix: split into `lastBoundaryT` used and updated only by `be`
   writes. `hb` and `mt` pass `updateThrottle=false` and never
   advance the boundary timestamp.

2. `setTimeout` was being scheduled in `beforeEach` for every test
   even when `canWriteReport` is false (Linux backend matrix, local
   dev). That's a wasted timer per test for no possible diagnostic
   output. Gate the schedule itself on `canWriteReport`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@JohnMcLear JohnMcLear merged commit 29ee19b into develop May 25, 2026
29 of 30 checks passed
@JohnMcLear JohnMcLear deleted the test-ci-diag-mid-test-snapshot branch May 25, 2026 13:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant