Skip to content

test(ci): drain event loop with setImmediate after every test (mitigation hypothesis)#7844

Closed
JohnMcLear wants to merge 1 commit into
developfrom
test-ci-flake-afterEach-drain
Closed

test(ci): drain event loop with setImmediate after every test (mitigation hypothesis)#7844
JohnMcLear wants to merge 1 commit into
developfrom
test-ci-flake-afterEach-drain

Conversation

@JohnMcLear
Copy link
Copy Markdown
Member

Summary

Insert a single setImmediate yield in the mocha root afterEach so the event loop drains queued I/O callbacks at every test boundary. Pure mitigation hypothesis test — no other changes.

Why

Ten captured deaths on the merged diagnostic infrastructure (#7838, #7842) show a consistent shape:

  • 200–400 ms after a test starts, the 5 Hz heartbeat setInterval goes silent for the entire death window (no timer events at all).
  • The process is then externally terminated, bypassing unhandledRejection, uncaughtException, --report-on-fatalerror, --report-uncaught-exception, and all signal handlers.
  • Pre-kill state in the libuv handle trace is nominal (3–7 handles, no leak, no spike).
  • Dying tests span supertest+JWT HTTP, socket.io connect bursts, and DOCX export round-trips — different surface code, same fingerprint.

The common substrate across all 10 deaths is rapid loopback TCP and queued I/O across test boundaries. This experiment tests one specific hypothesis: cumulative event-loop pressure across tests is the trigger. A setImmediate yield in afterEach forces a deterministic drain at every boundary instead of letting work stack across tests.

Expected outcome

Result Conclusion
Win+plugins flake rate drops materially Cumulative I/O pressure is the trigger; this is a working mitigation. Promote to permanent.
Flake rate unchanged Cumulative pressure ruled out. Move on to per-test pathologies (jose CNG, Express middleware, native modules).

Cost

~600 tests × 1 setImmediate yield ≈ negligible compared to the multi-minute backend test phase. Locally verified — a 3-test probe runs cleanly with the new async afterEach.

🤖 Generated with Claude Code

Mitigation hypothesis for the Windows backend silent ELIFECYCLE flake.

Ten captured deaths so far on the merged diagnostic infrastructure
(#7838, #7842) show a consistent shape: 200-400 ms after a test
starts, the heartbeat (5 Hz setInterval) goes silent for the entire
death window — clear evidence the event loop has stopped servicing
timers — and the process is then externally terminated, bypassing
all of Node's JS-level handlers, --report-on-fatalerror, and
--report-uncaught-exception. Pre-kill state in the libuv handle
trace is nominal (3-7 handles, no leak, no spike).

Dying tests span supertest+JWT HTTP, socket.io connect bursts, and
DOCX export round-trips — different surface code, same fingerprint.
The common substrate is rapid loopback TCP and queued I/O across
test boundaries.

Insert a single setImmediate yield in the mocha root afterEach so
the event loop has a deterministic drain point at every test
boundary. If kill rate drops materially on the Windows backend
matrix after this lands, cumulative event-loop pressure is the
trigger and we have a working mitigation; if it doesn't change,
we rule that out and look at per-test pathologies (jose CNG,
specific Express middleware paths, etc.).

Cost: ~600 tests × 1 setImmediate ≈ negligible compared to the
multi-minute backend test phase. Locally verified: a 3-test probe
runs cleanly with the new async afterEach.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@qodo-code-review
Copy link
Copy Markdown

Qodo reviews are paused for this user.

Troubleshooting steps vary by plan Learn more →

On a Teams plan?
Reviews resume once this user has a paid seat and their Git account is linked in Qodo.
Link Git account →

Using GitHub Enterprise Server, GitLab Self-Managed, or Bitbucket Data Center?
These require an Enterprise plan - Contact us
Contact us →

@qodo-free-for-open-source-projects
Copy link
Copy Markdown

Review Summary by Qodo

Drain event loop with setImmediate after every test

🧪 Tests

Grey Divider

Walkthroughs

Description
• Add setImmediate yield in mocha afterEach hook
• Drain event loop at every test boundary deterministically
• Mitigate Windows backend silent ELIFECYCLE flake hypothesis
• Cumulative I/O pressure across tests suspected root cause
Diagram
flowchart LR
  A["Test execution"] -->|afterEach hook| B["setImmediate yield"]
  B -->|event loop drain| C["Queued I/O callbacks processed"]
  C -->|deterministic boundary| D["Next test starts clean"]
  E["Hypothesis: cumulative I/O pressure"] -->|mitigation| B

Loading

File Changes

1. src/tests/backend/diagnostics.ts 🧪 Tests +23/-1

Add event loop drain to mocha afterEach hook

• Convert afterEach hook from synchronous to async function
• Add await new Promise((r) => setImmediate(r)) to drain event loop
• Insert comprehensive comment explaining Windows ELIFECYCLE flake hypothesis
• Document suspected root cause: rapid TCP/socket.io/DOCX I/O across test boundaries
• Outline expected outcomes and negligible performance cost

src/tests/backend/diagnostics.ts


Grey Divider

Qodo Logo

@qodo-free-for-open-source-projects
Copy link
Copy Markdown

qodo-free-for-open-source-projects Bot commented May 25, 2026

Code Review by Qodo

🐞 Bugs (1) 📘 Rule violations (0)

Grey Divider


Remediation recommended

1. Global drain not gated 🐞 Bug ☼ Reliability
Description
The new async afterEach unconditionally awaits a setImmediate, and diagnostics.ts is loaded by
the main backend test command, so this mitigation experiment affects every backend test run on all
platforms. If the intent is to evaluate a Windows-specific hypothesis, the lack of gating increases
blast radius and can confound comparisons between platforms/runs.
Code

src/tests/backend/diagnostics.ts[R256-282]

Evidence
The backend test script always --requires this diagnostics module, so its root hooks run for the
entire suite; the PR adds an unconditional awaited setImmediate in afterEach, making the
mitigation apply globally rather than being scoped to Windows or an opt-in experiment flag.

src/package.json[150-153]
src/tests/backend/diagnostics.ts[256-283]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
`src/tests/backend/diagnostics.ts` is required by the standard backend test script, so adding an unconditional `await setImmediate(...)` in the Mocha root `afterEach` changes execution behavior for *all* backend test runs. This PR describes the change as a Windows-backend mitigation experiment; to reduce blast radius and keep the experiment targeted/controllable, gate the drain behind `process.platform === 'win32'` and/or a dedicated env flag.

### Issue Context
- The backend test command loads `diagnostics.ts` via `--require`, so the Mocha root hooks always run.
- The new async `afterEach` introduces an extra event-loop turn after every test.

### Fix Focus Areas
- src/tests/backend/diagnostics.ts[256-283]
- src/package.json[150-153]

### Suggested change
Wrap the await in something like:
- `if (process.platform === 'win32' || process.env.MOCHA_DRAIN_IMMEDIATE === '1') { await new Promise((r) => setImmediate(r)); }`

Optionally default the env flag off on non-Windows so local/Linux behavior stays unchanged unless explicitly enabled.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


Grey Divider

Qodo Logo

@JohnMcLear
Copy link
Copy Markdown
Member Author

Closing — the setImmediate drain in afterEach causes real plugin-test regressions (ep_subscript_and_superscript's export alignment to HTML > returns HTML with Subscript HTML tags tests pass cleanly on develop eeac0a6 but fail on this PR). The drain is shifting microtask ordering in a way that breaks plugin tests that share state across describe-block tests. Mitigation hypothesis (cumulative event-loop pressure) is NOT validated by this experiment — but neither is it ruled out, because the experiment couldn't run cleanly. Will try a different angle.

@JohnMcLear JohnMcLear closed this May 25, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant