Skip to content

test: fix flaky CI tests (multipart tmpdir teardown + egg-scripts start readiness)#6015

Merged
killagu merged 2 commits into
eggjs:nextfrom
killagu:fix/multipart-rmdir-flaky-teardown
Jun 28, 2026
Merged

test: fix flaky CI tests (multipart tmpdir teardown + egg-scripts start readiness)#6015
killagu merged 2 commits into
eggjs:nextfrom
killagu:fix/multipart-rmdir-flaky-teardown

Conversation

@killagu

@killagu killagu commented Jun 28, 2026

Copy link
Copy Markdown
Contributor

Two unrelated flaky test fixes surfaced while stabilizing CI for the snapshot work (#6011). Test-only; no src changes.

1. @eggjs/multipartENOTEMPTY teardown race

file-mode.test.ts flaked on macOS Node 24 (passing on every other matrix job):

Error: ENOTEMPTY: directory not empty, rmdir '.../egg-multipart-tmp/multipart-file-mode-demo/2026/06/28/05'

fs.rm(tmpdir, { force: true, recursive: true }) defaults to maxRetries: 0, so a transient ENOTEMPTY (a file in a date-bucket dir mid recursive-removal — async per-request cleanup draining / APFS latency) throws and fails the suite. The production clean_tmpdir schedule already swallows these; only the 7 test teardowns are fragile. Fix: add maxRetries: 3 to all 7.

2. @eggjs/scripts — flaky stop.test.ts start readiness

Test scripts (ubuntu-22) failed in stop.test.ts:

AssertionError: expected 'Starting custom-framework application…' to match /custom-framework started on http:\/\/…/

stop.test.ts waited a fixed 1s for a 2-worker egg app to boot before asserting its started on http://… log — while every sibling start-without-demon-*.test.ts waits 10s. On a loaded runner 1s isn't enough. Fix: replace the fixed scheduler.wait(waitTime) before each startup/shutdown assertion with a waitFor(getText, pattern, timeout) poll (test/utils.ts) that returns as soon as the expected output appears (up to 10s) — race-free and faster than a fixed delay. Windows keeps the fixed settle for the signal-driven shutdown checks (SIGTERM isn't handled there).

Verification

  • multipart: the 7 teardowns now retry; affected tests pass locally.
  • scripts: stop.test.ts passes locally after build (8 passed, 1 skipped).

Re Gemini's review on the multipart hunks: it claimed Vitest runs afterAll in registration order — that's incorrect (default sequence.hooks: 'stack' → reverse; verified empirically on v4.1.9), so the current teardown already removes the tmpdir last (after server.close()/app.close()); the suggested reorder would move fs.rm first and reintroduce the race. Replied inline on each thread.

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Tests
    • Improved cleanup reliability across multipart-related test suites by adding additional retry behavior when removing temporary directories during teardown.
    • Updated process lifecycle checks in test runs to wait for specific startup/shutdown output instead of relying on fixed timing, reducing flaky boot/shutdown timing.
    • Introduced a reusable polling helper to detect expected log output within a timeout.

The afterAll cleanup `fs.rm(tmpdir, { force: true, recursive: true })` defaults to
maxRetries: 0, so a transient ENOTEMPTY (a file appearing in a date-bucket dir
mid-removal — async per-request file cleanup, or APFS directory operations on
macOS) throws and fails the whole suite. This flaked `file-mode.test.ts` on
macOS Node 24 in CI ("ENOTEMPTY: directory not empty, rmdir .../2026/06/28/05")
while passing on every other matrix job.

Add `maxRetries: 3` to the recursive removal in all 7 multipart test teardowns so
Node retries with linear backoff on the transient ENOTEMPTY/EBUSY/EPERM family.
The production clean_tmpdir schedule already swallows these errors, so no
src change is needed.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings June 28, 2026 05:49

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot was unable to review this pull request because the user who requested the review has reached their quota limit.

@coderabbitai

coderabbitai Bot commented Jun 28, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

📝 Walkthrough

Walkthrough

Multipart test teardown now retries tmpdir removal, and stop-script tests now wait for startup and shutdown log output using a shared polling helper.

Changes

Multipart test cleanup retry hardening

Layer / File(s) Summary
Add maxRetries: 3 to afterAll fs.rm
plugins/multipart/test/dynamic-option.test.ts, plugins/multipart/test/enable-pathToRegexpModule.test.ts, plugins/multipart/test/file-mode-limit-filesize-per-request.test.ts, plugins/multipart/test/file-mode.test.ts, plugins/multipart/test/stream-mode-with-filematch-glob.test.ts, plugins/multipart/test/stream-mode-with-filematch.test.ts, plugins/multipart/test/ts.test.ts
Each multipart test suite's teardown removal of app.config.multipart.tmpdir now passes maxRetries: 3 to fs.rm.

Stop test log polling

Layer / File(s) Summary
Add waitFor helper
tools/scripts/test/utils.ts
A polling helper is exported that checks text against a regex until it matches or a timeout elapses.
Replace fixed delays with stdout polling
tools/scripts/test/stop.test.ts
The stop-script tests import waitFor and use it to wait for startup and shutdown log lines, with Windows-specific fixed delays kept in two timeout cases.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • eggjs/egg#5560: Updates multipart test teardown cleanup in the same plugins/multipart/test/*.ts suite.
  • eggjs/egg#5560: Introduces the multipart tmpdir cleanup pattern that these retry changes adjust.

Poem

🐇 I hop by logs and sniff the trail,
Wait for the start, wait for the hail.
Three tries for cleanup, tidy and neat,
Then back to the burrow on nimble feet.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the flaky test fixes in multipart teardown and egg-scripts readiness.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@codecov

codecov Bot commented Jun 28, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 81.95%. Comparing base (b39c019) to head (342de7f).
⚠️ Report is 1 commits behind head on next.

Additional details and impacted files
@@           Coverage Diff           @@
##             next    #6015   +/-   ##
=======================================
  Coverage   81.95%   81.95%           
=======================================
  Files         677      677           
  Lines       20651    20651           
  Branches     4099     4099           
=======================================
+ Hits        16924    16925    +1     
+ Misses       3214     3213    -1     
  Partials      513      513           

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request adds maxRetries: 3 to fs.rm calls in several test files to prevent failures when deleting temporary directories. The reviewer suggests reordering the afterAll hooks so that server.close() and app.close() are executed before fs.rm. This ensures that the server and application are fully shut down and all file handles are released before attempting to delete the temporary directory, addressing the root cause of flaky ENOTEMPTY or EBUSY errors.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment on lines 23 to 27
afterAll(async () => {
await fs.rm(app.config.multipart.tmpdir, { force: true, recursive: true });
await fs.rm(app.config.multipart.tmpdir, { force: true, recursive: true, maxRetries: 3 });
});
afterAll(() => app.close());
afterAll(() => server.close());

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

In Vitest, afterAll hooks are executed sequentially in the order they are registered. Currently, the temporary directory removal (fs.rm) is registered before app.close() and server.close(). This means the test suite attempts to delete the temporary directory while the server and application are still active and potentially holding open file handles or writing files, which is the primary cause of the flaky ENOTEMPTY / EBUSY errors.

Reordering the hooks so that server.close() and app.close() run first ensures that all network activity has ceased and all file handles are released before fs.rm is executed.

Suggested change
afterAll(async () => {
await fs.rm(app.config.multipart.tmpdir, { force: true, recursive: true });
await fs.rm(app.config.multipart.tmpdir, { force: true, recursive: true, maxRetries: 3 });
});
afterAll(() => app.close());
afterAll(() => server.close());
afterAll(() => server.close());
afterAll(() => app.close());
afterAll(async () => {
await fs.rm(app.config.multipart.tmpdir, { force: true, recursive: true, maxRetries: 3 });
});

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, but this rests on an incorrect premise about Vitest. Vitest's default sequence.hooks is 'stack', so afterAll hooks run in reverse registration order — verified empirically on v4.1.9 (3rd-registered runs first, 1st-registered last). The current order therefore already executes server.close()app.close()fs.rm(...): the tmpdir is removed last, after the server and app are closed. Applying the suggested reorder would move fs.rm to run first, reintroducing exactly the race described. The residual ENOTEMPTY is a transient FS-level race during the recursive removal (macOS APFS / async per-request file cleanup draining), which maxRetries: 3 handles. Keeping the current order + maxRetries.

Comment on lines 25 to 29
afterAll(async () => {
await fs.rm(app.config.multipart.tmpdir, { force: true, recursive: true });
await fs.rm(app.config.multipart.tmpdir, { force: true, recursive: true, maxRetries: 3 });
});
afterAll(() => app.close());
afterAll(() => server.close());

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

In Vitest, afterAll hooks are executed sequentially in the order they are registered. Currently, the temporary directory removal (fs.rm) is registered before app.close() and server.close(). This means the test suite attempts to delete the temporary directory while the server and application are still active and potentially holding open file handles or writing files, which is the primary cause of the flaky ENOTEMPTY / EBUSY errors.

Reordering the hooks so that server.close() and app.close() run first ensures that all network activity has ceased and all file handles are released before fs.rm is executed.

Suggested change
afterAll(async () => {
await fs.rm(app.config.multipart.tmpdir, { force: true, recursive: true });
await fs.rm(app.config.multipart.tmpdir, { force: true, recursive: true, maxRetries: 3 });
});
afterAll(() => app.close());
afterAll(() => server.close());
afterAll(() => server.close());
afterAll(() => app.close());
afterAll(async () => {
await fs.rm(app.config.multipart.tmpdir, { force: true, recursive: true, maxRetries: 3 });
});

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, but this rests on an incorrect premise about Vitest. Vitest's default sequence.hooks is 'stack', so afterAll hooks run in reverse registration order — verified empirically on v4.1.9 (3rd-registered runs first, 1st-registered last). The current order therefore already executes server.close()app.close()fs.rm(...): the tmpdir is removed last, after the server and app are closed. Applying the suggested reorder would move fs.rm to run first, reintroducing exactly the race described. The residual ENOTEMPTY is a transient FS-level race during the recursive removal (macOS APFS / async per-request file cleanup draining), which maxRetries: 3 handles. Keeping the current order + maxRetries.

Comment on lines 24 to 28
afterAll(async () => {
await fs.rm(app.config.multipart.tmpdir, { force: true, recursive: true });
await fs.rm(app.config.multipart.tmpdir, { force: true, recursive: true, maxRetries: 3 });
});
afterAll(() => app.close());
afterAll(() => server.close());

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

In Vitest, afterAll hooks are executed sequentially in the order they are registered. Currently, the temporary directory removal (fs.rm) is registered before app.close() and server.close(). This means the test suite attempts to delete the temporary directory while the server and application are still active and potentially holding open file handles or writing files, which is the primary cause of the flaky ENOTEMPTY / EBUSY errors.

Reordering the hooks so that server.close() and app.close() run first ensures that all network activity has ceased and all file handles are released before fs.rm is executed.

Suggested change
afterAll(async () => {
await fs.rm(app.config.multipart.tmpdir, { force: true, recursive: true });
await fs.rm(app.config.multipart.tmpdir, { force: true, recursive: true, maxRetries: 3 });
});
afterAll(() => app.close());
afterAll(() => server.close());
afterAll(() => server.close());
afterAll(() => app.close());
afterAll(async () => {
await fs.rm(app.config.multipart.tmpdir, { force: true, recursive: true, maxRetries: 3 });
});

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, but this rests on an incorrect premise about Vitest. Vitest's default sequence.hooks is 'stack', so afterAll hooks run in reverse registration order — verified empirically on v4.1.9 (3rd-registered runs first, 1st-registered last). The current order therefore already executes server.close()app.close()fs.rm(...): the tmpdir is removed last, after the server and app are closed. Applying the suggested reorder would move fs.rm to run first, reintroducing exactly the race described. The residual ENOTEMPTY is a transient FS-level race during the recursive removal (macOS APFS / async per-request file cleanup draining), which maxRetries: 3 handles. Keeping the current order + maxRetries.

Comment on lines 30 to 34
afterAll(async () => {
await fs.rm(app.config.multipart.tmpdir, { force: true, recursive: true });
await fs.rm(app.config.multipart.tmpdir, { force: true, recursive: true, maxRetries: 3 });
});
afterAll(() => app.close());
afterAll(() => server.close());

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

In Vitest, afterAll hooks are executed sequentially in the order they are registered. Currently, the temporary directory removal (fs.rm) is registered before app.close() and server.close(). This means the test suite attempts to delete the temporary directory while the server and application are still active and potentially holding open file handles or writing files, which is the primary cause of the flaky ENOTEMPTY / EBUSY errors.

Reordering the hooks so that server.close() and app.close() run first ensures that all network activity has ceased and all file handles are released before fs.rm is executed.

Suggested change
afterAll(async () => {
await fs.rm(app.config.multipart.tmpdir, { force: true, recursive: true });
await fs.rm(app.config.multipart.tmpdir, { force: true, recursive: true, maxRetries: 3 });
});
afterAll(() => app.close());
afterAll(() => server.close());
afterAll(() => server.close());
afterAll(() => app.close());
afterAll(async () => {
await fs.rm(app.config.multipart.tmpdir, { force: true, recursive: true, maxRetries: 3 });
});

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, but this rests on an incorrect premise about Vitest. Vitest's default sequence.hooks is 'stack', so afterAll hooks run in reverse registration order — verified empirically on v4.1.9 (3rd-registered runs first, 1st-registered last). The current order therefore already executes server.close()app.close()fs.rm(...): the tmpdir is removed last, after the server and app are closed. Applying the suggested reorder would move fs.rm to run first, reintroducing exactly the race described. The residual ENOTEMPTY is a transient FS-level race during the recursive removal (macOS APFS / async per-request file cleanup draining), which maxRetries: 3 handles. Keeping the current order + maxRetries.

Comment on lines 27 to 31
afterAll(() => {
return fs.rm(app.config.multipart.tmpdir, { force: true, recursive: true });
return fs.rm(app.config.multipart.tmpdir, { force: true, recursive: true, maxRetries: 3 });
});
afterAll(() => app.close());
afterAll(() => server.close());

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

In Vitest, afterAll hooks are executed sequentially in the order they are registered. Currently, the temporary directory removal (fs.rm) is registered before app.close() and server.close(). This means the test suite attempts to delete the temporary directory while the server and application are still active and potentially holding open file handles or writing files, which is the primary cause of the flaky ENOTEMPTY / EBUSY errors.

Reordering the hooks so that server.close() and app.close() run first ensures that all network activity has ceased and all file handles are released before fs.rm is executed.

Suggested change
afterAll(() => {
return fs.rm(app.config.multipart.tmpdir, { force: true, recursive: true });
return fs.rm(app.config.multipart.tmpdir, { force: true, recursive: true, maxRetries: 3 });
});
afterAll(() => app.close());
afterAll(() => server.close());
afterAll(() => server.close());
afterAll(() => app.close());
afterAll(() => {
return fs.rm(app.config.multipart.tmpdir, { force: true, recursive: true, maxRetries: 3 });
});

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, but this rests on an incorrect premise about Vitest. Vitest's default sequence.hooks is 'stack', so afterAll hooks run in reverse registration order — verified empirically on v4.1.9 (3rd-registered runs first, 1st-registered last). The current order therefore already executes server.close()app.close()fs.rm(...): the tmpdir is removed last, after the server and app are closed. Applying the suggested reorder would move fs.rm to run first, reintroducing exactly the race described. The residual ENOTEMPTY is a transient FS-level race during the recursive removal (macOS APFS / async per-request file cleanup draining), which maxRetries: 3 handles. Keeping the current order + maxRetries.

`stop.test.ts` waited a fixed 1s for a 2-worker egg app to boot before asserting
its "started on http://..." log — while every sibling start-without-demon test
waits 10s. On a loaded CI runner (seen on ubuntu Node 22) 1s is not enough, so the
app had only logged "Starting custom-framework application…" and the assertion
failed.

Replace the fixed `scheduler.wait(waitTime)` before each startup/shutdown
assertion with a `waitFor(getText, pattern, timeout)` poll (in test/utils.ts) that
returns as soon as the expected output appears (up to 10s) — race-free and faster
than a fixed delay. Windows keeps the fixed settle for the signal-driven shutdown
checks (SIGTERM is not handled there).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@killagu killagu changed the title test(multipart): retry tmpdir removal to fix flaky ENOTEMPTY teardown test: fix flaky CI tests (multipart tmpdir teardown + egg-scripts start readiness) Jun 28, 2026

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tools/scripts/test/utils.ts`:
- Around line 67-69: Reset the regex state in waitFor before each poll:
RegExp.test() on a stateful pattern with /g or /y can advance lastIndex and
cause missed matches, so update waitFor to use a fresh RegExp instance or
explicitly set pattern.lastIndex = 0 before calling pattern.test(getText()).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 1e672d1b-6d9c-4b7e-83bc-85d96e582a25

📥 Commits

Reviewing files that changed from the base of the PR and between 8b6e451 and 342de7f.

📒 Files selected for processing (2)
  • tools/scripts/test/stop.test.ts
  • tools/scripts/test/utils.ts

Comment on lines +67 to +69
export async function waitFor(getText: () => string, pattern: RegExp, timeout = 10000): Promise<void> {
const deadline = Date.now() + timeout;
while (!pattern.test(getText())) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
rg -nP --type=ts 'waitFor\s*\([^,]+,\s*/(?:\\.|[^/])+/[a-z]*[gy][a-z]*' tools/scripts/test

Repository: eggjs/egg

Length of output: 147


🏁 Script executed:

#!/bin/bash
set -euo pipefail

printf '\n== utils.ts ==\n'
sed -n '1,180p' tools/scripts/test/utils.ts

printf '\n== waitFor call sites ==\n'
rg -n --type=ts 'waitFor\s*\(' tools/scripts/test

Repository: eggjs/egg

Length of output: 3423


Reset regex state before polling. RegExp.test() advances lastIndex for /g and /y, so a stateful regex can make this loop miss a match or behave inconsistently. Clone the regex or set lastIndex = 0 before each test.

🧰 Tools
🪛 ast-grep (0.44.0)

[warning] Importing child_process exposes a command-execution surface; ensure any command/argument built from input is validated, and prefer execFile/spawn with an argument array over exec.
Context: import { ChildProcess } from 'node:child_process';
Note: [CWE-78] Improper Neutralization of Special Elements used in an OS Command ('OS Command Injection').

(detect-child-process-typescript)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tools/scripts/test/utils.ts` around lines 67 - 69, Reset the regex state in
waitFor before each poll: RegExp.test() on a stateful pattern with /g or /y can
advance lastIndex and cause missed matches, so update waitFor to use a fresh
RegExp instance or explicitly set pattern.lastIndex = 0 before calling
pattern.test(getText()).

@killagu killagu merged commit 4023920 into eggjs:next Jun 28, 2026
22 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants