
Fix infinite fork loop when node renderer worker fails to bind port #2843

Closed
yassa-tamer wants to merge 5 commits into shakacode:main from yassa-tamer:yassatamer/fix-infinite-fork-loop-on-port-in-use

Conversation

@yassa-tamer (Contributor) commented Mar 25, 2026

Summary

When a worker fails during app.listen() (e.g. EADDRINUSE), the master previously reforked unconditionally, causing an infinite fork/crash loop.

Workers now send a WORKER_STARTUP_FAILURE IPC message to the master before exiting. The master sets an abort flag and exits with a clear error message instead of reforking. Scheduled restarts and runtime crashes continue to refork as before.
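
For illustration, here is a minimal sketch of the worker-side mechanism described above. It assumes a plain Node http server and illustrative env-var names for host and port; the actual worker.ts wiring (framework, listen API, and how the message constant is imported) may differ.

```ts
import cluster from 'cluster';
import http from 'http';

// Placeholder config: the real renderer reads these from its own config builder.
const host = process.env.RENDERER_HOST ?? 'localhost';
const port = Number(process.env.RENDERER_PORT ?? 3800);

const server = http.createServer((_req, res) => res.end('ok'));
server.listen(port, host);

server.on('error', (err: NodeJS.ErrnoException) => {
  if (cluster.isWorker && process.send) {
    // Report the startup failure to the master, then exit only once the
    // IPC message has been handed off to the channel.
    process.send(
      {
        type: 'WORKER_STARTUP_FAILURE',
        stage: 'listen',
        code: err.code,
        errno: err.errno,
        syscall: err.syscall,
        host,
        port,
        message: err.message,
      },
      undefined,
      undefined,
      () => process.exit(1),
    );
  } else {
    // Single-process mode: no master to notify, just exit as before.
    process.exit(1);
  }
});
```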

Pull Request checklist

Remove this line after checking all the items here. If an item is not applicable to this PR, check it and wrap it in ~ (strikethrough).

  • Add/update test to cover these changes
  • Update documentation
  • Update CHANGELOG file

Add the CHANGELOG entry at the top of the file.

Summary by CodeRabbit

  • New Features
    • Workers now send structured startup-failure reports so the master can abort and surface clear errors.
  • Bug Fixes
    • Improved detection of worker startup failures; clearer user messages (e.g., port-in-use) and prevention of repeated silent restarts on startup errors.
  • Tests
    • Added tests covering startup-failure messaging and master/worker handling scenarios.
  • Chores
    • CI skips a code-review action when its secret is absent; added a URL exclusion to link-checking.
  • Documentation
    • Updated documentation links in release notes.

When a worker fails during app.listen() (e.g. EADDRINUSE), the master
previously reforked unconditionally, causing an infinite fork/crash loop.

Workers now send a WORKER_STARTUP_FAILURE IPC message to the master
before exiting. The master sets an abort flag and exits with a clear
error message instead of reforking. Scheduled restarts and runtime
crashes continue to refork as before.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@chatgpt-codex-connector

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.
Repo admins can enable using credits for code reviews in their settings.

@coderabbitai
Contributor

coderabbitai Bot commented Mar 25, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 59601569-189d-486e-bd6a-2f40f562d208

📥 Commits

Reviewing files that changed from the base of the PR and between ea800d5 and c1c8a3f.

📒 Files selected for processing (1)
  • .lychee.toml
✅ Files skipped from review due to trivial changes (1)
  • .lychee.toml

Walkthrough

Workers now send structured startup-failure IPC messages to the master when listen fails; the master records the first such failure and, on that worker's non-scheduled exit, emits a startup-specific error (special-casing EADDRINUSE) and exits with status 1 instead of forking a replacement.

Changes

Cohort / File(s) and summary:

  • Worker Message Types (packages/react-on-rails-pro-node-renderer/src/shared/workerMessages.ts):
    Add WORKER_STARTUP_FAILURE, WorkerStartupFailureMessage (stage 'listen', optional code/errno/syscall, required host/port/message), and the isWorkerStartupFailureMessage type guard (sketched just after this list).
  • Worker Startup Error Handling (packages/react-on-rails-pro-node-renderer/src/worker.ts):
    On app.listen error in worker mode, send a WorkerStartupFailureMessage via process.send (if available) and process.exit(1); otherwise preserve the previous direct-exit behavior.
  • Master Process Failure Handling (packages/react-on-rails-pro-node-renderer/src/master.ts):
    Add isAbortingForStartupFailure and fatalStartupFailure state, a cluster.on('message') handler to record startup-failure messages, and modify cluster.on('exit') to abort and exit (special-casing EADDRINUSE) instead of forking when a startup failure was reported. Scheduled restarts still fork.
  • Tests (packages/react-on-rails-pro-node-renderer/tests/masterStartupFailure.test.ts, packages/react-on-rails-pro-node-renderer/tests/workerStartupFailure.test.ts):
    New tests for the type guard, IPC message shape, clustered vs. single-process behavior, EADDRINUSE messaging, master abort/exit logic, scheduled restarts, unexpected crashes, and ignoring non-matching messages.
  • CI Workflow (.github/workflows/claude-code-review.yml):
    Make the Claude Code Review step conditional on secrets.CLAUDE_CODE_OAUTH_TOKEN; add an env var and a fallback step that prints a skip message.
  • Docs / Config (NEWS.md, .lychee.toml):
    Update two docs links to the new URL in NEWS.md; add a Lychee exclusion pattern for a specific PDF URL in .lychee.toml.
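
For orientation, the message shape described in the walkthrough could look roughly like this. This is a sketch reconstructed from the field list above, not the actual workerMessages.ts file; the type guard itself is quoted in the review comments further down.

```ts
// Sketch of the shared message module, reconstructed from the walkthrough above.
export const WORKER_STARTUP_FAILURE = 'WORKER_STARTUP_FAILURE' as const;

export interface WorkerStartupFailureMessage {
  type: typeof WORKER_STARTUP_FAILURE;
  stage: 'listen';
  code?: string; // e.g. 'EADDRINUSE'
  errno?: number;
  syscall?: string;
  host: string;
  port: number;
  message: string;
}
```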

Sequence Diagram(s)

sequenceDiagram
    participant Worker as Worker Process
    participant Cluster as Cluster IPC
    participant Master as Master Process
    participant OS as Operating System

    Worker->>OS: app.listen(host, port)
    OS-->>Worker: listen error (e.g. EADDRINUSE)
    Worker->>Worker: build WorkerStartupFailureMessage
    Worker->>Cluster: process.send(message)
    Cluster->>Master: deliver "message" event
    Master->>Master: record fatalStartupFailure / set isAbortingForStartupFailure = true
    Worker->>OS: process.exit(1)
    OS->>Cluster: emit "exit" for worker
    Cluster->>Master: deliver "exit" event
    Master->>Master: detect abort flag + non-scheduled exit
    Master->>Master: emit startup-failure error (EADDRINUSE -> port-in-use text)
    Master->>OS: process.exit(1)
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐇 I hopped to bind a port beneath the moonlit light,
I tapped a tiny message when the listen failed that night.
The master read my whisper and chose to close the gate,
No noisy forks or retries — a single, tidy fate.
🥕 Off I go to nap, content the logs are straight.

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
  • Description Check: ✅ Passed. Check skipped: CodeRabbit's high-level summary is enabled.
  • Title Check: ✅ Passed. The title 'Fix infinite fork loop when node renderer worker fails to bind port' accurately summarizes the main change: preventing infinite fork loops by handling worker startup failures (specifically port binding) with a new IPC mechanism.
  • Docstring Coverage: ✅ Passed. No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.


@greptile-apps

greptile-apps Bot commented Mar 25, 2026

Greptile Summary

This PR fixes an infinite fork/crash loop that occurred when a node renderer worker failed to bind its port during startup (e.g. EADDRINUSE). Previously the master process would unconditionally refork any dying worker, creating a never-ending cycle. The fix introduces a typed IPC message (WORKER_STARTUP_FAILURE) that workers send to the master before exiting on a listen error; the master sets an abort flag on receipt and calls process.exit(1) with a descriptive message instead of reforking.

Key changes:

  • workerMessages.ts — new module with the WorkerStartupFailureMessage type, WORKER_STARTUP_FAILURE constant, and isWorkerStartupFailureMessage type-guard.
  • worker.ts — on a listen error in cluster-worker mode, constructs and sends the IPC message in the process.send() callback (guaranteeing flush before exit), then returns; non-cluster path falls through to the existing process.exit(1).
  • master.ts — registers a cluster.on('message') handler that sets isAbortingForStartupFailure on first WORKER_STARTUP_FAILURE receipt (deduplication); the exit handler checks the flag and either logs and exits (startup failure), reforks (scheduled restart), or reforks (runtime crash).
  • Two new test files provide isolated unit coverage for the type-guard, the worker IPC path, and all master decision branches.

Confidence Score: 4/5

  • Safe to merge — the primary bug is correctly fixed; two minor P2 style issues remain but do not affect correctness in normal operation.
  • The core logic is sound: workers signal intent before exiting, the master deduplicates via a boolean flag, and all three exit paths (startup failure, scheduled restart, runtime crash) are correctly handled and well-tested. The two noted concerns are a dead return statement (cosmetic) and a theoretical IPC ordering race that, even if triggered, limits the infinite loop to at most one extra fork cycle rather than restoring it.
  • packages/react-on-rails-pro-node-renderer/src/master.ts — specifically the IPC ordering note and the unreachable return.

Important Files Changed

  • packages/react-on-rails-pro-node-renderer/src/master.ts: Adds IPC-message-based startup-failure detection, a deduplication flag, and abort-on-exit logic; one dead return after process.exit() and a theoretical IPC-message/exit ordering race exist but are non-critical.
  • packages/react-on-rails-pro-node-renderer/src/shared/workerMessages.ts: New module defining the WorkerStartupFailureMessage type, the WORKER_STARTUP_FAILURE constant, and a type-guard predicate; clean and correct.
  • packages/react-on-rails-pro-node-renderer/src/worker.ts: Sends the WORKER_STARTUP_FAILURE IPC message on listen error (cluster worker path only) and exits in the send callback, ensuring the message is flushed before the process terminates.
  • packages/react-on-rails-pro-node-renderer/tests/masterStartupFailure.test.ts: Tests the master's decision logic in isolation via simulated event handlers; covers EADDRINUSE, generic errors, multi-worker deduplication, scheduled restarts, and runtime crashes.
  • packages/react-on-rails-pro-node-renderer/tests/workerStartupFailure.test.ts: Unit-tests the isWorkerStartupFailureMessage type guard and verifies that the worker sends the correct IPC message before exiting in cluster mode, and exits directly in single-process mode.

Sequence Diagram

sequenceDiagram
    participant M as Master
    participant W as Worker (cluster)

    M->>W: cluster.fork()
    W->>W: app.listen({ port, host })
    alt listen succeeds
        W-->>M: (normal operation)
    else listen fails (e.g. EADDRINUSE)
        W->>M: process.send(WORKER_STARTUP_FAILURE)
        note over M: cluster.on('message')<br/>sets isAbortingForStartupFailure=true<br/>stores fatalStartupFailure
        W->>W: process.exit(1) [in send callback]
        W-->>M: cluster.on('exit')
        note over M: isAbortingForStartupFailure===true<br/>→ errorReporter.message()<br/>→ process.exit(1)
    end

    alt isScheduledRestart
        W-->>M: cluster.on('exit')
        M->>W: cluster.fork() [scheduled restart]
    else unexpected runtime crash
        W-->>M: cluster.on('exit')
        M->>W: cluster.fork() [runtime refork]
    end
Loading

Reviews (1): Last reviewed commit: "Fix infinite fork loop when node rendere..."

Comment on lines +111 to +112
process.exit(1);
return;

P2 Dead code after process.exit()

process.exit(1) terminates the Node.js process synchronously, so the return on the next line is unreachable dead code. It can be removed.

Suggested change
process.exit(1);
return;
errorReporter.message(msg);
process.exit(1);

Comment on lines +88 to 113
cluster.on('message', (worker, message) => {
  if (!isWorkerStartupFailureMessage(message) || isAbortingForStartupFailure) return;

  isAbortingForStartupFailure = true;
  fatalStartupFailure = { workerId: worker.id, failure: message };
});

// Listen for dying workers:
cluster.on('exit', (worker) => {
  if (worker.isScheduledRestart) {
    log.info('Restarting worker #%d on schedule', worker.id);
  } else {
    // TODO: Track last rendering request per worker.id
    // TODO: Consider blocking a given rendering request if it kills a worker more than X times
    const msg = `Worker ${worker.id} died UNEXPECTEDLY :(, restarting`;
    cluster.fork();
    return;
  }

  if (isAbortingForStartupFailure) {
    const failure = fatalStartupFailure?.failure;
    const msg =
      failure?.code === 'EADDRINUSE'
        ? `Node renderer startup failed: port ${failure.port} is already in use`
        : `Node renderer startup failed in worker ${worker.id}: ${failure?.message || `exit code ${worker.process.exitCode}`}`;

    errorReporter.message(msg);
    process.exit(1);
    return;
  }

P2 Potential IPC race condition between message and exit events

Node.js cluster does not guarantee that a worker's message event is processed before its exit event on the master side. If the OS delivers SIGCHLD (triggering exit) before the IPC socket data is read from the event loop (triggering message), the exit handler will observe isAbortingForStartupFailure === false and call cluster.fork() — spawning a replacement worker that will also fail.

In practice this is rare because the worker only calls process.exit(1) inside the process.send() callback (after the data has been flushed to the kernel buffer), so the timing window is very small. If it does trigger, it will cause one extra fork-and-crash cycle before the master eventually receives the deferred message event and shuts down cleanly — so it won't recreate an infinite loop, but it can still produce a confusing intermediate restart.

One mitigation would be to listen for the worker's IPC channel disconnect event and delay the refork decision briefly, or to cross-check the dead worker's id against any buffered-but-unprocessed WORKER_STARTUP_FAILURE messages. This is a known Node.js cluster caveat worth documenting if not addressed now.
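
For illustration, one way the "delay the refork decision briefly" idea could look, reusing the names from the snippet above. This is a sketch only, not the PR's implementation: the decision is deferred by one event-loop turn so a startup-failure message that is already buffered on the IPC channel can still be observed before forking.

```ts
// Sketch: defer the refork decision so a pending WORKER_STARTUP_FAILURE
// 'message' event can be processed before deciding whether to fork again.
cluster.on('exit', (worker) => {
  if (worker.isScheduledRestart) {
    log.info('Restarting worker #%d on schedule', worker.id);
    cluster.fork();
    return;
  }

  setImmediate(() => {
    if (isAbortingForStartupFailure) {
      // A startup failure was reported; abort instead of reforking.
      process.exit(1);
      return;
    }
    // Genuine runtime crash: refork as before.
    cluster.fork();
  });
});
```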

Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 4


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 09e8287d-fe75-49a5-8584-9d4da70f441e

📥 Commits

Reviewing files that changed from the base of the PR and between c584ab1 and 10200bc.

📒 Files selected for processing (5)
  • packages/react-on-rails-pro-node-renderer/src/master.ts
  • packages/react-on-rails-pro-node-renderer/src/shared/workerMessages.ts
  • packages/react-on-rails-pro-node-renderer/src/worker.ts
  • packages/react-on-rails-pro-node-renderer/tests/masterStartupFailure.test.ts
  • packages/react-on-rails-pro-node-renderer/tests/workerStartupFailure.test.ts

Comment on lines +103 to +108
if (isAbortingForStartupFailure) {
  const failure = fatalStartupFailure?.failure;
  const msg =
    failure?.code === 'EADDRINUSE'
      ? `Node renderer startup failed: port ${failure.port} is already in use`
      : `Node renderer startup failed in worker ${worker.id}: ${failure?.message || `exit code ${worker.process.exitCode}`}`;
Contributor

⚠️ Potential issue | 🟡 Minor

Use the recorded failing worker ID in the generic error path.

Once isAbortingForStartupFailure is set, the next exit event is not guaranteed to come from the same worker that sent the IPC message. Using worker.id here can misreport which worker actually failed even though fatalStartupFailure.workerId already captured it.

🐛 Proposed fix
     if (isAbortingForStartupFailure) {
       const failure = fatalStartupFailure?.failure;
+      const failedWorkerId = fatalStartupFailure?.workerId ?? worker.id;
       const msg =
         failure?.code === 'EADDRINUSE'
           ? `Node renderer startup failed: port ${failure.port} is already in use`
-          : `Node renderer startup failed in worker ${worker.id}: ${failure?.message || `exit code ${worker.process.exitCode}`}`;
+          : `Node renderer startup failed in worker ${failedWorkerId}: ${failure?.message || `exit code ${worker.process.exitCode}`}`;
 
       errorReporter.message(msg);
       process.exit(1);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/react-on-rails-pro-node-renderer/src/master.ts` around lines 103 -
108, The generic startup-failure message currently uses worker.id which may not
be the recorded failing worker; update the message construction in the
isAbortingForStartupFailure branch to prefer fatalStartupFailure.workerId (when
present) instead of worker.id, and keep using failure?.message or `exit code
${worker.process.exitCode}` for the error details; ensure you still handle the
EADDRINUSE case using failure.port and fall back appropriately if
fatalStartupFailure or its workerId is undefined.

Comment on lines +14 to +19
export function isWorkerStartupFailureMessage(value: unknown): value is WorkerStartupFailureMessage {
  return (
    typeof value === 'object' &&
    value !== null &&
    (value as { type?: string }).type === WORKER_STARTUP_FAILURE
  );
Contributor

⚠️ Potential issue | 🟠 Major

Validate the full IPC payload shape in the type guard.

packages/react-on-rails-pro-node-renderer/src/master.ts reads stage, port, and message immediately after this check. Right now any { type: WORKER_STARTUP_FAILURE } payload passes, so a malformed IPC message can trigger a fatal abort and produce undefined diagnostics.

🛡️ Proposed fix
 export function isWorkerStartupFailureMessage(value: unknown): value is WorkerStartupFailureMessage {
-  return (
-    typeof value === 'object' &&
-    value !== null &&
-    (value as { type?: string }).type === WORKER_STARTUP_FAILURE
-  );
+  if (typeof value !== 'object' || value === null) {
+    return false;
+  }
+
+  const message = value as Partial<WorkerStartupFailureMessage>;
+  return (
+    message.type === WORKER_STARTUP_FAILURE &&
+    message.stage === 'listen' &&
+    typeof message.host === 'string' &&
+    typeof message.port === 'number' &&
+    typeof message.message === 'string' &&
+    (message.code === undefined || typeof message.code === 'string') &&
+    (message.errno === undefined || typeof message.errno === 'number') &&
+    (message.syscall === undefined || typeof message.syscall === 'string')
+  );
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/react-on-rails-pro-node-renderer/src/shared/workerMessages.ts`
around lines 14 - 19, The current isWorkerStartupFailureMessage type guard only
checks the top-level type field so malformed IPC payloads slip through; update
the function isWorkerStartupFailureMessage to fully validate the IPC shape by
confirming the object has a type === WORKER_STARTUP_FAILURE and also contains
the expected fields with correct types (e.g., stage is a string, port is a
number, and message is a string) before narrowing to
WorkerStartupFailureMessage; reference the WORKER_STARTUP_FAILURE constant and
the isWorkerStartupFailureMessage function when making the change so master.ts
can safely read stage, port, and message without risking undefined values.

Comment on lines +41 to +73
function handleMessage(worker: { id: number }, message: unknown) {
  if (!isWorkerStartupFailureMessage(message) || isAbortingForStartupFailure) return;

  isAbortingForStartupFailure = true;
  fatalStartupFailure = { workerId: worker.id, failure: message };
}

function handleExit(worker: {
  id: number;
  isScheduledRestart?: boolean;
  process: { exitCode: number | null };
}) {
  if (worker.isScheduledRestart) {
    forkCount += 1; // cluster.fork()
    return;
  }

  if (isAbortingForStartupFailure) {
    const failure = fatalStartupFailure?.failure;
    const msg =
      failure?.code === 'EADDRINUSE'
        ? `Node renderer startup failed: port ${failure.port} is already in use`
        : `Node renderer startup failed in worker ${worker.id}: ${failure?.message || `exit code ${worker.process.exitCode}`}`;

    reportedMessages.push(msg);
    exitCode = 1;
    return;
  }

  const msg = `Worker ${worker.id} died UNEXPECTEDLY :(, restarting`;
  reportedMessages.push(msg);
  forkCount += 1; // cluster.fork()
}
Contributor

⚠️ Potential issue | 🟠 Major

These tests duplicate packages/react-on-rails-pro-node-renderer/src/master.ts instead of testing it.

handleMessage and handleExit are local copies of the production branches, so the assertions below can stay green even if masterRun() stops registering the cluster handlers or changes their control flow. Please capture the real cluster.on('message') / cluster.on('exit') callbacks from masterRun() under mocks.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/react-on-rails-pro-node-renderer/tests/masterStartupFailure.test.ts`
around lines 41 - 73, The tests currently duplicate the production logic in
local functions handleMessage and handleExit; instead, mock or spy on cluster.on
to capture the actual callbacks that masterRun registers for 'message' and
'exit', then invoke those captured callbacks with test fixtures so assertions
validate the real handlers from masterRun; change the test to call masterRun(),
grab the registered handlers from your cluster mock (the callbacks passed to
cluster.on('message', ...) and cluster.on('exit', ...)), and drive those
captured callbacks with the same worker/message scenarios used today to assert
reportedMessages, forkCount and exitCode.
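
For reference, a sketch of the capture-the-real-handlers approach the comment above asks for. It assumes Jest, and that masterRun registers its handlers via cluster.on; the import path, export name, mock shape, and any extra setup masterRun needs are illustrative, not taken from the PR.

```ts
import cluster from 'cluster';

jest.mock('cluster', () => ({
  isPrimary: true,
  isMaster: true,
  on: jest.fn(),
  fork: jest.fn(),
}));

// Path and export name are illustrative.
import { run as masterRun } from '../src/master';

it('aborts instead of reforking when a worker reports a startup failure', async () => {
  const handlers: Record<string, (...args: unknown[]) => void> = {};
  (cluster.on as jest.Mock).mockImplementation((event: string, cb: (...args: unknown[]) => void) => {
    handlers[event] = cb;
    return cluster;
  });
  const exitSpy = jest.spyOn(process, 'exit').mockImplementation(() => undefined as never);

  await masterRun(); // registers the real 'message' and 'exit' handlers

  const worker = { id: 1, isScheduledRestart: false, process: { exitCode: 1 } };
  handlers.message(worker, {
    type: 'WORKER_STARTUP_FAILURE',
    stage: 'listen',
    code: 'EADDRINUSE',
    host: 'localhost',
    port: 3800,
    message: 'listen EADDRINUSE: address already in use',
  });
  handlers.exit(worker);

  expect(exitSpy).toHaveBeenCalledWith(1);
});
```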

Comment on lines +40 to +119
describe('worker startup failure IPC', () => {
  const originalSend = process.send;
  const originalExit = process.exit;
  const originalIsWorker = cluster.isWorker;

  afterEach(() => {
    process.send = originalSend;
    process.exit = originalExit;
    Object.defineProperty(cluster, 'isWorker', { value: originalIsWorker, writable: true });
  });

  it('sends WORKER_STARTUP_FAILURE message in clustered mode', () => {
    // Simulate a cluster worker environment
    Object.defineProperty(cluster, 'isWorker', { value: true, writable: true });
    const sentMessages: unknown[] = [];
    const exitCalls: number[] = [];

    process.send = ((msg: unknown, _handle?: unknown, _options?: unknown, callback?: () => void) => {
      sentMessages.push(msg);
      if (callback) callback();
      return true;
    }) as typeof process.send;

    process.exit = ((code?: number) => {
      exitCalls.push(code ?? 0);
    }) as typeof process.exit;

    // Simulate what the worker does on listen error
    const err = Object.assign(new Error('listen EADDRINUSE: address already in use :::3800'), {
      code: 'EADDRINUSE',
      errno: -48,
      syscall: 'listen',
    });
    const host = 'localhost';
    const port = 3800;

    const startupFailure: WorkerStartupFailureMessage = {
      type: WORKER_STARTUP_FAILURE,
      stage: 'listen',
      code: err.code,
      errno: err.errno,
      syscall: err.syscall,
      host,
      port,
      message: err.message,
    };

    process.send!(startupFailure, undefined, undefined, () => {
      process.exit(1);
    });

    expect(sentMessages).toHaveLength(1);
    expect(isWorkerStartupFailureMessage(sentMessages[0])).toBe(true);
    expect((sentMessages[0] as WorkerStartupFailureMessage).code).toBe('EADDRINUSE');
    expect(exitCalls).toEqual([1]);
  });

  it('exits without IPC in single-process mode', () => {
    Object.defineProperty(cluster, 'isWorker', { value: false, writable: true });
    process.send = undefined;

    const exitCalls: number[] = [];
    process.exit = ((code?: number) => {
      exitCalls.push(code ?? 0);
    }) as typeof process.exit;

    // In single-process mode, cluster.isWorker is false and process.send is undefined
    if (cluster.isWorker && process.send) {
      // Should not reach here
      process.send({ type: WORKER_STARTUP_FAILURE }, undefined, undefined, () => {
        process.exit(1);
      });
      return;
    }

    process.exit(1);

    expect(exitCalls).toEqual([1]);
  });
});
Contributor

⚠️ Potential issue | 🟠 Major

This suite doesn't exercise packages/react-on-rails-pro-node-renderer/src/worker.ts.

It reconstructs the payload and calls process.send/process.exit directly, so it still passes if the listen-error branch in run() stops emitting WORKER_STARTUP_FAILURE. Please drive the real app.listen error callback or extract that branch into a helper and test the helper.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/react-on-rails-pro-node-renderer/tests/workerStartupFailure.test.ts`
around lines 40 - 119, The test suite is not exercising the actual listen-error
branch in run() of worker.ts because it constructs the WORKER_STARTUP_FAILURE
payload and calls process.send/process.exit directly; update tests to either (A)
drive the real app.listen error callback by starting the module's run() (or
invoking the server instance) and producing an EADDRINUSE error so the run()
path sends the actual WorkerStartupFailureMessage, or (B) extract the
listen-error IPC logic into a helper (e.g., sendWorkerStartupFailure or
handleListenError) and write a unit test that calls that helper with the Error
object and verifies process.send behavior and exit handling using
cluster.isWorker and isWorkerStartupFailureMessage/WorkerStartupFailureMessage;
ensure tests toggle cluster.isWorker and stub process.send/process.exit only
around the actual code path being exercised.
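
A possible shape for option (B) above, extracting the listen-error branch into a small helper that tests can call directly. The helper name and signature are hypothetical, not taken from the PR:

```ts
import cluster from 'cluster';
import { WORKER_STARTUP_FAILURE, type WorkerStartupFailureMessage } from './shared/workerMessages';

// Hypothetical helper: worker.ts's listen-error callback would delegate here.
export function handleListenError(err: NodeJS.ErrnoException, host: string, port: number): void {
  if (cluster.isWorker && process.send) {
    const failure: WorkerStartupFailureMessage = {
      type: WORKER_STARTUP_FAILURE,
      stage: 'listen',
      code: err.code,
      errno: err.errno,
      syscall: err.syscall,
      host,
      port,
      message: err.message,
    };
    // Exit only after the IPC message has been handed to the channel.
    process.send(failure, undefined, undefined, () => process.exit(1));
    return;
  }
  // Single-process mode: nothing to report, just exit.
  process.exit(1);
}
```

A unit test could then stub process.send and process.exit, toggle cluster.isWorker, and call handleListenError with a fabricated EADDRINUSE error, instead of rebuilding the payload inside the test.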

@justin808
Member

@claude review this PR
@codex review this PR
@cursoragent review this PR

@chatgpt-codex-connector

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.
To continue using code reviews, add credits to your account and enable them for code reviews in your settings.

@justin808
Member

See #2881

@justin808 justin808 closed this Mar 28, 2026
justin808 added a commit that referenced this pull request Apr 7, 2026
…2881)

## Summary

When a worker fails during app.listen() (e.g. EADDRINUSE), the master
previously reforked unconditionally, causing an infinite fork/crash
loop.

Workers now send a WORKER_STARTUP_FAILURE IPC message to the master
before exiting. The master sets an abort flag and exits with a clear
error message instead of reforking. Scheduled restarts and runtime
crashes continue to refork as before.

Based on #2843 by
@Yassa-hue — recreated on the main repo (not a fork) so CI runs with
full secrets.

Closes #2843

fix issue: shakacode/react_on_rails_pro#582

## Test plan

- [ ] Verify new unit tests pass (`masterStartupFailure.test.ts`,
`workerStartupFailure.test.ts`)
- [ ] Confirm EADDRINUSE scenario aborts master instead of infinite
reforking
- [ ] Confirm scheduled restarts and runtime crashes still refork as
before

🤖 Generated with [Claude Code](https://claude.com/claude-code)

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> **Medium Risk**
> Touches clustered process lifecycle and shutdown behavior in the Pro
Node renderer; a mistake could cause premature master exits or missed
reforks in edge cases despite added test coverage.
> 
> **Overview**
> Prevents the Pro Node renderer master process from entering an
infinite refork loop when a worker fails during `app.listen()` (e.g.,
`EADDRINUSE`). Workers now report a structured `WORKER_STARTUP_FAILURE`
IPC message before exiting, and the master captures the first failure,
disconnects workers, and exits with a clear error instead of reforking.
> 
> Also adds `port` coercion + TCP-range validation in `configBuilder`,
and introduces focused unit tests covering startup-failure messaging,
master abort semantics, and the “wait one tick” behavior to distinguish
startup failures from runtime crashes.
> 
> <sup>Reviewed by [Cursor Bugbot](https://cursor.com/bugbot) for commit
ca6ac01. Bugbot is set up for automated
code reviews on this repo. Configure
[here](https://www.cursor.com/dashboard/bugbot).</sup>
<!-- /CURSOR_SUMMARY -->

---------

Co-authored-by: Yassa-hue <yassatamer99@gmail.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
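
The commit note above also mentions port coercion plus TCP-range validation in configBuilder. A rough sketch of that kind of check follows; the function name, field handling, and error text are illustrative, not the actual configBuilder code.

```ts
// Illustrative only: coerce a configured port to a number and validate the TCP range.
function coercePort(rawPort: string | number): number {
  const port = typeof rawPort === 'string' ? Number.parseInt(rawPort, 10) : rawPort;
  // Port 0 asks the OS for an ephemeral port; 1-65535 are explicit TCP ports.
  if (!Number.isInteger(port) || port < 0 || port > 65535) {
    throw new Error(`Invalid renderer port: ${String(rawPort)} (expected an integer between 0 and 65535)`);
  }
  return port;
}
```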
justin808 added a commit that referenced this pull request Apr 12, 2026
…2881)

justin808 added a commit that referenced this pull request Apr 12, 2026
…2881)
