
Fix infinite fork loop when node renderer worker fails to bind port #2843

Closed
yassa-tamer wants to merge 5 commits into shakacode:main from yassa-tamer:yassatamer/fix-infinite-fork-loop-on-port-in-use

Conversation

@yassa-tamer (Contributor) commented Mar 25, 2026

Summary

When a worker fails during app.listen() (e.g. EADDRINUSE), the master previously reforked unconditionally, causing an infinite fork/crash loop.

Workers now send a WORKER_STARTUP_FAILURE IPC message to the master before exiting. The master sets an abort flag and exits with a clear error message instead of reforking. Scheduled restarts and runtime crashes continue to refork as before.
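
For illustration, here is a minimal sketch of the worker-side mechanism described above. It assumes a plain Node http server and illustrative env-var names for host and port; the actual worker.ts wiring (framework, listen API, and how the message constant is imported) may differ.

```ts
import cluster from 'cluster';
import http from 'http';

// Placeholder config: the real renderer reads these from its own config builder.
const host = process.env.RENDERER_HOST ?? 'localhost';
const port = Number(process.env.RENDERER_PORT ?? 3800);

const server = http.createServer((_req, res) => res.end('ok'));
server.listen(port, host);

server.on('error', (err: NodeJS.ErrnoException) => {
  if (cluster.isWorker && process.send) {
    // Report the startup failure to the master, then exit only once the
    // IPC message has been handed off to the channel.
    process.send(
      {
        type: 'WORKER_STARTUP_FAILURE',
        stage: 'listen',
        code: err.code,
        errno: err.errno,
        syscall: err.syscall,
        host,
        port,
        message: err.message,
      },
      undefined,
      undefined,
      () => process.exit(1),
    );
  } else {
    // Single-process mode: no master to notify, just exit as before.
    process.exit(1);
  }
});
```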

Pull Request checklist

Remove this line after checking all the items here. If an item is not applicable to this PR, check it and wrap it in ~ (strikethrough).

  • Add/update test to cover these changes
  • Update documentation
  • Update CHANGELOG file

Add the CHANGELOG entry at the top of the file.

Summary by CodeRabbit

  • New Features
    • Workers now send structured startup-failure reports so the master can abort and surface clear errors.
  • Bug Fixes
    • Improved detection of worker startup failures; clearer user messages (e.g., port-in-use) and prevention of repeated silent restarts on startup errors.
  • Tests
    • Added tests covering startup-failure messaging and master/worker handling scenarios.
  • Chores
    • CI skips a code-review action when its secret is absent; added a URL exclusion to link-checking.
  • Documentation
    • Updated documentation links in release notes.

When a worker fails during app.listen() (e.g. EADDRINUSE), the master
previously reforked unconditionally, causing an infinite fork/crash loop.

Workers now send a WORKER_STARTUP_FAILURE IPC message to the master
before exiting. The master sets an abort flag and exits with a clear
error message instead of reforking. Scheduled restarts and runtime
crashes continue to refork as before.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@chatgpt-codex-connector

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.
Repo admins can enable using credits for code reviews in their settings.

@coderabbitai
Contributor

coderabbitai Bot commented Mar 25, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 59601569-189d-486e-bd6a-2f40f562d208

📥 Commits

Reviewing files that changed from the base of the PR and between ea800d5 and c1c8a3f.

📒 Files selected for processing (1)
  • .lychee.toml
✅ Files skipped from review due to trivial changes (1)
  • .lychee.toml

Walkthrough

Workers now send structured startup-failure IPC messages to the master when listen fails; the master records the first such failure and, on that worker's non-scheduled exit, emits a startup-specific error (special-casing EADDRINUSE) and exits with status 1 instead of forking a replacement.

Changes

Cohort / File(s) and summary:

  • Worker Message Types (packages/react-on-rails-pro-node-renderer/src/shared/workerMessages.ts):
    Add WORKER_STARTUP_FAILURE, WorkerStartupFailureMessage (stage 'listen', optional code/errno/syscall, required host/port/message), and the isWorkerStartupFailureMessage type guard (sketched just after this list).
  • Worker Startup Error Handling (packages/react-on-rails-pro-node-renderer/src/worker.ts):
    On app.listen error in worker mode, send a WorkerStartupFailureMessage via process.send (if available) and process.exit(1); otherwise preserve the previous direct-exit behavior.
  • Master Process Failure Handling (packages/react-on-rails-pro-node-renderer/src/master.ts):
    Add isAbortingForStartupFailure and fatalStartupFailure state, a cluster.on('message') handler to record startup-failure messages, and modify cluster.on('exit') to abort and exit (special-casing EADDRINUSE) instead of forking when a startup failure was reported. Scheduled restarts still fork.
  • Tests (packages/react-on-rails-pro-node-renderer/tests/masterStartupFailure.test.ts, packages/react-on-rails-pro-node-renderer/tests/workerStartupFailure.test.ts):
    New tests for the type guard, IPC message shape, clustered vs. single-process behavior, EADDRINUSE messaging, master abort/exit logic, scheduled restarts, unexpected crashes, and ignoring non-matching messages.
  • CI Workflow (.github/workflows/claude-code-review.yml):
    Make the Claude Code Review step conditional on secrets.CLAUDE_CODE_OAUTH_TOKEN; add an env var and a fallback step that prints a skip message.
  • Docs / Config (NEWS.md, .lychee.toml):
    Update two docs links to the new URL in NEWS.md; add a Lychee exclusion pattern for a specific PDF URL in .lychee.toml.
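
For orientation, the message shape described in the walkthrough could look roughly like this. This is a sketch reconstructed from the field list above, not the actual workerMessages.ts file; the type guard itself is quoted in the review comments further down.

```ts
// Sketch of the shared message module, reconstructed from the walkthrough above.
export const WORKER_STARTUP_FAILURE = 'WORKER_STARTUP_FAILURE' as const;

export interface WorkerStartupFailureMessage {
  type: typeof WORKER_STARTUP_FAILURE;
  stage: 'listen';
  code?: string; // e.g. 'EADDRINUSE'
  errno?: number;
  syscall?: string;
  host: string;
  port: number;
  message: string;
}
```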

Sequence Diagram(s)

sequenceDiagram
    participant Worker as Worker Process
    participant Cluster as Cluster IPC
    participant Master as Master Process
    participant OS as Operating System

    Worker->>OS: app.listen(host, port)
    OS-->>Worker: listen error (e.g. EADDRINUSE)
    Worker->>Worker: build WorkerStartupFailureMessage
    Worker->>Cluster: process.send(message)
    Cluster->>Master: deliver "message" event
    Master->>Master: record fatalStartupFailure / set isAbortingForStartupFailure = true
    Worker->>OS: process.exit(1)
    OS->>Cluster: emit "exit" for worker
    Cluster->>Master: deliver "exit" event
    Master->>Master: detect abort flag + non-scheduled exit
    Master->>Master: emit startup-failure error (EADDRINUSE -> port-in-use text)
    Master->>OS: process.exit(1)
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐇 I hopped to bind a port beneath the moonlit light,
I tapped a tiny message when the listen failed that night.
The master read my whisper and chose to close the gate,
No noisy forks or retries — a single, tidy fate.
🥕 Off I go to nap, content the logs are straight.

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
  • Description Check: ✅ Passed. Check skipped: CodeRabbit's high-level summary is enabled.
  • Title Check: ✅ Passed. The title 'Fix infinite fork loop when node renderer worker fails to bind port' accurately summarizes the main change: preventing infinite fork loops by handling worker startup failures (specifically port binding) with a new IPC mechanism.
  • Docstring Coverage: ✅ Passed. No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.


@greptile-apps

greptile-apps Bot commented Mar 25, 2026

Greptile Summary

This PR fixes an infinite fork/crash loop that occurred when a node renderer worker failed to bind its port during startup (e.g. EADDRINUSE). Previously the master process would unconditionally refork any dying worker, creating a never-ending cycle. The fix introduces a typed IPC message (WORKER_STARTUP_FAILURE) that workers send to the master before exiting on a listen error; the master sets an abort flag on receipt and calls process.exit(1) with a descriptive message instead of reforking.

Key changes:

  • workerMessages.ts — new module with the WorkerStartupFailureMessage type, WORKER_STARTUP_FAILURE constant, and isWorkerStartupFailureMessage type-guard.
  • worker.ts — on a listen error in cluster-worker mode, constructs and sends the IPC message in the process.send() callback (guaranteeing flush before exit), then returns; non-cluster path falls through to the existing process.exit(1).
  • master.ts — registers a cluster.on('message') handler that sets isAbortingForStartupFailure on first WORKER_STARTUP_FAILURE receipt (deduplication); the exit handler checks the flag and either logs and exits (startup failure), reforks (scheduled restart), or reforks (runtime crash).
  • Two new test files provide isolated unit coverage for the type-guard, the worker IPC path, and all master decision branches.

Confidence Score: 4/5

  • Safe to merge — the primary bug is correctly fixed; two minor P2 style issues remain but do not affect correctness in normal operation.
  • The core logic is sound: workers signal intent before exiting, the master deduplicates via a boolean flag, and all three exit paths (startup failure, scheduled restart, runtime crash) are correctly handled and well-tested. The two noted concerns are a dead return statement (cosmetic) and a theoretical IPC ordering race that, even if triggered, limits the infinite loop to at most one extra fork cycle rather than restoring it.
  • packages/react-on-rails-pro-node-renderer/src/master.ts — specifically the IPC ordering note and the unreachable return.

Important Files Changed

  • packages/react-on-rails-pro-node-renderer/src/master.ts: Adds IPC-message-based startup-failure detection, a deduplication flag, and abort-on-exit logic; one dead return after process.exit() and a theoretical IPC-message/exit ordering race exist but are non-critical.
  • packages/react-on-rails-pro-node-renderer/src/shared/workerMessages.ts: New module defining the WorkerStartupFailureMessage type, the WORKER_STARTUP_FAILURE constant, and a type-guard predicate; clean and correct.
  • packages/react-on-rails-pro-node-renderer/src/worker.ts: Sends the WORKER_STARTUP_FAILURE IPC message on listen error (cluster worker path only) and exits in the send callback, ensuring the message is flushed before the process terminates.
  • packages/react-on-rails-pro-node-renderer/tests/masterStartupFailure.test.ts: Tests the master's decision logic in isolation via simulated event handlers; covers EADDRINUSE, generic errors, multi-worker deduplication, scheduled restarts, and runtime crashes.
  • packages/react-on-rails-pro-node-renderer/tests/workerStartupFailure.test.ts: Unit-tests the isWorkerStartupFailureMessage type guard and verifies that the worker sends the correct IPC message before exiting in cluster mode, and exits directly in single-process mode.

Sequence Diagram

sequenceDiagram
    participant M as Master
    participant W as Worker (cluster)

    M->>W: cluster.fork()
    W->>W: app.listen({ port, host })
    alt listen succeeds
        W-->>M: (normal operation)
    else listen fails (e.g. EADDRINUSE)
        W->>M: process.send(WORKER_STARTUP_FAILURE)
        note over M: cluster.on('message')<br/>sets isAbortingForStartupFailure=true<br/>stores fatalStartupFailure
        W->>W: process.exit(1) [in send callback]
        W-->>M: cluster.on('exit')
        note over M: isAbortingForStartupFailure===true<br/>→ errorReporter.message()<br/>→ process.exit(1)
    end

    alt isScheduledRestart
        W-->>M: cluster.on('exit')
        M->>W: cluster.fork() [scheduled restart]
    else unexpected runtime crash
        W-->>M: cluster.on('exit')
        M->>W: cluster.fork() [runtime refork]
    end
Loading

Reviews (1): Last reviewed commit: "Fix infinite fork loop when node rendere..."

Comment on lines +111 to +112
process.exit(1);
return;

P2 Dead code after process.exit()

process.exit(1) terminates the Node.js process synchronously, so the return on the next line is unreachable dead code. It can be removed.

Suggested change
process.exit(1);
return;
errorReporter.message(msg);
process.exit(1);

Comment on lines +88 to 113
cluster.on('message', (worker, message) => {
  if (!isWorkerStartupFailureMessage(message) || isAbortingForStartupFailure) return;

  isAbortingForStartupFailure = true;
  fatalStartupFailure = { workerId: worker.id, failure: message };
});

// Listen for dying workers:
cluster.on('exit', (worker) => {
  if (worker.isScheduledRestart) {
    log.info('Restarting worker #%d on schedule', worker.id);
  } else {
    // TODO: Track last rendering request per worker.id
    // TODO: Consider blocking a given rendering request if it kills a worker more than X times
    const msg = `Worker ${worker.id} died UNEXPECTEDLY :(, restarting`;
    cluster.fork();
    return;
  }

  if (isAbortingForStartupFailure) {
    const failure = fatalStartupFailure?.failure;
    const msg =
      failure?.code === 'EADDRINUSE'
        ? `Node renderer startup failed: port ${failure.port} is already in use`
        : `Node renderer startup failed in worker ${worker.id}: ${failure?.message || `exit code ${worker.process.exitCode}`}`;

    errorReporter.message(msg);
    process.exit(1);
    return;
  }

P2 Potential IPC race condition between message and exit events

Node.js cluster does not guarantee that a worker's message event is processed before its exit event on the master side. If the OS delivers SIGCHLD (triggering exit) before the IPC socket data is read from the event loop (triggering message), the exit handler will observe isAbortingForStartupFailure === false and call cluster.fork() — spawning a replacement worker that will also fail.

In practice this is rare because the worker only calls process.exit(1) inside the process.send() callback (after the data has been flushed to the kernel buffer), so the timing window is very small. If it does trigger, it will cause one extra fork-and-crash cycle before the master eventually receives the deferred message event and shuts down cleanly — so it won't recreate an infinite loop, but it can still produce a confusing intermediate restart.

One mitigation would be to listen for the worker's IPC channel disconnect event and delay the refork decision briefly, or to cross-check the dead worker's id against any buffered-but-unprocessed WORKER_STARTUP_FAILURE messages. This is a known Node.js cluster caveat worth documenting if not addressed now.
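
For illustration, one way the "delay the refork decision briefly" idea could look, reusing the names from the snippet above. This is a sketch only, not the PR's implementation: the decision is deferred by one event-loop turn so a startup-failure message that is already buffered on the IPC channel can still be observed before forking.

```ts
// Sketch: defer the refork decision so a pending WORKER_STARTUP_FAILURE
// 'message' event can be processed before deciding whether to fork again.
cluster.on('exit', (worker) => {
  if (worker.isScheduledRestart) {
    log.info('Restarting worker #%d on schedule', worker.id);
    cluster.fork();
    return;
  }

  setImmediate(() => {
    if (isAbortingForStartupFailure) {
      // A startup failure was reported; abort instead of reforking.
      process.exit(1);
      return;
    }
    // Genuine runtime crash: refork as before.
    cluster.fork();
  });
});
```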

Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 4


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 09e8287d-fe75-49a5-8584-9d4da70f441e

📥 Commits

Reviewing files that changed from the base of the PR and between c584ab1 and 10200bc.

📒 Files selected for processing (5)
  • packages/react-on-rails-pro-node-renderer/src/master.ts
  • packages/react-on-rails-pro-node-renderer/src/shared/workerMessages.ts
  • packages/react-on-rails-pro-node-renderer/src/worker.ts
  • packages/react-on-rails-pro-node-renderer/tests/masterStartupFailure.test.ts
  • packages/react-on-rails-pro-node-renderer/tests/workerStartupFailure.test.ts

Comment on lines +103 to +108
if (isAbortingForStartupFailure) {
  const failure = fatalStartupFailure?.failure;
  const msg =
    failure?.code === 'EADDRINUSE'
      ? `Node renderer startup failed: port ${failure.port} is already in use`
      : `Node renderer startup failed in worker ${worker.id}: ${failure?.message || `exit code ${worker.process.exitCode}`}`;
Contributor

⚠️ Potential issue | 🟡 Minor

Use the recorded failing worker ID in the generic error path.

Once isAbortingForStartupFailure is set, the next exit event is not guaranteed to come from the same worker that sent the IPC message. Using worker.id here can misreport which worker actually failed even though fatalStartupFailure.workerId already captured it.

🐛 Proposed fix
     if (isAbortingForStartupFailure) {
       const failure = fatalStartupFailure?.failure;
+      const failedWorkerId = fatalStartupFailure?.workerId ?? worker.id;
       const msg =
         failure?.code === 'EADDRINUSE'
           ? `Node renderer startup failed: port ${failure.port} is already in use`
-          : `Node renderer startup failed in worker ${worker.id}: ${failure?.message || `exit code ${worker.process.exitCode}`}`;
+          : `Node renderer startup failed in worker ${failedWorkerId}: ${failure?.message || `exit code ${worker.process.exitCode}`}`;
 
       errorReporter.message(msg);
       process.exit(1);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/react-on-rails-pro-node-renderer/src/master.ts` around lines 103 -
108, The generic startup-failure message currently uses worker.id which may not
be the recorded failing worker; update the message construction in the
isAbortingForStartupFailure branch to prefer fatalStartupFailure.workerId (when
present) instead of worker.id, and keep using failure?.message or `exit code
${worker.process.exitCode}` for the error details; ensure you still handle the
EADDRINUSE case using failure.port and fall back appropriately if
fatalStartupFailure or its workerId is undefined.

Comment on lines +14 to +19
export function isWorkerStartupFailureMessage(value: unknown): value is WorkerStartupFailureMessage {
  return (
    typeof value === 'object' &&
    value !== null &&
    (value as { type?: string }).type === WORKER_STARTUP_FAILURE
  );
Contributor

⚠️ Potential issue | 🟠 Major

Validate the full IPC payload shape in the type guard.

packages/react-on-rails-pro-node-renderer/src/master.ts reads stage, port, and message immediately after this check. Right now any { type: WORKER_STARTUP_FAILURE } payload passes, so a malformed IPC message can trigger a fatal abort and produce undefined diagnostics.

🛡️ Proposed fix
 export function isWorkerStartupFailureMessage(value: unknown): value is WorkerStartupFailureMessage {
-  return (
-    typeof value === 'object' &&
-    value !== null &&
-    (value as { type?: string }).type === WORKER_STARTUP_FAILURE
-  );
+  if (typeof value !== 'object' || value === null) {
+    return false;
+  }
+
+  const message = value as Partial<WorkerStartupFailureMessage>;
+  return (
+    message.type === WORKER_STARTUP_FAILURE &&
+    message.stage === 'listen' &&
+    typeof message.host === 'string' &&
+    typeof message.port === 'number' &&
+    typeof message.message === 'string' &&
+    (message.code === undefined || typeof message.code === 'string') &&
+    (message.errno === undefined || typeof message.errno === 'number') &&
+    (message.syscall === undefined || typeof message.syscall === 'string')
+  );
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/react-on-rails-pro-node-renderer/src/shared/workerMessages.ts`
around lines 14 - 19, The current isWorkerStartupFailureMessage type guard only
checks the top-level type field so malformed IPC payloads slip through; update
the function isWorkerStartupFailureMessage to fully validate the IPC shape by
confirming the object has a type === WORKER_STARTUP_FAILURE and also contains
the expected fields with correct types (e.g., stage is a string, port is a
number, and message is a string) before narrowing to
WorkerStartupFailureMessage; reference the WORKER_STARTUP_FAILURE constant and
the isWorkerStartupFailureMessage function when making the change so master.ts
can safely read stage, port, and message without risking undefined values.

Comment on lines +41 to +73
function handleMessage(worker: { id: number }, message: unknown) {
  if (!isWorkerStartupFailureMessage(message) || isAbortingForStartupFailure) return;

  isAbortingForStartupFailure = true;
  fatalStartupFailure = { workerId: worker.id, failure: message };
}

function handleExit(worker: {
  id: number;
  isScheduledRestart?: boolean;
  process: { exitCode: number | null };
}) {
  if (worker.isScheduledRestart) {
    forkCount += 1; // cluster.fork()
    return;
  }

  if (isAbortingForStartupFailure) {
    const failure = fatalStartupFailure?.failure;
    const msg =
      failure?.code === 'EADDRINUSE'
        ? `Node renderer startup failed: port ${failure.port} is already in use`
        : `Node renderer startup failed in worker ${worker.id}: ${failure?.message || `exit code ${worker.process.exitCode}`}`;

    reportedMessages.push(msg);
    exitCode = 1;
    return;
  }

  const msg = `Worker ${worker.id} died UNEXPECTEDLY :(, restarting`;
  reportedMessages.push(msg);
  forkCount += 1; // cluster.fork()
}
Contributor

⚠️ Potential issue | 🟠 Major

These tests duplicate packages/react-on-rails-pro-node-renderer/src/master.ts instead of testing it.

handleMessage and handleExit are local copies of the production branches, so the assertions below can stay green even if masterRun() stops registering the cluster handlers or changes their control flow. Please capture the real cluster.on('message') / cluster.on('exit') callbacks from masterRun() under mocks.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/react-on-rails-pro-node-renderer/tests/masterStartupFailure.test.ts`
around lines 41 - 73, The tests currently duplicate the production logic in
local functions handleMessage and handleExit; instead, mock or spy on cluster.on
to capture the actual callbacks that masterRun registers for 'message' and
'exit', then invoke those captured callbacks with test fixtures so assertions
validate the real handlers from masterRun; change the test to call masterRun(),
grab the registered handlers from your cluster mock (the callbacks passed to
cluster.on('message', ...) and cluster.on('exit', ...)), and drive those
captured callbacks with the same worker/message scenarios used today to assert
reportedMessages, forkCount and exitCode.
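
For reference, a sketch of the capture-the-real-handlers approach the comment above asks for. It assumes Jest, and that masterRun registers its handlers via cluster.on; the import path, export name, mock shape, and any extra setup masterRun needs are illustrative, not taken from the PR.

```ts
import cluster from 'cluster';

jest.mock('cluster', () => ({
  isPrimary: true,
  isMaster: true,
  on: jest.fn(),
  fork: jest.fn(),
}));

// Path and export name are illustrative.
import { run as masterRun } from '../src/master';

it('aborts instead of reforking when a worker reports a startup failure', async () => {
  const handlers: Record<string, (...args: unknown[]) => void> = {};
  (cluster.on as jest.Mock).mockImplementation((event: string, cb: (...args: unknown[]) => void) => {
    handlers[event] = cb;
    return cluster;
  });
  const exitSpy = jest.spyOn(process, 'exit').mockImplementation(() => undefined as never);

  await masterRun(); // registers the real 'message' and 'exit' handlers

  const worker = { id: 1, isScheduledRestart: false, process: { exitCode: 1 } };
  handlers.message(worker, {
    type: 'WORKER_STARTUP_FAILURE',
    stage: 'listen',
    code: 'EADDRINUSE',
    host: 'localhost',
    port: 3800,
    message: 'listen EADDRINUSE: address already in use',
  });
  handlers.exit(worker);

  expect(exitSpy).toHaveBeenCalledWith(1);
});
```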

Comment on lines +40 to +119
describe('worker startup failure IPC', () => {
  const originalSend = process.send;
  const originalExit = process.exit;
  const originalIsWorker = cluster.isWorker;

  afterEach(() => {
    process.send = originalSend;
    process.exit = originalExit;
    Object.defineProperty(cluster, 'isWorker', { value: originalIsWorker, writable: true });
  });

  it('sends WORKER_STARTUP_FAILURE message in clustered mode', () => {
    // Simulate a cluster worker environment
    Object.defineProperty(cluster, 'isWorker', { value: true, writable: true });
    const sentMessages: unknown[] = [];
    const exitCalls: number[] = [];

    process.send = ((msg: unknown, _handle?: unknown, _options?: unknown, callback?: () => void) => {
      sentMessages.push(msg);
      if (callback) callback();
      return true;
    }) as typeof process.send;

    process.exit = ((code?: number) => {
      exitCalls.push(code ?? 0);
    }) as typeof process.exit;

    // Simulate what the worker does on listen error
    const err = Object.assign(new Error('listen EADDRINUSE: address already in use :::3800'), {
      code: 'EADDRINUSE',
      errno: -48,
      syscall: 'listen',
    });
    const host = 'localhost';
    const port = 3800;

    const startupFailure: WorkerStartupFailureMessage = {
      type: WORKER_STARTUP_FAILURE,
      stage: 'listen',
      code: err.code,
      errno: err.errno,
      syscall: err.syscall,
      host,
      port,
      message: err.message,
    };

    process.send!(startupFailure, undefined, undefined, () => {
      process.exit(1);
    });

    expect(sentMessages).toHaveLength(1);
    expect(isWorkerStartupFailureMessage(sentMessages[0])).toBe(true);
    expect((sentMessages[0] as WorkerStartupFailureMessage).code).toBe('EADDRINUSE');
    expect(exitCalls).toEqual([1]);
  });

  it('exits without IPC in single-process mode', () => {
    Object.defineProperty(cluster, 'isWorker', { value: false, writable: true });
    process.send = undefined;

    const exitCalls: number[] = [];
    process.exit = ((code?: number) => {
      exitCalls.push(code ?? 0);
    }) as typeof process.exit;

    // In single-process mode, cluster.isWorker is false and process.send is undefined
    if (cluster.isWorker && process.send) {
      // Should not reach here
      process.send({ type: WORKER_STARTUP_FAILURE }, undefined, undefined, () => {
        process.exit(1);
      });
      return;
    }

    process.exit(1);

    expect(exitCalls).toEqual([1]);
  });
});
Contributor

⚠️ Potential issue | 🟠 Major

This suite doesn't exercise packages/react-on-rails-pro-node-renderer/src/worker.ts.

It reconstructs the payload and calls process.send/process.exit directly, so it still passes if the listen-error branch in run() stops emitting WORKER_STARTUP_FAILURE. Please drive the real app.listen error callback or extract that branch into a helper and test the helper.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/react-on-rails-pro-node-renderer/tests/workerStartupFailure.test.ts`
around lines 40 - 119, The test suite is not exercising the actual listen-error
branch in run() of worker.ts because it constructs the WORKER_STARTUP_FAILURE
payload and calls process.send/process.exit directly; update tests to either (A)
drive the real app.listen error callback by starting the module's run() (or
invoking the server instance) and producing an EADDRINUSE error so the run()
path sends the actual WorkerStartupFailureMessage, or (B) extract the
listen-error IPC logic into a helper (e.g., sendWorkerStartupFailure or
handleListenError) and write a unit test that calls that helper with the Error
object and verifies process.send behavior and exit handling using
cluster.isWorker and isWorkerStartupFailureMessage/WorkerStartupFailureMessage;
ensure tests toggle cluster.isWorker and stub process.send/process.exit only
around the actual code path being exercised.
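
A possible shape for option (B) above, extracting the listen-error branch into a small helper that tests can call directly. The helper name and signature are hypothetical, not taken from the PR:

```ts
import cluster from 'cluster';
import { WORKER_STARTUP_FAILURE, type WorkerStartupFailureMessage } from './shared/workerMessages';

// Hypothetical helper: worker.ts's listen-error callback would delegate here.
export function handleListenError(err: NodeJS.ErrnoException, host: string, port: number): void {
  if (cluster.isWorker && process.send) {
    const failure: WorkerStartupFailureMessage = {
      type: WORKER_STARTUP_FAILURE,
      stage: 'listen',
      code: err.code,
      errno: err.errno,
      syscall: err.syscall,
      host,
      port,
      message: err.message,
    };
    // Exit only after the IPC message has been handed to the channel.
    process.send(failure, undefined, undefined, () => process.exit(1));
    return;
  }
  // Single-process mode: nothing to report, just exit.
  process.exit(1);
}
```

A unit test could then stub process.send and process.exit, toggle cluster.isWorker, and call handleListenError with a fabricated EADDRINUSE error, instead of rebuilding the payload inside the test.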

@justin808
Member

@claude review this PR
@codex review this PR
@cursoragent review this PR

@chatgpt-codex-connector

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.
To continue using code reviews, add credits to your account and enable them for code reviews in your settings.

@justin808
Member

See #2881

@justin808 justin808 closed this Mar 28, 2026
justin808 added a commit that referenced this pull request Apr 7, 2026
…2881)

## Summary

When a worker fails during app.listen() (e.g. EADDRINUSE), the master
previously reforked unconditionally, causing an infinite fork/crash
loop.

Workers now send a WORKER_STARTUP_FAILURE IPC message to the master
before exiting. The master sets an abort flag and exits with a clear
error message instead of reforking. Scheduled restarts and runtime
crashes continue to refork as before.

Based on #2843 by
@Yassa-hue — recreated on the main repo (not a fork) so CI runs with
full secrets.

Closes #2843

fix issue: shakacode/react_on_rails_pro#582

## Test plan

- [ ] Verify new unit tests pass (`masterStartupFailure.test.ts`,
`workerStartupFailure.test.ts`)
- [ ] Confirm EADDRINUSE scenario aborts master instead of infinite
reforking
- [ ] Confirm scheduled restarts and runtime crashes still refork as
before

🤖 Generated with [Claude Code](https://claude.com/claude-code)

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> **Medium Risk**
> Touches clustered process lifecycle and shutdown behavior in the Pro
Node renderer; a mistake could cause premature master exits or missed
reforks in edge cases despite added test coverage.
> 
> **Overview**
> Prevents the Pro Node renderer master process from entering an
infinite refork loop when a worker fails during `app.listen()` (e.g.,
`EADDRINUSE`). Workers now report a structured `WORKER_STARTUP_FAILURE`
IPC message before exiting, and the master captures the first failure,
disconnects workers, and exits with a clear error instead of reforking.
> 
> Also adds `port` coercion + TCP-range validation in `configBuilder`,
and introduces focused unit tests covering startup-failure messaging,
master abort semantics, and the “wait one tick” behavior to distinguish
startup failures from runtime crashes.
> 
> <sup>Reviewed by [Cursor Bugbot](https://cursor.com/bugbot) for commit
ca6ac01. Bugbot is set up for automated
code reviews on this repo. Configure
[here](https://www.cursor.com/dashboard/bugbot).</sup>
<!-- /CURSOR_SUMMARY -->

---------

Co-authored-by: Yassa-hue <yassatamer99@gmail.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
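
The commit note above also mentions port coercion plus TCP-range validation in configBuilder. A rough sketch of that kind of check follows; the function name, field handling, and error text are illustrative, not the actual configBuilder code.

```ts
// Illustrative only: coerce a configured port to a number and validate the TCP range.
function coercePort(rawPort: string | number): number {
  const port = typeof rawPort === 'string' ? Number.parseInt(rawPort, 10) : rawPort;
  // Port 0 asks the OS for an ephemeral port; 1-65535 are explicit TCP ports.
  if (!Number.isInteger(port) || port < 0 || port > 65535) {
    throw new Error(`Invalid renderer port: ${String(rawPort)} (expected an integer between 0 and 65535)`);
  }
  return port;
}
```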
justin808 added a commit that referenced this pull request Apr 12, 2026
…2881)

justin808 added a commit that referenced this pull request Apr 12, 2026
…2881)
