
Fix infinite fork loop when node renderer worker fails to bind port#2881

Merged
justin808 merged 20 commits into main from jg/fix-fork-loop-port-bind
Apr 7, 2026

Conversation

@justin808
Member

@justin808 justin808 commented Mar 28, 2026

Summary

When a worker fails during app.listen() (e.g. EADDRINUSE), the master previously reforked unconditionally, causing an infinite fork/crash loop.

Workers now send a WORKER_STARTUP_FAILURE IPC message to the master before exiting. The master sets an abort flag and exits with a clear error message instead of reforking. Scheduled restarts and runtime crashes continue to refork as before.

Based on #2843 by @Yassa-hue — recreated on the main repo (not a fork) so CI runs with full secrets.

Closes #2843

Fixes shakacode/react_on_rails_pro#582
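The worker-side half of the fix described above can be sketched roughly as follows. This is illustrative only: the helper name and exact field set are assumptions, not the PR's actual code, and the dependencies are injected so the control flow can be shown without a real IPC channel.

```typescript
// Hypothetical sketch of the worker's listen-failure branch; names are illustrative.
interface WorkerStartupFailureMessage {
  type: 'WORKER_STARTUP_FAILURE';
  stage: 'listen';
  code?: string;
  host: string;
  port: number;
  message: string;
}

// send/exit are injected so the flow can be exercised without a real IPC channel.
function reportListenFailure(
  err: { code?: string; message: string },
  host: string,
  port: number,
  send: ((msg: unknown, cb: () => void) => void) | undefined,
  exit: (code: number) => void,
): void {
  if (!send) {
    // Single-process mode (no IPC channel): keep the prior immediate-exit behavior.
    exit(1);
    return;
  }
  const msg: WorkerStartupFailureMessage = {
    type: 'WORKER_STARTUP_FAILURE',
    stage: 'listen',
    code: err.code,
    host,
    port,
    message: err.message,
  };
  // Exit in the send callback so the message is flushed before the worker dies.
  send(msg, () => exit(1));
}
```

Exiting inside the send callback is what lets the master observe the failure message before the worker's exit event, which is the ordering the abort flag depends on.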

Test plan

  • Verify new unit tests pass (masterStartupFailure.test.ts, workerStartupFailure.test.ts)
  • Confirm EADDRINUSE scenario aborts master instead of infinite reforking
  • Confirm scheduled restarts and runtime crashes still refork as before
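The EADDRINUSE check in the test plan hinges on the master's abort-message formatting, which can be sketched as below. The strings mirror the expectations shown in the Bugbot test diff later in this thread, but the helper name is hypothetical.

```typescript
// Sketch of the master's abort-message formatting; helper name is hypothetical.
interface StartupFailure {
  code?: string;
  port: number;
  message: string;
}

function formatStartupFailure(workerId: number, failure: StartupFailure): string {
  // EADDRINUSE gets a port-specific message; other codes fall back to the worker id.
  return failure.code === 'EADDRINUSE'
    ? `Node renderer startup failed: port ${failure.port} is already in use`
    : `Node renderer startup failed in worker ${workerId}: ${failure.message}`;
}
```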

🤖 Generated with Claude Code


Note

Medium Risk
Touches clustered process lifecycle and shutdown behavior in the Pro Node renderer; a mistake could cause premature master exits or missed reforks in edge cases despite added test coverage.

Overview
Prevents the Pro Node renderer master process from entering an infinite refork loop when a worker fails during app.listen() (e.g., EADDRINUSE). Workers now report a structured WORKER_STARTUP_FAILURE IPC message before exiting, and the master captures the first failure, disconnects workers, and exits with a clear error instead of reforking.

Also adds port coercion + TCP-range validation in configBuilder, and introduces focused unit tests covering startup-failure messaging, master abort semantics, and the “wait one tick” behavior to distinguish startup failures from runtime crashes.

Reviewed by Cursor Bugbot for commit ca6ac01. Bugbot is set up for automated code reviews on this repo.

@coderabbitai
Contributor

coderabbitai Bot commented Mar 28, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.


Walkthrough

Master now detects worker startup failures reported over IPC (WORKER_STARTUP_FAILURE) and, on such a fatal startup failure, records the failure, sets an aborting flag, and exits the master process with a clear error (instead of reforking). Workers send a structured startup-failure message before exiting when app.listen() fails.

Changes

  • Shared Types (packages/react-on-rails-pro-node-renderer/src/shared/workerMessages.ts): Adds the WORKER_STARTUP_FAILURE constant, the WorkerStartupFailureMessage interface (stage 'listen', optional error metadata, host/port/message), and the isWorkerStartupFailureMessage() type guard.
  • Master Failure Detection (packages/react-on-rails-pro-node-renderer/src/master.ts): Adds a cluster.on('message') handler to detect startup-failure IPC messages, introduces fatalStartupFailure and isAbortingForStartupFailure state, and changes cluster.on('exit') to abort and process.exit(1) on recorded startup failures (special-casing EADDRINUSE); scheduled restarts still fork.
  • Worker Failure Reporting (packages/react-on-rails-pro-node-renderer/src/worker.ts): On app.listen error, sends a WorkerStartupFailureMessage over process.send() when IPC is available, logs on send failure, then exits with status 1; non-worker/non-IPC paths preserve the prior immediate-exit behavior.
  • Master Behavior Tests (packages/react-on-rails-pro-node-renderer/tests/masterStartupFailure.test.ts): New Jest tests verifying message-handler registration order, detection and deduplication of startup failures, EADDRINUSE-specific error formatting, abort-with-exit behavior, preserved scheduled reforks, and ignoring invalid messages.
  • Worker Behavior Tests (packages/react-on-rails-pro-node-renderer/tests/workerStartupFailure.test.ts): New Jest tests for isWorkerStartupFailureMessage() and worker-side IPC; asserts the correct payload is sent in worker mode and immediate exit in single-process mode.
  • Repro Script (scripts/repro-node-renderer-startup-failure.mjs): Adds a repro script that optionally builds, blocks a port, spawns the renderer pointed at the blocked port, captures output, enforces a timeout, and asserts expected abort vs. refork behavior.
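The shared-types entry above might look roughly like this. A sketch only: the field names come from the summary, and the strict per-field checks incorporate the validation reviewers asked for elsewhere in this thread.

```typescript
// Sketch of the shared workerMessages module's public surface (illustrative).
const WORKER_STARTUP_FAILURE = 'WORKER_STARTUP_FAILURE' as const;

interface WorkerStartupFailureMessage {
  type: typeof WORKER_STARTUP_FAILURE;
  stage: 'listen';
  code?: string; // e.g. 'EADDRINUSE'
  host: string;
  port: number;
  message: string;
}

function isWorkerStartupFailureMessage(value: unknown): value is WorkerStartupFailureMessage {
  if (typeof value !== 'object' || value === null) return false;
  const m = value as Partial<WorkerStartupFailureMessage>;
  // Validate the full payload so a spoofed { type } object cannot arm the abort path.
  return (
    m.type === WORKER_STARTUP_FAILURE &&
    m.stage === 'listen' &&
    typeof m.host === 'string' &&
    typeof m.port === 'number' &&
    Number.isFinite(m.port) &&
    typeof m.message === 'string'
  );
}
```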

Sequence Diagram

sequenceDiagram
    participant Worker
    participant IPC
    participant Master

    rect rgba(255, 0, 0, 0.5)
    note over Worker,Master: Startup failure reporting & abort
    Worker->>Worker: app.listen() error (e.g. EADDRINUSE)
    Worker->>IPC: process.send(WorkerStartupFailureMessage)
    IPC->>Master: deliver message
    Master->>Master: isWorkerStartupFailureMessage() -> true
    Master->>Master: record fatalStartupFailure & set isAbortingForStartupFailure
    Worker->>Worker: process.exit(1)
    Worker->>Master: cluster 'exit' event
    Master->>Master: detect recorded failure -> format message (port if EADDRINUSE)
    Master->>Master: process.exit(1)
    end

    rect rgba(0, 255, 0, 0.5)
    note over Worker,Master: Scheduled restart / runtime crash preserved
    Worker->>Master: cluster 'exit' event (scheduled restart or runtime crash without startup failure)
    Master->>Master: isScheduledRestart? -> true (or runtime crash handling)
    Master->>Master: cluster.fork() (preserved behavior)
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐇 I hopped where sockets clash and sparks ignite,
A tiny note I sent through IPC's light,
"EADDRINUSE" I squeaked, then hopped away,
The master read, then stopped the frantic fray,
No endless forks tonight — a peaceful night ✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 33.33%, below the required 80.00% threshold. Resolution: write docstrings for the functions missing them.

✅ Passed checks (2 passed)

  • Title check: ✅ Passed. The title clearly and concisely summarizes the main change: fixing an infinite fork loop when a node renderer worker fails to bind to a port, the core problem addressed across all modified files.
  • Description check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.




@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7563af6c62


@greptile-apps

greptile-apps Bot commented Mar 28, 2026

Greptile Summary

This PR fixes an infinite fork/crash loop in the node renderer master process by introducing a WORKER_STARTUP_FAILURE IPC message that workers send to the master before exiting when app.listen() fails (e.g. EADDRINUSE). The master sets an abort flag on receipt and terminates with a clear error message instead of reforking. Scheduled restarts and runtime crashes still refork as before.

Key changes:

  • workerMessages.ts — New shared module defining the WorkerStartupFailureMessage discriminated-union type and a narrowing type guard.
  • worker.ts — On a listen error in cluster-worker mode, sends the IPC failure message (with error code, port, host) and exits in the process.send callback to guarantee message delivery before exit.
  • master.ts — Adds a cluster.on('message') handler that sets isAbortingForStartupFailure on the first failure message, and restructures the cluster.on('exit') handler to abort (not refork) when that flag is set.
  • Tests — Two new test files covering EADDRINUSE, non-EADDRINUSE, multi-worker dedup, scheduled restart, and runtime crash scenarios.
  • CI workflow — Adds a secret-availability guard so the Claude review step is skipped gracefully on fork PRs that lack CLAUDE_CODE_OAUTH_TOKEN.

Confidence Score: 5/5

Safe to merge — the core fix is correct and well-tested; only minor style/defensive-coding suggestions remain.

All findings are P2 (style and defensive hardening). The abort-flag + IPC message pattern is sound: Node.js guarantees IPC messages are delivered before the corresponding exit event fires, so the master will always see the failure message before the worker's exit. process.exit() is synchronous/terminal, so only one errorReporter.message call is made even with multiple workers. The test suite covers the main branches comprehensively.

worker.ts — consider adding a try/catch around process.send to guard against ERR_IPC_CHANNEL_CLOSED in pathological cases.

Important Files Changed

  • packages/react-on-rails-pro-node-renderer/src/master.ts: Adds the IPC-message listener and abort flag to stop reforking when a worker startup failure is received; logic is sound, with one minor unreachable return after process.exit(1).
  • packages/react-on-rails-pro-node-renderer/src/worker.ts: Sends the WORKER_STARTUP_FAILURE IPC message before exiting on listen error; the process.send call lacks a try/catch guard for the ERR_IPC_CHANNEL_CLOSED edge case.
  • packages/react-on-rails-pro-node-renderer/src/shared/workerMessages.ts: New module defining the WorkerStartupFailureMessage type and its discriminated-union type guard; clean and well-scoped.
  • packages/react-on-rails-pro-node-renderer/tests/masterStartupFailure.test.ts: Unit tests simulate the master decision logic in isolation; comprehensively covers EADDRINUSE, non-EADDRINUSE, multi-worker dedup, scheduled restart, and runtime crash scenarios.
  • packages/react-on-rails-pro-node-renderer/tests/workerStartupFailure.test.ts: Tests the IPC message type guard and the worker send-then-exit sequence; minor descriptor-restore issue in afterEach for cluster.isWorker.
  • .github/workflows/claude-code-review.yml: Adds a secret-availability guard so the Claude review step is skipped gracefully on fork PRs that lack CLAUDE_CODE_OAUTH_TOKEN; a standard GitHub Actions pattern, correct.

Sequence Diagram

sequenceDiagram
    participant M as Master
    participant W1 as Worker 1
    participant W2 as Worker N

    M->>W1: cluster.fork()
    M->>W2: cluster.fork()

    W1->>W1: app.listen() → EADDRINUSE
    W2->>W2: app.listen() → EADDRINUSE

    W1-->>M: IPC: WORKER_STARTUP_FAILURE {code, port, host}
    Note over M: isAbortingForStartupFailure = true
    W2-->>M: IPC: WORKER_STARTUP_FAILURE (ignored — flag already set)

    W1->>W1: process.exit(1)
    W2->>W2: process.exit(1)

    M->>M: cluster 'exit' (worker 1)
    Note over M: errorReporter.message(...)<br/>process.exit(1) — master terminates


@github-actions
Contributor

github-actions Bot commented Mar 28, 2026

size-limit report 📦

Path Size
react-on-rails/client bundled (gzip) 62.63 KB (-0.01% 🔽)
react-on-rails/client bundled (gzip) (time) 62.63 KB (0%)
react-on-rails/client bundled (brotli) 53.7 KB (0%)
react-on-rails/client bundled (brotli) (time) 53.7 KB (0%)
react-on-rails-pro/client bundled (gzip) 63.65 KB (0%)
react-on-rails-pro/client bundled (gzip) (time) 63.65 KB (0%)
react-on-rails-pro/client bundled (brotli) 54.67 KB (0%)
react-on-rails-pro/client bundled (brotli) (time) 54.67 KB (0%)
registerServerComponent/client bundled (gzip) 127.46 KB (0%)
registerServerComponent/client bundled (gzip) (time) 127.46 KB (0%)
registerServerComponent/client bundled (brotli) 61.65 KB (0%)
registerServerComponent/client bundled (brotli) (time) 61.65 KB (0%)
wrapServerComponentRenderer/client bundled (gzip) 122.02 KB (0%)
wrapServerComponentRenderer/client bundled (gzip) (time) 122.02 KB (0%)
wrapServerComponentRenderer/client bundled (brotli) 56.77 KB (0%)
wrapServerComponentRenderer/client bundled (brotli) (time) 56.77 KB (0%)

Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (3)
packages/react-on-rails-pro-node-renderer/tests/workerStartupFailure.test.ts (1)

51-117: Exercise src/worker.ts, not a hand-built copy of the branch.

These assertions build the payload and call process.send / process.exit directly, so they can stay green even if the real app.listen error path stops sending the message or changes the payload shape. Extract the listen-failure branch into a helper and call that helper from both src/worker.ts and this suite.

packages/react-on-rails-pro-node-renderer/src/shared/workerMessages.ts (1)

14-19: Tighten the IPC payload guard.

cluster.on('message') hands you unknown, but this guard only checks type. A malformed { type: WORKER_STARTUP_FAILURE } payload would still arm the fatal-abort path and later produce incomplete diagnostics. Validate stage, host, port, and message here.

💡 Proposed fix
 export function isWorkerStartupFailureMessage(value: unknown): value is WorkerStartupFailureMessage {
-  return (
-    typeof value === 'object' &&
-    value !== null &&
-    (value as { type?: string }).type === WORKER_STARTUP_FAILURE
-  );
+  if (typeof value !== 'object' || value === null) {
+    return false;
+  }
+
+  const message = value as Partial<WorkerStartupFailureMessage>;
+
+  return (
+    message.type === WORKER_STARTUP_FAILURE &&
+    message.stage === 'listen' &&
+    typeof message.host === 'string' &&
+    typeof message.port === 'number' &&
+    typeof message.message === 'string'
+  );
 }
packages/react-on-rails-pro-node-renderer/tests/masterStartupFailure.test.ts (1)

40-73: Avoid re-implementing src/master.ts inside the test.

handleMessage and handleExit duplicate the production control flow instead of invoking it, so this suite will not catch wiring regressions in the real cluster listeners and it currently mirrors the same worker-id mix-up as src/master.ts. Extract the decision logic into importable helpers and assert against those.

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@packages/react-on-rails-pro-node-renderer/src/master.ts`:
- Around line 103-108: The abort message uses the currently-iterated worker id
(worker.id) rather than the worker that actually sent the fatal startup failure;
update the message construction in the isAbortingForStartupFailure branch to
reference fatalStartupFailure.workerId (and still use
failure/failure.port/failure.message and fallback to worker.process.exitCode
where appropriate) so the abort log points to the worker that reported the
startup failure (symbols: isAbortingForStartupFailure, fatalStartupFailure,
worker.id, worker.process.exitCode).


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 5f3f10d4-3c39-48dd-b45e-01e7e35d8674

📥 Commits

Reviewing files that changed from the base of the PR and between defad95 and 7563af6.

📒 Files selected for processing (6)
  • .github/workflows/claude-code-review.yml
  • packages/react-on-rails-pro-node-renderer/src/master.ts
  • packages/react-on-rails-pro-node-renderer/src/shared/workerMessages.ts
  • packages/react-on-rails-pro-node-renderer/src/worker.ts
  • packages/react-on-rails-pro-node-renderer/tests/masterStartupFailure.test.ts
  • packages/react-on-rails-pro-node-renderer/tests/workerStartupFailure.test.ts

@claude
Contributor

claude Bot commented Mar 29, 2026

Code Review

Good fix for a real operational problem. Three issues worth addressing before merge:

Bug 1 — Wrong worker ID in error message (master.ts:108)
The exit handler uses worker.id (whichever worker happened to exit first) rather than fatalStartupFailure.workerId (the worker that actually sent the failure). In a multi-worker cluster all workers try to bind the same port, so the exiting worker and the reporting worker may differ. See inline comment: #2881 (comment)

Bug 2 — Loose type guard arms a process-killing path (workerMessages.ts:14-19)
isWorkerStartupFailureMessage only checks type, so any IPC message with the right constant passes. A malformed payload would still set isAbortingForStartupFailure = true and produce an error with undefined fields. The guard should validate the required fields (stage, host, port, message). See inline comment: #2881 (comment)

Potential hang — process.send callback may never fire (worker.ts:463-466)
If the master has already exited when the worker tries to send the failure message (another worker triggered the abort first), the IPC channel can be closed and the flush callback is never called, leaving the worker hanging. The callback should accept the error argument and call process.exit(1) unconditionally, or add a fallback timeout. See inline comment: #2881 (comment)

Minor

  • The fatalStartupFailure?.failure optional chain in master.ts:104 is logically unreachable (it is always non-null when isAbortingForStartupFailure is true). Worth using a non-null assertion to make the invariant explicit.
  • Tests in masterStartupFailure.test.ts and workerStartupFailure.test.ts replicate production logic by hand rather than importing and exercising it, so they won't catch wiring regressions. Follow-up refactor, not a blocker.

@justin808
Member Author

Follow-up on review summaries:

  • Addressed the blocking correctness items in commit ec87045d1:
    • register startup-failure message listener before initial forks
    • report startup abort with the recorded failing worker id
    • tighten isWorkerStartupFailureMessage payload validation
  • Deferring the suggested test-architecture refactors (extracting/importing helpers instead of mirroring flow in tests) to a separate follow-up so this PR stays focused.

@justin808
Member Author

Thanks for the review.

  • Fixed Bug 1 and Bug 2 in ec87045d1.
  • For the process.send hardening note: agreed this is useful defensive hardening, and I’m tracking it as a follow-up since it is not required for the fork-loop correctness fix.
  • The optional-chain invariant/readability cleanup is also deferred as non-blocking.


@cursor cursor Bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.

Autofix Details

Bugbot Autofix prepared a fix for the issue found in the latest run.

  • ✅ Fixed: Test diverges from production error message logic
    • Updated the test handler to use fatalStartupFailure?.workerId ?? worker.id and added coverage for differing failed/exiting workers to match production logic.

Preview (47dd98baeb)
diff --git a/packages/react-on-rails-pro-node-renderer/tests/masterStartupFailure.test.ts b/packages/react-on-rails-pro-node-renderer/tests/masterStartupFailure.test.ts
--- a/packages/react-on-rails-pro-node-renderer/tests/masterStartupFailure.test.ts
+++ b/packages/react-on-rails-pro-node-renderer/tests/masterStartupFailure.test.ts
@@ -57,10 +57,11 @@
 
     if (isAbortingForStartupFailure) {
       const failure = fatalStartupFailure?.failure;
+      const failedWorkerId = fatalStartupFailure?.workerId ?? worker.id;
       const msg =
         failure?.code === 'EADDRINUSE'
           ? `Node renderer startup failed: port ${failure.port} is already in use`
-          : `Node renderer startup failed in worker ${worker.id}: ${failure?.message || `exit code ${worker.process.exitCode}`}`;
+          : `Node renderer startup failed in worker ${failedWorkerId}: ${failure?.message || `exit code ${worker.process.exitCode}`}`;
 
       reportedMessages.push(msg);
       exitCode = 1;
@@ -113,6 +114,24 @@
     ]);
   });
 
+  it('uses the originally failed worker id when a different worker exits', () => {
+    const failedWorker = { id: 2, process: { exitCode: 1 } };
+    const exitingWorker = { id: 7, process: { exitCode: 1 } };
+    const failure = buildStartupFailureMessage({
+      code: 'EACCES',
+      message: 'listen EACCES: permission denied 0.0.0.0:80',
+    });
+
+    handleMessage(failedWorker, failure);
+    handleExit(exitingWorker);
+
+    expect(forkCount).toBe(0);
+    expect(exitCode).toBe(1);
+    expect(reportedMessages).toEqual([
+      'Node renderer startup failed in worker 2: listen EACCES: permission denied 0.0.0.0:80',
+    ]);
+  });
+
   it('deduplicates multiple workers failing simultaneously', () => {
     const worker1 = { id: 1, process: { exitCode: 1 } };
     const worker2 = { id: 2, process: { exitCode: 1 } };


@justin808
Member Author

Update: the process.send hardening follow-up is now done in commit 1f7bda31f.

  • wraps startup-failure process.send in try/catch
  • handles callback sendErr explicitly
  • guarantees worker exit with a short fallback timer if callback never fires

@claude
Contributor

claude Bot commented Mar 29, 2026

Review: Fix infinite fork loop on node renderer port bind failure

Good fix overall — the core idea (typed IPC message + abort flag) is sound and solves the real problem. A few issues worth addressing before merge:

Reliability concern (medium): IPC message vs. exit event ordering is not guaranteed. The message handler sets isAbortingForStartupFailure, and the exit handler reads it, but Node.js cluster does not guarantee the message event fires before the exit event for the same worker. The 500 ms fallback timer in worker.ts narrows the window but does not close it. See inline comment on master.ts:84-89 for a suggested mitigation.

Correctness nit (low): Multiple workers failing simultaneously (all binding the same port) each trigger an exit event. The first calls errorReporter.message and process.exit(1); if a second exit is already queued before the process terminates, the error message fires twice. See inline on master.ts:103-114. The unreachable return after process.exit(1) on line 113 can also be removed (process.exit is typed never).

Testing gap (low): masterStartupFailure.test.ts tests a hand-rolled copy of the logic, not the actual masterRun() function. A regression in the real wiring would not be caught. Worth adding even one thin integration-level test that calls masterRun() against a mocked cluster. See inline comment on the test file.

Documentation nit (low): The 500 ms fallback timer in worker.ts is unexplained. A one-line comment stating why it exists would help future maintainers. See inline on worker.ts:463-466.

@claude
Contributor

claude Bot commented Mar 29, 2026

Review Summary

The fix is well-structured and correctly solves the infinite refork loop. The IPC message approach is the right pattern - typed message, validator guard, and deduplication via isAbortingForStartupFailure are all solid. Test coverage is thorough.

A few items worth addressing before merge:

Medium priority

  • Sibling worker cleanup (master.ts line 111): With workersCount > 1, when one worker fails at startup the master calls process.exit(1) while sibling workers are still running. Those workers lose their IPC channel abruptly and may emit confusing logs. Explicitly killing live workers before exiting would keep the shutdown cleaner.

  • Race condition comment (master.ts line 103): If a second worker exits first after isAbortingForStartupFailure is set, the exit handler fires with fatalStartupFailure referencing the other worker's details. The abort behavior is still correct, but the reported context may be misleading. Worth documenting.

Minor / polish

  • Dead optional chaining (master.ts line 105): fatalStartupFailure is always non-null when isAbortingForStartupFailure is true (both set atomically in the same handler). The optional chaining and fallback to worker.id are unreachable. A non-null assertion makes the invariant explicit.

  • Port validation in type guard (workerMessages.ts line 24): The current check allows 0, negative values, and floats. Since the port is interpolated into operator-visible error messages, bounding to the valid [0, 65535] integer range is safer.

  • Magic 500ms timeout (worker.ts line 465): Should be a named constant with a brief comment explaining why 500ms was chosen.

  • return after process.exit (master.ts line 113): Unreachable in production but necessary when process.exit is mocked in tests. A short comment prevents future readers from removing it.

Overall the approach is sound and the test suite covers the key paths well. The sibling worker cleanup is the only change I would consider before shipping.
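A sibling-worker cleanup of the kind suggested above could be sketched as follows. The worker map shape mirrors Node's cluster.workers, but killLiveWorkers and its call site are hypothetical, not the PR's actual code:

```typescript
// Hypothetical helper: kill any still-running sibling workers before the
// master exits, so they do not lose their IPC channel abruptly.
interface WorkerLike {
  isDead(): boolean;
  kill(signal?: string): void;
}

function killLiveWorkers(workers: Record<string, WorkerLike | undefined>): number {
  let killed = 0;
  for (const worker of Object.values(workers)) {
    if (worker && !worker.isDead()) {
      worker.kill('SIGTERM'); // graceful signal first; the hard-deadline timer covers stragglers
      killed += 1;
    }
  }
  return killed;
}

// Example with stand-in workers: one live, one already dead, one missing slot.
const calls: string[] = [];
const live: WorkerLike = { isDead: () => false, kill: () => { calls.push('live'); } };
const dead: WorkerLike = { isDead: () => true, kill: () => { calls.push('dead'); } };
const killedCount = killLiveWorkers({ '1': live, '2': dead, '3': undefined });
```

In the real master this would run just before process.exit(1) in the startup-failure abort path, so siblings get a deliberate SIGTERM instead of an abrupt channel loss.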


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@scripts/repro-node-renderer-startup-failure.mjs`:
- Around line 38-51: The blocker server created as const blocker =
net.createServer() lacks an 'error' event handler so listen() failures (e.g.,
EADDRINUSE) become unhandled exceptions; add an error listener on blocker
(before calling blocker.listen) that logs the error (using console.error or
processLogger) and exits or otherwise handles the failure deterministically
(mirror the current process.exit(1) behavior used in the address check), and
ensure closeBlocker still works when the server errored.
- Around line 21-24: The current validation for timeoutMs (derived from
process.env.REPRO_TIMEOUT_MS) only rejects NaN or non-positive numbers but
allows non-finite values and values outside the Node setTimeout range; update
the check around timeoutMs to ensure Number.isFinite(timeoutMs) and
Number.isInteger(timeoutMs) and that timeoutMs is within the valid range (1
through 2147483647), and if not log an error including the raw
process.env.REPRO_TIMEOUT_MS and the parsed timeoutMs before process.exit(1) so
setTimeout is never called with Infinity or out-of-range values.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: a760fa37-a1cb-4951-b5ba-103e2a54c88a

📥 Commits

Reviewing files that changed from the base of the PR and between 1f7bda3 and 5546868.

📒 Files selected for processing (3)
  • packages/react-on-rails-pro-node-renderer/src/worker.ts
  • packages/react-on-rails-pro-node-renderer/tests/masterStartupFailure.test.ts
  • scripts/repro-node-renderer-startup-failure.mjs
🚧 Files skipped from review as they are similar to previous changes (1)
  • packages/react-on-rails-pro-node-renderer/tests/masterStartupFailure.test.ts


claude Bot commented Apr 6, 2026

Review

Good fix with thorough test coverage. The IPC-message approach is solid — workers signal their own startup failures, the master sets an abort flag, and a setImmediate defers the crash-vs-startup-failure classification to give in-flight IPC messages time to land. The timeout safety-nets on both sides (worker IPC send, master shutdown) follow the existing restartWorkers.ts pattern correctly.

Two minor items (left as inline comments)

  1. configBuilder.ts — redundant Number.isFinite check (validatePort): Number.isInteger already rejects NaN / ±Infinity, so the isFinite guard is dead code.

  2. master.ts — scheduled-restart fork in the IPC-arrival window: The isScheduledRestart branch forks synchronously without a setImmediate buffer. If a scheduled-restart exit fires in the same event-loop turn as a startup-failure IPC arrives, the abort flag isn't set yet and a spurious replacement worker is forked. It will fail with the same error and the master still shuts down cleanly — so it's not a correctness bug, just one unnecessary extra fork in a very unlikely scenario.

Everything else looks good

  • The ordered registration (on('message') → fork × N → on('exit')) is intentional and verified by the test.
  • The dual-flag pattern (isAbortingForStartupFailure + fatalStartupFailure) keeps TypeScript nullability happy without a non-null assertion while short-circuiting the type-guard on every ordinary IPC message post-abort.
  • The doExit once-flag, the IPC-send timeout, and the hard-deadline shutdown timer all guard against stuck processes correctly.
  • Test cases cover the key scenarios: EADDRINUSE message, generic failure, multiple-worker failures, exit-before-IPC race, malformed messages, and runtime crashes still reforking.
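The dual-flag pattern called out above can be sketched in isolation. The names follow the review's description, but the handler wiring is simplified and the message shape is illustrative:

```typescript
interface StartupFailure {
  workerId: number;
  error: string;
}

// Dual-flag pattern: a cheap boolean for short-circuiting plus a typed
// payload, kept together so the invariant "boolean true implies payload
// non-null" holds without a non-null assertion at use sites.
const state = {
  isAbortingForStartupFailure: false,
  fatalStartupFailure: null as StartupFailure | null,
};

function onWorkerMessage(workerId: number, msg: unknown): void {
  // The boolean check short-circuits every ordinary IPC message post-abort
  // before any structural validation of the payload runs.
  if (state.isAbortingForStartupFailure) return;
  const m = msg as { type?: string; error?: string } | null;
  if (m && m.type === 'WORKER_STARTUP_FAILURE') {
    // Both flags set together in the same handler: the first failure wins.
    state.isAbortingForStartupFailure = true;
    state.fatalStartupFailure = { workerId, error: String(m.error) };
  }
}

onWorkerMessage(1, { type: 'WORKER_STARTUP_FAILURE', error: 'EADDRINUSE' });
onWorkerMessage(2, { type: 'WORKER_STARTUP_FAILURE', error: 'later failure' }); // ignored: already aborting
```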

}

function validatePort(port: number): string | null {
if (!Number.isInteger(port) || !Number.isFinite(port) || port < 0 || port > 65535) {

!Number.isFinite(port) is redundant here — Number.isInteger already returns false for NaN, Infinity, and -Infinity, so the finite check can never add a failure that isInteger didn't already catch.

Suggested change
-  if (!Number.isInteger(port) || !Number.isFinite(port) || port < 0 || port > 65535) {
+  if (!Number.isInteger(port) || port < 0 || port > 65535) {

Comment on lines +46 to +57
const doExit = (sendErr?: Error | null) => {
if (exited) return;
exited = true;
if (sendErr) log.error({ err: sendErr }, 'Failed to send startup failure message to master');
exitFn(1);
};
sendFn(startupFailure, undefined, undefined, doExit);
// Safety net: if the IPC channel is half-broken the callback may never
// fire, leaving this worker alive indefinitely. Force exit after a timeout.
const IPC_SEND_TIMEOUT_MS = 2000;
const timer = setTimeout(() => doExit(), IPC_SEND_TIMEOUT_MS);
if (typeof timer.unref === 'function') timer.unref();

The timeout timer is never cleared when doExit fires via the IPC callback in the normal path. In production this is harmless (the process exits before the timer fires), and in tests the exited guard makes doExit idempotent — but clearing the timer explicitly inside doExit would be cleaner and avoids leaving a dangling handle:

Suggested change
-  const doExit = (sendErr?: Error | null) => {
-    if (exited) return;
-    exited = true;
-    if (sendErr) log.error({ err: sendErr }, 'Failed to send startup failure message to master');
-    exitFn(1);
-  };
-  sendFn(startupFailure, undefined, undefined, doExit);
-  // Safety net: if the IPC channel is half-broken the callback may never
-  // fire, leaving this worker alive indefinitely. Force exit after a timeout.
-  const IPC_SEND_TIMEOUT_MS = 2000;
-  const timer = setTimeout(() => doExit(), IPC_SEND_TIMEOUT_MS);
-  if (typeof timer.unref === 'function') timer.unref();
+  let exited = false;
+  let timer: NodeJS.Timeout | undefined;
+  const doExit = (sendErr?: Error | null) => {
+    if (exited) return;
+    exited = true;
+    clearTimeout(timer);
+    if (sendErr) log.error({ err: sendErr }, 'Failed to send startup failure message to master');
+    exitFn(1);
+  };
+  sendFn(startupFailure, undefined, undefined, doExit);
+  // Safety net: if the IPC channel is half-broken the callback may never
+  // fire, leaving this worker alive indefinitely. Force exit after a timeout.
+  const IPC_SEND_TIMEOUT_MS = 2000;
+  timer = setTimeout(() => doExit(), IPC_SEND_TIMEOUT_MS);
+  if (typeof timer.unref === 'function') timer.unref();

Comment on lines 139 to +143
if (worker.isScheduledRestart) {
log.info('Restarting worker #%d on schedule', worker.id);
} else {
cluster.fork();
return;
}

The isScheduledRestart branch forks synchronously without the setImmediate deferred check that the runtime-crash branch uses. This creates an asymmetric race window: if a scheduled-restart exit fires while a WORKER_STARTUP_FAILURE IPC message is in-flight (sent but not yet processed), the master will fork a new replacement worker before setting the abort flag.

The new worker will likely fail with the same error and exit, triggering the abort on its exit — so this isn't a correctness failure (the master will still eventually abort), but it does transiently over-fork by one worker during the startup-failure shutdown sequence.

To close the window consistently, apply the same setImmediate deferral here:

Suggested change
-  if (worker.isScheduledRestart) {
-    log.info('Restarting worker #%d on schedule', worker.id);
-  } else {
-    cluster.fork();
-    return;
-  }
+  if (worker.isScheduledRestart) {
+    setImmediate(() => {
+      if (abortForStartupFailure()) return;
+      log.info('Restarting worker #%d on schedule', worker.id);
+      cluster.fork();
+    });
+    return;
+  }

The existing test 'does not refork a scheduled-restart worker when aborting for startup failure' already asserts the desired outcome for the synchronous case; adding a parallel async test (IPC arrives after the exit event) would cover this path.


claude Bot commented Apr 6, 2026

Code Review

Overall this is a clean, well-motivated fix with good test coverage. The IPC-based signalling design is solid — namespaced message type, validated type guard, idempotent shutdown guard, and a fallback hard-deadline timer. A few specific findings below.

Issues found

1. Scheduled-restart branch has the same IPC race as the runtime-crash branch (medium)
See inline comment on master.ts:139-143. The runtime-crash branch correctly defers its fork decision via setImmediate so an in-flight WORKER_STARTUP_FAILURE IPC has one event-loop tick to arrive. The isScheduledRestart branch forks synchronously, meaning a simultaneous scheduled-restart exit + startup-failure IPC can transiently produce one extra worker before shutdown. The master will still abort correctly on the extra worker's exit, but the asymmetry is worth closing.

2. Dangling timer reference in startupErrorHandler.ts (minor)
The setTimeout fallback timer is never cleared when doExit fires via the normal IPC callback path. The exited guard makes it a safe no-op, but explicitly clearing the timer inside doExit is cleaner. See inline comment with suggested fix.

3. Redundant Number.isFinite in validatePort (trivial)
Number.isInteger already returns false for NaN / ±Infinity, so the isFinite check is dead code. One-liner fix in the inline comment.

What looks good

  • Registration order: cluster.on('message') is registered before the initial cluster.fork() calls, so no startup-failure IPC can be missed.
  • hasInitiatedShutdown guard: prevents errorReporter.message and cluster.disconnect from being called more than once even when multiple workers exit during the abort window.
  • Type guard validation: isWorkerStartupFailureMessage validates the full message shape (including port range) so a malformed or spoofed IPC message cannot trigger an unintended abort.
  • Test coverage: the new test files are thorough — they cover the happy path, the IPC-before-exit and exit-before-IPC orderings, duplicate exit handling, malformed messages, timeout fallbacks, and the double-exit guard.


@cursor cursor Bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.

Reviewed by Cursor Bugbot for commit 97784b9. Configure here.

}

function validatePort(port: number): string | null {
if (!Number.isInteger(port) || !Number.isFinite(port) || port < 0 || port > 65535) {

Dead condition in port validation function

Low Severity

The validatePort condition !Number.isInteger(port) || !Number.isFinite(port) contains a dead sub-expression. Number.isInteger already implies Number.isFinite — every integer is finite, and Number.isInteger returns false for NaN, Infinity, and -Infinity. So when !Number.isInteger(port) is false (the only time !Number.isFinite is even evaluated), port is guaranteed to be finite, making !Number.isFinite(port) always false at that point. The !Number.isFinite check never affects the result and misleads readers into thinking there is an input class that Number.isInteger alone would miss.


The validatePort function uses Number.isInteger() which returns false
for string values like "3800". When users pass env-derived port values
(e.g. `port: env.RENDERER_PORT || 3800`), the env var is a string,
causing the validation to reject valid ports and crash the renderer.

This was the root cause of all Pro integration test failures in CI —
the node renderer exited immediately with "RENDERER_PORT must be an
integer between 0 and 65535. Received: 3800".

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
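The coercion fix described in that commit message could look roughly like the sketch below. The function name, return convention, and error handling here are illustrative, not the actual configBuilder code:

```typescript
// Coerce env-derived string ports before validating, so values like "3800"
// from process.env pass while genuinely invalid input is still rejected.
// Note: Number('') is 0, so an empty env var would coerce to port 0; the
// real config layer may want to treat empty strings as "unset" instead.
function coercePort(raw: unknown): number | null {
  const port = typeof raw === 'string' ? Number(raw) : raw;
  if (typeof port !== 'number' || !Number.isInteger(port) || port < 0 || port > 65535) {
    // Caller turns null into the "must be an integer between 0 and 65535" error.
    return null;
  }
  return port;
}
```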

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ca6ac0162f


return (
message.type === WORKER_STARTUP_FAILURE &&
message.stage === 'listen' &&
typeof message.host === 'string' &&

P1 Badge Handle undefined host in startup-failure message guard

This guard drops startup-failure IPC messages unless host is a string, but buildConfig can still propagate host: undefined from JS callers (for example host: process.env.RENDERER_HOST when unset). In that case, workers still send WORKER_STARTUP_FAILURE on app.listen errors, yet the master rejects the payload here and treats exits as ordinary crashes, re-entering the fork/crash loop this change is meant to prevent (notably on EADDRINUSE). Either normalize/validate host before workers start or allow missing host values in this guard.
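One way to allow a missing host, per the suggestion above, is to accept string | undefined in the type guard. This is a sketch only — the real isWorkerStartupFailureMessage validates more fields, and the message shape here is reconstructed from the review context:

```typescript
interface WorkerStartupFailureMessage {
  type: 'WORKER_STARTUP_FAILURE';
  stage: 'listen';
  host?: string; // may be undefined when e.g. RENDERER_HOST is unset
  port: number;
}

function isWorkerStartupFailureMessage(message: any): message is WorkerStartupFailureMessage {
  return (
    message != null &&
    message.type === 'WORKER_STARTUP_FAILURE' &&
    message.stage === 'listen' &&
    // Accept a missing host instead of silently dropping the message.
    (message.host === undefined || typeof message.host === 'string') &&
    Number.isInteger(message.port) &&
    message.port >= 0 &&
    message.port <= 65535
  );
}
```

The alternative the comment mentions — normalizing host to a concrete string before workers start — keeps the guard strict and may be the cleaner fix.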


}

function validatePort(port: number): string | null {
if (!Number.isInteger(port) || !Number.isFinite(port) || port < 0 || port > 65535) {

!Number.isFinite(port) is redundant here. Number.isInteger() already returns false for NaN, Infinity, and -Infinity, so the isFinite guard adds no coverage.

Suggested change
-  if (!Number.isInteger(port) || !Number.isFinite(port) || port < 0 || port > 65535) {
+  if (!Number.isInteger(port) || port < 0 || port > 65535) {

Comment on lines 139 to +143
if (worker.isScheduledRestart) {
log.info('Restarting worker #%d on schedule', worker.id);
} else {
cluster.fork();
return;
}

The setImmediate deferral that gives in-flight IPC messages one tick to arrive only applies to the non-scheduled-restart branch. If a scheduled-restart worker happens to exit in the same very-narrow window that a startup-failure IPC message is in-flight (i.e. isAbortingForStartupFailure is still false at the synchronous check on line 135 but the message arrives before the next tick), this branch will fork a new worker before the abort kicks in.

In practice this race is almost impossible — scheduled restarts fire minutes after startup — but it's worth a comment so future readers don't wonder why the parity is missing:

Suggested change
-  if (worker.isScheduledRestart) {
-    log.info('Restarting worker #%d on schedule', worker.id);
-  } else {
-    cluster.fork();
-    return;
-  }
+  if (worker.isScheduledRestart) {
+    log.info('Restarting worker #%d on schedule', worker.id);
+    // Scheduled restarts happen well after startup (order of minutes), so the
+    // startup-failure window has long closed; no setImmediate deferral needed.
+    cluster.fork();
+    return;
+  }

Comment on lines 126 to 131
for (let i = 0; i < workersCount; i += 1) {
cluster.fork();
}

// Listen for dying workers:
cluster.on('exit', (worker) => {

Minor ordering note: the exit listener is registered after workers are forked. A worker that fails synchronously and exits before the master reaches cluster.on('exit', …) (line 131) would be silently missed — no refork, no abort. In practice this is vanishingly unlikely (the forked process needs its own event loop to start and crash), but registering the listener before cluster.fork() (as is already done for cluster.on('message', …)) would make the intent clearer and eliminate the gap entirely.
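The gap described above is easy to demonstrate with a plain EventEmitter standing in for cluster (not the renderer's actual code):

```typescript
import { EventEmitter } from 'node:events';

const emitter = new EventEmitter();
const seen: string[] = [];

// An event fired before any listener exists is silently dropped...
emitter.emit('exit', 'early-worker');

// ...while a listener registered first observes every subsequent event.
emitter.on('exit', (id: string) => {
  seen.push(id);
});
emitter.emit('exit', 'late-worker');
```

In the real cluster case the fork and the listener registration happen in the same synchronous turn, so a child cannot actually emit in between — matching the "vanishingly unlikely" caveat — but registering the exit listener before cluster.fork() removes even the theoretical gap.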


claude Bot commented Apr 6, 2026

Review

The fix is well-structured. The core approach — workers send a typed IPC message before exiting, and the master arms an abort flag then defers classification of unexpected exits by one event-loop tick with setImmediate — is idiomatic Node.js cluster handling. The test coverage is thorough and covers the subtle ordering scenarios (message-before-exit, exit-before-message, duplicate exits, malformed messages).

Findings

One-line nit (configBuilder.ts)
!Number.isFinite(port) in validatePort is redundant — Number.isInteger already returns false for NaN and Infinity. See inline comment.

Design gap (master.ts — low risk)
The setImmediate deferral that lets in-flight IPC messages arrive before classifying an exit as a runtime crash is only applied to the non-scheduled-restart branch. If a scheduled-restart worker exited in the same narrow window a startup-failure message was in-flight, the master would fork once before aborting. As noted inline, this is practically impossible (scheduled restarts fire minutes after startup), but a short comment would make the asymmetry intentional rather than accidental.

Listener ordering (master.ts — cosmetic)
The cluster.on('exit', …) listener is registered after cluster.fork() calls, while cluster.on('message', …) is correctly registered before them. Moving the exit listener above the fork loop would be consistent and eliminate the theoretical (but unreachable in practice) gap where a worker exits before the listener is attached.

Summary

The logic is correct, the edge cases are handled, and the tests validate the key scenarios. The issues above are all minor. No blockers.

@justin808 justin808 merged commit 04db76e into main Apr 7, 2026
43 checks passed
@justin808 justin808 deleted the jg/fix-fork-loop-port-bind branch April 7, 2026 00:15
justin808 added a commit that referenced this pull request Apr 8, 2026
…e-dedup

* origin/main: (51 commits)
  Fix Content-Length mismatch and null renderingRequest errors in node renderer (#3069)
  Improve memory debugging docs with simpler heap snapshot approach (#3072)
  Enforce strict version matching in doctor and recommend doctor on errors (#3070)
  Remove immediate_hydration feature from codebase (#2834)
  Fix infinite fork loop when node renderer worker fails to bind port (#2881)
  Fix TanStack Router SSR hydration mismatch in Pro async path (#2932)
  Improve node renderer error messages for malformed render requests (#3068)
  Add interactive mode prompt to create-react-on-rails-app (#3063)
  docs: replace hardcoded version numbers with unversioned install commands (#2893)
  Remove Contributing section from docs sidebar (#3064)
  Consolidate docs comparison pages into single evaluation entry (#3065)
  Docs route cleanup: canonicalize worst verbose URL slugs (#3067)
  Add agent summary blocks to high-value docs pages (#3066)
  Fix brittle positional assertions in create-app tests (#2923)
  Auto-resolve renderer password from ENV in Rails Pro (#2921)
  Standardize bundle env vars from =yes to =true (#2925)
  Fix Pro generator multiline and template-literal rewrites (#2918)
  Fix spec/dummy Procfile.dev SERVER_BUNDLE_ONLY to match template convention (#2922)
  Add llms docs entry points (#2916)
  Bump version to 16.6.0.rc.0
  ...

# Conflicts:
#	packages/create-react-on-rails-app/src/create-app.ts
justin808 added a commit that referenced this pull request Apr 8, 2026
…lacement-2835-main

* origin/main: (44 commits)
  Fix Content-Length mismatch and null renderingRequest errors in node renderer (#3069)
  Improve memory debugging docs with simpler heap snapshot approach (#3072)
  Enforce strict version matching in doctor and recommend doctor on errors (#3070)
  Remove immediate_hydration feature from codebase (#2834)
  Fix infinite fork loop when node renderer worker fails to bind port (#2881)
  Fix TanStack Router SSR hydration mismatch in Pro async path (#2932)
  Improve node renderer error messages for malformed render requests (#3068)
  Add interactive mode prompt to create-react-on-rails-app (#3063)
  docs: replace hardcoded version numbers with unversioned install commands (#2893)
  Remove Contributing section from docs sidebar (#3064)
  Consolidate docs comparison pages into single evaluation entry (#3065)
  Docs route cleanup: canonicalize worst verbose URL slugs (#3067)
  Add agent summary blocks to high-value docs pages (#3066)
  Fix brittle positional assertions in create-app tests (#2923)
  Auto-resolve renderer password from ENV in Rails Pro (#2921)
  Standardize bundle env vars from =yes to =true (#2925)
  Fix Pro generator multiline and template-literal rewrites (#2918)
  Fix spec/dummy Procfile.dev SERVER_BUNDLE_ONLY to match template convention (#2922)
  Add llms docs entry points (#2916)
  Bump version to 16.6.0.rc.0
  ...

# Conflicts:
#	CHANGELOG.md
@justin808 justin808 mentioned this pull request Apr 8, 2026
2 tasks
justin808 added a commit that referenced this pull request Apr 8, 2026
### Summary

Adds missing changelog entries to the `[Unreleased]` section for PRs
merged since `v16.6.0.rc.0`. Analyzed 19 commits on `origin/main` since
the last tag, identified 1 missing user-visible PR (#3069), and skipped
10 non-user-visible PRs (docs, tests, internal refactors).

**New entries added:**
- **Added**: `[Pro] Configurable HTTP keep-alive timeout for node
renderer connections` (PR #3069)
- **Fixed**: `[Pro] Fixed SSR failures from stale persistent HTTP/2
connections` (PR #3069)

**Entries already present** (added in prior changelog updates): PRs
#2834, #2881, #2918, #2921, #2923, #2932, #3063, #3068, #3070.

**Skipped** (not user-visible): #2893 (docs), #2916 (docs), #2922 (test
fix), #2923 (test fix), #2925 (internal refactor), #3064 (docs), #3065
(docs), #3066 (docs), #3067 (docs), #3072 (docs).

### Pull Request checklist

- [x] ~Add/update test to cover these changes~
- [x] ~Update documentation~
- [x] Update CHANGELOG file

### Other Information

No code changes — CHANGELOG.md only.

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> **Low Risk**
> Low risk because this PR only changes documentation (`CHANGELOG.md`)
and does not modify runtime code.
> 
> **Overview**
> Updates `CHANGELOG.md` *[Unreleased]* to include missing Pro release
notes for PR `#3069`, documenting the new
`renderer_http_keep_alive_timeout` config and the associated fix for SSR
failures from stale persistent HTTP/2 connections to the node renderer.
> 
> <sup>Reviewed by [Cursor Bugbot](https://cursor.com/bugbot) for commit
8d1a480. Bugbot is set up for automated
code reviews on this repo. Configure
[here](https://www.cursor.com/dashboard/bugbot).</sup>
<!-- /CURSOR_SUMMARY -->

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **New Features**
* Interactive mode selection prompt for CLI tool when no explicit mode
is specified
* Configurable keep-alive timeout setting for Pro users (default 30
seconds)

* **Bug Fixes**
  * Enhanced validation and error handling for invalid request payloads
  * Improved template literal handling in code generation
  * Better HTTP connection stability with enhanced diagnostic messaging

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
justin808 added a commit that referenced this pull request Apr 12, 2026
…2881)

## Summary

When a worker fails during app.listen() (e.g. EADDRINUSE), the master
previously reforked unconditionally, causing an infinite fork/crash
loop.

Workers now send a WORKER_STARTUP_FAILURE IPC message to the master
before exiting. The master sets an abort flag and exits with a clear
error message instead of reforking. Scheduled restarts and runtime
crashes continue to refork as before.

Based on #2843 by
@Yassa-hue — recreated on the main repo (not a fork) so CI runs with
full secrets.

Closes #2843

fix issue: shakacode/react_on_rails_pro#582

## Test plan

- [ ] Verify new unit tests pass (`masterStartupFailure.test.ts`,
`workerStartupFailure.test.ts`)
- [ ] Confirm EADDRINUSE scenario aborts master instead of infinite
reforking
- [ ] Confirm scheduled restarts and runtime crashes still refork as
before

🤖 Generated with [Claude Code](https://claude.com/claude-code)

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> **Medium Risk**
> Touches clustered process lifecycle and shutdown behavior in the Pro
Node renderer; a mistake could cause premature master exits or missed
reforks in edge cases despite added test coverage.
> 
> **Overview**
> Prevents the Pro Node renderer master process from entering an
infinite refork loop when a worker fails during `app.listen()` (e.g.,
`EADDRINUSE`). Workers now report a structured `WORKER_STARTUP_FAILURE`
IPC message before exiting, and the master captures the first failure,
disconnects workers, and exits with a clear error instead of reforking.
> 
> Also adds `port` coercion + TCP-range validation in `configBuilder`,
and introduces focused unit tests covering startup-failure messaging,
master abort semantics, and the “wait one tick” behavior to distinguish
startup failures from runtime crashes.
> 
> <sup>Reviewed by [Cursor Bugbot](https://cursor.com/bugbot) for commit
ca6ac01. Bugbot is set up for automated
code reviews on this repo. Configure
[here](https://www.cursor.com/dashboard/bugbot).</sup>
<!-- /CURSOR_SUMMARY -->

---------

Co-authored-by: Yassa-hue <yassatamer99@gmail.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
justin808 added a commit that referenced this pull request Apr 12, 2026
…2881)

## Summary

When a worker fails during app.listen() (e.g. EADDRINUSE), the master
previously reforked unconditionally, causing an infinite fork/crash
loop.

Workers now send a WORKER_STARTUP_FAILURE IPC message to the master
before exiting. The master sets an abort flag and exits with a clear
error message instead of reforking. Scheduled restarts and runtime
crashes continue to refork as before.

Based on #2843 by
@Yassa-hue — recreated on the main repo (not a fork) so CI runs with
full secrets.

Closes #2843

fix issue: shakacode/react_on_rails_pro#582

## Test plan

- [ ] Verify new unit tests pass (`masterStartupFailure.test.ts`,
`workerStartupFailure.test.ts`)
- [ ] Confirm EADDRINUSE scenario aborts master instead of infinite
reforking
- [ ] Confirm scheduled restarts and runtime crashes still refork as
before

🤖 Generated with [Claude Code](https://claude.com/claude-code)

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> **Medium Risk**
> Touches clustered process lifecycle and shutdown behavior in the Pro
Node renderer; a mistake could cause premature master exits or missed
reforks in edge cases despite added test coverage.
> 
> **Overview**
> Prevents the Pro Node renderer master process from entering an
infinite refork loop when a worker fails during `app.listen()` (e.g.,
`EADDRINUSE`). Workers now report a structured `WORKER_STARTUP_FAILURE`
IPC message before exiting, and the master captures the first failure,
disconnects workers, and exits with a clear error instead of reforking.
> 
> Also adds `port` coercion + TCP-range validation in `configBuilder`,
and introduces focused unit tests covering startup-failure messaging,
master abort semantics, and the “wait one tick” behavior to distinguish
startup failures from runtime crashes.
> 
> <sup>Reviewed by [Cursor Bugbot](https://cursor.com/bugbot) for commit
ca6ac01. Bugbot is set up for automated
code reviews on this repo. Configure
[here](https://www.cursor.com/dashboard/bugbot).</sup>
<!-- /CURSOR_SUMMARY -->
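
The `port` coercion and TCP-range validation mentioned in the note might look roughly like this; the function name and exact error wording are illustrative assumptions, and the accepted range 1..65535 (excluding the "any port" value 0) is one reasonable reading of "TCP-range validation":

```javascript
// Coerce a configured port (often a string from an env var) to a number
// and reject values outside the valid TCP port range.
function coercePort(value) {
  const port = Number(value);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid renderer port: ${JSON.stringify(value)}`);
  }
  return port;
}
```

Failing fast here turns a misconfigured port into one clear config error instead of a worker crash that the cluster machinery has to classify.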

---------

Co-authored-by: Yassa-hue <yassatamer99@gmail.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
justin808 added a commit that referenced this pull request Apr 12, 2026
### Summary

Adds missing changelog entries to the `[Unreleased]` section for PRs
merged since `v16.6.0.rc.0`. Analyzed 19 commits on `origin/main` since
the last tag, identified 1 missing user-visible PR (#3069), and skipped
10 non-user-visible PRs (docs, tests, internal refactors).

**New entries added:**
- **Added**: `[Pro] Configurable HTTP keep-alive timeout for node
renderer connections` (PR #3069)
- **Fixed**: `[Pro] Fixed SSR failures from stale persistent HTTP/2
connections` (PR #3069)

**Entries already present** (added in prior changelog updates): PRs
#2834, #2881, #2918, #2921, #2923, #2932, #3063, #3068, #3070.

**Skipped** (not user-visible): #2893 (docs), #2916 (docs), #2922 (test
fix), #2923 (test fix), #2925 (internal refactor), #3064 (docs), #3065
(docs), #3066 (docs), #3067 (docs), #3072 (docs).

### Pull Request checklist

- [x] ~Add/update test to cover these changes~
- [x] ~Update documentation~
- [x] Update CHANGELOG file

### Other Information

No code changes — CHANGELOG.md only.

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> **Low Risk**
> Low risk because this PR only changes documentation (`CHANGELOG.md`)
and does not modify runtime code.
> 
> **Overview**
> Updates `CHANGELOG.md` *[Unreleased]* to include missing Pro release
notes for PR `#3069`, documenting the new
`renderer_http_keep_alive_timeout` config and the associated fix for SSR
failures from stale persistent HTTP/2 connections to the node renderer.
> 
> <sup>Reviewed by [Cursor Bugbot](https://cursor.com/bugbot) for commit
8d1a480. Bugbot is set up for automated
code reviews on this repo. Configure
[here](https://www.cursor.com/dashboard/bugbot).</sup>
<!-- /CURSOR_SUMMARY -->

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **New Features**
* Interactive mode selection prompt for CLI tool when no explicit mode
is specified
* Configurable keep-alive timeout setting for Pro users (default 30
seconds)

* **Bug Fixes**
  * Enhanced validation and error handling for invalid request payloads
  * Improved template literal handling in code generation
  * Better HTTP connection stability with enhanced diagnostic messaging

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>