Skip to content

fix(onboard): surface NEMOCLAW_DASHBOARD_PORT/GATEWAY_PORT override in preflight error#2578

Merged
ericksoa merged 1 commit into
NVIDIA:mainfrom
latenighthackathon:fix/onboard-port-override-discoverability
May 5, 2026
Merged

fix(onboard): surface NEMOCLAW_DASHBOARD_PORT/GATEWAY_PORT override in preflight error#2578
ericksoa merged 1 commit into
NVIDIA:mainfrom
latenighthackathon:fix/onboard-port-override-discoverability

Conversation

@latenighthackathon
Copy link
Copy Markdown
Contributor

@latenighthackathon latenighthackathon commented Apr 28, 2026

Summary

Closes #2497.

When nemoclaw onboard aborts at preflight because port 8080 or 18789 is held by an unrelated process, the error currently lists the conflicting process and tells the user to stop it (sudo kill <PID>) — but offers no way to keep the conflicting process and onboard on a different port. The override env vars NEMOCLAW_GATEWAY_PORT and NEMOCLAW_DASHBOARD_PORT have always been honored by parsePort() in src/lib/ports.ts; they were just undocumented and undiscoverable from the error message.

Problem

@BryanTegomoh hit this on a clean install where an unrelated dev server was already on 18789. Their report (#2497) lists three acceptable fixes; this PR implements option 2 ("Honor a NEMOCLAW_DASHBOARD_PORT env var and document it in the preflight error") — the smallest possible change that unblocks the user, since the env var already works and just needs surfacing.

The original error path (after my change):

  !! Port 18789 is not available.
     NemoClaw dashboard needs this port.

     Blocked by: node (PID 12345)

     To fix, stop the conflicting process:

       sudo kill 12345

     Or rerun with a different port:
       NEMOCLAW_DASHBOARD_PORT=<port> nemoclaw onboard

     Detail: ...

The new "Or rerun with a different port" hint is generated from a per-port envVar field threaded through the existing requiredPorts list, so the error message picks the correct override variable for each port (NEMOCLAW_GATEWAY_PORT for 8080 / NEMOCLAW_DASHBOARD_PORT for 18789) without any string matching.

Test plan

  • Unit: 209 onboard + preflight tests pass (npx vitest run --project cli --testTimeout=180000 test/onboard.test.ts src/lib/preflight.test.ts).
  • E2E: extended test/e2e/e2e-cloud-experimental/test-port8080-conflict.sh to assert the new NEMOCLAW_GATEWAY_PORT=<port> nemoclaw onboard hint appears in onboard output, locking the behavior in for future refactors.
  • Manual: confirmed NEMOCLAW_DASHBOARD_PORT=29999 node resolves DASHBOARD_PORT to 29999 via parsePort() (env var was already wired — only the error-message discoverability was missing).
  • Build clean, signed commit, DCO ✅.

Signed-off-by: latenighthackathon latenighthackathon@users.noreply.github.com

Summary by CodeRabbit

  • New Features

    • Improved onboarding port-conflict messaging: when a required port is in use, the tool now includes a copy-paste ready environment-variable override hint to retry with a different port.
  • Tests

    • End-to-end test updated to verify the new override-hint appears in port-conflict failure output.

@copy-pr-bot
Copy link
Copy Markdown

copy-pr-bot Bot commented Apr 28, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 28, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 22bcc368-4af3-4551-9594-4b32eb77fcf2

📥 Commits

Reviewing files that changed from the base of the PR and between 556457e and 37e0d02.

📒 Files selected for processing (1)
  • src/lib/onboard.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/lib/onboard.ts

📝 Walkthrough

Walkthrough

Preflight port validation now attaches an envVar name to each required port and, when a port is in use, prints a copy‑paste hint showing the environment variable override (e.g., NEMOCLAW_GATEWAY_PORT=<port> nemoclaw onboard) alongside existing conflict diagnostics.

Changes

Preflight + Test

Layer / File(s) Summary
Data Shape
src/lib/onboard.ts
requiredPorts entries now include envVar (e.g., NEMOCLAW_GATEWAY_PORT / NEMOCLAW_DASHBOARD_PORT).
Core Check Logic
src/lib/onboard.ts
Port-check loop now destructures { port, label, envVar }; conflict handling appends an "Or rerun with a different port" hint using the envVar and port value.
Tests / Assertions
test/e2e/e2e-cloud-experimental/test-port8080-conflict.sh
E2E test extended to assert the onboarding failure output includes the new override hint (NEMOCLAW_GATEWAY_PORT=<port> nemoclaw onboard), failing the test if absent.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Poem

🐇 I sniffed the ports and found one blocked,
A gentle hint was added, no need to knock.
"Set my env and try again," it said with cheer,
I hopped, I clicked, onboarding drew near. 🥕

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title directly summarizes the main change: surfacing environment variable overrides (NEMOCLAW_DASHBOARD_PORT/GATEWAY_PORT) in the preflight error message, which is the core objective of the PR.
Linked Issues check ✅ Passed The PR fully implements option 2 from issue #2497: it honors and documents the NEMOCLAW_DASHBOARD_PORT and NEMOCLAW_GATEWAY_PORT environment variables and surfaces them in the preflight error output.
Out of Scope Changes check ✅ Passed All changes are directly scoped to the requirements: adding envVar field to requiredPorts data structure and updating error messaging to include override hints. No unrelated modifications detected.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

@wscurran wscurran added bug Something isn't working NemoClaw CLI Use this label to identify issues with the NemoClaw command-line interface (CLI). labels Apr 28, 2026
@wscurran
Copy link
Copy Markdown
Contributor

✨ Thanks for submitting this PR that identifies a bug in the onboard process and proposes a change to improve the user experience by surfacing port override options. This proposes an enhancement to the NemoClaw CLI to allow specifying alternative ports, which will improve the usability of the sandbox resource.


Related open issues:

@latenighthackathon latenighthackathon force-pushed the fix/onboard-port-override-discoverability branch from 7365139 to 556457e Compare April 28, 2026 15:31
@wscurran wscurran added status: rfr Ready for review — no conflicts, awaiting maintainer review VDR Linked to VDR finding labels Apr 29, 2026
@jyaunches jyaunches self-requested a review April 29, 2026 22:43
@jyaunches jyaunches self-assigned this Apr 29, 2026
…n preflight error

When `nemoclaw onboard` aborts at preflight because port 8080 or 18789 is held
by an unrelated process, the error currently lists the conflicting process
and tells the user to stop it (`sudo kill <PID>`) — but offers no way to keep
the conflicting process and onboard on a different port. The override env vars
NEMOCLAW_GATEWAY_PORT / NEMOCLAW_DASHBOARD_PORT have always been honored by
parsePort() in src/lib/ports.ts; they were just undocumented and undiscoverable.

Thread the env-var name through the requiredPorts list and append a
"rerun with a different port" hint immediately before the existing Detail
line. Users blocked by an unrelated dev server on 18789 can now copy-paste
NEMOCLAW_DASHBOARD_PORT=<N> nemoclaw onboard without leaving the wizard.

Lock the new hint in via the existing port8080-conflict E2E so a future
refactor of the error block doesn't silently regress discoverability.

Closes NVIDIA#2497.

Signed-off-by: latenighthackathon <latenighthackathon@users.noreply.github.com>
@latenighthackathon latenighthackathon force-pushed the fix/onboard-port-override-discoverability branch from 556457e to 37e0d02 Compare May 5, 2026 02:08
Copy link
Copy Markdown
Contributor

@ericksoa ericksoa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed current head 37e0d02. This is a narrow preflight failure-path change: it only threads the existing port override env var names into the already-failing port-unavailable diagnostic and adds E2E coverage for the new hint. The existing NEMOCLAW_GATEWAY_PORT/NEMOCLAW_DASHBOARD_PORT parsing remains unchanged, and the success path for gateway start, dashboard allocation, and sandbox creation is not modified.

@ericksoa ericksoa merged commit 3fa5c2f into NVIDIA:main May 5, 2026
10 of 13 checks passed
@latenighthackathon latenighthackathon deleted the fix/onboard-port-override-discoverability branch May 6, 2026 01:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working NemoClaw CLI Use this label to identify issues with the NemoClaw command-line interface (CLI). status: rfr Ready for review — no conflicts, awaiting maintainer review VDR Linked to VDR finding

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Onboard] Preflight fails on fresh install when port 18789 is held by an unrelated process; no port override available

4 participants