Skip to content

fix(onboard): roll back orphaned sandbox when dashboard alloc fails (#2174 followup)#2627

Merged
cv merged 3 commits into
mainfrom
fix/2174-ghost-sandbox-rollback
Apr 29, 2026
Merged

fix(onboard): roll back orphaned sandbox when dashboard alloc fails (#2174 followup)#2627
cv merged 3 commits into
mainfrom
fix/2174-ghost-sandbox-rollback

Conversation

@cjagwani
Copy link
Copy Markdown
Contributor

@cjagwani cjagwani commented Apr 28, 2026

Summary

Followup to #2411 (merged). Closes the second half of #2174's title — "leaks ghost sandbox" — that the merged fix didn't address.

When findAvailableDashboardPort exhausts the port range it throws. By the time ensureDashboardForward runs (src/lib/onboard.ts:3830), the openshell sandbox has already been streamed into existence (streamSandboxCreate at line 3734, ready-wait at lines 3789–3803). The throw escapes unhandled, leaving the sandbox in openshell sandbox list with no NemoClaw registry entry — exactly the drift reported in #2174.

Issue comment with the analysis: #2174 (comment)

Related Issue

Followup to #2174 (closed by #2411). #2411 fixed the crash; this PR fixes the leak.

Changes

  • src/lib/onboard.tsensureDashboardForward takes a new { rollbackSandboxOnFailure?: boolean } option. On unrecoverable port-allocation failure (range exhaustion), the just-created openshell sandbox is deleted, an actionable error is printed, and the process exits with code 1 — mirroring the not-ready rollback pattern already in place at lines 3789–3803.
  • The post-create call site (line 3830) opts in (rollbackSandboxOnFailure: true).
  • The four reuse-path call sites (lines 3295, 3326, 3372, 3390) keep the default (false) — they operate on already-existing sandboxes where deleting on alloc failure would destroy user state.

Type of Change

  • Code change (feature, bug fix, or refactor)

Verification

  • npm run build:cli + npm run typecheck:cli green
  • npx vitest run --project cli — 2250/2250 passed
  • npx vitest run --project plugin — 410/410 passed
  • npx prek run --files src/lib/onboard.ts — green
  • No new tests added: the rollback path mirrors the existing not-ready rollback (3789–3803), which also has no dedicated unit test. Adding test infrastructure here would exceed the minimum scope of this followup.
  • No secrets, API keys, or credentials committed

Notes for reviewer

  • Diff is ~35 lines on a single file. Surgical fix matching an existing pattern in the same function.
  • Doc comment on ensureDashboardForward updated to describe the new option and reference the pattern it mirrors.

AI Disclosure

  • AI-assisted — tool: Claude Code

Signed-off-by: Charan Jagwani cjagwani@nvidia.com

Summary by CodeRabbit

  • Bug Fixes

    • Sandbox initialization now attempts automatic rollback if dashboard port allocation fails, and exits safely when rollback succeeds.
    • When automatic cleanup fails, users are shown clear manual-cleanup instructions and the exact command to remove the orphaned sandbox.
  • Tests

    • Added runtime tests verifying rollback messaging covers both successful and failed cleanup scenarios and accurate sandbox naming in instructions.

…2174)

#2411 (merged) addressed the crash half of #2174 — second onboard now
auto-allocates a free dashboard port instead of throwing — but the
"leaks ghost sandbox" half of the issue title is still live.

When findAvailableDashboardPort exhausts the port range, it throws.
ensureDashboardForward is called from createSandbox AFTER the openshell
sandbox has already been streamed-into-existence (line 3830, after
streamSandboxCreate at 3734 and the ready-wait at 3789-3803). The throw
escapes upward unhandled, leaving the openshell sandbox in
'openshell sandbox list' with no NemoClaw registry entry — exactly the
drift the original bug reported.

Add an opt-in 'rollbackSandboxOnFailure' option to ensureDashboardForward
that catches the throw, runs 'openshell sandbox delete <name>', emits an
actionable error, and process.exit(1) — matching the existing not-ready
rollback pattern at lines 3789-3803.

Only the post-create call site at line 3830 opts in. The four reuse-path
call sites operate on already-existing sandboxes where deleting on alloc
failure would destroy user state; they keep the default (false).

Signed-off-by: Charan Jagwani <cjagwani@nvidia.com>
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 28, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 032e3b33-6818-4a01-80c4-4e5472d41f9b

📥 Commits

Reviewing files that changed from the base of the PR and between 4be7f5f and ef9b0d5.

📒 Files selected for processing (2)
  • src/lib/onboard.ts
  • test/onboard-rollback.test.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • test/onboard-rollback.test.ts

📝 Walkthrough

Walkthrough

createSandbox now requests rollback from ensureDashboardForward. ensureDashboardForward accepts an options object, catches dashboard-port allocation errors, and—when rollback enabled—best-effort deletes the newly-created sandbox, prints retry or manual-cleanup guidance, and exits; when rollback disabled it rethrows the allocation error.

Changes

Cohort / File(s) Summary
Onboard runtime changes
src/lib/onboard.ts
createSandbox passes { rollbackSandboxOnFailure: true } into ensureDashboardForward. ensureDashboardForward gains an options argument, wraps findAvailableDashboardPort in try/catch, and—when rollback enabled—attempts a best-effort openshell sandbox delete (errors ignored), prints either a safe-retry message or manual cleanup instructions depending on delete success, and exits the process; when rollback disabled it rethrows the original error.
New helper export
src/lib/onboard.ts
Added and exported buildOrphanedSandboxRollbackMessage(sandboxName: string, err: unknown, deleteSucceeded: boolean): string[] to centralize rollback-path console message generation.
Tests
test/onboard-rollback.test.ts
Adds Vitest tests requiring the compiled helper from dist/lib/onboard, validates returned string-array lines for both successful and failed delete scenarios, checks non-Error inputs are stringified, and ensures the sandbox name is quoted exactly in manual-cleanup commands.

Estimated Code Review Effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

I’m a rabbit by the sandbox gate,
I nudge the shell and mend the state.
When ports revolt and errors sting,
I tidy up and softly sing. 🐇✨

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title directly and specifically describes the main change: adding rollback functionality for orphaned sandboxes when dashboard port allocation fails, with clear reference to the follow-up issue context.
Docstring Coverage ✅ Passed Docstring coverage is 83.33% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/2174-ghost-sandbox-rollback

Comment @coderabbitai help to get the list of available commands and usage tips.

Extract the rollback error-line construction into a pure helper
buildOrphanedSandboxRollbackMessage so the rollback path is testable
without spawning subprocesses or exiting the process.

Add test/onboard-rollback.test.ts with 4 cases:
- successful delete → "removed — you can safely retry" line
- failed delete → manual-cleanup guidance with the exact openshell
  command for the orphaned sandbox name
- non-Error throwables coerced via String() so the alloc message survives
- arbitrary sandbox names round-trip into the cleanup command verbatim

ensureDashboardForward still drives runOpenshell + console.error +
process.exit (untestable from outside the module without mocking
subprocess); the helper isolates the deterministic part where the
behavior actually lives.

Signed-off-by: Charan Jagwani <cjagwani@nvidia.com>
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
src/lib/onboard.ts (1)

6185-6202: Align the JSDoc blocks with the actual function targets.

The rollback behavior docs for ensureDashboardForward are currently separated by another JSDoc block; attach each block directly to its function to avoid doc drift.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/lib/onboard.ts` around lines 6185 - 6202, The JSDoc paragraphs are
misaligned: the rollback behavior doc meant for ensureDashboardForward is
separated by another comment block and thus not attached to its function; move
the JSDoc that documents rollbackSandboxOnFailure/rollback behavior so it sits
immediately above the ensureDashboardForward function declaration, and likewise
place the doc for "Build the actionable error lines" directly above the pure
function that builds the error lines (the function that takes sandboxName,
alloc-error, delete-result), ensuring each JSDoc block directly precedes its
corresponding function (ensureDashboardForward and the error-line builder) to
prevent doc drift.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@test/onboard-rollback.test.ts`:
- Line 21: Replace the CommonJS require call that assigns loadedOnboardInternals
with a top-level ESM import; remove the require("../dist/lib/onboard") usage and
add an import (e.g., import * as loadedOnboardInternals from
"../dist/lib/onboard") at the top of test/onboard-rollback.test.ts so the test
file complies with the project ESM requirements and all usages of
loadedOnboardInternals continue to work.

---

Nitpick comments:
In `@src/lib/onboard.ts`:
- Around line 6185-6202: The JSDoc paragraphs are misaligned: the rollback
behavior doc meant for ensureDashboardForward is separated by another comment
block and thus not attached to its function; move the JSDoc that documents
rollbackSandboxOnFailure/rollback behavior so it sits immediately above the
ensureDashboardForward function declaration, and likewise place the doc for
"Build the actionable error lines" directly above the pure function that builds
the error lines (the function that takes sandboxName, alloc-error,
delete-result), ensuring each JSDoc block directly precedes its corresponding
function (ensureDashboardForward and the error-line builder) to prevent doc
drift.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 2f910e7e-fc67-448d-96b6-2fa3c82dd0cf

📥 Commits

Reviewing files that changed from the base of the PR and between 7df3016 and 4be7f5f.

📒 Files selected for processing (2)
  • src/lib/onboard.ts
  • test/onboard-rollback.test.ts

);
}

const loadedOnboardInternals = require("../dist/lib/onboard");
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot Apr 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify test files under test/ are ESM-only (no CommonJS require/module.exports).
rg -n --glob 'test/**/*.{js,ts}' '\brequire\s*\(|\bmodule\.exports\b|\bexports\.'

Repository: NVIDIA/NemoClaw

Length of output: 33865


Replace CommonJS require with ESM import in this test.

Line 21 uses require(), which breaks the test-folder ESM requirement per coding guidelines.

Proposed fix
 import { describe, it, expect } from "vitest";
+import * as loadedOnboardInternals from "../dist/lib/onboard";
@@
-const loadedOnboardInternals = require("../dist/lib/onboard");
 const onboardInternals =
   typeof loadedOnboardInternals === "object" && loadedOnboardInternals !== null
     ? loadedOnboardInternals
     : null;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/onboard-rollback.test.ts` at line 21, Replace the CommonJS require call
that assigns loadedOnboardInternals with a top-level ESM import; remove the
require("../dist/lib/onboard") usage and add an import (e.g., import * as
loadedOnboardInternals from "../dist/lib/onboard") at the top of
test/onboard-rollback.test.ts so the test file complies with the project ESM
requirements and all usages of loadedOnboardInternals continue to work.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pushing back on this. Tried the conversion in commit ef9b0d5 then reverted.

Adding export to buildOrphanedSandboxRollbackMessage in src/lib/onboard.ts (so the named ESM import would type-check) reshapes dist/lib/onboard.d.ts from unknown-shaped to a typed module. That breaks two other tests:

  • test/ssh-known-hosts.test.ts:27 — TS2345
  • test/wsl2-probe-timeout.test.ts:26,29 — TS2559 + TS2345

Both rely on the d.ts staying opaque so their runtime guards (isOnboardKnownHostsInternals, isOnboardValidationCandidate) type-narrow against OnboardX | typeof import("../dist/lib/onboard") | null. Once import("../dist/lib/onboard") has known properties, TS rejects "no properties in common" with the OnboardXCandidate shape and the call sites stop compiling.

Keeping require() here matches the convention already used for this same file by:

  • test/credential-rotation.test.ts:43
  • test/gateway-start-wait.test.ts:15
  • test/gemini-probe-auth.test.ts:14
  • test/ssh-known-hosts.test.ts
  • test/wsl2-probe-timeout.test.ts

The mixed pattern in test/ reflects whether the imported module uses per-function export (pure-function modules like policies, runner, resolve-openshell) or bottom-of-file module.exports = {...} with runtime-guarded internals (onboard). The CLAUDE.md ESM rule applies cleanly to the former; the latter would need a coordinated sweep of onboard.ts + the four neighbouring tests, which is out of scope for #2174's leak fix.

Documented this rationale inline in test/onboard-rollback.test.ts so the next reader doesn't re-litigate.

The other CR finding (JSDoc alignment) is fixed in the same commit.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cjagwani, that's a thorough and convincing explanation — withdrawing this comment. The cascading TS2345/TS2559 failures in ssh-known-hosts and wsl2-probe-timeout make clear that keeping dist/lib/onboard.d.ts opaque is load-bearing for the existing narrowing guards, and the coordinated sweep to fix all four sibling tests is correctly scoped out of this PR. Appreciate you documenting it inline too.


✏️ Learnings added
Learnt from: cjagwani
Repo: NVIDIA/NemoClaw PR: 2627
File: test/onboard-rollback.test.ts:27-27
Timestamp: 2026-04-28T16:54:08.508Z
Learning: In the NVIDIA/NemoClaw test suite, `require()` (CommonJS) is intentionally used—rather than an ESM `import * as`—when importing from `../dist/lib/onboard` in test files (e.g., `test/onboard-rollback.test.ts`, `test/credential-rotation.test.ts`, `test/gateway-start-wait.test.ts`, `test/gemini-probe-auth.test.ts`, `test/ssh-known-hosts.test.ts`, `test/wsl2-probe-timeout.test.ts`). The reason is that `src/lib/onboard.ts` uses a bottom-of-file `module.exports = { ... }` pattern, which keeps `dist/lib/onboard.d.ts` opaque (unknown-shaped). If a named `export` is added to `onboard.ts` (as required for a typed ESM import), TS gives the module known properties and breaks the "no properties in common" narrowing guards (`isOnboardKnownHostsInternals`, `isOnboardValidationCandidate`, etc.) in sibling tests with TS2345/TS2559 errors. The `require()` convention for this module is deliberate and should not be replaced with ESM imports without a coordinated sweep of `onboard.ts` and all four neighbouring test files.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.


🧠 Learnings used
Learnt from: laitingsheng
Repo: NVIDIA/NemoClaw PR: 2511
File: test/onboard-brave-validation.test.ts:45-48
Timestamp: 2026-04-27T09:53:20.358Z
Learning: In this repo’s test suite, prefer the established POSIX PATH separator `:` when constructing `process.env.PATH` in tests (e.g., `PATH: `${fakeBin}:${process.env.PATH || ""}``). Do not replace it with `path.delimiter` in these unit/integration tests, because they only run on Linux runners in CI here; `windows-latest` is limited to WSL e2e tests in `.github/workflows/wsl-e2e.yaml` and is already POSIX-compliant.

Address CodeRabbit feedback on #2627:

- src/lib/onboard.ts: move the ensureDashboardForward JSDoc (with the
  rollbackSandboxOnFailure docs) so it sits directly above the function.
  Inserting buildOrphanedSandboxRollbackMessage between the JSDoc and its
  function had stranded the doc above the wrong function.

- test/onboard-rollback.test.ts: add a comment explaining why the test
  uses require("../dist/lib/onboard") instead of named ESM import. CR
  suggested converting to ESM; tried it. Adding `export` to the helper
  in source breaks the d.ts shape that ssh-known-hosts.test.ts and
  wsl2-probe-timeout.test.ts depend on for runtime-guard type narrowing
  (TS2345/TS2559). The require() pattern is already the convention for
  this file (matches credential-rotation, gemini-probe-auth,
  ssh-known-hosts, wsl2-probe-timeout). Adopting CR's suggestion would
  require a sweep across those neighbours, which is out of scope here.

Signed-off-by: Charan Jagwani <cjagwani@nvidia.com>
@jyaunches jyaunches self-requested a review April 28, 2026 21:51
@wscurran wscurran added bug Something isn't working NemoClaw CLI Use this label to identify issues with the NemoClaw command-line interface (CLI). OpenShell Support for OpenShell, a safe, private runtime for autonomous AI agents dependencies Pull requests that update a dependency file Platform: All Applies to all platforms supported by NemoClaw labels Apr 29, 2026
@cjagwani cjagwani self-assigned this Apr 29, 2026
@cv cv merged commit 24e4ff3 into main Apr 29, 2026
18 checks passed
@wscurran wscurran added the VDR Linked to VDR finding label Apr 29, 2026
cv pushed a commit that referenced this pull request Apr 29, 2026
)

## Summary

Adds two regression assertions to Phase 4 of `test-double-onboard.sh`
that verify the #2174 auto-allocation fix works end-to-end:

1. **Log check** — Second-sandbox onboard output contains `"is taken.
Using port"` (confirms `ensureDashboardForward` auto-allocated)
2. **List check** — `nemoclaw list` shows two distinct `dashboard:
http://127.0.0.1:<port>` entries for both sandboxes

## Related

- Guards against regression of #2174 (fix landed via #2411 + #2627)
- Supersedes #2455 (which had a stale assertion string `"allocated
port"` that doesn't match any runtime output, and conflicting workflow
changes already on main)

## Changes

- `test/e2e/test-double-onboard.sh` — 21 lines added between the
existing #849 regression check and Phase 5

## Why this supersedes #2455

| Issue | #2455 | This PR |
|-------|-------|---------|
| Assertion string | `"allocated port"` (doesn't exist in codebase) |
`"is taken. Using port"` (matches `ensureDashboardForward` line ~6600) |
| Workflow changes | Adds a `double-onboard-e2e` job that already exists
on main → merge conflict | None needed — job already runs this script |
| Branch freshness | 95 commits behind main | Based on current main |

## Type of Change

- [x] Code change (feature, bug fix, or refactor)

## Verification

- [x] `bash -n test/e2e/test-double-onboard.sh` — syntax check passes
- [x] shellcheck clean
- [x] No workflow changes needed — `double-onboard-e2e` nightly job
already runs this script
- [x] No secrets, API keys, or credentials committed

## AI Disclosure
- [x] AI-assisted — tool: Claude (pi agent)

---
Signed-off-by: Julie Yaunches <jyaunches@nvidia.com>

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **Tests**
* Added end-to-end test assertions for multi-sandbox onboarding
scenarios
* Validates dashboard port auto-allocation prevents conflicts and
maintains isolation across concurrent sandbox environments

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

Signed-off-by: Julie Yaunches <jyaunches@nvidia.com>
DemianHeyGen pushed a commit to DemianHeyGen/NemoClaw that referenced this pull request Apr 30, 2026
…VIDIA#2174 followup) (NVIDIA#2627)

## Summary

Followup to NVIDIA#2411 (merged). Closes the second half of NVIDIA#2174's title —
"leaks ghost sandbox" — that the merged fix didn't address.

When `findAvailableDashboardPort` exhausts the port range it `throw`s.
By the time `ensureDashboardForward` runs (`src/lib/onboard.ts:3830`),
the openshell sandbox has already been streamed into existence
(`streamSandboxCreate` at line 3734, ready-wait at lines 3789–3803). The
throw escapes unhandled, leaving the sandbox in `openshell sandbox list`
with no NemoClaw registry entry — exactly the drift reported in NVIDIA#2174.

Issue comment with the analysis:
NVIDIA#2174 (comment)

## Related Issue

Followup to NVIDIA#2174 (closed by NVIDIA#2411). NVIDIA#2411 fixed the crash; this PR
fixes the leak.

## Changes

- `src/lib/onboard.ts` — `ensureDashboardForward` takes a new `{
rollbackSandboxOnFailure?: boolean }` option. On unrecoverable
port-allocation failure (range exhaustion), the just-created openshell
sandbox is deleted, an actionable error is printed, and the process
exits with code 1 — mirroring the not-ready rollback pattern already in
place at lines 3789–3803.
- The post-create call site (line 3830) opts in
(`rollbackSandboxOnFailure: true`).
- The four reuse-path call sites (lines 3295, 3326, 3372, 3390) keep the
default (`false`) — they operate on already-existing sandboxes where
deleting on alloc failure would destroy user state.

## Type of Change

- [x] Code change (feature, bug fix, or refactor)

## Verification

- [x] `npm run build:cli` + `npm run typecheck:cli` green
- [x] `npx vitest run --project cli` — 2250/2250 passed
- [x] `npx vitest run --project plugin` — 410/410 passed
- [x] `npx prek run --files src/lib/onboard.ts` — green
- [x] No new tests added: the rollback path mirrors the existing
not-ready rollback (3789–3803), which also has no dedicated unit test.
Adding test infrastructure here would exceed the minimum scope of this
followup.
- [x] No secrets, API keys, or credentials committed

## Notes for reviewer

- Diff is ~35 lines on a single file. Surgical fix matching an existing
pattern in the same function.
- Doc comment on `ensureDashboardForward` updated to describe the new
option and reference the pattern it mirrors.

## AI Disclosure

- [x] AI-assisted — tool: Claude Code

Signed-off-by: Charan Jagwani <cjagwani@nvidia.com>

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **Bug Fixes**
* Sandbox initialization now attempts automatic rollback if dashboard
port allocation fails, and exits safely when rollback succeeds.
* When automatic cleanup fails, users are shown clear manual-cleanup
instructions and the exact command to remove the orphaned sandbox.

* **Tests**
* Added runtime tests verifying rollback messaging covers both
successful and failed cleanup scenarios and accurate sandbox naming in
instructions.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: Charan Jagwani <cjagwani@nvidia.com>
DemianHeyGen pushed a commit to DemianHeyGen/NemoClaw that referenced this pull request Apr 30, 2026
…rd (NVIDIA#2709)

## Summary

Adds two regression assertions to Phase 4 of `test-double-onboard.sh`
that verify the NVIDIA#2174 auto-allocation fix works end-to-end:

1. **Log check** — Second-sandbox onboard output contains `"is taken.
Using port"` (confirms `ensureDashboardForward` auto-allocated)
2. **List check** — `nemoclaw list` shows two distinct `dashboard:
http://127.0.0.1:<port>` entries for both sandboxes

## Related

- Guards against regression of NVIDIA#2174 (fix landed via NVIDIA#2411 + NVIDIA#2627)
- Supersedes NVIDIA#2455 (which had a stale assertion string `"allocated
port"` that doesn't match any runtime output, and conflicting workflow
changes already on main)

## Changes

- `test/e2e/test-double-onboard.sh` — 21 lines added between the
existing NVIDIA#849 regression check and Phase 5

## Why this supersedes NVIDIA#2455

| Issue | NVIDIA#2455 | This PR |
|-------|-------|---------|
| Assertion string | `"allocated port"` (doesn't exist in codebase) |
`"is taken. Using port"` (matches `ensureDashboardForward` line ~6600) |
| Workflow changes | Adds a `double-onboard-e2e` job that already exists
on main → merge conflict | None needed — job already runs this script |
| Branch freshness | 95 commits behind main | Based on current main |

## Type of Change

- [x] Code change (feature, bug fix, or refactor)

## Verification

- [x] `bash -n test/e2e/test-double-onboard.sh` — syntax check passes
- [x] shellcheck clean
- [x] No workflow changes needed — `double-onboard-e2e` nightly job
already runs this script
- [x] No secrets, API keys, or credentials committed

## AI Disclosure
- [x] AI-assisted — tool: Claude (pi agent)

---
Signed-off-by: Julie Yaunches <jyaunches@nvidia.com>

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **Tests**
* Added end-to-end test assertions for multi-sandbox onboarding
scenarios
* Validates dashboard port auto-allocation prevents conflicts and
maintains isolation across concurrent sandbox environments

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

Signed-off-by: Julie Yaunches <jyaunches@nvidia.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working dependencies Pull requests that update a dependency file NemoClaw CLI Use this label to identify issues with the NemoClaw command-line interface (CLI). OpenShell Support for OpenShell, a safe, private runtime for autonomous AI agents Platform: All Applies to all platforms supported by NemoClaw VDR Linked to VDR finding

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants