fix(entrypoint): relax config permissions before write after CAP_DAC_OVERRIDE drop#2659
Conversation
…OVERRIDE drop (#2653) After PR #917 drops CAP_DAC_OVERRIDE via capsh, root can no longer write to 444-mode config files. The E2E test uses plain docker run (no no-new-privileges), so capsh succeeds and the capability is actually dropped — unlike production where OpenShell's no-new-privileges prevents the drop. Three entrypoint functions (apply_model_override, apply_cors_override, apply_slack_token_override) open openclaw.json for in-place writing, raising PermissionError (EACCES) under set -euo pipefail. Phase 1 — Permission fix: - Add relax_config_for_write() / lock_config_after_write() helpers in scripts/lib/sandbox-init.sh that chmod 644/444 with symlink guards - Wrap all three override functions with these helpers Phase 2 — Spurious trigger guard: - Tighten apply_model_override() entry guard to only fire when NEMOCLAW_MODEL_OVERRIDE or NEMOCLAW_INFERENCE_API_OVERRIDE is set - NEMOCLAW_CONTEXT_WINDOW, NEMOCLAW_MAX_TOKENS, NEMOCLAW_REASONING are promoted from Dockerfile ARGs to ENV and always present — they should only take effect alongside an explicit model override - Update E2E tests 3-5, 9-11, 14 to pair standalone parameters with NEMOCLAW_MODEL_OVERRIDE Phase 3 — Test error visibility: - Create timestamped log file matching CI artifact glob - Replace all 2>/dev/null with 2>>$LOG_FILE in test helpers - Append run_override_stderr output to main log file Fixes #2653 Signed-off-by: Julie Yaunches <jyaunches@nvidia.com>
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Enterprise Run ID: 📒 Files selected for processing (2)
📝 WalkthroughWalkthroughAdds two sandbox permission helper functions, uses them to safely relax and re-lock config file modes around runtime override writers, narrows model-override gating, and routes Docker/validation stderr into timestamped CI test logs. Changes
Sequence Diagram(s)sequenceDiagram
participant Starter as "nemoclaw-start.sh"
participant SandboxLib as "sandbox-init.sh helpers"
participant Writer as "inline Python writer"
participant FS as "filesystem (/sandbox/.openclaw/openclaw.json)"
participant Hasher as "SHA256 recompute"
Starter->>SandboxLib: relax_config_for_write(target file)
SandboxLib-->>Starter: success / failure (stderr [SECURITY] on violations)
alt relax succeeded
Starter->>Writer: invoke writer (apply_*_override)
Writer-->>Starter: exit status
alt writer succeeded
Starter->>Hasher: recompute SHA256 of FS file
Hasher-->>Starter: hash value / status
else writer failed
Starter-->>Starter: record non-zero write status
end
end
Starter->>SandboxLib: lock_config_after_write(target file)
SandboxLib-->>Starter: success / failure (stderr [SECURITY] on violations)
Starter-->>Caller: return captured status (0 or non-zero)
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~22 minutes Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Review rate limit: 9/10 reviews remaining, refill in 6 minutes. Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (1)
scripts/lib/sandbox-init.sh (1)
170-170: Consider propagating chmod failures.The
chmodcommands don't check their return status. If chmod fails (e.g., filesystem error, unexpected ownership), the function silently continues. While rare, this could lead to confusing behavior where the caller believes permissions were changed but they weren't.♻️ Optional: propagate chmod failures
relax_config_for_write() { local f for f in "$@"; do if [ -L "$f" ]; then printf '[SECURITY] Refusing to relax permissions — %s is a symlink\n' "$f" >&2 return 1 fi [ -f "$f" ] || continue - chmod 644 "$f" + if ! chmod 644 "$f"; then + printf '[SECURITY] Failed to relax permissions on %s\n' "$f" >&2 + return 1 + fi done } lock_config_after_write() { local f for f in "$@"; do if [ -L "$f" ]; then printf '[SECURITY] Refusing to lock permissions — %s is a symlink\n' "$f" >&2 return 1 fi [ -f "$f" ] || continue - chmod 444 "$f" + if ! chmod 444 "$f"; then + printf '[SECURITY] Failed to lock permissions on %s\n' "$f" >&2 + return 1 + fi done }Also applies to: 182-182
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/lib/sandbox-init.sh` at line 170, The chmod invocations (e.g., the line containing chmod 644 "$f" and the other chmod calls around the same area) currently ignore failures; update the surrounding code to check chmod's exit status and propagate errors instead of continuing silently — after each chmod, if it fails emit an explanatory error to stderr via the function's logger and return a non‑zero status (or exit non‑zero if the script is intended to terminate), so callers can detect and handle permission-change failures.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@scripts/nemoclaw-start.sh`:
- Around line 366-368: The relax_config_for_write call currently isn't paired
with a guaranteed re-lock; wrap the section that runs the Python override and
the subsequent hash recomputation so lock_config_after_write is always invoked
even on error by installing a trap immediately after relax_config_for_write
(similar to the pattern used around apply_model_override): set a trap to call
lock_config_after_write with the same arguments on EXIT (or ERR), run the Python
script and recompute_hash, then clear the trap once successful; apply the same
trap/clear pattern to the later relax_config_for_write block that surrounds the
388-390 operations so both blocks use lock_config_after_write as an error-safe
finally.
- Around line 276-278: After calling relax_config_for_write, ensure
lock_config_after_write is always invoked on failure or exit by installing a
trap (or explicit error handling) before running the Python script and hash
recomputation so that lock_config_after_write "$config_file" "$hash_file" runs
on ERR/EXIT; remove or disable the trap only after successful completion of the
Python script and the hash recomputation (and before the later chattr
+i/Landlock steps) to avoid leaving files at 644 if the Python script or hash
step fails; apply the same pattern around the second relax/lock pair (the
321–323 code path) so both relax_config_for_write/lock_config_after_write pairs
are guarded.
- Around line 447-449: The relax_config_for_write call that temporarily loosens
permissions for the Slack token override should be protected with a trap-based
cleanup so permissions are always re-locked even if the subsequent Python script
or hash recomputation fails: before calling relax_config_for_write (used for the
slack token override) register a trap that will call the inverse/relock routine
(undoing relax_config_for_write’s changes) on EXIT/ERR, then run the Python
script and hash recompute, and finally remove or replace the trap on successful
completion; apply the exact same trap-wrapping pattern to the other occurrence
of relax_config_for_write mentioned in the review.
---
Nitpick comments:
In `@scripts/lib/sandbox-init.sh`:
- Line 170: The chmod invocations (e.g., the line containing chmod 644 "$f" and
the other chmod calls around the same area) currently ignore failures; update
the surrounding code to check chmod's exit status and propagate errors instead
of continuing silently — after each chmod, if it fails emit an explanatory error
to stderr via the function's logger and return a non‑zero status (or exit
non‑zero if the script is intended to terminate), so callers can detect and
handle permission-change failures.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 0684cdf1-e5dc-4f13-93b8-28f83dc966fc
📒 Files selected for processing (3)
scripts/lib/sandbox-init.shscripts/nemoclaw-start.shtest/e2e/test-runtime-overrides.sh
| # Relax 444 → 644 so writes succeed after CAP_DAC_OVERRIDE is dropped (#2653) | ||
| relax_config_for_write "$config_file" "$hash_file" | ||
|
|
There was a problem hiding this comment.
Same concern: Slack token override lacks error-safe re-locking.
The pattern repeats here. For defense-in-depth, consider the same trap-based approach to ensure permissions are re-locked even if the Python script or hash recomputation fails.
Also applies to: 484-486
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@scripts/nemoclaw-start.sh` around lines 447 - 449, The relax_config_for_write
call that temporarily loosens permissions for the Slack token override should be
protected with a trap-based cleanup so permissions are always re-locked even if
the subsequent Python script or hash recomputation fails: before calling
relax_config_for_write (used for the slack token override) register a trap that
will call the inverse/relock routine (undoing relax_config_for_write’s changes)
on EXIT/ERR, then run the Python script and hash recompute, and finally remove
or replace the trap on successful completion; apply the exact same trap-wrapping
pattern to the other occurrence of relax_config_for_write mentioned in the
review.
…ailure Address CodeRabbit review feedback: 1. relax_config_for_write() / lock_config_after_write() now check chmod exit status and return 1 with a diagnostic message on failure. 2. All three override functions (apply_model_override, apply_cors_override, apply_slack_token_override) now capture write/hash failures via || _write_rc=$? and always call lock_config_after_write before returning — files are never left at 644 on error. Signed-off-by: Julie Yaunches <jyaunches@nvidia.com>
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@scripts/nemoclaw-start.sh`:
- Around line 320-324: The success message is printed even when sha256sum fails
because failure is swallowed by "|| _write_rc=$?" but printf is unconditional;
change the pattern so the printf only runs on successful sha256sum. Replace the
current construct (cd ... && sha256sum ... >"$hash_file") || _write_rc=$?;
printf ... with an if/then: if (cd /sandbox/.openclaw && sha256sum openclaw.json
>"$hash_file"); then printf '[SECURITY] Config hash recomputed after model
override\n' >&2; else _write_rc=$?; fi, and apply the same fix to the analogous
sites in apply_cors_override and apply_slack_token_override so the success log
is emitted only on true success.
- Around line 196-203: Update the test titled "triggers on any override env
var..." to match the new guard semantics: assert that setting either
NEMOCLAW_MODEL_OVERRIDE or NEMOCLAW_INFERENCE_API_OVERRIDE causes the config
patch to run, and remove the expectation that NEMOCLAW_CONTEXT_WINDOW,
NEMOCLAW_MAX_TOKENS, or NEMOCLAW_REASONING alone should trigger it; add/adjust
cases to explicitly verify (a) model override triggers, (b) inference API
override triggers, and (c) the other three vars by themselves do not trigger the
guard.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: c60aebfb-2bd2-45af-bcd7-6ddc091dbba9
📒 Files selected for processing (2)
scripts/lib/sandbox-init.shscripts/nemoclaw-start.sh
🚧 Files skipped from review as they are similar to previous changes (1)
- scripts/lib/sandbox-init.sh
… logs Address second round of CodeRabbit review feedback: 1. Update unit test 'triggers on any override env var' to assert the new 2-variable guard (#2653 Phase 2): only NEMOCLAW_MODEL_OVERRIDE and NEMOCLAW_INFERENCE_API_OVERRIDE trigger the function; standalone NEMOCLAW_CONTEXT_WINDOW/MAX_TOKENS/REASONING do not. 2. Move 'hash recomputed' success log inside the sha256sum success path so it is not emitted on hash failure — applies to all three override functions. Signed-off-by: Julie Yaunches <jyaunches@nvidia.com>
…it (#2700) ## Summary Fixes #2698 — `runtime-overrides-e2e` test 14 crashes because invalid override validation uses `return 1`, which under `set -euo pipefail` kills the container before CMD runs. ## Root Cause PR #2659 tightened the trigger guard for `apply_model_override()` but kept `return 1` for validation failures. When `NEMOCLAW_CONTEXT_WINDOW=notanumber` is rejected, `return 1` propagates through `set -e` and exits the entrypoint. The container dies before the test can read the config. ## Fix Change non-security-critical validation failures from `return 1` to `return 0`: - Invalid API type → skip, don't crash - Non-integer context window → skip, don't crash - Non-integer max tokens → skip, don't crash - Invalid reasoning boolean → skip, don't crash - Non-http CORS origin → skip, don't crash Security-critical rejections remain `return 1`: - Symlink on config/hash path (indicates tampering) - Control characters in model override (indicates injection) - Oversized model override (indicates injection) The security warning is still logged to stderr, but the container starts normally with config unchanged. ## Verification - [x] Unit tests pass (`test/nemoclaw-start.test.ts` — assertions use `toContain` on message strings which are preserved) - [ ] `runtime-overrides-e2e` passes (to be validated on sparky) ## AI Disclosure - [x] AI-assisted — tool: Claude Code (pi agent) --- Signed-off-by: Julie Yaunches <jyaunches@nvidia.com> <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **Bug Fixes** * Improved startup resilience by gracefully handling invalid configuration parameters. The system now logs warnings and continues instead of aborting when misconfigured environment variables are detected. <!-- end of auto-generated comment: release notes by coderabbit.ai --> Signed-off-by: Julie Yaunches <jyaunches@nvidia.com>
…OVERRIDE drop (NVIDIA#2659) ## Summary Fix the `test/e2e/test-runtime-overrides.sh` E2E crash at baseline config capture. After PR NVIDIA#917 drops `CAP_DAC_OVERRIDE` via `capsh`, root can no longer write to 444-mode config files. The three runtime override functions (`apply_model_override`, `apply_cors_override`, `apply_slack_token_override`) now temporarily relax permissions to 644 before writing and re-lock to 444 afterward, using `CAP_FOWNER` which is retained by design. ## Related Issue Fixes NVIDIA#2653 ## Changes - **`scripts/lib/sandbox-init.sh`**: Add `relax_config_for_write()` / `lock_config_after_write()` shared helpers that `chmod 644`/`chmod 444` with symlink guards - **`scripts/nemoclaw-start.sh`**: Wrap python3 write + hash recomputation in all three override functions with the new helpers - **`scripts/nemoclaw-start.sh`**: Tighten `apply_model_override()` trigger guard — only fire when `NEMOCLAW_MODEL_OVERRIDE` or `NEMOCLAW_INFERENCE_API_OVERRIDE` is explicitly set (Dockerfile ENV defaults no longer trigger spurious override runs) - **`test/e2e/test-runtime-overrides.sh`**: Create timestamped log file matching CI artifact glob `test-runtime-overrides-*.log`, replace all `2>/dev/null` with `2>>"$LOG_FILE"` - **`test/e2e/test-runtime-overrides.sh`**: Update tests 3-5, 9-11, 14 to pair standalone parameters with `NEMOCLAW_MODEL_OVERRIDE` (matching the tightened trigger guard) ## Type of Change - [x] Code change (feature, bug fix, or refactor) - [ ] Code change with doc updates - [ ] Doc only (prose changes, no code sample modifications) - [ ] Doc only (includes code sample changes) ## Verification - [ ] `npx prek run --all-files` passes - [x] `npm test` passes - [x] Tests added or updated for new or changed behavior - [x] No secrets, API keys, or credentials committed - [ ] Docs updated for user-facing behavior changes - [ ] `make docs` builds without warnings (doc changes only) - [ ] Doc pages follow the [style guide](https://github.com/NVIDIA/NemoClaw/blob/main/docs/CONTRIBUTING.md) (doc changes only) - [ ] New doc pages include SPDX header and frontmatter (new pages only) ## AI Disclosure - [x] AI-assisted — tool: Claude Code (pi agent) --- Signed-off-by: Julie Yaunches <jyaunches@nvidia.com> <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **Bug Fixes** * Overrides now only apply when explicit override flags are set, preventing unintended changes. * Safer config writes: permissions are temporarily relaxed with strict safety checks (rejecting unsafe targets, skipping missing files), clear security errors on failure, recompute verification only on successful writes, and permissions are always restored; write/hash failures are surfaced. * **Tests** * Improved test logging with timestamped artifacts and preserved stderr; runtime override tests updated to cover model-override scenarios. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Signed-off-by: Julie Yaunches <jyaunches@nvidia.com>
…it (NVIDIA#2700) ## Summary Fixes NVIDIA#2698 — `runtime-overrides-e2e` test 14 crashes because invalid override validation uses `return 1`, which under `set -euo pipefail` kills the container before CMD runs. ## Root Cause PR NVIDIA#2659 tightened the trigger guard for `apply_model_override()` but kept `return 1` for validation failures. When `NEMOCLAW_CONTEXT_WINDOW=notanumber` is rejected, `return 1` propagates through `set -e` and exits the entrypoint. The container dies before the test can read the config. ## Fix Change non-security-critical validation failures from `return 1` to `return 0`: - Invalid API type → skip, don't crash - Non-integer context window → skip, don't crash - Non-integer max tokens → skip, don't crash - Invalid reasoning boolean → skip, don't crash - Non-http CORS origin → skip, don't crash Security-critical rejections remain `return 1`: - Symlink on config/hash path (indicates tampering) - Control characters in model override (indicates injection) - Oversized model override (indicates injection) The security warning is still logged to stderr, but the container starts normally with config unchanged. ## Verification - [x] Unit tests pass (`test/nemoclaw-start.test.ts` — assertions use `toContain` on message strings which are preserved) - [ ] `runtime-overrides-e2e` passes (to be validated on sparky) ## AI Disclosure - [x] AI-assisted — tool: Claude Code (pi agent) --- Signed-off-by: Julie Yaunches <jyaunches@nvidia.com> <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **Bug Fixes** * Improved startup resilience by gracefully handling invalid configuration parameters. The system now logs warnings and continues instead of aborting when misconfigured environment variables are detected. <!-- end of auto-generated comment: release notes by coderabbit.ai --> Signed-off-by: Julie Yaunches <jyaunches@nvidia.com>
## Summary Partial fix for #3280 — drop the two caps that are cleanly droppable today, surface residuals when the drop step is silently skipped, and rewrite the cap test so it actually exercises the regression. - Append `cap_sys_admin` and `cap_sys_ptrace` to the `capsh --drop` list in `scripts/lib/sandbox-init.sh`. Verified live: in a permissive runtime (`--cap-add CAP_SYS_ADMIN --cap-add CAP_SYS_PTRACE`), pre-drop `CapBnd=0xa82c25fb`, post-drop `CapBnd=0x1e9` — only load-bearing caps remain. - Replace the misleading "runtime already restricts capabilities" message on the `CAP_SETPCAP`-missing fallback with `report_residual_capabilities()`, which reads `CapBnd:` from `/proc/self/status` and names which of the 5 must-drop caps remain. Uses bash 64-bit arithmetic (no gawk-strtonum dependency). - Rewrite `test/e2e-gateway-isolation.sh` test 14 to inventory all 8 caps named in the issue against `CapBnd`. 5 are classified `must-drop`; 3 (`CAP_FOWNER`, `CAP_SETUID`, `CAP_SETGID`) are classified `allowed` because dropping them requires an entrypoint refactor (see below). The test container now starts with `--cap-add CAP_SYS_ADMIN --cap-add CAP_SYS_PTRACE` so the bounding set entering capsh matches the permissive runtime that triggered T6002104. Without this, docker's default bounding set already excludes those caps and the test would have been a no-op for the very regression we care about. ## Scope: why this is 5/8, not 8/8 The issue names eight caps that must be absent from the bounding set: `CAP_SYS_ADMIN`, `CAP_NET_RAW`, `CAP_NET_BIND_SERVICE`, `CAP_SYS_PTRACE`, `CAP_DAC_OVERRIDE`, `CAP_FOWNER`, `CAP_SETUID`, `CAP_SETGID`. This PR fully drops 5 of them. The remaining 3 — `CAP_FOWNER`, `CAP_SETUID`, `CAP_SETGID` — can't be dropped without an entrypoint refactor: 1. The entrypoint runs as root and uses `gosu` to step down to the sandbox/gateway user. 2. `gosu` performs the `setuid()` syscall, which requires `CAP_SETUID` in the process's permitted set. 3. Dropping `CAP_SETUID` from the bounding set causes the next `exec` to strip it from permitted — but the bounding set can only be modified with `CAP_SETPCAP`, which only root has. 4. So the drop must happen *before* gosu (which breaks gosu) or *after* gosu (which is too late — we're no longer root and can't modify the bounding set). The proper fix is to replace `gosu` with `setpriv` (`setpriv --reuid=sandbox --regid=sandbox --bounding-set=-cap_setuid,-cap_setgid,-cap_fowner -- $cmd`), which does reuid + bounding-set drop atomically in one process. `setpriv` is already present in the image, but the refactor affects 4 gosu call sites in `scripts/nemoclaw-start.sh` plus the parallel entrypoint in `agents/hermes/start.sh` (same shared library), plus the no-new-privileges special case at `scripts/nemoclaw-start.sh:1490`. That's a different shape of change than "tighten bounding set" and needs its own design and review cycle. **Residual-risk note.** The practical exploit path for these 3 caps from the sandbox user's bounding set requires (a) execing a setuid-root binary that carries those file caps, but the image ships none and (b) the user creating a new one, which is blocked by Landlock + `no-new-privs` + lack of `CAP_DAC_OVERRIDE` on root-owned paths. So the residual is a defense-in-depth gap, not a directly exploitable one. **Follow-up.** I'll file an issue titled "refactor(entrypoint): replace gosu with setpriv to drop CAP_FOWNER/SETUID/SETGID from sandbox bounding set (#3280 follow-up)" and link it from here so T6002104 can track the second half. ## Test plan - [x] **Forward case (live container).** Built `nemoclaw-isolation-test` = `nemoclaw-production` + new `sandbox-init.sh`. Ran test 14 logic with `--cap-add CAP_SYS_ADMIN --cap-add CAP_SYS_PTRACE`: all 5 must-drop caps `ABSENT`, all 3 load-bearing caps `PRESENT`, test passes. - [x] **Negative case (live container).** Rebuilt the image with `cap_sys_admin` removed from the drop list. Test correctly fails with "CAP_SYS_ADMIN still present in CapBnd after capsh drop". `CapBnd=0x2001e9` (bit 21 set) — exactly the T6002104 signature. - [x] **Synthetic bit-decode harness.** Replayed the decode loop over fabricated `CapBnd` values (ideal post-drop, single-cap regressions, full-skip case) — classifier correctly identifies each failure mode by name. - [x] `npx vitest run test/sandbox-init.test.ts` — 36/36 pass (no-capsh fall-through and `NEMOCLAW_CAPS_DROPPED=1` short-circuit unchanged). - [x] `shfmt -d -i 2 -ci -bn` clean on both touched files. - [x] `bash -n` clean. - [ ] CI to run full `e2e-gateway-isolation.sh` against a freshly-built sandbox image. ## Notes for review - Drop list scope is intentionally narrowed per maintainer guidance; further hardening (CAP_SYS_MODULE, CAP_SYS_RAWIO, CAP_BPF, CAP_PERFMON, CAP_SYS_BOOT, CAP_SYSLOG, CAP_NET_ADMIN, …) is a candidate follow-up beyond this PR's issue. - Each kept cap (cap_chown, cap_fowner, cap_setuid, cap_setgid, cap_kill) now has an inline justification with a pointer to its load-bearing call site, so a future contributor can audit them without rediscovering #797 and #2659. - `report_residual_capabilities()` is new code, but it only runs on the existing `CAP_SETPCAP`-missing fallback path — same trigger as the previous one-line warning, just with structured diagnostics. The happy path is unchanged. Closes #3280 *(partial — see scope section above; full closure depends on the setpriv follow-up)*. Signed-off-by: Dongni Yang <dongniy@nvidia.com> 🤖 Generated with [Claude Code](https://claude.com/claude-code) <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **Security Improvements** * Enhanced sandbox security by dropping additional dangerous system capabilities. * Improved security diagnostics to detect and report residual dangerous capabilities in the runtime environment. * Strengthened validation testing to comprehensively verify that dangerous capabilities are properly restricted during sandbox initialization. [](https://app.coderabbit.ai/change-stack/NVIDIA/NemoClaw/pull/3328) <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Signed-off-by: Dongni Yang <dongniy@nvidia.com>
Summary
Fix the
test/e2e/test-runtime-overrides.shE2E crash at baseline config capture. After PR #917 dropsCAP_DAC_OVERRIDEviacapsh, root can no longer write to 444-mode config files. The three runtime override functions (apply_model_override,apply_cors_override,apply_slack_token_override) now temporarily relax permissions to 644 before writing and re-lock to 444 afterward, usingCAP_FOWNERwhich is retained by design.Related Issue
Fixes #2653
Changes
scripts/lib/sandbox-init.sh: Addrelax_config_for_write()/lock_config_after_write()shared helpers thatchmod 644/chmod 444with symlink guardsscripts/nemoclaw-start.sh: Wrap python3 write + hash recomputation in all three override functions with the new helpersscripts/nemoclaw-start.sh: Tightenapply_model_override()trigger guard — only fire whenNEMOCLAW_MODEL_OVERRIDEorNEMOCLAW_INFERENCE_API_OVERRIDEis explicitly set (Dockerfile ENV defaults no longer trigger spurious override runs)test/e2e/test-runtime-overrides.sh: Create timestamped log file matching CI artifact globtest-runtime-overrides-*.log, replace all2>/dev/nullwith2>>"$LOG_FILE"test/e2e/test-runtime-overrides.sh: Update tests 3-5, 9-11, 14 to pair standalone parameters withNEMOCLAW_MODEL_OVERRIDE(matching the tightened trigger guard)Type of Change
Verification
npx prek run --all-filespassesnpm testpassesmake docsbuilds without warnings (doc changes only)AI Disclosure
Signed-off-by: Julie Yaunches jyaunches@nvidia.com
Summary by CodeRabbit
Bug Fixes
Tests