Skip to content

Commit 3739258

Browse files
authored
fix(entrypoint): relax config permissions before write after CAP_DAC_OVERRIDE drop (#2659)
## Summary Fix the `test/e2e/test-runtime-overrides.sh` E2E crash at baseline config capture. After PR #917 drops `CAP_DAC_OVERRIDE` via `capsh`, root can no longer write to 444-mode config files. The three runtime override functions (`apply_model_override`, `apply_cors_override`, `apply_slack_token_override`) now temporarily relax permissions to 644 before writing and re-lock to 444 afterward, using `CAP_FOWNER` which is retained by design. ## Related Issue Fixes #2653 ## Changes - **`scripts/lib/sandbox-init.sh`**: Add `relax_config_for_write()` / `lock_config_after_write()` shared helpers that `chmod 644`/`chmod 444` with symlink guards - **`scripts/nemoclaw-start.sh`**: Wrap python3 write + hash recomputation in all three override functions with the new helpers - **`scripts/nemoclaw-start.sh`**: Tighten `apply_model_override()` trigger guard — only fire when `NEMOCLAW_MODEL_OVERRIDE` or `NEMOCLAW_INFERENCE_API_OVERRIDE` is explicitly set (Dockerfile ENV defaults no longer trigger spurious override runs) - **`test/e2e/test-runtime-overrides.sh`**: Create timestamped log file matching CI artifact glob `test-runtime-overrides-*.log`, replace all `2>/dev/null` with `2>>"$LOG_FILE"` - **`test/e2e/test-runtime-overrides.sh`**: Update tests 3-5, 9-11, 14 to pair standalone parameters with `NEMOCLAW_MODEL_OVERRIDE` (matching the tightened trigger guard) ## Type of Change - [x] Code change (feature, bug fix, or refactor) - [ ] Code change with doc updates - [ ] Doc only (prose changes, no code sample modifications) - [ ] Doc only (includes code sample changes) ## Verification - [ ] `npx prek run --all-files` passes - [x] `npm test` passes - [x] Tests added or updated for new or changed behavior - [x] No secrets, API keys, or credentials committed - [ ] Docs updated for user-facing behavior changes - [ ] `make docs` builds without warnings (doc changes only) - [ ] Doc pages follow the [style guide](https://github.com/NVIDIA/NemoClaw/blob/main/docs/CONTRIBUTING.md) (doc changes only) - [ ] New doc pages include SPDX header and frontmatter (new pages only) ## AI Disclosure - [x] AI-assisted — tool: Claude Code (pi agent) --- Signed-off-by: Julie Yaunches <jyaunches@nvidia.com> <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **Bug Fixes** * Overrides now only apply when explicit override flags are set, preventing unintended changes. * Safer config writes: permissions are temporarily relaxed with strict safety checks (rejecting unsafe targets, skipping missing files), clear security errors on failure, recompute verification only on successful writes, and permissions are always restored; write/hash failures are surfaced. * **Tests** * Improved test logging with timestamped artifacts and preserved stderr; runtime override tests updated to cover model-override scenarios. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Signed-off-by: Julie Yaunches <jyaunches@nvidia.com>
1 parent 04ea838 commit 3739258

4 files changed

Lines changed: 143 additions & 32 deletions

File tree

scripts/lib/sandbox-init.sh

Lines changed: 46 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -143,6 +143,52 @@ validate_tmp_permissions() {
143143
return $failed
144144
}
145145

146+
# ── Config file permission helpers ────────────────────────────────
147+
# After drop_capabilities() strips CAP_DAC_OVERRIDE, root can no longer
148+
# write to files with mode 444. These helpers temporarily relax config
149+
# files to 644 for writing, then re-lock to 444 afterward.
150+
#
151+
# CAP_FOWNER is retained (by design in PR #917), so root can still chmod
152+
# files it doesn't own. The helpers include symlink guards to prevent
153+
# symlink-following attacks on the config path.
154+
#
155+
# Usage:
156+
# relax_config_for_write /sandbox/.openclaw/openclaw.json /sandbox/.openclaw/.config-hash
157+
# # ... perform writes ...
158+
# lock_config_after_write /sandbox/.openclaw/openclaw.json /sandbox/.openclaw/.config-hash
159+
#
160+
# Ref: https://github.com/NVIDIA/NemoClaw/issues/2653
161+
162+
relax_config_for_write() {
163+
local f
164+
for f in "$@"; do
165+
if [ -L "$f" ]; then
166+
printf '[SECURITY] Refusing to relax permissions — %s is a symlink\n' "$f" >&2
167+
return 1
168+
fi
169+
[ -f "$f" ] || continue
170+
if ! chmod 644 "$f"; then
171+
printf '[SECURITY] Failed to relax permissions on %s\n' "$f" >&2
172+
return 1
173+
fi
174+
done
175+
}
176+
177+
lock_config_after_write() {
178+
local f
179+
for f in "$@"; do
180+
if [ -L "$f" ]; then
181+
printf '[SECURITY] Refusing to lock permissions — %s is a symlink\n' "$f" >&2
182+
return 1
183+
fi
184+
[ -f "$f" ] || continue
185+
if ! chmod 444 "$f"; then
186+
printf '[SECURITY] Failed to lock permissions on %s\n' "$f" >&2
187+
return 1
188+
fi
189+
done
190+
}
191+
146192
# ── Capability dropping ──────────────────────────────────────────
147193
# CIS Docker Benchmark 5.3: containers should not run with default caps.
148194
# OpenShell manages the container runtime so we cannot pass --cap-drop=ALL

scripts/nemoclaw-start.sh

Lines changed: 57 additions & 14 deletions
Original file line numberDiff line numberDiff line change
@@ -193,12 +193,13 @@ _SANDBOX_HOME="/sandbox" # Home dir for the sandbox user (useradd -d /s
193193
# Ref: https://github.com/NVIDIA/NemoClaw/issues/759
194194

195195
apply_model_override() {
196-
# Any of these env vars trigger a config patch
196+
# Only explicit override env vars trigger a config patch. NEMOCLAW_CONTEXT_WINDOW,
197+
# NEMOCLAW_MAX_TOKENS, and NEMOCLAW_REASONING are promoted from Dockerfile build
198+
# ARGs to ENV and are always set — they should only take effect when accompanied
199+
# by an explicit model or API override. Without this guard the function runs on
200+
# every container start even with no override requested. Ref: #2653
197201
[ -n "${NEMOCLAW_MODEL_OVERRIDE:-}" ] \
198202
|| [ -n "${NEMOCLAW_INFERENCE_API_OVERRIDE:-}" ] \
199-
|| [ -n "${NEMOCLAW_CONTEXT_WINDOW:-}" ] \
200-
|| [ -n "${NEMOCLAW_MAX_TOKENS:-}" ] \
201-
|| [ -n "${NEMOCLAW_REASONING:-}" ] \
202203
|| return 0
203204

204205
# SECURITY: Only root can write to /sandbox/.openclaw (root:root 444).
@@ -272,10 +273,15 @@ apply_model_override() {
272273
[ -n "$max_tokens" ] && printf '[config] Applying max tokens override: %s\n' "$max_tokens" >&2
273274
[ -n "$reasoning" ] && printf '[config] Applying reasoning override: %s\n' "$reasoning" >&2
274275

276+
# Relax 444 → 644 so writes succeed after CAP_DAC_OVERRIDE is dropped (#2653).
277+
# Re-lock in all exit paths so files are never left at 644 on failure.
278+
relax_config_for_write "$config_file" "$hash_file"
279+
local _write_rc=0
280+
275281
NEMOCLAW_CONTEXT_WINDOW="$context_window" \
276282
NEMOCLAW_MAX_TOKENS="$max_tokens" \
277283
NEMOCLAW_REASONING="$reasoning" \
278-
python3 - "$config_file" "$model_override" "$api_override" <<'PYOVERRIDE'
284+
python3 - "$config_file" "$model_override" "$api_override" <<'PYOVERRIDE' || _write_rc=$?
279285
import json, os, sys
280286
281287
config_file, model_override, api_override = sys.argv[1], sys.argv[2], sys.argv[3]
@@ -311,9 +317,18 @@ with open(config_file, "w") as f:
311317
json.dump(cfg, f, indent=2)
312318
PYOVERRIDE
313319

314-
# Recompute config hash so integrity check passes on next startup
315-
(cd /sandbox/.openclaw && sha256sum openclaw.json >"$hash_file")
316-
printf '[SECURITY] Config hash recomputed after model override\n' >&2
320+
if [ "$_write_rc" -eq 0 ]; then
321+
# Recompute config hash so integrity check passes on next startup
322+
if (cd /sandbox/.openclaw && sha256sum openclaw.json >"$hash_file"); then
323+
printf '[SECURITY] Config hash recomputed after model override\n' >&2
324+
else
325+
_write_rc=$?
326+
fi
327+
fi
328+
329+
# Re-lock 644 → 444 — always runs, even on write/hash failure (#2653)
330+
lock_config_after_write "$config_file" "$hash_file"
331+
[ "$_write_rc" -eq 0 ] || return "$_write_rc"
317332
}
318333

319334
# ── Runtime CORS origin override ──────────────────────────────────
@@ -356,7 +371,12 @@ apply_cors_override() {
356371

357372
printf '[config] Adding CORS origin: %s\n' "$cors_origin" >&2
358373

359-
python3 - "$config_file" "$cors_origin" <<'PYCORS'
374+
# Relax 444 → 644 so writes succeed after CAP_DAC_OVERRIDE is dropped (#2653).
375+
# Re-lock in all exit paths so files are never left at 644 on failure.
376+
relax_config_for_write "$config_file" "$hash_file"
377+
local _write_rc=0
378+
379+
python3 - "$config_file" "$cors_origin" <<'PYCORS' || _write_rc=$?
360380
import json, sys
361381
362382
config_file, cors_origin = sys.argv[1], sys.argv[2]
@@ -373,8 +393,17 @@ with open(config_file, "w") as f:
373393
json.dump(cfg, f, indent=2)
374394
PYCORS
375395

376-
(cd /sandbox/.openclaw && sha256sum openclaw.json >"$hash_file")
377-
printf '[config] Config hash recomputed after CORS override\n' >&2
396+
if [ "$_write_rc" -eq 0 ]; then
397+
if (cd /sandbox/.openclaw && sha256sum openclaw.json >"$hash_file"); then
398+
printf '[config] Config hash recomputed after CORS override\n' >&2
399+
else
400+
_write_rc=$?
401+
fi
402+
fi
403+
404+
# Re-lock 644 → 444 — always runs, even on write/hash failure (#2653)
405+
lock_config_after_write "$config_file" "$hash_file"
406+
[ "$_write_rc" -eq 0 ] || return "$_write_rc"
378407
}
379408

380409
# ── Slack token placeholder resolution ────────────────────────────
@@ -431,9 +460,14 @@ apply_slack_token_override() {
431460

432461
printf '[channels] Resolving Slack token placeholders in openclaw.json\n' >&2
433462

463+
# Relax 444 → 644 so writes succeed after CAP_DAC_OVERRIDE is dropped (#2653).
464+
# Re-lock in all exit paths so files are never left at 644 on failure.
465+
relax_config_for_write "$config_file" "$hash_file"
466+
local _write_rc=0
467+
434468
SLACK_BOT_TOKEN="$SLACK_BOT_TOKEN" \
435469
SLACK_APP_TOKEN="${SLACK_APP_TOKEN:-}" \
436-
python3 - "$config_file" <<'PYSLACK'
470+
python3 - "$config_file" <<'PYSLACK' || _write_rc=$?
437471
import json, os, re, sys
438472
439473
config_file = sys.argv[1]
@@ -463,8 +497,17 @@ with open(config_file, "w") as f:
463497
f.write(content)
464498
PYSLACK
465499

466-
(cd /sandbox/.openclaw && sha256sum openclaw.json >"$hash_file")
467-
printf '[channels] Config hash recomputed after Slack token override\n' >&2
500+
if [ "$_write_rc" -eq 0 ]; then
501+
if (cd /sandbox/.openclaw && sha256sum openclaw.json >"$hash_file"); then
502+
printf '[channels] Config hash recomputed after Slack token override\n' >&2
503+
else
504+
_write_rc=$?
505+
fi
506+
fi
507+
508+
# Re-lock 644 → 444 — always runs, even on write/hash failure (#2653)
509+
lock_config_after_write "$config_file" "$hash_file"
510+
[ "$_write_rc" -eq 0 ] || return "$_write_rc"
468511
}
469512

470513
# ── Slack channel guard (unhandled-rejection safety net) ─────────

test/e2e/test-runtime-overrides.sh

Lines changed: 26 additions & 13 deletions
Original file line numberDiff line numberDiff line change
@@ -36,13 +36,22 @@ info() { echo -e "${YELLOW}TEST${NC}: $1"; }
3636
PASSED=0
3737
FAILED=0
3838

39+
# ── Log file for CI artifact collection ──────────────────────────
40+
# Create a timestamped log file whose name matches the CI artifact glob
41+
# test-runtime-overrides-*.log so Docker stderr is captured automatically.
42+
LOG_DIR="${SCRIPT_DIR}"
43+
LOG_FILE="${LOG_DIR}/test-runtime-overrides-$(date +%Y%m%dT%H%M%S).log"
44+
: >"$LOG_FILE"
45+
info "Logging Docker stderr to: $LOG_FILE"
46+
3947
# Helper: run entrypoint with env vars, then read a config field via jq.
4048
# The entrypoint patches config and starts the gateway — we only need the
4149
# config patch, so we override CMD to just cat the config and exit.
50+
# Docker stderr is captured to the log file for CI artifact visibility.
4251
run_override() {
4352
local env_args=("$@")
4453
docker run --rm "${env_args[@]}" "$IMAGE" \
45-
bash -c 'cat /sandbox/.openclaw/openclaw.json' 2>/dev/null
54+
bash -c 'cat /sandbox/.openclaw/openclaw.json' 2>>"$LOG_FILE"
4655
}
4756

4857
# Helper: run entrypoint with env vars and capture stderr for validation messages.
@@ -53,6 +62,8 @@ run_override_stderr() {
5362
docker run --rm "${env_args[@]}" "$IMAGE" \
5463
bash -c 'true' >/dev/null 2>"$tmpfile" || true
5564
cat "$tmpfile"
65+
# Also append to the main log file for CI artifact capture
66+
cat "$tmpfile" >>"$LOG_FILE"
5667
rm -f "$tmpfile"
5768
}
5869

@@ -83,7 +94,7 @@ info "Baseline: model=$BASELINE_MODEL ctx=$BASELINE_CTX max=$BASELINE_MAX reason
8394
# ── Test 1: No-op baseline ───────────────────────────────────────
8495

8596
info "1. No overrides — config matches build-time defaults"
86-
HASH_CHECK=$(docker run --rm "$IMAGE" bash -c 'cd /sandbox/.openclaw && sha256sum -c .config-hash --status && echo OK || echo FAIL' 2>/dev/null)
97+
HASH_CHECK=$(docker run --rm "$IMAGE" bash -c 'cd /sandbox/.openclaw && sha256sum -c .config-hash --status && echo OK || echo FAIL' 2>>"$LOG_FILE")
8798
if [ "$HASH_CHECK" = "OK" ]; then
8899
pass "baseline config hash valid"
89100
else
@@ -104,17 +115,19 @@ fi
104115

105116
# Verify hash was recomputed
106117
HASH_CHECK=$(docker run --rm -e "NEMOCLAW_MODEL_OVERRIDE=$OVERRIDE_MODEL" "$IMAGE" \
107-
bash -c 'cd /sandbox/.openclaw && sha256sum -c .config-hash --status && echo OK || echo FAIL' 2>/dev/null)
118+
bash -c 'cd /sandbox/.openclaw && sha256sum -c .config-hash --status && echo OK || echo FAIL' 2>>"$LOG_FILE")
108119
if [ "$HASH_CHECK" = "OK" ]; then
109120
pass "config hash valid after model override"
110121
else
111122
fail "config hash invalid after model override"
112123
fi
113124

114125
# ── Test 3: Context window override ──────────────────────────────
126+
# NEMOCLAW_CONTEXT_WINDOW only takes effect alongside a model override
127+
# (standalone values are baked at build time). Ref: #2653 Phase 2.
115128

116-
info "3. NEMOCLAW_CONTEXT_WINDOW patches contextWindow"
117-
CFG=$(run_override -e "NEMOCLAW_CONTEXT_WINDOW=32768")
129+
info "3. NEMOCLAW_CONTEXT_WINDOW patches contextWindow (with model override)"
130+
CFG=$(run_override -e "NEMOCLAW_MODEL_OVERRIDE=$OVERRIDE_MODEL" -e "NEMOCLAW_CONTEXT_WINDOW=32768")
118131
ACTUAL=$(echo "$CFG" | jq -r '.models.providers | to_entries[0].value.models[0].contextWindow')
119132
if [ "$ACTUAL" = "32768" ]; then
120133
pass "contextWindow overridden to 32768"
@@ -124,8 +137,8 @@ fi
124137

125138
# ── Test 4: Max tokens override ──────────────────────────────────
126139

127-
info "4. NEMOCLAW_MAX_TOKENS patches maxTokens"
128-
CFG=$(run_override -e "NEMOCLAW_MAX_TOKENS=16384")
140+
info "4. NEMOCLAW_MAX_TOKENS patches maxTokens (with model override)"
141+
CFG=$(run_override -e "NEMOCLAW_MODEL_OVERRIDE=$OVERRIDE_MODEL" -e "NEMOCLAW_MAX_TOKENS=16384")
129142
ACTUAL=$(echo "$CFG" | jq -r '.models.providers | to_entries[0].value.models[0].maxTokens')
130143
if [ "$ACTUAL" = "16384" ]; then
131144
pass "maxTokens overridden to 16384"
@@ -135,8 +148,8 @@ fi
135148

136149
# ── Test 5: Reasoning override ───────────────────────────────────
137150

138-
info "5. NEMOCLAW_REASONING=true patches reasoning"
139-
CFG=$(run_override -e "NEMOCLAW_REASONING=true")
151+
info "5. NEMOCLAW_REASONING=true patches reasoning (with model override)"
152+
CFG=$(run_override -e "NEMOCLAW_MODEL_OVERRIDE=$OVERRIDE_MODEL" -e "NEMOCLAW_REASONING=true")
140153
ACTUAL=$(echo "$CFG" | jq -r '.models.providers | to_entries[0].value.models[0].reasoning')
141154
if [ "$ACTUAL" = "true" ]; then
142155
pass "reasoning overridden to true"
@@ -190,23 +203,23 @@ else
190203
fi
191204

192205
info "9. NEMOCLAW_CONTEXT_WINDOW with non-integer is rejected"
193-
STDERR=$(run_override_stderr -e "NEMOCLAW_CONTEXT_WINDOW=notanumber")
206+
STDERR=$(run_override_stderr -e "NEMOCLAW_MODEL_OVERRIDE=test" -e "NEMOCLAW_CONTEXT_WINDOW=notanumber")
194207
if echo "$STDERR" | grep -q "must be a positive integer"; then
195208
pass "non-integer context window rejected"
196209
else
197210
fail "non-integer context window was not rejected"
198211
fi
199212

200213
info "10. NEMOCLAW_MAX_TOKENS with non-integer is rejected"
201-
STDERR=$(run_override_stderr -e "NEMOCLAW_MAX_TOKENS=abc")
214+
STDERR=$(run_override_stderr -e "NEMOCLAW_MODEL_OVERRIDE=test" -e "NEMOCLAW_MAX_TOKENS=abc")
202215
if echo "$STDERR" | grep -q "must be a positive integer"; then
203216
pass "non-integer max tokens rejected"
204217
else
205218
fail "non-integer max tokens was not rejected"
206219
fi
207220

208221
info "11. NEMOCLAW_REASONING with invalid value is rejected"
209-
STDERR=$(run_override_stderr -e "NEMOCLAW_REASONING=maybe")
222+
STDERR=$(run_override_stderr -e "NEMOCLAW_MODEL_OVERRIDE=test" -e "NEMOCLAW_REASONING=maybe")
210223
if echo "$STDERR" | grep -q 'must be "true" or "false"'; then
211224
pass "invalid reasoning value rejected"
212225
else
@@ -232,7 +245,7 @@ fi
232245
# ── Test 14: Original config unchanged after rejected override ───
233246

234247
info "14. Config unchanged after rejected override"
235-
CFG=$(run_override -e "NEMOCLAW_CONTEXT_WINDOW=notanumber")
248+
CFG=$(run_override -e "NEMOCLAW_MODEL_OVERRIDE=test" -e "NEMOCLAW_CONTEXT_WINDOW=notanumber")
236249
ACTUAL_CTX=$(echo "$CFG" | jq -r '.models.providers | to_entries[0].value.models[0].contextWindow')
237250
if [ "$ACTUAL_CTX" = "$BASELINE_CTX" ]; then
238251
pass "config unchanged after rejected override"

test/nemoclaw-start.test.ts

Lines changed: 14 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -401,16 +401,25 @@ describe("runtime model override (#759)", () => {
401401
expect(fn[1]).toContain('NEMOCLAW_REASONING must be "true" or "false"');
402402
});
403403

404-
it("triggers on any override env var, not just MODEL_OVERRIDE", () => {
404+
it("triggers only on explicit override env vars (MODEL_OVERRIDE or INFERENCE_API_OVERRIDE)", () => {
405405
const fn = src.match(/apply_model_override\(\) \{([\s\S]*?)^}/m);
406406
expect(fn).toBeTruthy();
407-
// The guard should check all five env vars
407+
// The guard should only check the two explicit override env vars (#2653).
408+
// NEMOCLAW_CONTEXT_WINDOW, NEMOCLAW_MAX_TOKENS, and NEMOCLAW_REASONING are
409+
// promoted from Dockerfile ARGs to ENV and always set — they should only
410+
// take effect alongside an explicit model or API override.
408411
const guard = fn[1].split("return 0")[0];
409412
expect(guard).toContain("NEMOCLAW_MODEL_OVERRIDE");
410413
expect(guard).toContain("NEMOCLAW_INFERENCE_API_OVERRIDE");
411-
expect(guard).toContain("NEMOCLAW_CONTEXT_WINDOW");
412-
expect(guard).toContain("NEMOCLAW_MAX_TOKENS");
413-
expect(guard).toContain("NEMOCLAW_REASONING");
414+
expect(guard).not.toMatch(
415+
/\[\s*-n\s*"\$\{NEMOCLAW_CONTEXT_WINDOW:-\}"/,
416+
);
417+
expect(guard).not.toMatch(
418+
/\[\s*-n\s*"\$\{NEMOCLAW_MAX_TOKENS:-\}"/,
419+
);
420+
expect(guard).not.toMatch(
421+
/\[\s*-n\s*"\$\{NEMOCLAW_REASONING:-\}"/,
422+
);
414423
});
415424
});
416425

0 commit comments

Comments
 (0)