fix: restart ollama proxy when token diverges from persisted token #2555
adrian-santos wants to merge 1 commit into
Conversation
`ensureOllamaAuthProxy()` previously assumed that if the proxy PID was alive, the running token matched the persisted token. This assumption breaks when multiple `nemoclaw onboard` runs occur without a proxy restart in between: each full onboard regenerates the token and writes it to `~/.nemoclaw/ollama-proxy-token`, but if `killStaleProxy()` fails to kill the original process (e.g. a stale PID file or an `lsof` race), the proxy keeps running with its original token while the file and provider credential are updated to the new token. The next sandbox rebuild wires in the new token, but every inference call returns HTTP 401 Unauthorized because the proxy rejects it.

Fix: add `isProxyTokenValid()`, which probes `/v1/models` with the stored token (an authenticated endpoint, unlike `/api/tags`, which is exempt). If the probe fails, `ensureOllamaAuthProxy()` falls through to kill and restart the proxy with the persisted token. No behavior change in the happy path (first run, or reboot recovery where tokens are already in sync).

Fixes NVIDIA#2553
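To make the failure mode concrete, here is a minimal sketch (all names hypothetical, not the project's actual code) of how a proxy that keeps the token it was spawned with starts rejecting requests once the persisted token is regenerated:

```typescript
// Hypothetical model of the divergence: the proxy only accepts the token it
// was started with, while onboarding regenerates the persisted token.
type Proxy = { token: string };

function spawnProxy(token: string): Proxy {
  return { token };
}

// The proxy rejects any bearer token other than its own with a 401.
function proxyStatus(proxy: Proxy, bearer: string): number {
  return bearer === proxy.token ? 200 : 401;
}

// First onboard: proxy and persisted token agree.
const proxy = spawnProxy("token-A");
let persisted = "token-A";
console.log(proxyStatus(proxy, persisted)); // 200

// Second onboard regenerates the token but fails to restart the proxy.
persisted = "token-B";
console.log(proxyStatus(proxy, persisted)); // 401 on every inference call
```

The probe in `isProxyTokenValid()` detects exactly this 401 and triggers a restart with the persisted token.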
📝 Walkthrough
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/lib/onboard-ollama-proxy.ts (1)
Lines 204-208: ⚠️ Potential issue | 🟠 Major
Verify recovery success after restart before returning.
After Line 206, the function sleeps and returns without confirming the new proxy is actually healthy. If kill/spawn fails (e.g., lingering port owner), callers continue assuming recovery succeeded.
Suggested fix
```diff
-  spawnOllamaAuthProxy(token);
+  const restartedPid = spawnOllamaAuthProxy(token);
   sleep(1);
+  if (!isOllamaProxyProcess(restartedPid)) {
+    console.error(`  Error: Ollama auth proxy recovery failed on :${OLLAMA_PROXY_PORT}`);
+  }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/lib/onboard-ollama-proxy.ts` around lines 204 - 208, The code calls killStaleProxy(), sets ollamaProxyToken, calls spawnOllamaAuthProxy(token), then sleeps(1) and returns without verifying the proxy is healthy; change this to wait for and verify recovery success by adding a health-check loop after spawnOllamaAuthProxy(token) that polls the new proxy (e.g., an HTTP /health or connection attempt) with a short retry interval and timeout, only set ollamaProxyToken and return after the probe succeeds; if the probe fails within the timeout, log the failure and throw or return an error so callers don't assume recovery succeeded (update callers if necessary to handle the error).
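The polling approach the prompt describes could be sketched as follows. This is only a sketch: `waitForRecovery`, the injected `probe`, and the attempt budget are assumptions for illustration, not the project's actual API, and a real implementation would sleep between attempts and probe the proxy over HTTP.

```typescript
// Hypothetical recovery check: poll an injected probe until it succeeds or
// the attempt budget runs out. Returns true if the proxy came up in time.
function waitForRecovery(probe: () => boolean, attempts = 10): boolean {
  for (let i = 0; i < attempts; i++) {
    if (probe()) return true;
    // Real code would sleep a short interval here between attempts.
  }
  return false;
}

// Probe that fails twice, then succeeds on the third call:
let calls = 0;
console.log(waitForRecovery(() => ++calls >= 3)); // true

// Probe that never succeeds within the budget:
console.log(waitForRecovery(() => false, 5)); // false
```

Callers would then log and propagate an error when `waitForRecovery` returns false, instead of assuming the restart succeeded.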
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/lib/onboard-ollama-proxy.ts`:
- Around line 102-109: isProxyTokenValid currently treats any non-zero curl exit
as token invalid; change it to return the HTTP status code (number) on a probe
success and null on transient/errors: call curl without -f and use --write-out
"%{http_code}" (capture stdout/stderr and exit code), return parsed numeric HTTP
status when curl succeeds, return null for timeouts/other non-probe failures;
then update the caller that currently does a boolean check (the usage around
line 198) to treat status === 401 or === 403 as token rejection (force restart)
and treat null or any other status as indeterminate (do not force restart).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 78319d14-2ca3-4840-8640-e5577c62a1cd
📒 Files selected for processing (1)
src/lib/onboard-ollama-proxy.ts
```typescript
function isProxyTokenValid(token: string): boolean {
  const result = spawnSync(
    "curl",
    ["-sf", "--max-time", "3", "-H", `Authorization: Bearer ${token}`, `http://localhost:${OLLAMA_PROXY_PORT}/v1/models`],
    { encoding: "utf8" },
  );
  return result.status === 0;
}
```
🧩 Analysis chain
🏁 Scripts executed against NVIDIA/NemoClaw (outputs collapsed in the original review):

```shell
cd /repo && find . -name "onboard-ollama-proxy.ts" -type f
cd /repo && cat -n src/lib/onboard-ollama-proxy.ts | head -120
cd /repo && cat -n src/lib/onboard-ollama-proxy.ts | sed -n '190,210p'
git ls-files | grep -E "onboard.*proxy" | head -20
find . -name "onboard-ollama-proxy.ts" -type f 2>/dev/null
cat -n src/lib/onboard-ollama-proxy.ts | sed -n '1,15p'
cat -n src/lib/onboard-ollama-proxy.ts | sed -n '100,115p'
cat -n src/lib/onboard-ollama-proxy.ts | sed -n '190,210p'
```
🌐 Web query:
curl -sf flag exit code behavior 401 403 5xx timeout
💡 Result:
With `curl -sf` (equivalent to `--silent --fail`), the exit code behavior for various scenarios is as follows:
- HTTP 401 (Unauthorized) or 403 (Forbidden): exits with code 22. The `-f`/`--fail` flag causes curl to fail silently (no output) and exit with error 22 for any HTTP response code >= 400. Note that this is not always fail-safe when authentication is involved (e.g. 401/407 may sometimes slip through).
- HTTP 5xx (server errors, e.g. 500): exits with code 22, same as above for >= 400.
- Timeout: exits with code 28 (CURLE_OPERATION_TIMEDOUT). This occurs if the operation exceeds the maximum allowed time (connect, transfer, etc.). Use `--max-time` or `--connect-timeout` to adjust.

Without `-f`, curl exits with 0 even for HTTP errors >= 400, as it considers the transfer successful if data was received. `-s` suppresses progress output but does not affect exit codes directly.
Citations:
- 1: https://everything.curl.dev/cmdline/exitcode.html
- 2: http://curl.se/docs/manpage.html
- 3: https://www.manpagez.com/man/1/curl/curl-7.77.0.php
- 4: https://superuser.com/questions/590099/can-i-make-curl-fail-with-an-exitcode-different-than-0-if-the-http-status-code-i
- 5: https://everything.curl.dev/http/response.html
- 6: https://curl2code.com/en/guides/curl-error-codes
- 7: https://curl.se/docs/manpage.html
🌐 Web query:
curl --max-time timeout exit code semantics
💡 Result:
When using curl with --max-time (or -m), if the maximum time is exceeded, curl exits with exit code 28, corresponding to CURLE_OPERATION_TIMEDOUT. This applies to the overall operation timeout, including connection, transfer, etc. The error message typically reads "Operation timed out after X milliseconds" (distinguishing it from connect-timeout, which says "Connection timed out" or "Resolving timed out", but both use code 28). curl returns 0 on success. Non-zero exit codes indicate errors, with 28 specifically for timeouts triggered by --max-time. Official documentation confirms: "--max-time makes curl exit with a timeout error code (28)" and lists CURLE_OPERATION_TIMEDOUT (28) for operation timeouts.
Citations:
- 1: https://everything.curl.dev/usingcurl/timeouts.html
- 2: https://everything.curl.dev/cmdline/exitcode.html
- 3: https://curl.se/libcurl/c/libcurl-errors.html
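The exit-code semantics quoted above can be summarized in code. This is a sketch covering only the codes discussed in this thread, not an exhaustive mapping of curl's error table:

```typescript
// Sketch: interpret a `curl -f` exit code per the semantics quoted above.
function describeCurlExit(code: number): string {
  switch (code) {
    case 0:
      return "success (HTTP < 400)";
    case 22:
      return "HTTP error >= 400 (could be 401/403 or 5xx, so it is ambiguous)";
    case 28:
      return "timeout (--max-time or --connect-timeout exceeded)";
    default:
      return "other curl failure";
  }
}

console.log(describeCurlExit(22));
console.log(describeCurlExit(28));
```

The ambiguity of exit code 22 is exactly why the review below recommends switching from `-f` to `--write-out "%{http_code}"`.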
Differentiate auth rejection from transient probe failures before forcing restart.
Line 108 treats any non-zero curl result as token divergence. With -f, the exit code 22 is returned for both auth rejection (401/403) and server errors (5xx), while timeouts exit with code 28. This means a valid proxy experiencing transient failures will be unnecessarily killed and restarted. Only explicitly reject the token on 401/403 responses; treat other failures as indeterminate.
Suggested fix
```diff
-function isProxyTokenValid(token: string): boolean {
-  const result = spawnSync(
-    "curl",
-    ["-sf", "--max-time", "3", "-H", `Authorization: Bearer ${token}`, `http://localhost:${OLLAMA_PROXY_PORT}/v1/models`],
-    { encoding: "utf8" },
-  );
-  return result.status === 0;
+function getProxyTokenValidationStatus(token: string): "valid" | "rejected" | "unknown" {
+  const result = spawnSync(
+    "curl",
+    [
+      "-sS",
+      "-o",
+      "/dev/null",
+      "-w",
+      "%{http_code}",
+      "--max-time",
+      "3",
+      "-H",
+      `Authorization: Bearer ${token}`,
+      `http://localhost:${OLLAMA_PROXY_PORT}/v1/models`,
+    ],
+    { encoding: "utf8" },
+  );
+  const code = String(result.stdout || "").trim();
+  if (code === "200") return "valid";
+  if (code === "401" || code === "403") return "rejected";
+  return "unknown";
 }
```

Also update line 198 to use the returned status instead of a boolean check.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```typescript
function getProxyTokenValidationStatus(token: string): "valid" | "rejected" | "unknown" {
  const result = spawnSync(
    "curl",
    [
      "-sS",
      "-o",
      "/dev/null",
      "-w",
      "%{http_code}",
      "--max-time",
      "3",
      "-H",
      `Authorization: Bearer ${token}`,
      `http://localhost:${OLLAMA_PROXY_PORT}/v1/models`,
    ],
    { encoding: "utf8" },
  );
  const code = String(result.stdout || "").trim();
  if (code === "200") return "valid";
  if (code === "401" || code === "403") return "rejected";
  return "unknown";
}
```
…ers (#2606) (#2617)

## Summary

Add `test-gpu-double-onboard.sh` that reproduces the exact scenario from #2553: re-onboard with `NEMOCLAW_PROVIDER=ollama` must not leave the proxy running with a different token than what's persisted to disk.

### What the test covers

The exact reproduction scenario from #2553:

1. Full non-interactive onboard with `NEMOCLAW_PROVIDER=ollama`
2. Verify proxy running, token persisted, inference works
3. Re-onboard (second onboard, same sandbox with `NEMOCLAW_RECREATE_SANDBOX=1`)
4. **Token consistency**: file token matches what the running proxy accepts (not 401)
5. Inference through sandbox still works after re-onboard

### NVKS runner migration

Both `gpu-e2e` (existing) and `gpu-double-onboard-e2e` (new) are moved from the never-enabled `self-hosted` runner to NVKS ephemeral GPU runners (`linux-amd64-gpu-l40g-latest-1` — L40G, 48 GB VRAM):

- **Removes `GPU_E2E_ENABLED` gate** — the variable was never set, so `gpu-e2e` has been silently skipped every nightly since it was added on March 25
- **Eliminates `needs: gpu-e2e` serialization** — separate ephemeral VMs, no shared state
- **Drops timeout from 60 to 30 min** — fresh VM each run, no stale state cleanup, auto-selected model is `qwen2.5:7b` (4.4 GB pull)
- **Enables GPU E2E coverage in nightly for the first time**

### Testing

The nightly workflow only triggers on `schedule` and `workflow_dispatch`. To validate before merge, run the workflow manually from this branch.

Closes #2606

### Related

- #2553 — The bug (ollama proxy token divergence)
- #2555 — The fix (restart proxy when token diverges)
- #839 — Original gpu-e2e addition (never enabled)

## Summary by CodeRabbit

* **Chores**
  * GPU nightly E2E workflow now runs unconditionally on ephemeral GPU runners with a shorter timeout and improved failure reporting that surfaces the new double-onboard job.
* **Tests**
  * Added a comprehensive GPU E2E that verifies double-onboard behavior, proxy token persistence and acceptance/rejection, end-to-end inference, and sandbox cleanup.
✨ Thanks for submitting this PR that fixes a bug where the ollama proxy token diverges from the persisted token after re-onboarding, causing persistent HTTP 401 errors on inference. Related open issues:
GPU E2E Results ✅
Triggered nightly E2E
jyaunches
left a comment
Review
Nice fix — the approach is clean and the GPU E2E confirms it works on real hardware (gpu-double-onboard-e2e passed on NVKS L40G). Three things need addressing before merge:
🔴 1. Broken unit test
test/ollama-proxy-recovery.test.ts → "keeps the existing proxy when the recorded pid still points to the auth proxy" is failing on both checks and macos-e2e. The test doesn't mock the new spawnSync("curl", ...) call, so isProxyTokenValid() always returns false (no real proxy running), and the proxy is always restarted.
Fix: mock child_process.spawnSync in the injected test script to intercept curl calls:
```typescript
const realSpawnSync = childProcess.spawnSync;
childProcess.spawnSync = (cmd, args, opts) => {
  if (cmd === "curl") return { status: 0 }; // token is valid
  if (cmd === "sleep") return { status: 0 };
  return realSpawnSync(cmd, args, opts);
};
```

🔴 2. Missing test for the token-divergence scenario
The core fix — detecting a running proxy with a mismatched token — has no unit test. Add a third test case: mock ps to report proxy alive, mock spawnSync("curl") to return { status: 1 } (token mismatch), assert proxy is killed and restarted with the persisted token.
🔴 3. Missing DCO sign-off
dco-check is failing. Add to the PR description:
Signed-off-by: Adrian Santos <your-email@example.com>
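The divergence test requested in point 2 above hinges on one decision: restart whenever the PID is alive but the probe fails. A minimal sketch of that decision and its three cases (names and shapes are illustrative, not the project's actual test harness):

```typescript
// Illustrative stand-in for the recovery decision inside
// ensureOllamaAuthProxy(): keep the existing proxy only when the recorded
// PID is alive AND the curl token probe exits 0.
function decideRecovery(pidAlive: boolean, curlStatus: number): "keep" | "restart" {
  if (!pidAlive) return "restart";          // dead or missing proxy process
  return curlStatus === 0 ? "keep" : "restart"; // non-zero probe: token mismatch
}

// The requested third test case: ps reports the proxy alive, but the
// mocked spawnSync("curl") returns { status: 1 } (token mismatch).
console.log(decideRecovery(true, 1)); // "restart"

// Existing happy-path case: proxy alive and probe succeeds.
console.log(decideRecovery(true, 0)); // "keep"
```

A real test would wire this through mocked `ps` and `spawnSync` calls as in the snippet above, then assert the proxy was killed and respawned with the persisted token.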
Hey @adrian-santos — thanks for digging into this and putting up the fix! Just a heads-up: while this was in review, the same issue (#2553) was fixed by #2642 (landed on main April 29). That PR includes a similar approach (token probe before trusting the persisted file) plus an E2E regression test. Since main already has this covered, I'm going to close this PR. Really appreciate you taking the time — your analysis of the root cause was spot-on and helped validate the final fix. 🙏
Problem

Closes #2553.

`ensureOllamaAuthProxy()` checks whether the PID from `ollama-auth-proxy.pid` belongs to a running proxy process, and if so, loads the persisted token and returns — without verifying the running process was actually started with that token.

The two diverge when multiple `nemoclaw onboard` runs occur in sequence: each full onboard calls `startOllamaAuthProxy()`, which generates a new token and writes it to `~/.nemoclaw/ollama-proxy-token`. If `killStaleProxy()` fails to kill the original proxy (e.g. stale PID file, `lsof` race on Linux), the proxy keeps running with its original token while the file and provider credential are updated. The next `nemoclaw rebuild` wires the new token into the sandbox, but the proxy rejects it — every inference call returns `HTTP 401 Unauthorized`.

This is silent: `nemoclaw status` shows the sandbox as healthy, the Telegram bridge receives messages, but every bot response is an auth error.

Fix

Add `isProxyTokenValid()`, which probes `/v1/models` with the stored token (an authenticated endpoint — unlike `/api/tags`, which is explicitly exempt from auth in `ollama-auth-proxy.js`). In `ensureOllamaAuthProxy()`, if the probe fails, fall through to `killStaleProxy()` and restart with the persisted token.

Behavior

- `curl` call to `/v1/models` that returns immediately. No other behavior change.

Test

Manually verified on NVIDIA DGX Spark (GB10, ARM64, Ubuntu) with NemoClaw v0.0.25 / OpenClaw v2026.4.9 / nemotron-3-super:120b via Ollama. Before this fix, running `nemoclaw onboard` twice reliably reproduced the 401. After, `ensureOllamaAuthProxy()` detects the mismatch and self-heals.

Summary by CodeRabbit