feat(ollama): add ollama tuning config schema and CLI scaffold for CoWork (PR 1) #2577

gburachas wants to merge 2 commits into
Conversation
Add OllamaTuning interface to nemoclaw/src/onboard/config.ts, extend NemoClawOnboardConfig with an optional ollamaTuning field, and validate all fields in isOnboardConfig: vramPercent ∈ [1,100]; kvCacheType enum; numGpuLayers ≥ −1; numCtx > 0; numBatch > 0; appliedAt required. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
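A rough sketch of the `OllamaTuning` schema and type guard this commit describes. Field names and numeric ranges come from the commit message; the `kvCacheType` enum members, the `appliedAt` string type, and the guard's exact shape are assumptions, not the actual code.

```typescript
// Hypothetical sketch — the real definitions live in nemoclaw/src/onboard/config.ts.
type KvCacheType = "f16" | "q8_0" | "q4_0"; // assumed enum members

interface OllamaTuning {
  vramPercent?: number;  // 1..100
  numGpuLayers?: number; // >= -1 (-1 commonly means "offload all layers")
  numCtx?: number;       // > 0
  numBatch?: number;     // > 0
  kvCacheType?: KvCacheType;
  appliedAt: string;     // required; assumed to be a timestamp string
}

function isOllamaTuning(t: any): t is OllamaTuning {
  if (typeof t !== "object" || t === null) return false;
  if (typeof t.appliedAt !== "string") return false; // appliedAt is required
  if (t.vramPercent !== undefined &&
      (typeof t.vramPercent !== "number" || t.vramPercent < 1 || t.vramPercent > 100)) return false;
  if (t.numGpuLayers !== undefined &&
      (typeof t.numGpuLayers !== "number" || t.numGpuLayers < -1)) return false;
  if (t.numCtx !== undefined && (typeof t.numCtx !== "number" || t.numCtx <= 0)) return false;
  if (t.numBatch !== undefined && (typeof t.numBatch !== "number" || t.numBatch <= 0)) return false;
  if (t.kvCacheType !== undefined &&
      !["f16", "q8_0", "q4_0"].includes(t.kvCacheType)) return false;
  return true;
}
```

The review on this PR additionally suggests requiring `Number.isInteger` on the numeric knobs, since the daemon/env settings they map to are integer-valued.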
…PR 1/5) Ollama VRAM control — CLI scaffold:
- src/lib/ollama-cmd.ts: parseOllamaArgs (all subcommands + combined form vram=N% ctx=N kv-cache=... flash=on|off) and stub runOllamaCommand (prints "would apply: <parsed>", exits 0; no real apply yet)
- src/nemoclaw.ts: add case "ollama": dispatch after the gc case
- src/lib/command-registry.ts: register 9 nemoclaw ollama subcommands (group: Sandbox Management, scope: global); update count assertions
- nemoclaw/src/commands/slash.ts: case "ollama" → slashOllamaStatus (read-only; no daemon restart in slash context)

Tests:
- test/cli/ollama-cmd.test.ts: parse tests for every subcommand, the combined form, and malformed-input rejection
- nemoclaw/src/commands/slash.test.ts: routing test for the new case

Test infra fix (WSL2 DNS hang, unrelated to Ollama): Fake openshell scripts in sandbox-stuck-recovery, sandbox-connect-inference, and reboot-identity-drift tests did not handle "sandbox ssh-config", falling through to exit 0 with empty output. This caused the CLI to call real ssh with an empty config, triggering a multi-second DNS lookup for a fake hostname that timed out the spawnSync calls (status: null). Added exit-1 handler for that command so executeSandboxCommand short-circuits without touching ssh. Fixes 13 tests; remaining 14 failures are pre-existing regressions tracked in #2042 / #2562.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
📝 Walkthrough

This PR introduces a comprehensive Ollama GPU/memory tuning system comprising a new onboard configuration model, CLI command suite, slash command support, and argument parser. Eight global CLI commands manage VRAM, context, batch settings, and orchestration, while tests validate configuration constraints and command parsing across multiple scopes.
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~40 minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 4
🧹 Nitpick comments (1)
nemoclaw/src/commands/slash.ts (1)
80-90: Consider distinguishing “not configured” from “non-Ollama endpoint.”

Right now this path can imply Ollama defaults even when the current onboarding is a different endpoint type. A small branch for `config.endpointType !== "ollama"` would avoid user confusion.

Suggested refinement
```diff
 function slashOllamaStatus(): PluginCommandResult {
   const config = loadOnboardConfig();
   const tuning = config?.ollamaTuning;
+  if (config && config.endpointType !== "ollama") {
+    return {
+      text: [
+        "**Ollama VRAM Tuning**",
+        "",
+        `Current endpoint is '${config.endpointType}', so Ollama tuning is inactive.`,
+        "Run `nemoclaw onboard` and select Ollama to enable local tuning controls.",
+      ].join("\n"),
+    };
+  }
+
   if (!tuning) {
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@nemoclaw/src/commands/slash.ts` around lines 80 - 90, The current early-return that assumes Ollama defaults when !tuning should first check the onboarding/agent configuration type; update the branch that checks tuning (the code using tuning and config.endpointType) to distinguish non-Ollama endpoints by adding a guard if config.endpointType !== "ollama" and return a clear message like "This endpoint is not Ollama — tuning not applicable" (instead of the Ollama-specific message); otherwise keep the existing Ollama-default message and the prompts to run `nemoclaw ollama optimize` / `status`.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@nemoclaw/src/onboard/config.ts`:
- Around line 85-96: The numeric validations for the tuning knobs currently
allow fractional values; update the checks in the validation block that handles
t.vramPercent, t.numGpuLayers, t.numCtx, and t.numBatch to require integers (use
Number.isInteger or equivalent) in addition to the existing range checks: ensure
t.vramPercent is an integer between 1 and 100, t.numGpuLayers is an integer and
either -1 or >=0 as appropriate, and t.numCtx and t.numBatch are integers > 0;
keep the existing typeof/range logic but add the integer test and return false
when it fails.
In `@src/lib/ollama-cmd.ts`:
- Around line 116-146: The parser currently allows mixed or unsupported combos
(e.g. subcommand plus assignments or stray flags) and then runOllamaCommand
silently drops extras; after the loop that builds result (using
result.subcommand and parseAssignment), add a final validation step that
inspects result and rejects invalid shapes by throwing OllamaArgParseError:
allow only (a) subcommand-only (result.subcommand set and no assignments/flags),
(b) assignment-only (no result.subcommand but at least one assignment/flag), or
(c) the special optimize mode (result.subcommand === "optimize") with its
allowed optimize-specific flags; reject combinations like subcommand +
assignments, multiple subcommands, or stray bare flags (e.g. --ctx alone) and
surface clear error messages referencing the offending keys so runOllamaCommand
never receives ambiguous input.
- Around line 222-235: formatParsed currently omits keys whose values are null,
making explicit "clear" settings indistinguishable from nothing; update
formatParsed to always include each relevant key (vramPercent, numCtx, numBatch,
flashAttention, kvCacheType) and render null values explicitly (e.g., show
"null" or "off" text) instead of skipping them so commands like ctx=off or
vram=off appear in the preview; adjust the kvCacheType formatting to still quote
non-null strings but emit a visible token for null as well; apply the same
visible-null behavior to the analogous rendering site referenced around the
other occurrence.
- Around line 95-111: The CLI numeric parsing is too permissive (uses parseInt),
so update the --ctx, --vram and the same batch/related branches to validate the
full token with a strict regex before parsing; e.g., ensure the value string
matches /^\d+$/ for plain integers (--ctx, batch) and /^\d+%?$/ for
optional-percent values (--vram), then extract the digits and convert to Number;
throw OllamaArgParseError if the token contains any non-numeric characters or
multiple percent signs. Use the existing symbols (result.optimizeCtx,
result.optimizeVram and the same parsing block that handles batch/batchSize
around lines 156-191) to assign the validated numeric value after this stricter
check.
---
Nitpick comments:
In `@nemoclaw/src/commands/slash.ts`:
- Around line 80-90: The current early-return that assumes Ollama defaults when
!tuning should first check the onboarding/agent configuration type; update the
branch that checks tuning (the code using tuning and config.endpointType) to
distinguish non-Ollama endpoints by adding a guard if config.endpointType !==
"ollama" and return a clear message like "This endpoint is not Ollama — tuning
not applicable" (instead of the Ollama-specific message); otherwise keep the
existing Ollama-default message and the prompts to run `nemoclaw ollama
optimize` / `status`.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 4ceee886-c3dc-4bcb-9c74-fbc3f4a2dd40
📒 Files selected for processing (12)
- nemoclaw/src/commands/slash.test.ts
- nemoclaw/src/commands/slash.ts
- nemoclaw/src/onboard/config.test.ts
- nemoclaw/src/onboard/config.ts
- src/lib/command-registry.test.ts
- src/lib/command-registry.ts
- src/lib/ollama-cmd.ts
- src/nemoclaw.ts
- test/cli/ollama-cmd.test.ts
- test/reboot-identity-drift.test.ts
- test/sandbox-connect-inference.test.ts
- test/sandbox-stuck-recovery.test.ts
```typescript
if (t.vramPercent !== undefined) {
  if (typeof t.vramPercent !== "number" || t.vramPercent < 1 || t.vramPercent > 100) return false;
}
if (t.numGpuLayers !== undefined) {
  if (typeof t.numGpuLayers !== "number" || t.numGpuLayers < -1) return false;
}
if (t.numCtx !== undefined) {
  if (typeof t.numCtx !== "number" || t.numCtx <= 0) return false;
}
if (t.numBatch !== undefined) {
  if (typeof t.numBatch !== "number" || t.numBatch <= 0) return false;
}
```
Tighten numeric validation to integers for tuning knobs.
`vramPercent`, `numGpuLayers`, `numCtx`, and `numBatch` currently accept fractional values. That can admit invalid persisted config states for daemon/env settings that are integer-based.
Suggested fix
```diff
 if (t.vramPercent !== undefined) {
-  if (typeof t.vramPercent !== "number" || t.vramPercent < 1 || t.vramPercent > 100) return false;
+  if (
+    typeof t.vramPercent !== "number" ||
+    !Number.isInteger(t.vramPercent) ||
+    t.vramPercent < 1 ||
+    t.vramPercent > 100
+  )
+    return false;
 }
 if (t.numGpuLayers !== undefined) {
-  if (typeof t.numGpuLayers !== "number" || t.numGpuLayers < -1) return false;
+  if (
+    typeof t.numGpuLayers !== "number" ||
+    !Number.isInteger(t.numGpuLayers) ||
+    t.numGpuLayers < -1
+  )
+    return false;
 }
 if (t.numCtx !== undefined) {
-  if (typeof t.numCtx !== "number" || t.numCtx <= 0) return false;
+  if (typeof t.numCtx !== "number" || !Number.isInteger(t.numCtx) || t.numCtx <= 0) return false;
 }
 if (t.numBatch !== undefined) {
-  if (typeof t.numBatch !== "number" || t.numBatch <= 0) return false;
+  if (
+    typeof t.numBatch !== "number" ||
+    !Number.isInteger(t.numBatch) ||
+    t.numBatch <= 0
+  )
+    return false;
 }
```
```typescript
if (arg === "--ctx" && i + 1 < remaining.length) {
  const val = parseInt(remaining[i + 1], 10);
  if (isNaN(val) || val <= 0) {
    throw new OllamaArgParseError(`Invalid --ctx value: ${remaining[i + 1]}`);
  }
  result.optimizeCtx = val;
  i += 2;
  continue;
}

if (arg === "--vram" && i + 1 < remaining.length) {
  const raw = remaining[i + 1].replace(/%$/, "");
  const val = parseInt(raw, 10);
  if (isNaN(val) || val < 1 || val > 100) {
    throw new OllamaArgParseError(`Invalid --vram value: ${remaining[i + 1]}`);
  }
  result.optimizeVram = val;
```
Use strict numeric parsing for CLI values.
These branches currently accept malformed inputs because `parseInt` stops at the first non-numeric character. For example, `--ctx 32768foo`, `batch=32x`, or `vram=80%%` will parse as valid values instead of failing. That can silently apply the wrong tuning.
Suggested fix
```diff
+function parseStrictPositiveInt(raw: string, field: string): number {
+  if (!/^\d+$/.test(raw)) {
+    throw new OllamaArgParseError(`Invalid ${field} value: ${raw}`);
+  }
+  const value = Number(raw);
+  if (!Number.isSafeInteger(value) || value <= 0) {
+    throw new OllamaArgParseError(`Invalid ${field} value: ${raw}`);
+  }
+  return value;
+}
+
+function parseStrictPercent(raw: string, field: string): number {
+  const normalized = raw.replace(/%$/, "");
+  if (!/^\d+$/.test(normalized)) {
+    throw new OllamaArgParseError(`Invalid ${field} value: ${raw}`);
+  }
+  const value = Number(normalized);
+  if (!Number.isSafeInteger(value) || value < 1 || value > 100) {
+    throw new OllamaArgParseError(`Invalid ${field} value: ${raw}`);
+  }
+  return value;
+}
```

Also applies to: 156-191
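To make the failure mode concrete, here is a runnable sketch contrasting `parseInt` with the strict-regex approach the review suggests. `OllamaArgParseError` is stubbed here as a plain `Error` subclass; in the PR it lives in `src/lib/ollama-cmd.ts`.

```typescript
// Stand-in for the PR's error type.
class OllamaArgParseError extends Error {}

// Strict parser: the whole token must be digits, not just a numeric prefix.
function parseStrictPositiveInt(raw: string, field: string): number {
  if (!/^\d+$/.test(raw)) {
    throw new OllamaArgParseError(`Invalid ${field} value: ${raw}`);
  }
  const value = Number(raw);
  if (!Number.isSafeInteger(value) || value <= 0) {
    throw new OllamaArgParseError(`Invalid ${field} value: ${raw}`);
  }
  return value;
}

console.log(parseInt("32768foo", 10)); // 32768 — malformed token silently accepted
try {
  parseStrictPositiveInt("32768foo", "--ctx");
} catch (e) {
  console.log((e as Error).message); // Invalid --ctx value: 32768foo
}
console.log(parseStrictPositiveInt("32768", "--ctx")); // 32768
```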
```typescript
  // Named subcommands
  if (
    arg === "status" ||
    arg === "optimize" ||
    arg === "apply" ||
    arg === "reset"
  ) {
    if (result.subcommand !== undefined) {
      throw new OllamaArgParseError(
        `Unexpected subcommand "${arg}" — already have "${result.subcommand}"`,
      );
    }
    result.subcommand = arg;
    i++;
    continue;
  }

  // Assignment-style args: key=value
  const eqIdx = arg.indexOf("=");
  if (eqIdx !== -1) {
    const key = arg.slice(0, eqIdx);
    const value = arg.slice(eqIdx + 1);
    parseAssignment(key, value, result);
    i++;
    continue;
  }

  throw new OllamaArgParseError(`Unknown argument: ${arg}`);
}

return result;
```
Reject mixed/unsupported argument combinations instead of silently ignoring them.
Right now combinations such as `status vram=80%`, `reset --apply`, or even bare `--ctx 32768` parse successfully, but `runOllamaCommand` only prints the subcommand or assignment preview and drops the rest. That makes invalid invocations look accepted.
Please validate the final shape after parsing and fail fast unless the args match one supported mode, e.g. assignment-only, subcommand-only, or optimize with its optimize-specific flags.
Also applies to: 255-265
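One possible post-parse shape check, sketched against a trimmed-down `ParsedOllamaArgs`. The field names follow this PR; the exact set of allowed optimize-specific flags is an assumption.

```typescript
// Trimmed stand-ins for the PR's types.
class OllamaArgParseError extends Error {}

interface ParsedOllamaArgs {
  subcommand?: "status" | "optimize" | "apply" | "reset";
  vramPercent?: number | null; // key=value assignments
  numCtx?: number | null;
  optimizeCtx?: number;        // optimize-only flags (--ctx / --vram)
  optimizeVram?: number;
}

// Accept only: subcommand-only, assignment-only, or optimize with its flags.
function validateShape(p: ParsedOllamaArgs): void {
  const hasAssignments = p.vramPercent !== undefined || p.numCtx !== undefined;
  const hasOptimizeFlags = p.optimizeCtx !== undefined || p.optimizeVram !== undefined;
  if (p.subcommand && p.subcommand !== "optimize" && (hasAssignments || hasOptimizeFlags)) {
    throw new OllamaArgParseError(`"${p.subcommand}" does not accept extra arguments`);
  }
  if (p.subcommand === "optimize" && hasAssignments) {
    throw new OllamaArgParseError(`"optimize" takes --ctx/--vram flags, not key=value assignments`);
  }
  if (!p.subcommand && hasOptimizeFlags) {
    throw new OllamaArgParseError("--ctx/--vram are only valid with the optimize subcommand");
  }
}

validateShape({ subcommand: "status" });                  // ok: subcommand-only
validateShape({ vramPercent: 80, numCtx: null });         // ok: assignment-only
validateShape({ subcommand: "optimize", optimizeCtx: 32768 }); // ok: optimize mode
try {
  validateShape({ subcommand: "status", vramPercent: 80 }); // mixed shape: rejected
} catch (e) {
  console.log((e as Error).message); // "status" does not accept extra arguments
}
```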
```typescript
function formatParsed(parsed: ParsedOllamaArgs): string {
  const parts: string[] = [];
  if (parsed.vramPercent !== undefined && parsed.vramPercent !== null)
    parts.push(`vramPercent: ${parsed.vramPercent}`);
  if (parsed.numCtx !== undefined && parsed.numCtx !== null)
    parts.push(`numCtx: ${parsed.numCtx}`);
  if (parsed.numBatch !== undefined && parsed.numBatch !== null)
    parts.push(`numBatch: ${parsed.numBatch}`);
  if (parsed.flashAttention !== undefined && parsed.flashAttention !== null)
    parts.push(`flashAttention: ${parsed.flashAttention}`);
  if (parsed.kvCacheType !== undefined && parsed.kvCacheType !== null)
    parts.push(`kvCacheType: "${parsed.kvCacheType}"`);
  return `{ ${parts.join(", ")} }`;
}
```
Keep explicit clears visible in the preview output.
`formatParsed` drops null fields, so commands like `ctx=off vram=off` render as `{ }`. In this scaffold, that makes “clear these settings” indistinguishable from “nothing parsed”.
Suggested fix
```diff
-  if (parsed.vramPercent !== undefined && parsed.vramPercent !== null)
-    parts.push(`vramPercent: ${parsed.vramPercent}`);
-  if (parsed.numCtx !== undefined && parsed.numCtx !== null)
-    parts.push(`numCtx: ${parsed.numCtx}`);
-  if (parsed.numBatch !== undefined && parsed.numBatch !== null)
-    parts.push(`numBatch: ${parsed.numBatch}`);
+  if (parsed.vramPercent !== undefined)
+    parts.push(`vramPercent: ${parsed.vramPercent === null ? "off" : parsed.vramPercent}`);
+  if (parsed.numCtx !== undefined)
+    parts.push(`numCtx: ${parsed.numCtx === null ? "off" : parsed.numCtx}`);
+  if (parsed.numBatch !== undefined)
+    parts.push(`numBatch: ${parsed.numBatch === null ? "off" : parsed.numBatch}`);
   if (parsed.flashAttention !== undefined && parsed.flashAttention !== null)
     parts.push(`flashAttention: ${parsed.flashAttention}`);
-  if (parsed.kvCacheType !== undefined && parsed.kvCacheType !== null)
-    parts.push(`kvCacheType: "${parsed.kvCacheType}"`);
+  if (parsed.kvCacheType !== undefined)
+    parts.push(`kvCacheType: ${parsed.kvCacheType === null ? '"off"' : `"${parsed.kvCacheType}"`}`);
```

Also applies to: 265
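A self-contained sketch of the visible-null rendering, showing only the fields needed to demonstrate the behavior; `ParsedOllamaArgs` here is a trimmed stand-in for the PR's type.

```typescript
// Trimmed stand-in: only the fields used by this demo.
interface ParsedOllamaArgs {
  vramPercent?: number | null;
  numCtx?: number | null;
  kvCacheType?: string | null;
}

// null means "explicitly cleared" and is rendered as "off" rather than omitted.
function formatParsed(parsed: ParsedOllamaArgs): string {
  const parts: string[] = [];
  if (parsed.vramPercent !== undefined)
    parts.push(`vramPercent: ${parsed.vramPercent === null ? "off" : parsed.vramPercent}`);
  if (parsed.numCtx !== undefined)
    parts.push(`numCtx: ${parsed.numCtx === null ? "off" : parsed.numCtx}`);
  if (parsed.kvCacheType !== undefined)
    parts.push(`kvCacheType: ${parsed.kvCacheType === null ? '"off"' : `"${parsed.kvCacheType}"`}`);
  return `{ ${parts.join(", ")} }`;
}

console.log(formatParsed({ vramPercent: null, numCtx: null })); // { vramPercent: off, numCtx: off }
console.log(formatParsed({ vramPercent: 80, kvCacheType: "q8_0" })); // { vramPercent: 80, kvCacheType: "q8_0" }
```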
✨ Thanks for submitting this PR that adds ollama tuning config schema and CLI scaffold for CoWork, which proposes a way to improve the user experience with local Ollama inference.
Summary
Adds the first PR implementing `nemoclaw ollama` GPU/VRAM control for local Ollama inference. This PR establishes the `OllamaTuning` config schema with full type-guard validation, registers the complete `nemoclaw ollama` CLI surface (9 subcommands), and delivers a parsing scaffold that accepts every subcommand and the combined assignment form (`vram=80% ctx=32768 kv-cache=q8_0 flash=on`) with full test coverage. No real apply logic yet — the VRAM resolver and ollama subcommands (PR 2) and daemon manager (PR 3) follow in subsequent PRs.

Related Issue
Changes
- `nemoclaw/src/onboard/config.ts` — add `OllamaTuning` interface and `ollamaTuning?: OllamaTuning` to `NemoClawOnboardConfig`; extend the `isOnboardConfig` type guard with per-field validation (`vramPercent` ∈ [1,100], `kvCacheType` enum, `numGpuLayers` ≥ −1, `numCtx` > 0, `numBatch` > 0, `appliedAt` required)
- `src/lib/ollama-cmd.ts` (new) — `parseOllamaArgs(argv)` handles all subcommands and the combined form; stub `runOllamaCommand` prints `"would apply: <parsed>"` and exits 0
- `src/nemoclaw.ts` — `case "ollama":` dispatch after the `gc` case
- `src/lib/command-registry.ts` — register 9 `nemoclaw ollama …` subcommands (group: Sandbox Management, scope: global)
- `nemoclaw/src/commands/slash.ts` — `case "ollama":` → `slashOllamaStatus` (read-only; no daemon restart in slash context)

Roadmap to full Ollama VRAM control (PRs 2–3):
(Aside — test infra fix: fake openshell scripts in three existing tests fell through on `sandbox ssh-config`, causing the CLI to call real `ssh` with an empty config, triggering multi-second DNS lookups in WSL2 and timing out those tests. Added an `exit 1` handler to each fake openshell; fixes 13 tests. The remaining 14 failures in `test/cli.test.ts`, `test/gateway-state-reconcile-2276.test.ts`, and `test/nemoclaw-cli-recovery.test.ts` are pre-existing regressions tracked in #2042 and #2562.)

Type of Change
Verification
- `npx prek run --all-files` passes
- `npm test` passes
- `make docs` builds without warnings (doc changes only)

AI Disclosure
Signed-off-by: Giedrius T. Burachas gburachas@nvidia.com
Summary by CodeRabbit
New Features

- Ollama GPU/memory tuning commands (`vram`, `ctx`, `batch`, `flash`, `kv-cache`).
- `/nemoclaw ollama` command with status reporting and configuration management.

Tests