
feat(ollama): add ollama tuning config schema and CLI scaffold for CoWork (PR 1)#2577

Open
gburachas wants to merge 2 commits into main from feat/ollama-controls

Conversation

@gburachas

@gburachas gburachas commented Apr 28, 2026

Summary

Adds the first PR implementing nemoclaw ollama GPU/VRAM control for local Ollama inference. This PR establishes the OllamaTuning config schema with full type-guard validation, registers the complete nemoclaw ollama CLI surface (9 subcommands), and delivers a parsing scaffold that accepts every subcommand as well as the combined assignment form (vram=80% ctx=32768 kv-cache=q8_0 flash=on), with full test coverage. There is no real apply logic yet: the VRAM resolver and ollama subcommand implementations land in PR 2, and the daemon manager in PR 3.

Related Issue

Changes

  • nemoclaw/src/onboard/config.ts — add OllamaTuning interface and ollamaTuning?: OllamaTuning to NemoClawOnboardConfig; extend isOnboardConfig type guard with per-field validation (vramPercent ∈ [1,100], kvCacheType enum, numGpuLayers ≥ −1, numCtx > 0, numBatch > 0, appliedAt required)
  • src/lib/ollama-cmd.ts (new) — parseOllamaArgs(argv) handles all subcommands and the combined form; stub runOllamaCommand prints "would apply: <parsed>" and exits 0
  • src/nemoclaw.ts — case "ollama": dispatch after the gc case
  • src/lib/command-registry.ts — register 9 nemoclaw ollama … subcommands (group: Sandbox Management, scope: global)
  • nemoclaw/src/commands/slash.ts — case "ollama" → slashOllamaStatus (read-only; no daemon restart in slash context)
  • Tests — config round-trip + rejection tests, CLI parse tests, slash dispatch test (109 new assertions across 4 files)

Roadmap to full Ollama VRAM control (PRs 2–3):

  • PR 2/3: VRAM-aware tuning resolver and nemoclaw ollama subcommand implementation
  • PR 3/3: daemon manager (managed/systemd/foreign) + WSL2 unification (always route through auth proxy)

(Aside — test infra fix: fake openshell scripts in three existing tests fell through on sandbox ssh-config, causing the CLI to call real ssh with an empty config, triggering multi-second DNS lookups in WSL2 and timing out those tests. Added exit 1 handler to each fake openshell; fixes 13 tests. The remaining 14 failures in test/cli.test.ts, test/gateway-state-reconcile-2276.test.ts, and test/nemoclaw-cli-recovery.test.ts are pre-existing regressions tracked in #2042 and #2562.)
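A minimal sketch of that fixture fix, modeled as a shell function so the behavior is easy to exercise inline (hypothetical; the real fake openshell fixtures are standalone scripts and differ in detail):

```shell
#!/bin/sh
# Hypothetical sketch of a fake `openshell` test fixture dispatching
# on the first two arguments.
fake_openshell() {
  case "$1 $2" in
    "sandbox ssh-config")
      # Fail fast so the caller short-circuits instead of falling
      # through to exit 0 with empty output and then invoking real ssh.
      return 1
      ;;
    *)
      return 0
      ;;
  esac
}

fake_openshell sandbox ssh-config || echo "ssh-config refused"
fake_openshell sandbox status && echo "other commands still succeed"
```

The key point is that an unhandled command previously exited 0 with empty output, which the CLI treated as a valid (empty) ssh config.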

Type of Change

  • Code change (feature, bug fix, or refactor)
  • Code change with doc updates
  • Doc only (prose changes, no code sample modifications)
  • Doc only (includes code sample changes)

Verification

  • npx prek run --all-files passes
  • npm test passes
  • Tests added or updated for new or changed behavior
  • No secrets, API keys, or credentials committed
  • Docs updated for user-facing behavior changes
  • make docs builds without warnings (doc changes only)
  • Doc pages follow the style guide (doc changes only)
  • New doc pages include SPDX header and frontmatter (new pages only)

AI Disclosure

  • AI-assisted — tool: Claude Code (opus-4-7 for planning, sonnet-4-6 for coding, 1M context)

Signed-off-by: Giedrius T. Burachas gburachas@nvidia.com

Summary by CodeRabbit

  • New Features

    • Added Ollama GPU and memory tuning commands (vram, ctx, batch, flash, kv-cache).
    • Added /nemoclaw ollama command with status reporting and configuration management.
    • Added automated tuning optimization, application of recommendations, and reset capabilities.
    • Added validation for Ollama tuning configuration parameters.
  • Tests

    • Added comprehensive unit tests for new Ollama functionality.

gburachas and others added 2 commits April 27, 2026 15:54
Add OllamaTuning interface to nemoclaw/src/onboard/config.ts, extend
NemoClawOnboardConfig with an optional ollamaTuning field, and validate
all fields in isOnboardConfig: vramPercent ∈ [1,100]; kvCacheType enum;
numGpuLayers ≥ −1; numCtx > 0; numBatch > 0; appliedAt required.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
…PR 1/5)

Ollama VRAM control — CLI scaffold:
- src/lib/ollama-cmd.ts: parseOllamaArgs (all subcommands + combined form
  vram=N% ctx=N kv-cache=... flash=on|off) and stub runOllamaCommand
  (prints "would apply: <parsed>", exits 0; no real apply yet)
- src/nemoclaw.ts: add case "ollama": dispatch after the gc case
- src/lib/command-registry.ts: register 9 nemoclaw ollama subcommands
  (group: Sandbox Management, scope: global); update count assertions
- nemoclaw/src/commands/slash.ts: case "ollama" → slashOllamaStatus
  (read-only; no daemon restart in slash context)

Tests:
- test/cli/ollama-cmd.test.ts: parse tests for every subcommand, the
  combined form, and malformed-input rejection
- nemoclaw/src/commands/slash.test.ts: routing test for the new case

Test infra fix (WSL2 DNS hang, unrelated to Ollama):
- Fake openshell scripts in sandbox-stuck-recovery, sandbox-connect-
  inference, and reboot-identity-drift tests did not handle
  "sandbox ssh-config", falling through to exit 0 with empty output.
  This caused the CLI to call real ssh with an empty config, triggering
  a multi-second DNS lookup for a fake hostname that timed out the
  spawnSync calls (status: null). Added exit-1 handler for that command
  so executeSandboxCommand short-circuits without touching ssh.
  Fixes 13 tests; remaining 14 failures are pre-existing regressions
  tracked in #2042 / #2562.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
@copy-pr-bot

copy-pr-bot Bot commented Apr 28, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@coderabbitai
Contributor

coderabbitai Bot commented Apr 28, 2026

📝 Walkthrough

This PR introduces a comprehensive Ollama GPU/memory tuning system comprising a new onboard configuration model, CLI command suite, slash command support, and argument parser. Nine global CLI commands manage VRAM, context, batch settings, and orchestration, while tests validate configuration constraints and command parsing across multiple scopes.

Changes

  • Ollama Configuration Model — nemoclaw/src/onboard/config.ts, nemoclaw/src/onboard/config.test.ts
    Added OllamaTuning interface with validated fields (vramPercent 1–100, numGpuLayers ≥ −1, positive numCtx/numBatch, kvCacheType enum, required appliedAt). The config parser now enforces these constraints and allows tuning to be absent. A comprehensive test suite validates range checks, enum values, required fields, and JSON round-tripping.
  • Slash Command Handler — nemoclaw/src/commands/slash.ts, nemoclaw/src/commands/slash.test.ts
    New slashOllamaStatus() handler displays Ollama tuning status, rendering either a default-use message or a multi-field status report (VRAM cap, GPU layers, context/batch sizes, cache options, timestamps). Help output extended to list the ollama subcommand. Tests verify output for unconfigured and configured states.
  • CLI Command Registry — src/lib/command-registry.ts, src/lib/command-registry.test.ts
    Registered nine new global Sandbox Management commands: vram, ctx, batch, flash, kv-cache (per-request/restart settings), plus optimize, apply, reset, status (orchestration). Registry tests updated to reflect the new command counts and token expectations.
  • Ollama CLI Parser — src/lib/ollama-cmd.ts, test/cli/ollama-cmd.test.ts
    New argument parser supporting named subcommands (status, optimize, apply, reset), flag options (--help, --sudo, --yes, --apply), and key=value assignments with validation (vram 1–100, positive ctx/batch, kv-cache enum). runOllamaCommand handles output and error reporting. Extensive test coverage for parsing, validation, and error cases.
  • Integration & Fixtures — src/nemoclaw.ts, test/reboot-identity-drift.test.ts, test/sandbox-connect-inference.test.ts, test/sandbox-stuck-recovery.test.ts
    Top-level wiring imports and dispatches runOllamaCommand. Three test fixtures now explicitly fail the sandbox ssh-config invocation (exit code 1) instead of falling through to success, affecting recovery/inference test behavior.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~40 minutes

Poem

🐰 A new burrow for Ollama's GPU dreams,
With tuning configs and validation schemes,
VRAM capped, layers set, cache types refined,
Commands cascading, so thoughtfully designed!
Status and optimize, apply and reset—
The finest GPU care—a rabbit's best bet! 🥕✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage — ⚠️ Warning: docstring coverage is 16.67%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.

✅ Passed checks (4 passed)

  • Linked Issues check — ✅ Passed (check skipped: no linked issues were found for this pull request).
  • Out of Scope Changes check — ✅ Passed (check skipped: no linked issues were found for this pull request).
  • Title check — ✅ Passed: the title accurately summarizes the main change, introducing an Ollama tuning config schema and CLI scaffold, which is the core focus across all modified files.
  • Description check — ✅ Passed (check skipped: CodeRabbit's high-level summary is enabled).


Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 4

🧹 Nitpick comments (1)
nemoclaw/src/commands/slash.ts (1)

80-90: Consider distinguishing “not configured” from “non-Ollama endpoint.”

Right now this path can imply Ollama defaults even when current onboarding is a different endpoint type. A small branch for config.endpointType !== "ollama" would avoid user confusion.

Suggested refinement
 function slashOllamaStatus(): PluginCommandResult {
   const config = loadOnboardConfig();
   const tuning = config?.ollamaTuning;
 
+  if (config && config.endpointType !== "ollama") {
+    return {
+      text: [
+        "**Ollama VRAM Tuning**",
+        "",
+        `Current endpoint is '${config.endpointType}', so Ollama tuning is inactive.`,
+        "Run `nemoclaw onboard` and select Ollama to enable local tuning controls.",
+      ].join("\n"),
+    };
+  }
+
   if (!tuning) {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nemoclaw/src/commands/slash.ts` around lines 80 - 90, The current
early-return that assumes Ollama defaults when !tuning should first check the
onboarding/agent configuration type; update the branch that checks tuning (the
code using tuning and config.endpointType) to distinguish non-Ollama endpoints
by adding a guard if config.endpointType !== "ollama" and return a clear message
like "This endpoint is not Ollama — tuning not applicable" (instead of the
Ollama-specific message); otherwise keep the existing Ollama-default message and
the prompts to run `nemoclaw ollama optimize` / `status`.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@nemoclaw/src/onboard/config.ts`:
- Around line 85-96: The numeric validations for the tuning knobs currently
allow fractional values; update the checks in the validation block that handles
t.vramPercent, t.numGpuLayers, t.numCtx, and t.numBatch to require integers (use
Number.isInteger or equivalent) in addition to the existing range checks: ensure
t.vramPercent is an integer between 1 and 100, t.numGpuLayers is an integer and
either -1 or >=0 as appropriate, and t.numCtx and t.numBatch are integers > 0;
keep the existing typeof/range logic but add the integer test and return false
when it fails.

In `@src/lib/ollama-cmd.ts`:
- Around line 116-146: The parser currently allows mixed or unsupported combos
(e.g. subcommand plus assignments or stray flags) and then runOllamaCommand
silently drops extras; after the loop that builds result (using
result.subcommand and parseAssignment), add a final validation step that
inspects result and rejects invalid shapes by throwing OllamaArgParseError:
allow only (a) subcommand-only (result.subcommand set and no assignments/flags),
(b) assignment-only (no result.subcommand but at least one assignment/flag), or
(c) the special optimize mode (result.subcommand === "optimize") with its
allowed optimize-specific flags; reject combinations like subcommand +
assignments, multiple subcommands, or stray bare flags (e.g. --ctx alone) and
surface clear error messages referencing the offending keys so runOllamaCommand
never receives ambiguous input.
- Around line 222-235: formatParsed currently omits keys whose values are null,
making explicit "clear" settings indistinguishable from nothing; update
formatParsed to always include each relevant key (vramPercent, numCtx, numBatch,
flashAttention, kvCacheType) and render null values explicitly (e.g., show
"null" or "off" text) instead of skipping them so commands like ctx=off or
vram=off appear in the preview; adjust the kvCacheType formatting to still quote
non-null strings but emit a visible token for null as well; apply the same
visible-null behavior to the analogous rendering site referenced around the
other occurrence.
- Around line 95-111: The CLI numeric parsing is too permissive (uses parseInt),
so update the --ctx, --vram and the same batch/related branches to validate the
full token with a strict regex before parsing; e.g., ensure the value string
matches /^\d+$/ for plain integers (--ctx, batch) and /^\d+%?$/ for
optional-percent values (--vram), then extract the digits and convert to Number;
throw OllamaArgParseError if the token contains any non-numeric characters or
multiple percent signs. Use the existing symbols (result.optimizeCtx,
result.optimizeVram and the same parsing block that handles batch/batchSize
around lines 156-191) to assign the validated numeric value after this stricter
check.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 4ceee886-c3dc-4bcb-9c74-fbc3f4a2dd40

📥 Commits

Reviewing files that changed from the base of the PR and between d392ec0 and faa7ebd.

📒 Files selected for processing (12)
  • nemoclaw/src/commands/slash.test.ts
  • nemoclaw/src/commands/slash.ts
  • nemoclaw/src/onboard/config.test.ts
  • nemoclaw/src/onboard/config.ts
  • src/lib/command-registry.test.ts
  • src/lib/command-registry.ts
  • src/lib/ollama-cmd.ts
  • src/nemoclaw.ts
  • test/cli/ollama-cmd.test.ts
  • test/reboot-identity-drift.test.ts
  • test/sandbox-connect-inference.test.ts
  • test/sandbox-stuck-recovery.test.ts

Comment on lines +85 to +96
if (t.vramPercent !== undefined) {
  if (typeof t.vramPercent !== "number" || t.vramPercent < 1 || t.vramPercent > 100) return false;
}
if (t.numGpuLayers !== undefined) {
  if (typeof t.numGpuLayers !== "number" || t.numGpuLayers < -1) return false;
}
if (t.numCtx !== undefined) {
  if (typeof t.numCtx !== "number" || t.numCtx <= 0) return false;
}
if (t.numBatch !== undefined) {
  if (typeof t.numBatch !== "number" || t.numBatch <= 0) return false;
}

⚠️ Potential issue | 🟡 Minor

Tighten numeric validation to integers for tuning knobs.

vramPercent, numGpuLayers, numCtx, and numBatch currently accept fractional values. That can admit invalid persisted config states for daemon/env settings that are integer-based.

Suggested fix
   if (t.vramPercent !== undefined) {
-    if (typeof t.vramPercent !== "number" || t.vramPercent < 1 || t.vramPercent > 100) return false;
+    if (
+      typeof t.vramPercent !== "number" ||
+      !Number.isInteger(t.vramPercent) ||
+      t.vramPercent < 1 ||
+      t.vramPercent > 100
+    )
+      return false;
   }
   if (t.numGpuLayers !== undefined) {
-    if (typeof t.numGpuLayers !== "number" || t.numGpuLayers < -1) return false;
+    if (
+      typeof t.numGpuLayers !== "number" ||
+      !Number.isInteger(t.numGpuLayers) ||
+      t.numGpuLayers < -1
+    )
+      return false;
   }
   if (t.numCtx !== undefined) {
-    if (typeof t.numCtx !== "number" || t.numCtx <= 0) return false;
+    if (typeof t.numCtx !== "number" || !Number.isInteger(t.numCtx) || t.numCtx <= 0) return false;
   }
   if (t.numBatch !== undefined) {
-    if (typeof t.numBatch !== "number" || t.numBatch <= 0) return false;
+    if (
+      typeof t.numBatch !== "number" ||
+      !Number.isInteger(t.numBatch) ||
+      t.numBatch <= 0
+    )
+      return false;
   }

Comment thread src/lib/ollama-cmd.ts
Comment on lines +95 to +111
if (arg === "--ctx" && i + 1 < remaining.length) {
  const val = parseInt(remaining[i + 1], 10);
  if (isNaN(val) || val <= 0) {
    throw new OllamaArgParseError(`Invalid --ctx value: ${remaining[i + 1]}`);
  }
  result.optimizeCtx = val;
  i += 2;
  continue;
}

if (arg === "--vram" && i + 1 < remaining.length) {
  const raw = remaining[i + 1].replace(/%$/, "");
  const val = parseInt(raw, 10);
  if (isNaN(val) || val < 1 || val > 100) {
    throw new OllamaArgParseError(`Invalid --vram value: ${remaining[i + 1]}`);
  }
  result.optimizeVram = val;

⚠️ Potential issue | 🟠 Major

Use strict numeric parsing for CLI values.

These branches currently accept malformed inputs because parseInt stops at the first non-numeric character. For example, --ctx 32768foo, batch=32x, or vram=80%% will parse as valid values instead of failing. That can silently apply the wrong tuning.

Suggested fix
+function parseStrictPositiveInt(raw: string, field: string): number {
+  if (!/^\d+$/.test(raw)) {
+    throw new OllamaArgParseError(`Invalid ${field} value: ${raw}`);
+  }
+  const value = Number(raw);
+  if (!Number.isSafeInteger(value) || value <= 0) {
+    throw new OllamaArgParseError(`Invalid ${field} value: ${raw}`);
+  }
+  return value;
+}
+
+function parseStrictPercent(raw: string, field: string): number {
+  const normalized = raw.replace(/%$/, "");
+  if (!/^\d+$/.test(normalized)) {
+    throw new OllamaArgParseError(`Invalid ${field} value: ${raw}`);
+  }
+  const value = Number(normalized);
+  if (!Number.isSafeInteger(value) || value < 1 || value > 100) {
+    throw new OllamaArgParseError(`Invalid ${field} value: ${raw}`);
+  }
+  return value;
+}

Also applies to: 156-191


Comment thread src/lib/ollama-cmd.ts
Comment on lines +116 to +146
// Named subcommands
if (
  arg === "status" ||
  arg === "optimize" ||
  arg === "apply" ||
  arg === "reset"
) {
  if (result.subcommand !== undefined) {
    throw new OllamaArgParseError(
      `Unexpected subcommand "${arg}" — already have "${result.subcommand}"`,
    );
  }
  result.subcommand = arg;
  i++;
  continue;
}

// Assignment-style args: key=value
const eqIdx = arg.indexOf("=");
if (eqIdx !== -1) {
  const key = arg.slice(0, eqIdx);
  const value = arg.slice(eqIdx + 1);
  parseAssignment(key, value, result);
  i++;
  continue;
}

  throw new OllamaArgParseError(`Unknown argument: ${arg}`);
}

return result;

⚠️ Potential issue | 🟠 Major

Reject mixed/unsupported argument combinations instead of silently ignoring them.

Right now combinations such as status vram=80%, reset --apply, or even bare --ctx 32768 parse successfully, but runOllamaCommand only prints the subcommand or assignment preview and drops the rest. That makes invalid invocations look accepted.

Please validate the final shape after parsing and fail fast unless the args match one supported mode, e.g. assignment-only, subcommand-only, or optimize with its optimize-specific flags.

Also applies to: 255-265


Comment thread src/lib/ollama-cmd.ts
Comment on lines +222 to +235
function formatParsed(parsed: ParsedOllamaArgs): string {
  const parts: string[] = [];
  if (parsed.vramPercent !== undefined && parsed.vramPercent !== null)
    parts.push(`vramPercent: ${parsed.vramPercent}`);
  if (parsed.numCtx !== undefined && parsed.numCtx !== null)
    parts.push(`numCtx: ${parsed.numCtx}`);
  if (parsed.numBatch !== undefined && parsed.numBatch !== null)
    parts.push(`numBatch: ${parsed.numBatch}`);
  if (parsed.flashAttention !== undefined && parsed.flashAttention !== null)
    parts.push(`flashAttention: ${parsed.flashAttention}`);
  if (parsed.kvCacheType !== undefined && parsed.kvCacheType !== null)
    parts.push(`kvCacheType: "${parsed.kvCacheType}"`);
  return `{ ${parts.join(", ")} }`;
}

⚠️ Potential issue | 🟡 Minor

Keep explicit clears visible in the preview output.

formatParsed drops null fields, so commands like ctx=off vram=off render as { }. In this scaffold, that makes “clear these settings” indistinguishable from “nothing parsed”.

Suggested fix
-  if (parsed.vramPercent !== undefined && parsed.vramPercent !== null)
-    parts.push(`vramPercent: ${parsed.vramPercent}`);
-  if (parsed.numCtx !== undefined && parsed.numCtx !== null)
-    parts.push(`numCtx: ${parsed.numCtx}`);
-  if (parsed.numBatch !== undefined && parsed.numBatch !== null)
-    parts.push(`numBatch: ${parsed.numBatch}`);
+  if (parsed.vramPercent !== undefined)
+    parts.push(`vramPercent: ${parsed.vramPercent === null ? "off" : parsed.vramPercent}`);
+  if (parsed.numCtx !== undefined)
+    parts.push(`numCtx: ${parsed.numCtx === null ? "off" : parsed.numCtx}`);
+  if (parsed.numBatch !== undefined)
+    parts.push(`numBatch: ${parsed.numBatch === null ? "off" : parsed.numBatch}`);
   if (parsed.flashAttention !== undefined && parsed.flashAttention !== null)
     parts.push(`flashAttention: ${parsed.flashAttention}`);
-  if (parsed.kvCacheType !== undefined && parsed.kvCacheType !== null)
-    parts.push(`kvCacheType: "${parsed.kvCacheType}"`);
+  if (parsed.kvCacheType !== undefined)
+    parts.push(`kvCacheType: ${parsed.kvCacheType === null ? '"off"' : `"${parsed.kvCacheType}"`}`);

Also applies to: 265-265


@wscurran wscurran added Local Models Running NemoClaw with local models NemoClaw CLI Use this label to identify issues with the NemoClaw command-line interface (CLI). enhancement: feature Use this label to identify requests for new capabilities in NemoClaw. enhancement: inference Items related to running (local or hosted) inference models from NemoClaw. labels Apr 28, 2026
@wscurran
Contributor

✨ Thanks for submitting this PR that adds ollama tuning config schema and CLI scaffold for CoWork, which proposes a way to improve the user experience with local Ollama inference.


@gburachas gburachas changed the title feat(ollama): add ollama tuning config schema and CLI scaffold for CoWork (PR 1/5) feat(ollama): add ollama tuning config schema and CLI scaffold for CoWork (PR 1) Apr 29, 2026