
feat: add OllamaAdapter for local-inference benchmarking #1495

Open

SyncroAgency wants to merge 1 commit into garrytan:main from SyncroAgency:feat/ollama-adapter

Conversation

@SyncroAgency

Summary

Wires a fourth provider into /benchmark-models alongside Claude, GPT, and Gemini. Ollama talks HTTP directly to a local daemon (http://localhost:11434 by default) rather than shelling out to a CLI, so adapter setup is ollama serve instead of a login flow. Cost is zero (local inference) and the tool-call surface is empty (/api/generate is pure completion).

Why

Lets users compare local models (qwen2.5-coder:7b, llama3.2:3b, etc.) head-to-head against cloud providers using the same harness. Use cases:

  • Cost/quality tradeoff sizing before routing a task local vs cloud
  • Validating a local model is "good enough" for a specific skill prompt
  • Latency baselines for local inference vs cloud RTT

Strictly additive — zero behavioral change for existing claude/gpt/gemini paths.

Adapter behavior

| Method | Behavior |
| --- | --- |
| available() | Probes GET /api/tags with a 2s timeout. Reports a remediation hint on every failure mode (install URL, ollama serve, ollama pull &lt;model&gt;). |
| run() | POSTs to /api/generate with {model, prompt, stream: false}. Parses response, prompt_eval_count, eval_count, and model. |
| estimateCost() | Returns 0 via priced-at-zero rows in the PRICING table (future paid-GPU hosts can override per-model). |

Env overrides: GSTACK_OLLAMA_URL (custom host/port), GSTACK_OLLAMA_MODEL (default model, falls back to qwen2.5-coder:7b).
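
A condensed sketch of that surface, for reviewers skimming without the diff. The endpoint paths and response fields (/api/tags, /api/generate, response, prompt_eval_count, eval_count) follow Ollama's documented API; the return shapes and hint strings are illustrative, not the repo's actual types:

```ts
// Sketch of the adapter's HTTP surface. Endpoint paths and response
// fields follow Ollama's documented API; return shapes and hint text
// here are illustrative, not the repo's actual types.
const BASE_URL = process.env.GSTACK_OLLAMA_URL ?? "http://localhost:11434";
const DEFAULT_MODEL = process.env.GSTACK_OLLAMA_MODEL ?? "qwen2.5-coder:7b";

// available(): probe the local daemon's model list, with a hard 2s budget.
async function available(): Promise<{ ok: boolean; hint?: string }> {
  try {
    const res = await fetch(`${BASE_URL}/api/tags`, {
      signal: AbortSignal.timeout(2000),
    });
    if (!res.ok) return { ok: false, hint: "is `ollama serve` running?" };
    const { models } = (await res.json()) as { models: unknown[] };
    return models.length > 0
      ? { ok: true }
      : { ok: false, hint: `ollama pull ${DEFAULT_MODEL}` };
  } catch {
    // Unreachable daemon or probe timeout: point at install + serve.
    return { ok: false, hint: "https://ollama.com/download, then `ollama serve`" };
  }
}

// run(): single non-streaming completion.
async function run(prompt: string, model = DEFAULT_MODEL) {
  const res = await fetch(`${BASE_URL}/api/generate`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model, prompt, stream: false }),
  });
  const body = (await res.json()) as {
    response: string;
    prompt_eval_count: number;
    eval_count: number;
    model: string;
  };
  return {
    text: body.response,
    inputTokens: body.prompt_eval_count,
    outputTokens: body.eval_count,
    model: body.model,
  };
}

// estimateCost(): local inference is free; zero-cost PRICING rows back this.
function estimateCost(): number {
  return 0;
}
```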

Files touched

  • test/helpers/providers/types.ts — extend Family union with 'ollama'
  • test/helpers/providers/ollama.ts — new; the adapter itself
  • test/helpers/benchmark-runner.ts — register OllamaAdapter; switch hardcoded provider unions to Family type for forward compat
  • bin/gstack-model-benchmark — register adapter, generalize whitelist into a VALID_PROVIDERS array (parser still rejects unknown names with WARN, unchanged behavior)
  • test/helpers/tool-map.ts — add ollama row (all tools false — /api/generate exposes no agentic surface)
  • test/helpers/pricing.ts — zero-cost rows for qwen2.5-coder:7b, llama3.2:3b, nomic-embed-text
  • benchmark-models/SKILL.md.tmpl (+ regenerated SKILL.md) — include ollama in the default dry-run model list and describe the local-vs-cloud workflow
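
The wiring itself is mechanical. Roughly, assuming PRICING keys on model name with per-million-token rates (the real row shape is whatever pricing.ts already defines):

```ts
// types.ts — the Family union gains a fourth member.
type Family = "claude" | "gpt" | "gemini" | "ollama";

// bin/gstack-model-benchmark — whitelist generalized into an array;
// the parser still rejects unknown names with a WARN, as before.
const VALID_PROVIDERS: Family[] = ["claude", "gpt", "gemini", "ollama"];

// pricing.ts — zero-cost rows keep estimateCost() on a single code path
// across all four families. The per-million-token field names here are
// an assumption, not the file's actual shape.
const PRICING: Record<string, { inPerMTok: number; outPerMTok: number }> = {
  "qwen2.5-coder:7b": { inPerMTok: 0, outPerMTok: 0 },
  "llama3.2:3b": { inPerMTok: 0, outPerMTok: 0 },
  "nomic-embed-text": { inPerMTok: 0, outPerMTok: 0 },
};
```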

Tests

  • test/providers-ollama.test.ts — new; 14 offline unit tests via fetch stubbing covering:
    • availability probing (ok, no models, unreachable, non-2xx, env-overridden URL)
    • generate POST shape + model defaults + env override
    • ECONNREFUSED → binary_missing routing
    • 404 → remediation hint routing
    • AbortController timeout → timeout error code
    • zero-cost estimation
    • stable name/family identity
  • test/skill-e2e-benchmark-providers.test.ts — live smoke gated on EVALS=1. Skips cleanly if daemon not running. Skips ok-text assertion (local model quality varies).
  • test/benchmark-cli.test.ts — two new regression tests covering ollama acceptance in the --models whitelist and the valid-providers WARN message
  • test/benchmark-runner.test.ts — PRICING + missingTools + TOOL_COMPATIBILITY assertions extended to the fourth family
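
The offline tests hinge on swapping globalThis.fetch for a canned responder and restoring it afterward. A self-contained sketch of that pattern under bun:test, using a stand-in runViaGenerate helper where the real tests exercise OllamaAdapter:

```ts
import { test, expect, afterEach } from "bun:test";

// Always restore the real fetch so stubs can't leak across tests.
const realFetch = globalThis.fetch;
afterEach(() => {
  globalThis.fetch = realFetch;
});

// Stand-in for the adapter's run(); the real tests call OllamaAdapter.
async function runViaGenerate(model: string, prompt: string) {
  const res = await fetch("http://localhost:11434/api/generate", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model, prompt, stream: false }),
  });
  return (await res.json()) as Record<string, unknown>;
}

test("generate POST carries {model, prompt, stream:false}", async () => {
  let captured: unknown;
  globalThis.fetch = (async (_url: any, init: any) => {
    captured = JSON.parse(String(init?.body)); // record what the caller sent
    return Response.json({
      response: "hi",
      prompt_eval_count: 3,
      eval_count: 1,
      model: "qwen2.5-coder:7b",
    });
  }) as typeof fetch;

  const out = await runViaGenerate("qwen2.5-coder:7b", "hi");
  expect(captured).toEqual({ model: "qwen2.5-coder:7b", prompt: "hi", stream: false });
  expect(out.eval_count).toBe(1);
});

test("connection-refused errors surface for remediation routing", async () => {
  globalThis.fetch = (async () => {
    throw new Error("ECONNREFUSED");
  }) as typeof fetch;

  await expect(runViaGenerate("any", "hi")).rejects.toThrow("ECONNREFUSED");
});
```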

Test plan

  • bun test test/providers-ollama.test.ts — 14/14 pass (offline)
  • bun test test/benchmark-{cli,runner}.test.ts — green for ollama-related additions; pre-existing Windows-only failure (gpt: NOT READY env-strip test) unchanged
  • Full bun test — zero new failures vs main baseline
  • bun run gen:skill-docs --host all — regenerated; SKILL.md freshness checks pass
  • bun run build — compiles browse/design/pdf binaries; trailing chmod/rm steps fail on Windows shell but that's pre-existing
  • Manual smoke: bun run bin/gstack-model-benchmark --prompt hi --models claude,gpt,gemini,ollama --dry-run prints availability for all 4 with helpful remediation when ollama daemon is down
  • Live bun run test:e2e EVALS=1 with the daemon running — to be validated on a reviewer's machine (mine doesn't have Ollama running today; adapter responses are confirmed via offline stubs)

Out of scope

This PR does not wire Ollama into /qa, /review, /ship, or any other slash skill — those skills hardcode the current Claude session. Adding routing-layer integration would be a separate, larger architectural change. This PR is the minimal additive step that lets /benchmark-models ask "is local good enough?" with data.

