Ollama Integration with AIDD

AIDD now supports Ollama as a provider through the ZRun CLI backend, allowing you to run AI coding agents locally without API costs.

What is Ollama?

Ollama is a local LLM runner that lets you run models like Llama 3.1, Qwen, and Code Llama on your own machine. It provides an OpenAI-compatible API that works seamlessly with AIDD's ZRun agent.

Prerequisites

  1. Install Ollama: Follow the official guide at https://ollama.ai/download
  2. Pull a Model: Choose a model and pull it with ollama pull <model>
  3. Start Ollama: Run ollama serve to start the server

Recommended Models

| Model | Size | RAM Required | Tool Support | Best For |
| --- | --- | --- | --- | --- |
| gpt-oss:20b | 13GB | 16GB | | Default; passed the aidd quiz benchmark |
| llama3.1:70b | 40GB | 64GB | | Complex tasks (untested against aidd quiz) |
| qwen2.5:32b | 19GB | 32GB | | Code generation |
| codellama:34b | 19GB | 32GB | | Code-specific |
| deepseek-coder-v2:16b | 9.2GB | 16GB | | Programming |
| llama3.1:8b | 4.7GB | 8GB | ⚠️ | Lower-RAM fallback; tool-call reliability is hit-or-miss |
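
To make the "match model size to RAM" decision concrete, here is a minimal sketch that filters the rows of the table above by available memory. The `ModelRow` type and the `modelsThatFit` helper are hypothetical names introduced for illustration; they are not part of AIDD or ZRun.

```typescript
// Rows copied from the Recommended Models table (sizes and RAM in GB).
interface ModelRow {
  name: string;
  sizeGB: number;
  ramRequiredGB: number;
}

const RECOMMENDED: ModelRow[] = [
  { name: "gpt-oss:20b", sizeGB: 13, ramRequiredGB: 16 },
  { name: "llama3.1:70b", sizeGB: 40, ramRequiredGB: 64 },
  { name: "qwen2.5:32b", sizeGB: 19, ramRequiredGB: 32 },
  { name: "codellama:34b", sizeGB: 19, ramRequiredGB: 32 },
  { name: "deepseek-coder-v2:16b", sizeGB: 9.2, ramRequiredGB: 16 },
  { name: "llama3.1:8b", sizeGB: 4.7, ramRequiredGB: 8 },
];

// Hypothetical helper: keep the models whose RAM requirement fits the
// machine, preserving the table's order.
function modelsThatFit(
  availableRamGB: number,
  rows: ModelRow[] = RECOMMENDED,
): string[] {
  return rows
    .filter((row) => row.ramRequiredGB <= availableRamGB)
    .map((row) => row.name);
}

console.log(modelsThatFit(16));
// prints [ 'gpt-oss:20b', 'deepseek-coder-v2:16b', 'llama3.1:8b' ]
```

The RAM figures are the table's stated requirements, not measurements; treat the result as a shortlist to try rather than a guarantee.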

Why gpt-oss:20b is the default: in the 2026-04-19 codebase comprehension quiz run (benchmarks/fixtures/quiz/), gpt-oss:20b was the only installed local model that scored a clean sweep on all four questions. llama3.2:latest, qwen3.5:9b, and qwen2.5-7b-cline all failed at the tool-call or response-writing layer. To keep the older llama3.1:8b default, pin it explicitly in providers.ollama.model or pass --model llama3.1:8b per run.

Setup Instructions

1. Add Ollama to zrun/config.json

Both providers live in the same config file — add an ollama entry alongside your existing zhipu block (or copy zrun/config.json.example if starting fresh):

{
	"defaultProvider": "zhipu",
	"maxTurns": 500,
	"providers": {
		"ollama": {
			"model": "gpt-oss:20b"
		},
		"zhipu": {
			"apiKey": "your-z-ai-api-key",
			"model": "glm-5.1"
		}
	}
}

You never have to edit this file again to switch — pick a provider per run via CLI flag, env var, or model inference (see §3).

2. Verify Installation

Run the test script from the repo root:

bun zrun/test-ollama.ts

# or, if Ollama runs on a different host
bun zrun/test-ollama.ts --base-url http://192.168.1.100:11434

It probes /api/tags, lists installed models, and reports the OpenAI-compatible URL zrun will use at runtime. Exit code 0 means ready; non-zero means the server is unreachable or has no models pulled.
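
The probe described above can be sketched roughly as follows. This is not the contents of zrun/test-ollama.ts; every function name here is hypothetical. The sketch assumes only two documented Ollama facts: GET /api/tags returns a JSON object with a `models` array whose entries carry a `name`, and the OpenAI-compatible API lives under /v1.

```typescript
// Relevant part of Ollama's GET /api/tags response.
interface TagsResponse {
  models?: { name: string }[];
}

// Strip trailing slashes and an optional /v1 suffix to get the server root,
// so both "http://host:11434" and "http://host:11434/v1" are accepted.
function serverRoot(baseUrl: string): string {
  return baseUrl.replace(/\/+$/, "").replace(/\/v1$/, "");
}

function tagsUrl(baseUrl: string): string {
  return `${serverRoot(baseUrl)}/api/tags`;
}

function openAiUrl(baseUrl: string): string {
  return `${serverRoot(baseUrl)}/v1`;
}

function installedModels(tags: TagsResponse): string[] {
  return (tags.models ?? []).map((m) => m.name);
}

// Assumed mapping: 0 when the server answers and has at least one model;
// 1 stands in for "any non-zero code".
function exitCode(reachable: boolean, models: string[]): number {
  return reachable && models.length > 0 ? 0 : 1;
}

// Network part of the probe; defined here but not called.
async function probe(baseUrl: string): Promise<number> {
  try {
    const res = await fetch(tagsUrl(baseUrl));
    if (!res.ok) return exitCode(false, []);
    const models = installedModels((await res.json()) as TagsResponse);
    console.log(`Installed models: ${models.join(", ")}`);
    console.log(`OpenAI-compatible URL: ${openAiUrl(baseUrl)}`);
    return exitCode(true, models);
  } catch {
    // Connection refused, DNS failure, malformed JSON, and so on.
    return exitCode(false, []);
  }
}
```

Calling `probe("http://192.168.1.100:11434")` would mirror the `--base-url` invocation shown above.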

3. Use with AIDD

Four ways to pick Ollama for a run, in decreasing precedence:

# (a) Explicit --provider flag
./aidd.sh --cli zrun --project-dir ./myproject --provider ollama

# (b) ZRUN_PROVIDER env var (works through aidd.sh without any extra plumbing)
ZRUN_PROVIDER=ollama ./aidd.sh --cli zrun --project-dir ./myproject

# (c) Auto-inferred from --model — any known Ollama family name routes to ollama
./aidd.sh --cli zrun --project-dir ./myproject --model llama3.1:8b
./aidd.sh --cli zrun --project-dir ./myproject --model qwen2.5:32b

# (d) defaultProvider in config.json, once you're ready to make ollama sticky
#     ({ "defaultProvider": "ollama", ... })
./aidd.sh --cli zrun --project-dir ./myproject
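
The precedence order above can be read as a simple fall-through. The sketch below is illustrative only: `resolveProvider`, `inferFromModel`, and the `OLLAMA_FAMILIES` prefix list are hypothetical names, and the list of family prefixes ZRun actually recognizes may differ.

```typescript
interface RunInputs {
  providerFlag?: string; // value of --provider
  envProvider?: string; // value of ZRUN_PROVIDER
  model?: string; // value of --model
  defaultProvider?: string; // defaultProvider from config.json
}

// Assumed prefix list, built from the model names this document mentions.
const OLLAMA_FAMILIES = ["llama", "codellama", "qwen", "deepseek-coder", "gpt-oss"];

// Returns "ollama" when the model name starts with a known family prefix,
// otherwise undefined so the next rule gets a chance.
function inferFromModel(model: string): string | undefined {
  const name = model.toLowerCase();
  return OLLAMA_FAMILIES.some((prefix) => name.startsWith(prefix))
    ? "ollama"
    : undefined;
}

// (a) flag, then (b) env var, then (c) model inference, then (d) config default.
// `||` is used so that empty strings fall through like unset values.
function resolveProvider(inputs: RunInputs): string {
  return (
    inputs.providerFlag ||
    inputs.envProvider ||
    (inputs.model ? inferFromModel(inputs.model) : undefined) ||
    inputs.defaultProvider ||
    "zhipu" // documented default for defaultProvider
  );
}

console.log(resolveProvider({ model: "llama3.1:8b", defaultProvider: "zhipu" }));
// prints ollama
console.log(resolveProvider({ providerFlag: "zhipu", envProvider: "ollama" }));
// prints zhipu
```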

Model Discovery

If you don't specify a model (no providers.ollama.model in config, no --model on the command line), ZRun will automatically:

  1. Connect to your Ollama instance
  2. List available models
  3. Pick the provider's declared default (gpt-oss:20b) if it's installed; otherwise fall back to the alphabetically first installed model, so repeat runs are deterministic
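
The selection rule in step 3 can be sketched as below. `pickModel` is a hypothetical name, and the exact sort order ZRun uses is an assumption (here, JavaScript's default string ordering).

```typescript
const DECLARED_DEFAULT = "gpt-oss:20b";

// Prefer the declared default when installed; otherwise take the first
// name in sorted order so the choice does not depend on listing order.
function pickModel(
  installed: string[],
  declaredDefault: string = DECLARED_DEFAULT,
): string | undefined {
  if (installed.includes(declaredDefault)) return declaredDefault;
  // Copy before sorting so the caller's array is left untouched.
  // Returns undefined when nothing is installed.
  return [...installed].sort()[0];
}

console.log(pickModel(["qwen2.5:32b", "gpt-oss:20b", "codellama:34b"]));
// prints gpt-oss:20b
console.log(pickModel(["qwen2.5:32b", "llama3.1:8b"]));
// prints llama3.1:8b
```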

You can see available models with:

curl http://localhost:11434/api/tags

Performance Tips

  1. Use GPU Acceleration: Install Ollama with GPU support for better performance
  2. Choose Appropriate Model Size: Match model size to your available RAM
  3. Adjust maxTurns: Reduce maxTurns in config for faster iteration
  4. Use Smaller Contexts: Keep prompts focused to reduce token usage

Troubleshooting

"Ollama server is not running"

ollama serve

"No models found"

ollama pull gpt-oss:20b

Model doesn't respond to tool calls

Some models have limited function-calling support. Check the Tool Support column in the Recommended Models table, or switch to the default gpt-oss:20b, which passed the aidd quiz benchmark.

Performance is slow

  • Check if GPU is being used: while a model is loaded, ollama ps should report GPU in its PROCESSOR column
  • Consider a smaller model
  • Close other applications to free RAM

Configuration Options

Per-provider settings live under providers.ollama:

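| Option | Required | Default | Description |
| --- | --- | --- | --- |
| providers.ollama.baseUrl | No | http://localhost:11434/v1 | Ollama API endpoint |
| providers.ollama.model | No | gpt-oss:20b if installed, otherwise the alphabetically first installed model | Model to use (overridable per run via --model) |
| providers.ollama.apiKey | No | - | Not needed for Ollama |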

Top-level settings that apply across all providers:

| Option | Required | Default | Description |
| --- | --- | --- | --- |
| defaultProvider | No | "zhipu" | Provider used when no --provider flag, ZRUN_PROVIDER, or --model hint selects one |
| maxTurns | No | 500 | Maximum agent iterations |
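
As a rough illustration of how the documented defaults combine with a partial config, the sketch below fills in missing values. The `ZrunConfig` type and `withDefaults` function are hypothetical; ZRun's real config loader may be structured differently.

```typescript
interface ProviderConfig {
  baseUrl?: string;
  model?: string;
  apiKey?: string;
}

interface ZrunConfig {
  defaultProvider?: string;
  maxTurns?: number;
  providers?: Record<string, ProviderConfig>;
}

// Apply the defaults from the two tables above to a parsed config.json.
function withDefaults(raw: ZrunConfig) {
  const providers = raw.providers ?? {};
  const ollama: ProviderConfig = providers["ollama"] ?? {};
  return {
    defaultProvider: raw.defaultProvider ?? "zhipu",
    maxTurns: raw.maxTurns ?? 500,
    ollama: {
      baseUrl: ollama.baseUrl ?? "http://localhost:11434/v1",
      // Left undefined when unset: the model is then chosen by
      // model discovery at run time.
      model: ollama.model,
    },
  };
}

const cfg = withDefaults(
  JSON.parse('{"providers":{"ollama":{"model":"gpt-oss:20b"}}}') as ZrunConfig,
);
console.log(cfg.defaultProvider, cfg.maxTurns, cfg.ollama.baseUrl, cfg.ollama.model);
// prints zhipu 500 http://localhost:11434/v1 gpt-oss:20b
```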

Coexisting with Zhipu AI

There's nothing to migrate — both providers live in the same config file and you pick at runtime:

{
	"defaultProvider": "zhipu",
	"providers": {
		"ollama": { "model": "gpt-oss:20b" },
		"zhipu": { "apiKey": "your-z-ai-api-key", "model": "glm-5.1" }
	}
}

# Default (zhipu, per defaultProvider)
./aidd.sh --cli zrun --project-dir ./myproject

# Same machine, switch to ollama for one run
ZRUN_PROVIDER=ollama ./aidd.sh --cli zrun --project-dir ./myproject

# Or let --model decide
./aidd.sh --cli zrun --project-dir ./myproject --model gpt-oss:20b

If you want ollama to be the sticky default, flip defaultProvider to "ollama" — the zhipu block stays available for runs that pass --provider zhipu or ZRUN_PROVIDER=zhipu.

Security Considerations

  • When Ollama runs on your own machine, your code never leaves it; if baseUrl points at a remote host, prompts are sent to that host
  • No API keys or external dependencies
  • Do not expose your Ollama instance to the network if your code is sensitive
  • Model updates are manual (pull new versions when available)

Advanced Usage

Custom Ollama Server

Point at a remote Ollama host:

{
	"providers": {
		"ollama": {
			"baseUrl": "http://192.168.1.100:11434/v1",
			"model": "gpt-oss:20b"
		}
	}
}

Multiple Model Support

Swap models per task without editing config:

# Lightweight model for quick tasks (auto-infers ollama from the model name)
./aidd.sh --cli zrun --model qwen2.5:7b --max-iterations 5

# Heavier model for complex features
./aidd.sh --cli zrun --model llama3.1:70b --max-iterations 20

Limitations

  1. Hardware Dependent: Performance depends on your CPU/GPU
  2. Model Quality: Local models may be less capable than GPT-4/Claude
  3. Token Limits: Smaller context windows than cloud models
  4. Tool Support: Varies by model (check documentation)

Getting Help