Local LLM Provider

The local provider connects Koan to any OpenAI-compatible LLM server running on your machine. This gives you full offline operation with no API costs — ideal for experimentation, privacy-sensitive work, or when you've exhausted your cloud quota.

Quick Setup

1. Install a Local LLM Server

Pick one:

Ollama (recommended for beginners)

# macOS
brew install ollama

# Start the server
ollama serve

# Pull a model
ollama pull qwen2.5-coder:14b

Ollama exposes an OpenAI-compatible API at http://localhost:11434/v1.

llama.cpp

# Build from source
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp && make

# Download a GGUF model and start the server
./llama-server -m /path/to/model.gguf --port 8080

API at http://localhost:8080/v1.

LM Studio

Download from https://lmstudio.ai, load a model, and start the local server from the UI. API at http://localhost:1234/v1.

vLLM

pip install vllm
vllm serve "Qwen/Qwen2.5-Coder-14B" --port 8000

API at http://localhost:8000/v1.

2. Configure Koan

In config.yaml:

cli_provider: "local"

local_llm:
  base_url: "http://localhost:11434/v1"   # Adjust for your server
  model: "qwen2.5-coder:14b"             # Model name on your server
  api_key: ""                              # Usually empty for local servers

Or via environment variables (in .env):

KOAN_CLI_PROVIDER=local
KOAN_LOCAL_LLM_BASE_URL=http://localhost:11434/v1
KOAN_LOCAL_LLM_MODEL=qwen2.5-coder:14b
KOAN_LOCAL_LLM_API_KEY=

Environment variables override config.yaml values.

3. Start Everything

If you're using Ollama, the easiest way to start Koan is:

make ollama

This single command starts all three components in the background:

ollama serve — the LLM server
The Telegram bridge (awake)
The agent loop (run)

To stop everything:

make stop

This stops all Koan processes including ollama serve.

To check what's running:

make status

Environment variables from .env (like OLLAMA_HOST) are automatically loaded and passed to all components.

4. Verify (Optional)

If you want to manually verify the LLM server:

# Quick test with curl
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "qwen2.5-coder:14b", "messages": [{"role": "user", "content": "Hello"}]}'

Per-Project Configuration

Use local LLM for specific projects (e.g., small libraries) while keeping Claude for complex work:

# projects.yaml
defaults:
  cli_provider: "claude"

projects:
  critical-app:
    path: "/path/to/app"
    # Uses Claude (default)

  side-project:
    path: "/path/to/side"
    cli_provider: "local"   # Use local LLM for this project

How It Works

Unlike Claude and Copilot which call external CLI binaries, the local provider runs Koan's own local_llm_runner.py — a Python-based agentic loop that:

Sends your prompt + system context to the LLM via the OpenAI API
Parses tool_calls from the response (function calling format)
Executes tools locally (read, write, edit, grep, glob, shell)
Feeds results back to the LLM
Repeats until the LLM produces a final text response or max turns

This means any LLM server supporting the OpenAI function calling protocol will work.

Recommended Models

Not all local models handle tool use (function calling) well. Models that work best with Koan's agentic loop:

Model	Size	Tool Use	Notes
`qwen2.5-coder:14b`	14B	Good	Best balance of size and capability
`qwen2.5-coder:7b`	7B	Fair	Lighter, faster, less reliable tool use
`deepseek-coder-v2:16b`	16B	Good	Strong coding, good function calling
`codellama:34b`	34B	Fair	Needs more RAM, variable tool use
`mistral:7b`	7B	Basic	Fast but limited tool use

Hardware requirements vary by model size:

7B models: 8GB RAM minimum, 16GB recommended
14B models: 16GB RAM minimum, 32GB recommended
34B+ models: 32GB+ RAM, consider GPU acceleration

Provider Differences

Feature	Claude Code	Local LLM
Binary	`claude` (external)	Python runner (built-in)
Tool protocol	Native Claude tools	OpenAI function calling
Fallback model	Supported	Not supported
MCP support	Yes	No (tools are built-in)
Output format	JSON supported	JSON supported
Max turns	Supported	Supported
Cost	API subscription	Free (hardware only)
Quality	State of the art	Varies by model
Speed	Network latency	Local inference speed

Configuration Reference

config.yaml

cli_provider: "local"

local_llm:
  # Server URL — must expose /v1/chat/completions
  base_url: "http://localhost:11434/v1"

  # Model name — as recognized by your server
  model: "qwen2.5-coder:14b"

  # API key — usually empty for local servers
  # Some servers (e.g., vLLM with auth) may require one
  api_key: ""

Environment Variables

Variable	Default	Description
`KOAN_CLI_PROVIDER`	`claude`	Set to `local` to enable
`KOAN_LOCAL_LLM_BASE_URL`	`http://localhost:11434/v1`	Server URL
`KOAN_LOCAL_LLM_MODEL`	(none)	Model name (required)
`KOAN_LOCAL_LLM_API_KEY`	(none)	API key if needed

Default Server URLs by Platform

Server	Default URL
Ollama	`http://localhost:11434/v1`
llama.cpp	`http://localhost:8080/v1`
LM Studio	`http://localhost:1234/v1`
vLLM	`http://localhost:8000/v1`

Troubleshooting

"Connection refused" or timeout

Your LLM server isn't running. Start it:

# Ollama
ollama serve

# llama.cpp
./llama-server -m /path/to/model.gguf

# Check if the server is up
curl http://localhost:11434/v1/models

Model not found

The model name in your config doesn't match what's loaded on the server.

# Ollama — list available models
ollama list

# Pull a model if needed
ollama pull qwen2.5-coder:14b

Poor quality results / tool use failures

Local models have variable tool-use capability. If the agent produces garbage or ignores tool results:

Try a larger model (14B+ recommended for tool use)
Try a different model family (Qwen2.5-Coder works well)
Consider using Claude for complex missions and local LLM for simpler tasks via per-project configuration

Slow inference

Local inference speed depends on your hardware. Tips:

Use quantized models (Q4_K_M is a good balance)
Enable GPU acceleration in your server config
Use a smaller model for chat/lightweight tasks
Set models.lightweight to a smaller local model

API key errors

Most local servers don't require an API key. If yours does, set it in config:

local_llm:
  api_key: "your-key-here"

Or via environment:

KOAN_LOCAL_LLM_API_KEY=your-key-here

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Local LLM Provider

Quick Setup

1. Install a Local LLM Server

2. Configure Koan

3. Start Everything

4. Verify (Optional)

Per-Project Configuration

How It Works

Recommended Models

Provider Differences

Configuration Reference

config.yaml

Environment Variables

Default Server URLs by Platform

Troubleshooting

"Connection refused" or timeout

Model not found

Poor quality results / tool use failures

Slow inference

API key errors

FilesExpand file tree

provider-local.md

Latest commit

History

provider-local.md

File metadata and controls

Local LLM Provider

Quick Setup

1. Install a Local LLM Server

2. Configure Koan

3. Start Everything

4. Verify (Optional)

Per-Project Configuration

How It Works

Recommended Models

Provider Differences

Configuration Reference

config.yaml

Environment Variables

Default Server URLs by Platform

Troubleshooting

"Connection refused" or timeout

Model not found

Poor quality results / tool use failures

Slow inference

API key errors