
Provider Configuration

The system is completely provider-agnostic - use any LLM provider you prefer!

Quick Setup

1. Install Provider SDK(s)

Only install what you'll use:

npm install groq-sdk              # Groq (recommended - fastest, cheapest)
npm install @anthropic-ai/sdk     # Anthropic (high quality)
npm install openai                # OpenAI or local models
npm install @google/generative-ai # Google Gemini

2. Configure API Key(s)

Add to .env:

GROQ_API_KEY=your-groq-key           # For Groq
ANTHROPIC_API_KEY=your-anthropic-key # For Anthropic
OPENAI_API_KEY=your-openai-key       # For OpenAI
GOOGLE_API_KEY=your-google-key       # For Gemini
LOCAL_MODEL_ENDPOINT=http://localhost:11434  # For llama.cpp/vLLM (legacy fallback)

3. Verify

node scripts/enhanced-transcript-monitor.js

# Check startup logs for: "[UnifiedInferenceEngine] <provider> SDK loaded"

Supported Providers

| Provider | Best For | Cost | Privacy | Speed |
| --- | --- | --- | --- | --- |
| Groq | Most tasks | 💰 Cheapest | ☁️ Cloud | ⚡ Fastest |
| Anthropic | Quality | 💰💰 Mid | ☁️ Cloud | ⚡⚡ Fast |
| OpenAI | GPT-4 | 💰💰💰 High | ☁️ Cloud | ⚡⚡ Fast |
| Gemini | Google | 💰💰 Mid | ☁️ Cloud | ⚡⚡ Fast |
| Local | Privacy | 🆓 Free | 🔒 Local | ⚡⚡⚡ Variable |

Tier-Based Routing

The system uses tier-based provider routing via the @rapid/llm-proxy unified layer, with Copilot as the primary provider for all tiers. Copilot scales beautifully with parallelism (0.77s effective per call at 10 concurrent), and batch agents already use Promise.all with concurrency 5-20.
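For illustration, bounded-concurrency batching needs nothing more than slicing the prompts into chunks and awaiting each chunk with Promise.all. A minimal sketch - the llmProxy.infer call here is a hypothetical stand-in, not the actual @rapid/llm-proxy API:

// Sketch of bounded-concurrency batching with Promise.all.
// NOTE: llmProxy.infer is a hypothetical stand-in; only the
// chunking pattern is the point.
async function inferBatch(llmProxy, prompts, concurrency = 10) {
  const results = [];
  for (let i = 0; i < prompts.length; i += concurrency) {
    const chunk = prompts.slice(i, i + concurrency);
    // Each chunk of up to `concurrency` calls runs in parallel.
    results.push(...await Promise.all(chunk.map((p) => llmProxy.infer(p))));
  }
  return results;
}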

| Tier | Provider Priority | Use Cases |
| --- | --- | --- |
| Fast | Copilot → Groq → Claude Code → Anthropic → OpenAI → Gemini → GitHub Models | Simple extraction, parsing, basic classification |
| Standard | Copilot → Groq → Claude Code → Anthropic → OpenAI → Gemini → GitHub Models | Semantic analysis, ontology classification |
| Premium | Copilot → Groq → Claude Code → Anthropic → OpenAI → Gemini → GitHub Models | Insight generation, pattern recognition, QA review |
| Local Fallback | DMR → Ollama | Always available when cloud providers fail |

See LLM Architecture for complete tier configuration.
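The provider priority above is a fallback chain: each tier tries providers in order and moves on when one fails or is unavailable. A minimal sketch of that pattern, with a hypothetical provider interface (not the actual @rapid/llm-proxy internals):

// Sketch of priority-order fallback across providers.
// `providers` is an ordered list of { name, infer } objects; the
// interface is hypothetical, only the fallback loop is the point.
async function inferWithFallback(providers, prompt) {
  let lastError;
  for (const provider of providers) {
    try {
      return await provider.infer(prompt);
    } catch (err) {
      lastError = err; // e.g. rate limit, timeout, missing SDK
      console.warn(`[fallback] ${provider.name} failed: ${err.message}`);
    }
  }
  throw lastError ?? new Error('No providers available');
}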

Per-Agent LLM Mode Control

The UKB workflow system supports three LLM modes that can be set globally or per-agent:

| Mode | Icon | Description | Use Case |
| --- | --- | --- | --- |
| Mock | 🧪 Orange | Fake responses | Testing, development |
| Local | 🖥️ Purple | DMR/llama.cpp | Privacy, offline, cost savings |
| Public | ☁️ Green | Cloud APIs | Production quality |

Dashboard Controls

  1. Global Dropdown - Set default mode for all agents in the modal toolbar
  2. Per-Agent Button - Override specific agents in the node sidebar
  3. Node Badge - Shows current mode (M/L/P) on each graph node

(Screenshot: LLM Mode Control)

API Endpoints

# Get current LLM state
curl http://localhost:3033/api/ukb/llm-state

# Set global mode
curl -X POST http://localhost:3033/api/ukb/llm-mode/global \
  -H "Content-Type: application/json" \
  -d '{"mode": "local"}'

# Set per-agent override
curl -X POST http://localhost:3033/api/ukb/llm-mode/agent \
  -H "Content-Type: application/json" \
  -d '{"agentId": "semantic_analysis", "mode": "mock"}'

# Clear per-agent override
curl -X DELETE http://localhost:3033/api/ukb/llm-mode/agent/semantic_analysis
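Resolution is simple: a per-agent override wins, otherwise the global mode applies. A minimal sketch of that lookup - the state shape is assumed from the endpoints above, not taken from the actual implementation:

// Sketch of LLM mode resolution: per-agent override beats global.
// The `state` shape is an assumption based on the endpoints above.
function resolveMode(state, agentId) {
  // state = { globalMode: 'public', agentOverrides: { semantic_analysis: 'mock' } }
  return state.agentOverrides?.[agentId] ?? state.globalMode;
}

// resolveMode(state, 'semantic_analysis')  -> 'mock' (override)
// resolveMode(state, 'insight_generation') -> 'public' (global default)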

Using with Different Agents

Claude Code (Anthropic)

echo "ANTHROPIC_API_KEY=your-key" >> .env
npm install @anthropic-ai/sdk
coding

GitHub Copilot / Cursor (OpenAI)

echo "OPENAI_API_KEY=your-key" >> .env
npm install openai
# Use with your editor

Any Agent + Groq

echo "GROQ_API_KEY=your-key" >> .env
npm install groq-sdk
# Works with ANY coding agent!

Local Models (Privacy-First)

Docker Model Runner (Recommended)

DMR uses llama.cpp via Docker Desktop - no separate installation needed.

Cross-Platform Support:

| Platform | GPU Acceleration | Notes |
| --- | --- | --- |
| macOS (Apple Silicon) | Metal | Automatic, fastest option |
| macOS (Intel) | CPU | AVX2 optimized |
| Linux + NVIDIA | CUDA | Requires CUDA toolkit |
| Linux + AMD | ROCm/Vulkan | Requires ROCm drivers |
| Linux (CPU) | AVX2/AVX512 | Automatic fallback |
| Windows + NVIDIA | CUDA | Requires CUDA toolkit |
| Windows (CPU) | DirectML | Automatic fallback |

The install.sh script automatically:

  1. Detects your platform and available GPU acceleration
  2. Enables DMR on the correct port
  3. Configures DMR_HOST for container access (Windows uses host.docker.internal)
  4. Downloads the default model (ai/llama3.2)

(Diagram: DMR Setup Flow)

Manual Setup (if not using install.sh):

# Enable DMR (requires Docker Desktop 4.40+)
docker desktop enable model-runner --tcp 12434

# Pull a model
docker model pull ai/llama3.2

# Verify it's running
curl http://localhost:12434/engines/v1/models

CLI Usage - Query DMR directly from terminal:

# Use the built-in llm command (installed with coding)
llm "What is a closure in JavaScript?"

# With options
llm -m ai/qwen2.5-coder "Review this code: $(cat file.js)"
llm -s "You are a code reviewer" "Check this function for bugs"
llm -t 500 "Summarize this in 100 words"

# Pipe input
cat README.md | llm "Summarize this document"
echo "Fix this SQL: SELCT * FORM users" | llm

# Raw JSON output
llm -r "Hello" | jq .usage

Options:

  • -m, --model MODEL - Model to use (default: ai/llama3.2)
  • -t, --tokens N - Max tokens (default: 1000)
  • -s, --system PROMPT - System prompt
  • -r, --raw - Output raw JSON

Direct curl (for scripts/automation):

curl -X POST http://localhost:12434/engines/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ai/llama3.2",
    "messages": [{"role": "user", "content": "Explain async/await"}],
    "max_tokens": 500
  }'
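Because DMR exposes an OpenAI-compatible API, the openai npm package can also talk to it directly by overriding baseURL. A minimal sketch (ES module) - the dummy apiKey is an assumption; local servers typically ignore it, but the SDK requires a value:

// Sketch: point the openai SDK at DMR's OpenAI-compatible endpoint.
import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'http://localhost:12434/engines/v1',
  apiKey: 'not-needed', // assumed dummy; DMR runs locally without auth
});

const completion = await client.chat.completions.create({
  model: 'ai/llama3.2',
  messages: [{ role: 'user', content: 'Explain async/await' }],
  max_tokens: 500,
});

console.log(completion.choices[0].message.content);

The same client should also work against the llama.cpp legacy fallback by swapping the baseURL, typically LOCAL_MODEL_ENDPOINT plus /v1 (see Local LLM Fallback below).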

Available Models:

  • ai/llama3.2 - General purpose (default)
  • ai/qwen2.5-coder - Code-focused
  • ai/llama3.2:3B-Q4_K_M - Faster, smaller variant

Configuration (in .env.ports):

DMR_PORT=12434        # API port (default)
DMR_HOST=localhost    # Host for API access

Windows Container Access: On Windows, containers need host.docker.internal to reach the host's DMR:

# In .env.ports on Windows:
DMR_HOST=host.docker.internal

The installer handles this automatically.

Local LLM Fallback

llama.cpp Direct (Legacy Fallback)

If DMR is unavailable, the system falls back to llama.cpp server:

# Build and run llama.cpp server
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp && make -j
./llama-server --model /path/to/model.gguf --host 0.0.0.0 --port 11434

# Configure
echo "LOCAL_MODEL_ENDPOINT=http://localhost:11434" >> .env
npm install openai  # llama.cpp uses OpenAI-compatible API

Note: DMR is preferred over direct llama.cpp as it's built into Docker Desktop and requires no separate installation.

Troubleshooting

"Cannot find package '@anthropic-ai/sdk'"

This is normal if you haven't installed that provider. The system will use other available providers.

To fix: Install the SDK you need

npm install @anthropic-ai/sdk
echo "ANTHROPIC_API_KEY=your-key" >> .env
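The error is non-fatal because provider SDKs are loaded optionally, conceptually something like the sketch below (the pattern only, not the actual UnifiedInferenceEngine code):

// Sketch of optional SDK loading: a missing package is logged and
// skipped rather than crashing startup.
async function loadAnthropic() {
  try {
    const { default: Anthropic } = await import('@anthropic-ai/sdk');
    return new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });
  } catch {
    console.warn('[UnifiedInferenceEngine] @anthropic-ai/sdk not installed, skipping');
    return null; // engine falls back to other providers
  }
}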

"No providers initialized"

You need at least ONE provider configured:

# Check .env file
cat .env

# Make sure you have at least one API key AND installed the SDK
npm install groq-sdk  # Or whichever provider you chose

Trajectory analyzer not available

This is normal if no provider SDKs are installed yet. The monitor still works for basic logging. To enable trajectory analysis, install at least one provider SDK.

Sensitive Data Routing

For sensitive data, the system automatically routes to local models:

// Automatically uses local model
const result = await engine.infer(prompt, {
  operationType: 'sensitive-analysis'
});

// Or explicitly request local
const result = await engine.infer(prompt, {}, {
  privacy: 'local'
});

Summary

✅ Agent-Agnostic: Works with Claude Code, Copilot, Cursor, any coding agent
✅ Provider-Agnostic: Supports 5 different LLM providers
✅ Optional Dependencies: Install only what you need
✅ Automatic Fallback: Circuit breaker pattern for reliability
✅ Privacy-First: Route sensitive data to local models
✅ Cost-Optimized: Smart routing to cheapest provider

You are NOT locked into Anthropic! 🎉