The system is completely provider-agnostic - use any LLM provider you prefer!
Only install what you'll use:
npm install groq-sdk # Groq (recommended - fastest, cheapest)
npm install @anthropic-ai/sdk # Anthropic (high quality)
npm install openai # OpenAI or local models
npm install @google/generative-ai # Google Gemini

Add to .env:
GROQ_API_KEY=your-groq-key # For Groq
ANTHROPIC_API_KEY=your-anthropic-key # For Anthropic
OPENAI_API_KEY=your-openai-key # For OpenAI
GOOGLE_API_KEY=your-google-key # For Gemini
LOCAL_MODEL_ENDPOINT=http://localhost:11434 # For llama.cpp/vLLM (legacy fallback)

Start the monitor and verify a provider loaded:
node scripts/enhanced-transcript-monitor.js
# Check startup logs for: "[UnifiedInferenceEngine] <provider> SDK loaded"
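Provider SDKs are loaded lazily, so missing packages are simply skipped. A minimal sketch of that optional-loading pattern (illustrative only, not the actual UnifiedInferenceEngine source; the package names are the real npm modules):

// optional-providers.js - sketch of optional SDK loading
async function loadAvailableProviders() {
  const candidates = {
    groq: 'groq-sdk',
    anthropic: '@anthropic-ai/sdk',
    openai: 'openai',
    gemini: '@google/generative-ai',
  };
  const loaded = {};
  for (const [name, pkg] of Object.entries(candidates)) {
    try {
      // Dynamic import succeeds only if the SDK was installed
      loaded[name] = await import(pkg);
      console.log(`[UnifiedInferenceEngine] ${name} SDK loaded`);
    } catch {
      // Not installed - skip silently; another provider will be used
    }
  }
  return loaded;
}

Provider comparison: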
| Provider | Best For | Cost | Privacy | Speed |
|---|---|---|---|---|
| Groq | Most tasks | 💰 Cheapest | ☁️ Cloud | ⚡ Fastest |
| Anthropic | Quality | 💰💰 Mid | ☁️ Cloud | ⚡⚡ Fast |
| OpenAI | GPT-4 | 💰💰💰 High | ☁️ Cloud | ⚡⚡ Fast |
| Gemini |  | 💰💰 Mid | ☁️ Cloud | ⚡⚡ Fast |
| Local | Privacy | 🆓 Free | 🔒 Local | ⚡⚡⚡ Variable |
The system uses tier-based provider routing via the @rapid/llm-proxy unified layer, with Copilot as the primary provider for all tiers. Copilot scales beautifully with parallelism (0.77s effective per call at 10 concurrent), and batch agents already use Promise.all with concurrency 5-20.
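As a rough illustration of that batching pattern (a sketch, not the agents' actual code):

// Illustrative only: run LLM calls in fixed-size concurrent batches
async function runBatched(prompts, callLlm, concurrency = 10) {
  const results = [];
  for (let i = 0; i < prompts.length; i += concurrency) {
    const batch = prompts.slice(i, i + concurrency);
    // All calls in a batch run in parallel; batches run sequentially
    results.push(...await Promise.all(batch.map((p) => callLlm(p))));
  }
  return results;
}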
| Tier | Provider Priority | Use Cases |
|---|---|---|
| Fast | Copilot → Groq → Claude Code → Anthropic → OpenAI → Gemini → GitHub Models | Simple extraction, parsing, basic classification |
| Standard | Copilot → Groq → Claude Code → Anthropic → OpenAI → Gemini → GitHub Models | Semantic analysis, ontology classification |
| Premium | Copilot → Groq → Claude Code → Anthropic → OpenAI → Gemini → GitHub Models | Insight generation, pattern recognition, QA review |
| Local Fallback | DMR → Ollama | Always available when cloud providers fail |
See LLM Architecture for complete tier configuration.
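As a rough sketch of how a fallback chain like this behaves (illustrative only; the real routing lives in @rapid/llm-proxy, and tryProvider here is a hypothetical stand-in for a single provider call):

// Provider order mirrors the table above
const TIER_CHAIN = {
  fast: ['copilot', 'groq', 'claude-code', 'anthropic', 'openai', 'gemini', 'github-models'],
  localFallback: ['dmr', 'ollama'],
};

async function inferWithFallback(tryProvider, prompt, tier = 'fast') {
  const chain = [...TIER_CHAIN[tier], ...TIER_CHAIN.localFallback];
  for (const provider of chain) {
    try {
      return await tryProvider(provider, prompt); // first success wins
    } catch (err) {
      console.warn(`${provider} failed, trying next:`, err.message);
    }
  }
  throw new Error('All providers failed, including local fallback');
}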
The UKB workflow system supports three LLM modes that can be set globally or per-agent:
| Mode | Icon | Description | Use Case |
|---|---|---|---|
| Mock | 🧪 Orange | Fake responses | Testing, development |
| Local | 🖥️ Purple | DMR/llama.cpp | Privacy, offline, cost savings |
| Public | ☁️ Green | Cloud APIs | Production quality |
- Global Dropdown - Set default mode for all agents in the modal toolbar
- Per-Agent Button - Override specific agents in the node sidebar
- Node Badge - Shows current mode (M/L/P) on each graph node
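Per-agent overrides take precedence over the global default. A minimal sketch of that resolution (hypothetical helper, not the actual workflow source):

// Resolve the effective LLM mode for one agent
function resolveMode(state, agentId) {
  // state.agentOverrides maps agentId -> 'mock' | 'local' | 'public'
  return state.agentOverrides[agentId] ?? state.globalMode ?? 'public';
}

// Example: global 'local', semantic_analysis overridden to 'mock'
const state = { globalMode: 'local', agentOverrides: { semantic_analysis: 'mock' } };
console.log(resolveMode(state, 'semantic_analysis'));        // 'mock'
console.log(resolveMode(state, 'ontology_classification'));  // 'local'

The same state is exposed over a REST API: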
# Get current LLM state
curl http://localhost:3033/api/ukb/llm-state
# Set global mode
curl -X POST http://localhost:3033/api/ukb/llm-mode/global \
-H "Content-Type: application/json" \
-d '{"mode": "local"}'
# Set per-agent override
curl -X POST http://localhost:3033/api/ukb/llm-mode/agent \
-H "Content-Type: application/json" \
-d '{"agentId": "semantic_analysis", "mode": "mock"}'
# Clear per-agent override
curl -X DELETE http://localhost:3033/api/ukb/llm-mode/agent/semantic_analysis

Anthropic:
echo "ANTHROPIC_API_KEY=your-key" >> .env
npm install @anthropic-ai/sdk

OpenAI:
echo "OPENAI_API_KEY=your-key" >> .env
npm install openai
# Use with your editor

Groq:
echo "GROQ_API_KEY=your-key" >> .env
npm install groq-sdk
# Works with ANY coding agent!

DMR uses llama.cpp via Docker Desktop - no separate installation needed.
Cross-Platform Support:
| Platform | GPU Acceleration | Notes |
|---|---|---|
| macOS (Apple Silicon) | Metal | Automatic, fastest option |
| macOS (Intel) | CPU | AVX2 optimized |
| Linux + NVIDIA | CUDA | Requires CUDA toolkit |
| Linux + AMD | ROCm/Vulkan | Requires ROCm drivers |
| Linux (CPU) | AVX2/AVX512 | Automatic fallback |
| Windows + NVIDIA | CUDA | Requires CUDA toolkit |
| Windows (CPU) | DirectML | Automatic fallback |
The install.sh script automatically:
- Detects your platform and available GPU acceleration
- Enables DMR on the correct port
- Configures DMR_HOST for container access (Windows uses host.docker.internal)
- Downloads the default model (ai/llama3.2)
Manual Setup (if not using install.sh):
# Enable DMR (requires Docker Desktop 4.40+)
docker desktop enable model-runner --tcp 12434
# Pull a model
docker model pull ai/llama3.2
# Verify it's running
curl http://localhost:12434/engines/v1/models

CLI Usage - Query DMR directly from terminal:
# Use the built-in llm command (installed with coding)
llm "What is a closure in JavaScript?"
# With options
llm -m ai/qwen2.5-coder "Review this code: $(cat file.js)"
llm -s "You are a code reviewer" "Check this function for bugs"
llm -t 500 "Summarize this in 100 words"
# Pipe input
cat README.md | llm "Summarize this document"
echo "Fix this SQL: SELCT * FORM users" | llm
# Raw JSON output
llm -r "Hello" | jq .usage

Options:
- -m, --model MODEL - Model to use (default: ai/llama3.2)
- -t, --tokens N - Max tokens (default: 1000)
- -s, --system PROMPT - System prompt
- -r, --raw - Output raw JSON
Direct curl (for scripts/automation):
curl -X POST http://localhost:12434/engines/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "ai/llama3.2",
"messages": [{"role": "user", "content": "Explain async/await"}],
"max_tokens": 500
}'
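The same request from Node (a minimal sketch using the fetch built into Node 18+; run it as an ES module):

// Call DMR's OpenAI-compatible chat endpoint from Node
const res = await fetch('http://localhost:12434/engines/v1/chat/completions', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    model: 'ai/llama3.2',
    messages: [{ role: 'user', content: 'Explain async/await' }],
    max_tokens: 500,
  }),
});
const data = await res.json();
console.log(data.choices[0].message.content);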
Available Models:
- ai/llama3.2 - General purpose (default)
- ai/qwen2.5-coder - Code-focused
- ai/llama3.2:3B-Q4_K_M - Faster, smaller variant
Configuration (in .env.ports):
DMR_PORT=12434 # API port (default)
DMR_HOST=localhost # Host for API access

Windows Container Access:
Windows containers need host.docker.internal to reach the host's DMR:
# In .env.ports on Windows:
DMR_HOST=host.docker.internal

The installer handles this automatically.
If DMR is unavailable, the system falls back to llama.cpp server:
# Build and run llama.cpp server
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp && make -j
./llama-server --model /path/to/model.gguf --host 0.0.0.0 --port 11434
# Configure
echo "LOCAL_MODEL_ENDPOINT=http://localhost:11434" >> .env
npm install openai # llama.cpp uses OpenAI-compatible API

Note: DMR is preferred over direct llama.cpp as it's built into Docker Desktop and requires no separate installation.
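If you do run the llama.cpp fallback, the openai SDK can simply be pointed at the local endpoint. A minimal sketch, assuming the port configured above (llama.cpp typically serves a single model regardless of the model name passed):

import OpenAI from 'openai';

// Point the OpenAI client at the local llama.cpp server
const client = new OpenAI({
  baseURL: `${process.env.LOCAL_MODEL_ENDPOINT}/v1`, // e.g. http://localhost:11434/v1
  apiKey: 'not-needed', // llama.cpp ignores the key, but the SDK requires one
});

const completion = await client.chat.completions.create({
  model: 'local-model', // whatever model llama-server loaded at startup
  messages: [{ role: 'user', content: 'Explain async/await' }],
  max_tokens: 500,
});
console.log(completion.choices[0].message.content);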
This is normal if you haven't installed that provider. The system will use other available providers.
To fix: Install the SDK you need
npm install @anthropic-ai/sdk
echo "ANTHROPIC_API_KEY=your-key" >> .envYou need at least ONE provider configured:
# Check .env file
cat .env
# Make sure you have at least one API key AND installed the SDK
npm install groq-sdk # Or whichever provider you chose

This is normal if no provider SDKs are installed yet. The monitor still works for basic logging. To enable trajectory analysis, install at least one provider SDK.
For sensitive data, the system automatically routes to local models:
// Automatically uses local model
const result = await engine.infer(prompt, {
operationType: 'sensitive-analysis'
});
// Or explicitly request local
const result = await engine.infer(prompt, {}, {
privacy: 'local'
});

✅ Agent-Agnostic: Works with Claude Code, Copilot, Cursor, any coding agent
✅ Provider-Agnostic: Supports 5 different LLM providers
✅ Optional Dependencies: Install only what you need
✅ Automatic Fallback: Circuit breaker pattern for reliability
✅ Privacy-First: Route sensitive data to local models
✅ Cost-Optimized: Smart routing to cheapest provider
You are NOT locked into Anthropic! 🎉


