Complete reference for configuring AI models in JOC
- Overview
- Default Models
- Model Properties
- Provider Configuration
- Custom Models
- Best Practices
- Troubleshooting
JOC uses Ollama cloud models by default, which balance capability and cost. Models are configured in opencode.jsonc.
| Model | Context | Output | Best For | Notes |
|---|---|---|---|---|
| glm-5.1:cloud | 202K | 131K | General purpose, most tasks | Default for most agents |
| kimi-k2.5:cloud | 262K | 262K | Extended context, long documents | Same input/output context |
| minimax-m2.7:cloud | 205K | 128K | High performance tasks | Balanced performance |
| qwen3.5:cloud | 262K | 32K | Long document processing | Limited output |
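
When the size of a request is known in advance, the limits in the table can be checked programmatically. The following is a minimal sketch under stated assumptions: `ModelLimits` and `modelsThatFit` are hypothetical names, not part of JOC, and the limits are the rounded figures from the table (202K is treated as 202,000 tokens).

```typescript
// Hypothetical helper, not part of JOC. Limits are the rounded table values.
interface ModelLimits {
  name: string;
  context: number; // maximum input tokens
  output: number;  // maximum output tokens
}

const DEFAULT_MODELS: ModelLimits[] = [
  { name: "glm-5.1:cloud", context: 202_000, output: 131_000 },
  { name: "kimi-k2.5:cloud", context: 262_000, output: 262_000 },
  { name: "minimax-m2.7:cloud", context: 205_000, output: 128_000 },
  { name: "qwen3.5:cloud", context: 262_000, output: 32_000 },
];

// Return the names of the models whose limits can hold the request.
// Input must stay strictly below the context limit.
function modelsThatFit(inputTokens: number, outputTokens: number): string[] {
  return DEFAULT_MODELS
    .filter((m) => inputTokens < m.context && outputTokens <= m.output)
    .map((m) => m.name);
}

console.log(modelsThatFit(250_000, 50_000));
```

With 250,000 input tokens and 50,000 output tokens, only kimi-k2.5:cloud qualifies: qwen3.5:cloud has enough context but its 32K output limit is too small.
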
| Task Type | Recommended Model | Reason |
|---|---|---|
| Implementation | glm-5.1:cloud | Balanced, cost-effective |
| Architecture | opus | Deep reasoning (override) |
| Search/Explore | haiku | Fast, efficient |
| Documentation | haiku | Simple generation |
| Security Review | opus | Critical analysis (override) |
| Long documents | kimi-k2.5:cloud | Extended context |
| Complex analysis | minimax-m2.7:cloud | High performance |
The maximum input context size in tokens.
Context = System Prompt + User Message + Conversation History
Example:
System: 1,000 tokens
User: 500 tokens
History: 10,000 tokens
Total: 11,500 tokens (must be < context_limit)
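
The arithmetic above can be written as a small check. This is an illustrative sketch only: `ContextUsage`, `totalContext`, and `fitsContext` are hypothetical names, and the token counts are assumed to come from whatever tokenizer the provider uses.

```typescript
// Hypothetical helper, not part of JOC.
interface ContextUsage {
  systemTokens: number;
  userTokens: number;
  historyTokens: number;
}

// Context = System Prompt + User Message + Conversation History
function totalContext(usage: ContextUsage): number {
  return usage.systemTokens + usage.userTokens + usage.historyTokens;
}

// True when the total stays strictly below the model's context limit.
function fitsContext(usage: ContextUsage, contextLimit: number): boolean {
  return totalContext(usage) < contextLimit;
}

const usage: ContextUsage = {
  systemTokens: 1_000,
  userTokens: 500,
  historyTokens: 10_000,
};

console.log(totalContext(usage));         // 11500
console.log(fitsContext(usage, 202_752)); // true
```
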
The maximum output size in tokens.
Output = Generated Response
For code generation:
- Short functions: ~500 tokens
- Full files: ~2,000 tokens
- Multiple files: limited by the model's output limit
Responses longer than the output limit are cut off, so long outputs may need one or more continuation requests.
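
As a rough planning aid, the number of requests a long output needs can be estimated from the output limit. This is a sketch under assumptions: `requestsNeeded` is a hypothetical helper, and it ignores any tokens spent on re-stating context in each continuation.

```typescript
// Hypothetical helper, not part of JOC.
// Estimates how many requests are needed to produce the expected output.
function requestsNeeded(expectedOutputTokens: number, outputLimit: number): number {
  if (outputLimit <= 0) {
    throw new RangeError("outputLimit must be positive");
  }
  return Math.max(1, Math.ceil(expectedOutputTokens / outputLimit));
}

// Five full files at ~2,000 tokens each, on a model limited to 4,096 output tokens.
console.log(requestsNeeded(5 * 2_000, 4_096)); // 3
// The same job on a model with a 32,000-token output limit.
console.log(requestsNeeded(5 * 2_000, 32_000)); // 1
```
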
The _launch property controls automatic model startup:
{
"provider": {
"opencode": {
"options": {}
},
"ollama": {
"models": {
"glm-5.1:cloud": {
"_launch": true,
"limit": {
"context": 202752,
"output": 131072
},
"name": "glm-5.1:cloud"
}
}
}
}
}

{
"provider": {
"opencode": {
"options": {}
},
"ollama": {
"models": {
"glm-5.1:cloud": { "_launch": true },
"kimi-k2.5:cloud": { "_launch": true }
}
},
"openrouter": {
"models": {
"glm-5:cloud": {
"limit": { "context": 200000, "output": 131072 }
}
}
}
}
}

{
"provider": {
"ollama": {
"models": {
"glm-5.1:cloud": {
"_launch": true,
"env": {
"OLLAMA_HOST": "${OLLAMA_HOST}",
"OLLAMA_API_KEY": "${OLLAMA_API_KEY}"
}
}
}
}
}
}

- Define in opencode.jsonc:
{
"provider": {
"ollama": {
"models": {
"my-custom-model": {
"_launch": true,
"limit": {
"context": 128000,
"output": 4096
},
"name": "my-custom-model"
}
}
}
}
}

- Assign to Agents:
---
name: my-agent
description: Uses custom model
model: ollama/my-custom-model
mode: subagent
---

- Use in Skills:
<Configuration>
model: my-custom-model
</Configuration>

{
"provider": {
"ollama": {
"models": {
"code-specialist": {
"_launch": true,
"limit": { "context": 100000, "output": 16000 },
"parameters": {
"temperature": 0.1, // Lower = more focused
"top_p": 0.95,
"frequency_penalty": 0.1,
"presence_penalty": 0.1
},
"defaults": {
"system_prompt": "You are a code specialist...",
"stop_sequences": ["```", "---END---"]
}
}
}
}
}
}

Configure different models for different agent types:
{
"provider": {
"ollama": {
"models": {
// Fast tier - reading, searching, simple generation
"fast-model": {
"limit": { "context": 50000, "output": 2000 },
"tier": "fast"
},
// Standard tier - most operations
"glm-5.1:cloud": {
"limit": { "context": 200000, "output": 100000 },
"tier": "standard"
},
// Deep tier - complex reasoning
"deep-model": {
"limit": { "context": 300000, "output": 20000 },
"tier": "deep"
}
}
}
}
}

{
"model_routing": {
"default": "glm-5.1:cloud",
"routing": {
"explore": "fast-model",
"executor": "glm-5.1:cloud",
"architect": "deep-model",
"security-reviewer": "deep-model"
}
}
}

- Monitor context usage:
  // Large files consume context
  // Split into chunks if needed
  const chunks = await splitLargeFile(file, maxChunkSize)
- Use conversation compaction:
  - Let JOC compact when needed
  - Key state is preserved
- Prefer focused prompts:
  - Be specific
  - Avoid redundant context
- Request appropriate sizes:
  // Bad: requests the entire file when only one function is needed
  "Write the entire UserService.ts file"
  // Good: requests the specific part
  "Write the authenticate method for UserService"
- Use streaming for long outputs:
  // Stream long responses
  for await (const chunk of stream) {
    process(chunk)
  }
- Break down complex tasks:
  - Multiple smaller requests
  - Assemble results
- Use tiered models:
  {
    "model_routing": {
      "explore": "haiku",          // Fast, cheap
      "default": "glm-5.1:cloud",  // Standard
      "architecture": "opus"       // Expensive, use sparingly
    }
  }
- Cache frequently used context:
  // Store common context once
  await agentContext({ action: "setMemory", data: { techStack: { ... } } })
- Batch similar operations:
  // Bad: Multiple agent calls
  await agent("fix", { file: "a.ts" })
  await agent("fix", { file: "b.ts" })
  // Good: Single call with multiple files
  await agent("fix", { files: ["a.ts", "b.ts"] })
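
The tiered-model practice depends on mapping each agent to a model and falling back to a default when no mapping exists. How JOC resolves this internally is not documented here, so the following is only a sketch of the idea: `ModelRouting` and `resolveModel` are hypothetical names, and the data mirrors the model_routing example with a "default" key and a "routing" map.

```typescript
// Hypothetical sketch, not JOC's actual resolution logic.
interface ModelRouting {
  default: string;
  routing: Record<string, string>;
}

const modelRouting: ModelRouting = {
  default: "glm-5.1:cloud",
  routing: {
    explore: "fast-model",
    executor: "glm-5.1:cloud",
    architect: "deep-model",
    "security-reviewer": "deep-model",
  },
};

// Return the model routed to the agent, or the default when none is configured.
// hasOwnProperty guards against inherited keys such as "toString".
function resolveModel(agent: string, config: ModelRouting): string {
  const routed = Object.prototype.hasOwnProperty.call(config.routing, agent)
    ? config.routing[agent]
    : undefined;
  return routed ?? config.default;
}

console.log(resolveModel("architect", modelRouting)); // deep-model
console.log(resolveModel("writer", modelRouting));    // glm-5.1:cloud
```
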
Error: Model 'glm-5.1:cloud' not found
Solutions:
- Check model name spelling
- Verify provider configuration
- Ensure _launch: true is set
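
A quick way to check the first two points is to look the model name up in the provider configuration. The sketch below assumes the configuration has already been parsed into an object (opencode.jsonc may contain comments, so plain `JSON.parse` may not accept it). `ProviderConfig` and `providersDefining` are hypothetical names, not part of JOC.

```typescript
// Hypothetical helper, not part of JOC.
type ProviderConfig = {
  provider: Record<string, { models?: Record<string, unknown>; options?: unknown }>;
};

// Return the names of the providers that define the given model.
function providersDefining(config: ProviderConfig, modelName: string): string[] {
  return Object.entries(config.provider)
    .filter(([, p]) =>
      p.models !== undefined &&
      Object.prototype.hasOwnProperty.call(p.models, modelName))
    .map(([providerName]) => providerName);
}

const config: ProviderConfig = {
  provider: {
    opencode: { options: {} },
    ollama: {
      models: {
        "glm-5.1:cloud": { _launch: true },
        "kimi-k2.5:cloud": { _launch: true },
      },
    },
    openrouter: { models: { "glm-5:cloud": {} } },
  },
};

console.log(providersDefining(config, "glm-5.1:cloud")); // [ 'ollama' ]
console.log(providersDefining(config, "glm-5.1"));       // []
```

An empty result points to a misspelled name or a missing provider entry.
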
Error: Context window exceeded (150000 > 100000)
Solutions:
- Reduce input size
- Use model with larger context
- Clear conversation history
// Clear history to free context
await clearHistory()

Error: Output truncated at 8000 tokens
Solutions:
- Use model with larger output limit
- Request smaller chunks
- Use streaming
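
One way to request smaller chunks is to split a multi-file job into batches and issue one request per batch. This is a generic sketch: `chunk` is a hypothetical helper, and the batch size is something to tune against the model's output limit.

```typescript
// Hypothetical helper, not part of JOC.
// Split a list into consecutive batches of at most `size` items.
function chunk<T>(items: readonly T[], size: number): T[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError("size must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

console.log(chunk(["a.ts", "b.ts", "c.ts"], 2));
// [ [ 'a.ts', 'b.ts' ], [ 'c.ts' ] ]
```
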
Error: Rate limit exceeded
Solutions:
- Implement backoff
- Reduce concurrent requests
- Use multiple API keys (rotate)
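// Exponential backoff: double the delay after each 429 response.
// generate() is the caller's own request function; the maxRetries
// default of 5 is illustrative.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms))

async function generateWithBackoff(prompt, maxRetries = 5) {
  let delay = 1000
  let retries = 0
  while (retries < maxRetries) {
    try {
      return await generate(prompt)
    } catch (e) {
      if (e.status === 429) {
        await sleep(delay)
        delay *= 2
        retries++
      } else throw e
    }
  }
  // Every attempt was rate limited
  throw new Error("Rate limit exceeded: retries exhausted")
}

Causes: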
- Large context
- Complex reasoning required
- Network latency
- Model under load
Solutions:
- Use faster tier model
- Reduce context
- Use streaming for progress feedback
- Check network connectivity
- Agents - Agent model assignments
- Skills - Skill model requirements
- Installation - Initial configuration
{
  "glm-5.1:cloud": {
    "_launch": true // Auto-start on first use
  }
}