A self-hosted AI agent is fundamentally:
- Loop: Accept input → Process → Generate response → Execute actions → Repeat
- Context: Maintain conversation history and system state
- Tools: Interface between LLM and the real world (files, shell, APIs)
- Control: Decide when to stop, when to call tools, when to ask for help
┌─────────────────────────────────────────────────────┐
│                  Agent Controller                   │
│   (Orchestrates conversation loop, manages state)   │
└────────────┬───────────────────────────┬────────────┘
             │                           │
             ▼                           ▼
     ┌────────────────┐         ┌────────────────┐
     │ LLM Interface  │         │ Tool Registry  │
     │ (Ollama)       │         │                │
     └────────────────┘         └────────┬───────┘
             │                           │
             │                           ▼
             │                  ┌─────────────────┐
             │                  │ Tool Executor   │
             │                  │ - FileOps       │
             │                  │ - ShellExec     │
             │                  │ - WebSearch     │
             │                  │ - CodeRunner    │
             │                  └─────────────────┘
             │
             ▼
     ┌────────────────┐
     │ Context Manager│
     │ - History      │
     │ - Memory       │
     │ - Token Budget │
     └────────────────┘
Agent Controller responsibilities:
- Main conversation loop
- Route messages between user, LLM, and tools
- Decide when to terminate (task complete, error, timeout)
- Handle interruptions and errors gracefully
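
The responsibilities above can be sketched as one small loop. This is a minimal sketch under stated assumptions, not a finished controller: `run_agent`, the `llm` callable, the `tools` dictionary, and the message shape (`role`, `content`, `tool_calls` with `name` and `arguments`) are all hypothetical names chosen for illustration. The LLM is passed in as a plain function so the loop can be exercised without a running model.

```python
from typing import Any, Callable, Dict, List

Message = Dict[str, Any]  # hypothetical shape: {"role", "content", optional "tool_calls"}


def run_agent(
    user_message: str,
    llm: Callable[[List[Message]], Message],
    tools: Dict[str, Callable[..., str]],
    max_turns: int = 5,
) -> str:
    """Run LLM -> tool -> LLM until the model answers with plain text."""
    messages: List[Message] = [{"role": "user", "content": user_message}]
    for _ in range(max_turns):  # hard cap protects against infinite loops
        reply = llm(messages)
        messages.append(reply)
        calls = reply.get("tool_calls") or []
        if not calls:
            # No tool requested: treat the text as the final answer.
            return str(reply.get("content", ""))
        for call in calls:
            try:
                result = tools[call["name"]](**call.get("arguments", {}))
            except Exception as exc:  # unknown tool or tool failure
                result = f"error: {exc}"
            # Feed the outcome back so the model can react to it.
            messages.append({"role": "tool", "content": str(result)})
    return "Stopped: reached max_turns without a final answer."
```

Errors are returned to the model as tool output rather than raised, so a failed tool call becomes something the model can recover from on the next turn.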
Key Methods:
class Agent:
    def run(self, user_message: str) -> str:
        """Main entry point - process a user message"""

    def _loop(self) -> str:
        """Internal loop: LLM → Tool → LLM until done"""

    def _should_continue(self) -> bool:
        """Termination logic (max turns, success, error)"""

LLM Interface responsibilities:
- Send prompts to Ollama
- Parse responses (text vs tool calls)
- Handle streaming (optional)
- Manage model parameters (temperature, top_p, etc.)
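
Telling text apart from tool calls mostly comes down to inspecting the response body. The sketch below assumes a non-streaming chat response in which the reply sits under `message`, with text in `content` and any tool requests in `tool_calls[].function.name` / `.arguments`. That matches my understanding of Ollama's chat API, but treat the field names as an assumption and check them against the version you run. `ToolCall`, `ParsedResponse`, and `parse_chat_response` are hypothetical names.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class ToolCall:
    name: str
    arguments: Dict[str, Any]


@dataclass
class ParsedResponse:
    text: str
    tool_calls: List[ToolCall] = field(default_factory=list)


def parse_chat_response(body: Dict[str, Any]) -> ParsedResponse:
    """Split a decoded chat response into plain text and tool calls."""
    message = body.get("message") or {}
    calls: List[ToolCall] = []
    for raw in message.get("tool_calls") or []:
        fn = raw.get("function") or {}
        args = fn.get("arguments") or {}
        if isinstance(args, str):
            # Defensive: tolerate arguments delivered as JSON text.
            args = json.loads(args)
        calls.append(ToolCall(name=fn.get("name", ""), arguments=args))
    return ParsedResponse(text=message.get("content") or "", tool_calls=calls)
```

The controller can then branch on `parsed.tool_calls`: empty means the text is the answer, non-empty means there are tools to run.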
Key Methods:
from typing import List, Optional

class OllamaLLM:
    def generate(self, messages: List[Message], tools: Optional[List[Tool]] = None) -> Response:
        """Send messages to Ollama, get response"""

    def supports_tools(self) -> bool:
        """Check if current model supports function calling"""

Ollama API Basics:
# Chat completion
POST http://localhost:11434/api/chat
{
  "model": "llama3.3",
  "messages": [{"role": "user", "content": "Hello"}],
  "stream": false
}

# With tools (function calling)
POST http://localhost:11434/api/chat
{
  "model": "llama3.3",
  "messages": [...],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "execute_shell",
        "description": "Run a shell command",
        "parameters": {
          "type": "object",
          "properties": {
            "command": {"type": "string"}
          },
          "required": ["command"]
        }
      }
    }
  ]
}

Tool Registry responsibilities:
- Define available tools (functions the LLM can call)
- Execute tool requests safely
- Return structured results to LLM
Tool Structure:
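from dataclasses import dataclass
from typing import Optional

class Tool:
    name: str
    description: str
    parameters: dict  # JSON schema

    def execute(self, **kwargs) -> "ToolResult":
        """Run the tool, return result"""
        raise NotImplementedError  # each concrete tool overrides this

@dataclass
class ToolResult:
    success: bool
    output: str
    error: Optional[str] = None

Example Tools: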
- read_file(path) - Read file content
- write_file(path, content) - Write to file
- execute_shell(command) - Run shell command
- search_web(query) - Search the web
- run_python(code) - Execute Python code safely
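
One way to tie tool definitions, schemas, and execution together is a small registry. The sketch below is an illustration under assumptions: `ToolRegistry` and its `register`, `schemas`, and `call` methods are hypothetical names. It stores plain functions, emits schemas in the same `{"type": "function", ...}` shape used in the request example, and returns results as dictionaries mirroring the `success` / `output` / `error` fields.

```python
from typing import Any, Callable, Dict, List


class ToolRegistry:
    """Hypothetical registry: maps tool names to functions and JSON schemas."""

    def __init__(self) -> None:
        self._tools: Dict[str, Dict[str, Any]] = {}

    def register(self, name: str, description: str,
                 parameters: Dict[str, Any], func: Callable[..., Any]) -> None:
        self._tools[name] = {
            "description": description,
            "parameters": parameters,
            "func": func,
        }

    def schemas(self) -> List[Dict[str, Any]]:
        """Tool definitions in the shape passed as "tools" in a chat request."""
        return [
            {"type": "function",
             "function": {"name": name,
                          "description": tool["description"],
                          "parameters": tool["parameters"]}}
            for name, tool in self._tools.items()
        ]

    def call(self, name: str, arguments: Dict[str, Any]) -> Dict[str, Any]:
        """Run a tool; failures come back as data instead of exceptions."""
        tool = self._tools.get(name)
        if tool is None:
            return {"success": False, "output": "", "error": f"unknown tool: {name}"}
        try:
            output = tool["func"](**arguments)
        except Exception as exc:
            return {"success": False, "output": "", "error": str(exc)}
        return {"success": True, "output": str(output), "error": None}
```

Keeping the schema next to the function means the list sent to the model and the set of callable tools cannot drift apart.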
Safety Considerations:
- Whitelist allowed commands (no rm -rf /)
- Sandbox file operations (restrict to project directory)
- Timeout long-running operations
- Ask for confirmation on destructive actions
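
A minimal sketch of the first three safeguards follows, using only the standard library. Everything here is an assumption for illustration: the `ALLOWED_COMMANDS` set, the helper names, and the 10-second default timeout. The allowlist only checks the program name, so it is a first line of defense rather than a complete sandbox.

```python
import shlex
import subprocess
from pathlib import Path

ALLOWED_COMMANDS = {"ls", "cat", "echo", "git"}  # hypothetical allowlist


def resolve_in_project(project_dir: str, user_path: str) -> Path:
    """Resolve user_path under project_dir; reject anything that escapes it."""
    root = Path(project_dir).resolve()
    target = (root / user_path).resolve()  # collapses ".." and follows symlinks
    if target != root and root not in target.parents:
        raise PermissionError(f"path escapes project directory: {user_path}")
    return target


def is_command_allowed(command: str) -> bool:
    """Allow a command only if its program name is on the allowlist."""
    try:
        parts = shlex.split(command)
    except ValueError:  # e.g. unbalanced quotes
        return False
    return bool(parts) and parts[0] in ALLOWED_COMMANDS


def run_command(command: str, timeout_s: float = 10.0) -> str:
    """Run an allowlisted command without a shell, bounded by a timeout."""
    if not is_command_allowed(command):
        raise PermissionError(f"command not allowed: {command}")
    # No shell is involved, so pipes, redirects, and ';' are passed through
    # as literal arguments. subprocess.TimeoutExpired is raised on timeout.
    done = subprocess.run(
        shlex.split(command), capture_output=True, text=True, timeout=timeout_s
    )
    return done.stdout
```

Confirmation for destructive actions is left out because it depends on how the agent talks to the user (CLI prompt, web UI, and so on).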
Context Manager responsibilities:
- Track conversation history (messages)
- Estimate token usage (prevent context overflow)
- Implement memory strategies (summarization, pruning)
- Persist state between sessions (optional)
Memory Strategies:
A. Sliding Window (simplest)
- Keep last N messages, drop oldest
- Fast but loses old context
B. Summarization
- Periodically summarize old messages into a single message
- Preserves important info, reduces tokens
C. Semantic Memory (advanced)
- Store embeddings of past conversations
- Retrieve relevant memories based on current context
- Similar to RAG (Retrieval Augmented Generation)
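
Strategies A and B can be sketched side by side. This is an illustration under assumptions: messages are plain dictionaries with `role` and `content`, the function names are hypothetical, and `summarize` is a caller-supplied function (in practice probably another LLM call). Storing the summary as a `system` message is one possible choice, not a requirement.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]  # hypothetical shape: {"role": ..., "content": ...}


def sliding_window(messages: List[Message], keep_last: int) -> List[Message]:
    """Strategy A: keep only the most recent keep_last messages."""
    split = max(len(messages) - keep_last, 0)
    return messages[split:]


def compact_with_summary(
    messages: List[Message],
    keep_last: int,
    summarize: Callable[[List[Message]], str],
) -> List[Message]:
    """Strategy B: replace everything older than keep_last with one summary."""
    split = max(len(messages) - keep_last, 0)
    old, recent = messages[:split], messages[split:]
    if not old:
        return list(recent)  # nothing old enough to summarize
    summary = {
        "role": "system",  # assumption: summaries are stored as system notes
        "content": "Summary of earlier conversation: " + summarize(old),
    }
    return [summary] + recent
```

Strategy C needs an embedding model and a similarity search, so it is left out of this sketch.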
Token Budget Example:
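class ContextManager:
    def __init__(self, max_tokens=4096):
        self.messages = []
        self.max_tokens = max_tokens

    def add_message(self, message):
        self.messages.append(message)
        if self.estimate_tokens() > self.max_tokens * 0.8:
            self._prune()  # Remove or summarize old messages

    def estimate_tokens(self) -> int:
        # Rough estimate: ~4 chars per token
        return sum(len(m.content) for m in self.messages) // 4

    def _prune(self):
        # Placeholder strategy (sliding window): drop the oldest messages
        # until usage is back under the 80% threshold.
        while len(self.messages) > 1 and self.estimate_tokens() > self.max_tokens * 0.8:
            self.messages.pop(0)

User: "Create a Python script that prints hello world"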
↓
Agent Controller: Add user message to context
↓
LLM Interface: Send context + tools to Ollama
↓
Ollama: Returns tool call → write_file("hello.py", "print('Hello')")
↓
Tool Executor: Execute write_file
↓
Tool Result: Success, file written
↓
Agent Controller: Add tool result to context
↓
LLM Interface: Send updated context to Ollama
↓
Ollama: Returns text response → "I've created hello.py"
↓
Agent Controller: Return response to user
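
It can help to see what the context looks like once this flow has finished. The sketch below writes out the four messages the flow would leave behind. The field names are an assumption modeled on the chat request format (a `tool` role for results, `tool_calls` on the assistant message), and `hello_world_transcript` is a hypothetical helper that exists only to hold the example.

```python
from typing import Any, Dict, List


def hello_world_transcript() -> List[Dict[str, Any]]:
    """Message history after the example flow, oldest first."""
    return [
        # 1. The user's request is added to the context.
        {"role": "user",
         "content": "Create a Python script that prints hello world"},
        # 2. The model answers with a tool call instead of text.
        {"role": "assistant", "content": "",
         "tool_calls": [{"function": {
             "name": "write_file",
             "arguments": {"path": "hello.py", "content": "print('Hello')"}}}]},
        # 3. The tool result goes back into the context.
        {"role": "tool", "content": "Success, file written"},
        # 4. With the result visible, the model produces the final text.
        {"role": "assistant", "content": "I've created hello.py"},
    ]
```

Each arrow in the flow corresponds to either appending one of these messages or sending the list so far to the model.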
Why Ollama:
- Simple HTTP API
- Runs locally (privacy, no API costs)
- Good model selection (llama, qwen, codellama)
- Active development, good docs
Why Python:
- Fast prototyping
- Rich ecosystem (requests, pytest, etc.)
- Easy to read/modify
- You already know it from MagicMirror work
Tool calling (function calling):
- Structured output (JSON)
- Easier to parse and execute
- Models trained specifically for this
Prompt engineering (text-based tools):
- Works with any model
- More flexible but error-prone
- Requires careful parsing
Recommendation: Start with tool calling if your model supports it (llama3.3, qwen2.5-coder do). Fall back to prompt engineering if needed.
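
For the fallback path, the "careful parsing" could look like the sketch below. It assumes a made-up convention in which the system prompt tells the model to request a tool by writing a single line of the form `TOOL: name {json arguments}`. That format and the `parse_text_tool_call` helper are hypothetical, not something any model produces by default.

```python
import json
import re
from typing import Any, Dict, Optional, Tuple

# Hypothetical convention: one line such as
#   TOOL: write_file {"path": "hello.py", "content": "print('Hello')"}
TOOL_LINE = re.compile(r"^TOOL:\s*(\w+)\s*(\{.*\})\s*$", re.MULTILINE)


def parse_text_tool_call(text: str) -> Optional[Tuple[str, Dict[str, Any]]]:
    """Return (tool_name, arguments) from model text, or None if absent or invalid."""
    match = TOOL_LINE.search(text)
    if match is None:
        return None  # plain text answer, no tool requested
    try:
        arguments = json.loads(match.group(2))
    except json.JSONDecodeError:
        return None  # the model produced malformed JSON
    return match.group(1), arguments
```

Returning `None` for malformed output lets the controller re-prompt the model instead of crashing, which is where most of the "error-prone" cost of this approach shows up.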
Future enhancements:
- Multi-agent orchestration - Spawn sub-agents for specific tasks
- Plugin system - Load tools dynamically
- Web UI - Simple chat interface (Flask/FastAPI)
- OpenClaw integration - Connect as ACP harness
- Voice interface - Add TTS/STT
| Feature | Your Agent | OpenClaw |
|---|---|---|
| Scope | Learning project, custom tools | Production-ready, multi-channel |
| Backend | Ollama (local) | Anthropic, OpenAI, Google, etc. |
| Complexity | Simple, hackable | Enterprise features (cron, memory, nodes) |
| Control | Full source access | Plugin/config based |
Key insight: Building your own teaches you what OpenClaw does under the hood. You'll use OpenClaw better by understanding agent internals.
Next steps:
- Review this architecture on the flight
- Sketch out pseudocode for the main loop
- Think about which tools you want first
- Consider edge cases (errors, timeouts, infinite loops)
Read roadmap.md for the phased implementation plan.