Architecture - Self-Hosted AI Agent

System Overview

A self-hosted AI agent is fundamentally:

  1. Loop: Accept input → Process → Generate response → Execute actions → Repeat
  2. Context: Maintain conversation history and system state
  3. Tools: Interface between LLM and the real world (files, shell, APIs)
  4. Control: Decide when to stop, when to call tools, when to ask for help

Component Architecture

┌─────────────────────────────────────────────────────┐
│                   Agent Controller                   │
│  (Orchestrates conversation loop, manages state)     │
└────────────┬───────────────────────────┬────────────┘
             │                           │
             ▼                           ▼
    ┌────────────────┐          ┌────────────────┐
    │  LLM Interface │          │  Tool Registry │
    │   (Ollama)     │          │                │
    └────────────────┘          └────────┬───────┘
             │                           │
             │                           ▼
             │                  ┌─────────────────┐
             │                  │  Tool Executor  │
             │                  │  - FileOps      │
             │                  │  - ShellExec    │
             │                  │  - WebSearch    │
             │                  │  - CodeRunner   │
             │                  └─────────────────┘
             │
             ▼
    ┌────────────────┐
    │ Context Manager│
    │ - History      │
    │ - Memory       │
    │ - Token Budget │
    └────────────────┘

Core Components

1. Agent Controller (src/agent.py)

Responsibilities:

  • Main conversation loop
  • Route messages between user, LLM, and tools
  • Decide when to terminate (task complete, error, timeout)
  • Handle interruptions and errors gracefully

Key Methods:

class Agent:
    def run(self, user_message: str) -> str:
        """Main entry point - process a user message"""
        
    def _loop(self) -> str:
        """Internal loop: LLM → Tool → LLM until done"""
        
    def _should_continue(self) -> bool:
        """Termination logic (max turns, success, error)"""

2. LLM Interface (src/llm.py)

Responsibilities:

  • Send prompts to Ollama
  • Parse responses (text vs tool calls)
  • Handle streaming (optional)
  • Manage model parameters (temperature, top_p, etc.)

Key Methods:

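from __future__ import annotations  # Message, Tool, Response are defined elsewhere

from typing import List, Optional

class OllamaLLM:
    def generate(self, messages: List[Message], tools: Optional[List[Tool]] = None) -> Response:
        """Send messages to Ollama, get response"""

    def supports_tools(self) -> bool:
        """Check if current model supports function calling"""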

Ollama API Basics:

# Chat completion
POST http://localhost:11434/api/chat
{
  "model": "llama3.3",
  "messages": [{"role": "user", "content": "Hello"}],
  "stream": false
}

# With tools (function calling)
POST http://localhost:11434/api/chat
{
  "model": "llama3.3",
  "messages": [...],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "execute_shell",
        "description": "Run a shell command",
        "parameters": {
          "type": "object",
          "properties": {
            "command": {"type": "string"}
          },
          "required": ["command"]
        }
      }
    }
  ]
}
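
A minimal sketch of issuing these requests from Python with only the standard library. The helper names (`build_chat_payload`, `chat`, `extract_tool_calls`) are illustrative assumptions, not part of Ollama or of this project. The response shape read by `extract_tool_calls` (a `message` object with an optional `tool_calls` list whose `arguments` are already a JSON object) is an assumption to verify against the Ollama API docs for your version.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/chat"


def build_chat_payload(model, messages, tools=None):
    """Build the JSON body for POST /api/chat (non-streaming)."""
    payload = {"model": model, "messages": messages, "stream": False}
    if tools:
        payload["tools"] = tools
    return payload


def chat(payload, url=OLLAMA_URL, timeout=120):
    """Send the payload to a running Ollama server and return the parsed JSON."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def extract_tool_calls(response):
    """Return (name, arguments) pairs; an empty list means a plain text reply."""
    calls = response.get("message", {}).get("tool_calls") or []
    return [(c["function"]["name"], c["function"]["arguments"]) for c in calls]
```

Keeping payload construction and response parsing separate from the network call makes both easy to unit test without a running server.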

3. Tool System (src/tools/)

Responsibilities:

  • Define available tools (functions the LLM can call)
  • Execute tool requests safely
  • Return structured results to LLM

Tool Structure:

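from dataclasses import dataclass
from typing import Optional

@dataclass
class ToolResult:
    success: bool
    output: str
    error: Optional[str] = None  # set only when success is False

class Tool:
    name: str
    description: str
    parameters: dict  # JSON schema

    def execute(self, **kwargs) -> ToolResult:
        """Run the tool, return result"""
        raise NotImplementedError  # each concrete tool overrides this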

Example Tools:

  • read_file(path) - Read file content
  • write_file(path, content) - Write to file
  • execute_shell(command) - Run shell command
  • search_web(query) - Search the web
  • run_python(code) - Execute Python code safely
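
As a rough illustration of how tools like these might be wired together, the sketch below registers plain functions by name and wraps every call in a result object. `ToolRegistry` and its methods are hypothetical names chosen for this example, and `ToolResult` is redefined here only so the block runs on its own.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class ToolResult:
    success: bool
    output: str
    error: Optional[str] = None


class ToolRegistry:
    """Maps tool names to plain Python callables (hypothetical helper)."""

    def __init__(self) -> None:
        self._tools: Dict[str, Callable[..., str]] = {}

    def register(self, name: str, func: Callable[..., str]) -> None:
        self._tools[name] = func

    def call(self, name: str, arguments: dict) -> ToolResult:
        if name not in self._tools:
            return ToolResult(False, "", f"unknown tool: {name}")
        try:
            return ToolResult(True, self._tools[name](**arguments))
        except Exception as exc:  # report failures to the LLM instead of crashing
            return ToolResult(False, "", f"{type(exc).__name__}: {exc}")


def read_file(path: str) -> str:
    with open(path, encoding="utf-8") as handle:
        return handle.read()


registry = ToolRegistry()
registry.register("read_file", read_file)
```

Returning a failed `ToolResult` rather than raising lets the agent pass the error text back to the model, which can then retry with different arguments.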

Safety Considerations:

  • Allowlist permitted commands (block destructive ones such as rm -rf /)
  • Sandbox file operations (restrict to project directory)
  • Timeout long-running operations
  • Ask for confirmation on destructive actions
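
One way the first three points could look in code, as a sketch under stated assumptions: the allowlist contents, the function names, and the 10-second default timeout are all placeholders. Checking only the program name is a coarse filter (it does not restrict arguments), so treat this as a starting point rather than a complete sandbox.

```python
import shlex
import subprocess
from pathlib import Path

ALLOWED_COMMANDS = {"ls", "cat", "echo", "grep"}  # example allowlist, adjust to taste


def resolve_in_sandbox(root: str, user_path: str) -> Path:
    """Resolve user_path under root; raise if it escapes the sandbox."""
    root_path = Path(root).resolve()
    candidate = (root_path / user_path).resolve()
    if candidate != root_path and root_path not in candidate.parents:
        raise PermissionError(f"path escapes sandbox: {user_path}")
    return candidate


def is_command_allowed(command: str) -> bool:
    """Allow a command only if its program name is on the allowlist."""
    try:
        parts = shlex.split(command)
    except ValueError:  # unbalanced quotes and similar
        return False
    return bool(parts) and parts[0] in ALLOWED_COMMANDS


def run_command(command: str, timeout: float = 10.0) -> str:
    """Run an allowlisted command without a shell, with a timeout."""
    if not is_command_allowed(command):
        raise PermissionError(f"command not allowed: {command}")
    completed = subprocess.run(
        shlex.split(command), capture_output=True, text=True, timeout=timeout
    )
    return completed.stdout
```

Passing an argument list to `subprocess.run` instead of using `shell=True` means shell metacharacters such as `;` and `|` are treated as literal arguments, not as command separators.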

4. Context Manager (src/context.py)

Responsibilities:

  • Track conversation history (messages)
  • Estimate token usage (prevent context overflow)
  • Implement memory strategies (summarization, pruning)
  • Persist state between sessions (optional)

Memory Strategies:

A. Sliding Window (simplest)

  • Keep last N messages, drop oldest
  • Fast but loses old context

B. Summarization

  • Periodically summarize old messages into a single message
  • Preserves important info, reduces tokens
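
A small sketch of the summarization strategy. The `summarize` argument is a placeholder for whatever produces the summary (in practice, probably another LLM call); `compact_history`, `keep_recent`, and the dict-based message shape are assumptions made for this example.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]  # {"role": ..., "content": ...}


def compact_history(
    messages: List[Message],
    summarize: Callable[[str], str],
    keep_recent: int = 4,
) -> List[Message]:
    """Replace all but the last keep_recent messages with one summary message."""
    if len(messages) <= keep_recent:
        return list(messages)
    cut = len(messages) - keep_recent
    old, recent = messages[:cut], messages[cut:]
    transcript = "\n".join(f"{m['role']}: {m['content']}" for m in old)
    summary = {
        "role": "system",
        "content": "Summary of earlier conversation: " + summarize(transcript),
    }
    return [summary] + recent
```

Keeping the most recent messages verbatim preserves the exact wording of the current exchange, while older turns are compressed.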

C. Semantic Memory (advanced)

  • Store embeddings of past conversations
  • Retrieve relevant memories based on current context
  • Similar to RAG (Retrieval Augmented Generation)
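
To make the retrieval idea concrete, here is a toy sketch that swaps real embeddings for simple word counts and ranks memories by cosine similarity. `embed`, `cosine`, and `MemoryStore` are hypothetical names; a real implementation would likely call an embedding model and use a vector index instead.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (stand-in for an embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class MemoryStore:
    def __init__(self) -> None:
        self._items: List[Tuple[str, Counter]] = []

    def add(self, text: str) -> None:
        self._items.append((text, embed(text)))

    def search(self, query: str, top_k: int = 2) -> List[str]:
        """Return the top_k stored texts most similar to the query."""
        query_vec = embed(query)
        ranked = sorted(
            self._items, key=lambda item: cosine(query_vec, item[1]), reverse=True
        )
        return [text for text, _ in ranked[:top_k]]
```

The retrieved texts would then be prepended to the prompt as extra context before the next LLM call.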

Token Budget Example:

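class ContextManager:
    def __init__(self, max_tokens=4096):
        self.messages = []
        self.max_tokens = max_tokens

    def add_message(self, message):
        self.messages.append(message)
        if self.estimate_tokens() > self.max_tokens * 0.8:
            self._prune()  # Remove or summarize old messages

    def estimate_tokens(self) -> int:
        # Rough estimate: ~4 chars per token
        return sum(len(m.content) for m in self.messages) // 4

    def _prune(self):
        # One possible strategy (sliding window): drop the oldest messages
        # until usage is back under 80% of the budget, keeping at least one
        while len(self.messages) > 1 and self.estimate_tokens() > self.max_tokens * 0.8:
            self.messages.pop(0)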

Data Flow

Typical Request Flow

User: "Create a Python script that prints hello world"
  ↓
Agent Controller: Add user message to context
  ↓
LLM Interface: Send context + tools to Ollama
  ↓
Ollama: Returns tool call → write_file("hello.py", "print('Hello')")
  ↓
Tool Executor: Execute write_file
  ↓
Tool Result: Success, file written
  ↓
Agent Controller: Add tool result to context
  ↓
LLM Interface: Send updated context to Ollama
  ↓
Ollama: Returns text response → "I've created hello.py"
  ↓
Agent Controller: Return response to user
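
The flow above can be sketched as a short loop. This is a minimal illustration, not the project's implementation: `run_agent`, `scripted_llm`, and the in-memory `write_file` are stand-ins invented here, and the tool-call message shape is an assumption modeled on Ollama-style responses.

```python
from typing import Any, Callable, Dict, List

Message = Dict[str, Any]


def run_agent(
    user_message: str,
    llm: Callable[[List[Message]], Message],
    tools: Dict[str, Callable[..., str]],
    max_turns: int = 8,
) -> str:
    """Loop LLM -> tool -> LLM until the model answers in plain text."""
    messages: List[Message] = [{"role": "user", "content": user_message}]
    for _ in range(max_turns):
        reply = llm(messages)
        messages.append(reply)
        tool_calls = reply.get("tool_calls") or []
        if not tool_calls:
            return str(reply.get("content", ""))
        for call in tool_calls:
            name, args = call["function"]["name"], call["function"]["arguments"]
            try:
                output = tools[name](**args)
            except Exception as exc:  # feed errors back so the model can recover
                output = f"error: {exc}"
            messages.append({"role": "tool", "content": output})
    return "Stopped: reached max_turns without a final answer."


# Scripted stand-in for Ollama that replays the request flow.
files: Dict[str, str] = {}


def write_file(path: str, content: str) -> str:
    files[path] = content  # in-memory stand-in for a real file write
    return f"wrote {len(content)} bytes to {path}"


def scripted_llm(messages: List[Message]) -> Message:
    if messages[-1]["role"] == "user":
        call = {"function": {"name": "write_file",
                             "arguments": {"path": "hello.py", "content": "print('Hello')"}}}
        return {"role": "assistant", "content": "", "tool_calls": [call]}
    return {"role": "assistant", "content": "I've created hello.py"}


answer = run_agent("Create a Python script that prints hello world",
                   scripted_llm, {"write_file": write_file})
```

The `max_turns` cap is the simplest guard against a model that keeps requesting tools and never produces a final answer.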

Design Decisions

Why Ollama?

  • Simple HTTP API
  • Runs locally (privacy, no API costs)
  • Good model selection (llama, qwen, codellama)
  • Active development, good docs

Why Python?

  • Fast prototyping
  • Rich ecosystem (requests, pytest, etc.)
  • Easy to read/modify
  • You already know it from MagicMirror work

Why Tool Calling vs Prompt Engineering?

Tool calling (function calling):

  • Structured output (JSON)
  • Easier to parse and execute
  • Models trained specifically for this

Prompt engineering (text-based tools):

  • Works with any model
  • More flexible but error-prone
  • Requires careful parsing

Recommendation: Start with tool calling if your model supports it (llama3.3 and qwen2.5-coder do), and fall back to prompt engineering if needed.
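
For the prompt-engineering fallback, the parsing step might look like the sketch below. The JSON convention (`tool` and `args` keys), the `TOOL_PROMPT` wording, and the `parse_text_tool_call` name are all assumptions made for illustration; any text format works as long as the prompt and the parser agree.

```python
import json
from typing import Optional, Tuple

TOOL_PROMPT = (
    "To use a tool, reply with only a JSON object like "
    '{"tool": "read_file", "args": {"path": "notes.txt"}}. '
    "Otherwise reply in plain text."
)


def parse_text_tool_call(reply: str) -> Optional[Tuple[str, dict]]:
    """Return (tool_name, args) if the reply contains a JSON tool call, else None."""
    start = reply.find("{")
    if start == -1:
        return None
    try:
        # raw_decode parses one JSON value and ignores any trailing text
        obj, _ = json.JSONDecoder().raw_decode(reply[start:])
    except json.JSONDecodeError:
        return None
    name, args = obj.get("tool"), obj.get("args", {})
    if not isinstance(name, str) or not isinstance(args, dict):
        return None
    return name, args
```

Returning `None` for anything that fails to parse lets the agent treat the reply as ordinary text, which is the safer default when a model drifts from the requested format.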


Extension Points

Future enhancements:

  1. Multi-agent orchestration - Spawn sub-agents for specific tasks
  2. Plugin system - Load tools dynamically
  3. Web UI - Simple chat interface (Flask/FastAPI)
  4. OpenClaw integration - Connect as ACP harness
  5. Voice interface - Add TTS/STT

Comparison to OpenClaw

Feature    | Your Agent                     | OpenClaw
Scope      | Learning project, custom tools | Production-ready, multi-channel
Backend    | Ollama (local)                 | Anthropic, OpenAI, Google, etc.
Complexity | Simple, hackable               | Enterprise features (cron, memory, nodes)
Control    | Full source access             | Plugin/config based

Key insight: Building your own teaches you what OpenClaw does under the hood. You'll use OpenClaw better by understanding agent internals.


Next Steps

  1. Review this architecture on the flight
  2. Sketch out pseudocode for the main loop
  3. Think about which tools you want first
  4. Consider edge cases (errors, timeouts, infinite loops)

Read roadmap.md for the phased implementation plan.