Architecture - Self-Hosted AI Agent

System Overview

A self-hosted AI agent is fundamentally:

Loop: Accept input → Process → Generate response → Execute actions → Repeat
Context: Maintain conversation history and system state
Tools: Interface between LLM and the real world (files, shell, APIs)
Control: Decide when to stop, when to call tools, when to ask for help

Component Architecture

┌─────────────────────────────────────────────────────┐
│                   Agent Controller                   │
│  (Orchestrates conversation loop, manages state)     │
└────────────┬───────────────────────────┬────────────┘
             │                           │
             ▼                           ▼
    ┌────────────────┐          ┌────────────────┐
    │  LLM Interface │          │  Tool Registry │
    │   (Ollama)     │          │                │
    └────────────────┘          └────────┬───────┘
             │                           │
             │                           ▼
             │                  ┌─────────────────┐
             │                  │  Tool Executor  │
             │                  │  - FileOps      │
             │                  │  - ShellExec    │
             │                  │  - WebSearch    │
             │                  │  - CodeRunner   │
             │                  └─────────────────┘
             │
             ▼
    ┌────────────────┐
    │ Context Manager│
    │ - History      │
    │ - Memory       │
    │ - Token Budget │
    └────────────────┘

Core Components

1. Agent Controller (`src/agent.py`)

Responsibilities:

Main conversation loop
Route messages between user, LLM, and tools
Decide when to terminate (task complete, error, timeout)
Handle interruptions and errors gracefully

Key Methods:

class Agent:
    def run(self, user_message: str) -> str:
        """Main entry point - process a user message"""
        
    def _loop(self) -> str:
        """Internal loop: LLM → Tool → LLM until done"""
        
    def _should_continue(self) -> bool:
        """Termination logic (max turns, success, error)"""

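A minimal sketch of how these three methods might fit together. It assumes a hypothetical `llm` callable that returns either `{"content": ...}` or `{"tool": ..., "args": ...}`, and a plain dict of tool functions. The reply shape, `max_turns`, and `scripted_llm` are illustrative assumptions, not part of the design above.

```python
from typing import Callable, Dict, List


class Agent:
    """Toy controller: calls the LLM, runs requested tools, repeats until done."""

    def __init__(self, llm: Callable[[List[dict]], dict],
                 tools: Dict[str, Callable[..., str]], max_turns: int = 5):
        self.llm = llm              # hypothetical: messages -> reply dict
        self.tools = tools          # tool name -> Python callable
        self.max_turns = max_turns  # guards against infinite loops
        self.messages: List[dict] = []
        self.turns = 0

    def run(self, user_message: str) -> str:
        """Main entry point: record the user message, then loop."""
        self.messages.append({"role": "user", "content": user_message})
        self.turns = 0
        return self._loop()

    def _loop(self) -> str:
        """LLM -> tool -> LLM until the model answers in plain text."""
        while self._should_continue():
            self.turns += 1
            reply = self.llm(self.messages)
            if "tool" not in reply:
                self.messages.append({"role": "assistant", "content": reply["content"]})
                return reply["content"]
            try:
                output = self.tools[reply["tool"]](**reply["args"])
            except Exception as exc:  # feed errors back so the model can recover
                output = f"error: {exc}"
            self.messages.append({"role": "tool", "content": output})
        return "Stopped: turn limit reached"

    def _should_continue(self) -> bool:
        """Termination logic: only a turn limit in this sketch."""
        return self.turns < self.max_turns


def scripted_llm(messages: List[dict]) -> dict:
    """Stand-in for a real model: asks for one tool call, then answers."""
    if messages[-1]["role"] == "tool":
        return {"content": f"Tool said: {messages[-1]['content']}"}
    return {"tool": "echo", "args": {"text": "hi"}}


agent = Agent(scripted_llm, {"echo": lambda text: text.upper()})
result = agent.run("say hi")
```

Swapping `scripted_llm` for a real model call leaves the loop unchanged, which makes the controller easy to test without Ollama running.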
2. LLM Interface (`src/llm.py`)

Responsibilities:

Send prompts to Ollama
Parse responses (text vs tool calls)
Handle streaming (optional)
Manage model parameters (temperature, top_p, etc.)

Key Methods:

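from typing import List, Optional

class OllamaLLM:
    # Message, Tool and Response are project types; quoted so this stub imports on its own
    def generate(self, messages: List["Message"], tools: Optional[List["Tool"]] = None) -> "Response":
        """Send messages to Ollama, get response"""

    def supports_tools(self) -> bool:
        """Check if current model supports function calling"""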

Ollama API Basics:

# Chat completion
POST http://localhost:11434/api/chat
{
  "model": "llama3.3",
  "messages": [{"role": "user", "content": "Hello"}],
  "stream": false
}

# With tools (function calling)
POST http://localhost:11434/api/chat
{
  "model": "llama3.3",
  "messages": [...],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "execute_shell",
        "description": "Run a shell command",
        "parameters": {
          "type": "object",
          "properties": {
            "command": {"type": "string"}
          },
          "required": ["command"]
        }
      }
    }
  ]
}
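
As a sketch, the request bodies above could be built and sent from Python using only the standard library. `build_chat_payload`, `extract_tool_calls`, and `chat` are hypothetical helper names. The response shape assumed here (a `message` object with an optional `tool_calls` list whose `function.arguments` is a JSON object) should be checked against the Ollama docs for the version you run.

```python
import json
import urllib.request
from typing import List, Optional

OLLAMA_URL = "http://localhost:11434/api/chat"  # endpoint from the examples above


def build_chat_payload(model: str, messages: List[dict],
                       tools: Optional[List[dict]] = None) -> dict:
    """Build a non-streaming /api/chat request body."""
    payload = {"model": model, "messages": messages, "stream": False}
    if tools:
        payload["tools"] = tools
    return payload


def extract_tool_calls(response: dict) -> List[dict]:
    """Return [{'name': ..., 'arguments': {...}}] from a chat response (assumed shape)."""
    calls = response.get("message", {}).get("tool_calls") or []
    return [{"name": c["function"]["name"],
             "arguments": c["function"].get("arguments", {})} for c in calls]


def chat(payload: dict, timeout: float = 60.0) -> dict:
    """POST the payload to a locally running Ollama server."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


payload = build_chat_payload("llama3.3", [{"role": "user", "content": "Hello"}])
# Hand-written sample response, used here so the parsing can be shown offline.
sample = {"message": {"role": "assistant", "content": "", "tool_calls": [
    {"function": {"name": "execute_shell", "arguments": {"command": "ls"}}}]}}
calls = extract_tool_calls(sample)
```

With a server running, `chat(payload)` would return the parsed JSON reply. An empty list from `extract_tool_calls` means the model answered in plain text.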

3. Tool System (`src/tools/`)

Responsibilities:

Define available tools (functions the LLM can call)
Execute tool requests safely
Return structured results to LLM

Tool Structure:

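from dataclasses import dataclass
from typing import Optional

# Defined first so Tool.execute can reference it in its return annotation
@dataclass
class ToolResult:
    success: bool
    output: str
    error: Optional[str] = None  # set only when success is False

class Tool:
    name: str
    description: str
    parameters: dict  # JSON schema

    def execute(self, **kwargs) -> ToolResult:
        """Run the tool, return result"""
        raise NotImplementedError  # each concrete tool overrides this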
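
One way the tool shapes above might be wired into the Tool Registry from the diagram. This is a sketch under assumptions: `ToolRegistry`, the `func` field, and the `schema()` helper are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class ToolResult:
    success: bool
    output: str
    error: Optional[str] = None


@dataclass
class Tool:
    name: str
    description: str
    parameters: dict           # JSON schema
    func: Callable[..., str]   # hypothetical: the Python function backing the tool

    def execute(self, **kwargs) -> ToolResult:
        """Run the tool; report failures as data instead of raising."""
        try:
            return ToolResult(success=True, output=self.func(**kwargs))
        except Exception as exc:
            return ToolResult(success=False, output="", error=str(exc))

    def schema(self) -> dict:
        """Describe the tool in the function-calling format Ollama expects."""
        return {"type": "function", "function": {
            "name": self.name, "description": self.description,
            "parameters": self.parameters}}


class ToolRegistry:
    """Looks tools up by name so the controller never calls functions directly."""

    def __init__(self) -> None:
        self._tools: Dict[str, Tool] = {}

    def register(self, tool: Tool) -> None:
        self._tools[tool.name] = tool

    def call(self, name: str, arguments: dict) -> ToolResult:
        tool = self._tools.get(name)
        if tool is None:
            return ToolResult(False, "", f"unknown tool: {name}")
        return tool.execute(**arguments)


registry = ToolRegistry()
registry.register(Tool(
    name="add", description="Add two integers",
    parameters={"type": "object",
                "properties": {"a": {"type": "integer"}, "b": {"type": "integer"}},
                "required": ["a", "b"]},
    func=lambda a, b: str(a + b),
))
ok = registry.call("add", {"a": 2, "b": 3})
missing = registry.call("nope", {})
bad = registry.call("add", {"a": 2})  # missing argument becomes a failed result
```

Returning errors as `ToolResult` values rather than raising lets the LLM see what went wrong and try again.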

Example Tools:

read_file(path) - Read file content
write_file(path, content) - Write to file
execute_shell(command) - Run shell command
search_web(query) - Search the web
run_python(code) - Execute Python code safely

Safety Considerations:

Allowlist permitted commands and reject everything else (no rm -rf /)
Sandbox file operations (restrict to project directory)
Timeout long-running operations
Ask for confirmation on destructive actions
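
The first three points could be sketched roughly as follows. `resolve_in_sandbox`, `run_allowed`, and the contents of `ALLOWED_COMMANDS` are hypothetical names and values chosen for illustration. This is a starting point, not a complete sandbox.

```python
import shlex
import subprocess
from pathlib import Path

ALLOWED_COMMANDS = {"ls", "cat", "echo", "git"}  # example allowlist, adjust to taste


def resolve_in_sandbox(root: Path, user_path: str) -> Path:
    """Resolve user_path under root; raise if it escapes (e.g. via '..')."""
    root = root.resolve()
    target = (root / user_path).resolve()
    if target != root and root not in target.parents:
        raise PermissionError(f"path escapes sandbox: {user_path}")
    return target


def run_allowed(command: str, timeout: float = 10.0) -> str:
    """Run a command without a shell, only if its program name is allowlisted."""
    argv = shlex.split(command)
    if not argv or argv[0] not in ALLOWED_COMMANDS:
        raise PermissionError(f"command not allowed: {command}")
    # subprocess.run raises TimeoutExpired if the command runs too long
    done = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    return done.stdout


try:
    resolve_in_sandbox(Path("sandbox"), "../outside.txt")
    escape_blocked = False
except PermissionError:
    escape_blocked = True

try:
    run_allowed("whoami")  # not in the allowlist, so it never runs
    command_blocked = False
except PermissionError:
    command_blocked = True
```

Note that the allowlist only checks the program name. Arguments are not inspected, so an allowed `cat` could still read files outside the project directory unless its paths are also passed through `resolve_in_sandbox`.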

4. Context Manager (`src/context.py`)

Responsibilities:

Track conversation history (messages)
Estimate token usage (prevent context overflow)
Implement memory strategies (summarization, pruning)
Persist state between sessions (optional)

Memory Strategies:

A. Sliding Window (simplest)

Keep last N messages, drop oldest
Fast but loses old context
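
A sliding window can be sketched with `collections.deque`, which drops the oldest item automatically once `maxlen` is reached. Keeping the system prompt pinned outside the window is an assumption added here, and `SlidingWindow` is a hypothetical class name.

```python
from collections import deque
from typing import Deque, List


class SlidingWindow:
    """Keep the system prompt plus the last `max_messages` other messages."""

    def __init__(self, system_prompt: str, max_messages: int = 20):
        self.system = {"role": "system", "content": system_prompt}
        # deque(maxlen=N) discards the oldest entry when a new one is appended
        self.recent: Deque[dict] = deque(maxlen=max_messages)

    def add(self, role: str, content: str) -> None:
        self.recent.append({"role": role, "content": content})

    def messages(self) -> List[dict]:
        """Messages to send to the LLM, system prompt first."""
        return [self.system, *self.recent]


window = SlidingWindow("You are a helpful agent.", max_messages=2)
for i in range(3):
    window.add("user", f"message {i}")
kept = [m["content"] for m in window.messages()]
# kept == ["You are a helpful agent.", "message 1", "message 2"]
```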

B. Summarization

Periodically summarize old messages into a single message
Preserves important info, reduces tokens
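
A rough sketch of the summarization strategy. The `summarize` argument stands in for a call to the LLM, and `compact` and `keep_recent` are hypothetical names. A stub summarizer is used below so the example runs without a model.

```python
from typing import Callable, List


def compact(messages: List[dict], summarize: Callable[[str], str],
            keep_recent: int = 4) -> List[dict]:
    """Replace all but the last `keep_recent` messages with one summary message."""
    if len(messages) <= keep_recent:
        return list(messages)
    split = len(messages) - keep_recent
    old, recent = messages[:split], messages[split:]
    transcript = "\n".join(f"{m['role']}: {m['content']}" for m in old)
    summary = {"role": "system",
               "content": "Summary of earlier conversation: " + summarize(transcript)}
    return [summary, *recent]


def stub_summarize(text: str) -> str:
    """Placeholder for an LLM call: just reports how many messages were folded."""
    return f"{len(text.splitlines())} earlier messages"


history = [{"role": "user", "content": f"msg {i}"} for i in range(6)]
compacted = compact(history, stub_summarize, keep_recent=2)
```

Here the six messages shrink to three: one summary plus the two most recent. In a real agent the quality of the summary prompt decides how much useful context survives.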

C. Semantic Memory (advanced)

Store embeddings of past conversations
Retrieve relevant memories based on current context
Similar to RAG (Retrieval Augmented Generation)
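
The retrieval idea can be illustrated with a toy example. Word counts stand in for real embeddings here, which is a deliberate simplification: a real system would call an embedding model and likely use a vector store. `embed`, `cosine`, and `MemoryStore` are hypothetical names.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts (assumption, not a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


class MemoryStore:
    def __init__(self) -> None:
        self.items: List[Tuple[str, Counter]] = []

    def add(self, text: str) -> None:
        self.items.append((text, embed(text)))

    def search(self, query: str, k: int = 2) -> List[str]:
        """Return up to k stored texts most similar to the query."""
        q = embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


store = MemoryStore()
store.add("User prefers tabs over spaces in Python files")
store.add("The project uses pytest for testing")
store.add("Deployment target is a Raspberry Pi")
top = store.search("which testing framework does the project use", k=1)
```

The retrieved memories would then be added to the prompt before the current user message, much like retrieved documents in RAG.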

Token Budget Example:

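class ContextManager:
    def __init__(self, max_tokens=4096):
        self.messages = []
        self.max_tokens = max_tokens

    def add_message(self, message):
        self.messages.append(message)
        if self.estimate_tokens() > self.max_tokens * 0.8:
            self._prune()  # Remove or summarize old messages

    def estimate_tokens(self) -> int:
        # Rough estimate: ~4 chars per token
        return sum(len(m.content) for m in self.messages) // 4

    def _prune(self):
        # Simplest strategy: drop the oldest messages until back under 80% of the
        # budget, always keeping the most recent message
        while len(self.messages) > 1 and self.estimate_tokens() > self.max_tokens * 0.8:
            self.messages.pop(0)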

Data Flow

Typical Request Flow

User: "Create a Python script that prints hello world"
  ↓
Agent Controller: Add user message to context
  ↓
LLM Interface: Send context + tools to Ollama
  ↓
Ollama: Returns tool call → write_file("hello.py", "print('Hello, world!')")
  ↓
Tool Executor: Execute write_file
  ↓
Tool Result: Success, file written
  ↓
Agent Controller: Add tool result to context
  ↓
LLM Interface: Send updated context to Ollama
  ↓
Ollama: Returns text response → "I've created hello.py"
  ↓
Agent Controller: Return response to user

Design Decisions

Why Ollama?

Simple HTTP API
Runs locally (privacy, no API costs)
Good model selection (llama, qwen, codellama)
Active development, good docs

Why Python?

Fast prototyping
Rich ecosystem (requests, pytest, etc.)
Easy to read/modify
You already know it from MagicMirror work

Why Tool Calling vs Prompt Engineering?

Tool calling (function calling):

Structured output (JSON)
Easier to parse and execute
Models trained specifically for this

Prompt engineering (text-based tools):

Works with any model
More flexible but error-prone
Requires careful parsing

Recommendation: Start with tool calling if your model supports it (llama3.3, qwen2.5-coder do). Fall back to prompt engineering if needed.
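
For the fallback path, a sketch of text-based tool calling. It assumes the system prompt tells the model to emit a JSON object on a line starting with `TOOL_CALL:`. That marker and the `parse_tool_call` helper are conventions invented here for illustration, not something Ollama defines.

```python
import json
from typing import Optional, Tuple

PREFIX = "TOOL_CALL:"  # hypothetical marker the system prompt asks the model to use


def parse_tool_call(text: str) -> Optional[Tuple[str, dict]]:
    """Return (name, arguments) from the first valid TOOL_CALL line, else None."""
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith(PREFIX):
            continue
        try:
            data = json.loads(line[len(PREFIX):])
        except json.JSONDecodeError:
            continue  # malformed JSON: treat the line as plain text
        if isinstance(data, dict) and isinstance(data.get("name"), str):
            args = data.get("arguments", {})
            if isinstance(args, dict):
                return data["name"], args
    return None


reply = ('I will create the file.\n'
         'TOOL_CALL: {"name": "write_file", '
         '"arguments": {"path": "hello.py", "content": "print(1)"}}')
parsed = parse_tool_call(reply)
plain = parse_tool_call("Just a normal answer.")
broken = parse_tool_call("TOOL_CALL: {not json}")
```

The extra validation is the "careful parsing" cost mentioned above: models without native tool support often produce malformed or partial JSON, so every failure mode needs a defined fallback.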

Extension Points

Future enhancements:

Multi-agent orchestration - Spawn sub-agents for specific tasks
Plugin system - Load tools dynamically
Web UI - Simple chat interface (Flask/FastAPI)
OpenClaw integration - Connect as ACP harness
Voice interface - Add TTS/STT

Comparison to OpenClaw

| Feature | Your Agent | OpenClaw |
| --- | --- | --- |
| Scope | Learning project, custom tools | Production-ready, multi-channel |
| Backend | Ollama (local) | Anthropic, OpenAI, Google, etc. |
| Complexity | Simple, hackable | Enterprise features (cron, memory, nodes) |
| Control | Full source access | Plugin/config based |

Key insight: Building your own teaches you what OpenClaw does under the hood. You'll use OpenClaw better by understanding agent internals.

Next Steps

Review this architecture on the flight
Sketch out pseudocode for the main loop
Think about which tools you want first
Consider edge cases (errors, timeouts, infinite loops)

Read roadmap.md for the phased implementation plan.
