Roadmap - Phased Development Plan

Overview

Build incrementally. Each phase adds capability without breaking previous work.

Timeline: ~2-3 weeks (assuming part-time work)


Phase 1: Foundation (Days 1-3)

Goal: Get Ollama running and build a basic inference loop

Tasks

  1. Install Ollama on Alienware

    curl -fsSL https://ollama.com/install.sh | sh
    ollama serve  # Start the server (skip if the installer already started it as a service)
  2. Pull and test models

    ollama pull llama3.3        # General purpose, 70B
    ollama pull qwen2.5-coder:7b  # Code-specific, smaller
    ollama pull codellama:7b    # Alternative coder
  3. Manual API testing

    # Test basic inference
    curl http://localhost:11434/api/chat -d '{
      "model": "llama3.3",
      "messages": [{"role": "user", "content": "Say hello"}],
      "stream": false
    }'
  4. Create minimal Python client

    • src/llm.py - Simple Ollama HTTP wrapper
    • src/agent.py - Single turn: user input → LLM → output
    • No tools yet, just text responses
  5. Test basic conversation

    python src/agent.py "What is Python?"
    python src/agent.py "Write a hello world function"
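
The wrapper in task 4 could start out as small as the sketch below. It is a minimal example, not a finished src/llm.py: it uses only the standard library, targets the same /api/chat endpoint as the curl test, and the names `build_chat_payload` and `chat`, the default model, and the timeout value are all assumptions made for illustration.

```python
import json
import urllib.request

# Same endpoint as the manual curl test; adjust if Ollama listens elsewhere.
OLLAMA_URL = "http://localhost:11434/api/chat"


def build_chat_payload(model: str, prompt: str) -> dict:
    """Build a non-streaming, single-turn request body for /api/chat."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }


def chat(prompt: str, model: str = "llama3.3", url: str = OLLAMA_URL,
         timeout: float = 300.0) -> str:
    """Send one prompt to a running Ollama server and return the reply text."""
    body = json.dumps(build_chat_payload(model, prompt)).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        reply = json.load(response)
    # With "stream": false the server answers with one JSON object whose
    # assistant text sits under message.content.
    return reply["message"]["content"]
```

With the server running, `chat("What is Python?")` should return the model's answer as a string, so a single-turn src/agent.py only needs to read the prompt from the command line and print that result.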

Success Criteria

  • ✅ Ollama running, models downloaded
  • ✅ Can send prompts via API
  • ✅ Agent script returns LLM responses
  • ✅ Code runs without errors

Phase 2: Tool System (Days 4-7)

Goal: Add tool calling so the LLM can request functions that the agent executes

Tasks

  1. Design tool interface

    • src/tools/base.py - Tool base class
    • Define tool schema (name, description, parameters)
  2. Implement first tools

    • src/tools/filesystem.py:
      • read_file(path) - Read file content
      • write_file(path, content) - Write file
      • list_directory(path) - List files
  3. Add tool registry

    • src/tools/registry.py - Register and lookup tools
    • Convert tools to Ollama function format
  4. Implement tool calling loop

    • Modify src/agent.py:
      • Send tools to LLM
      • Parse tool call responses
      • Execute requested tool
      • Send result back to LLM
      • Repeat until the LLM returns a plain text response (no tool calls)
  5. Test tool execution

    python src/agent.py "Read the file test.txt"
    python src/agent.py "Create a file called hello.py with a hello world function"
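
Tasks 1 to 3 could be prototyped in one file before being split across src/tools/. The sketch below is one possible shape, not a prescribed design: the `Tool` dataclass, `ToolRegistry`, and the `_schema` helper are hypothetical names, and the dictionary produced by `to_ollama` follows the function-calling layout (`type: function` plus a JSON Schema for parameters) that Ollama's chat API accepts in its `tools` field.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Any, Callable


@dataclass
class Tool:
    name: str
    description: str
    parameters: dict[str, Any]  # JSON Schema describing the arguments
    func: Callable[..., str]

    def to_ollama(self) -> dict[str, Any]:
        """Render the tool in the function-calling layout sent to the model."""
        return {
            "type": "function",
            "function": {
                "name": self.name,
                "description": self.description,
                "parameters": self.parameters,
            },
        }


class ToolRegistry:
    def __init__(self) -> None:
        self._tools: dict[str, Tool] = {}

    def register(self, tool: Tool) -> None:
        self._tools[tool.name] = tool

    def schemas(self) -> list[dict[str, Any]]:
        return [tool.to_ollama() for tool in self._tools.values()]

    def execute(self, name: str, arguments: dict[str, Any]) -> str:
        if name not in self._tools:
            return f"Error: unknown tool {name!r}"
        return self._tools[name].func(**arguments)


def read_file(path: str) -> str:
    return Path(path).read_text(encoding="utf-8")


def write_file(path: str, content: str) -> str:
    Path(path).write_text(content, encoding="utf-8")
    return f"Wrote {len(content)} characters to {path}"


def list_directory(path: str) -> str:
    return "\n".join(sorted(entry.name for entry in Path(path).iterdir()))


def _schema(**props: str) -> dict[str, Any]:
    """Build a JSON Schema where every listed property is a required string."""
    return {
        "type": "object",
        "properties": {k: {"type": "string", "description": d} for k, d in props.items()},
        "required": list(props),
    }


registry = ToolRegistry()
registry.register(Tool("read_file", "Read file content", _schema(path="File to read"), read_file))
registry.register(Tool("write_file", "Write file",
                       _schema(path="File to write", content="Text to store"), write_file))
registry.register(Tool("list_directory", "List files", _schema(path="Directory to list"), list_directory))
```

Returning an error string for an unknown tool, rather than raising, is a deliberate choice here: the message can be fed back to the model so it has a chance to correct itself.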

Success Criteria

  • ✅ LLM can call tools
  • ✅ Tools execute correctly
  • ✅ Results feed back to LLM
  • ✅ Agent completes multi-step tasks
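
The tool calling loop behind these criteria can be sketched independently of any server. In the example below `chat_fn` is a hypothetical callable that takes the message list and returns the assistant message as a dictionary (for Ollama, the `message` object of the /api/chat reply); `run_tool_loop`, `max_steps`, and the plain `tools` mapping are likewise assumptions for illustration.

```python
import json
from typing import Any, Callable

Message = dict[str, Any]


def run_tool_loop(
    chat_fn: Callable[[list[Message]], Message],
    tools: dict[str, Callable[..., Any]],
    messages: list[Message],
    max_steps: int = 8,
) -> str:
    """Call the model, run any tools it requests, and stop at a text reply."""
    for _ in range(max_steps):
        reply = chat_fn(messages)
        messages.append(reply)
        tool_calls = reply.get("tool_calls") or []
        if not tool_calls:
            # No tool requested: this is the final answer.
            return reply.get("content", "")
        for call in tool_calls:
            name = call["function"]["name"]
            arguments = call["function"].get("arguments") or {}
            if isinstance(arguments, str):
                # Some backends send arguments as JSON text instead of an object.
                arguments = json.loads(arguments)
            try:
                result = tools[name](**arguments)
            except Exception as exc:
                # Report the failure to the model instead of crashing the agent.
                result = f"Error running {name}: {exc}"
            messages.append({"role": "tool", "content": str(result)})
    raise RuntimeError(f"No final answer after {max_steps} tool rounds")
```

Injecting `chat_fn` keeps the loop testable with a scripted fake model, and the `max_steps` cap prevents a confused model from calling tools forever.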

Phase 3: Context Management (Days 8-10)

Goal: Handle conversation history and token limits

Tasks

  1. Implement context manager

    • src/context.py:
      • Store message history
      • Estimate token usage
      • Prune old messages when needed
  2. Add conversation persistence

    • Save conversations to ~/.agent/sessions/
    • Load previous context (optional)
  3. Implement token management strategies

    • Sliding window (keep last N messages)
    • Summarization (compress old context)
  4. Multi-turn conversations

    • Modify agent to maintain context across multiple user inputs
    • Add /clear command to reset context
  5. Test context handling

    python src/agent.py
    > Create a file called test.py
    > Now add a function to it
    > What's in the file?
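
A first version of src/context.py might look like the sketch below. It covers the sliding-window strategy only, and the token count is a rough heuristic (about four characters per token), which is an assumption and not a real tokenizer. The class name `ContextManager`, the `max_tokens` default, and the method names are hypothetical.

```python
from dataclasses import dataclass, field


def estimate_tokens(text: str) -> int:
    """Rough estimate: about four characters per token (heuristic, not a tokenizer)."""
    return max(1, len(text) // 4)


@dataclass
class ContextManager:
    max_tokens: int = 4096  # assumed budget; set this from the model's context size
    messages: list[dict[str, str]] = field(default_factory=list)

    def total_tokens(self) -> int:
        return sum(estimate_tokens(m["content"]) for m in self.messages)

    def add(self, role: str, content: str) -> None:
        self.messages.append({"role": role, "content": content})
        self.prune()

    def prune(self) -> None:
        """Drop the oldest non-system messages until the estimate fits the budget."""
        while self.total_tokens() > self.max_tokens:
            for index, message in enumerate(self.messages):
                if message["role"] != "system":
                    if index == len(self.messages) - 1:
                        return  # keep the newest message even if it is over budget
                    del self.messages[index]
                    break
            else:
                return  # only system messages remain

    def clear(self) -> None:
        """Reset history but keep system prompts (what /clear could call)."""
        self.messages = [m for m in self.messages if m["role"] == "system"]
```

System messages are never pruned in this sketch, so the agent's instructions survive long conversations; summarization could later replace the plain `del` with a step that compresses the dropped messages.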

Success Criteria

  • ✅ Agent remembers previous messages
  • ✅ Context stays within the model's token limit
  • ✅ Can handle long conversations

Phase 4: Safety & Error Handling (Days 11-13)

Goal: Make it robust and safe to use

Tasks

  1. Add safety checks

    • Whitelist allowed shell commands
    • Restrict file operations to safe directories
    • Timeout long-running operations
    • Ask confirmation for destructive actions
  2. Implement error handling

    • Graceful failure when tools error
    • Retry logic for transient failures
    • Clear error messages for the user
  3. Add more tools

    • src/tools/shell.py:
      • execute_shell(command) - Run allowlisted shell commands with a timeout
    • src/tools/web.py:
      • search_web(query) - Web search
    • src/tools/python.py:
      • run_python(code) - Execute Python code in a sandbox
  4. Testing suite

    • tests/test_tools.py - Unit tests for each tool
    • tests/test_agent.py - Integration tests
    • tests/test_context.py - Context management tests
  5. Documentation

    • Update architecture.md with safety notes
    • Add usage examples to README.md
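
Two of the safety checks from task 1, directory restriction and a command allowlist with a timeout, could be sketched as follows. The function names `is_path_allowed` and `run_allowlisted` and the 10-second default are assumptions for illustration; `Path.is_relative_to` requires Python 3.9 or newer.

```python
import shlex
import subprocess
from pathlib import Path


def is_path_allowed(path: str, root: str) -> bool:
    """Return True only if path resolves to a location inside root."""
    # resolve() collapses ".." segments and follows symlinks, so attempts
    # to escape the root with relative paths are caught here.
    return Path(path).resolve().is_relative_to(Path(root).resolve())


def run_allowlisted(command: str, allowed: set[str], timeout: float = 10.0) -> str:
    """Run a command only if its executable is in the allowlist."""
    argv = shlex.split(command)
    if not argv or argv[0] not in allowed:
        raise PermissionError(f"Command not allowed: {command!r}")
    # No shell=True: the arguments go straight to the program, so pipes,
    # redirects, and ";" chaining in the input are not interpreted.
    # subprocess.TimeoutExpired is raised if the command exceeds the timeout.
    completed = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    return completed.stdout if completed.returncode == 0 else completed.stderr
```

Matching the executable name exactly keeps the check simple, but it only limits which programs run; an allowlisted command can still be destructive with the wrong arguments, which is where the confirmation prompt comes in.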

Success Criteria

  • ✅ Agent handles errors gracefully
  • ✅ Dangerous operations require confirmation
  • ✅ All tools have tests
  • ✅ Project is well-documented
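
For the retry logic listed under error handling, one option is a small helper with exponential backoff. This is a sketch under stated assumptions: the name `retry`, the default of three attempts, the 0.5-second base delay, and the choice of `ConnectionError` and `TimeoutError` as "transient" failures are all illustrative and should be tuned to the errors the agent actually sees.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry(
    fn: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.5,
    retry_on: tuple[type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying transient failures with exponential backoff."""
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the original error
            # Wait base_delay, then 2x, 4x, ... before the next try.
            sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")
```

Errors outside `retry_on` propagate immediately, so a bug in a tool is not hidden behind repeated attempts. The `sleep` parameter exists so tests can pass a stub instead of actually waiting.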

Phase 5: Polish & Features (Days 14+)

Goal: Make it pleasant to use and extend

Tasks

  1. Interactive mode

    • REPL-style interface
    • Command history
    • Tab completion (optional)
  2. Configuration system

    • config.yaml - Model selection, safety settings, tool config
    • Command-line flags (--model, --debug, --tools)
  3. Logging & debugging

    • Log all LLM requests/responses
    • Debug mode showing tool execution
    • Performance metrics (tokens used, time per turn)
  4. Advanced features (pick what interests you)

    • Streaming responses (real-time output)
    • Multi-agent orchestration (spawn sub-agents)
    • Plugin system (load tools from external modules)
    • Web UI (Flask-based chat interface)
    • Voice interface (Whisper STT + TTS)
  5. Integration experiments

    • Connect to OpenClaw as ACP harness
    • Integrate with MagicMirror (control modules via agent)
    • Home Assistant integration
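
The command-line flags from task 2 map directly onto the standard library's argparse. The sketch below is illustrative only: the defaults, the help texts, the optional positional `prompt`, and the comma-separated format for `--tools` are assumptions, and loading config.yaml is left out because it would need a third-party YAML parser.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="agent", description="Local LLM agent")
    # Optional positional: present for one-shot use, absent for interactive mode.
    parser.add_argument("prompt", nargs="?", help="One-shot prompt; omit for the REPL")
    parser.add_argument("--model", default="llama3.3", help="Ollama model name")
    parser.add_argument("--debug", action="store_true", help="Show tool execution details")
    parser.add_argument("--tools", default="", help="Comma-separated tool names to enable")
    return parser


def parse_tools(value: str) -> list[str]:
    """Split the --tools value into clean tool names, ignoring blanks."""
    return [name.strip() for name in value.split(",") if name.strip()]
```

For example, `build_parser().parse_args(["--model", "qwen2.5-coder:7b", "--debug"])` yields a namespace with `model` set to the smaller coder model and `debug` set to `True`.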

Success Criteria

  • ✅ Agent is reliable enough for daily personal use
  • ✅ Easy to add new tools
  • ✅ Well-tested and documented
  • ✅ You understand every line of code

Milestone Checklist

Track progress as you build:

Phase 1: Foundation

  • Ollama installed and running
  • Models downloaded and tested
  • Basic Python client works
  • Single-turn conversations work

Phase 2: Tools

  • Tool interface designed
  • File operations implemented
  • Tool calling loop works
  • Multi-step tasks complete

Phase 3: Context

  • Context manager implemented
  • Token estimation works
  • Multi-turn conversations work
  • History persists (optional)

Phase 4: Safety

  • Safety checks in place
  • Error handling robust
  • Additional tools added
  • Test suite passing

Phase 5: Polish

  • Interactive mode works
  • Configuration system
  • Logging and debugging
  • Advanced features (pick what interests you)

Decisions to Make During Flight

Think about these while reviewing docs:

  1. Which model to use as primary?

    • llama3.3 (70B) - Smarter but slower
    • qwen2.5-coder:7b - Faster, good for code
    • Test both, see what works
  2. Tool calling vs prompt engineering?

    • Start with tool calling (cleaner)
    • Fall back to prompts if the model doesn't support it
  3. Which tools to prioritize?

    • File ops (read/write) - Essential
    • Shell execution - Powerful but risky
    • Web search - Useful for research tasks
    • Code execution - Great for testing snippets
  4. Safety vs convenience?

    • Always ask before destructive operations?
    • Or trust the LLM with whitelisted commands?
    • Your call based on risk tolerance
  5. How to test?

    • Unit tests for each component?
    • Integration tests for full workflows?
    • Manual testing only?
    • Combination (recommended)

Resources to Review on Flight

See reading-list.md for detailed resources.

Quick picks:

  • Anthropic's tool use guide (concepts apply to any LLM)
  • Ollama API docs
  • OpenClaw architecture (see what you're replicating)
  • Your own MagicMirror code (patterns to reuse)

After the Flight

When you land Thursday:

  1. Install Ollama on Alienware (10 minutes)
  2. Run through Phase 1 tasks (2-3 hours)
  3. Report back - share what worked, what didn't
  4. Adjust plan based on what you learned

This isn't just a project - it's your bootcamp for understanding AI agents. Every line you write teaches you something you can apply to future work.

Let's build something cool. 🚀