Roadmap - Phased Development Plan

Overview

Build incrementally. Each phase adds capability without breaking previous work.

Timeline: ~2-3 weeks (assuming part-time work)

Phase 1: Foundation (Days 1-3)

Goal: Get Ollama running and build a basic inference loop

Tasks

Install Ollama on Alienware

curl -fsSL https://ollama.com/install.sh | sh
ollama serve  # Start the server (skip if the installer already runs it as a service)

Pull and test models

ollama pull llama3.3        # General purpose, 70B (~43 GB download, needs plenty of RAM/VRAM)
ollama pull qwen2.5-coder:7b  # Code-specific, smaller
ollama pull codellama:7b    # Alternative coder (may not support native tool calling)

Manual API testing

# Test basic inference
curl http://localhost:11434/api/chat -d '{
  "model": "llama3.3",
  "messages": [{"role": "user", "content": "Say hello"}],
  "stream": false
}'

Create minimal Python client
- src/llm.py - Simple Ollama HTTP wrapper
- src/agent.py - Single turn: user input → LLM → output
- No tools yet, just text responses
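A minimal sketch of the Phase 1 client, assuming only the standard library (urllib) so there are no dependencies yet. It posts to the same /api/chat endpoint as the curl test above with "stream": false and reads the reply from the "message" field. The names `build_chat_payload`, `chat`, and `main` are hypothetical, qwen2.5-coder:7b is an arbitrary default, and both files are folded into one listing for brevity.

```python
# Sketch of src/llm.py + a single-turn src/agent.py (hypothetical names, stdlib only).
import json
import sys
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/chat"  # Ollama's default local endpoint
DEFAULT_MODEL = "qwen2.5-coder:7b"  # assumption: pick whichever model you pulled


def build_chat_payload(model, messages, tools=None):
    """Build the JSON body for a non-streaming /api/chat request."""
    payload = {"model": model, "messages": messages, "stream": False}
    if tools:
        payload["tools"] = tools  # used from Phase 2 onward
    return payload


def chat(messages, model=DEFAULT_MODEL, tools=None, url=OLLAMA_URL, timeout=300):
    """POST the conversation to Ollama and return the assistant message dict."""
    body = json.dumps(build_chat_payload(model, messages, tools)).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        data = json.loads(response.read().decode("utf-8"))
    return data["message"]  # e.g. {"role": "assistant", "content": "..."}


def main(argv):
    """Single turn: user input -> LLM -> printed output."""
    if len(argv) < 2:
        print('usage: python src/agent.py "your prompt"')
        return 1
    reply = chat([{"role": "user", "content": " ".join(argv[1:])}])
    print(reply["content"])
    return 0


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

Splitting it into two files is just a matter of moving `build_chat_payload` and `chat` into src/llm.py and importing `chat` in src/agent.py.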

Test basic conversation

python src/agent.py "What is Python?"
python src/agent.py "Write a hello world function"

Success Criteria

✅ Ollama running, models downloaded
✅ Can send prompts via API
✅ Agent script returns LLM responses
✅ Code runs without errors

Phase 2: Tool System (Days 4-7)

Goal: Add tool calling - let the LLM request function calls that the agent executes

Tasks

Design tool interface
- src/tools/base.py - Tool base class
- Define tool schema (name, description, parameters)
Implement first tools
- src/tools/filesystem.py:
  - read_file(path) - Read file content
  - write_file(path, content) - Write file
  - list_directory(path) - List files
Add tool registry
- src/tools/registry.py - Register and lookup tools
- Convert tools to Ollama function format
Implement tool calling loop
- Modify src/agent.py:
  - Send tools to LLM
  - Parse tool call responses
  - Execute requested tool
  - Send result back to LLM
  - Repeat until the LLM returns a plain text response
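One possible shape for the tool base class, the three filesystem tools, and the registry, sketched under assumptions: the class and method names (`Tool`, `to_ollama`, `ToolRegistry`, `schemas`, `execute`) are hypothetical, and the tools have no path restrictions yet (that comes in Phase 4). The conversion produces the `{"type": "function", "function": {name, description, parameters}}` objects that Ollama's /api/chat accepts in its `tools` field, with JSON Schema for the parameters.

```python
# Sketch of src/tools/base.py, src/tools/filesystem.py and src/tools/registry.py.
import os
from abc import ABC, abstractmethod


class Tool(ABC):
    """Base class: subclasses set name/description/parameters and implement run()."""

    name = ""
    description = ""
    parameters = {"type": "object", "properties": {}, "required": []}

    @abstractmethod
    def run(self, **kwargs):
        """Execute the tool and return a string result for the LLM."""

    def to_ollama(self):
        """Convert to the function-tool format used in an Ollama chat request."""
        return {
            "type": "function",
            "function": {
                "name": self.name,
                "description": self.description,
                "parameters": self.parameters,
            },
        }


def _path_param(description):
    return {"type": "string", "description": description}


class ReadFile(Tool):
    name = "read_file"
    description = "Read a text file and return its content."
    parameters = {"type": "object", "properties": {"path": _path_param("File path")}, "required": ["path"]}

    def run(self, path):
        with open(path, encoding="utf-8") as f:
            return f.read()


class WriteFile(Tool):
    name = "write_file"
    description = "Write text to a file, replacing it if it exists."
    parameters = {
        "type": "object",
        "properties": {"path": _path_param("File path"), "content": {"type": "string", "description": "Text to write"}},
        "required": ["path", "content"],
    }

    def run(self, path, content):
        with open(path, "w", encoding="utf-8") as f:
            f.write(content)
        return f"Wrote {len(content)} characters to {path}"


class ListDirectory(Tool):
    name = "list_directory"
    description = "List the entries in a directory."
    parameters = {"type": "object", "properties": {"path": _path_param("Directory path")}, "required": ["path"]}

    def run(self, path="."):
        return "\n".join(sorted(os.listdir(path)))


class ToolRegistry:
    """Register tools by name and dispatch the LLM's tool calls to them."""

    def __init__(self):
        self._tools = {}

    def register(self, tool):
        self._tools[tool.name] = tool

    def get(self, name):
        return self._tools.get(name)

    def schemas(self):
        """All tools in Ollama format, ready for the request's "tools" field."""
        return [tool.to_ollama() for tool in self._tools.values()]

    def execute(self, name, arguments):
        tool = self.get(name)
        if tool is None:
            return f"Unknown tool: {name}"
        return tool.run(**arguments)
```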

Test tool execution

python src/agent.py "Read the file test.txt"
python src/agent.py "Create a file called hello.py with a hello world function"
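The tool calling loop from the task list can be sketched as a function that takes the LLM call and the tool dispatcher as parameters, so it can be tested without a running server. It relies on Ollama returning requested calls under `message["tool_calls"]`, each with `function.name` and `function.arguments` (a JSON object), and sends each result back as a `"role": "tool"` message. The function name and the `max_steps` cap are assumptions, not part of any API.

```python
# Sketch of the tool calling loop for src/agent.py (hypothetical function name).
import json


def run_agent_loop(messages, tools, chat_fn, execute_fn, max_steps=10):
    """Call the LLM, run any requested tools, feed results back, repeat.

    chat_fn(messages, tools) -> assistant message dict (e.g. a thin Ollama wrapper)
    execute_fn(name, arguments) -> string result of running the tool
    """
    for _ in range(max_steps):
        reply = chat_fn(messages, tools)
        messages.append(reply)  # keep the assistant turn, including its tool calls
        tool_calls = reply.get("tool_calls") or []
        if not tool_calls:
            return reply.get("content", "")  # plain text means the task is done
        for call in tool_calls:
            name = call["function"]["name"]
            args = call["function"].get("arguments") or {}
            if isinstance(args, str):  # defensive: decode if arguments arrive as a JSON string
                args = json.loads(args)
            try:
                result = execute_fn(name, args)
            except Exception as exc:  # report tool failures to the LLM instead of crashing
                result = f"Error running {name}: {exc}"
            messages.append({"role": "tool", "content": str(result)})
    return "Stopped: too many tool-calling steps."
```

The `max_steps` cap is a simple guard against a model that keeps calling tools forever; ten is an arbitrary starting value.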

Success Criteria

✅ LLM can call tools
✅ Tools execute correctly
✅ Results feed back to LLM
✅ Agent completes multi-step tasks

Phase 3: Context Management (Days 8-10)

Goal: Handle conversation history and token limits

Tasks

Implement context manager
- src/context.py:
  - Store message history
  - Estimate token usage
  - Prune old messages when needed
Add conversation persistence
- Save conversations to ~/.agent/sessions/
- Load previous context (optional)
Implement token management strategies
- Sliding window (keep last N messages)
- Summarization (compress old context)
Multi-turn conversations
- Modify agent to maintain context across multiple user inputs
- Add /clear command to reset context
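A minimal sketch of src/context.py covering history, token estimation, sliding-window pruning, /clear, and JSON persistence. The 4-characters-per-token estimate is a rough heuristic, not the model's real tokenizer, and the 8000-token budget is a placeholder to set below the context length you actually run the model with. Summarization is not shown, and all names are hypothetical.

```python
# Sketch of src/context.py (hypothetical names; token counts are estimates only).
import json
import os


def estimate_tokens(text):
    """Rough heuristic: about 4 characters per token. Not exact for any tokenizer."""
    return max(1, len(text) // 4)


class ContextManager:
    def __init__(self, system_prompt, max_tokens=8000):
        self.system = {"role": "system", "content": system_prompt}
        self.history = []
        self.max_tokens = max_tokens  # placeholder budget

    def add(self, message):
        self.history.append(message)
        self.prune()

    def total_tokens(self):
        return sum(estimate_tokens(m.get("content") or "") for m in self.messages())

    def prune(self):
        """Sliding window: drop the oldest messages until under budget.
        The system prompt and the newest message are always kept."""
        while self.total_tokens() > self.max_tokens and len(self.history) > 1:
            self.history.pop(0)
            # Don't leave a tool result at the front without the call that produced it.
            while len(self.history) > 1 and self.history[0].get("role") == "tool":
                self.history.pop(0)

    def messages(self):
        """What to send to the LLM on the next turn."""
        return [self.system] + self.history

    def clear(self):
        """Backs the /clear command."""
        self.history = []

    def save(self, path):
        os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
        with open(path, "w", encoding="utf-8") as f:
            json.dump(self.history, f, indent=2)

    def load(self, path):
        with open(path, encoding="utf-8") as f:
            self.history = json.load(f)
```

For the sessions directory, something like `cm.save(os.path.expanduser("~/.agent/sessions/session.json"))` would work; the file name is just an example.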

Test context handling

python src/agent.py
> Create a file called test.py
> Now add a function to it
> What's in the file?
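The session above assumes a multi-turn loop that keeps history between inputs and handles /clear. A minimal sketch, assuming a plain list for history and an `ask_llm(history)` callable that returns reply text; `/exit` is a hypothetical extra command, and input/output are injectable so the loop can be tested without a terminal.

```python
# Sketch of a multi-turn REPL for src/agent.py (hypothetical names).
def repl(ask_llm, read_line=input, write=print):
    """Read user lines until EOF or /exit, keeping history across turns."""
    history = []
    while True:
        try:
            line = read_line("> ").strip()
        except EOFError:  # Ctrl-D ends the session
            break
        if not line:
            continue
        if line == "/exit":
            break
        if line == "/clear":
            history.clear()  # reset context, as described in the task list
            write("(context cleared)")
            continue
        history.append({"role": "user", "content": line})
        reply = ask_llm(history)
        history.append({"role": "assistant", "content": reply})
        write(reply)
    return history
```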

Success Criteria

✅ Agent remembers previous messages
✅ Context doesn't overflow token limit
✅ Can handle long conversations

Phase 4: Safety & Error Handling (Days 11-13)

Goal: Make it robust and safe to use

Tasks

Add safety checks
- Whitelist allowed shell commands
- Restrict file operations to safe directories
- Timeout long-running operations
- Ask confirmation for destructive actions
Implement error handling
- Graceful failure when tools error
- Retry logic for transient failures
- Clear error messages to user
Add more tools
- src/tools/shell.py:
  - execute_shell(command) - Run whitelisted shell commands only
- src/tools/web.py:
  - search_web(query) - Web search
- src/tools/python.py:
  - run_python(code) - Execute Python code in sandbox
Testing suite
- tests/test_tools.py - Unit tests for each tool
- tests/test_agent.py - Integration tests
- tests/test_context.py - Context management tests
Documentation
- Update architecture.md with safety notes
- Add usage examples to README.md
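A sketch of several Phase 4 safety and error-handling pieces: a command whitelist, confirmation for destructive programs, a timeout, a workspace path check for the file tools, and retry with exponential backoff. The whitelist contents, the 30-second timeout, and every function name here are assumptions to tune to your own risk tolerance.

```python
# Sketch of safety checks for src/tools/shell.py and the file tools (hypothetical names).
import shlex
import subprocess
import time
from pathlib import Path

# Example policy values; adjust to your own risk tolerance.
ALLOWED_COMMANDS = {"ls", "cat", "pwd", "echo", "head", "wc", "rm"}
NEEDS_CONFIRMATION = {"rm"}  # destructive programs: ask the user first


def ask_user(command):
    """Default confirmation prompt on the terminal."""
    return input(f"Allow '{command}'? [y/N] ").strip().lower() == "y"


def is_safe_path(path, root):
    """True if `path` (relative to `root`) resolves inside `root`.
    Blocks '../' escapes and absolute paths elsewhere."""
    root = Path(root).resolve()
    target = (root / path).resolve()
    return target == root or root in target.parents


def execute_shell(command, confirm=ask_user, timeout=30, cwd=None):
    """Run a whitelisted command without a shell; return text for the LLM."""
    try:
        argv = shlex.split(command)
    except ValueError as exc:
        return f"Rejected: could not parse command ({exc})"
    if not argv:
        return "Rejected: empty command"
    program = argv[0]
    if program not in ALLOWED_COMMANDS:
        return f"Rejected: '{program}' is not whitelisted"
    if program in NEEDS_CONFIRMATION and not confirm(command):
        return "Cancelled by user"
    try:
        # No shell=True, so ';', '|', '&&' and redirects are passed as plain arguments.
        result = subprocess.run(argv, capture_output=True, text=True, timeout=timeout, cwd=cwd)
    except subprocess.TimeoutExpired:
        return f"Error: timed out after {timeout}s"
    except OSError as exc:
        return f"Error: {exc}"
    return f"exit code {result.returncode}\n{result.stdout}{result.stderr}"


def with_retries(fn, attempts=3, base_delay=1.0, retry_on=(OSError,)):
    """Call fn(); on a transient error wait base_delay * 2**i seconds and retry.
    OSError covers connection errors, socket timeouts and urllib's URLError."""
    for i in range(attempts):
        try:
            return fn()
        except retry_on:
            if i == attempts - 1:
                raise  # out of attempts: let the caller show a clear error
            time.sleep(base_delay * 2 ** i)
```

Whitelisting by program name is coarse: `cat /etc/passwd` still reads outside the workspace, so arguments that look like paths would also need an `is_safe_path` check. Wrapping the LLM request as `with_retries(lambda: chat(messages))` is one way to apply the retry helper.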

Success Criteria

✅ Agent handles errors gracefully
✅ Dangerous operations require confirmation
✅ All tools have tests
✅ Project is well-documented

Phase 5: Polish & Features (Days 14+)

Goal: Make it pleasant to use and extend

Tasks

Interactive mode
- REPL-style interface
- Command history
- Tab completion (optional)
Configuration system
- config.yaml - Model selection, safety settings, tool config
- Command-line flags (--model, --debug, --tools)
Logging & debugging
- Log all LLM requests/responses
- Debug mode showing tool execution
- Performance metrics (tokens used, time per turn)
Advanced features (pick what interests you)
- Streaming responses (real-time output)
- Multi-agent orchestration (spawn sub-agents)
- Plugin system (load tools from external modules)
- Web UI (Flask-based chat interface)
- Voice interface (Whisper STT + TTS)
Integration experiments
- Connect to OpenClaw as ACP harness
- Integrate with MagicMirror (control modules via agent)
- Home Assistant integration
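For the configuration system, one common layering is defaults, then config.yaml, then command-line flags, with later sources winning. A minimal sketch, assuming the YAML file has already been parsed into a dict (for example with PyYAML's `yaml.safe_load`, a third-party package) so the code itself needs only the standard library. The default values and `load_config` are hypothetical.

```python
# Sketch of the config layer: defaults < config.yaml < command-line flags.
import argparse

DEFAULTS = {  # placeholder values
    "model": "qwen2.5-coder:7b",
    "debug": False,
    "tools": ["read_file", "write_file", "list_directory"],
}


def parse_args(argv):
    parser = argparse.ArgumentParser(description="Local Ollama agent")
    parser.add_argument("--model", help="Ollama model tag, e.g. llama3.3")
    # default=None lets us tell "flag absent" apart from "flag given".
    parser.add_argument("--debug", action="store_true", default=None, help="Show tool execution")
    parser.add_argument("--tools", help="Comma-separated tool names to enable")
    return parser.parse_args(argv)


def load_config(argv, file_config=None):
    """Merge defaults, the parsed config file (a dict), and CLI flags."""
    config = dict(DEFAULTS)
    config.update(file_config or {})
    args = parse_args(argv)
    if args.model is not None:
        config["model"] = args.model
    if args.debug is not None:
        config["debug"] = args.debug
    if args.tools is not None:
        config["tools"] = [t.strip() for t in args.tools.split(",") if t.strip()]
    return config
```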

Success Criteria

✅ Agent is production-ready for personal use
✅ Easy to add new tools
✅ Well-tested and documented
✅ You understand every line of code

Milestone Checklist

Track progress as you build:

Phase 1: Foundation

Ollama installed and running
Models downloaded and tested
Basic Python client works
Single-turn conversations work

Phase 2: Tools

Tool interface designed
File operations implemented
Tool calling loop works
Multi-step tasks complete

Phase 3: Context

Context manager implemented
Token estimation works
Multi-turn conversations work
History persists (optional)

Phase 4: Safety

Safety checks in place
Error handling robust
Additional tools added
Test suite passing

Phase 5: Polish

Interactive mode works
Configuration system
Logging and debugging
Advanced features (choose N)

Decisions to Make During Flight

Think about these while reviewing docs:

Which model to use as primary?
- llama3.3 (70B) - Smarter but slower
- qwen2.5-coder:7b - Faster, good for code
- Test both, see what works
Tool calling vs prompt engineering?
- Start with tool calling (cleaner)
- Fall back to prompts if model doesn't support it
Which tools to prioritize?
- File ops (read/write) - Essential
- Shell execution - Powerful but risky
- Web search - Useful for research tasks
- Code execution - Great for testing snippets
Safety vs convenience?
- Always ask before destructive operations?
- Or trust the LLM with whitelisted commands?
- Your call based on risk tolerance
How to test?
- Unit tests for each component?
- Integration tests for full workflows?
- Manual testing only?
- Combination (recommended)
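If a model lacks native tool calling, the prompt-engineering fallback usually means asking it to reply with a JSON object and parsing that out of the text. A minimal sketch, assuming a made-up `{"tool": ..., "arguments": {...}}` reply format described in the system prompt; `FALLBACK_INSTRUCTIONS` and `parse_tool_call` are hypothetical, and the regex simply takes everything from the first `{` to the last `}`.

```python
# Sketch of a prompt-based tool call fallback (hypothetical format and names).
import json
import re

FALLBACK_INSTRUCTIONS = (
    "To use a tool, reply with ONLY a JSON object like "
    '{"tool": "read_file", "arguments": {"path": "test.txt"}}. '
    "Otherwise reply normally."
)


def parse_tool_call(text):
    """Return (name, arguments) if the reply contains a JSON tool call, else None."""
    match = re.search(r"\{.*\}", text, re.DOTALL)  # first '{' through last '}'
    if not match:
        return None
    try:
        data = json.loads(match.group(0))
    except json.JSONDecodeError:
        return None  # treat malformed JSON as a normal text reply
    if not isinstance(data, dict) or "tool" not in data:
        return None
    args = data.get("arguments") or {}
    if not isinstance(args, dict):
        return None
    return data["tool"], args
```

Native tool calling avoids this fragile parsing, which is why it is the cleaner starting point.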

Resources to Review on Flight

See reading-list.md for detailed resources.

Quick picks:

Anthropic's tool use guide (concepts apply to any LLM)
Ollama API docs
OpenClaw architecture (see what you're replicating)
Your own MagicMirror code (patterns to reuse)

After the Flight

When you land Thursday:

Install Ollama on Alienware (10 minutes)
Run through Phase 1 tasks (2-3 hours)
Report back - share what worked, what didn't
Adjust plan based on what you learned

This isn't just a project - it's your bootcamp for understanding AI agents. Every line you write teaches you something you can apply to future work.

Let's build something cool. 🚀
