- Python 3.11+
- Node.js 18+
- Docker
- Ollama (or an Anthropic/OpenAI API key)
1. Install Python dependencies

```bash
pip install -r requirements.txt
```

2. Configure the backend environment

```bash
cp .env.example .env
```

Edit `.env`:
```bash
# Tested and verified
LLM_MODEL=claude-sonnet-4-6
# Leave blank if using Ollama
ANTHROPIC_API_KEY=

# Tested and verified
#LLM_MODEL=gemini/gemini-2.5-flash
#GEMINI_API_KEY=
```
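3. Configure the real estate skill

```bash
cp skills/real_estate/.env.example skills/real_estate/.env
# then fill in REALTY_API_KEY in skills/real_estate/.env
```

4. (Optional) Pull the Ollama model. This is only needed if you use Ollama. Pull the model named in `LLM_MODEL` (for example, `llama3.1` for `ollama_chat/llama3.1`).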
```bash
ollama pull llama3.1
```

5. Build the skill images

```bash
docker build -t current_time:latest skills/current_time/
docker build -t real_estate:latest skills/real_estate/
```

Skill containers are published on sequential host ports starting at 9000 (configurable via `HOST_PORT_START`). No Docker bridge network is required.
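With the defaults `POOL_SIZE=2` and `HOST_PORT_START=9000` from the configuration table, the two skills above presumably occupy four consecutive host ports. The following minimal sketch shows that arithmetic. It assumes ports are handed out skill by skill, in the order the skills are listed. `assign_host_ports` is a hypothetical name and is not the agent's actual code.

```python
from itertools import count


def assign_host_ports(skills, pool_size=2, start=9000):
    """Hypothetical sketch: give each pre-warmed container the next sequential host port."""
    ports = count(start)  # 9000, 9001, 9002, ...
    return {
        skill: [next(ports) for _ in range(pool_size)]  # pool_size containers per skill
        for skill in skills
    }


print(assign_host_ports(["current_time", "real_estate"]))
# {'current_time': [9000, 9001], 'real_estate': [9002, 9003]}
```

If another service already listens in that range, raising `HOST_PORT_START` should shift the whole block.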
6. Install frontend dependencies

```bash
cd frontend && npm install
```

Then start each service in its own terminal.

Terminal 1: start Ollama (only needed when using Ollama)

```bash
ollama serve
```

Terminal 2: start the agent server

```bash
uvicorn main:app --reload
```

Terminal 3: start the frontend

```bash
cd frontend && npm run dev
```

Open http://localhost:5173. The Vite dev server proxies `/chat`, `/sessions`, and `/skills` to the FastAPI backend at `localhost:8000`.
To chat from the terminal instead of the browser:

```bash
python chat.py
```

`chat.py` uses `POST /chat` (blocking) and prints the agent's text response. Property card rendering is frontend-only.
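For scripting, you can make a blocking call like the one `chat.py` makes using only the standard library. The request body mirrors the `/chat/stream` curl examples further down. Two things are assumptions: that `/chat` accepts the same fields, and the shape of its response. For that reason, this sketch returns the raw body. `build_chat_request` and `chat` are hypothetical helpers.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # FastAPI backend started with uvicorn


def build_chat_request(session_id: str, message: str) -> urllib.request.Request:
    """Build a POST /chat request; body fields assumed to match the /chat/stream examples."""
    body = json.dumps({"session_id": session_id, "message": message}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/chat",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(session_id: str, message: str, timeout: float = 120.0) -> str:
    """Send one turn and return the raw response body (its JSON shape is not documented here).

    The timeout is generous because one turn may span several LLM calls and tool rounds.
    """
    with urllib.request.urlopen(build_chat_request(session_id, message), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

With the backend running, `print(chat("cli-demo", "what time is it"))` should print whatever `/chat` returns for that turn.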
| Provider | `.env` setting |
|---|---|
| Ollama (local) | `LLM_MODEL=ollama_chat/llama3.1` |
| Claude | `LLM_MODEL=claude-sonnet-4-6` + `ANTHROPIC_API_KEY=sk-...` |
| GPT-4o | `LLM_MODEL=gpt-4o` + `OPENAI_API_KEY=sk-...` |
Ollama note: always use the `ollama_chat/` prefix, not `ollama/`. The `ollama/` prefix routes to a text completion endpoint that breaks tool calling in multi-turn conversations.
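Restart the server after changing `.env`. By default, `uvicorn --reload` only watches Python files, so it does not pick up `.env` changes on its own.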
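Before restarting, a quick sanity check can catch the two mistakes described above: a missing API key and the `ollama/` prefix. This is a minimal sketch. The prefix-to-key mapping is inferred from the provider table and the commented-out Gemini lines in `.env`. It is not exhaustive, because LiteLLM supports many more providers. `check_model_config` is a hypothetical helper.

```python
# Model-string prefix -> API key variable it needs (None = no key, e.g. local Ollama).
# Inferred from the provider table; not a complete list of LiteLLM providers.
REQUIRED_KEYS = {
    "claude-": "ANTHROPIC_API_KEY",
    "gpt-": "OPENAI_API_KEY",
    "gemini/": "GEMINI_API_KEY",
    "ollama_chat/": None,
}


def check_model_config(env: dict[str, str]) -> list[str]:
    """Return problems found in the LLM settings of `env`; an empty list means none were found."""
    model = env.get("LLM_MODEL", "").strip()
    if not model:
        return ["LLM_MODEL is not set"]
    if model.startswith("ollama/"):
        suggestion = "ollama_chat/" + model[len("ollama/"):]
        return [f"use {suggestion!r} instead of {model!r}; the ollama/ prefix breaks multi-turn tool calling"]
    for prefix, key in REQUIRED_KEYS.items():
        if model.startswith(prefix):
            if key is not None and not env.get(key, "").strip():
                return [f"{model!r} needs {key} to be set"]
            return []
    return [f"{model!r} is not covered by this check; verify its API key manually"]
```

Pass it `dict(os.environ)` after `.env` has been loaded, or a dict built from the file's `KEY=value` lines.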
| Variable | Default | Description |
|---|---|---|
| `LLM_MODEL` | (none) | LiteLLM model string (required) |
| `CONTEXT_LIMIT` | `180000` | Maximum tokens before reactive compaction is triggered |
| `LLM_TIMEOUT` | `60` | Seconds before a hung LLM call is cancelled |
| `MAX_TOOL_ITERATIONS` | `10` | Maximum tool-call rounds per user turn |
| `POOL_SIZE` | `2` | Pre-warmed containers per skill |
| `CONTAINER_MEM_LIMIT` | `256m` | Docker memory limit per skill container |
| `HOST_PORT_START` | `9000` | First host port assigned to skill containers |
| `MEMORY_ROOT` | `memory` | Directory for session and preference storage |
| `LOG_LEVEL` | `INFO` | Log level for the agent core (`DEBUG` for full traces) |
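The table maps naturally onto a typed settings object. This minimal sketch assumes each value comes from an environment variable named exactly as in the table. `Settings` and `from_env` are hypothetical names, and the agent may load its configuration differently.

```python
import os
from collections.abc import Mapping
from dataclasses import MISSING, dataclass, fields


@dataclass(frozen=True)
class Settings:
    """Hypothetical typed view of the configuration table; defaults copied from it."""
    llm_model: str                     # LLM_MODEL (required, no default)
    context_limit: int = 180_000       # CONTEXT_LIMIT
    llm_timeout: float = 60.0          # LLM_TIMEOUT, seconds
    max_tool_iterations: int = 10      # MAX_TOOL_ITERATIONS
    pool_size: int = 2                 # POOL_SIZE
    container_mem_limit: str = "256m"  # CONTAINER_MEM_LIMIT, passed to Docker verbatim
    host_port_start: int = 9000        # HOST_PORT_START
    memory_root: str = "memory"        # MEMORY_ROOT
    log_level: str = "INFO"            # LOG_LEVEL

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "Settings":
        model = env.get("LLM_MODEL", "").strip()
        if not model:
            raise ValueError("LLM_MODEL is required")
        overrides = {}
        for f in fields(cls):
            raw = env.get(f.name.upper())
            if f.default is MISSING or raw is None:
                continue  # required field handled above, or variable unset: keep the default
            # Convert the string to the type of the default value (int, float or str).
            overrides[f.name] = type(f.default)(raw)
        return cls(llm_model=model, **overrides)
```

`Settings.from_env()` reads `os.environ`, so the values in `.env` must already be loaded into the process environment.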
Simple turn (should need only the `current_time` skill):
```bash
curl -N -X POST http://localhost:8000/chat/stream \
  -H "Content-Type: application/json" \
  -d '{"session_id": "test-s1", "message": "what time is it"}'
```

Property search:

```bash
curl -N -X POST http://localhost:8000/chat/stream \
  -H "Content-Type: application/json" \
  -d '{"session_id": "test-s2", "message": "find me a 3-bed house in Austin under 500k"}'
```

The `-N` flag disables curl's output buffering so SSE events appear as they arrive.
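To consume the stream from Python instead of curl, a small parser for the Server-Sent Events wire format is enough. Each event is one or more `data:` lines terminated by a blank line. This is a minimal sketch. The payload format of this server's events is not documented here, so `iter_sse_data` yields raw strings and ignores `event:` and `id:` fields. Both function names are hypothetical.

```python
import json
import urllib.request
from collections.abc import Iterable, Iterator


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each SSE event in a stream of text lines."""
    buf: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the event; multi-line data is joined with "\n".
            if buf:
                yield "\n".join(buf)
                buf = []
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # The SSE format strips exactly one leading space after the colon.
            buf.append(value[1:] if value.startswith(" ") else value)
        # ":" comments and event:/id:/retry: fields are ignored in this sketch.
    # An event still in `buf` here lacked its terminating blank line and is discarded.


def stream_chat(session_id: str, message: str,
                base_url: str = "http://localhost:8000") -> Iterator[str]:
    """POST to /chat/stream and yield raw event payloads as they arrive."""
    body = json.dumps({"session_id": session_id, "message": message}).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/chat/stream",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=120) as resp:  # per-read socket timeout
        # Iterating the response yields one bytes line at a time.
        yield from iter_sse_data(line.decode("utf-8") for line in resp)
```

With the backend running, `for data in stream_chat("test-s2", "find me a 3-bed house in Austin under 500k"): print(data)` prints each event payload as it arrives.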