Running the Stack

Prerequisites

  • Python 3.11+
  • Node.js 18+
  • Docker
  • Ollama, or an API key for a hosted provider (Anthropic, OpenAI, or Gemini)
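The prerequisites can be checked with a short standard-library script. This is a hypothetical helper (check_prerequisites and the other names are not part of the repository). It only confirms that the tools are on PATH and that Python and Node.js meet the minimum versions listed above.

```python
import re
import shutil
import subprocess
import sys

# Minimum versions taken from the prerequisites list.
MINIMUMS = {"python": (3, 11), "node": (18,)}


def parse_version(text: str) -> tuple[int, ...]:
    """Extract the first dotted version number from a string like 'v18.19.0'."""
    match = re.search(r"(\d+(?:\.\d+)*)", text)
    if match is None:
        raise ValueError(f"no version found in {text!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def meets_minimum(found: tuple[int, ...], minimum: tuple[int, ...]) -> bool:
    """Compare only as many version components as the minimum specifies."""
    return found[: len(minimum)] >= minimum


def check_prerequisites() -> list[str]:
    """Return human-readable problems; an empty list means every check passed."""
    problems = []
    if not meets_minimum(tuple(sys.version_info[:3]), MINIMUMS["python"]):
        problems.append(f"Python {sys.version.split()[0]} is older than 3.11")
    node = shutil.which("node")
    if node is None:
        problems.append("node not found on PATH")
    else:
        out = subprocess.run([node, "--version"], capture_output=True, text=True).stdout
        if not meets_minimum(parse_version(out), MINIMUMS["node"]):
            problems.append(f"Node.js {out.strip()} is older than 18")
    if shutil.which("docker") is None:
        problems.append("docker not found on PATH")
    if shutil.which("ollama") is None:
        problems.append("ollama not found on PATH (fine if using a hosted API key)")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("-", problem)
```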

First-Time Setup

1. Install Python dependencies

pip install -r requirements.txt

2. Configure backend environment

cp .env.example .env

Edit .env:

# Claude (tested and verified)
LLM_MODEL=claude-sonnet-4-6
ANTHROPIC_API_KEY=                        # leave blank if using Ollama

# Gemini (tested and verified); uncomment these and comment out the Claude lines to use it
#LLM_MODEL=gemini/gemini-2.5-flash
#GEMINI_API_KEY=
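To sanity-check what the backend will see, the .env format can be parsed with a few lines of Python. This parse_env helper is a hypothetical sketch, not the project's own loader. It handles only plain KEY=VALUE lines and whitespace-preceded inline comments like the ones in the example. A library such as python-dotenv (dotenv_values) also handles quoting and escapes, which this sketch ignores.

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and full-line comments.

    Inline comments are stripped only when '#' is preceded by whitespace,
    matching how the .env example is written.
    """
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop an inline comment such as "   # leave blank if using Ollama".
        for marker in (" #", "\t#"):
            if marker in value:
                value = value.split(marker, 1)[0]
        values[key.strip()] = value.strip()
    return values


EXAMPLE = """
# Claude (tested and verified)
LLM_MODEL=claude-sonnet-4-6
ANTHROPIC_API_KEY=                        # leave blank if using Ollama

#LLM_MODEL=gemini/gemini-2.5-flash
#GEMINI_API_KEY=
"""

print(parse_env(EXAMPLE))  # {'LLM_MODEL': 'claude-sonnet-4-6', 'ANTHROPIC_API_KEY': ''}
```

Commented-out lines (the Gemini block here) are ignored, so only the active provider's settings appear in the result.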

3. Configure real estate skill

cp skills/real_estate/.env.example skills/real_estate/.env
# fill in REALTY_API_KEY

4. (Optional) Pull the Ollama model

Only needed if you are using Ollama. Pull the model named in LLM_MODEL (for example, ollama_chat/llama3.1 needs llama3.1).

ollama pull llama3.1
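With the Ollama server running, Ollama's REST API lists locally pulled models at GET /api/tags on its default port 11434, which is a quick way to confirm the pull worked. The helper names below are illustrative. Deriving the Ollama model name by stripping the ollama_chat/ prefix from LLM_MODEL is an assumption about how these LiteLLM strings are formed.

```python
import json
import urllib.request

OLLAMA_TAGS_URL = "http://localhost:11434/api/tags"  # Ollama's default address


def ollama_name(llm_model: str) -> str:
    """Turn a LiteLLM string like 'ollama_chat/llama3.1' into an Ollama model name."""
    return llm_model.split("/", 1)[1] if "/" in llm_model else llm_model


def model_is_pulled(tags: dict, name: str) -> bool:
    """Ollama reports untagged pulls as 'name:latest', so accept either form."""
    wanted = {name} if ":" in name else {name, f"{name}:latest"}
    return any(m.get("name") in wanted for m in tags.get("models", []))


def fetch_tags(url: str = OLLAMA_TAGS_URL) -> dict:
    """Fetch the list of local models from the running Ollama server."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.load(resp)


if __name__ == "__main__":
    name = ollama_name("ollama_chat/llama3.1")
    try:
        pulled = model_is_pulled(fetch_tags(), name)
    except OSError as exc:  # URLError subclasses OSError; raised if Ollama is not running
        print("could not reach Ollama:", exc)
    else:
        print(name, "is pulled" if pulled else "is missing; run `ollama pull`")
```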

5. Build skill images

docker build -t current_time:latest skills/current_time/
docker build -t real_estate:latest skills/real_estate/

Skill containers are published on sequential host ports starting at 9000 (HOST_PORT_START), so no Docker bridge network is required.
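One way to picture the port scheme: each pre-warmed container (POOL_SIZE per skill) takes the next host port counting up from HOST_PORT_START, so the agent can reach every container at localhost:<port> without a shared Docker network. The ordering below (all slots for one skill, then the next skill) and the assign_ports name are assumptions for illustration; the agent core may allocate ports differently.

```python
def assign_ports(
    skills: list[str], pool_size: int = 2, start: int = 9000
) -> dict[tuple[str, int], int]:
    """Map (skill, pool slot) to a host port, counting up from `start`.

    Hypothetical ordering: every slot of the first skill, then the next skill.
    """
    ports: dict[tuple[str, int], int] = {}
    port = start
    for skill in skills:
        for slot in range(pool_size):
            ports[(skill, slot)] = port
            port += 1
    return ports


for (skill, slot), port in assign_ports(["current_time", "real_estate"]).items():
    print(f"{skill}[{slot}] -> localhost:{port}")
# current_time[0] -> localhost:9000
# current_time[1] -> localhost:9001
# real_estate[0] -> localhost:9002
# real_estate[1] -> localhost:9003
```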

6. Install frontend dependencies

cd frontend && npm install

Running the Stack

Terminal 1 — Start Ollama

ollama serve

Skip this terminal if LLM_MODEL points at a hosted provider such as Claude or GPT-4o.

Terminal 2 — Start the agent server

uvicorn main:app --reload

Terminal 3 — Start the frontend

cd frontend && npm run dev

Open http://localhost:5173 — the Vite dev server proxies /chat, /sessions, and /skills to the FastAPI backend at localhost:8000.
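A quick way to confirm that all three processes are up is to try a TCP connection to each port. The ports are the defaults used in this guide plus Ollama's default 11434. is_listening is a hypothetical helper, and a successful connection only shows that something is listening, not that the service is healthy.

```python
import socket

# Default ports from this guide; 11434 is Ollama's default listen port.
SERVICES = {"ollama": 11434, "agent (FastAPI)": 8000, "frontend (Vite)": 5173}


def is_listening(port: int, host: str = "localhost", timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    for name, port in SERVICES.items():
        status = "up" if is_listening(port) else "down"
        print(f"{name:18} :{port}  {status}")
```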


CLI client (alternative to frontend)

python chat.py

chat.py uses POST /chat (blocking) and prints the agent's text response. Property card rendering is frontend-only.
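For scripting against the blocking endpoint, a request can be built with the standard library. This is a sketch, not chat.py itself. It assumes POST /chat accepts the same session_id/message JSON body as the /chat/stream examples further down. Because the response schema is not documented here, it returns the decoded JSON as-is. The 120-second timeout is an arbitrary choice that leaves room for several tool rounds.

```python
import json
import urllib.request

CHAT_URL = "http://localhost:8000/chat"


def build_chat_request(session_id: str, message: str, url: str = CHAT_URL) -> urllib.request.Request:
    """Assumed body shape, mirroring the /chat/stream examples."""
    body = json.dumps({"session_id": session_id, "message": message}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )


def send(session_id: str, message: str) -> dict:
    """Block until the agent finishes its turn, including any tool calls."""
    with urllib.request.urlopen(build_chat_request(session_id, message), timeout=120) as resp:
        return json.load(resp)


if __name__ == "__main__":
    try:
        print(send("cli-demo", "what time is it"))
    except OSError as exc:  # connection refused, timeout, or HTTP error
        print("agent server not reachable:", exc)
```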


Switching LLM providers

| Provider | .env setting |
|---|---|
| Ollama (local) | LLM_MODEL=ollama_chat/llama3.1 |
| Claude | LLM_MODEL=claude-sonnet-4-6 + ANTHROPIC_API_KEY=sk-... |
| GPT-4o | LLM_MODEL=gpt-4o + OPENAI_API_KEY=sk-... |

Ollama note: always use the ollama_chat/ prefix, not ollama/. The ollama/ prefix routes to a text completion endpoint that breaks tool calling in multi-turn conversations.

Restart the server after changing .env.
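The provider table and the Ollama prefix rule can be turned into a pre-flight check to run before restarting. check_llm_config is hypothetical. The prefix-to-key mapping mirrors the provider table and the Gemini lines in the .env example, and any model string outside those prefixes is reported rather than judged.

```python
# Provider-specific keys per the provider table; Ollama runs locally and needs none.
REQUIRED_KEYS = {
    "ollama_chat/": None,
    "claude-": "ANTHROPIC_API_KEY",
    "gpt-": "OPENAI_API_KEY",
    "gemini/": "GEMINI_API_KEY",
}


def check_llm_config(env: dict[str, str]) -> list[str]:
    """Return problems with LLM_MODEL and its API key; an empty list means it looks OK."""
    model = env.get("LLM_MODEL", "")
    if not model:
        return ["LLM_MODEL is not set"]
    if model.startswith("ollama/"):
        # ollama/ routes to text completion, which breaks multi-turn tool calling.
        return [f"use ollama_chat/{model.removeprefix('ollama/')} instead of {model}"]
    for prefix, key in REQUIRED_KEYS.items():
        if model.startswith(prefix):
            if key is not None and not env.get(key):
                return [f"{model} needs {key}"]
            return []
    return [f"unrecognized model string {model!r}; check the LiteLLM docs"]
```

Calling check_llm_config(dict(os.environ)) returns an empty list when the configuration looks usable.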


Environment variables

| Variable | Default | Description |
|---|---|---|
| LLM_MODEL | (none) | LiteLLM model string (required) |
| CONTEXT_LIMIT | 180000 | Max tokens before reactive compaction |
| LLM_TIMEOUT | 60 | Seconds before a hung LLM call is cancelled |
| MAX_TOOL_ITERATIONS | 10 | Max tool call rounds per user turn |
| POOL_SIZE | 2 | Pre-warmed containers per skill |
| CONTAINER_MEM_LIMIT | 256m | Docker memory limit per skill container |
| HOST_PORT_START | 9000 | First host port assigned to skill containers |
| MEMORY_ROOT | memory | Directory for session and preference storage |
| LOG_LEVEL | INFO | Log level for agent core (DEBUG for full traces) |
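The table maps naturally onto a typed settings object. This Settings dataclass is a sketch of how such variables are commonly loaded, not the agent core's actual configuration code. The field names and the choice to raise on a missing LLM_MODEL are assumptions; the defaults come from the table.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Typed view of the environment variables (defaults from the table)."""

    llm_model: str
    context_limit: int = 180000        # tokens before reactive compaction
    llm_timeout: float = 60.0          # seconds per LLM call
    max_tool_iterations: int = 10      # tool-call rounds per user turn
    pool_size: int = 2                 # pre-warmed containers per skill
    container_mem_limit: str = "256m"  # kept as a Docker size string
    host_port_start: int = 9000
    memory_root: str = "memory"
    log_level: str = "INFO"

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "Settings":
        model = env.get("LLM_MODEL", "").strip()
        if not model:
            raise ValueError("LLM_MODEL is required (a LiteLLM model string)")
        # Class attributes hold the dataclass defaults, so cls.<field> is the fallback.
        return cls(
            llm_model=model,
            context_limit=int(env.get("CONTEXT_LIMIT", cls.context_limit)),
            llm_timeout=float(env.get("LLM_TIMEOUT", cls.llm_timeout)),
            max_tool_iterations=int(env.get("MAX_TOOL_ITERATIONS", cls.max_tool_iterations)),
            pool_size=int(env.get("POOL_SIZE", cls.pool_size)),
            container_mem_limit=env.get("CONTAINER_MEM_LIMIT", cls.container_mem_limit),
            host_port_start=int(env.get("HOST_PORT_START", cls.host_port_start)),
            memory_root=env.get("MEMORY_ROOT", cls.memory_root),
            log_level=env.get("LOG_LEVEL", cls.log_level).upper(),
        )
```

Settings.from_env() reads os.environ by default; passing a plain dict instead makes the loader easy to test.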

Testing the streaming endpoint

Simple turn (answered via the current_time skill, no property search):

curl -N -X POST http://localhost:8000/chat/stream \
  -H "Content-Type: application/json" \
  -d '{"session_id": "test-s1", "message": "what time is it"}'

Property search:

curl -N -X POST http://localhost:8000/chat/stream \
  -H "Content-Type: application/json" \
  -d '{"session_id": "test-s2", "message": "find me a 3-bed house in Austin under 500k"}'

The -N flag disables curl's output buffering so SSE events appear as they arrive.
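The same stream can be consumed from Python. iter_sse follows the standard Server-Sent Events line format: event: and data: fields, a blank line ends an event, and lines starting with : are comments. The event names and payloads this server emits are not documented here, so stream_chat, a hypothetical helper, yields them unparsed.

```python
import json
import urllib.request
from collections.abc import Iterable, Iterator


def iter_sse(lines: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Yield (event, data) pairs from Server-Sent Events text lines."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:  # a blank line dispatches the pending event
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
            continue
        if line.startswith(":"):  # comment / keep-alive
            continue
        field, _, value = line.partition(":")
        value = value[1:] if value.startswith(" ") else value
        if field == "event":
            event = value
        elif field == "data":
            data.append(value)
    # Per the SSE spec, an event not terminated by a blank line is discarded.


def stream_chat(session_id: str, message: str,
                url: str = "http://localhost:8000/chat/stream") -> Iterator[tuple[str, str]]:
    body = json.dumps({"session_id": session_id, "message": message}).encode("utf-8")
    req = urllib.request.Request(
        url, data=body, method="POST",
        headers={"Content-Type": "application/json", "Accept": "text/event-stream"},
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        # Iterating the response reads the body one line at a time.
        yield from iter_sse(line.decode("utf-8") for line in resp)


if __name__ == "__main__":
    try:
        for event, data in stream_chat("test-s1", "what time is it"):
            print(f"{event}: {data}")
    except OSError as exc:
        print("agent server not reachable:", exc)
```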