Running the Stack

Prerequisites

Python 3.11+
Node.js 18+
Docker
Ollama (or an Anthropic/OpenAI API key)

First-Time Setup

1. Install Python dependencies

pip install -r requirements.txt

2. Configure backend environment

cp .env.example .env

Edit .env:

# tested and verified
LLM_MODEL=claude-sonnet-4-6
ANTHROPIC_API_KEY=                        # leave blank if using Ollama

# tested and verified
#LLM_MODEL=gemini/gemini-2.5-flash
#GEMINI_API_KEY=

3. Configure real estate skill

cp skills/real_estate/.env.example skills/real_estate/.env
# fill in REALTY_API_KEY

4. (Optional) Pull the Ollama model (only needed if you are using Ollama as the LLM provider)

ollama pull llama3.1

5. Build skill images

docker build -t current_time:latest skills/current_time/
docker build -t real_estate:latest skills/real_estate/

Skill containers are published on sequential host ports starting at 9000 (configurable via HOST_PORT_START). No Docker bridge network is required.
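
As a rough illustration of sequential port publishing, the sketch below maps each skill to a block of host ports. The function name `assign_host_ports` and the skill-by-skill ordering are assumptions made for this example; the agent's actual allocation order may differ.

```python
def assign_host_ports(skills, pool_size=2, start=9000):
    """Map each skill name to a list of consecutive host ports.

    Hypothetical helper: assumes ports are handed out skill by skill,
    pool_size ports per skill, beginning at `start`.
    """
    ports = {}
    next_port = start
    for skill in skills:
        ports[skill] = list(range(next_port, next_port + pool_size))
        next_port += pool_size
    return ports


# Example with the two skills built above and the documented defaults
# (POOL_SIZE=2, HOST_PORT_START=9000).
layout = assign_host_ports(["current_time", "real_estate"])
```

Under these assumptions, two skills with a pool size of 2 would occupy ports 9000 through 9003, so those ports need to be free on the host.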

6. Install frontend dependencies

cd frontend && npm install

Running the Stack

Terminal 1 — Start Ollama

ollama serve

Skip this terminal if LLM_MODEL points at a hosted provider such as Claude or GPT-4o.

Terminal 2 — Start the agent server

uvicorn main:app --reload

Terminal 3 — Start the frontend

cd frontend && npm run dev

Open http://localhost:5173 — the Vite dev server proxies /chat, /sessions, and /skills to the FastAPI backend at localhost:8000.

CLI client (alternative to frontend)

python chat.py

chat.py uses POST /chat (blocking) and prints the agent's text response. Property card rendering is frontend-only.
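
For scripting against the blocking endpoint without chat.py, a minimal standard-library sketch might look like the following. It assumes POST /chat accepts the same JSON body as the streaming examples (`session_id` and `message`) and returns JSON; the response shape is not documented here, so the sketch returns the decoded body unchanged. The helper names are hypothetical.

```python
import json
import urllib.request


def build_chat_request(session_id, message, base_url="http://localhost:8000"):
    """Build a POST /chat request (assumed body: session_id + message)."""
    body = json.dumps({"session_id": session_id, "message": message}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/chat",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat(session_id, message, base_url="http://localhost:8000", timeout=60):
    """Send one blocking chat turn and return the decoded JSON response."""
    req = build_chat_request(session_id, message, base_url)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the agent server running, `send_chat("test-s1", "what time is it")` would perform one turn.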

Switching LLM providers

Provider	`.env` setting
Ollama (local)	`LLM_MODEL=ollama_chat/llama3.1`
Claude	`LLM_MODEL=claude-sonnet-4-6` + `ANTHROPIC_API_KEY=sk-...`
GPT-4o	`LLM_MODEL=gpt-4o` + `OPENAI_API_KEY=sk-...`

Ollama note: always use the ollama_chat/ prefix, not ollama/. The ollama/ prefix routes to a text completion endpoint that breaks tool calling in multi-turn conversations.
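
One way to catch the wrong prefix early is a small startup check like the sketch below. The function `check_ollama_prefix` is hypothetical and not part of the project; it only inspects the model string.

```python
def check_ollama_prefix(model):
    """Return a warning for the plain ollama/ prefix, or None if the string is fine."""
    if model.startswith("ollama/"):
        name = model[len("ollama/"):]
        return f"Use 'ollama_chat/{name}' instead of '{model}'"
    return None
```

A caller could log the returned warning, or refuse to start, before any LLM request is made.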

Restart the server after changing .env.

Environment variables

Variable	Default	Description
`LLM_MODEL`	—	LiteLLM model string (required)
`CONTEXT_LIMIT`	`180000`	Max tokens before reactive compaction
`LLM_TIMEOUT`	`60`	Seconds before a hung LLM call is cancelled
`MAX_TOOL_ITERATIONS`	`10`	Max tool call rounds per user turn
`POOL_SIZE`	`2`	Pre-warmed containers per skill
`CONTAINER_MEM_LIMIT`	`256m`	Docker memory limit per skill container
`HOST_PORT_START`	`9000`	First host port assigned to skill containers
`MEMORY_ROOT`	`memory`	Directory for session and preference storage
`LOG_LEVEL`	`INFO`	Log level for agent core (`DEBUG` for full traces)
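
To make the defaults concrete, here is a sketch of how these variables could be read into a typed settings object. The `Settings` class and `load_settings` function are illustrative assumptions, not the project's actual configuration code; only the variable names and defaults come from the table above.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    llm_model: str
    context_limit: int = 180000
    llm_timeout: int = 60
    max_tool_iterations: int = 10
    pool_size: int = 2
    container_mem_limit: str = "256m"
    host_port_start: int = 9000
    memory_root: str = "memory"
    log_level: str = "INFO"


def load_settings(env=None):
    """Build Settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    model = env.get("LLM_MODEL", "").strip()
    if not model:
        # LLM_MODEL has no default in the table, so treat it as required.
        raise ValueError("LLM_MODEL is required")
    return Settings(
        llm_model=model,
        context_limit=int(env.get("CONTEXT_LIMIT", "180000")),
        llm_timeout=int(env.get("LLM_TIMEOUT", "60")),
        max_tool_iterations=int(env.get("MAX_TOOL_ITERATIONS", "10")),
        pool_size=int(env.get("POOL_SIZE", "2")),
        container_mem_limit=env.get("CONTAINER_MEM_LIMIT", "256m"),
        host_port_start=int(env.get("HOST_PORT_START", "9000")),
        memory_root=env.get("MEMORY_ROOT", "memory"),
        log_level=env.get("LOG_LEVEL", "INFO"),
    )
```

Passing a plain dict instead of `os.environ` keeps the loader easy to test.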

Testing the streaming endpoint

Simple turn (no tool call):

curl -N -X POST http://localhost:8000/chat/stream \
  -H "Content-Type: application/json" \
  -d '{"session_id": "test-s1", "message": "what time is it"}'

Property search:

curl -N -X POST http://localhost:8000/chat/stream \
  -H "Content-Type: application/json" \
  -d '{"session_id": "test-s2", "message": "find me a 3-bed house in Austin under 500k"}'

The -N flag disables curl's output buffering so SSE events appear as they arrive.
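
The same stream can be consumed from Python. The sketch below handles only standard SSE framing (`data:` lines, blank-line event separators, `:` comment lines); the payload format of the events emitted by /chat/stream is not documented here, so the parser yields raw strings. The name `iter_sse_data` is hypothetical.

```python
def iter_sse_data(lines):
    """Yield the data payload of each SSE event from an iterable of text lines."""
    buffer = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if buffer:
                yield "\n".join(buffer)
                buffer = []
            continue
        if line.startswith(":"):
            # Comment line (often used as a keep-alive); ignore it.
            continue
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # a single leading space is part of the framing
        if field == "data":
            buffer.append(value)
    # Flush a trailing event if the stream ended without a blank line
    # (more lenient than the SSE specification, which would discard it).
    if buffer:
        yield "\n".join(buffer)
```

To read from the live endpoint, decode each line of the HTTP response body as UTF-8 and pass the resulting lines to `iter_sse_data`.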
