Design

Vision

Generic agent platform. Skills are pluggable Docker containers. Current time is Skill #1 (platform verification). Real estate search is Skill #2.

User Experience

User types: "find me a 3-bed house in Austin under $500k"
Agent searches Zillow and returns matching properties as a card grid with address, price, beds, baths, sqft, and photos
User clicks a card → agent automatically fetches full property details (no typing required); a detail card and map appear inline
User asks for more detail: "show me more on the second one" → agent fetches full property details the same way
User refines: "same but closer to downtown" → agent searches again with updated criteria
User preferences (budget, location, property type) are remembered across sessions — the agent recalls them in future conversations without being told again

UI

PropSearch — React frontend

graph TD
    subgraph Browser["Browser (PropSearch)"]
        A[Chat Input] --> B[Message Stream]
        B --> C{Message Type}
        C -->|text| D[Text Bubble]
        C -->|listings| E["PropertyGrid — grid or list"]
        C -->|detail| F["PropertyGrid variant=detail"]
        F --> G["MapView — OpenStreetMap"]
        E -->|click card| H["auto-send detail request"]
        H --> F
        E --> I["View on Zillow"]
        J[Sidebar] --> K["Session list with titles"]
        K --> L["Delete session"]
        M[Toggle] --> N["grid / list"]
    end

    subgraph API
        O["POST /chat/stream (SSE)"]
        P["GET /sessions"]
        Q["GET /sessions/{id}"]
        R["DELETE /sessions/{id}"]
        S["GET /sessions/{id}/title"]
    end

    A --> O
    O --> B
    J --> P
    K --> Q
    L --> R

Views

View	Triggered by	Shows
Grid (default)	search result	Photo cards in 1–3 column responsive grid
List	toggle button	Compact horizontal rows: thumbnail, address, price, beds/baths/sqft
Detail	click card or ask about a property	Richer card (year built, lot size, HOA, Zestimate) + Leaflet map

Grid/list preference is persisted in localStorage. Map only appears for detail results — the Zillow search API does not return coordinates; only the detail API (/pro/byzpid) does.

Clicking a card

Clicking any property card (grid or list) automatically sends:

Show me the details for {address} (zpid: {zpid})

The agent calls get_property_details, and the detail card + map render inline below the response — no user typing required.
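The pre-composed message can be sketched as a simple template. The snippet below is only an illustration, written in Python for consistency with the backend even though the real composition happens in the React frontend. The names `compose_detail_message` and `extract_zpid` are hypothetical, and the assumption that a zpid is purely numeric is not stated in this document.

```python
import re
from typing import Optional

# Template copied from the design text above.
DETAIL_TEMPLATE = "Show me the details for {address} (zpid: {zpid})"


def compose_detail_message(address: str, zpid: str) -> str:
    """Build the message that is auto-sent when a property card is clicked."""
    return DETAIL_TEMPLATE.format(address=address, zpid=zpid)


def extract_zpid(message: str) -> Optional[str]:
    """Recover the zpid from a click-generated message (assumes numeric zpids)."""
    match = re.search(r"\(zpid:\s*(\d+)\)", message)
    return match.group(1) if match else None
```

Embedding the zpid in the text gives the model an unambiguous identifier to pass to `get_property_details`, so it does not have to resolve the address on its own.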

Architecture

┌─────────────────────────────────────────────────┐
│                  AGENT CORE                      │
│                                                  │
│  FastAPI  →  AgentLoop  →  LiteLLM              │
│                  │                               │
│            SkillRegistry                        │
│            MemoryManager                        │
│            SessionManager                       │
└──────────────────┬──────────────────────────────┘
                   │ HTTP localhost:{port}
        ┌──────────┼──────────┐
        ▼          ▼          ▼
   localhost:9000  localhost:9002  (future skills)
   (current_time_0) (real_estate_0)
   localhost:9001  localhost:9003
   (current_time_1) (real_estate_1)

Components

Component	File	Responsibility
AgentLoop	`core/agent.py`	LiteLLM call → tool dispatch → memory save → repeat
ContainerPool	`core/container_pool.py`	Start, pool, and execute skill containers
SkillRegistry	`core/skill_registry.py`	Discover skills, merge tools/prompts, route dispatch
MemoryManager	`core/memory.py`	preferences.md + ChromaDB per skill
SessionManager	`core/session.py`	Session history CRUD
Sanitizer	`core/sanitizer.py`	Scrub secrets from tool results before LLM sees them
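As an illustration of the Sanitizer's responsibility, the sketch below masks known secret values in a tool result before it is handed to the LLM. It is not the contents of `core/sanitizer.py`. The function name `scrub_secrets`, the `[REDACTED]` mask, and the idea of matching literal secret values (for example, those loaded from a skill's `.env`) are all assumptions.

```python
from typing import Iterable


def scrub_secrets(text: str, secrets: Iterable[str], mask: str = "[REDACTED]") -> str:
    """Replace every occurrence of each known secret value with a mask."""
    # Longest first, so a secret that contains a shorter one is masked whole.
    for secret in sorted((s for s in secrets if s), key=len, reverse=True):
        text = text.replace(secret, mask)
    return text
```

Literal matching only catches secrets the host already knows about. A pattern-based pass for key-like strings would be a separate design choice.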

Skill Contract

Every skill is a Docker container exposing two endpoints:

GET /schema → { system_prompt, tools[] } — LiteLLM-format tool definitions
POST /execute → { tool, params } → { result } — tool execution
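To make the contract concrete, here is a minimal sketch of the two handlers for the `current_time` skill, written as plain functions so it stays independent of any web framework. The tool definition follows the OpenAI-style function schema that LiteLLM accepts. The system prompt wording, the handler names, and the error shape for unknown tools are assumptions and are not specified in this document.

```python
from datetime import datetime, timezone
from typing import Any

# Payload a skill might return from GET /schema (prompt wording is illustrative).
SCHEMA: dict[str, Any] = {
    "system_prompt": "You can report the current date and time in UTC.",
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_current_time",
                "description": "Return the current date and time in UTC.",
                "parameters": {"type": "object", "properties": {}, "required": []},
            },
        }
    ],
}


def get_schema() -> dict[str, Any]:
    """Handler body for GET /schema."""
    return SCHEMA


def execute(payload: dict[str, Any]) -> dict[str, Any]:
    """Handler body for POST /execute: takes {tool, params}, returns {result}."""
    tool = payload.get("tool")
    if tool == "get_current_time":
        return {"result": datetime.now(timezone.utc).isoformat()}
    # Assumed error shape; the contract above does not define one.
    return {"result": {"error": f"unknown tool: {tool}"}}
```

A real skill would wrap these two functions in whatever HTTP server its container ships with. The host only depends on the request and response shapes.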

Each skill directory contains two files read by the host at startup:

skills/<name>/SKILL.md — the --- YAML block is machine-parsed (name, image); body is human-readable documentation
skills/<name>/AGENT.md — optional; agent identity and hard constraints injected at L0. Skills without this file contribute no identity content.

---
name: current_time
image: current_time:latest
---

## Purpose
Returns the current date and time in UTC.

## Tools
- `get_current_time` — no parameters required

## Usage
No API keys or configuration needed.

Skill secrets go in skills/<name>/.env — injected via --env-file at container start.
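The machine-parsed part of `SKILL.md` could be read along these lines. This is a sketch under the assumption that the front matter holds only flat `key: value` pairs, as in the `current_time` example. A real implementation may use a full YAML parser instead, and `parse_skill_md` is a hypothetical name.

```python
def parse_skill_md(text: str) -> tuple[dict[str, str], str]:
    """Split SKILL.md into (front matter fields, human-readable body)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("SKILL.md must start with a '---' front matter block")
    try:
        end = next(i for i in range(1, len(lines)) if lines[i].strip() == "---")
    except StopIteration:
        raise ValueError("front matter block is not closed") from None

    meta: dict[str, str] = {}
    for line in lines[1:end]:
        if not line.strip():
            continue
        # Split on the first colon only, so "image: current_time:latest" keeps its tag.
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"expected 'key: value', got {line!r}")
        meta[key.strip()] = value.strip()

    body = "\n".join(lines[end + 1:]).strip()
    return meta, body
```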

Memory Model

Four memory layers (L0 to L3), plus the raw session log that serves as the source of truth:

Layer	MemPalace	Storage	Scope	Purpose
Identity	L0	`skills/{name}/AGENT.md`	Per-skill, on disk	Agent persona and hard constraints — injected at position 0
Preferences	L1	`memory/{skill}/preferences.md`	Cross-session, per-skill	Explicit user facts — always injected
Semantic history	L3	`memory/{skill}/chroma/`	Cross-session, per-skill	Past interactions retrieved by similarity
Session episodes	L2	`memory/sessions/{id}/chroma/`	Per-session	Older session exchanges indexed for relevance retrieval
Session raw	—	`memory/sessions/{id}.json`	Per-session	Full message history on disk — source of truth

preferences.md format

Typed entries — easier for LLM to parse, supports per-entry updates:

[PREFERENCE] budget_max: $500,000
[PREFERENCE] location: Austin TX
[DECISION] Exclude condos from all searches
[OBSERVATION] User prefers larger yards — refined after first search
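The typed format lends itself to line-oriented parsing and per-entry updates. The sketch below assumes the three tags shown above are the only ones in use and that `[PREFERENCE]` entries always follow `key: value`. Both are inferences from the example, and the function names are hypothetical.

```python
import re
from dataclasses import dataclass

ENTRY_RE = re.compile(r"^\[(PREFERENCE|DECISION|OBSERVATION)\]\s+(.*)$")


@dataclass
class Entry:
    kind: str  # PREFERENCE, DECISION, or OBSERVATION
    text: str  # everything after the tag


def parse_preferences(content: str) -> list[Entry]:
    """Parse preferences.md content into typed entries, skipping other lines."""
    entries = []
    for line in content.splitlines():
        match = ENTRY_RE.match(line.strip())
        if match:
            entries.append(Entry(match.group(1), match.group(2).strip()))
    return entries


def upsert_preference(content: str, key: str, value: str) -> str:
    """Replace the [PREFERENCE] line for `key`, or append one if it is missing."""
    new_line = f"[PREFERENCE] {key}: {value}"
    out, replaced = [], False
    for line in content.splitlines():
        match = ENTRY_RE.match(line.strip())
        is_target = (
            match is not None
            and match.group(1) == "PREFERENCE"
            and match.group(2).split(":", 1)[0].strip() == key
        )
        if is_target:
            out.append(new_line)
            replaced = True
        else:
            out.append(line)
    if not replaced:
        out.append(new_line)
    return "\n".join(out) + "\n"
```

Because each fact occupies one line, an update touches only that line and leaves the other entries byte-for-byte unchanged.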

ChromaDB write filter

The LLM scores each tool result from 1 to 5 before storage, and only results scoring 3 or higher are stored. This keeps errors and empty responses from polluting semantic retrieval.
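The gate itself is small. In the sketch below the scorer is passed in as a callable so that the LLM call can be stubbed. `should_store` and `score_fn` are hypothetical names, and rejecting out-of-range scores is an assumption about how a malformed score would be handled.

```python
from typing import Callable

MIN_SCORE = 3  # threshold from the design: only scores of 3 or higher are stored


def should_store(
    tool_result: str,
    score_fn: Callable[[str], int],
    min_score: int = MIN_SCORE,
) -> bool:
    """Return True if the tool result is worth writing to the vector store."""
    score = score_fn(tool_result)
    if not 1 <= score <= 5:
        # Assumption: a score outside the 1 to 5 scale is treated as unusable.
        return False
    return score >= min_score
```

In the agent loop, `score_fn` would wrap the cheap LLM scoring call. A stub such as `lambda result: 1 if "error" in result else 4` is enough for unit tests.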

Context Injection Order (every turn)

0. AGENT.md (per skill)     ← L0: identity, hard constraints — loaded by SkillRegistry at startup
1. System prompt            ← merged from all skill /schema responses
2. preferences.md           ← L1: always loaded, skill-scoped
3. ChromaDB top-N           ← L3: cross-session semantic retrieval
4. Session: last 5 turns    ← verbatim, for conversational coherence
5. Session: older turns     ← L2: semantic top-K retrieval from session episode store
6. User message

Steps 4 and 5 replace the previous single "session history" injection. The last 5 exchanges are always included verbatim for coherence. Older history is no longer compacted into a summary blob — instead each exchange is stored as an episode in a per-session ChromaDB collection and retrieved by similarity to the current message. This prevents irrelevant older context (e.g. a prior city search) from consuming tokens when the user pivots to a new topic.
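The injection order can be expressed as an ordered list of labelled sections. The sketch below only fixes the ordering and the verbatim window of five turns. How each section maps onto chat messages, and how the semantic and episode hits are retrieved, is left open. The function name, the section labels, and the choice to drop empty sections are assumptions.

```python
VERBATIM_TURNS = 5  # the last 5 session turns are always included verbatim


def assemble_context(
    agent_md: str,
    system_prompt: str,
    preferences: str,
    semantic_hits: list[str],
    session_turns: list[str],
    episode_hits: list[str],
    user_message: str,
) -> list[tuple[str, str]]:
    """Return (label, text) sections in injection order, skipping empty ones."""
    recent = session_turns[-VERBATIM_TURNS:]
    sections = [
        ("agent_md", agent_md),                          # 0: L0 identity
        ("system_prompt", system_prompt),                # 1: merged skill prompts
        ("preferences", preferences),                    # 2: L1
        ("semantic_history", "\n".join(semantic_hits)),  # 3: L3 top-N
        ("recent_turns", "\n".join(recent)),             # 4: verbatim window
        ("session_episodes", "\n".join(episode_hits)),   # 5: L2 top-K
        ("user_message", user_message),                  # 6
    ]
    return [(label, text) for label, text in sections if text]
```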

Container Pool

Each skill gets pool_size (default 2) pre-warmed containers. Each container is published on a unique host port starting at 9000 (HOST_PORT_START env var), so the agent reaches them via http://localhost:{port} — no bridge network required. Ports are assigned sequentially and stored as Docker labels for recovery on restart.

After every tool call, the used container is destroyed and recreated in the background. This prevents side-effect bleed between calls (temp files, in-process state) while keeping the pool warm.
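The pooling behaviour can be modelled without Docker. The sketch below tracks only names, ports, and a generation counter. It assumes a recreated container reuses its original name and port, which this document implies but does not state, and it recycles synchronously where the real pool does so in the background. `PoolSketch` and its methods are hypothetical.

```python
from collections import deque
from dataclasses import dataclass
from itertools import count


@dataclass
class Container:
    skill: str
    name: str
    port: int
    generation: int = 0  # incremented each time the container is recreated


class PoolSketch:
    def __init__(self, skills: list[str], pool_size: int = 2, host_port_start: int = 9000):
        ports = count(host_port_start)  # sequential host ports across all skills
        self._idle = {
            skill: deque(
                Container(skill, f"{skill}_{i}", next(ports)) for i in range(pool_size)
            )
            for skill in skills
        }

    def acquire(self, skill: str) -> Container:
        """Take a warm container; raises IndexError if none is idle."""
        return self._idle[skill].popleft()

    def recycle(self, used: Container) -> Container:
        """Model destroy-and-recreate: a fresh instance rejoins the idle queue."""
        fresh = Container(used.skill, used.name, used.port, used.generation + 1)
        self._idle[used.skill].append(fresh)
        return fresh

    @staticmethod
    def url(container: Container) -> str:
        return f"http://localhost:{container.port}"
```

With two skills and the default pool size, this reproduces the port layout in the architecture diagram: 9000 and 9001 for `current_time`, 9002 and 9003 for `real_estate`.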

Context Compaction

One trigger:

Reactive — context overflow error caught, compact and retry

Proactive compaction of session history was removed in Phase 3. The L2 episode store retrieves only relevant older turns by similarity, which keeps token usage low without needing proactive compaction in most sessions.

Rule: never split a tool call / tool result pair across a compaction boundary.
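One way to honour this rule is to pick the split point first and then walk it back until it no longer lands on a tool result. The sketch assumes OpenAI-style messages, where an assistant message carrying `tool_calls` is followed by one or more `role: "tool"` messages. `safe_split_index` and `keep_last` are hypothetical names.

```python
def safe_split_index(messages: list[dict], keep_last: int) -> int:
    """Return i such that messages[:i] may be compacted and messages[i:] kept.

    The index never points at a tool result, so a tool call and its results
    always end up on the same side of the boundary.
    """
    i = max(len(messages) - keep_last, 0)
    # A "tool" message answers an earlier assistant tool call; move the
    # boundary back until it sits on that assistant message (or a plain one).
    while 0 < i < len(messages) and messages[i].get("role") == "tool":
        i -= 1
    return i
```

Walking backwards can only make the kept tail longer, so this never compacts more than was requested.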

API Endpoints

Endpoint	Transport	Purpose
`POST /chat`	JSON	Blocking; returns the complete response, including the `data` field
`POST /chat/stream`	SSE	Streaming; yields `token`, `data`, `hints`, and `done` events
`GET /skills`	JSON	List loaded skill names
`GET /sessions`	JSON	List session IDs, most recent first
`GET /sessions/{id}`	JSON	Full message history for a session
`DELETE /sessions/{id}`	—	Delete a session
`GET /sessions/{id}/title`	JSON	Generate a short title from the session's first user message

SSE event format (`POST /chat/stream`)

data: {"type": "token", "content": "I found "}
data: {"type": "token", "content": "3 properties..."}
data: {"type": "data", "data": {"type": "listings", "items": [...]}}
data: {"type": "hints", "hints": ["Show me with a garage", "Filter to houses only", "..."]}
data: {"type": "done", "session_id": "abc123"}

The `data` event fires only when `search_properties`, `get_property_details`, or `get_property_details_by_address` returns a non-error result. `chat.py` uses `POST /chat` and is unaffected by the streaming endpoint.
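A client can consume the stream by reading `data:` lines and dispatching on `type`. The sketch below works on any iterable of lines, so it can be fed from an HTTP client or from a test fixture. The function names and the shape of the returned summary are assumptions.

```python
import json
from typing import Any, Iterable, Iterator


def iter_events(lines: Iterable[str]) -> Iterator[dict[str, Any]]:
    """Yield the decoded JSON payload of every 'data:' line in an SSE stream."""
    for raw in lines:
        line = raw.strip()
        if line.startswith("data:"):
            yield json.loads(line[len("data:"):].strip())


def collect(lines: Iterable[str]) -> dict[str, Any]:
    """Fold a stream into the final text, structured payloads, hints, and session id."""
    tokens: list[str] = []
    payloads: list[Any] = []
    hints: list[str] = []
    session_id = None
    for event in iter_events(lines):
        kind = event.get("type")
        if kind == "token":
            tokens.append(event["content"])
        elif kind == "data":
            payloads.append(event["data"])
        elif kind == "hints":
            hints = event["hints"]
        elif kind == "done":
            session_id = event["session_id"]
            break  # "done" is the last event in the example stream above
    return {"message": "".join(tokens), "data": payloads, "hints": hints, "session_id": session_id}
```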

Key Design Decisions

Decision	Rationale
No LangChain	Full transparency over agent loop; ~50 lines vs framework abstraction
LiteLLM	Swap LLM provider via one env var, no code changes
Docker per skill	Dependency isolation + language-agnostic skill authoring
Destroy container after use	Prevents side effect bleed between calls
Typed preferences.md entries	LLM parses easier; per-entry updates without rewriting file
Score before ChromaDB write	Keeps semantic retrieval clean; one cheap LLM call per tool use
Per-skill `.env`	Skill secrets never touch host env
Host port publishing (9000+)	Agent reaches containers via localhost; no bridge network DNS needed
Full streaming	All LLM calls stream; tool call chunks accumulated before dispatch; fewer total LLM calls than partial streaming
Structured `data` field	Frontend renders property cards from `data.items` without parsing text; `chat.py` reads `message` only and is unaffected
Map only on detail view	Zillow search API does not return coordinates; detail API (`/pro/byzpid`) does — map is shown only where data is available
Click-to-detail	Clicking a card sends a pre-composed message so the agent reliably calls `get_property_details`; no new API surface needed
