Generic agent platform. Skills are pluggable Docker containers. Current time is Skill #1 (platform verification). Real estate search is Skill #2.
- User types: "find me a 3-bed house in Austin under $500k"
- Agent searches Zillow and returns matching properties as a card grid with address, price, beds, baths, sqft, and photos
- User clicks a card → agent automatically fetches full property details (no typing required); a detail card and map appear inline
- User asks for more detail: "show me more on the second one" → agent fetches full property details the same way
- User refines: "same but closer to downtown" → agent searches again with updated criteria
- User preferences (budget, location, property type) are remembered across sessions — the agent recalls them in future conversations without being told again
graph TD
subgraph Browser["Browser (PropSearch)"]
A[Chat Input] --> B[Message Stream]
B --> C{Message Type}
C -->|text| D[Text Bubble]
C -->|listings| E["PropertyGrid — grid or list"]
C -->|detail| F["PropertyGrid variant=detail"]
F --> G["MapView — OpenStreetMap"]
E -->|click card| H["auto-send detail request"]
H --> F
E --> I["View on Zillow"]
J[Sidebar] --> K["Session list with titles"]
K --> L["Delete session"]
M[Toggle] --> N["grid / list"]
end
subgraph API
O["POST /chat/stream (SSE)"]
P["GET /sessions"]
Q["GET /sessions/{id}"]
R["DELETE /sessions/{id}"]
S["GET /sessions/{id}/title"]
end
A --> O
O --> B
J --> P
K --> Q
L --> R
| View | Triggered by | Shows |
|---|---|---|
| Grid (default) | search result | Photo cards in 1–3 column responsive grid |
| List | toggle button | Compact horizontal rows: thumbnail, address, price, beds/baths/sqft |
| Detail | click card or ask about a property | Richer card (year built, lot size, HOA, Zestimate) + Leaflet map |
Grid/list preference is persisted in localStorage. Map only appears for detail results — the Zillow search API does not return coordinates; only the detail API (/pro/byzpid) does.
Clicking any property card (grid or list) automatically sends:
Show me the details for {address} (zpid: {zpid})
The agent calls get_property_details, and the detail card + map render inline below the response — no user typing required.
┌─────────────────────────────────────────────────┐
│ AGENT CORE │
│ │
│ FastAPI → AgentLoop → LiteLLM │
│ │ │
│ SkillRegistry │
│ MemoryManager │
│ SessionManager │
└──────────────────┬──────────────────────────────┘
│ HTTP localhost:{port}
┌──────────┼──────────┐
▼ ▼ ▼
localhost:9000 localhost:9002 (future skills)
(current_time_0) (real_estate_0)
localhost:9001 localhost:9003
(current_time_1) (real_estate_1)
| Component | File | Responsibility |
|---|---|---|
| AgentLoop | core/agent.py |
LiteLLM call → tool dispatch → memory save → repeat |
| ContainerPool | core/container_pool.py |
Start, pool, and execute skill containers |
| SkillRegistry | core/skill_registry.py |
Discover skills, merge tools/prompts, route dispatch |
| MemoryManager | core/memory.py |
preferences.md + ChromaDB per skill |
| SessionManager | core/session.py |
Session history CRUD |
| Sanitizer | core/sanitizer.py |
Scrub secrets from tool results before LLM sees them |
Every skill is a Docker container exposing two endpoints:
GET /schema→{ system_prompt, tools[] }— LiteLLM-format tool definitionsPOST /execute→{ tool, params }→{ result }— tool execution
Each skill directory contains two files read by the host at startup:
skills/<name>/SKILL.md— the---YAML block is machine-parsed (name, image); body is human-readable documentationskills/<name>/AGENT.md— optional; agent identity and hard constraints injected at L0. Skills without this file contribute no identity content.
---
name: current_time
image: current_time:latest
---
## Purpose
Returns the current date and time in UTC.
## Tools
- `get_current_time` — no parameters required
## Usage
No API keys or configuration needed.Skill secrets go in skills/<name>/.env — injected via --env-file at container start.
Four layers:
| Layer | MemPalace | Storage | Scope | Purpose |
|---|---|---|---|---|
| Identity | L0 | skills/{name}/AGENT.md |
Per-skill, on disk | Agent persona and hard constraints — injected at position 0 |
| Preferences | L1 | memory/{skill}/preferences.md |
Cross-session, per-skill | Explicit user facts — always injected |
| Semantic history | L3 | memory/{skill}/chroma/ |
Cross-session, per-skill | Past interactions retrieved by similarity |
| Session episodes | L2 | memory/sessions/{id}/chroma/ |
Per-session | Older session exchanges indexed for relevance retrieval |
| Session raw | — | memory/sessions/{id}.json |
Per-session | Full message history on disk — source of truth |
Typed entries — easier for LLM to parse, supports per-entry updates:
[PREFERENCE] budget_max: $500,000
[PREFERENCE] location: Austin TX
[DECISION] Exclude condos from all searches
[OBSERVATION] User prefers larger yards — refined after first search
Tool results are scored 1–5 by the LLM before storage. Only results ≥ 3 are stored. Prevents errors and empty responses from polluting semantic retrieval.
0. AGENT.md (per skill) ← L0: identity, hard constraints — loaded by SkillRegistry at startup
1. System prompt ← merged from all skill /schema responses
2. preferences.md ← L1: always loaded, skill-scoped
3. ChromaDB top-N ← L3: cross-session semantic retrieval
4. Session: last 5 turns ← verbatim, for conversational coherence
5. Session: older turns ← L2: semantic top-K retrieval from session episode store
6. User message
Steps 4 and 5 replace the previous single "session history" injection. The last 5 exchanges are always included verbatim for coherence. Older history is no longer compacted into a summary blob — instead each exchange is stored as an episode in a per-session ChromaDB collection and retrieved by similarity to the current message. This prevents irrelevant older context (e.g. a prior city search) from consuming tokens when the user pivots to a new topic.
Each skill gets pool_size (default 2) pre-warmed containers. Each container is published on a unique host port starting at 9000 (HOST_PORT_START env var), so the agent reaches them via http://localhost:{port} — no bridge network required. Ports are assigned sequentially and stored as Docker labels for recovery on restart.
After every tool call the used container is destroyed and recreated in the background — prevents side effect bleed (temp files, in-process state) while keeping the pool warm.
One trigger:
- Reactive — context overflow error caught, compact and retry
Proactive compaction of session history was removed in Phase 3. The L2 episode store retrieves only relevant older turns by similarity, which keeps token usage low without needing proactive compaction in most sessions.
Rule: never split a tool call / tool result pair across a compaction boundary.
| Endpoint | Transport | Purpose |
|---|---|---|
POST /chat |
JSON | Blocking — returns complete response including data field |
POST /chat/stream |
SSE | Streaming — yields token, data, done events |
GET /skills |
JSON | List loaded skill names |
GET /sessions |
JSON | List session IDs, most recent first |
GET /sessions/{id} |
JSON | Full message history for a session |
DELETE /sessions/{id} |
— | Delete a session |
GET /sessions/{id}/title |
JSON | Generate a short title from the session's first user message |
data: {"type": "token", "content": "I found "}
data: {"type": "token", "content": "3 properties..."}
data: {"type": "data", "data": {"type": "listings", "items": [...]}}
data: {"type": "hints", "hints": ["Show me with a garage", "Filter to houses only", "..."]}
data: {"type": "done", "session_id": "abc123"}
data event only fires when search_properties, get_property_details, or get_property_details_by_address returns a non-error result. chat.py uses POST /chat and is unaffected by the streaming endpoint.
| Decision | Rationale |
|---|---|
| No LangChain | Full transparency over agent loop; ~50 lines vs framework abstraction |
| LiteLLM | Swap LLM provider via one env var, no code changes |
| Docker per skill | Dependency isolation + language-agnostic skill authoring |
| Destroy container after use | Prevents side effect bleed between calls |
| Typed preferences.md entries | LLM parses easier; per-entry updates without rewriting file |
| Score before ChromaDB write | Keeps semantic retrieval clean; one cheap LLM call per tool use |
Per-skill .env |
Skill secrets never touch host env |
| Host port publishing (9000+) | Agent reaches containers via localhost; no bridge network DNS needed |
| Full streaming | All LLM calls stream; tool call chunks accumulated before dispatch; fewer total LLM calls than partial streaming |
Structured data field |
Frontend renders property cards from data.items without parsing text; chat.py reads message only and is unaffected |
| Map only on detail view | Zillow search API does not return coordinates; detail API (/pro/byzpid) does — map is shown only where data is available |
| Click-to-detail | Clicking a card sends a pre-composed message so the agent reliably calls get_property_details; no new API surface needed |