sequenceDiagram
actor User
participant chat.py
participant FastAPI
participant AgentLoop
participant LiteLLM
participant SkillRegistry
participant ContainerPool
participant MemoryManager
User->>chat.py: message
chat.py->>FastAPI: POST /chat
FastAPI->>AgentLoop: run(session_id, message)
AgentLoop->>MemoryManager: load preferences + ChromaDB top-N
AgentLoop->>LiteLLM: completion(tools, messages)
LiteLLM-->>AgentLoop: tool_call
AgentLoop->>SkillRegistry: execute(tool_name, params)
SkillRegistry->>ContainerPool: execute(manifest, tool, params)
ContainerPool->>ContainerPool: POST /execute to skill container
ContainerPool-->>SkillRegistry: result
SkillRegistry-->>AgentLoop: result
AgentLoop->>AgentLoop: sanitize result
AgentLoop->>AgentLoop: if save_search_criteria → save_preferences
AgentLoop->>AgentLoop: score result 1-5
AgentLoop->>MemoryManager: save_history if score >= 3
AgentLoop->>LiteLLM: completion with tool result
LiteLLM-->>AgentLoop: final response
AgentLoop->>MemoryManager: save session episode (L2)
AgentLoop-->>FastAPI: AgentResponse
FastAPI-->>chat.py: response
chat.py-->>User: print response
sequenceDiagram
actor User
participant Frontend
participant FastAPI
participant AgentLoop
participant LiteLLM
participant SkillRegistry
User->>Frontend: message
Frontend->>FastAPI: POST /chat/stream
FastAPI->>AgentLoop: run_stream(session_id, message)
loop until finish_reason=stop
AgentLoop->>LiteLLM: acompletion(stream=True)
LiteLLM-->>AgentLoop: chunk stream
AgentLoop->>AgentLoop: accumulate tool call chunks
AgentLoop->>SkillRegistry: execute(tool_name, params)
SkillRegistry-->>AgentLoop: result
end
AgentLoop-->>FastAPI: yield token events
FastAPI-->>Frontend: "data: {type:token,...}"
AgentLoop-->>FastAPI: yield data event
FastAPI-->>Frontend: "data: {type:data,...}"
AgentLoop-->>FastAPI: yield hints event
FastAPI-->>Frontend: "data: {type:hints,...}"
AgentLoop-->>FastAPI: yield done event
FastAPI-->>Frontend: "data: {type:done,...}"
flowchart TD
A[Build context] --> C[LiteLLM call]
C -->|finish_reason: stop| E[Save session]
C -->|finish_reason: tool_calls| F[Dispatch tool calls]
C -->|ContextWindowExceededError| G[Reactive compact]
G --> C
F --> H[Sanitize result]
H --> H1{save_search_criteria?}
H1 -->|Yes| H2[Save preferences] --> I
H1 -->|No| I[Score result 1-5]
I -->|"score >= 3"| J[Save to ChromaDB]
I -->|"score < 3"| K[Discard]
J --> L[Append to history]
K --> L
L --> A
E --> N[Save L2 episode]
N --> O[Generate hints]
O --> M[Return / yield done]
graph LR
subgraph Identity ["Identity Layer (per skill, on disk)"]
AM["skills/{name}/AGENT.md\nL0 — identity, hard constraints\ninjected at position 0"]
end
subgraph Session ["Session Layer (per conversation)"]
SJ["sessions/id.json\nfull message history\n(source of truth)"]
SE["sessions/id/chroma/\nL2 episode store\nolder turns indexed by embedding"]
end
subgraph Persistent ["Persistent Layer (per skill, cross-session)"]
PM["preferences.md\nL1 — PREFERENCE / DECISION / OBSERVATION\nalways injected"]
CD["skill/chroma/\nL3 — semantic history\nscore >= 3 only"]
end
AgentLoop["AgentLoop"] -->|"position 0"| AM
AgentLoop -->|"last 5 turns verbatim"| SJ
AgentLoop -->|"older turns: top-K by similarity"| SE
AgentLoop -->|"always inject"| PM
AgentLoop -->|"top-N by similarity"| CD
SJ -->|"each exchange saved as episode"| SE
stateDiagram-v2
[*] --> Warm : start() x pool_size
Warm --> InUse : execute() checks out container
InUse --> Recreating : tool call completes
Recreating --> Warm : new container ready