Skip to content

Latest commit

 

History

History
141 lines (116 loc) · 4.05 KB

File metadata and controls

141 lines (116 loc) · 4.05 KB

Architecture

Request Flow

sequenceDiagram
    actor User
    participant chat.py
    participant FastAPI
    participant AgentLoop
    participant LiteLLM
    participant SkillRegistry
    participant ContainerPool
    participant MemoryManager

    User->>chat.py: message
    chat.py->>FastAPI: POST /chat
    FastAPI->>AgentLoop: run(session_id, message)

    AgentLoop->>MemoryManager: load preferences + ChromaDB top-N
    AgentLoop->>LiteLLM: completion(tools, messages)
    LiteLLM-->>AgentLoop: tool_call

    AgentLoop->>SkillRegistry: execute(tool_name, params)
    SkillRegistry->>ContainerPool: execute(manifest, tool, params)
    ContainerPool->>ContainerPool: POST /execute to skill container
    ContainerPool-->>SkillRegistry: result
    SkillRegistry-->>AgentLoop: result

    AgentLoop->>AgentLoop: sanitize result
    AgentLoop->>AgentLoop: if save_search_criteria → save_preferences
    AgentLoop->>AgentLoop: score result 1-5
    AgentLoop->>MemoryManager: save_history if score >= 3
    AgentLoop->>LiteLLM: completion with tool result
    LiteLLM-->>AgentLoop: final response

    AgentLoop->>MemoryManager: save session episode (L2)
    AgentLoop-->>FastAPI: AgentResponse
    FastAPI-->>chat.py: response
    chat.py-->>User: print response
Loading

Streaming Request Flow

sequenceDiagram
    actor User
    participant Frontend
    participant FastAPI
    participant AgentLoop
    participant LiteLLM
    participant SkillRegistry

    User->>Frontend: message
    Frontend->>FastAPI: POST /chat/stream
    FastAPI->>AgentLoop: run_stream(session_id, message)

    loop until finish_reason=stop
        AgentLoop->>LiteLLM: acompletion(stream=True)
        LiteLLM-->>AgentLoop: chunk stream
        AgentLoop->>AgentLoop: accumulate tool call chunks
        AgentLoop->>SkillRegistry: execute(tool_name, params)
        SkillRegistry-->>AgentLoop: result
    end

    AgentLoop-->>FastAPI: yield token events
    FastAPI-->>Frontend: "data: {type:token,...}"
    AgentLoop-->>FastAPI: yield data event
    FastAPI-->>Frontend: "data: {type:data,...}"
    AgentLoop-->>FastAPI: yield hints event
    FastAPI-->>Frontend: "data: {type:hints,...}"
    AgentLoop-->>FastAPI: yield done event
    FastAPI-->>Frontend: "data: {type:done,...}"
Loading

Agent Loop

flowchart TD
    A[Build context] --> C[LiteLLM call]
    C -->|finish_reason: stop| E[Save session]
    C -->|finish_reason: tool_calls| F[Dispatch tool calls]
    C -->|ContextWindowExceededError| G[Reactive compact]
    G --> C
    F --> H[Sanitize result]
    H --> H1{save_search_criteria?}
    H1 -->|Yes| H2[Save preferences] --> I
    H1 -->|No| I[Score result 1-5]
    I -->|"score >= 3"| J[Save to ChromaDB]
    I -->|"score < 3"| K[Discard]
    J --> L[Append to history]
    K --> L
    L --> A
    E --> N[Save L2 episode]
    N --> O[Generate hints]
    O --> M[Return / yield done]
Loading

Memory Model

graph LR
    subgraph Identity ["Identity Layer (per skill, on disk)"]
        AM["skills/{name}/AGENT.md\nL0 — identity, hard constraints\ninjected at position 0"]
    end

    subgraph Session ["Session Layer (per conversation)"]
        SJ["sessions/id.json\nfull message history\n(source of truth)"]
        SE["sessions/id/chroma/\nL2 episode store\nolder turns indexed by embedding"]
    end

    subgraph Persistent ["Persistent Layer (per skill, cross-session)"]
        PM["preferences.md\nL1 — PREFERENCE / DECISION / OBSERVATION\nalways injected"]
        CD["skill/chroma/\nL3 — semantic history\nscore >= 3 only"]
    end

    AgentLoop["AgentLoop"] -->|"position 0"| AM
    AgentLoop -->|"last 5 turns verbatim"| SJ
    AgentLoop -->|"older turns: top-K by similarity"| SE
    AgentLoop -->|"always inject"| PM
    AgentLoop -->|"top-N by similarity"| CD
    SJ -->|"each exchange saved as episode"| SE
Loading

Container Pool Lifecycle

stateDiagram-v2
    [*] --> Warm : start() x pool_size
    Warm --> InUse : execute() checks out container
    InUse --> Recreating : tool call completes
    Recreating --> Warm : new container ready
Loading