| 1 | +@startuml |
| 2 | + |
| 3 | +participant Client |
| 4 | +participant Endpoint as "Query Endpoint handler" |
| 5 | +participant Auth |
| 6 | +participant LlamaStack as "Llama Stack Client" |
| 7 | +participant Cache |
| 8 | + |
| 9 | +Client->>Endpoint: POST /query + attachments |
| 10 | +Endpoint->>Auth: Validate auth & permissions |
| 11 | +Auth-->>Endpoint: Authorized ✓ |
| 12 | +Endpoint->>Auth: Check config & token quota |
| 13 | +Auth-->>Endpoint: Config valid, tokens available |
| 14 | +Endpoint->>DB: Retrieve user conversation (optional) |
| 15 | +DB-->>Endpoint: UserConversation or None |
| 16 | +Endpoint->>Endpoint: Select model/provider from hints/config |
| 17 | +Endpoint->>LlamaStack: Get model capabilities |
| 18 | +LlamaStack-->>Endpoint: Capabilities response |
| 19 | +Endpoint->>Endpoint: Build system prompt, toolgroups, MCP headers |
| 20 | +Endpoint->>LlamaStack: Create turn (agent interaction) |
| 21 | +LlamaStack-->>Endpoint: Turn response + tool calls + RAG chunks |
| 22 | +Endpoint->>Endpoint: Parse metadata & referenced documents |
| 23 | +Endpoint->>Endpoint: Transform to QueryResponse |
| 24 | +Endpoint->>DB: Persist conversation metadata (model, topic, count) |
| 25 | +Endpoint->>Cache: Store conversation with timing metadata |
| 26 | +Endpoint-->>Client: Return QueryResponse + token metrics |
| 27 | + |
| 28 | +alt Connection Error |
| 29 | + LlamaStack-->>Endpoint: APIConnectionError |
| 30 | + Endpoint-->>Client: HTTP 500 |
| 31 | +end |
| 32 | + |
| 33 | +alt Quota Exceeded |
| 34 | + Auth-->>Endpoint: Rate limit violation |
| 35 | + Endpoint-->>Client: HTTP 429 |
| 36 | +end |
| 37 | + |
| 38 | +alt Invalid Request |
| 39 | + Endpoint->>Endpoint: Detect missing/invalid conversation or attachments |
| 40 | + Endpoint-->>Client: HTTP 400/403/404 |
| 41 | +end |
| 42 | + |
| 43 | +@enduml |
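
The three `alt` blocks in the diagram amount to a mapping from failure type to HTTP status. The sketch below shows one way that mapping might look in Python. It is illustrative only: `QuotaExceededError`, `InvalidRequestError`, and `status_for_error` are hypothetical names, and `APIConnectionError` is defined here as a local stand-in rather than imported from any client library. Re-raising unknown exceptions is also an assumption, since the diagram does not cover that case.

```python
class APIConnectionError(Exception):
    """Local stand-in for the connection error shown in the diagram."""


class QuotaExceededError(Exception):
    """Hypothetical: raised when the token quota check fails."""


class InvalidRequestError(Exception):
    """Hypothetical: carries the specific 4xx status for a bad request."""

    def __init__(self, status: int = 400) -> None:
        super().__init__(f"invalid request ({status})")
        self.status = status


def status_for_error(exc: Exception) -> int:
    """Map a failure in the query flow to the HTTP status from the diagram."""
    if isinstance(exc, APIConnectionError):
        return 500  # "Connection Error" block
    if isinstance(exc, QuotaExceededError):
        return 429  # "Quota Exceeded" block
    if isinstance(exc, InvalidRequestError):
        # The diagram lists 400, 403 and 404; anything else falls back to 400.
        return exc.status if exc.status in (400, 403, 404) else 400
    # Assumption: failures the diagram does not cover are left to the caller.
    raise exc
```

A handler could call `status_for_error` inside its `except` clause and build the error response from the returned code, which keeps the status decisions in one place.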