|
| 1 | +# NeMo-Guardrails LLMRails Refactor |
| 2 | + |
| 3 | +## High-Level Request Flow |
| 4 | + |
| 5 | +```mermaid |
| 6 | +flowchart TD |
| 7 | + Start([Client Request]) --> Entry[LLMRails.generate_async] |
| 8 | +
|
| 9 | + Entry --> Validate{Validate Input} |
| 10 | + Validate -->|prompt or messages?| Convert[Convert to Messages Format] |
| 11 | +
|
| 12 | + Convert --> ProcessOptions[Process Generation Options] |
| 13 | + ProcessOptions --> InitContext[Initialize Context Variables] |
| 14 | + InitContext --> InjectOptions[Inject Options into Messages] |
| 15 | +
|
| 16 | + InjectOptions --> EventTranslation[EventTranslator.messages_to_events] |
| 17 | + EventTranslation --> CheckCache{Check Event Cache<br/>Colang 1.0 only} |
| 18 | + CheckCache -->|Cache Hit| UseCached[Use Cached Events] |
| 19 | + CheckCache -->|Cache Miss| Transform[Transform Messages to Events] |
| 20 | + UseCached --> Events[Event List] |
| 21 | + Transform --> Events |
| 22 | +
|
| 23 | + Events --> RuntimeOrch[RuntimeOrchestrator.generate_events] |
| 24 | +
|
| 25 | + RuntimeOrch --> VersionCheck{Colang Version?} |
| 26 | +
|
| 27 | + VersionCheck -->|1.0| Runtime1[RuntimeV1_0.generate_events] |
| 28 | + VersionCheck -->|2.x| Runtime2[RuntimeV2_x.process_events] |
| 29 | +
|
| 30 | + Runtime1 --> ExecuteFlows1[Execute Colang 1.0 Flows] |
| 31 | + Runtime2 --> ExecuteFlows2[Execute Colang 2.x Flows] |
| 32 | +
|
| 33 | + ExecuteFlows1 --> Rails |
| 34 | + ExecuteFlows2 --> Rails |
| 35 | +
|
| 36 | + subgraph Rails["Rails Processing"] |
| 37 | + InputRails[Input Rails] --> DialogRails[Dialog Rails] |
| 38 | + DialogRails --> RetrievalRails[Retrieval Rails] |
| 39 | + RetrievalRails --> GenerationRails[Generation Rails] |
| 40 | + GenerationRails --> OutputRails[Output Rails] |
| 41 | + end |
| 42 | +
|
| 43 | + Rails --> Actions[Execute Actions] |
| 44 | +
|
| 45 | + subgraph Actions["Action Execution"] |
| 46 | + SelfCheck[self_check_input/output] |
| 47 | + LLMGeneration[LLM Generation Actions] |
| 48 | + KBRetrieval[KB Retrieval Actions] |
| 49 | + CustomActions[Custom Registered Actions] |
| 50 | + end |
| 51 | +
|
| 52 | + Actions --> NewEvents[New Events Generated] |
| 53 | + NewEvents --> CacheUpdate{Update Cache?<br/>Colang 1.0 only} |
| 54 | + CacheUpdate -->|Yes| UpdateCache[Update Event Cache] |
| 55 | + CacheUpdate -->|No| AssembleResponse |
| 56 | + UpdateCache --> AssembleResponse |
| 57 | +
|
| 58 | + AssembleResponse[ResponseAssembler.assemble_response] |
| 59 | + AssembleResponse --> ExtractData[Extract Responses & Metadata] |
| 60 | + ExtractData --> BuildMessage[Build Response Message] |
| 61 | + BuildMessage --> AddMetadata[Add Tool Calls, Reasoning, etc.] |
| 62 | +
|
| 63 | + AddMetadata --> CreateLog{Include Log?} |
| 64 | + CreateLog -->|Yes| ComputeLog[Compute Generation Log] |
| 65 | + CreateLog -->|No| FinalResponse |
| 66 | + ComputeLog --> FinalResponse[GenerationResponse Object] |
| 67 | +
|
| 68 | + FinalResponse --> Tracing{Tracing Enabled?} |
| 69 | + Tracing -->|Yes| ExportTraces[Export Traces] |
| 70 | + Tracing -->|No| Return |
| 71 | + ExportTraces --> Return |
| 72 | +
|
| 73 | + Return([Return Response to Client]) |
| 74 | +
|
| 75 | + style Start fill:#e1f5e1 |
| 76 | + style Return fill:#e1f5e1 |
| 77 | + style Rails fill:#fff4e6 |
| 78 | + style Actions fill:#e6f3ff |
| 79 | +``` |
| 80 | + |
| 81 | +## Streaming Request Flow |
| 82 | + |
| 83 | +```mermaid |
| 84 | +sequenceDiagram |
| 85 | + participant Client |
| 86 | + participant LLMRails |
| 87 | + participant StreamHandler as StreamingHandler |
| 88 | + participant EventTranslator |
| 89 | + participant RuntimeOrch as RuntimeOrchestrator |
| 90 | + participant Runtime |
| 91 | + participant LLMGen as LLM Generation |
| 92 | + participant OutputRails as Output Rails |
| 93 | +
|
| 94 | + Client->>LLMRails: stream_async(messages) |
| 95 | + LLMRails->>StreamHandler: Create StreamingHandler |
| 96 | +
|
| 97 | + par Generation Task |
| 98 | + LLMRails->>LLMRails: generate_async(with streaming_handler) |
| 99 | + LLMRails->>EventTranslator: messages_to_events |
| 100 | + EventTranslator-->>LLMRails: events |
| 101 | + LLMRails->>RuntimeOrch: generate_events |
| 102 | + RuntimeOrch->>Runtime: process events |
| 103 | + Runtime->>LLMGen: Execute generation actions |
| 104 | + LLMGen->>StreamHandler: push_chunk (tokens) |
| 105 | + LLMGen->>StreamHandler: push_chunk (tokens) |
| 106 | + LLMGen->>StreamHandler: push_chunk (tokens) |
| 107 | + LLMGen-->>Runtime: Complete |
| 108 | + Runtime-->>RuntimeOrch: new_events |
| 109 | + RuntimeOrch-->>LLMRails: new_events |
| 110 | + LLMRails->>StreamHandler: push_chunk(END_OF_STREAM) |
| 111 | + end |
| 112 | +
|
| 113 | + alt Output Rails Enabled |
| 114 | + loop For each chunk batch |
| 115 | + StreamHandler->>OutputRails: Buffer chunks |
| 116 | + OutputRails->>Runtime: Check output rails |
| 117 | + Runtime-->>OutputRails: allowed/blocked |
| 118 | + alt Not Blocked |
| 119 | + OutputRails->>Client: Yield chunks |
| 120 | + else Blocked |
| 121 | + OutputRails->>Client: Yield error JSON |
| 122 | + OutputRails->>Client: STOP |
| 123 | + end |
| 124 | + end |
| 125 | + else No Output Rails |
| 126 | + loop Streaming |
| 127 | + StreamHandler->>Client: Yield token |
| 128 | + end |
| 129 | + end |
| 130 | +``` |
| 131 | + |
| 132 | +## Key Components Description |
| 133 | + |
| 134 | +### LLMRails |
| 135 | +- **Purpose**: Main entry point for the guardrails system |
| 136 | +- **Key Methods**: |
| 137 | + - `generate_async()`: Main generation method |
| 138 | + - `stream_async()`: Streaming generation |
| 139 | + - `register_action()`: Register custom actions |
| 140 | +- **Responsibilities**: Coordinates all components and manages the request lifecycle |
| 141 | + |
| 142 | +### EventTranslator |
| 143 | +- **Purpose**: Convert between message format and internal event format |
| 144 | +- **Features**: |
| 145 | + - Caches message-to-event mappings (Colang 1.0) |
| 146 | + - Handles both Colang 1.0 and 2.x formats |
| 147 | + - Supports context injection |
| 148 | + |
| 149 | +### RuntimeOrchestrator |
| 150 | +- **Purpose**: Manages the Colang runtime execution |
| 151 | +- **Features**: |
| 152 | + - Version-aware (Colang 1.0 vs 2.x) |
| 153 | + - Process events through flows |
| 154 | + - Coordinate action execution |
| 155 | + |
| 156 | +### RuntimeV1_0 / RuntimeV2_x |
| 157 | +- **Purpose**: Execute Colang flows and manage state |
| 158 | +- **Features**: |
| 159 | + - Flow execution engine |
| 160 | + - Action dispatcher |
| 161 | + - State management |
| 162 | + - Event processing |
| 163 | + |
| 164 | +### LLM Generation Actions |
| 165 | +- **Purpose**: Handle LLM calls for various tasks |
| 166 | +- **Key Actions**: |
| 167 | + - `generate_user_intent`: Canonical form generation |
| 168 | + - `generate_next_step`: Next step prediction |
| 169 | + - `generate_bot_message`: Response generation |
| 170 | + - `retrieve_relevant_chunks`: KB retrieval |
| 171 | + |
| 172 | +### ResponseAssembler |
| 173 | +- **Purpose**: Build final response from events |
| 174 | +- **Features**: |
| 175 | + - Extract bot messages |
| 176 | + - Handle tool calls |
| 177 | + - Include reasoning content |
| 178 | + - Generate logs |
| 179 | + - Compute state for next request |
| 180 | + |
| 181 | +### ModelFactory |
| 182 | +- **Purpose**: Manage LLM instances |
| 183 | +- **Features**: |
| 184 | + - Main LLM initialization |
| 185 | + - Specialized LLMs (embeddings, fact-checking, etc.) |
| 186 | + - Model configuration |
| 187 | + - Streaming support detection |
| 188 | + |
| 189 | +### KnowledgeBaseBuilder |
| 190 | +- **Purpose**: Build and manage knowledge base |
| 191 | +- **Features**: |
| 192 | + - Vector store creation |
| 193 | + - Document indexing |
| 194 | + - Embedding generation |
| 195 | + - Retrieval support |
0 commit comments