Skip to content

Commit defd95a

Browse files
Add documentation for LLMRails refactor
1 parent 40df9bc commit defd95a

1 file changed

Lines changed: 195 additions & 0 deletions

File tree

llmrails_refactor.md

Lines changed: 195 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,195 @@
1+
# NeMo-Guardrails LLMRails Refactor
2+
3+
## High-Level Request Flow
4+
5+
```mermaid
6+
flowchart TD
7+
Start([Client Request]) --> Entry[LLMRails.generate_async]
8+
9+
Entry --> Validate{Validate Input}
10+
Validate -->|prompt or messages?| Convert[Convert to Messages Format]
11+
12+
Convert --> ProcessOptions[Process Generation Options]
13+
ProcessOptions --> InitContext[Initialize Context Variables]
14+
InitContext --> InjectOptions[Inject Options into Messages]
15+
16+
InjectOptions --> EventTranslation[EventTranslator.messages_to_events]
17+
EventTranslation --> CheckCache{Check Event Cache<br/>Colang 1.0 only}
18+
CheckCache -->|Cache Hit| UseCached[Use Cached Events]
19+
CheckCache -->|Cache Miss| Transform[Transform Messages to Events]
20+
UseCached --> Events[Event List]
21+
Transform --> Events
22+
23+
Events --> RuntimeOrch[RuntimeOrchestrator.generate_events]
24+
25+
RuntimeOrch --> VersionCheck{Colang Version?}
26+
27+
VersionCheck -->|1.0| Runtime1[RuntimeV1_0.generate_events]
28+
VersionCheck -->|2.x| Runtime2[RuntimeV2_x.process_events]
29+
30+
Runtime1 --> ExecuteFlows1[Execute Colang 1.0 Flows]
31+
Runtime2 --> ExecuteFlows2[Execute Colang 2.x Flows]
32+
33+
ExecuteFlows1 --> Rails
34+
ExecuteFlows2 --> Rails
35+
36+
subgraph Rails["Rails Processing"]
37+
InputRails[Input Rails] --> DialogRails[Dialog Rails]
38+
DialogRails --> RetrievalRails[Retrieval Rails]
39+
RetrievalRails --> GenerationRails[Generation Rails]
40+
GenerationRails --> OutputRails[Output Rails]
41+
end
42+
43+
Rails --> Actions[Execute Actions]
44+
45+
subgraph Actions["Action Execution"]
46+
SelfCheck[self_check_input/output]
47+
LLMGeneration[LLM Generation Actions]
48+
KBRetrieval[KB Retrieval Actions]
49+
CustomActions[Custom Registered Actions]
50+
end
51+
52+
Actions --> NewEvents[New Events Generated]
53+
NewEvents --> CacheUpdate{Update Cache?<br/>Colang 1.0 only}
54+
CacheUpdate -->|Yes| UpdateCache[Update Event Cache]
55+
CacheUpdate -->|No| AssembleResponse
56+
UpdateCache --> AssembleResponse
57+
58+
AssembleResponse[ResponseAssembler.assemble_response]
59+
AssembleResponse --> ExtractData[Extract Responses & Metadata]
60+
ExtractData --> BuildMessage[Build Response Message]
61+
BuildMessage --> AddMetadata[Add Tool Calls, Reasoning, etc.]
62+
63+
AddMetadata --> CreateLog{Include Log?}
64+
CreateLog -->|Yes| ComputeLog[Compute Generation Log]
65+
CreateLog -->|No| FinalResponse
66+
ComputeLog --> FinalResponse[GenerationResponse Object]
67+
68+
FinalResponse --> Tracing{Tracing Enabled?}
69+
Tracing -->|Yes| ExportTraces[Export Traces]
70+
Tracing -->|No| Return
71+
ExportTraces --> Return
72+
73+
Return([Return Response to Client])
74+
75+
style Start fill:#e1f5e1
76+
style Return fill:#e1f5e1
77+
style Rails fill:#fff4e6
78+
style Actions fill:#e6f3ff
79+
```
80+
81+
## Streaming Request Flow
82+
83+
```mermaid
84+
sequenceDiagram
85+
participant Client
86+
participant LLMRails
87+
participant StreamHandler as StreamingHandler
88+
participant EventTranslator
89+
participant RuntimeOrch as RuntimeOrchestrator
90+
participant Runtime
91+
participant LLMGen as LLM Generation
92+
participant OutputRails as Output Rails
93+
94+
Client->>LLMRails: stream_async(messages)
95+
LLMRails->>StreamHandler: Create StreamingHandler
96+
97+
par Generation Task
98+
LLMRails->>LLMRails: generate_async(with streaming_handler)
99+
LLMRails->>EventTranslator: messages_to_events
100+
EventTranslator-->>LLMRails: events
101+
LLMRails->>RuntimeOrch: generate_events
102+
RuntimeOrch->>Runtime: process events
103+
Runtime->>LLMGen: Execute generation actions
104+
LLMGen->>StreamHandler: push_chunk (tokens)
105+
LLMGen->>StreamHandler: push_chunk (tokens)
106+
LLMGen->>StreamHandler: push_chunk (tokens)
107+
LLMGen-->>Runtime: Complete
108+
Runtime-->>RuntimeOrch: new_events
109+
RuntimeOrch-->>LLMRails: new_events
110+
LLMRails->>StreamHandler: push_chunk(END_OF_STREAM)
111+
end
112+
113+
alt Output Rails Enabled
114+
loop For each chunk batch
115+
StreamHandler->>OutputRails: Buffer chunks
116+
OutputRails->>Runtime: Check output rails
117+
Runtime-->>OutputRails: allowed/blocked
118+
alt Not Blocked
119+
OutputRails->>Client: Yield chunks
120+
else Blocked
121+
OutputRails->>Client: Yield error JSON
122+
OutputRails->>Client: STOP
123+
end
124+
end
125+
else No Output Rails
126+
loop Streaming
127+
StreamHandler->>Client: Yield token
128+
end
129+
end
130+
```
131+
132+
## Key Components Description
133+
134+
### LLMRails
135+
- **Purpose**: Main entry point for the guardrails system
136+
- **Key Methods**:
137+
- `generate_async()`: Main generation method
138+
- `stream_async()`: Streaming generation
139+
- `register_action()`: Register custom actions
140+
- **Responsibilities**: Coordinates all components and manages the request lifecycle
141+
142+
### EventTranslator
143+
- **Purpose**: Convert between message format and internal event format
144+
- **Features**:
145+
- Caches message-to-event mappings (Colang 1.0)
146+
- Handles both Colang 1.0 and 2.x formats
147+
- Supports context injection
148+
149+
### RuntimeOrchestrator
150+
- **Purpose**: Manages the Colang runtime execution
151+
- **Features**:
152+
- Version-aware (Colang 1.0 vs 2.x)
153+
- Process events through flows
154+
- Coordinate action execution
155+
156+
### RuntimeV1_0 / RuntimeV2_x
157+
- **Purpose**: Execute Colang flows and manage state
158+
- **Features**:
159+
- Flow execution engine
160+
- Action dispatcher
161+
- State management
162+
- Event processing
163+
164+
### LLM Generation Actions
165+
- **Purpose**: Handle LLM calls for various tasks
166+
- **Key Actions**:
167+
- `generate_user_intent`: Canonical form generation
168+
- `generate_next_step`: Next step prediction
169+
- `generate_bot_message`: Response generation
170+
- `retrieve_relevant_chunks`: KB retrieval
171+
172+
### ResponseAssembler
173+
- **Purpose**: Build final response from events
174+
- **Features**:
175+
- Extract bot messages
176+
- Handle tool calls
177+
- Include reasoning content
178+
- Generate logs
179+
- Compute state for next request
180+
181+
### ModelFactory
182+
- **Purpose**: Manage LLM instances
183+
- **Features**:
184+
- Main LLM initialization
185+
- Specialized LLMs (embeddings, fact-checking, etc.)
186+
- Model configuration
187+
- Streaming support detection
188+
189+
### KnowledgeBaseBuilder
190+
- **Purpose**: Build and manage knowledge base
191+
- **Features**:
192+
- Vector store creation
193+
- Document indexing
194+
- Embedding generation
195+
- Retrieval support

0 commit comments

Comments
 (0)