|
| 1 | +# LLMs |
| 2 | + |
| 3 | +The graph engine has no concept of LLMs or tools. A node is just an |
| 4 | +async function that reads typed state and returns a partial update. |
| 5 | +Calling an LLM is one of the things a node can do during that call, the |
| 6 | +same way it might read a file, hit a database, or invoke an internal |
| 7 | +service. This page covers the patterns that emerge once you start |
| 8 | +mixing LLM calls into graph nodes. |
| 9 | + |
| 10 | +## LLM calls are async IO inside a node |
| 11 | + |
| 12 | +Construct one [`Provider`](../reference/llm.md) at startup and share it |
| 13 | +across nodes. Each `complete()` call carries the full message list and |
| 14 | +returns a [`Response`](../reference/llm.md); the provider is stateless |
| 15 | +and reentrant, so multiple nodes (or fan-out instances) can call into |
| 16 | +it concurrently without coordination. |
| 17 | + |
| 18 | +```python |
| 19 | +import os |
| 20 | +from openarmature.llm import OpenAIProvider, UserMessage |
| 21 | + |
| 22 | +provider = OpenAIProvider( |
| 23 | + base_url=os.environ.get("LLM_BASE_URL", "https://api.openai.com"), |
| 24 | + model="gpt-4o-mini", |
| 25 | + api_key=os.environ["LLM_API_KEY"], |
| 26 | +) |
| 27 | + |
| 28 | + |
| 29 | +async def analyze(state: AnalysisState) -> dict: |
| 30 | + response = await provider.complete( |
| 31 | + [UserMessage(content=state.text)], |
| 32 | + ) |
| 33 | + return {"raw": response.message.content} |
| 34 | +``` |
| 35 | + |
| 36 | +The provider goes wherever your application's other long-lived |
| 37 | +dependencies go: module-level constant, dependency-injection |
| 38 | +container, factory function. It does not need to be constructed per |
| 39 | +call, and constructing it cheaply (no eager network calls) means |
| 40 | +import-time setup is fine. |
| 41 | + |
| 42 | +A real graph hits LLMs from multiple nodes. The conventional shape: |
| 43 | + |
| 44 | +```python |
| 45 | +async def classify(state): # one provider call |
| 46 | + response = await provider.complete(...) |
| 47 | + return {...} |
| 48 | + |
| 49 | +async def extract(state): # another provider call |
| 50 | + response = await provider.complete(...) |
| 51 | + return {...} |
| 52 | + |
| 53 | +async def synthesize(state): # a third |
| 54 | + response = await provider.complete(...) |
| 55 | + return {...} |
| 56 | +``` |
| 57 | + |
| 58 | +The graph composes the order; the provider sees three independent |
| 59 | +stateless calls. Conversational memory (if you want it) is the |
| 60 | +caller's responsibility: thread it through state and pass the |
| 61 | +accumulated message list into each call. |
| 62 | + |
| 63 | +## Structured output |
| 64 | + |
| 65 | +Every LLM-using node that produces typed data ends up with the same |
| 66 | +shape: render a prompt, call the model, parse the response as JSON, |
| 67 | +validate it against the expected schema, retry on parse or validation |
| 68 | +failure. Five steps of boilerplate that differ only in the schema and |
| 69 | +the prompt. |
| 70 | + |
| 71 | +Structured output collapses that into one parameter. Pass a |
| 72 | +`response_schema` to `complete()` and the provider: |
| 73 | + |
| 74 | +1. Tells the model on the wire to produce schema-conforming output. |
| 75 | +2. Parses and validates the response against the schema. |
| 76 | +3. Surfaces the validated value on `Response.parsed`. |
| 77 | +4. Raises `StructuredOutputInvalid` on parse or validation failure. |
| 78 | + |
| 79 | +Two forms are accepted: a Pydantic class (typed-instance return) and a |
| 80 | +JSON Schema dict (raw-dict return). Same wire shape underneath. |
| 81 | + |
| 82 | +### Pydantic class form |
| 83 | + |
| 84 | +```python |
| 85 | +from pydantic import BaseModel |
| 86 | + |
| 87 | +class Classification(BaseModel): |
| 88 | + intent: Literal["research", "summarize"] |
| 89 | + rationale: str |
| 90 | + |
| 91 | + |
| 92 | +async def classify(state): |
| 93 | + response = await provider.complete( |
| 94 | + [UserMessage(content=f"Route this query: {state.query!r}")], |
| 95 | + response_schema=Classification, |
| 96 | + ) |
| 97 | + return {"classification": response.parsed} |
| 98 | +``` |
| 99 | + |
| 100 | +`Response.parsed` is a validated `Classification` instance. Field |
| 101 | +access is statically typed (`response.parsed.intent` returns |
| 102 | +`Literal["research", "summarize"]`); the framework calls |
| 103 | +`.model_json_schema()` under the hood to derive the wire body and |
| 104 | +`.model_validate()` to deserialize the response. |
| 105 | + |
| 106 | +### JSON Schema dict form |
| 107 | + |
| 108 | +```python |
| 109 | +async def research(state): |
| 110 | + response = await provider.complete( |
| 111 | + [UserMessage(content=f"Plan research: {state.query!r}")], |
| 112 | + response_schema={ |
| 113 | + "type": "object", |
| 114 | + "properties": { |
| 115 | + "topics": {"type": "array", "items": {"type": "string"}}, |
| 116 | + "follow_up_questions": {"type": "array", "items": {"type": "string"}}, |
| 117 | + }, |
| 118 | + "required": ["topics", "follow_up_questions"], |
| 119 | + "additionalProperties": False, |
| 120 | + }, |
| 121 | + ) |
| 122 | + return {"research_plan": response.parsed} |
| 123 | +``` |
| 124 | + |
| 125 | +`Response.parsed` is a `dict[str, Any]` populated per the schema. Use |
| 126 | +this when the shape is dynamic, generated, or borrowed from another |
| 127 | +system that already speaks JSON Schema. |
| 128 | + |
| 129 | +### Wire paths: native and fallback |
| 130 | + |
| 131 | +Real `OpenAIProvider` traffic uses OpenAI's native `response_format` |
| 132 | +field on the request body, so the model produces schema-conforming |
| 133 | +output in one trip. Some OpenAI-compatible servers (older vLLM, some |
| 134 | +LM Studio releases, llama.cpp variants) either reject `response_format` |
| 135 | +with a 400 or silently ignore it. For those, construct the provider |
| 136 | +with `force_prompt_augmentation_fallback=True`: |
| 137 | + |
| 138 | +```python |
| 139 | +provider = OpenAIProvider( |
| 140 | + base_url="http://localhost:8000", |
| 141 | + model="some-local-model", |
| 142 | + force_prompt_augmentation_fallback=True, # opt into the fallback |
| 143 | +) |
| 144 | +``` |
| 145 | + |
| 146 | +In fallback mode the provider prepends a system directive containing |
| 147 | +the serialized schema, omits `response_format` from the wire, and |
| 148 | +parses-and-validates the response post-receive. The behavioral contract |
| 149 | +is identical: `Response.parsed` populates the same way; failures raise |
| 150 | +`StructuredOutputInvalid` the same way. The |
| 151 | +`uses_prompt_augmentation_fallback` read-only property lets callers |
| 152 | +inspect which path is active. |
| 153 | + |
| 154 | +### Strict mode |
| 155 | + |
| 156 | +OpenAI's native path supports a `strict: true` flag that engages the |
| 157 | +model's schema-constrained decoding (the model literally cannot emit |
| 158 | +non-conforming tokens). It applies only when the schema satisfies |
| 159 | +specific constraints: `additionalProperties` explicitly `false` on every |
| 160 | +object, every key in `properties` listed in `required`, no |
| 161 | +unresolvable `$ref` targets. |
| 162 | + |
| 163 | +`strict_mode_supported(schema)` performs the deep recursive check. The |
| 164 | +provider passes `strict: true` to the wire when the schema satisfies |
| 165 | +it, and `strict: false` otherwise. Either way, the provider validates |
| 166 | +the response post-receive against the supplied schema. Strict is a |
| 167 | +wire-level optimization, not a correctness requirement. |
| 168 | + |
| 169 | +If you control the schema, prefer making it strict-compatible: |
| 170 | +explicit `additionalProperties: false` plus `required` covering every |
| 171 | +property. Pydantic-derived schemas may need a tweak to satisfy this |
| 172 | +(`model_config = ConfigDict(extra="forbid")` on the class). |
| 173 | + |
| 174 | +## Routing on parsed fields |
| 175 | + |
| 176 | +A conditional edge is a function `state -> str` that names the next |
| 177 | +node. The string can come from anywhere: a hard-coded rule, a lookup |
| 178 | +table, the parsed output of an LLM call. The graph engine doesn't |
| 179 | +distinguish. |
| 180 | + |
| 181 | +This means LLM-driven routing and deterministic routing have the same |
| 182 | +shape. A classifier node writes its parsed `Classification` to state; |
| 183 | +the conditional edge reads `state.classification.intent` and returns |
| 184 | +that string. The branches don't know whether the LLM or a regex |
| 185 | +produced the discriminator. |
| 186 | + |
| 187 | +```python |
| 188 | +async def classify(state): |
| 189 | + response = await provider.complete( |
| 190 | + [UserMessage(content=f"Route: {state.query!r}")], |
| 191 | + response_schema=Classification, |
| 192 | + ) |
| 193 | + return {"classification": response.parsed} |
| 194 | + |
| 195 | + |
| 196 | +def route(state) -> str: |
| 197 | + return state.classification.intent |
| 198 | + |
| 199 | + |
| 200 | +builder.add_conditional_edge("classify", route) |
| 201 | +``` |
| 202 | + |
| 203 | +The same `route` function could read a feature flag, a config lookup, |
| 204 | +or `"research" if "?" in state.query else "summarize"`. The branch |
| 205 | +nodes don't change. Swapping a rule-based router for an LLM-based one |
| 206 | +is a one-node change. |
| 207 | + |
| 208 | +## Errors at the LLM boundary |
| 209 | + |
| 210 | +Every provider call can fail. The |
| 211 | +[`openarmature.llm` reference](../reference/llm.md) lists the canonical |
| 212 | +error categories; this section covers how they compose with the rest |
| 213 | +of the graph. |
| 214 | + |
| 215 | +**Transient categories** (retry MAY succeed): |
| 216 | +`ProviderRateLimit`, `ProviderUnavailable`, `ProviderModelNotLoaded`. |
| 217 | +These are the canonical "wrap a node in `RetryMiddleware`" set; the |
| 218 | +default classifier picks them up automatically via |
| 219 | +`TRANSIENT_CATEGORIES`. |
| 220 | + |
| 221 | +**Non-transient categories** (retry without changing the request will |
| 222 | +not succeed): `ProviderAuthentication`, `ProviderInvalidModel`, |
| 223 | +`ProviderInvalidRequest`, `ProviderInvalidResponse`, |
| 224 | +`StructuredOutputInvalid`. These propagate up as `NodeException` so |
| 225 | +the graph's error-recovery middleware (or the caller of `invoke()`) |
| 226 | +can handle them. |
| 227 | + |
| 228 | +`StructuredOutputInvalid` is the new one and worth a note. It fires |
| 229 | +when a model returns content that fails to parse as JSON, or parses |
| 230 | +but fails to validate against the supplied schema. The exception |
| 231 | +carries the requested `response_schema`, the `raw_content` the model |
| 232 | +produced, and a `failure_description`. It is non-transient by default |
| 233 | +because a model that emits non-conforming output on a given prompt |
| 234 | +usually emits the same non-conforming output on retry. Useful retry |
| 235 | +strategies for this case involve changing the prompt or doubling |
| 236 | +`max_tokens` rather than re-issuing the same call; that's a |
| 237 | +middleware concern, not the provider's default. |
| 238 | + |
| 239 | +```python |
| 240 | +from openarmature.llm import StructuredOutputInvalid |
| 241 | + |
| 242 | +async def classify_with_diagnostics(state): |
| 243 | + try: |
| 244 | + response = await provider.complete( |
| 245 | + [UserMessage(content=...)], |
| 246 | + response_schema=Classification, |
| 247 | + ) |
| 248 | + except StructuredOutputInvalid as exc: |
| 249 | + log.warning( |
| 250 | + "schema-validation failure on classify", |
| 251 | + extra={ |
| 252 | + "raw_content": exc.raw_content, |
| 253 | + "failure": exc.failure_description, |
| 254 | + }, |
| 255 | + ) |
| 256 | + raise |
| 257 | + return {"classification": response.parsed} |
| 258 | +``` |
| 259 | + |
| 260 | +Callers wanting to retry validation failures specifically can |
| 261 | +construct a `RetryMiddleware` with a custom classifier that adds |
| 262 | +`structured_output_invalid` to the transient set. The default |
| 263 | +classifier won't do this for them. |
| 264 | + |
| 265 | +## Where to next |
| 266 | + |
| 267 | +- [Model Providers](../model-providers/index.md) for the provider |
| 268 | + contract, the shipped `OpenAIProvider`, and the canonical error |
| 269 | + categories. |
| 270 | +- [Authoring a Provider](../model-providers/authoring.md) for writing |
| 271 | + a provider against a non-OpenAI wire format (Anthropic Messages, |
| 272 | + Bedrock, internal gateway). |
| 273 | +- [API reference: `openarmature.llm`](../reference/llm.md) for the |
| 274 | + full surface: message types, `Response`, `RuntimeConfig`, every |
| 275 | + error class, validation helpers. |
| 276 | +- [Examples: `00-hello-world`](https://github.com/LunarCommand/openarmature-python/tree/main/examples/00-hello-world) |
| 277 | + for a runnable graph exercising both `response_schema` forms in one |
| 278 | + pipeline. |
0 commit comments