Skip to content

Commit 1ae405b

Browse files
docs: structured output concepts page + model-providers updates
Adds docs/concepts/llms.md covering how LLM calls fit into the graph model: LLM calls as async IO inside nodes, structured output (both response_schema forms + native/fallback wire paths + strict mode), routing on parsed fields, and errors at the LLM boundary. Nav entry added to mkdocs.yml's Concepts section; concepts/index.md TOC extended. Updates docs/model-providers/index.md: Protocol signature now shows the response_schema parameter; errors table adds StructuredOutputInvalid; new Structured output section walks through both response_schema forms, the native/fallback wire paths, and strict-mode constraints. Updates docs/model-providers/authoring.md: skeleton's complete() signature now matches the Protocol (response_schema parameter); a new "Structured output" entry in Beyond the skeleton points custom- provider authors at validate_response_schema and strict_mode_supported. mkdocs builds clean in strict mode; the runnable example in the new Structured output section is verified by tests/test_docs_examples.py.
1 parent c9326e8 commit 1ae405b

5 files changed

Lines changed: 394 additions & 12 deletions

File tree

docs/concepts/index.md

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -12,6 +12,8 @@ the framework, or jump to whichever concept you need.
1212
data seam.
1313
- [Fan-out](fan-out.md): running the same subgraph many times in
1414
parallel, results merged back deterministically.
15+
- [LLMs](llms.md): how LLM calls fit into nodes, structured output,
16+
routing on parsed fields, errors at the LLM boundary.
1517
- [Observability](observability.md): node-boundary hooks, OTel mapping,
1618
log correlation.
1719
- [Checkpointing](checkpointing.md): save state at each node boundary,

docs/concepts/llms.md

Lines changed: 278 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,278 @@
1+
# LLMs
2+
3+
The graph engine has no concept of LLMs or tools. A node is just an
4+
async function that reads typed state and returns a partial update.
5+
Calling an LLM is one of the things a node can do during that call, the
6+
same way it might read a file, hit a database, or invoke an internal
7+
service. This page covers the patterns that emerge once you start
8+
mixing LLM calls into graph nodes.
9+
10+
## LLM calls are async IO inside a node
11+
12+
Construct one [`Provider`](../reference/llm.md) at startup and share it
13+
across nodes. Each `complete()` call carries the full message list and
14+
returns a [`Response`](../reference/llm.md); the provider is stateless
15+
and reentrant, so multiple nodes (or fan-out instances) can call into
16+
it concurrently without coordination.
17+
18+
```python
19+
import os
20+
from openarmature.llm import OpenAIProvider, UserMessage
21+
22+
provider = OpenAIProvider(
23+
base_url=os.environ.get("LLM_BASE_URL", "https://api.openai.com"),
24+
model="gpt-4o-mini",
25+
api_key=os.environ["LLM_API_KEY"],
26+
)
27+
28+
29+
async def analyze(state: AnalysisState) -> dict:
30+
response = await provider.complete(
31+
[UserMessage(content=state.text)],
32+
)
33+
return {"raw": response.message.content}
34+
```
35+
36+
The provider goes wherever your application's other long-lived
37+
dependencies go: module-level constant, dependency-injection
38+
container, factory function. It does not need to be constructed per
39+
call, and constructing it cheaply (no eager network calls) means
40+
import-time setup is fine.
41+
42+
A real graph hits LLMs from multiple nodes. The conventional shape:
43+
44+
```python
45+
async def classify(state): # one provider call
46+
response = await provider.complete(...)
47+
return {...}
48+
49+
async def extract(state): # another provider call
50+
response = await provider.complete(...)
51+
return {...}
52+
53+
async def synthesize(state): # a third
54+
response = await provider.complete(...)
55+
return {...}
56+
```
57+
58+
The graph composes the order; the provider sees three independent
59+
stateless calls. Conversational memory (if you want it) is the
60+
caller's responsibility: thread it through state and pass the
61+
accumulated message list into each call.
62+
63+
## Structured output
64+
65+
Every LLM-using node that produces typed data ends up with the same
66+
shape: render a prompt, call the model, parse the response as JSON,
67+
validate it against the expected schema, retry on parse or validation
68+
failure. Five steps of boilerplate that differ only in the schema and
69+
the prompt.
70+
71+
Structured output collapses that into one parameter. Pass a
72+
`response_schema` to `complete()` and the provider:
73+
74+
1. Tells the model on the wire to produce schema-conforming output.
75+
2. Parses and validates the response against the schema.
76+
3. Surfaces the validated value on `Response.parsed`.
77+
4. Raises `StructuredOutputInvalid` on parse or validation failure.
78+
79+
Two forms are accepted: a Pydantic class (typed-instance return) and a
80+
JSON Schema dict (raw-dict return). Same wire shape underneath.
81+
82+
### Pydantic class form
83+
84+
```python
85+
from pydantic import BaseModel
86+
87+
class Classification(BaseModel):
88+
intent: Literal["research", "summarize"]
89+
rationale: str
90+
91+
92+
async def classify(state):
93+
response = await provider.complete(
94+
[UserMessage(content=f"Route this query: {state.query!r}")],
95+
response_schema=Classification,
96+
)
97+
return {"classification": response.parsed}
98+
```
99+
100+
`Response.parsed` is a validated `Classification` instance. Field
101+
access is statically typed (`response.parsed.intent` returns
102+
`Literal["research", "summarize"]`); the framework calls
103+
`.model_json_schema()` under the hood to derive the wire body and
104+
`.model_validate()` to deserialize the response.
105+
106+
### JSON Schema dict form
107+
108+
```python
109+
async def research(state):
110+
response = await provider.complete(
111+
[UserMessage(content=f"Plan research: {state.query!r}")],
112+
response_schema={
113+
"type": "object",
114+
"properties": {
115+
"topics": {"type": "array", "items": {"type": "string"}},
116+
"follow_up_questions": {"type": "array", "items": {"type": "string"}},
117+
},
118+
"required": ["topics", "follow_up_questions"],
119+
"additionalProperties": False,
120+
},
121+
)
122+
return {"research_plan": response.parsed}
123+
```
124+
125+
`Response.parsed` is a `dict[str, Any]` populated per the schema. Use
126+
this when the shape is dynamic, generated, or borrowed from another
127+
system that already speaks JSON Schema.
128+
129+
### Wire paths: native and fallback
130+
131+
Real `OpenAIProvider` traffic uses OpenAI's native `response_format`
132+
field on the request body, so the model produces schema-conforming
133+
output in one trip. Some OpenAI-compatible servers (older vLLM, some
134+
LM Studio releases, llama.cpp variants) either reject `response_format`
135+
with a 400 or silently ignore it. For those, construct the provider
136+
with `force_prompt_augmentation_fallback=True`:
137+
138+
```python
139+
provider = OpenAIProvider(
140+
base_url="http://localhost:8000",
141+
model="some-local-model",
142+
force_prompt_augmentation_fallback=True, # opt into the fallback
143+
)
144+
```
145+
146+
In fallback mode the provider prepends a system directive containing
147+
the serialized schema, omits `response_format` from the wire, and
148+
parses-and-validates the response post-receive. The behavioral contract
149+
is identical: `Response.parsed` populates the same way; failures raise
150+
`StructuredOutputInvalid` the same way. The
151+
`uses_prompt_augmentation_fallback` read-only property lets callers
152+
inspect which path is active.
153+
154+
### Strict mode
155+
156+
OpenAI's native path supports a `strict: true` flag that engages the
157+
model's schema-constrained decoding (the model literally cannot emit
158+
non-conforming tokens). It applies only when the schema satisfies
159+
specific constraints: `additionalProperties` explicitly `false` on every
160+
object, every key in `properties` listed in `required`, no
161+
unresolvable `$ref` targets.
162+
163+
`strict_mode_supported(schema)` performs the deep recursive check. The
164+
provider passes `strict: true` to the wire when the schema satisfies
165+
it, and `strict: false` otherwise. Either way, the provider validates
166+
the response post-receive against the supplied schema. Strict is a
167+
wire-level optimization, not a correctness requirement.
168+
169+
If you control the schema, prefer making it strict-compatible:
170+
explicit `additionalProperties: false` plus `required` covering every
171+
property. Pydantic-derived schemas may need a tweak to satisfy this
172+
(`model_config = ConfigDict(extra="forbid")` on the class).
173+
174+
## Routing on parsed fields
175+
176+
A conditional edge is a function `state -> str` that names the next
177+
node. The string can come from anywhere: a hard-coded rule, a lookup
178+
table, the parsed output of an LLM call. The graph engine doesn't
179+
distinguish.
180+
181+
This means LLM-driven routing and deterministic routing have the same
182+
shape. A classifier node writes its parsed `Classification` to state;
183+
the conditional edge reads `state.classification.intent` and returns
184+
that string. The branches don't know whether the LLM or a regex
185+
produced the discriminator.
186+
187+
```python
188+
async def classify(state):
189+
response = await provider.complete(
190+
[UserMessage(content=f"Route: {state.query!r}")],
191+
response_schema=Classification,
192+
)
193+
return {"classification": response.parsed}
194+
195+
196+
def route(state) -> str:
197+
return state.classification.intent
198+
199+
200+
builder.add_conditional_edge("classify", route)
201+
```
202+
203+
The same `route` function could read a feature flag, a config lookup,
204+
or `"research" if "?" in state.query else "summarize"`. The branch
205+
nodes don't change. Swapping a rule-based router for an LLM-based one
206+
is a one-node change.
207+
208+
## Errors at the LLM boundary
209+
210+
Every provider call can fail. The
211+
[`openarmature.llm` reference](../reference/llm.md) lists the canonical
212+
error categories; this section covers how they compose with the rest
213+
of the graph.
214+
215+
**Transient categories** (retry MAY succeed):
216+
`ProviderRateLimit`, `ProviderUnavailable`, `ProviderModelNotLoaded`.
217+
These are the canonical "wrap a node in `RetryMiddleware`" set; the
218+
default classifier picks them up automatically via
219+
`TRANSIENT_CATEGORIES`.
220+
221+
**Non-transient categories** (retry without changing the request will
222+
not succeed): `ProviderAuthentication`, `ProviderInvalidModel`,
223+
`ProviderInvalidRequest`, `ProviderInvalidResponse`,
224+
`StructuredOutputInvalid`. These propagate up as `NodeException` so
225+
the graph's error-recovery middleware (or the caller of `invoke()`)
226+
can handle them.
227+
228+
`StructuredOutputInvalid` is the new one and worth a note. It fires
229+
when a model returns content that fails to parse as JSON, or parses
230+
but fails to validate against the supplied schema. The exception
231+
carries the requested `response_schema`, the `raw_content` the model
232+
produced, and a `failure_description`. It is non-transient by default
233+
because a model that emits non-conforming output on a given prompt
234+
usually emits the same non-conforming output on retry. Useful retry
235+
strategies for this case involve changing the prompt or doubling
236+
`max_tokens` rather than re-issuing the same call; that's a
237+
middleware concern, not the provider's default.
238+
239+
```python
240+
from openarmature.llm import StructuredOutputInvalid
241+
242+
async def classify_with_diagnostics(state):
243+
try:
244+
response = await provider.complete(
245+
[UserMessage(content=...)],
246+
response_schema=Classification,
247+
)
248+
except StructuredOutputInvalid as exc:
249+
log.warning(
250+
"schema-validation failure on classify",
251+
extra={
252+
"raw_content": exc.raw_content,
253+
"failure": exc.failure_description,
254+
},
255+
)
256+
raise
257+
return {"classification": response.parsed}
258+
```
259+
260+
Callers wanting to retry validation failures specifically can
261+
construct a `RetryMiddleware` with a custom classifier that adds
262+
`structured_output_invalid` to the transient set. The default
263+
classifier won't do this for them.
264+
265+
## Where to next
266+
267+
- [Model Providers](../model-providers/index.md) for the provider
268+
contract, the shipped `OpenAIProvider`, and the canonical error
269+
categories.
270+
- [Authoring a Provider](../model-providers/authoring.md) for writing
271+
a provider against a non-OpenAI wire format (Anthropic Messages,
272+
Bedrock, internal gateway).
273+
- [API reference: `openarmature.llm`](../reference/llm.md) for the
274+
full surface: message types, `Response`, `RuntimeConfig`, every
275+
error class, validation helpers.
276+
- [Examples: `00-hello-world`](https://github.com/LunarCommand/openarmature-python/tree/main/examples/00-hello-world)
277+
for a runnable graph exercising both `response_schema` forms in one
278+
pipeline.

docs/model-providers/authoring.md

Lines changed: 15 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -23,6 +23,7 @@ from collections.abc import Sequence
2323
from typing import Any
2424

2525
import httpx
26+
from pydantic import BaseModel
2627
from openarmature.llm import (
2728
AssistantMessage,
2829
Message,
@@ -64,7 +65,13 @@ class MyProvider:
6465
messages: Sequence[Message],
6566
tools: Sequence[Tool] | None = None,
6667
config: RuntimeConfig | None = None,
68+
response_schema: dict[str, Any] | type[BaseModel] | None = None,
6769
) -> Response:
70+
# response_schema support is an optional capability; a skeleton
71+
# provider can raise ProviderInvalidRequest when it's set, or
72+
# ignore it and return free-form text. A production provider
73+
# would wire it through to native response_format support or
74+
# the prompt-augmentation fallback. See ``openarmature.llm.OpenAIProvider``.
6875
validate_message_list(messages)
6976
validate_tools(tools)
7077

@@ -183,6 +190,14 @@ of:
183190
- **Tool calls.** Wire-mapping the `tool_calls` array on
184191
`AssistantMessage` to the Provider's expected shape, parsing tool
185192
results back from `ToolMessage`s.
193+
- **Structured output.** Threading `response_schema` through the
194+
request body (native `response_format` if the underlying wire
195+
supports it; prompt-augmentation fallback otherwise) and validating
196+
the response against the schema before returning. Populate
197+
`Response.parsed` with the validated value;
198+
raise `StructuredOutputInvalid` on parse or validation failure.
199+
Use `validate_response_schema` and `strict_mode_supported` from
200+
`openarmature.llm` to share the provider-agnostic boundary checks.
186201
- **Observability spans.** Opt-in `started`/`completed` events
187202
around the wire call so the OTel observer can build LLM spans.
188203
- **Lenient response parsing** under `finish_reason="error"`.

0 commit comments

Comments
 (0)