Skip to content

Commit 37fe7d4

Browse files
docs: complete reference, add middleware page (#57)
* docs: complete reference, add middleware page The mkdocstrings reference was silently omitting public members that had no docstring. Add docstrings to 35 affected members across graph, prompts, and checkpoint so every name in __all__ renders in the reference. Add a new Middleware concept page covering the protocol shape, the four registration sites, composition order, subgraph boundary, error semantics, and the built-in RetryMiddleware and TimingMiddleware. Place it between Composition and Fan-out in the nav since both reference middleware concepts. Export NextCall and default_classifier from openarmature.graph so custom middleware can type its `next_` parameter and extend the retry classifier without reaching into the submodule. Change FanOutNode.run and ParallelBranchesNode.run to raise NotImplementedError instead of RuntimeError. Both methods exist only to satisfy the Node protocol; the engine dispatches these node types through run_with_context. NotImplementedError is the right signal and stays backwards-compatible since it subclasses RuntimeError. Sweep em dashes from public docstrings (155 occurrences across 34 files). Docstrings render in mkdocstrings; comments don't, and are left untouched. * docs: fix em-dash sweep parenthetical artifacts The em-dash sweep replaced double em-dashes used as parenthetical brackets ("X -- inner -- Y") with two semicolons ("X; inner; Y"), producing grammatically broken sentences in five public docstrings. Switch the affected sites to parentheses or a colon/period restructuring as appropriate. Affected docstrings: - llm/messages.py (Message module: list-level invariants) - graph/subgraph.py (module: outer graph sees run(state: ParentT)) - graph/compiled.py (CompiledGraph: observer plumbing) - checkpoint/protocol.py (CheckpointRecord: frozen) - observability/otel/logs.py (install_log_bridge: filters note)
1 parent e910254 commit 37fe7d4

43 files changed

Lines changed: 589 additions & 167 deletions

Some content is hidden

Large Commits have some content hidden by default. Use the searchbox below for content that may be hidden.

docs/concepts/index.md

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -10,6 +10,9 @@ the framework, or jump to whichever concept you need.
1010
- [Composition: conditional edges, subgraphs, projection](composition.md):
1111
routing decisions, encapsulated sub-pipelines, the parent ↔ subgraph
1212
data seam.
13+
- [Middleware](middleware.md): wrap node dispatch with retries,
14+
timing, logging, error transformation; per-node, per-graph,
15+
per-branch, and per-fan-out-instance registration.
1316
- [Fan-out](fan-out.md): running the same subgraph many times in
1417
parallel, results merged back deterministically.
1518
- [Parallel branches](parallel-branches.md): dispatching M

docs/concepts/middleware.md

Lines changed: 211 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,211 @@
1+
# Middleware
2+
3+
Middleware wraps the dispatch of a single node. The shape is an async
4+
callable `(state, next) -> partial_update`. Anything you want to happen
5+
around a node, without changing the node itself, lives here: retries,
6+
timing, structured logging, request enrichment, error transformation,
7+
circuit-breaking.
8+
9+
```python
10+
from collections.abc import Mapping
11+
from typing import Any
12+
13+
from openarmature.graph import Middleware, NextCall
14+
15+
16+
class LogAround:
17+
async def __call__(self, state: Any, next_: NextCall) -> Mapping[str, Any]:
18+
print("before")
19+
partial = await next_(state)
20+
print("after")
21+
return partial
22+
23+
24+
_: Middleware = LogAround() # structural conformance check
25+
```
26+
27+
`next` invokes the next layer of the chain (or the wrapped node, at
28+
the innermost end) and returns the partial update from that layer.
29+
Code before `await next(state)` is the pre-node phase (runs on the way
30+
in); code after is the post-node phase (runs on the way out).
31+
32+
## Four registration sites
33+
34+
You can attach middleware at four places. The same `Middleware` shape
35+
works in all of them.
36+
37+
**Per-node**, on a single function node:
38+
39+
```python
40+
builder.add_node("fetch", fetch_fn, middleware=[RetryMiddleware()])
41+
```
42+
43+
**Per-graph**, applied to every node in the graph:
44+
45+
```python
46+
builder.add_middleware(TimingMiddleware(node_name="...", on_complete=record))
47+
```
48+
49+
**Per-branch**, on a single branch of a parallel-branches node:
50+
51+
```python
52+
from openarmature.graph import BranchSpec
53+
54+
branches = {
55+
"sentiment": BranchSpec(
56+
subgraph=sentiment_subgraph,
57+
middleware=(RetryMiddleware(),),
58+
),
59+
"topic": BranchSpec(subgraph=topic_subgraph),
60+
}
61+
builder.add_parallel_branches_node("classify", branches=branches)
62+
```
63+
64+
The branch middleware wraps the whole branch dispatch as one call. A
65+
retry on a branch retries the entire branch from scratch, not an
66+
individual node inside it.
67+
68+
**Per-fan-out-instance**, on the instance dispatch inside a fan-out
69+
node:
70+
71+
```python
72+
builder.add_fan_out_node(
73+
"summarize",
74+
subgraph=summarize_subgraph,
75+
items_field="articles",
76+
item_field="article",
77+
collect_field="article",
78+
target_field="summaries",
79+
instance_middleware=[RetryMiddleware()],
80+
)
81+
```
82+
83+
A retry here retries one instance, not the whole fan-out.
84+
85+
## Composition order
86+
87+
When a node has middleware from multiple sites, per-graph composes
88+
*outside* per-node. The runtime chain at a single function node is:
89+
90+
```
91+
[per_graph_outer_to_inner...] → [per_node_outer_to_inner...] → node
92+
```
93+
94+
The first middleware in `builder.add_middleware()` calls is the
95+
outermost layer; the last is closest to the node. Same rule for
96+
per-node: list order is outer-to-inner.
97+
98+
## The subgraph boundary
99+
100+
Middleware does not cross into a subgraph. The parent's middleware
101+
wraps the `SubgraphNode` dispatch as a single atomic call, and the
102+
subgraph's own middleware (configured on the child builder) wraps the
103+
child's internal nodes independently.
104+
105+
In practical terms: a `RetryMiddleware` on a subgraph-as-node retries
106+
the whole child graph from its entry. A `RetryMiddleware` inside the
107+
child retries one of its individual nodes.
108+
109+
## Error semantics
110+
111+
An exception raised by `next(state)` propagates up through `await
112+
next(state)`. Middleware may:
113+
114+
- **Re-raise**: the simplest case. Don't catch, let it bubble.
115+
- **Catch and recover**: catch the exception and return a partial
116+
update of your own. The rest of the chain continues as if the node
117+
had returned that partial update normally.
118+
- **Catch and transform**: catch one exception type, raise a different
119+
one. The new exception propagates up.
120+
- **Call `next` more than once**: this is what retry middleware does.
121+
122+
A middleware MUST NOT mutate the input `state` object in place. To
123+
hand a transformed state down the chain, pass a new state instance to
124+
`next(...)`.
125+
126+
## Built-in: RetryMiddleware
127+
128+
```python
129+
from openarmature.graph import RetryMiddleware, exponential_jitter_backoff
130+
131+
132+
async def on_retry(exc: Exception, attempt: int) -> None:
133+
log.warning("retrying after %r (attempt %d)", exc, attempt)
134+
135+
136+
retry = RetryMiddleware(
137+
max_attempts=3,
138+
backoff=exponential_jitter_backoff,
139+
on_retry=on_retry,
140+
)
141+
```
142+
143+
Four plug points, all optional:
144+
145+
- **`max_attempts`** is the total attempt count including the first
146+
call. `1` disables retry. Default `3`.
147+
- **`classifier`** is a predicate `(exception, state) -> bool`.
148+
The default (`default_classifier`, importable from
149+
`openarmature.graph`) treats any exception with a `category`
150+
attribute matching the project's `TRANSIENT_CATEGORIES` set as
151+
transient. To retry on additional types, write a classifier that
152+
delegates to `default_classifier` and falls back to your own check.
153+
- **`backoff`** is a callable `(attempt_index) -> seconds`. The default
154+
is exponential with jitter (base 1s, cap 30s, full jitter).
155+
`deterministic_backoff(seconds)` is provided for tests.
156+
- **`on_retry`** is an optional async callback `(exception, attempt)
157+
-> None`. Fires before each sleep. Useful for emitting a structured
158+
"about to retry" event.
159+
160+
A retry's attempt counter propagates as a context variable to every
161+
node event emitted from within the retry, including nodes inside
162+
subgraphs and branches that the retry wraps transitively. So an
163+
observer logging a retried node sees `attempt=1`, `attempt=2`, etc. on
164+
the inner events.
165+
166+
## Built-in: TimingMiddleware
167+
168+
```python
169+
from openarmature.graph import TimingMiddleware, TimingRecord
170+
171+
172+
async def record(rec: TimingRecord) -> None:
173+
metrics.histogram("node_duration_ms", rec.duration_ms, tags={
174+
"node": rec.node_name,
175+
"outcome": rec.outcome,
176+
})
177+
178+
179+
builder.add_node(
180+
"fetch",
181+
fetch_fn,
182+
middleware=[TimingMiddleware(node_name="fetch", on_complete=record)],
183+
)
184+
```
185+
186+
`TimingMiddleware` records the wrapped chain's duration with a
187+
monotonic clock and delivers a `TimingRecord` to your async callback.
188+
The record includes `node_name`, `duration_ms`, `outcome` (`"success"`
189+
or `"exception"`), and `exception_category` (the failing exception's
190+
`category` attribute when present).
191+
192+
Two implementation details worth knowing:
193+
194+
- The callback fires **inline** before the chain's result returns.
195+
Slow callbacks add to the apparent node duration. Keep them fast
196+
(queue work, defer I/O).
197+
- The clock is injectable per instance via the `clock` kwarg.
198+
Test fixtures use this to supply a deterministic stub without
199+
globally patching `time.monotonic` (which would also distort
200+
asyncio's scheduling).
201+
202+
## Related
203+
204+
- [Parallel branches](parallel-branches.md): per-branch middleware
205+
and its interaction with parent-graph middleware.
206+
- [Fan-out](fan-out.md): `instance_middleware` and how it composes
207+
with parent and node-level layers.
208+
- [LLMs](llms.md): how transient-classification flows from provider
209+
errors into `RetryMiddleware`'s default classifier.
210+
- [Observability](observability.md): observer events emitted around
211+
middleware-wrapped nodes carry the retry attempt index.

mkdocs.yml

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -106,6 +106,7 @@ nav:
106106
- State and reducers: concepts/state-and-reducers.md
107107
- Graphs: concepts/graphs.md
108108
- Composition: concepts/composition.md
109+
- Middleware: concepts/middleware.md
109110
- Fan-out: concepts/fan-out.md
110111
- Parallel branches: concepts/parallel-branches.md
111112
- LLMs: concepts/llms.md

src/openarmature/__init__.py

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,4 +1,4 @@
1-
"""OpenArmature workflow framework for LLM pipelines and tool-calling agents."""
1+
"""OpenArmature: workflow framework for LLM pipelines and tool-calling agents."""
22

33
__version__ = "0.6.0"
44
__spec_version__ = "0.16.1"

src/openarmature/checkpoint/__init__.py

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -6,7 +6,7 @@
66
# internal + fan-out nodes per §10.3.
77
# - Resume via ``invoke(resume_invocation=...)`` restores per §10.4.
88

9-
"""openarmature.checkpoint checkpointing capability.
9+
"""openarmature.checkpoint: checkpointing capability.
1010
1111
Public surface: the typed :class:`Checkpointer` Protocol,
1212
:class:`CheckpointRecord` / :class:`NodePosition` /

src/openarmature/checkpoint/backends/memory.py

Lines changed: 13 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -3,7 +3,7 @@
33
"""In-memory Checkpointer.
44
55
Keeps records in a Python ``dict`` keyed by ``invocation_id``. NOT
6-
durable across process crashes useful for tests, short-lived runs,
6+
durable across process crashes; useful for tests, short-lived runs,
77
and development. Accepts any state shape (the dict holds the
88
:class:`CheckpointRecord` directly; nothing is serialized).
99
"""
@@ -26,7 +26,7 @@ class InMemoryCheckpointer:
2626
2727
**State shape:** any. The record is held by reference, so the
2828
Pydantic state instance the engine produces is what comes back
29-
from :meth:`load` no serialization round-trip. (This is the
29+
from :meth:`load`; no serialization round-trip. (This is the
3030
feature: tests can assert on the saved state's identity.)
3131
3232
**State-migration eligibility:** none. Per spec §10.12.1, a
@@ -52,14 +52,23 @@ def __init__(self) -> None:
5252
self._lock = asyncio.Lock()
5353

5454
async def save(self, invocation_id: str, record: CheckpointRecord) -> None:
55+
"""Store ``record`` under ``invocation_id``, replacing any
56+
previous record for the same id. Not durable across process
57+
restarts."""
5558
async with self._lock:
5659
self._records[invocation_id] = record
5760

5861
async def load(self, invocation_id: str) -> CheckpointRecord | None:
62+
"""Return the saved record for ``invocation_id`` or ``None``
63+
if nothing has been saved under that id."""
5964
async with self._lock:
6065
return self._records.get(invocation_id)
6166

6267
async def list(self, filter: CheckpointFilter | None = None) -> Iterable[CheckpointSummary]:
68+
"""Enumerate stored invocations as :class:`CheckpointSummary`
69+
rows. With ``filter.correlation_id`` set, restricts the
70+
results to invocations carrying that correlation id;
71+
otherwise returns all rows."""
6372
async with self._lock:
6473
records = list(self._records.values())
6574
summaries = [
@@ -76,6 +85,8 @@ async def list(self, filter: CheckpointFilter | None = None) -> Iterable[Checkpo
7685
return [s for s in summaries if s.correlation_id == filter.correlation_id]
7786

7887
async def delete(self, invocation_id: str) -> None:
88+
"""Remove the record for ``invocation_id``. No-op when nothing
89+
is saved under that id (no error)."""
7990
async with self._lock:
8091
self._records.pop(invocation_id, None)
8192

src/openarmature/checkpoint/backends/sqlite.py

Lines changed: 22 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -4,20 +4,20 @@
44
55
Persists records to a SQLite database with WAL mode enabled. Durable
66
across process crashes within a single host. One row per
7-
``invocation_id`` (upsert retention overwritten on every save).
7+
``invocation_id`` (upsert retention; overwritten on every save).
88
99
**Serialization knobs:**
1010
11-
- ``"pickle"`` (default) accepts any pickleable state shape.
11+
- ``"pickle"`` (default): accepts any pickleable state shape.
1212
Python-only on the read side; a TypeScript reimplementation cannot
1313
decode pickle blobs.
14-
- ``"json"`` accepts only JSON-native state shapes (Pydantic
14+
- ``"json"``: accepts only JSON-native state shapes (Pydantic
1515
``model_dump(mode="json")`` output). Cross-language portable; if
1616
the user wants to read python-written records from a TypeScript
1717
consumer (or vice versa), this is the choice.
1818
1919
Choose deliberately at construction time; the same database file
20-
MUST be read with the same serialization mode it was written with
20+
MUST be read with the same serialization mode it was written with;
2121
mismatches surface as :class:`CheckpointRecordInvalid` on
2222
:meth:`load`.
2323
@@ -87,7 +87,7 @@ def _to_json_native(obj: Any) -> Any:
8787
class SQLiteCheckpointer:
8888
"""SQLite Checkpointer with WAL-mode durability.
8989
90-
**Retention:** upsert one row per ``invocation_id``, overwritten
90+
**Retention:** upsert; one row per ``invocation_id``, overwritten
9191
on every save. Saved records are NOT historical: only the most
9292
recent save for any given ``invocation_id`` is retained.
9393
@@ -167,6 +167,11 @@ def _decode(self, blob: bytes, recorded_mode: str, invocation_id: str) -> Any:
167167
# ------------------------------------------------------------------
168168

169169
async def save(self, invocation_id: str, record: CheckpointRecord) -> None:
170+
"""Upsert ``record`` under ``invocation_id``. The state,
171+
completed positions, and parent-state stack are serialized via
172+
the configured :class:`SerializationMode` and written in a
173+
single statement. Writes are durable on return (WAL mode,
174+
per-write fsync at the SQLite layer)."""
170175
await self._ensure_initialized()
171176
state_blob = self._encode(record.state)
172177
positions_blob = self._encode([asdict(p) for p in record.completed_positions])
@@ -207,6 +212,11 @@ def _do() -> None:
207212
await asyncio.to_thread(_do)
208213

209214
async def load(self, invocation_id: str) -> CheckpointRecord | None:
215+
"""Return the saved record for ``invocation_id`` or ``None``
216+
when no row exists. The serialization mode stored with the
217+
row is used to decode the blobs back, so a database written
218+
with one mode can still be loaded after the backend has been
219+
reconfigured."""
210220
await self._ensure_initialized()
211221

212222
def _do() -> tuple[Any, ...] | None:
@@ -266,6 +276,11 @@ def _do() -> tuple[Any, ...] | None:
266276
)
267277

268278
async def list(self, filter: CheckpointFilter | None = None) -> Iterable[CheckpointSummary]:
279+
"""Enumerate saved invocations as :class:`CheckpointSummary`
280+
rows, ordered by ``last_saved_at`` ascending. With
281+
``filter.correlation_id`` set the SQL query is constrained at
282+
the database (indexed lookup); without a filter the full
283+
table is returned."""
269284
await self._ensure_initialized()
270285

271286
def _do() -> list[tuple[Any, ...]]:
@@ -308,6 +323,8 @@ def _do() -> list[tuple[Any, ...]]:
308323
return summaries
309324

310325
async def delete(self, invocation_id: str) -> None:
326+
"""Remove the row for ``invocation_id``. No-op when no row
327+
exists (no error). The delete is durable on return."""
311328
await self._ensure_initialized()
312329

313330
def _do() -> None:

0 commit comments

Comments
 (0)