Commit 8dde25a

Add RetryConfig record for RetryMiddleware (#150)
* Add RetryConfig record for RetryMiddleware

  Replace RetryMiddleware's individual constructor kwargs with a single frozen RetryConfig (max_attempts / classifier / backoff / on_retry), constructed as RetryMiddleware(RetryConfig(...)). This is the shared record the upcoming call-level complete(retry=...) parameter will take, so one config serves both the per-node and per-call retry layers.

  The config fields are Optional and resolve to the canonical defaults (default_classifier / exponential_jitter_backoff) once in the consumer, preserving the prior None-means-default behavior so fixture-driven construction stays robust.

  Breaking change to the RetryMiddleware constructor; all call sites across tests, examples, and docs are migrated.

  First of two refactor PRs splitting proposal 0050's remaining work; call-level retry follows.

* Guard RetryMiddleware against non-RetryConfig args

  From Copilot review of PR #150: RetryMiddleware now takes a positional config, so a non-RetryConfig argument (e.g. RetryMiddleware(3)) would construct and then fail with a cryptic AttributeError at retry time. Raise TypeError eagerly in __init__ with the correct-usage idiom, and add a test.
1 parent 795c549 commit 8dde25a
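
The constructor change described in the commit message can be sketched in isolation. This is a standalone approximation that mirrors the shape shown in this commit's `retry.py` diff rather than importing `openarmature`. The three type aliases are simplified stand-ins for the library's own, and `describe` is a hypothetical helper added only for the demonstration.

```python
from collections.abc import Awaitable, Callable
from dataclasses import dataclass
from typing import Any, Optional

# Simplified stand-ins for the library's type aliases (assumed shapes).
Classifier = Callable[[Exception, Any], bool]
BackoffStrategy = Callable[[int], float]
OnRetryCallback = Callable[[Exception, int], Awaitable[None]]


@dataclass(frozen=True)
class RetryConfig:
    """Frozen record holding the four retry settings."""

    max_attempts: int = 3
    classifier: Optional[Classifier] = None
    backoff: Optional[BackoffStrategy] = None
    on_retry: Optional[OnRetryCallback] = None

    def __post_init__(self) -> None:
        # Validation runs at construction time, so an invalid config
        # never reaches the middleware.
        if self.max_attempts < 1:
            raise ValueError("max_attempts must be >= 1")


class RetryMiddleware:
    def __init__(self, config: Optional[RetryConfig] = None) -> None:
        if config is None:
            config = RetryConfig()
        # Eager guard: reject e.g. RetryMiddleware(3) immediately rather
        # than failing with an AttributeError at retry time.
        if not isinstance(config, RetryConfig):
            raise TypeError(
                f"RetryMiddleware expects a RetryConfig (or None); got "
                f"{type(config).__name__}. Construct as "
                f"RetryMiddleware(RetryConfig(max_attempts=...))."
            )
        self.config = config


def describe(arg: object) -> str:
    """Hypothetical helper: report how construction with `arg` turns out."""
    try:
        RetryMiddleware(arg)  # type: ignore[arg-type]
    except TypeError as exc:
        return f"TypeError: {exc}"
    return "ok"


if __name__ == "__main__":
    print(describe(None))                         # bare construction
    print(describe(RetryConfig(max_attempts=5)))  # explicit config
    print(describe(3))                            # rejected eagerly
```

Because the record is frozen, assigning to a field after construction raises, which is what makes it safe to hand the same config object to more than one consumer.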

14 files changed

Lines changed: 139 additions & 61 deletions

CHANGELOG.md

Lines changed: 4 additions & 0 deletions
@@ -10,6 +10,10 @@ The format follows [Keep a Changelog](https://keepachangelog.com/en/1.1.0/). The

 - **`FailureIsolationMiddleware`** (proposal 0050, pipeline-utilities §6.3). A third bundled middleware primitive alongside `RetryMiddleware` and `TimingMiddleware`. It catches exceptions escaping the wrapped node's inner chain and returns a configured degraded partial update, so a non-critical node can fail without aborting the whole invocation. Configuration: `degraded_update` (a static mapping or a `state -> partial_update` callable, resolved at catch time), `event_name` (required, no default, since a generic name makes downstream telemetry strictly worse), an optional `predicate` (`Exception -> bool`; only matching exceptions are caught, others propagate), and an optional async `on_caught` hook. It catches `Exception`; `BaseException` (cancellation) propagates, matching `RetryMiddleware`. On a catch it dispatches a new framework-emitted `FailureIsolatedEvent` (a distinct observer-event variant carrying `event_name`, the wrapped node's lineage identity, `pre_state` / `post_state`, and a `CaughtException` record of category plus message) onto the observer delivery queue; the bundled OTel and Langfuse observers render it as a marker span / observation. Compose it OUTER of `RetryMiddleware` for the "retry transients, degrade gracefully on exhaustion" pattern. Additive: existing pipelines see no behavior change, and the spec pin is unchanged (0050 is already within the v0.53.0 pin).

+### Changed
+
+- **`RetryMiddleware` now takes a `RetryConfig` record** instead of individual constructor kwargs (proposal 0050 prep). The four retry settings (`max_attempts` / `classifier` / `backoff` / `on_retry`, each optional) move onto a frozen `RetryConfig`; construct as `RetryMiddleware(RetryConfig(max_attempts=...))`, while bare `RetryMiddleware()` still applies the defaults. This is a breaking change to the `RetryMiddleware` constructor. The record is the same shape the upcoming call-level `complete(retry=...)` parameter will accept, so one retry config serves both the per-node and per-call layers. `None` fields resolve to the canonical defaults (`default_classifier` / `exponential_jitter_backoff`) at use, preserving the prior behavior.
+
## [0.13.0] — 2026-06-09
LLM provider hardening release. The pinned spec advances from v0.46.0 to v0.53.0, absorbing four implemented proposals. Proposal 0049 introduces the first spec-normatively-typed observer event variant, `LlmCompletionEvent`, dispatched on every successful LLM provider call; proposal 0058 adds the failure-side counterpart, `LlmFailedEvent`; proposal 0057 extends the completion variant with eight request-side fields. The bundled `OpenAIProvider` retires its sentinel-namespace `NodeEvent` emission for LLM calls entirely, and the OTel and Langfuse observers now drive their LLM span / Generation from the typed events with back-dated timestamps so durations reflect the adapter boundary. Proposal 0047 closes implicit prefix-cache wire-byte stability: `Response.usage` gains cache-stat fields, the OTel observer emits `openarmature.llm.cache_read` attributes, and the OpenAI Chat Completions request body is byte-stable across equivalent inputs regardless of dict insertion order. Custom observers that filtered LLM calls by sentinel namespace MUST migrate to `isinstance` discrimination; `LLM_NAMESPACE` and `LlmEventPayload` remain as a documented compatibility surface.
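
The changelog entry for `RetryConfig` says `None` fields resolve to the canonical defaults "at use". A minimal standalone sketch of that resolution and of the loop's stopping rule follows, assuming simplified stand-ins throughout: `TransientError`, `zero_backoff`, `call_with_retry`, and `make_flaky` are hypothetical names, the `default_classifier` here is a toy version (the library's matches an exception `category` against `TRANSIENT_CATEGORIES`), and the real middleware's metadata scoping and cancellation handling are omitted.

```python
import asyncio
from collections.abc import Awaitable, Callable
from dataclasses import dataclass
from typing import Any, Optional


class TransientError(Exception):
    """Hypothetical stand-in for a retry-eligible failure."""


def default_classifier(exc: Exception, _state: Any) -> bool:
    # Toy default: only TransientError is retry-eligible.
    return isinstance(exc, TransientError)


def zero_backoff(_attempt: int) -> float:
    # Stand-in default so the demo does not actually sleep.
    return 0.0


@dataclass(frozen=True)
class RetryConfig:
    max_attempts: int = 3
    classifier: Optional[Callable[[Exception, Any], bool]] = None
    backoff: Optional[Callable[[int], float]] = None
    on_retry: Optional[Callable[[Exception, int], Awaitable[None]]] = None


async def call_with_retry(config: RetryConfig, state: Any, next_: Callable) -> Any:
    # Resolve None fields to defaults once, before entering the loop.
    classifier = config.classifier or default_classifier
    backoff = config.backoff or zero_backoff
    attempt = 0
    while True:
        try:
            return await next_(state)
        except Exception as exc:
            # Stop when attempts are exhausted or the failure is not eligible.
            if attempt + 1 >= config.max_attempts or not classifier(exc, state):
                raise
            if config.on_retry is not None:
                await config.on_retry(exc, attempt)
            await asyncio.sleep(backoff(attempt))
            attempt += 1


def make_flaky(failures: int):
    """Build a node that raises TransientError for its first `failures` calls."""
    calls = []

    async def node(_state: Any) -> dict:
        calls.append(len(calls))
        if len(calls) <= failures:
            raise TransientError(f"call {len(calls)} failed")
        return {"ok": True}

    return node, calls


async def demo() -> tuple:
    seen = []

    async def on_retry(_exc: Exception, attempt: int) -> None:
        seen.append(attempt)

    node, calls = make_flaky(failures=2)
    result = await call_with_retry(RetryConfig(on_retry=on_retry), {}, node)
    return result, len(calls), seen


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

With the default `max_attempts=3`, two failures are absorbed and the third call succeeds; a node that fails three times would see the third exception re-raised, since `max_attempts` counts the first call.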

docs/concepts/middleware.md

Lines changed: 8 additions & 6 deletions
@@ -126,21 +126,23 @@ hand a transformed state down the chain, pass a new state instance to
 ## Built-in: RetryMiddleware

 ```python
-from openarmature.graph import RetryMiddleware, exponential_jitter_backoff
+from openarmature.graph import RetryConfig, RetryMiddleware, exponential_jitter_backoff


 async def on_retry(exc: Exception, attempt: int) -> None:
     log.warning("retrying after %r (attempt %d)", exc, attempt)


 retry = RetryMiddleware(
-    max_attempts=3,
-    backoff=exponential_jitter_backoff,
-    on_retry=on_retry,
+    RetryConfig(
+        max_attempts=3,
+        backoff=exponential_jitter_backoff,
+        on_retry=on_retry,
+    )
 )
 ```

-Four plug points, all optional:
+Configured with a `RetryConfig`; four fields, all optional:

 - **`max_attempts`** is the total attempt count including the first
   call. `1` disables retry. Default `3`.
@@ -277,7 +279,7 @@ builder.add_node(
             degraded_update={"summary": ""},
             event_name="summary_degraded",
         ),
-        RetryMiddleware(max_attempts=3),
+        RetryMiddleware(RetryConfig(max_attempts=3)),
     ],
 )
 ```

examples/fan-out-with-retry/main.py

Lines changed: 7 additions & 4 deletions
@@ -84,6 +84,7 @@
     append,
 )
 from openarmature.graph.middleware import (
+    RetryConfig,
     RetryMiddleware,
     TimingMiddleware,
     TimingRecord,
@@ -261,10 +262,12 @@ def build_graph(error_policy: str = "fail_fast") -> CompiledGraph[BatchState]:
     headline_subgraph = build_headline_subgraph()

     retry = RetryMiddleware(
-        max_attempts=3,
-        # Short fixed delay so the demo isn't slow. A production app would
-        # use exponential_jitter_backoff (the default).
-        backoff=deterministic_backoff(0.2),
+        RetryConfig(
+            max_attempts=3,
+            # Short fixed delay so the demo isn't slow. A production app would
+            # use exponential_jitter_backoff (the default).
+            backoff=deterministic_backoff(0.2),
+        )
     )
     timing = TimingMiddleware(
         node_name="headline_run",

examples/parallel-branches/main.py

Lines changed: 5 additions & 2 deletions
@@ -76,6 +76,7 @@
     append,
 )
 from openarmature.graph.middleware import (
+    RetryConfig,
     RetryMiddleware,
     deterministic_backoff,
 )
@@ -268,8 +269,10 @@ def build_graph() -> CompiledGraph[ArticleState]:
     # the same policy on a longer summarize call (where a retry doubles
     # cost) or on a topic-extract that has different transient profile.
     sentiment_retry = RetryMiddleware(
-        max_attempts=3,
-        backoff=deterministic_backoff(0.2),
+        RetryConfig(
+            max_attempts=3,
+            backoff=deterministic_backoff(0.2),
+        )
     )

     return (

src/openarmature/graph/__init__.py

Lines changed: 2 additions & 0 deletions
@@ -51,6 +51,7 @@
     FailureIsolationMiddleware,
     Middleware,
     NextCall,
+    RetryConfig,
     RetryMiddleware,
     TimingMiddleware,
     TimingRecord,
@@ -115,6 +116,7 @@
     "Reducer",
     "ReducerError",
     "RemoveHandle",
+    "RetryConfig",
     "RetryMiddleware",
     "RoutingError",
     "RuntimeGraphError",

src/openarmature/graph/middleware/__init__.py

Lines changed: 2 additions & 0 deletions
@@ -24,6 +24,7 @@
     BackoffStrategy,
     Classifier,
     OnRetryCallback,
+    RetryConfig,
     RetryMiddleware,
     default_classifier,
     deterministic_backoff,
@@ -41,6 +42,7 @@
     "NextCall",
     "OnCompleteCallback",
     "OnRetryCallback",
+    "RetryConfig",
     "RetryMiddleware",
     "TRANSIENT_CATEGORIES",
     "TimingMiddleware",

src/openarmature/graph/middleware/retry.py

Lines changed: 52 additions & 26 deletions
@@ -20,6 +20,7 @@
 import asyncio
 import random
 from collections.abc import Awaitable, Callable, Mapping
+from dataclasses import dataclass
 from typing import Any

 from openarmature.llm.errors import TRANSIENT_CATEGORIES
@@ -100,39 +101,63 @@ def fn(_attempt: int) -> float:
 OnRetryCallback = Callable[[Exception, int], Awaitable[None]]


-class RetryMiddleware:
-    """Canonical retry middleware.
-
-    Configuration:
+@dataclass(frozen=True)
+class RetryConfig:
+    """Canonical retry configuration record consumed by
+    :class:`RetryMiddleware`.

     - ``max_attempts``: total attempts including the first call. ``1``
       disables retry. Default ``3``.
-    - ``classifier``: predicate ``(exception, state) -> bool``. Default
-      :func:`default_classifier` (matches ``category`` against
+    - ``classifier``: predicate ``(exception, state) -> bool`` deciding
+      whether a failure is retry-eligible. ``None`` (the default)
+      selects :func:`default_classifier` (matches ``category`` against
       ``TRANSIENT_CATEGORIES``).
-    - ``backoff``: callable ``(attempt_index) -> seconds``. Default
-      :func:`exponential_jitter_backoff` (base 1s, cap 30s, full jitter).
+    - ``backoff``: callable ``(attempt_index) -> seconds``. ``None``
+      (the default) selects :func:`exponential_jitter_backoff` (base
+      1s, cap 30s, full jitter).
     - ``on_retry``: optional async callback ``(exception, attempt_index)
-      -> None``. Fires before each sleep.
+      -> None`` fired before each backoff sleep.
     """

-    def __init__(
-        self,
-        *,
-        max_attempts: int = 3,
-        classifier: Classifier | None = None,
-        backoff: BackoffStrategy | None = None,
-        on_retry: OnRetryCallback | None = None,
-    ) -> None:
-        if max_attempts < 1:
+    max_attempts: int = 3
+    classifier: Classifier | None = None
+    backoff: BackoffStrategy | None = None
+    on_retry: OnRetryCallback | None = None
+
+    def __post_init__(self) -> None:
+        if self.max_attempts < 1:
             raise ValueError("max_attempts must be >= 1")
-        self.max_attempts = max_attempts
-        self.classifier: Classifier = classifier or default_classifier
-        self.backoff: BackoffStrategy = backoff or exponential_jitter_backoff
-        self.on_retry: OnRetryCallback | None = on_retry
+
+
+class RetryMiddleware:
+    """Canonical retry middleware.
+
+    Configured with a :class:`RetryConfig` (or the default
+    ``RetryConfig()`` when omitted). Construct as
+    ``RetryMiddleware(RetryConfig(max_attempts=...))``.
+    """
+
+    def __init__(self, config: RetryConfig | None = None) -> None:
+        if config is None:
+            config = RetryConfig()
+        # Defensive guard for untyped callers: the static type already
+        # rules a non-RetryConfig out (pyright flags this as redundant),
+        # but an eager TypeError beats a cryptic AttributeError when a
+        # mistyped value (e.g. ``RetryMiddleware(3)``) reaches ``.config``.
+        if not isinstance(config, RetryConfig):  # pyright: ignore[reportUnnecessaryIsInstance]
+            raise TypeError(
+                f"RetryMiddleware expects a RetryConfig (or None); got "
+                f"{type(config).__name__}. Construct as "
+                f"RetryMiddleware(RetryConfig(max_attempts=...))."
+            )
+        self.config = config

     async def __call__(self, state: Any, next_: NextCall) -> Mapping[str, Any]:
         attempt = 0
+        # ``None`` config fields select the canonical defaults; resolve
+        # once here so the loop works against concrete callables.
+        classifier = self.config.classifier or default_classifier
+        backoff = self.config.backoff or exponential_jitter_backoff
         # Spec observability §3.4 per-attempt scoping: each retry
         # attempt sees only the metadata in scope at retry-loop entry
         # ("pre-attempt baseline") plus that attempt's own writes;
@@ -176,11 +201,11 @@ async def __call__(self, state: Any, next_: NextCall) -> Mapping[str, Any]:
                 # metadata for the error span) sees the baseline,
                 # not the failed attempt's transient state.
                 _reset_invocation_metadata(metadata_token)
-                if attempt + 1 >= self.max_attempts or not self.classifier(exc, state):
+                if attempt + 1 >= self.config.max_attempts or not classifier(exc, state):
                     raise
-                if self.on_retry is not None:
-                    await self.on_retry(exc, attempt)
-                await asyncio.sleep(self.backoff(attempt))
+                if self.config.on_retry is not None:
+                    await self.config.on_retry(exc, attempt)
+                await asyncio.sleep(backoff(attempt))
                 attempt += 1
             except BaseException:
                 # Cancellation path. `CancelledError` (or other
202227
"BackoffStrategy",
203228
"Classifier",
204229
"OnRetryCallback",
230+
"RetryConfig",
205231
"RetryMiddleware",
206232
"TRANSIENT_CATEGORIES",
207233
"default_classifier",

tests/conformance/test_observability.py

Lines changed: 6 additions & 4 deletions
@@ -678,7 +678,7 @@ async def _run_fixture_007_case(case: Mapping[str, Any]) -> None:
     from opentelemetry.trace import StatusCode

     from openarmature.graph import RuntimeGraphError
-    from openarmature.graph.middleware import RetryMiddleware
+    from openarmature.graph.middleware import RetryConfig, RetryMiddleware
     from openarmature.graph.middleware.retry import deterministic_backoff

     observer, exporter = _build_observer()
@@ -725,9 +725,11 @@ def _classifier(exc: Exception, _state: Any, _transient: frozenset[str] = transi
                 classifier_fn = None
             node_middleware.setdefault(flaky_node_name, []).append(
                 RetryMiddleware(
-                    max_attempts=int(mw_spec.get("max_attempts", 3)),
-                    backoff=backoff,
-                    classifier=classifier_fn,
+                    RetryConfig(
+                        max_attempts=int(mw_spec.get("max_attempts", 3)),
+                        backoff=backoff,
+                        classifier=classifier_fn,
+                    )
                 )
             )

tests/conformance/test_pipeline_utilities.py

Lines changed: 6 additions & 3 deletions
@@ -30,6 +30,7 @@
 from openarmature.graph.middleware import (
     Middleware,
     OnCompleteCallback,
+    RetryConfig,
     RetryMiddleware,
     TimingMiddleware,
     TimingRecord,
@@ -234,9 +235,11 @@ def _build_middleware(
         classifier_cfg = config.get("classifier")
         classifier = _build_classifier(classifier_cfg) if classifier_cfg is not None else None
         return RetryMiddleware(
-            max_attempts=int(config.get("max_attempts", 3)),
-            backoff=backoff,
-            classifier=classifier,
+            RetryConfig(
+                max_attempts=int(config.get("max_attempts", 3)),
+                backoff=backoff,
+                classifier=classifier,
+            )
         )
     if mw_type == "timing":
         on_complete_cfg = cast("dict[str, Any]", config.get("on_complete") or {})

tests/unit/test_failure_isolation_middleware.py

Lines changed: 2 additions & 1 deletion
@@ -22,6 +22,7 @@
     FailureIsolationMiddleware,
     GraphBuilder,
     ObserverEvent,
+    RetryConfig,
     RetryMiddleware,
     State,
     append,
@@ -290,7 +291,7 @@ async def _flaky(_s: _DocState) -> Mapping[str, Any]:
                     degraded_update={"note": "gave_up"},
                     event_name="flaky_failed",
                 ),
-                RetryMiddleware(max_attempts=3, backoff=deterministic_backoff(0.0)),
+                RetryMiddleware(RetryConfig(max_attempts=3, backoff=deterministic_backoff(0.0))),
             ],
         )
         .add_edge("flaky", END)
