diff --git a/CHANGELOG.md b/CHANGELOG.md index 50885b4..83f810a 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -17,7 +17,8 @@ The format follows [Keep a Changelog](https://keepachangelog.com/en/1.1.0/). The - **Failure-isolation events report the originating cause's category at non-node placements** (proposal 0065, pipeline-utilities §6.3). When `FailureIsolationMiddleware` runs as instance middleware (§9.7), branch middleware (§11.7), or parent-node middleware on a fan-out / parallel-branches node, the graph engine has already wrapped the originating error as a `node_exception` carrier before the middleware catches it. `FailureIsolatedEvent.caught_exception.category` now resolves through that carrier (and any nested carriers) to the nearest categorized originating cause and reports its category instead of the masking `node_exception`, so the reported category agrees with what the §6.1 retry classifier acted on. For example, an instance whose retries exhaust on `provider_unavailable` now surfaces `provider_unavailable` rather than `node_exception`. The `message` tracks the resolved cause for category/message coherence. Node-level placement was already faithful and is unchanged, and catch/degrade behavior is unchanged at every site (only the event's reported cause changes). The wrapped-instance/branch lineage SHOULD (`fan_out_index` / `branch_name`) is deferred to a follow-up, since it needs the engine to surface per-instance identity to the wrapping-site middleware. `conformance.toml` marks proposal 0065 `implemented`, and conformance fixture 064 (three cases: the §9.7 instance and §11.7 branch sites plus an uncategorized cause) passes. - **Observer privacy flag `disable_llm_payload` renamed to `disable_provider_payload`** (proposal 0059, observability §5.5.4, spec v0.54.0). 
The observer-level flag on both bundled observers (`OTelObserver` and `LangfuseObserver`) is renamed, and its scope broadens from LLM-completion payload to any provider-call payload (LLM completion today; embedding and rerank when those land). This is a breaking change to both observer constructors: config passing `disable_llm_payload=True` (or `False`) updates to `disable_provider_payload=...` with no other change. The default stays `True` (payload suppressed), and the gating behavior for `LlmCompletionEvent` / `LlmFailedEvent` rendering is unchanged at every existing site. The rename is the only part of proposal 0059 adopted this cycle: the retrieval-provider capability itself (the `EmbeddingProvider` protocol, the `EmbeddingEvent` / `EmbeddingFailedEvent` typed variants, and the embedding span / observation mapping) is not yet implemented and rides as `not-yet` in `conformance.toml`. The §5.5.4 rename touches existing LLM-payload gating, so it lands with the pin. - **Fan-out failure-isolation degrade contribution implemented** (proposal 0066, pipeline-utilities §9.3 / §9.8 / §11.7, spec v0.56.0). When `FailureIsolationMiddleware` degrades a fan-out instance, that instance is a success whose contribution is its `degraded_update`, read in subgraph-field-name space and never merged onto the failed instance's pre-failure state. This also fixes a latent bug: an instance `degraded_update`'s `extra_outputs` values were previously looked up by the parent field name and silently dropped (`collect_field` was unaffected). A static `degraded_update` that omits the node's `collect_field` is now a compile-time error (`FanOutDegradedUpdateMissingCollectField`); a callable `degraded_update` that omits it yields a graceful null slot rather than raising, preserving one collection slot per item. 
The parallel-branches counterpart (a branch `degraded_update` omitting a projected `outputs` field skips that field) was already correct as of the parallel-branches fix above and is now pinned by fixture 065. Success-path and resume behavior for correctly-configured fan-outs is unchanged. -- **Pinned spec advances v0.53.0 → v0.56.0 across the v0.14.0 cycle**, in three steps: v0.54.0 (proposal 0059, the observer-flag rename above), v0.55.1 (proposal 0065 above; the v0.55.1 patch also carries an observability §11 span-links text reconciliation that narrows an *Out of scope* bullet, with no python-observable change), and v0.56.0 (proposal 0066, the fan-out degrade contribution above). `conformance.toml` records 0065 and 0066 as `implemented` and 0059 as `not-yet` (only its cross-spec flag rename was adopted). +- **Failure-isolation events carry the full structured cause chain** (proposal 0068, pipeline-utilities §6.3, spec v0.57.0). `FailureIsolatedEvent.caught_exception` gains a `chain`: an ordered tuple of `CauseLink` records (each carrying `category`, `message`, and a `carrier` flag), from the caught exception (outermost) to the originating raise (innermost), with graph-engine `node_exception` carrier wrappers flagged `carrier=True`. The existing `category` and `message` are retained and redefined as a derivation over the chain: both come from the outermost non-carrier link whose category is a non-empty string (if no such link exists, `category` is `null` and `message` is the outermost non-carrier link's message). This supersedes proposal 0065's single "originating cause" representation, which was ambiguous once the post-carrier chain held more than one non-carrier link; the derivation reproduces 0065's single-carrier values, so fixture 064 is unchanged. A new `CauseLink` type is exported from `openarmature.graph`. The bundled OTel and Langfuse observers continue to render the derived `category`; surfacing the full chain is left to custom observers. 
The change is additive to the event shape, and catch/degrade behavior is unchanged. Conformance fixture 066 (three cases: an instance-site carrier chain, a node-level single non-carrier link, and an uncategorized null-category cause) passes. +- **Pinned spec advances v0.53.0 → v0.57.0 across the v0.14.0 cycle**, in four steps: v0.54.0 (proposal 0059, the observer-flag rename above), v0.55.1 (proposal 0065 above; the v0.55.1 patch also carries an observability §11 span-links text reconciliation that narrows an *Out of scope* bullet, with no python-observable change), v0.56.0 (proposal 0066, the fan-out degrade contribution above), and v0.57.0 (proposal 0068, the failure-isolation cause chain above). `conformance.toml` records 0065, 0066, and 0068 as `implemented` and 0059 as `not-yet` (only its cross-spec flag rename was adopted). ### Fixed diff --git a/conformance.toml b/conformance.toml index 45a4120..1b02c5d 100644 --- a/conformance.toml +++ b/conformance.toml @@ -32,7 +32,7 @@ [manifest] implementation = "openarmature-python" -spec_pin = "v0.56.0" +spec_pin = "v0.57.0" # Status values: # implemented — shipped behavior matches the proposal's contract @@ -635,3 +635,16 @@ since = "0.14.0" [proposals."0066"] status = "implemented" since = "0.14.0" + +# Spec v0.57.0 (proposal 0068). Failure-isolation event structured cause +# chain (pipeline-utilities §6.3). ``caught_exception`` gains a ``chain`` of +# cause links (``{category, message, carrier}``, outermost->innermost), with +# graph-engine §4 ``node_exception`` carriers flagged. The existing +# ``category`` / ``message`` are redefined as a derivation over the chain (the +# outermost non-carrier link carrying a category), superseding 0065's single +# "originating cause" prose; the derivation reproduces 0065's values, so +# fixture 064 is unchanged. Fixture 066 (three cases: instance-site carrier +# chain, node-level single link, uncategorized null category) passes. 
+[proposals."0068"] +status = "implemented" +since = "0.14.0" diff --git a/docs/concepts/middleware.md b/docs/concepts/middleware.md index c009fed..857b8a6 100644 --- a/docs/concepts/middleware.md +++ b/docs/concepts/middleware.md @@ -256,9 +256,12 @@ Like `RetryMiddleware`, it catches `Exception` only; `BaseException` On a catch, the middleware dispatches a `FailureIsolatedEvent` onto the observer stream. It is a distinct event variant, not a node event: it carries the `event_name`, the wrapped node's lineage identity, the input -and degraded states, and a `CaughtException` record holding the -exception's `category` (when it has one) and message. Observers narrow -on it with `isinstance(event, FailureIsolatedEvent)`. The bundled OTel +and degraded states, and a `CaughtException` record. That record holds a +derived `category` (when the cause has one) and `message` for simple +consumers, plus a `chain` of cause links (`CauseLink`) from the caught +exception down to the originating raise, with graph-engine carrier +wrappers flagged so a consumer can skip them. Observers narrow on it with +`isinstance(event, FailureIsolatedEvent)`. The bundled OTel and Langfuse observers render it as a marker span / observation so the catch shows up alongside the node's own span. 
The default emission path is the observer stream only, with no logging-library dependency; diff --git a/openarmature-spec b/openarmature-spec index 1e68282..f14d615 160000 --- a/openarmature-spec +++ b/openarmature-spec @@ -1 +1 @@ -Subproject commit 1e68282ce4cdcc8236be886d2143b0e7867f69bf +Subproject commit f14d6158584999318a351358909e3b96e8addece diff --git a/pyproject.toml b/pyproject.toml index e13cbc0..9bf65d3 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -63,7 +63,7 @@ Specification = "https://github.com/LunarCommand/openarmature-spec" openarmature = "openarmature.cli:main" [tool.openarmature] -spec_version = "0.56.0" +spec_version = "0.57.0" [dependency-groups] dev = [ diff --git a/src/openarmature/AGENTS.md b/src/openarmature/AGENTS.md index 4ccdfb9..80c0079 100644 --- a/src/openarmature/AGENTS.md +++ b/src/openarmature/AGENTS.md @@ -1,6 +1,6 @@ # OpenArmature — Agent documentation -*This is the agent guide bundled with the openarmature Python package, version 0.13.0 (spec v0.56.0). For the full docs site see [openarmature.ai](https://openarmature.ai). For the canonical spec text see [openarmature.org/capabilities](https://openarmature.org/capabilities/). For project-specific conventions for the code you're editing, see the host project's `AGENTS.md` or `CLAUDE.md`.* +*This is the agent guide bundled with the openarmature Python package, version 0.13.0 (spec v0.57.0). For the full docs site see [openarmature.ai](https://openarmature.ai). For the canonical spec text see [openarmature.org/capabilities](https://openarmature.org/capabilities/). For project-specific conventions for the code you're editing, see the host project's `AGENTS.md` or `CLAUDE.md`.* ## TL;DR @@ -10,7 +10,7 @@ OpenArmature is a workflow framework for LLM pipelines and tool-calling agents: ## Capability contracts -_Sourced from openarmature-spec v0.56.0. 
Each entry below reproduces §1 (Purpose) and §2 (Concepts) of the capability's `spec.md` verbatim — including additions from accepted proposals that this Python implementation may not yet ship. For per-proposal implementation status (implemented / partial / textual-only / not-yet), see the `conformance.toml` manifest at the repo root. For the full spec text (execution model, error semantics, determinism, observer hooks, etc.) see the linked docs site._ +_Sourced from openarmature-spec v0.57.0. Each entry below reproduces §1 (Purpose) and §2 (Concepts) of the capability's `spec.md` verbatim — including additions from accepted proposals that this Python implementation may not yet ship. For per-proposal implementation status (implemented / partial / textual-only / not-yet), see the `conformance.toml` manifest at the repo root. For the full spec text (execution model, error semantics, determinism, observer hooks, etc.) see the linked docs site._ ### Capability: `graph-engine` diff --git a/src/openarmature/__init__.py b/src/openarmature/__init__.py index 538bfae..4498c6d 100644 --- a/src/openarmature/__init__.py +++ b/src/openarmature/__init__.py @@ -25,7 +25,7 @@ """ __version__ = "0.13.0" -__spec_version__ = "0.56.0" +__spec_version__ = "0.57.0" # Proposal 0052 (spec observability §5.1 / §8.4.1): canonical # package-registry name for this implementation. 
Surfaces on every # OTel invocation span as ``openarmature.implementation.name`` and on diff --git a/src/openarmature/graph/__init__.py b/src/openarmature/graph/__init__.py index 8a5f3c9..6d071f8 100644 --- a/src/openarmature/graph/__init__.py +++ b/src/openarmature/graph/__init__.py @@ -38,6 +38,7 @@ ) from .events import ( CaughtException, + CauseLink, FailureIsolatedEvent, InvocationCompletedEvent, InvocationStartedEvent, @@ -71,6 +72,7 @@ __all__ = [ "END", "CaughtException", + "CauseLink", "CompileError", "CompiledGraph", "ConditionalEdge", diff --git a/src/openarmature/graph/events.py b/src/openarmature/graph/events.py index e8eaf52..462c883 100644 --- a/src/openarmature/graph/events.py +++ b/src/openarmature/graph/events.py @@ -659,20 +659,52 @@ class LlmFailedEvent: caller_invocation_metadata: Mapping[str, AttributeValue] | None = None +# Spec: pipeline-utilities §6.3 cause chain (proposal 0068). A ``carrier`` +# link is a graph-engine §4 ``node_exception`` wrapper the engine applies at a +# non-node placement (§9.7 instance / §11.7 branch / §9.6 / §11.6 parent-node +# middleware); consumers grouping by the originating failure skip carriers via +# the flag. +@dataclass(frozen=True) +class CauseLink: + """One link in a caught exception's resolved cause chain. + + - ``category``: the link's failure category when it carries one (a + non-empty string), else ``None``. + - ``message``: the link's own message (the ``str`` of the exception). + - ``carrier``: ``True`` when the link is an engine-applied + ``node_exception`` carrier wrapper, ``False`` for an ordinary + (non-carrier) exception. + """ + + category: str | None + message: str + carrier: bool + + +# Spec: pipeline-utilities §6.3 (proposals 0050, 0065, 0068). ``chain`` is the +# full ordered cause chain; ``category`` / ``message`` are a derivation over it +# — the outermost non-carrier link whose category is a non-empty string (else +# ``None`` and the outermost non-carrier link's message). 
The derivation +# reproduces 0065's single-value results; the chain adds the full provenance. @dataclass(frozen=True) class CaughtException: """Structured record of an exception caught by ``FailureIsolationMiddleware``. - - ``category``: the exception's failure category when it carries - one (e.g. an llm-provider error's ``category`` attribute), else - ``None`` for a bare exception that carries no category. - - ``message``: the human-readable exception message (``str(exc)``); - the empty string when the exception carried no message. + - ``category``: the caught failure's category (the derived single + value for simple consumers), or ``None`` when no non-carrier link + in the chain carries a category. + - ``message``: the message of the link ``category`` is derived from, + or (when no non-carrier link carries a category) of the outermost + non-carrier link. + - ``chain``: the ordered cause chain, outermost (the caught + exception, index 0) to innermost (the originating raise), one + :class:`CauseLink` per exception. """ category: str | None message: str + chain: tuple[CauseLink, ...] # Spec: realizes pipeline-utilities §6.3 failure-isolation middleware @@ -706,7 +738,8 @@ class FailureIsolatedEvent: - ``post_state``: the degraded partial update the middleware returned in place of the node's output. - ``caught_exception``: a :class:`CaughtException` record of the - caught exception (category + message). + caught exception (its derived category / message and the full + cause ``chain``). 
""" event_name: str @@ -721,6 +754,7 @@ class FailureIsolatedEvent: __all__ = [ "CaughtException", + "CauseLink", "FailureIsolatedEvent", "FanOutEventConfig", "InvocationCompletedEvent", diff --git a/src/openarmature/graph/middleware/failure_isolation.py b/src/openarmature/graph/middleware/failure_isolation.py index a47f0f9..a146b38 100644 --- a/src/openarmature/graph/middleware/failure_isolation.py +++ b/src/openarmature/graph/middleware/failure_isolation.py @@ -37,7 +37,7 @@ import warnings from collections.abc import Awaitable, Callable, Mapping -from typing import Any +from typing import TYPE_CHECKING, Any from openarmature.observability.correlation import ( _current_terminal_attempt_index, @@ -52,47 +52,72 @@ from ._core import NextCall +if TYPE_CHECKING: + # Annotation-only import; the runtime construction in ``_build_cause_chain`` + # uses a deferred local import to keep ``events`` off the module-load path. + from openarmature.graph.events import CauseLink + # A degraded update is either a static partial-update mapping or a # callable resolving one from the pre-call state. Resolved at catch # time; the callable form covers input-state-dependent degraded shapes. DegradedUpdate = Mapping[str, Any] | Callable[[Any], Mapping[str, Any]] -def _resolve_cause(exc: Exception) -> BaseException: - # Cause fidelity (proposal 0065 / §6.3, plus the python "nearest - # categorized" refinement). Walk the ``__cause__`` chain to the most - # actionable cause, skipping graph-engine §4 ``node_exception`` carrier - # wrappers (``NodeException`` and subtypes such as - # ``ParallelBranchesBranchFailed``) the engine applies at a non-node - # placement (§9.7 instance, §11.7 branch, §9.6 / §11.6 parent-node - # middleware). 
Returns the FIRST non-carrier exception that carries a - # string ``category`` — so a deliberately re-categorized surface error - # wins, while an uncategorized surface error resolves to the categorized - # cause beneath it (the same chain §6.1's default classifier consults - # for retryability, so the reported category agrees with what retry - # acted on). When nothing in the chain carries a category, returns the - # originating non-carrier raise (its own message, null category). - # Node-level placement has no carrier, so ``exc`` itself is the - # originating raise. The local import keeps ``errors`` off the - # middleware module-load path, matching the deferred ``events`` import - # in ``_emit_event``. +def _build_cause_chain(exc: Exception) -> tuple[CauseLink, ...]: + # Cause chain (proposal 0068 / §6.3, superseding 0065's single + # "originating cause" prose). Walk the ``__cause__`` chain from the caught + # exception (outermost) to the originating raise (innermost), recording one + # ``CauseLink`` per exception. A graph-engine §4 ``node_exception`` carrier + # (``NodeException`` and subtypes such as ``ParallelBranchesBranchFailed``) + # the engine applies at a non-node placement (§9.7 instance, §11.7 branch, + # §9.6 / §11.6 parent-node middleware) is flagged ``carrier=True``. Traverse + # only BaseException instances (a non-exception ``__cause__`` ends the walk, + # per §6.3) and guard against a cyclic ``__cause__`` chain so a malformed + # chain can't hang or crash the degrade path. The local imports keep + # ``errors`` / ``events`` off the middleware module-load path, matching the + # deferred imports in ``_emit_event``. 
from openarmature.graph.errors import NodeException + from openarmature.graph.events import CauseLink - origin: BaseException | None = None + links: list[CauseLink] = [] current: BaseException | None = exc seen: set[int] = set() - # Traverse only BaseException instances (a non-exception ``__cause__`` - # ends the walk) and guard against a cyclic ``__cause__`` chain so a - # malformed chain can't hang or crash the degrade path. while isinstance(current, BaseException) and id(current) not in seen: seen.add(id(current)) - if not isinstance(current, NodeException): - if origin is None: - origin = current - if isinstance(getattr(current, "category", None), str): - return current + category = getattr(current, "category", None) + links.append( + CauseLink( + category=category if isinstance(category, str) and category else None, + message=str(current), + carrier=isinstance(current, NodeException), + ) + ) current = current.__cause__ - return origin if origin is not None else exc + return tuple(links) + + +def _derive_cause(chain: tuple[CauseLink, ...]) -> tuple[str | None, str]: + # Derived single ``category`` / ``message`` (proposal 0068 / §6.3): the + # OUTERMOST non-carrier link whose ``category`` is a non-empty string — so a + # deliberately re-categorized surface error wins, while an uncategorized + # surface error resolves to the categorized cause beneath it (the same chain + # §6.1's default classifier consults, so the reported category agrees with + # what retry acted on). When no non-carrier link carries a category, the + # category is null and the message is the outermost non-carrier link's (the + # surface error). Reproduces 0065's single-carrier results. The all-carrier + # fallback is defensive — failure isolation always catches a non-carrier or + # wraps one, so a chain with no non-carrier link should not arise. 
+ surface: CauseLink | None = None + for link in chain: + if link.carrier: + continue + if surface is None: + surface = link + if isinstance(link.category, str) and link.category: + return link.category, link.message + if surface is not None: + return None, surface.message + return None, chain[0].message if chain else "" class FailureIsolationMiddleware: @@ -196,16 +221,15 @@ def _emit_event(self, state: Any, exc: Exception, degraded: Mapping[str, Any]) - # path and defers it until the first catch. from openarmature.graph.events import CaughtException, FailureIsolatedEvent - # Cause fidelity (proposal 0065 / §6.3). ``_resolve_cause`` walks - # past graph-engine ``node_exception`` carrier wrappers to the - # nearest categorized originating cause (see its comment); the - # reported ``category`` and ``message`` both come from it so they - # describe one exception — NOT the masking ``node_exception``. A - # bare / uncategorized cause yields a null category. Node-level - # placement has no carrier, so this is the caught exception itself. - cause = _resolve_cause(exc) - cause_category = getattr(cause, "category", None) - category = cause_category if isinstance(cause_category, str) else None + # Cause chain + derivation (proposal 0068 / §6.3). ``_build_cause_chain`` + # records every link from the caught exception to the originating raise + # (carriers flagged); ``_derive_cause`` resolves the single reported + # ``category`` / ``message`` from it — the outermost non-carrier link + # carrying a category, NOT the masking ``node_exception``. Both ride on + # ``caught_exception`` so a simple consumer reads one value while the + # full provenance stays visible in the chain. + chain = _build_cause_chain(exc) + category, message = _derive_cause(chain) # ``attempt_index`` is the wrapped node's final / exhausting # attempt (proposal 0050 §6.3: "the same lineage tuple NodeEvent # carries, for correlation with the wrapped node's other events"). 
@@ -230,9 +254,7 @@ def _emit_event(self, state: Any, exc: Exception, degraded: Mapping[str, Any]) - branch_name=current_branch_name(), pre_state=state, post_state=degraded, - # ``message`` tracks the resolved cause (§6.3 SHOULD) so - # the reported category and message describe one exception. - caught_exception=CaughtException(category=category, message=str(cause)), + caught_exception=CaughtException(category=category, message=message, chain=chain), ) ) diff --git a/tests/conformance/test_pipeline_utilities.py b/tests/conformance/test_pipeline_utilities.py index 53bcdd0..c22fab8 100644 --- a/tests/conformance/test_pipeline_utilities.py +++ b/tests/conformance/test_pipeline_utilities.py @@ -91,7 +91,7 @@ def _load(path: Path) -> dict[str, Any]: # / test_checkpoint.py), not because this runner can't drive them. Fixture # 065 (fan-out degrade contribution, proposal 0066) joined when the spec pin # advanced to v0.56.0. -_FAILURE_ISOLATION_FIXTURES = frozenset(range(58, 66)) +_FAILURE_ISOLATION_FIXTURES = frozenset(range(58, 67)) def _fixture_paths() -> list[Path]: @@ -189,6 +189,18 @@ def _unsupported_middleware(spec: dict[str, Any]) -> str | None: for entry in node_entries or []: if entry.get("type") not in known: return f"per_node.{entry.get('type')}" + # Node-nested ``middleware:`` on plain nodes (the shape + # ``_translate_node_level_middleware`` lifts) is gated symmetrically — same + # plain-node scoping, so the skip-gate and the translator agree on which + # nodes carry liftable node middleware. (Reaches only the single-graph + # path; cases-shape fixtures are dispatched without this gate.) 
+ nodes = cast("dict[str, dict[str, Any]]", spec.get("nodes") or {}) + for _name, node_spec in nodes.items(): + if any(k in node_spec for k in ("fan_out", "parallel_branches", "subgraph")): + continue + for entry in cast("list[dict[str, Any]]", node_spec.get("middleware") or []): + if entry.get("type") not in known: + return f"node.{entry.get('type')}" return None @@ -353,6 +365,32 @@ def _translate_fan_out_instance_middleware( return out +def _translate_node_level_middleware( + spec: Mapping[str, Any], + sinks: CaptureSinks, + clock: Callable[[], float] | None = None, +) -> dict[str, list[Middleware]]: + """Walk ``spec.nodes`` for a node-nested ``middleware:`` list on a plain + function node (the graph-engine per-node middleware shape that cases-style + fixtures use, e.g. fixture 066 Case 2's node-level failure isolation) and + translate each into Middleware instances, keyed by node name. Composite + nodes are skipped because their middleware placements have dedicated + translators: fan-out instance (``_translate_fan_out_instance_middleware``), + parallel-branches branch (``_translate_parallel_branches_branch_middleware``), + and subgraph parent-node middleware (the top-level ``middleware.per_node`` + block via ``_translate_middleware_block``).""" + out: dict[str, list[Middleware]] = {} + nodes = cast("dict[str, dict[str, Any]]", spec.get("nodes") or {}) + for node_name, node_spec in nodes.items(): + if any(k in node_spec for k in ("fan_out", "parallel_branches", "subgraph")): + continue + entries = cast("list[dict[str, Any]]", node_spec.get("middleware") or []) + if not entries: + continue + out[node_name] = [_build_middleware(cfg, sinks, clock) for cfg in entries] + return out + + # --------------------------------------------------------------------------- # Clock stub — monkeypatch time.monotonic for deterministic timing fixtures. 
# --------------------------------------------------------------------------- @@ -545,6 +583,12 @@ async def _capture_isolation(event: ObserverEvent) -> None: captured_isolation.append(event) graph_mw, node_mw = _translate_middleware_block(spec.get("middleware"), sinks, clock) + # Node-nested ``middleware:`` (e.g. fixture 066 Case 2's node-level failure + # isolation) merges into the per-node map alongside any top-level + # ``middleware.per_node`` entries. Single-run fixtures (run_count == 1) + # reuse this ``node_mw`` directly below. + for nl_node, nl_mws in _translate_node_level_middleware(spec, sinks, clock).items(): + node_mw.setdefault(nl_node, []).extend(nl_mws) fan_out_inst_mw = _translate_fan_out_instance_middleware(spec, sinks, clock) del monkeypatch # retained in signature for future stubs that need it @@ -670,11 +714,12 @@ async def _capture_isolation(event: ObserverEvent) -> None: observer_fixtures: dict[str, ObserverFixture] = {} for run_idx in range(run_count): run_sinks = sinks if run_count == 1 else CaptureSinks() - run_graph_mw, run_node_mw = ( - (graph_mw, node_mw) - if run_count == 1 - else _translate_middleware_block(spec.get("middleware"), run_sinks, clock) - ) + if run_count == 1: + run_graph_mw, run_node_mw = graph_mw, node_mw + else: + run_graph_mw, run_node_mw = _translate_middleware_block(spec.get("middleware"), run_sinks, clock) + for nl_node, nl_mws in _translate_node_level_middleware(spec, run_sinks, clock).items(): + run_node_mw.setdefault(nl_node, []).extend(nl_mws) run_fan_out_inst_mw = ( fan_out_inst_mw if run_count == 1 @@ -845,6 +890,23 @@ def _assert_failure_isolation_event( assert ev.caught_exception.category == ce["category"] if "message" in ce: assert ev.caught_exception.message == ce["message"] + # Cause chain (proposal 0068). 
Each expected link is subset-matched on + # the keys it supplies — carrier links pin only {carrier, category} + # (their engine-internal message is not asserted), non-carrier links + # pin {carrier, category, message}. + if "chain" in ce: + expected_chain = cast("list[Mapping[str, Any]]", ce["chain"]) + actual_chain = ev.caught_exception.chain + assert len(actual_chain) == len(expected_chain), ( + f"chain length mismatch: actual={actual_chain}, expected={expected_chain}" + ) + for actual_link, expected_link in zip(actual_chain, expected_chain, strict=True): + if "carrier" in expected_link: + assert actual_link.carrier == expected_link["carrier"] + if "category" in expected_link: + assert actual_link.category == expected_link["category"] + if "message" in expected_link: + assert actual_link.message == expected_link["message"] def _assert_final_state( diff --git a/tests/test_smoke.py b/tests/test_smoke.py index 7a6a1a6..dc9449e 100644 --- a/tests/test_smoke.py +++ b/tests/test_smoke.py @@ -9,7 +9,7 @@ def test_package_versions() -> None: assert openarmature.__version__ == "0.13.0" - assert openarmature.__spec_version__ == "0.56.0" + assert openarmature.__spec_version__ == "0.57.0" def test_spec_version_matches_pyproject() -> None: diff --git a/tests/unit/test_failure_isolation_middleware.py b/tests/unit/test_failure_isolation_middleware.py index d138a97..10b84a4 100644 --- a/tests/unit/test_failure_isolation_middleware.py +++ b/tests/unit/test_failure_isolation_middleware.py @@ -18,6 +18,7 @@ from openarmature.graph import ( END, CaughtException, + CauseLink, FailureIsolatedEvent, FailureIsolationMiddleware, GraphBuilder, @@ -348,6 +349,95 @@ async def test_cyclic_cause_chain_terminates() -> None: assert len(events) == 1 +# --------------------------------------------------------------------------- +# Cause chain (proposal 0068) +# --------------------------------------------------------------------------- + + +async def 
test_chain_records_carrier_then_categorized_cause() -> None: + # Instance-style placement: one engine carrier over a categorized + # originating cause. The chain records the carrier (flagged) then the + # non-carrier cause; the derived category is the non-carrier's. + events: list[Any] = [] + disp_token = _set_active_dispatch(lambda e: events.append(e)) + carrier = NodeException(node_name="work", cause=_TransientError("rate limited"), recoverable_state={}) + try: + mw = FailureIsolationMiddleware(degraded_update={"result": []}, event_name="iso") + await mw("s", _raises(carrier)) + finally: + _reset_active_dispatch(disp_token) + + assert events[0].caught_exception.chain == ( + CauseLink(category="node_exception", message=str(carrier), carrier=True), + CauseLink(category="provider_rate_limit", message="rate limited", carrier=False), + ) + + +async def test_chain_node_level_is_single_non_carrier_link() -> None: + # Node-level placement: the middleware catches the raw error (no carrier), + # so the chain is a single non-carrier link. + events: list[Any] = [] + disp_token = _set_active_dispatch(lambda e: events.append(e)) + try: + mw = FailureIsolationMiddleware(degraded_update={"result": []}, event_name="iso") + await mw("s", _raises(_TransientError("rate limited"))) + finally: + _reset_active_dispatch(disp_token) + + assert events[0].caught_exception.chain == ( + CauseLink(category="provider_rate_limit", message="rate limited", carrier=False), + ) + + +async def test_chain_records_nested_carriers_then_cause() -> None: + # Nested carriers (carrier -> carrier -> categorized cause), the case the + # pinned fixture 066 does not cover: both carriers are recorded and + # flagged, then the originating link, and the derived category is the + # originating one. 
+ events: list[Any] = [] + disp_token = _set_active_dispatch(lambda e: events.append(e)) + inner = NodeException(node_name="inner", cause=_TransientError("rate limited"), recoverable_state={}) + outer = NodeException(node_name="outer", cause=inner, recoverable_state={}) + try: + mw = FailureIsolationMiddleware(degraded_update={"result": []}, event_name="iso") + await mw("s", _raises(outer)) + finally: + _reset_active_dispatch(disp_token) + + chain = events[0].caught_exception.chain + assert [(link.category, link.carrier) for link in chain] == [ + ("node_exception", True), + ("node_exception", True), + ("provider_rate_limit", False), + ] + assert events[0].caught_exception.category == "provider_rate_limit" + + +async def test_chain_carries_both_non_carrier_links_on_recategorization() -> None: + # Two non-carrier links (a re-categorized surface over a deeper cause): the + # chain carries both, and the derived category is the OUTERMOST non-carrier + # (proposal 0068's surface-wins derivation). + events: list[Any] = [] + disp_token = _set_active_dispatch(lambda e: events.append(e)) + surface = _NonTransientError("misconfigured") + surface.__cause__ = _TransientError("rate limited") + carrier = NodeException(node_name="work", cause=surface, recoverable_state={}) + try: + mw = FailureIsolationMiddleware(degraded_update={"result": []}, event_name="iso") + await mw("s", _raises(carrier)) + finally: + _reset_active_dispatch(disp_token) + + chain = events[0].caught_exception.chain + assert [(link.category, link.carrier) for link in chain] == [ + ("node_exception", True), + ("provider_invalid_request", False), + ("provider_rate_limit", False), + ] + assert events[0].caught_exception.category == "provider_invalid_request" + assert events[0].caught_exception.message == "misconfigured" + + async def test_no_event_outside_invocation() -> None: # current_dispatch() is None outside an invocation; the degrade still # happens, no event is emitted, and nothing raises.