Commit 0c8564c

Add cause chain to failure-isolation event (0068) (#162)
* Add cause chain to failure-isolation event (0068)

FailureIsolatedEvent.caught_exception gains a structured chain: an ordered list of CauseLink records (category, message, and a carrier flag) from the caught exception to the originating raise, with engine node_exception wrappers flagged. The single category and message are retained and redefined as a derivation over the chain, reproducing the prior 0065 values, so the change is additive: existing consumers and the bundled OTel and Langfuse observers are unaffected.

This supersedes 0065's single originating-cause representation, which was ambiguous when the post-carrier chain held more than one non-carrier link. Advance the pinned spec to v0.57.0 with conformance fixture 066 plus unit tests for the carrier, nested-carrier, and re-categorization cases.

* Regenerate AGENTS.md for spec v0.57.0 pin

The bundled agent guide embeds the spec-pin version stamps, which the test_agents_md_drift check verifies against a fresh regeneration. The v0.57.0 submodule bump left them reading v0.56.0; regenerate via scripts/build_agents_md.py. Generated artifact only, no behavior change.

* Coerce empty-string cause category to None (0068)

`_build_cause_chain` recorded an empty-string `category` verbatim, but `CauseLink.category` is documented as a non-empty string or None and `_derive_cause` already treats an empty string as no-category. Coerce it to None so the chain representation matches. No exception carries an empty-string category in practice; addresses PR review feedback.
1 parent 2c2ba1f commit 0c8564c
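The chain described in the commit message can be pictured with a small stand-alone sketch. It is not the repository's `_build_cause_chain`: the `NodeException` class, the `build_cause_chain` name, and the choice to follow `__cause__` are all assumptions made for illustration, and `CauseLink` here is a local stand-in mirroring only the three fields the commit names. The point it tries to show is the ordering (outermost first) and the empty-string category being coerced to `None`.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class CauseLink:
    """Local stand-in mirroring the three fields named in the commit."""

    category: str | None
    message: str
    carrier: bool


class NodeException(Exception):
    """Hypothetical carrier wrapper; the real engine type is not shown in this diff."""

    category = "node_exception"


def build_cause_chain(exc: BaseException) -> tuple[CauseLink, ...]:
    """Walk from the caught exception (outermost) to the originating raise.

    Assumption: links are connected through ``__cause__`` (``raise ... from``).
    """
    links: list[CauseLink] = []
    seen: set[int] = set()  # guard against a cyclic cause chain
    current: BaseException | None = exc
    while current is not None and id(current) not in seen:
        seen.add(id(current))
        category = getattr(current, "category", None)
        # Coerce anything that is not a non-empty string (including "") to None.
        if not isinstance(category, str) or category == "":
            category = None
        links.append(
            CauseLink(
                category=category,
                message=str(current),
                carrier=isinstance(current, NodeException),
            )
        )
        current = current.__cause__
    return tuple(links)


class ProviderUnavailable(Exception):
    """Hypothetical categorized provider error used only for the demo."""

    category = "provider_unavailable"


try:
    try:
        raise ProviderUnavailable("upstream 503")
    except ProviderUnavailable as inner:
        raise NodeException("instance 2 failed") from inner
except NodeException as caught:
    chain = build_cause_chain(caught)
```

In this sketch `chain[0]` is the carrier and `chain[1]` is the originating raise, which matches the outermost-to-innermost ordering the commit describes.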

13 files changed

Lines changed: 292 additions & 65 deletions

CHANGELOG.md

Lines changed: 2 additions & 1 deletion
@@ -17,7 +17,8 @@ The format follows [Keep a Changelog](https://keepachangelog.com/en/1.1.0/). The
 - **Failure-isolation events report the originating cause's category at non-node placements** (proposal 0065, pipeline-utilities §6.3). When `FailureIsolationMiddleware` runs as instance middleware (§9.7), branch middleware (§11.7), or parent-node middleware on a fan-out / parallel-branches node, the graph engine has already wrapped the originating error as a `node_exception` carrier before the middleware catches it. `FailureIsolatedEvent.caught_exception.category` now resolves through that carrier (and any nested carriers) to the nearest categorized originating cause and reports its category instead of the masking `node_exception`, so the reported category agrees with what the §6.1 retry classifier acted on. For example, an instance whose retries exhaust on `provider_unavailable` now surfaces `provider_unavailable` rather than `node_exception`. The `message` tracks the resolved cause for category/message coherence. Node-level placement was already faithful and is unchanged, and catch/degrade behavior is unchanged at every site (only the event's reported cause changes). The wrapped-instance/branch lineage SHOULD (`fan_out_index` / `branch_name`) is deferred to a follow-up, since it needs the engine to surface per-instance identity to the wrapping-site middleware. `conformance.toml` marks proposal 0065 `implemented`, and conformance fixture 064 (three cases: the §9.7 instance and §11.7 branch sites plus an uncategorized cause) passes.
 - **Observer privacy flag `disable_llm_payload` renamed to `disable_provider_payload`** (proposal 0059, observability §5.5.4, spec v0.54.0). The observer-level flag on both bundled observers (`OTelObserver` and `LangfuseObserver`) is renamed, and its scope broadens from LLM-completion payload to any provider-call payload (LLM completion today; embedding and rerank when those land). This is a breaking change to both observer constructors: config passing `disable_llm_payload=True` (or `False`) updates to `disable_provider_payload=...` with no other change. The default stays `True` (payload suppressed), and the gating behavior for `LlmCompletionEvent` / `LlmFailedEvent` rendering is unchanged at every existing site. The rename is the only part of proposal 0059 adopted this cycle: the retrieval-provider capability itself (the `EmbeddingProvider` protocol, the `EmbeddingEvent` / `EmbeddingFailedEvent` typed variants, and the embedding span / observation mapping) is not yet implemented and rides as `not-yet` in `conformance.toml`. The §5.5.4 rename touches existing LLM-payload gating, so it lands with the pin.
 - **Fan-out failure-isolation degrade contribution implemented** (proposal 0066, pipeline-utilities §9.3 / §9.8 / §11.7, spec v0.56.0). When `FailureIsolationMiddleware` degrades a fan-out instance, that instance is a success whose contribution is its `degraded_update`, read in subgraph-field-name space and never merged onto the failed instance's pre-failure state. This also fixes a latent bug: an instance `degraded_update`'s `extra_outputs` values were previously looked up by the parent field name and silently dropped (`collect_field` was unaffected). A static `degraded_update` that omits the node's `collect_field` is now a compile-time error (`FanOutDegradedUpdateMissingCollectField`); a callable `degraded_update` that omits it yields a graceful null slot rather than raising, preserving one collection slot per item. The parallel-branches counterpart (a branch `degraded_update` omitting a projected `outputs` field skips that field) was already correct as of the parallel-branches fix above and is now pinned by fixture 065. Success-path and resume behavior for correctly-configured fan-outs is unchanged.
-- **Pinned spec advances v0.53.0 → v0.56.0 across the v0.14.0 cycle**, in three steps: v0.54.0 (proposal 0059, the observer-flag rename above), v0.55.1 (proposal 0065 above; the v0.55.1 patch also carries an observability §11 span-links text reconciliation that narrows an *Out of scope* bullet, with no python-observable change), and v0.56.0 (proposal 0066, the fan-out degrade contribution above). `conformance.toml` records 0065 and 0066 as `implemented` and 0059 as `not-yet` (only its cross-spec flag rename was adopted).
+- **Failure-isolation events carry the full structured cause chain** (proposal 0068, pipeline-utilities §6.3, spec v0.57.0). `FailureIsolatedEvent.caught_exception` gains a `chain`: an ordered list of `CauseLink` records (each carrying `category`, `message`, and a `carrier` flag), from the caught exception (outermost) to the originating raise (innermost), with graph-engine `node_exception` carrier wrappers flagged `carrier=True`. The existing `category` and `message` are retained and redefined as a derivation over the chain: the category of the outermost non-carrier link whose category is a non-empty string (else `category` is `null` and `message` is the outermost non-carrier link's message). This supersedes proposal 0065's single "originating cause" representation, which was ambiguous once the post-carrier chain held more than one non-carrier link; the derivation reproduces 0065's single-carrier values, so fixture 064 is unchanged. A new `CauseLink` type is exported from `openarmature.graph`. The bundled OTel and Langfuse observers continue to render the derived `category`; surfacing the full chain is left to custom observers. The change is additive to the event shape, and catch/degrade behavior is unchanged. Conformance fixture 066 (three cases: an instance-site carrier chain, a node-level single non-carrier link, and an uncategorized null-category cause) passes.
+- **Pinned spec advances v0.53.0 → v0.57.0 across the v0.14.0 cycle**, in four steps: v0.54.0 (proposal 0059, the observer-flag rename above), v0.55.1 (proposal 0065 above; the v0.55.1 patch also carries an observability §11 span-links text reconciliation that narrows an *Out of scope* bullet, with no python-observable change), v0.56.0 (proposal 0066, the fan-out degrade contribution above), and v0.57.0 (proposal 0068, the failure-isolation cause chain above). `conformance.toml` records 0065, 0066, and 0068 as `implemented` and 0059 as `not-yet` (only its cross-spec flag rename was adopted).
 
 ### Fixed
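The derivation rule stated in the 0068 changelog entry can be written out as a short sketch. This is an illustration, not the repository's `_derive_cause`: the `derive_cause` name and the stand-in `CauseLink` are assumptions, and so is the fallback for a chain that holds only carriers, which the changelog text does not cover. The sample chains loosely follow the three fixture 066 cases plus a re-categorization case; their messages are invented.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class CauseLink:
    """Local stand-in for the exported CauseLink record."""

    category: str | None
    message: str
    carrier: bool


def derive_cause(chain: tuple[CauseLink, ...]) -> tuple[str | None, str]:
    """Return (category, message) derived from an outermost-first chain."""
    non_carriers = [link for link in chain if not link.carrier]
    # Rule 1: outermost non-carrier link whose category is a non-empty string.
    for link in non_carriers:
        if isinstance(link.category, str) and link.category:
            return link.category, link.message
    # Rule 2: no categorized link, so use the outermost non-carrier message.
    if non_carriers:
        return None, non_carriers[0].message
    # Assumption (not in the changelog): an all-carrier or empty chain falls
    # back to the outermost message, or the empty string.
    return None, chain[0].message if chain else ""


carrier = CauseLink("node_exception", "instance 2 failed", True)
provider = CauseLink("provider_unavailable", "upstream 503", False)

instance_site = (carrier, provider)          # carrier wraps the cause
node_level = (provider,)                     # single non-carrier link
uncategorized = (carrier, CauseLink(None, "boom", False))
recategorized = (carrier, CauseLink("pipeline_error", "wrapped", False), provider)

results = [
    derive_cause(c)
    for c in (instance_site, node_level, uncategorized, recategorized)
]
```

The last chain is the situation the entry calls ambiguous under 0065: two non-carrier links follow the carrier, and the rule picks the outermost categorized one.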

conformance.toml

Lines changed: 14 additions & 1 deletion
@@ -32,7 +32,7 @@
 
 [manifest]
 implementation = "openarmature-python"
-spec_pin = "v0.56.0"
+spec_pin = "v0.57.0"
 
 # Status values:
 # implemented — shipped behavior matches the proposal's contract
@@ -635,3 +635,16 @@ since = "0.14.0"
 [proposals."0066"]
 status = "implemented"
 since = "0.14.0"
+
+# Spec v0.57.0 (proposal 0068). Failure-isolation event structured cause
+# chain (pipeline-utilities §6.3). ``caught_exception`` gains a ``chain`` of
+# cause links (``{category, message, carrier}``, outermost->innermost), with
+# graph-engine §4 ``node_exception`` carriers flagged. The existing
+# ``category`` / ``message`` are redefined as a derivation over the chain (the
+# outermost non-carrier link carrying a category), superseding 0065's single
+# "originating cause" prose; the derivation reproduces 0065's values, so
+# fixture 064 is unchanged. Fixture 066 (three cases: instance-site carrier
+# chain, node-level single link, uncategorized null category) passes.
+[proposals."0068"]
+status = "implemented"
+since = "0.14.0"

docs/concepts/middleware.md

Lines changed: 6 additions & 3 deletions
@@ -256,9 +256,12 @@ Like `RetryMiddleware`, it catches `Exception` only; `BaseException`
 On a catch, the middleware dispatches a `FailureIsolatedEvent` onto the
 observer stream. It is a distinct event variant, not a node event: it
 carries the `event_name`, the wrapped node's lineage identity, the input
-and degraded states, and a `CaughtException` record holding the
-exception's `category` (when it has one) and message. Observers narrow
-on it with `isinstance(event, FailureIsolatedEvent)`. The bundled OTel
+and degraded states, and a `CaughtException` record. That record holds a
+derived `category` (when the cause has one) and `message` for simple
+consumers, plus a `chain` of cause links (`CauseLink`) from the caught
+exception down to the originating raise, with graph-engine carrier
+wrappers flagged so a consumer can skip them. Observers narrow on it with
+`isinstance(event, FailureIsolatedEvent)`. The bundled OTel
 and Langfuse observers render it as a marker span / observation so the
 catch shows up alongside the node's own span. The default emission path
 is the observer stream only, with no logging-library dependency;
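The narrowing and carrier-skipping described in this passage might look like the following in a custom observer. This is a sketch under stated assumptions: the three dataclasses are local stand-ins (the real `FailureIsolatedEvent` carries more fields than the two kept here), `describe_failure` is a hypothetical helper, and the observer registration API is not shown because this diff does not include it.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class CauseLink:
    category: str | None
    message: str
    carrier: bool


@dataclass(frozen=True)
class CaughtException:
    category: str | None
    message: str
    chain: tuple[CauseLink, ...]


@dataclass(frozen=True)
class FailureIsolatedEvent:
    """Trimmed stand-in: only the fields this sketch reads."""

    event_name: str
    caught_exception: CaughtException


def describe_failure(event: object) -> str | None:
    """Render one line for a failure-isolation event, ignoring other events."""
    if not isinstance(event, FailureIsolatedEvent):
        return None
    caught = event.caught_exception
    # Skip engine carrier wrappers so only real causes are listed.
    causes = " <- ".join(
        f"{link.category or 'uncategorized'}: {link.message}"
        for link in caught.chain
        if not link.carrier
    )
    return f"{event.event_name} [{caught.category}] {causes}"


chain = (
    CauseLink("node_exception", "instance 2 failed", True),
    CauseLink("provider_unavailable", "upstream 503", False),
)
event = FailureIsolatedEvent(
    event_name="summarize_degraded",
    caught_exception=CaughtException("provider_unavailable", "upstream 503", chain),
)
line = describe_failure(event)
```

A simple consumer can keep reading only `caught.category` and `caught.message`; the chain is extra detail for observers that want the provenance.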

pyproject.toml

Lines changed: 1 addition & 1 deletion
@@ -63,7 +63,7 @@ Specification = "https://github.com/LunarCommand/openarmature-spec"
 openarmature = "openarmature.cli:main"
 
 [tool.openarmature]
-spec_version = "0.56.0"
+spec_version = "0.57.0"
 
 [dependency-groups]
 dev = [

src/openarmature/AGENTS.md

Lines changed: 2 additions & 2 deletions
@@ -1,6 +1,6 @@
 # OpenArmature — Agent documentation
 
-*This is the agent guide bundled with the openarmature Python package, version 0.13.0 (spec v0.56.0). For the full docs site see [openarmature.ai](https://openarmature.ai). For the canonical spec text see [openarmature.org/capabilities](https://openarmature.org/capabilities/). For project-specific conventions for the code you're editing, see the host project's `AGENTS.md` or `CLAUDE.md`.*
+*This is the agent guide bundled with the openarmature Python package, version 0.13.0 (spec v0.57.0). For the full docs site see [openarmature.ai](https://openarmature.ai). For the canonical spec text see [openarmature.org/capabilities](https://openarmature.org/capabilities/). For project-specific conventions for the code you're editing, see the host project's `AGENTS.md` or `CLAUDE.md`.*
 
 ## TL;DR
 
@@ -10,7 +10,7 @@ OpenArmature is a workflow framework for LLM pipelines and tool-calling agents:
 
 ## Capability contracts
 
-_Sourced from openarmature-spec v0.56.0. Each entry below reproduces §1 (Purpose) and §2 (Concepts) of the capability's `spec.md` verbatim — including additions from accepted proposals that this Python implementation may not yet ship. For per-proposal implementation status (implemented / partial / textual-only / not-yet), see the `conformance.toml` manifest at the repo root. For the full spec text (execution model, error semantics, determinism, observer hooks, etc.) see the linked docs site._
+_Sourced from openarmature-spec v0.57.0. Each entry below reproduces §1 (Purpose) and §2 (Concepts) of the capability's `spec.md` verbatim — including additions from accepted proposals that this Python implementation may not yet ship. For per-proposal implementation status (implemented / partial / textual-only / not-yet), see the `conformance.toml` manifest at the repo root. For the full spec text (execution model, error semantics, determinism, observer hooks, etc.) see the linked docs site._
 
 ### Capability: `graph-engine`

src/openarmature/__init__.py

Lines changed: 1 addition & 1 deletion
@@ -25,7 +25,7 @@
 """
 
 __version__ = "0.13.0"
-__spec_version__ = "0.56.0"
+__spec_version__ = "0.57.0"
 # Proposal 0052 (spec observability §5.1 / §8.4.1): canonical
 # package-registry name for this implementation. Surfaces on every
 # OTel invocation span as ``openarmature.implementation.name`` and on

src/openarmature/graph/__init__.py

Lines changed: 2 additions & 0 deletions
@@ -38,6 +38,7 @@
 )
 from .events import (
     CaughtException,
+    CauseLink,
     FailureIsolatedEvent,
     InvocationCompletedEvent,
     InvocationStartedEvent,
@@ -71,6 +72,7 @@
 __all__ = [
     "END",
     "CaughtException",
+    "CauseLink",
     "CompileError",
     "CompiledGraph",
     "ConditionalEdge",

src/openarmature/graph/events.py

Lines changed: 40 additions & 6 deletions
@@ -659,20 +659,52 @@ class LlmFailedEvent:
     caller_invocation_metadata: Mapping[str, AttributeValue] | None = None
 
 
+# Spec: pipeline-utilities §6.3 cause chain (proposal 0068). A ``carrier``
+# link is a graph-engine §4 ``node_exception`` wrapper the engine applies at a
+# non-node placement (§9.7 instance / §11.7 branch / §9.6 / §11.6 parent-node
+# middleware); consumers grouping by the originating failure skip carriers via
+# the flag.
+@dataclass(frozen=True)
+class CauseLink:
+    """One link in a caught exception's resolved cause chain.
+
+    - ``category``: the link's failure category when it carries one (a
+      string), else ``None``.
+    - ``message``: the link's own message (the ``str`` of the exception).
+    - ``carrier``: ``True`` when the link is an engine-applied
+      ``node_exception`` carrier wrapper, ``False`` for an ordinary
+      (non-carrier) exception.
+    """
+
+    category: str | None
+    message: str
+    carrier: bool
+
+
+# Spec: pipeline-utilities §6.3 (proposals 0050, 0065, 0068). ``chain`` is the
+# full ordered cause chain; ``category`` / ``message`` are a derivation over it
+# — the outermost non-carrier link whose category is a non-empty string (else
+# ``None`` and the outermost non-carrier link's message). The derivation
+# reproduces 0065's single-value results; the chain adds the full provenance.
 @dataclass(frozen=True)
 class CaughtException:
     """Structured record of an exception caught by
     ``FailureIsolationMiddleware``.
 
-    - ``category``: the exception's failure category when it carries
-      one (e.g. an llm-provider error's ``category`` attribute), else
-      ``None`` for a bare exception that carries no category.
-    - ``message``: the human-readable exception message (``str(exc)``);
-      the empty string when the exception carried no message.
+    - ``category``: the caught failure's category (the derived single
+      value for simple consumers), or ``None`` when no non-carrier link
+      in the chain carries a category.
+    - ``message``: the message of the link ``category`` is derived from,
+      or (when no link carries a category) of the outermost non-carrier
+      link.
+    - ``chain``: the ordered cause chain, outermost (the caught
+      exception, index 0) to innermost (the originating raise), one
+      :class:`CauseLink` per exception.
     """
 
     category: str | None
     message: str
+    chain: tuple[CauseLink, ...]
 
 
 # Spec: realizes pipeline-utilities §6.3 failure-isolation middleware
@@ -706,7 +738,8 @@ class FailureIsolatedEvent:
     - ``post_state``: the degraded partial update the middleware
       returned in place of the node's output.
     - ``caught_exception``: a :class:`CaughtException` record of the
-      caught exception (category + message).
+      caught exception (its derived category / message and the full
+      cause ``chain``).
     """
 
     event_name: str
@@ -721,6 +754,7 @@ class FailureIsolatedEvent:
 
 __all__ = [
     "CaughtException",
+    "CauseLink",
     "FailureIsolatedEvent",
     "FanOutEventConfig",
     "InvocationCompletedEvent",
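The comment above `CauseLink` mentions consumers that group by the originating failure and skip carriers via the flag. One way such a consumer might look is sketched below. The helper names `originating_link` and `count_by_origin` are hypothetical, `CauseLink` is a local stand-in with the same three fields, and treating the innermost non-carrier link as the originating failure is an interpretation of the docstring's "innermost (the originating raise)" wording.

```python
from __future__ import annotations

from collections import Counter
from collections.abc import Iterable
from dataclasses import dataclass


@dataclass(frozen=True)
class CauseLink:
    category: str | None
    message: str
    carrier: bool


def originating_link(chain: tuple[CauseLink, ...]) -> CauseLink | None:
    """Return the innermost non-carrier link, or None if there is none."""
    # The chain is outermost-first, so scan it from the end.
    for link in reversed(chain):
        if not link.carrier:
            return link
    return None


def count_by_origin(chains: Iterable[tuple[CauseLink, ...]]) -> Counter[str]:
    """Count caught failures by the category of their originating link."""
    counts: Counter[str] = Counter()
    for chain in chains:
        origin = originating_link(chain)
        key = origin.category if origin and origin.category else "uncategorized"
        counts[key] += 1
    return counts


carrier = CauseLink("node_exception", "instance 0 failed", True)
provider = CauseLink("provider_unavailable", "upstream 503", False)

counts = count_by_origin(
    [
        (carrier, provider),                         # wrapped at an instance site
        (provider,),                                 # node-level, no carrier
        (carrier, CauseLink(None, "boom", False)),   # no category on the cause
    ]
)
```

Because the records are frozen dataclasses and `chain` is a tuple, whole chains can also be used directly as dictionary keys if a consumer prefers to group on the full provenance.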
