
Commit 22aad2a

Add adapter crash-injection and mock cause (0070) (#163)
* Add adapter crash-injection and mock cause (0070)

Implement proposal 0070's two conformance-adapter capabilities: a `crash_injection` directive (after_fan_out_instance / after_node) that simulates a checkpoint-boundary crash with no asserted first-run outcome, and a recursive mock `cause` that chains a flaky failure to an originating error.

Wire conformance fixtures 067 (crash-injection fan-out resume) and 068 (failure-mock cause chain). A general per-instance execution recorder lets plain-node fan-outs report executed / skipped on resume, consulted as a fallback so the existing flaky_per_index fixtures are unchanged. The after_node boundary has a unit test (no fixture exercises it).

Advance the pinned spec to v0.58.0; test-vocabulary only, no library behavior change.

* Harden crash-injection directive handling (0070)

Address PR review on the crash_injection plumbing:

- crash_injection now defines the crash boundary exclusively; when set, the legacy fan_out.abort_after_instance directive is ignored, so an instance-boundary and a node-boundary abort cannot both be active.
- A non-dict crash_injection is coerced to None in the first-run handler, matching _find_crash_injection's mapping check, so a malformed directive no longer triggers the swallow path.
- after_fan_out_instance now honors its documented node field: the parser threads the node name and _maybe_abort filters fan_out_progress by it, so a multi-fan-out graph aborts after the named node. Single-fan-out fixtures and the legacy path are unchanged.
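The three hardening rules in the second commit can be read as a small precedence function. The sketch below is hypothetical and is not the adapter's code: `CrashBoundary`, `resolve_crash_boundary`, `should_abort`, and the directive keys `boundary`, `node`, and `instance` are names assumed for illustration. Only `crash_injection`, the two boundary names, and the legacy `fan_out.abort_after_instance` directive come from the commit message. Falling back to the legacy directive when `crash_injection` is malformed is also an assumption.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional


@dataclass(frozen=True)
class CrashBoundary:
    """Hypothetical normalized form of a crash boundary."""

    kind: str                       # "after_fan_out_instance" or "after_node"
    node: Optional[str] = None      # node the boundary is scoped to, if any
    instance: Optional[int] = None  # instance index for instance boundaries


def resolve_crash_boundary(case: Mapping[str, Any]) -> Optional[CrashBoundary]:
    raw = case.get("crash_injection")
    # Rule 2: a malformed (non-mapping) directive is coerced to None.
    directive = raw if isinstance(raw, Mapping) else None
    if directive is not None:
        # Rule 1: crash_injection defines the boundary exclusively, so the
        # legacy directive is not consulted at all.
        return CrashBoundary(
            kind=str(directive.get("boundary")),  # assumed key name
            node=directive.get("node"),
            instance=directive.get("instance"),   # assumed key name
        )
    legacy = case.get("fan_out")
    if isinstance(legacy, Mapping) and "abort_after_instance" in legacy:
        return CrashBoundary(
            kind="after_fan_out_instance",
            instance=int(legacy["abort_after_instance"]),
        )
    return None


def should_abort(
    boundary: Optional[CrashBoundary],
    node_name: str,
    instance: Optional[int] = None,
) -> bool:
    if boundary is None:
        return False
    # Rule 3: a boundary that names a node only fires for that node.
    if boundary.node is not None and boundary.node != node_name:
        return False
    if boundary.kind == "after_fan_out_instance":
        return instance is not None and instance == boundary.instance
    return boundary.kind == "after_node" and instance is None
```

With both directives present, `resolve_crash_boundary` returns the `crash_injection` boundary. In a graph with two fan-out nodes, `should_abort` then fires only for the node the directive names.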
1 parent 0c8564c commit 22aad2a

10 files changed

Lines changed: 309 additions & 75 deletions


CHANGELOG.md

Lines changed: 1 addition & 1 deletion
@@ -18,7 +18,7 @@ The format follows [Keep a Changelog](https://keepachangelog.com/en/1.1.0/). The
 - **Observer privacy flag `disable_llm_payload` renamed to `disable_provider_payload`** (proposal 0059, observability §5.5.4, spec v0.54.0). The observer-level flag on both bundled observers (`OTelObserver` and `LangfuseObserver`) is renamed, and its scope broadens from LLM-completion payload to any provider-call payload (LLM completion today; embedding and rerank when those land). This is a breaking change to both observer constructors: config passing `disable_llm_payload=True` (or `False`) updates to `disable_provider_payload=...` with no other change. The default stays `True` (payload suppressed), and the gating behavior for `LlmCompletionEvent` / `LlmFailedEvent` rendering is unchanged at every existing site. The rename is the only part of proposal 0059 adopted this cycle: the retrieval-provider capability itself (the `EmbeddingProvider` protocol, the `EmbeddingEvent` / `EmbeddingFailedEvent` typed variants, and the embedding span / observation mapping) is not yet implemented and rides as `not-yet` in `conformance.toml`. The §5.5.4 rename touches existing LLM-payload gating, so it lands with the pin.
 - **Fan-out failure-isolation degrade contribution implemented** (proposal 0066, pipeline-utilities §9.3 / §9.8 / §11.7, spec v0.56.0). When `FailureIsolationMiddleware` degrades a fan-out instance, that instance is a success whose contribution is its `degraded_update`, read in subgraph-field-name space and never merged onto the failed instance's pre-failure state. This also fixes a latent bug: an instance `degraded_update`'s `extra_outputs` values were previously looked up by the parent field name and silently dropped (`collect_field` was unaffected). A static `degraded_update` that omits the node's `collect_field` is now a compile-time error (`FanOutDegradedUpdateMissingCollectField`); a callable `degraded_update` that omits it yields a graceful null slot rather than raising, preserving one collection slot per item. The parallel-branches counterpart (a branch `degraded_update` omitting a projected `outputs` field skips that field) was already correct as of the parallel-branches fix above and is now pinned by fixture 065. Success-path and resume behavior for correctly-configured fan-outs is unchanged.
 - **Failure-isolation events carry the full structured cause chain** (proposal 0068, pipeline-utilities §6.3, spec v0.57.0). `FailureIsolatedEvent.caught_exception` gains a `chain`: an ordered list of `CauseLink` records (each carrying `category`, `message`, and a `carrier` flag), from the caught exception (outermost) to the originating raise (innermost), with graph-engine `node_exception` carrier wrappers flagged `carrier=True`. The existing `category` and `message` are retained and redefined as a derivation over the chain: the category of the outermost non-carrier link whose category is a non-empty string (else `category` is `null` and `message` is the outermost non-carrier link's message). This supersedes proposal 0065's single "originating cause" representation, which was ambiguous once the post-carrier chain held more than one non-carrier link; the derivation reproduces 0065's single-carrier values, so fixture 064 is unchanged. A new `CauseLink` type is exported from `openarmature.graph`. The bundled OTel and Langfuse observers continue to render the derived `category`; surfacing the full chain is left to custom observers. The change is additive to the event shape, and catch/degrade behavior is unchanged. Conformance fixture 066 (three cases: an instance-site carrier chain, a node-level single non-carrier link, and an uncategorized null-category cause) passes.
-- **Pinned spec advances v0.53.0 → v0.57.0 across the v0.14.0 cycle**, in four steps: v0.54.0 (proposal 0059, the observer-flag rename above), v0.55.1 (proposal 0065 above; the v0.55.1 patch also carries an observability §11 span-links text reconciliation that narrows an *Out of scope* bullet, with no python-observable change), v0.56.0 (proposal 0066, the fan-out degrade contribution above), and v0.57.0 (proposal 0068, the failure-isolation cause chain above). `conformance.toml` records 0065, 0066, and 0068 as `implemented` and 0059 as `not-yet` (only its cross-spec flag rename was adopted).
+- **Pinned spec advances v0.53.0 → v0.58.0 across the v0.14.0 cycle**, in five steps: v0.54.0 (proposal 0059, the observer-flag rename above), v0.55.1 (proposal 0065 above; the v0.55.1 patch also carries an observability §11 span-links text reconciliation that narrows an *Out of scope* bullet, with no python-observable change), v0.56.0 (proposal 0066, the fan-out degrade contribution above), v0.57.0 (proposal 0068, the failure-isolation cause chain above), and v0.58.0 (proposal 0070, conformance-adapter crash-injection and cause-chaining test vocabulary: a `crash_injection` directive and a recursive mock `cause`, with conformance fixtures 067 and 068, no library behavior change). `conformance.toml` records 0065, 0066, 0068, and 0070 as `implemented` and 0059 as `not-yet` (only its cross-spec flag rename was adopted).

 ### Fixed
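The proposal 0068 changelog entry in this hunk defines `category` and `message` as a derivation over the cause chain, which fixture 068 pins through the mock `cause`. A minimal sketch of that rule follows. `Link` is a stand-in with the three fields the entry names, not the real `CauseLink` exported from `openarmature.graph`. Two details are assumptions because the entry does not spell them out: `message` is taken from the same link that supplies the category, and a chain with no non-carrier link yields `(None, None)`.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Link:
    """Stand-in for a cause-chain link (category, message, carrier flag)."""

    category: Optional[str]
    message: str
    carrier: bool = False


def derive_category_and_message(
    chain: Sequence[Link],
) -> Tuple[Optional[str], Optional[str]]:
    """Derive (category, message) from a chain ordered outermost to innermost."""
    non_carrier = [link for link in chain if not link.carrier]
    for link in non_carrier:
        # The outermost non-carrier link with a non-empty string category wins.
        if isinstance(link.category, str) and link.category:
            return link.category, link.message  # message source is assumed
    if non_carrier:
        # No categorized link: null category, outermost non-carrier message.
        return None, non_carrier[0].message
    return None, None  # assumed result when every link is a carrier


chain = [
    Link(category=None, message="node failed", carrier=True),  # engine wrapper
    Link(category="provider_unavailable", message="flaky"),
    Link(category="rate_limited", message="429 from upstream"),
]
derived = derive_category_and_message(chain)
```

Here `derived` takes its category from the second link: the first is a carrier and is skipped, and the third sits further in.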

conformance.toml

Lines changed: 16 additions & 1 deletion
@@ -32,7 +32,7 @@

 [manifest]
 implementation = "openarmature-python"
-spec_pin = "v0.57.0"
+spec_pin = "v0.58.0"

 # Status values:
 # implemented — shipped behavior matches the proposal's contract
@@ -648,3 +648,18 @@ since = "0.14.0"
 [proposals."0068"]
 status = "implemented"
 since = "0.14.0"
+
+# Spec v0.58.0 (proposal 0070). Conformance-adapter crash/resume vocabulary,
+# crash-injection, and cause-chaining (conformance-adapter §5.1 / §5.6 / §5.8).
+# Two new adapter capabilities: ``crash_injection`` (``after_fan_out_instance``
+# + ``after_node``) simulates a crash at a checkpoint boundary independent of
+# an instance failure, and a recursive mock ``cause`` chains a failure mock's
+# raised error to an originating cause. The crash/resume + saved-record +
+# resume-outcome directives the proposal formalizes were already implemented.
+# Fixture 067 (crash-injection fan-out resume) drives after_fan_out_instance
+# end-to-end; after_node has a unit test (no fixture exercises it). Fixture
+# 068 (failure-mock cause chain) pins 0068's outermost-wins derivation via the
+# mock ``cause``.
+[proposals."0070"]
+status = "implemented"
+since = "0.14.0"

pyproject.toml

Lines changed: 1 addition & 1 deletion
@@ -63,7 +63,7 @@ Specification = "https://github.com/LunarCommand/openarmature-spec"
 openarmature = "openarmature.cli:main"

 [tool.openarmature]
-spec_version = "0.57.0"
+spec_version = "0.58.0"

 [dependency-groups]
 dev = [

src/openarmature/AGENTS.md

Lines changed: 2 additions & 2 deletions
@@ -1,6 +1,6 @@
 # OpenArmature — Agent documentation

-*This is the agent guide bundled with the openarmature Python package, version 0.13.0 (spec v0.57.0). For the full docs site see [openarmature.ai](https://openarmature.ai). For the canonical spec text see [openarmature.org/capabilities](https://openarmature.org/capabilities/). For project-specific conventions for the code you're editing, see the host project's `AGENTS.md` or `CLAUDE.md`.*
+*This is the agent guide bundled with the openarmature Python package, version 0.13.0 (spec v0.58.0). For the full docs site see [openarmature.ai](https://openarmature.ai). For the canonical spec text see [openarmature.org/capabilities](https://openarmature.org/capabilities/). For project-specific conventions for the code you're editing, see the host project's `AGENTS.md` or `CLAUDE.md`.*

 ## TL;DR

@@ -10,7 +10,7 @@ OpenArmature is a workflow framework for LLM pipelines and tool-calling agents:

 ## Capability contracts

-_Sourced from openarmature-spec v0.57.0. Each entry below reproduces §1 (Purpose) and §2 (Concepts) of the capability's `spec.md` verbatim — including additions from accepted proposals that this Python implementation may not yet ship. For per-proposal implementation status (implemented / partial / textual-only / not-yet), see the `conformance.toml` manifest at the repo root. For the full spec text (execution model, error semantics, determinism, observer hooks, etc.) see the linked docs site._
+_Sourced from openarmature-spec v0.58.0. Each entry below reproduces §1 (Purpose) and §2 (Concepts) of the capability's `spec.md` verbatim — including additions from accepted proposals that this Python implementation may not yet ship. For per-proposal implementation status (implemented / partial / textual-only / not-yet), see the `conformance.toml` manifest at the repo root. For the full spec text (execution model, error semantics, determinism, observer hooks, etc.) see the linked docs site._

 ### Capability: `graph-engine`

src/openarmature/__init__.py

Lines changed: 1 addition & 1 deletion
@@ -25,7 +25,7 @@
 """

 __version__ = "0.13.0"
-__spec_version__ = "0.57.0"
+__spec_version__ = "0.58.0"
 # Proposal 0052 (spec observability §5.1 / §8.4.1): canonical
 # package-registry name for this implementation. Surfaces on every
 # OTel invocation span as ``openarmature.implementation.name`` and on

tests/conformance/adapter.py

Lines changed: 66 additions & 1 deletion
@@ -210,6 +210,31 @@ def __init__(self, message: str, category: str) -> None:
         self.category = category


+# Conformance-adapter §5.1 ``cause`` (proposal 0070): a failure mock's raised
+# error MAY chain to an originating cause, recursively, so a consumer walking
+# the cause chain (pipeline-utilities §6.3 failure isolation) observes each
+# link's category / message.
+def _build_mock_cause(cause_spec: Mapping[str, Any] | None) -> Exception | None:
+    """Build the chained originating exception from a failure mock's ``cause``
+    directive. ``cause: {category, message, cause: {...}}`` nests recursively;
+    each link becomes a ``_CategorizedException`` (or a bare ``Exception`` when
+    its category is null) linked via ``__cause__``. Returns ``None`` when no
+    cause is configured."""
+    if cause_spec is None:
+        return None
+    inner = _build_mock_cause(cause_spec.get("cause"))
+    message = str(cause_spec.get("message", ""))
+    category = cause_spec.get("category")
+    exc: Exception = (
+        _CategorizedException(message=message, category=category)
+        if isinstance(category, str) and category
+        else Exception(message)
+    )
+    if inner is not None:
+        exc.__cause__ = inner
+    return exc
+
+
 def _make_pure_update_fn(
     node_name: str,
     update: Mapping[str, Any],
@@ -512,10 +537,16 @@ async def fn(_state: Any) -> Mapping[str, Any]:
             entry = sequence[idx]
             if entry is None:
                 return copy.deepcopy(success_update)
-            raise _CategorizedException(
+            # An entry MAY carry a recursive ``cause`` (proposal 0070 §5.1)
+            # that chains the raised error to an originating cause.
+            cause_exc = _build_mock_cause(entry.get("cause"))
+            exc = _CategorizedException(
                 message=entry.get("message", "flaky"),
                 category=entry.get("category", "provider_unavailable"),
             )
+            if cause_exc is not None:
+                raise exc from cause_exc
+            raise exc
         return copy.deepcopy(success_update)

     return fn
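The two hunks above build a nested `cause` into a `__cause__` chain and raise the mock failure from it. A self-contained sketch of how a consumer might walk such a chain follows. `CategorizedError` stands in for the adapter's private `_CategorizedException`, `build_cause` mirrors `_build_mock_cause` from the diff, and `walk_chain` plus the sample spec (including the `transport` category) are illustrative assumptions.

```python
from typing import Any, List, Mapping, Optional, Tuple


class CategorizedError(Exception):
    """Stand-in for the adapter's private _CategorizedException."""

    def __init__(self, message: str, category: str) -> None:
        super().__init__(message)
        self.category = category


def build_cause(spec: Optional[Mapping[str, Any]]) -> Optional[Exception]:
    """Mirror of _build_mock_cause: nested ``cause`` mappings become a chain."""
    if spec is None:
        return None
    inner = build_cause(spec.get("cause"))
    message = str(spec.get("message", ""))
    category = spec.get("category")
    exc: Exception = (
        CategorizedError(message, category)
        if isinstance(category, str) and category
        else Exception(message)
    )
    if inner is not None:
        exc.__cause__ = inner
    return exc


def walk_chain(exc: BaseException) -> List[Tuple[Optional[str], str]]:
    """Collect (category, message) from the caught exception inward."""
    links: List[Tuple[Optional[str], str]] = []
    current: Optional[BaseException] = exc
    while current is not None:
        links.append((getattr(current, "category", None), str(current)))
        current = current.__cause__
    return links


# Illustrative flaky-entry spec: a categorized cause wrapping a bare one.
entry = {
    "message": "flaky",
    "category": "provider_unavailable",
    "cause": {
        "category": "transport",
        "message": "upstream 503",
        "cause": {"message": "socket closed"},
    },
}

try:
    raise CategorizedError(entry["message"], entry["category"]) from build_cause(entry["cause"])
except CategorizedError as caught:
    chain = walk_chain(caught)
```

The walk yields three links: the raised mock error, its categorized cause, and the innermost bare `Exception`, which has no category.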
@@ -538,6 +569,32 @@ async def fn_with_sleep(state: Any) -> Mapping[str, Any]:
     return fn_with_sleep


+def _wrap_with_execution_recorder(
+    fn: Callable[[Any], Awaitable[Mapping[str, Any]]],
+    node_name: str,
+    recorders: dict[str, dict[int, list[int]]],
+) -> Callable[[Any], Awaitable[Mapping[str, Any]]]:
+    """Wrap a node body so that, when it runs inside a fan-out instance, it
+    records the executing instance's ``current_fan_out_index()`` into
+    ``recorders`` (keyed by node name then index). Lets the checkpoint resume
+    driver tell which fan-out instances executed vs. rolled forward for a
+    plain-node fan-out (e.g. the crash_injection fixture 067), where no
+    ``flaky_per_index`` body records execution. Records at body entry, so an
+    instance whose body ran counts as executed even if it then fails."""
+    from openarmature.observability.correlation import (  # noqa: PLC0415
+        current_attempt_index,
+        current_fan_out_index,
+    )
+
+    async def fn_recording(state: Any) -> Mapping[str, Any]:
+        idx = current_fan_out_index()
+        if idx is not None:
+            recorders.setdefault(node_name, {}).setdefault(idx, []).append(current_attempt_index())
+        return await fn(state)
+
+    return fn_recording
+
+
 @dataclass(frozen=True)
 class _TracingFanOutNode(FanOutNode[State, State]):
     """Conformance helper: a FanOutNode that appends its name to a shared
@@ -658,6 +715,7 @@ def build_graph(
     fan_out_instance_middleware: Mapping[str, Sequence[Any]] | None = None,
     parallel_branches_branch_middleware: Mapping[str, Mapping[str, Sequence[Any]]] | None = None,
     flaky_per_index_attempt_recorders: dict[str, dict[int, list[int]]] | None = None,
+    instance_execution_recorders: dict[str, dict[int, list[int]]] | None = None,
 ) -> BuiltGraph:
     """Translate a graph-shaped fixture block into a `BuiltGraph`.

@@ -767,6 +825,13 @@ def build_graph(
         if sleep_ms is not None:
             body = _wrap_with_sleep(body, int(sleep_ms))

+        # Record per-instance execution for plain-node fan-outs so the
+        # checkpoint resume driver can tell executed from rolled-forward
+        # instances. flaky_per_index records its own per-instance attempts,
+        # so it is skipped here; this covers the rest.
+        if instance_execution_recorders is not None and "flaky_per_index" not in node_spec:
+            body = _wrap_with_execution_recorder(body, node_name, instance_execution_recorders)
+
         builder.add_node(node_name, body, middleware=per_node_mw)

     for edge_spec in spec.get("edges", []):