Commit d2d387a

Implement LlmFailedEvent typed variant (proposal 0058) (#144)
* Implement LlmFailedEvent typed variant (proposal 0058)

Carve LLM provider failures into a spec-normatively-typed event variant alongside LlmCompletionEvent. Field set mirrors the success variant's identity / scoping / request-side surface 1:1 (17 fields) plus three failure-specific fields: error_category (always-present, from the §7 normative category enumeration), error_type (optional upstream class name or vendor code), error_message (always-present human-readable from the raised exception).

OpenAIProvider.complete() restructures around the failure-event emission: pre-send validation (validate_message_list / validate_tools / _normalize_response_schema) moves inside the try-block so any §7 category exception, whether pre-send or adapter-caught, flows through the same LlmFailedEvent path. The exception still raises out of complete() unchanged; the typed event is dispatched on the observer queue alongside the exception per proposal 0058's §6 dispatch contract.

Both bundled observers (OTel + Langfuse) consume LlmFailedEvent directly with the same openarmature.llm.complete span / Generation shape as the success path plus ERROR status / level and the openarmature.error.category attribute.

Sentinel-namespace NodeEvent emission for LLM events retires entirely from the bundled provider; _make_llm_event is removed. LlmEventPayload + LLM_NAMESPACE remain in observability/llm_event.py as a documented compatibility surface for custom providers.

Spec pin advances from v0.51.0 to v0.53.0; proposal 0023 (canonical state reducers) marked not-yet with fixtures 034-038 parser-deferred. Fixtures 069-073 (the 0058 conformance set) deferred pending typed_event_collector schema + the event_counts list directive in the harness; unit tests pin the contract end-to-end: 9-category field-mapping lockdown, pre-send validation raise, mutual-exclusion between LlmCompletionEvent and LlmFailedEvent on the same call.

* Address PR 144 review

Two stale-content fixes flagged by Copilot:

1. CHANGELOG line-17 and line-18 bullets carried "Failure paths continue to fire from the sentinel NodeEvent" framing from the 3b/3c era, which contradicts this PR's LlmFailedEvent migration and full sentinel retirement. Trimmed both fragments and added a forward-reference to the proposal 0058 entry that documents the cycle-final state.

2. AGENTS.md's reducer baseline reproduced proposal 0023's factory reducers verbatim from the spec, but Python doesn't ship them in this cycle (manifest 0023 = not-yet). The text is auto-generated by build_agents_md.py from the pinned spec submodule; updated the generator's lead paragraph to flag that capability summaries reproduce spec content verbatim (including additions from accepted proposals this implementation may not yet ship) and point readers at conformance.toml for per-proposal impl status. Generalizes to any future not-yet proposal landing in spec text before Python catches up. Regenerated AGENTS.md.
1 parent aaa3cc3 commit d2d387a

21 files changed

Lines changed: 786 additions & 533 deletions

CHANGELOG.md

Lines changed: 8 additions & 2 deletions
Original file line number | Diff line number | Diff line change
@@ -6,10 +6,16 @@ The format follows [Keep a Changelog](https://keepachangelog.com/en/1.1.0/). The
66

77
## [Unreleased]
88

9+
### Added
10+
11+
- **`LlmFailedEvent` typed event variant** (proposal 0058, spec v0.53.0). Carves LLM provider failures into a spec-normatively-typed event variant alongside `LlmCompletionEvent`. 17 mirrored identity / scoping / request-side fields + 3 failure-specific fields (`error_category` always-present from the llm-provider §7 normative category enumeration; optional `error_type` for vendor-specific detail or upstream exception class name; always-present `error_message`). `OpenAIProvider.complete()` emits the typed event alongside the §7 exception on both raise paths — adapter-caught provider exceptions AND pre-send validation raises. Caller-side exception flow unchanged; the exception still raises out of `complete()`. Mutually exclusive with `LlmCompletionEvent` on the same call. Both bundled observers (OTel + Langfuse) consume `LlmFailedEvent` directly: same `openarmature.llm.complete` span / Generation shape as the success path with ERROR status / level + `openarmature.error.category` attribute (OTel) / `error_category` as statusMessage (Langfuse), `start_time` back-dated by `latency_ms` so the failure duration reflects the time-to-raise.
12+
913
### Changed
1014

11-
- **OTel and Langfuse observers drive the `openarmature.llm.complete` span / Generation observation lifecycle from the typed `LlmCompletionEvent`** (proposal 0049 + 0057, observability §5.5.7). Successful LLM-provider calls now open + close the OTel span and the Langfuse Generation in one shot at typed-event arrival, with `start_time` back-dated by `LlmCompletionEvent.latency_ms` so duration reflects the adapter-boundary measurement rather than dispatcher queue delay. Failure paths continue to fire from the sentinel `NodeEvent` (the typed event is success-only per the proposal). The §5.5 attribute set and §8.4 Generation metadata are unchanged.
12-
- **`OpenAIProvider.complete()` no longer emits the sentinel `NodeEvent` pair on the success path** (v0.13.0 cleanup). The bundled OTel and Langfuse observers now consume the typed `LlmCompletionEvent` directly; the sentinel pair was kept on the success path through earlier releases for compatibility with pre-typed-event observers. External custom observers that filtered LLM calls by `event.namespace == LLM_NAMESPACE` MUST migrate to `isinstance(event, LlmCompletionEvent)` to continue seeing successful LLM calls. The sentinel `completed` event still fires on the failure path until the spec extends `LlmCompletionEvent` with error semantics; the sentinel `started` event is no longer emitted on either path.
15+
- **Sentinel-namespace `NodeEvent` emission for LLM events retired entirely from `OpenAIProvider`** (proposal 0058 cleanup). The provider no longer dispatches the `("openarmature.llm.complete",)`-namespaced `NodeEvent`s on either outcome path; both success and failure flow through their respective typed variants exclusively. The `_make_llm_event` helper is removed. External custom observers that filtered LLM calls by `event.namespace == LLM_NAMESPACE` MUST migrate to `isinstance(event, LlmCompletionEvent)` for success and `isinstance(event, LlmFailedEvent)` for failure to keep receiving LLM-call notifications. `LlmEventPayload` and `LLM_NAMESPACE` remain in `openarmature.observability.llm_event` as a documented compatibility surface for custom providers that haven't migrated; neither is referenced by the bundled provider or observers anymore.
16+
- **Pinned spec advances from v0.51.0 to v0.53.0** (absorbs proposals 0023 + 0058). Proposal 0023 (canonical state reducers) ships in spec v0.52.0 but is not implemented this cycle — `conformance.toml` marks 0023 as `not-yet`; fixtures 034–038 stay parser-deferred.
17+
- **OTel and Langfuse observers drive the `openarmature.llm.complete` span / Generation observation lifecycle from the typed `LlmCompletionEvent`** (proposal 0049 + 0057, observability §5.5.7). Successful LLM-provider calls now open + close the OTel span and the Langfuse Generation in one shot at typed-event arrival, with `start_time` back-dated by `LlmCompletionEvent.latency_ms` so duration reflects the adapter-boundary measurement rather than dispatcher queue delay. The §5.5 attribute set and §8.4 Generation metadata are unchanged. (Failure paths land on `LlmFailedEvent` later in the same cycle — see the proposal 0058 entry above.)
18+
- **`OpenAIProvider.complete()` no longer emits the sentinel `NodeEvent` pair on the success path** (v0.13.0 cleanup). The bundled OTel and Langfuse observers now consume the typed `LlmCompletionEvent` directly; the sentinel pair was kept on the success path through earlier releases for compatibility with pre-typed-event observers. External custom observers that filtered LLM calls by `event.namespace == LLM_NAMESPACE` MUST migrate to `isinstance(event, LlmCompletionEvent)` to continue seeing successful LLM calls. (The failure-path sentinel emission is retired entirely later in the same cycle — see the proposal 0058 entry above.)
1319
- **`LangfuseClient` Protocol gains optional `start_time` / `end_time` timestamps** on `generation(...)` and the Generation/Span handles' `end(...)`. The Langfuse observer passes back-dated timestamps on the typed-event success path so the Langfuse UI shows the actual adapter-boundary duration. The SDK adapter handles v4 Langfuse SDK quirks transparently: `Langfuse.start_observation()` does NOT accept `start_time`, so back-dated generations are routed through the private `_otel_tracer.start_span(name=..., start_time=int_ns)` API (mirroring the SDK's own `create_event` precedent) and the resulting OTel span is wrapped in `LangfuseGeneration` directly; the non-back-dated path still uses `start_observation`. `LangfuseSpan.end()` is typed `Optional[int]` (nanoseconds), so the adapter converts the Protocol's `datetime` surface to int nanoseconds before forwarding. The `InMemoryLangfuseClient` stores both fields verbatim on `LangfuseObservation` for test assertions.
1420
- **`OpenAIProvider(populate_caller_metadata=...)` default flipped from `False` to `True`.** The python implementation now populates `LlmCompletionEvent.caller_invocation_metadata` by default so the bundled OTel and Langfuse observers can emit the §5.6 `openarmature.user.<key>` span-attribute family without a separate opt-in. Pass `populate_caller_metadata=False` to suppress the snapshot when no downstream consumer needs it. The spec-defined opt-in mechanism is unchanged; only the python default flips.
1521
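The observer migration described in the changelog entries above can be sketched as follows. This is a minimal illustration, not the bundled observers' code: the two dataclasses are stand-ins that carry only the fields used here (the real variants have many more), `summarize_llm_event` is a hypothetical helper, and `"example_category"` is a placeholder rather than a value from the §7 enumeration. It shows the `isinstance` dispatch that replaces the `event.namespace == LLM_NAMESPACE` filter, and the back-dating of `start_time` by `latency_ms`.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass(frozen=True)
class LlmCompletionEvent:
    """Stand-in for the success variant (only the field used here)."""
    latency_ms: float


@dataclass(frozen=True)
class LlmFailedEvent:
    """Stand-in for the failure variant (only the fields used here)."""
    latency_ms: float
    error_category: str
    error_message: str
    error_type: Optional[str] = None


def summarize_llm_event(event: object, arrived_at: datetime) -> Optional[dict]:
    """Hypothetical helper: map a typed LLM event to a span-like summary."""
    if isinstance(event, LlmCompletionEvent):
        status, extra = "OK", {}
    elif isinstance(event, LlmFailedEvent):
        status = "ERROR"
        extra = {"openarmature.error.category": event.error_category}
    else:
        return None  # not an LLM event; ignore it
    # Back-date the start so the duration reflects the adapter-boundary
    # measurement rather than observer queue delay.
    start = arrived_at - timedelta(milliseconds=event.latency_ms)
    return {
        "name": "openarmature.llm.complete",
        "status": status,
        "start_time": start,
        "end_time": arrived_at,
        **extra,
    }


now = datetime(2024, 1, 1, 12, 0, 1, tzinfo=timezone.utc)
failed = LlmFailedEvent(
    latency_ms=250,
    error_category="example_category",  # placeholder value
    error_message="upstream call failed",
)
span = summarize_llm_event(failed, now)
```

A custom observer that handles only one of the two variants would stop seeing the other outcome, so both branches are needed to keep receiving every LLM-call notification.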

conformance.toml

Lines changed: 27 additions & 1 deletion
@@ -32,7 +32,7 @@
3232

3333
[manifest]
3434
implementation = "openarmature-python"
35-
spec_pin = "v0.51.0"
35+
spec_pin = "v0.53.0"
3636

3737
# Status values:
3838
# implemented — shipped behavior matches the proposal's contract
@@ -217,6 +217,15 @@ status = "not-yet"
217217
[proposals."0022"]
218218
status = "not-yet"
219219

220+
# Spec v0.52.0 (proposal 0023). Canonical state reducers — three
221+
# new factory-style reducers (``bounded_append``, ``dedupe_append``,
222+
# ``merge_by_key``) extending the graph-engine §2 baseline set.
223+
# Python has not yet shipped the new reducers; v0.13.0 leaves the
224+
# capability not-yet-implemented. Conformance fixtures 035-038
225+
# stay parser-deferred until the implementation lands.
226+
[proposals."0023"]
227+
status = "not-yet"
228+
220229
[proposals."0042"]
221230
status = "implemented"
222231
since = "0.11.0"
@@ -509,3 +518,20 @@ status = "not-yet"
509518
[proposals."0057"]
510519
status = "implemented"
511520
since = "0.13.0"
521+
522+
# Spec v0.53.0 (proposal 0058). Typed LLM failure event — second
523+
# spec-normatively-typed event variant on the observer event union
524+
# alongside LlmCompletionEvent. Field set mirrors LlmCompletionEvent's
525+
# identity / scoping / request-side surface (17 fields) plus three
526+
# failure-specific fields (error_category from the §7 normative
527+
# category enumeration, optional vendor-specific error_type, always-
528+
# present error_message). Dispatched alongside the §7 exception on
529+
# the observer queue — caller-side exception flow unchanged.
530+
# Mutually exclusive with LlmCompletionEvent on the same call. Python
531+
# lands the typed variant + provider emission + OTel/Langfuse
532+
# consumer migration in v0.13.0; same PR also drops sentinel-namespace
533+
# NodeEvent emission for LLM events entirely from the bundled
534+
# OpenAIProvider.
535+
[proposals."0058"]
536+
status = "implemented"
537+
since = "0.13.0"
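The dispatch contract summarized in the comment above (typed failure event emitted alongside the exception, caller-side exception flow unchanged, mutual exclusion with the success event) might look roughly like this. It is a sketch under stated assumptions, not the actual `OpenAIProvider.complete()` body: `ProviderError`, `complete`, `send`, and `dispatch` are hypothetical names, the event classes are reduced stand-ins, and `"example_category"` is a placeholder.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class LlmCompletionEvent:
    """Stand-in success event (real variant carries many more fields)."""
    response: str


@dataclass(frozen=True)
class LlmFailedEvent:
    """Stand-in failure event with the three failure-specific fields."""
    error_category: str
    error_message: str
    error_type: Optional[str] = None


class ProviderError(Exception):
    """Hypothetical base for category-carrying provider exceptions."""
    category = "example_category"  # placeholder, not a real category name


def complete(
    messages: List[str],
    send: Callable[[List[str]], str],
    dispatch: Callable[[object], None],
) -> str:
    try:
        # Pre-send validation lives inside the try-block so that it
        # takes the same failure-event path as adapter-caught errors.
        if not messages:
            raise ProviderError("message list is empty")
        response = send(messages)
    except ProviderError as exc:
        dispatch(
            LlmFailedEvent(
                error_category=exc.category,
                error_message=str(exc),
                error_type=type(exc).__name__,
            )
        )
        raise  # the caller still sees the original exception
    # Reached only when nothing was raised, so exactly one of the two
    # event variants is dispatched per call.
    dispatch(LlmCompletionEvent(response=response))
    return response
```

Because the success dispatch sits after the `try`/`except`, a call that raises can never also emit the success event.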

openarmature-spec

Submodule openarmature-spec updated 33 files

pyproject.toml

Lines changed: 1 addition & 1 deletion
@@ -63,7 +63,7 @@ Specification = "https://github.com/LunarCommand/openarmature-spec"
6363
openarmature = "openarmature.cli:main"
6464

6565
[tool.openarmature]
66-
spec_version = "0.51.0"
66+
spec_version = "0.53.0"
6767

6868
[dependency-groups]
6969
dev = [

scripts/build_agents_md.py

Lines changed: 5 additions & 1 deletion
@@ -204,7 +204,11 @@ def _capability_summaries(spec_tag: str) -> str:
204204
(
205205
f"_Sourced from openarmature-spec {spec_tag}. Each entry below "
206206
+ "reproduces §1 (Purpose) and §2 (Concepts) of the capability's "
207-
+ "`spec.md`. For the full spec text (execution model, error semantics, "
207+
+ "`spec.md` verbatim — including additions from accepted proposals "
208+
+ "that this Python implementation may not yet ship. For per-proposal "
209+
+ "implementation status (implemented / partial / textual-only / "
210+
+ "not-yet), see the `conformance.toml` manifest at the repo root. "
211+
+ "For the full spec text (execution model, error semantics, "
208212
+ "determinism, observer hooks, etc.) see the linked docs site._"
209213
),
210214
]

src/openarmature/AGENTS.md

Lines changed: 67 additions & 7 deletions
@@ -1,6 +1,6 @@
11
# OpenArmature — Agent documentation
22

3-
*This is the agent guide bundled with the openarmature Python package, version 0.12.0 (spec v0.51.0). For the full docs site see [openarmature.ai](https://openarmature.ai). For the canonical spec text see [openarmature.org/capabilities](https://openarmature.org/capabilities/). For project-specific conventions for the code you're editing, see the host project's `AGENTS.md` or `CLAUDE.md`.*
3+
*This is the agent guide bundled with the openarmature Python package, version 0.12.0 (spec v0.53.0). For the full docs site see [openarmature.ai](https://openarmature.ai). For the canonical spec text see [openarmature.org/capabilities](https://openarmature.org/capabilities/). For project-specific conventions for the code you're editing, see the host project's `AGENTS.md` or `CLAUDE.md`.*
44

55
## TL;DR
66

@@ -10,7 +10,7 @@ OpenArmature is a workflow framework for LLM pipelines and tool-calling agents:
1010

1111
## Capability contracts
1212

13-
_Sourced from openarmature-spec v0.51.0. Each entry below reproduces §1 (Purpose) and §2 (Concepts) of the capability's `spec.md`. For the full spec text (execution model, error semantics, determinism, observer hooks, etc.) see the linked docs site._
13+
_Sourced from openarmature-spec v0.53.0. Each entry below reproduces §1 (Purpose) and §2 (Concepts) of the capability's `spec.md` verbatim — including additions from accepted proposals that this Python implementation may not yet ship. For per-proposal implementation status (implemented / partial / textual-only / not-yet), see the `conformance.toml` manifest at the repo root. For the full spec text (execution model, error semantics, determinism, observer hooks, etc.) see the linked docs site._
1414

1515
### Capability: `graph-engine`
1616

@@ -46,11 +46,15 @@ engine constant, not a reserved node name, so a user node may happen to be named
4646

4747
**Reducer.** A function that merges a node's partial update into the prior state for a given field. Each state
4848
field has exactly one reducer. The default reducer is _last-write-wins_ (the new value replaces the old).
49-
Implementations MUST provide at least: `last_write_wins`, `append` (for list-typed fields), `merge`
50-
(for mapping-typed fields), `concat_flatten` (for list-typed fields whose updates are lists of lists —
51-
e.g., fan-out target fields collecting list-emitting per-instance values), and `merge_all` (for
52-
mapping-typed fields whose updates are lists of mappings — e.g., fan-out target fields collecting
53-
dict-emitting per-instance values). Users MAY register custom reducers per field.
49+
Implementations MUST provide at least the following eight canonical reducers: `last_write_wins`, `append`
50+
(for list-typed fields), `merge` (for mapping-typed fields), `concat_flatten` (for list-typed fields whose
51+
updates are lists of lists — e.g., fan-out target fields collecting list-emitting per-instance values),
52+
`merge_all` (for mapping-typed fields whose updates are lists of mappings — e.g., fan-out target fields
53+
collecting dict-emitting per-instance values), `bounded_append(max_len)` (factory; `append` capped at
54+
`max_len` entries with front-drop on overflow), `dedupe_append(key=None)` (factory; `append` skipping
55+
items whose key already appears in the existing list), and `merge_by_key(key)` (factory; list-of-records
56+
keyed merge — entries with a key matching an existing entry replace the existing entry in place; entries
57+
with novel keys are appended). Users MAY register custom reducers per field.
5458

5559
**`concat_flatten` semantics.** `concat_flatten(prior, update)` returns the concatenation of `prior` with the
5660
one-level flattening of `update`. Both `prior` and `update` MUST be lists, and every element of `update` MUST
@@ -72,6 +76,57 @@ inside `update` contribute zero keys. Implementations MUST NOT auto-detect wheth
7276
mappings vs. a single mapping — `merge_all` is strictly the list-of-mappings reducer; callers needing both
7377
behaviors on the same field MUST register a custom reducer rather than rely on shape-dependent behavior.
7478

79+
**`bounded_append(max_len)` semantics.** A factory returning a reducer that extends a list with the update's
80+
items and truncates from the front (oldest entries dropped first) if the post-merge length exceeds `max_len`.
81+
`max_len` MUST be a positive integer (≥ 1); a factory call with `max_len ≤ 0` raises
82+
`reducer_configuration_invalid` at field registration time. Behavior: concatenate prior + update, then if
83+
the concatenated list's length exceeds `max_len`, drop entries from the front until the length equals
84+
`max_len`. The bound applies to the post-merge length, not to the update's individual size — an update
85+
larger than `max_len` keeps only the last `max_len` items of the update and the prior list is fully evicted. Both `prior` and `update` MUST be lists;
86+
violations raise `ReducerError` per §4. Empty `update` is a no-op (returns `prior` unchanged) — the bound
87+
applies to merge-time transformations, not as a prior-validation pass; `prior` is returned as-is even if
88+
it somehow already exceeds `max_len` (matching the established `concat_flatten` / `merge_all` empty-update
89+
pattern). Truncation MUST be from the front (oldest-first eviction) for cross-impl consistency; back-drop
90+
is recoverable via a
91+
custom reducer if needed. `bounded_append` is for cases where silent drop of evicted data is acceptable
92+
(recent-events buffers, debug log windows, sliding metric caches); for cases where dropped data must be
93+
summarized or transformed first (the canonical chat-history-with-LLM-summarization shape), use unbounded
94+
`append` plus a separate compaction node or middleware — reducers are pure synchronous functions per the
95+
contract above and cannot perform the IO that real compaction requires.
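The `bounded_append` semantics above can be sketched as follows. Python does not ship this reducer in this cycle (manifest 0023 = not-yet), so this is an illustrative reading of the spec text only; the exception classes `ReducerConfigurationInvalid` and `ReducerError` are hypothetical stand-ins for however an implementation surfaces those categories.

```python
class ReducerConfigurationInvalid(ValueError):
    """Stand-in for the `reducer_configuration_invalid` category."""


class ReducerError(TypeError):
    """Stand-in for the merge-time reducer error."""


def bounded_append(max_len):
    # Configuration is validated when the factory is called, which is
    # before any merge runs.
    if isinstance(max_len, bool) or not isinstance(max_len, int) or max_len < 1:
        raise ReducerConfigurationInvalid("max_len must be a positive integer")

    def reducer(prior, update):
        if not isinstance(prior, list) or not isinstance(update, list):
            raise ReducerError("bounded_append requires list prior and update")
        if not update:
            # Empty update is a no-op, even if prior already exceeds the bound.
            return prior
        merged = prior + update
        # Front-drop: keep only the newest max_len entries.
        return merged[-max_len:] if len(merged) > max_len else merged

    return reducer


recent = bounded_append(3)
window = recent([1, 2], [3, 4])        # [2, 3, 4]
evicted = recent([1, 2], [3, 4, 5, 6])  # [4, 5, 6]
```

The second call shows the case the spec calls out: an update larger than `max_len` evicts the whole prior list and keeps only its own last `max_len` items.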
96+
97+
**`dedupe_append(key=None)` semantics.** A factory returning a reducer that extends a list with items from
98+
the update that are not already present (by key) in the existing list. The `key` parameter is an optional
99+
callable mapping an item to its dedup key; if omitted, the item itself is used as the key (requires hashable
100+
items). Behavior: initialize a seen-keys set from `prior` (preserving `prior` unchanged in the result),
101+
iterate `update` in order, and for each item compute its key — if the key is NOT yet in seen-keys, append
102+
the item to the result and record its key; otherwise skip. Existing items appear before update items;
103+
within each, original order is maintained. Duplicates within the update itself are filtered alongside
104+
matches against `prior` — first occurrence wins (preserves left-to-right precedence consistent with
105+
`append`). The computed key (the item itself when no `key` callable is supplied, or the value returned by
106+
the callable) MUST be hashable; a non-hashable key raises `ReducerError` per §4 at merge time. A `key`
107+
callable that raises on any item propagates as `ReducerError`. The reducer does NOT mutate existing items
108+
(no in-place dedup of `prior`); only the update is filtered.
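A possible reading of the `dedupe_append` behavior above, again as an unofficial sketch since the Python implementation has not shipped this reducer. `ReducerError` is a hypothetical stand-in class, and wrapping both key-callable failures and unhashable keys in it is one interpretation of the text.

```python
class ReducerError(TypeError):
    """Stand-in for the merge-time reducer error."""


def dedupe_append(key=None):
    def key_of(item):
        try:
            k = item if key is None else key(item)
            hash(k)  # the computed key must be hashable
        except Exception as exc:
            raise ReducerError(f"cannot compute dedup key for {item!r}") from exc
        return k

    def reducer(prior, update):
        if not isinstance(prior, list) or not isinstance(update, list):
            raise ReducerError("dedupe_append requires list prior and update")
        # Seed the seen-keys set from prior; prior itself is copied as-is,
        # including any duplicates it already contains.
        seen = {key_of(item) for item in prior}
        result = list(prior)
        for item in update:
            k = key_of(item)
            if k not in seen:  # first occurrence wins
                seen.add(k)
                result.append(item)
        return result

    return reducer


plain = dedupe_append()
merged = plain([1, 2, 2], [2, 3, 3, 1, 4])  # [1, 2, 2, 3, 4]

by_id = dedupe_append(key=lambda rec: rec["id"])
records = by_id([{"id": 1}], [{"id": 1, "v": "x"}, {"id": 2}])  # [{"id": 1}, {"id": 2}]
```

Note that the duplicate `2` already present in `prior` survives, while the repeated `3` inside the update is filtered.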
109+
110+
**`merge_by_key(key)` semantics.** A factory returning a reducer for list-of-records fields. Items in the
111+
update with a key matching an existing item REPLACE the existing item in place; items with novel keys are
112+
appended at the end of the list in the order they appear in the update. The `key` parameter is a required
113+
callable mapping an item to its merge key — the spec does NOT default this; keyed merge without a key
114+
function is meaningless and a factory call with `key=None` raises `reducer_configuration_invalid` at field
115+
registration time. Behavior: build a `key_to_idx` index from `prior` (when `prior` contains duplicate keys,
116+
the index MUST hold the LAST index for each duplicate key — implementations whose native dict construction
117+
uses first-wins semantics MUST iterate explicitly to enforce last-wins); for each item in `update`, if its
118+
key is in the index, replace the prior entry at that index with the update item; otherwise append the
119+
update item to the result and register its key. Existing entry order MUST be preserved (replacements are
120+
in-place); novel entries are appended in update order. Duplicate keys within the update collapse to
121+
last-occurrence-wins (consistent with how dict updates work for repeated keys). Earlier duplicates in
122+
`prior` are preserved in place — the reducer does NOT in-place dedupe existing entries (parallel to
123+
`dedupe_append`'s "no in-place dedup of existing" rule). The value returned by the `key` callable MUST
124+
be hashable (required by the index-build step); a non-hashable return value raises `ReducerError` per §4
125+
at merge time. The `key` callable raising on any item propagates as `ReducerError`. Empty `update` is a
126+
no-op. `merge_by_key` is NOT a substitute for `merge`; `merge`
127+
operates on dict-typed fields with shallow key-value semantics; `merge_by_key` operates on list-of-records
128+
fields with item-key semantics. The qualifier `_by_key` distinguishes the two shapes.
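The `merge_by_key` steps above can be sketched as follows. As with the other factory reducers, this is an illustrative interpretation rather than shipped code; both exception classes are hypothetical stand-ins, and rejecting any non-callable `key` (not just `None`) is an assumption.

```python
class ReducerConfigurationInvalid(ValueError):
    """Stand-in for the `reducer_configuration_invalid` category."""


class ReducerError(TypeError):
    """Stand-in for the merge-time reducer error."""


def merge_by_key(key):
    # The spec names key=None explicitly; rejecting every non-callable
    # value is an assumption of this sketch.
    if not callable(key):
        raise ReducerConfigurationInvalid("merge_by_key requires a key callable")

    def key_of(item):
        try:
            k = key(item)
            hash(k)  # the index requires hashable keys
        except Exception as exc:
            raise ReducerError(f"cannot compute merge key for {item!r}") from exc
        return k

    def reducer(prior, update):
        if not isinstance(prior, list) or not isinstance(update, list):
            raise ReducerError("merge_by_key requires list prior and update")
        if not update:
            return prior  # empty update is a no-op
        result = list(prior)
        key_to_idx = {}
        for idx, item in enumerate(result):
            key_to_idx[key_of(item)] = idx  # later duplicates overwrite: last index wins
        for item in update:
            k = key_of(item)
            if k in key_to_idx:
                result[key_to_idx[k]] = item  # replace in place
            else:
                key_to_idx[k] = len(result)
                result.append(item)  # novel key: append in update order
        return result

    return reducer


upsert = merge_by_key(lambda rec: rec["id"])
prior = [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}, {"id": 1, "v": "c"}]
update = [{"id": 1, "v": "z"}, {"id": 3, "v": "n"}, {"id": 3, "v": "m"}]
result = upsert(prior, update)
```

In this usage the earlier `id=1` duplicate in `prior` stays in place, the last `id=1` entry is replaced, and the two `id=3` updates collapse to the final one.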
129+
75130
**Subgraph.** A compiled graph used as a node inside another graph. A subgraph executes against its own state
76131
schema and produces a partial update that is merged into the parent's state. The merge uses the same reducer
77132
rules as ordinary nodes — parent reducers, applied to parent fields.
@@ -136,6 +191,11 @@ identifiers (as an error class, error code, or tagged discriminant, per the lang
136191
- `conflicting_reducers` — a state field has more than one declared reducer.
137192
- `mapping_references_undeclared_field` — a subgraph-as-node `inputs` or `outputs` mapping names a field
138193
not declared in the relevant state schema.
194+
- `reducer_configuration_invalid` — a reducer factory was supplied invalid construction parameters
195+
(e.g., `bounded_append(max_len=0)`, `merge_by_key(key=None)`). Raised at field registration / graph
196+
compilation time, before any node body runs. Distinct from `conflicting_reducers`, which is about
197+
the reducer-declaration shape across multiple reducers on the same field; `reducer_configuration_invalid`
198+
is about parameters supplied to a single reducer factory.
139199

140200
### Capability: `pipeline-utilities`
141201

src/openarmature/__init__.py

Lines changed: 1 addition & 1 deletion
@@ -25,7 +25,7 @@
2525
"""
2626

2727
__version__ = "0.12.0"
28-
__spec_version__ = "0.51.0"
28+
__spec_version__ = "0.53.0"
2929
# Proposal 0052 (spec observability §5.1 / §8.4.1): canonical
3030
# package-registry name for this implementation. Surfaces on every
3131
# OTel invocation span as ``openarmature.implementation.name`` and on

src/openarmature/graph/__init__.py

Lines changed: 2 additions & 0 deletions
@@ -39,6 +39,7 @@
3939
InvocationCompletedEvent,
4040
InvocationStartedEvent,
4141
LlmCompletionEvent,
42+
LlmFailedEvent,
4243
MetadataAugmentationEvent,
4344
NodeEvent,
4445
)
@@ -86,6 +87,7 @@
8687
"InvocationCompletedEvent",
8788
"InvocationStartedEvent",
8889
"LlmCompletionEvent",
90+
"LlmFailedEvent",
8991
"MappingReferencesUndeclaredField",
9092
"MetadataAugmentationEvent",
9193
"Middleware",
