
Commit 7224e30

Add per-attempt LLM spans under call-level retry (0050) (#170)
* Add LlmRetryAttemptEvent for per-attempt LLM spans (0050)
* Emit per-attempt LlmRetryAttemptEvent from complete() (0050)
* Render per-attempt LLM spans from LlmRetryAttemptEvent (0050)
* Activate obs-057 (single-attempt llm.attempt_index) (0050)
* Add N-span call-level-retry integration test (0050)
* Activate call-level-retry per-attempt span fixtures (0050)
* Document per-attempt LLM spans and flip 0050

  Flip conformance.toml [proposals."0050"] partial -> implemented (since 0.15.0): the call-level-retry per-attempt span surface now ships. Document the openarmature.llm.attempt_index attribute and the per-attempt span behavior in the observability concepts page, plus notes that span enrichers receive LlmRetryAttemptEvent on the LLM span and that the bundled provider dispatches that internal event alongside the unchanged terminal events. Add the 0.15.0 changelog section covering this work and backfilling the 0061 detached-trace invocation span (which landed without an entry), plus the v0.60.0 -> v0.61.0 spec-pin bullet.

* Deduplicate per-attempt LLM event builder

  _build_llm_retry_attempt_event constructed a full LlmRetryAttemptEvent twice, repeating ~18 shared identity, scoping, and request-side fields across the success and failure branches. Hoist them into one base dict and splat it, leaving each branch to add only its outcome fields. No behavior change.

* Test OTel observer ignores terminal LLM events

  The OTel observer now renders the LLM span solely from the per-attempt LlmRetryAttemptEvent; terminal LlmCompletionEvent / LlmFailedEvent are ignored. Add a regression test feeding both terminal events and asserting zero openarmature.llm.complete spans, guarding against reintroducing the terminal-event span path. Also fix a stale docstring in _drive_llm_span_with_cached_tokens that still referenced "typed LlmCompletionEvent".

* Re-export LlmRetryAttemptEvent; isinstance filter

  PR #170 CoPilot review:
  - Re-export LlmRetryAttemptEvent from the openarmature.graph package (import block + __all__), matching the sibling LlmCompletionEvent / LlmFailedEvent so the documented observer import path works.
  - Replace the brittle type(event).__name__ name match with an isinstance check in the conformance _TypedEventCollector; the filter_event_type string comparison stays as-is.
1 parent fa41e1d commit 7224e30

15 files changed

Lines changed: 589 additions & 188 deletions


CHANGELOG.md

Lines changed: 11 additions & 0 deletions
@@ -4,6 +4,17 @@ All notable changes to `openarmature-python` are documented in this file.
44

55
The format follows [Keep a Changelog](https://keepachangelog.com/en/1.1.0/). The package follows [Semantic Versioning](https://semver.org/); pre-1.0 minor bumps may carry behavioral changes per [spec governance](https://github.com/LunarCommand/openarmature-spec/blob/main/GOVERNANCE.md).
66

7+
## [0.15.0] — 2026-06-18
8+
9+
### Added
10+
11+
- **Detached-trace invocation span** (proposal 0061, observability §4.4, spec v0.61.0). The OTel observer now synthesizes an `openarmature.invocation` span at the root of each detached trace (a detached subgraph and each detached fan-out instance), carrying the parent's shared `invocation_id` (detached mode is observer-side trace rendering, not a new run) and the detached unit's own `entry_node`; the detached subgraph / instance span nests under it. A raising detached subgraph surfaces ERROR plus the error category and an OTel exception event on both the parent dispatch span and the detached invocation span. This is observer-side only, with no graph-engine change; the Langfuse observer is unchanged (its Trace entity already plays the invocation-level-container role). Conformance fixtures 008 (rewritten) and 058 (newly wired) run in `test_observability`.
12+
- **Per-attempt LLM spans under call-level retry** (proposal 0050, observability §5.5 / llm-provider §7.1). Completes proposal 0050, which shipped `partial` in v0.14.0 (failure-isolation middleware and the `complete(retry=...)` loop landed then; the per-attempt span surface was deferred). Under call-level retry the OTel observer now emits one `openarmature.llm.complete` span per attempt, each carrying `openarmature.llm.attempt_index` (0-based, 0..N-1, and 0 for a no-retry call). An intermediate failed attempt's span carries ERROR status plus its error category and the request-side attributes; the final attempt's span carries the terminal outcome and, on success, the full response surface. A python-internal `LlmRetryAttemptEvent`, dispatched once per attempt, is the sole source of the OTel span; the terminal `LlmCompletionEvent` / `LlmFailedEvent` stay one per call (payload, latency, Langfuse Generation) and no longer drive the OTel span. Langfuse renders one terminal Generation per call, with the per-attempt detail on the OTel span surface only (a spec-side §8 clarification to pin this is tracked, non-blocking). `conformance.toml` flips proposal 0050 to `implemented`; the call-level fixtures 056-058 are driven through the provider plus OTel observer and the single-attempt observability fixture 057 is wired.
13+
14+
### Changed
15+
16+
- **Pinned spec advances v0.60.0 → v0.61.0** (proposal 0061, the detached-trace invocation span above). A single step this cycle; `conformance.toml` records proposal 0061 as `implemented`. Proposal 0050 needed no pin bump of its own (it was already within the pin from its v0.42.0 acceptance); its v0.14.0 `partial` entry flips to `implemented` with the per-attempt span surface above.
17+
718
## [0.14.0] — 2026-06-17
819

920
### Added

conformance.toml

Lines changed: 27 additions & 23 deletions
@@ -372,30 +372,34 @@ status = "implemented"
372372
since = "0.13.0"
373373

374374
# Spec v0.42.0 (proposal 0050). Retry & degradation primitives —
375-
# failure-isolation middleware (§6.3) + call-level retry (§7). Both
376-
# primitives implemented across the v0.14.0 cycle:
377-
# FailureIsolationMiddleware (distinct FailureIsolatedEvent +
378-
# CaughtException) and the call-level ``retry`` parameter on
379-
# ``Provider.complete()`` — an in-call loop over transient §7 errors
380-
# reusing the §6.1 RetryConfig record. ``partial`` because §7.1's
381-
# per-attempt span surface — N ``openarmature.llm.complete`` spans +
382-
# the ``openarmature.llm.attempt_index`` attribute — is DEFERRED: the
383-
# python LLM span is rendered from the typed event, which is
384-
# terminal-only per the graph-engine §6 mutual-exclusion contract, so
385-
# per-attempt spans require a dedicated within-call sub-event
386-
# (LlmRetryAttemptEvent) scoped to a future cycle. Call-level retry
387-
# ships terminal-only: exactly one LlmCompletionEvent / LlmFailedEvent
388-
# per ``complete()`` call. Failure-isolation conformance fixtures
389-
# (058-063) are all wired + passing: the FailureIsolatedEvent's
390-
# attempt_index reports the final / exhausting attempt per §6.3's
391-
# lineage-correlation rule (spec ruled this in the attempt-index coord
392-
# thread; RetryMiddleware now records the final attempt in a
393-
# terminal-attempt scope the outer isolation reads, rather than the
394-
# post-reset baseline). ``partial`` is now solely about the
395-
# call-level-retry per-attempt span surface above.
375+
# failure-isolation middleware (§6.3) + call-level retry (§7),
376+
# including §7.1's per-attempt span surface. Implemented across the
377+
# v0.14.0 + v0.15.0 cycles: FailureIsolationMiddleware (distinct
378+
# FailureIsolatedEvent + CaughtException) and the call-level ``retry``
379+
# parameter on ``Provider.complete()`` — an in-call loop over transient
380+
# §7 errors reusing the §6.1 RetryConfig record. §7.1's per-attempt
381+
# span surface now ships: a call-level ``retry`` emits N
382+
# ``openarmature.llm.complete`` spans — one per attempt — each carrying
383+
# ``openarmature.llm.attempt_index`` (0-based, call-level, independent
384+
# of the node-level attempt_index). A python-internal
385+
# LlmRetryAttemptEvent dispatched once per attempt is the SOLE source of
386+
# the OTel LLM span (including single no-retry calls, at index 0); the
387+
# terminal LlmCompletionEvent / LlmFailedEvent stay one-per-call
388+
# (payload, latency, Langfuse Generation, fixture-072 mutual exclusion)
389+
# and no longer drive the OTel span. Langfuse renders one terminal
390+
# Generation per call. llm-provider fixtures 056-058 (per-attempt
391+
# spans) are validated in tests/unit/test_observability_otel.py through
392+
# the provider + OTel observer; observability fixture 057
393+
# (single-attempt attempt_index) is wired in test_observability.
394+
# Failure-isolation fixtures (058-063) are all wired + passing: the
395+
# FailureIsolatedEvent's attempt_index reports the final / exhausting
396+
# attempt per §6.3's lineage-correlation rule (spec ruled this in the
397+
# attempt-index coord thread; RetryMiddleware now records the final
398+
# attempt in a terminal-attempt scope the outer isolation reads, rather
399+
# than the post-reset baseline).
396400
[proposals."0050"]
397-
status = "partial"
398-
since = "0.14.0"
401+
status = "implemented"
402+
since = "0.15.0"
399403

400404
# Spec v0.43.0 (proposal 0051). Langfuse trace.input/trace.output
401405
# implementation-surface caveat. Purely textual: documents that the

docs/concepts/observability.md

Lines changed: 48 additions & 5 deletions
@@ -659,14 +659,17 @@ as nested spans.
659659

660660
When an `OpenAIProvider` (or any [custom Provider](../model-providers/authoring.md)
661661
that wires the dispatch hook) is used inside a graph with `OTelObserver`
662-
attached, each `provider.complete()` call emits a dedicated span named
663-
`openarmature.llm.complete`, parented under the calling node's span.
664-
The span carries two attribute families.
662+
attached, each `provider.complete()` attempt emits a dedicated span
663+
named `openarmature.llm.complete`, parented under the calling node's
664+
span. A call without retry emits one span; a call-level `retry=` that
665+
retries emits [one span per attempt](#per-attempt-spans-under-call-level-retry).
666+
Each span carries two attribute families.
665667

666668
**`openarmature.llm.*` (always on).** The framework's canonical
667669
namespace: model identifier, finish reason, token counts, prompt
668-
identity from `with_active_prompt(...)`, error category on failure.
669-
Set unconditionally whenever the LLM span itself emits.
670+
identity from `with_active_prompt(...)`, error category on failure, and
671+
`openarmature.llm.attempt_index` (the 0-based call-level attempt
672+
counter). Set unconditionally whenever the LLM span itself emits.
670673

671674
**`gen_ai.*` (OpenTelemetry GenAI semantic conventions, default on).**
672675
Cross-vendor attribute names every LLM-aware backend reads
@@ -702,6 +705,28 @@ when an external auto-instrumentation library (OpenInference,
702705
`opentelemetry-instrumentation-openai`) is already the canonical
703706
source on your stack.
704707

708+
#### Per-attempt spans under call-level retry
709+
710+
[Call-level retry](llms.md#retrying-transient-failures)
711+
(`provider.complete(retry=...)`) retries transient provider errors
712+
inside a single call. Each attempt emits its own
713+
`openarmature.llm.complete` span tagged with
714+
`openarmature.llm.attempt_index` (0-based). A call that succeeds on the
715+
first try emits one span at `attempt_index` 0; a call that fails twice
716+
transiently before succeeding emits three spans (indices 0, 1, 2). Each
717+
failed attempt's span carries `ERROR` status plus
718+
`openarmature.error.category`; the final attempt's span carries the
719+
terminal outcome (`OK` on success, `ERROR` on an exhausted or
720+
non-transient failure).
721+
722+
`openarmature.llm.attempt_index` is the **call-level** attempt counter,
723+
[independent of the node-level `attempt_index`](llms.md#call-level-vs-node-level-retry):
724+
the former counts attempts inside one `complete()` call, the latter
725+
counts node re-executions driven by retry *middleware*. A node retried
726+
once by middleware, each execution calling a provider that itself
727+
retries once, produces node `attempt_index` 0/1 and, within each,
728+
call-level `attempt_index` 0/1.
729+
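The indexing rule in this section can be modelled in isolation. What follows is a minimal sketch, not the provider's implementation: `simulate_attempt_spans`, the `outcomes` list, and the `"transient"` category string are hypothetical, and plain dicts stand in for OTel spans.

```python
from typing import Optional

SPAN_NAME = "openarmature.llm.complete"


def simulate_attempt_spans(outcomes: list[Optional[str]]) -> list[dict]:
    """Model the per-attempt span surface for one complete() call.

    `outcomes` holds one entry per attempt the call could make, in order:
    None for a successful attempt, or an error-category string for a
    failed one. The loop stops at the first success.
    """
    spans = []
    for index, error_category in enumerate(outcomes):
        span = {
            "name": SPAN_NAME,
            "openarmature.llm.attempt_index": index,  # 0-based, call-level
            "status": "OK" if error_category is None else "ERROR",
        }
        if error_category is not None:
            span["openarmature.error.category"] = error_category
        spans.append(span)
        if error_category is None:
            break  # success ends the call; no further attempts
    return spans


# Two transient failures, then success: three spans at indices 0, 1, 2.
spans = simulate_attempt_spans(["transient", "transient", None])
```

A no-retry call is the degenerate case: `simulate_attempt_spans([None])` yields a single span at index 0, matching the single-attempt behavior described above.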
705730
### LLM payload attributes
706731

707732
By default, LLM spans do **not** carry the messages sent or the
@@ -834,6 +859,14 @@ correctly; doing it from a `SpanProcessor.on_end` callback does
834859
not, because the framework has already called `span.end()` and the
835860
OTel SDK silently drops `set_attribute` on ended spans.
836861

862+
For the `openarmature.llm.complete` span the close event is an
863+
`LlmRetryAttemptEvent` (one per attempt) rather than a `NodeEvent`;
864+
that is the per-attempt event the observer renders the LLM span from.
865+
An enricher scoped to that span (`span.name ==
866+
"openarmature.llm.complete"`) can read the attempt's outcome straight
867+
off it: `event.llm_attempt_index`, `event.error_category`,
868+
`event.usage`, `event.finish_reason`, and so on.
869+
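As an illustration of the paragraph above, here is a hedged sketch of an enricher that reads the per-attempt fields. The `(span, event)` call signature, the `app.llm.*` attribute names, and the `FakeSpan` / `SimpleNamespace` stand-ins are assumptions made so the example runs without the framework; only `span.name`, `event.llm_attempt_index`, and `event.error_category` come from the text.

```python
from types import SimpleNamespace


class FakeSpan:
    """Stand-in for an open OTel span; records attributes set on it."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.attributes: dict = {}

    def set_attribute(self, key: str, value) -> None:
        self.attributes[key] = value


def tag_retried_attempts(span, event) -> None:
    """Hypothetical enricher: mark LLM spans that belong to a retry."""
    if span.name != "openarmature.llm.complete":
        return  # not an LLM span; leave it alone
    # Index 0 is the first attempt; anything above it is a retry.
    span.set_attribute("app.llm.is_retry", event.llm_attempt_index > 0)
    if event.error_category is not None:
        span.set_attribute("app.llm.failed_category", event.error_category)


span = FakeSpan("openarmature.llm.complete")
event = SimpleNamespace(llm_attempt_index=1, error_category=None)
tag_retried_attempts(span, event)
```

Filtering on the span name first keeps the enricher inert on node spans, whose close event does not carry the `llm_attempt_index` field.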
837870
Exceptions raised by an enricher are caught and warned, never
838871
propagated.
839872

@@ -880,6 +913,16 @@ via `current_dispatch()`. See
880913
[Authoring providers](../model-providers/authoring.md) for the full
881914
pattern.
882915

916+
Under [call-level retry](#per-attempt-spans-under-call-level-retry) the
917+
bundled `OpenAIProvider` additionally dispatches a python-internal
918+
`LlmRetryAttemptEvent` once per attempt; that is the event the OTel
919+
observer renders each per-attempt span from (including the lone attempt
920+
of a no-retry call, at index 0). The terminal `LlmCompletionEvent` /
921+
`LlmFailedEvent` above are unchanged: still one per call, still the
922+
stable surface for per-call consumption (token accounting, failure
923+
tracking). An observer that only cares about per-call outcomes can
924+
ignore `LlmRetryAttemptEvent`.
925+
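A per-call consumer of the kind described above might look like the following sketch. The three event classes here are trimmed stand-ins defined locally so the example is self-contained (the real ones are exported from `openarmature.graph`), and `PerCallCounter` with its `on_event` method is a hypothetical observer shape, not the framework's observer protocol.

```python
from dataclasses import dataclass


# Local stand-ins for the real event classes, reduced to the fields used here.
@dataclass(frozen=True)
class LlmCompletionEvent:
    call_id: str


@dataclass(frozen=True)
class LlmFailedEvent:
    call_id: str


@dataclass(frozen=True)
class LlmRetryAttemptEvent:
    call_id: str
    llm_attempt_index: int


class PerCallCounter:
    """Counts terminal outcomes only; per-attempt events are ignored."""

    def __init__(self) -> None:
        self.completed = 0
        self.failed = 0

    def on_event(self, event) -> None:
        if isinstance(event, LlmCompletionEvent):
            self.completed += 1
        elif isinstance(event, LlmFailedEvent):
            self.failed += 1
        # LlmRetryAttemptEvent (and anything else) falls through untouched.


counter = PerCallCounter()
# One call that needed three attempts: three attempt events, one terminal event.
for ev in [
    LlmRetryAttemptEvent("call-1", 0),
    LlmRetryAttemptEvent("call-1", 1),
    LlmRetryAttemptEvent("call-1", 2),
    LlmCompletionEvent("call-1"),
]:
    counter.on_event(ev)
```

Using `isinstance` rather than matching on the class name mirrors the filter change made in this commit's conformance collector.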
883926
#### Legacy sentinel-namespace pattern (compatibility surface)
884927

885928
`openarmature.observability.LLM_NAMESPACE` and

src/openarmature/graph/__init__.py

Lines changed: 2 additions & 0 deletions
@@ -44,6 +44,7 @@
4444
InvocationStartedEvent,
4545
LlmCompletionEvent,
4646
LlmFailedEvent,
47+
LlmRetryAttemptEvent,
4748
MetadataAugmentationEvent,
4849
NodeEvent,
4950
)
@@ -101,6 +102,7 @@
101102
"InvocationStartedEvent",
102103
"LlmCompletionEvent",
103104
"LlmFailedEvent",
105+
"LlmRetryAttemptEvent",
104106
"MappingReferencesUndeclaredField",
105107
"MetadataAugmentationEvent",
106108
"Middleware",

src/openarmature/graph/events.py

Lines changed: 67 additions & 0 deletions
@@ -656,6 +656,72 @@ class LlmFailedEvent:
656656
caller_invocation_metadata: Mapping[str, AttributeValue] | None = None
657657

658658

659+
# Python-internal per-attempt LLM event. NOT a spec-normative event type
660+
# (unlike LlmCompletionEvent / LlmFailedEvent): it is the observer-side
661+
# vehicle for the observability §5.5 per-attempt span surface under
662+
# llm-provider §7.1 call-level retry. One is dispatched per in-call
663+
# attempt (including the single attempt of a no-retry call); the OTel
664+
# observer renders one openarmature.llm.complete span from each, while
665+
# the terminal LlmCompletionEvent / LlmFailedEvent stay one-per-call
666+
# (payload/latency, Langfuse mapping, the fixture-072 mutual exclusion).
667+
@dataclass(frozen=True)
668+
class LlmRetryAttemptEvent:
669+
"""One LLM-call attempt delivered to observers for per-attempt span
670+
rendering.
671+
672+
Carries the full request-side surface plus that attempt's outcome.
673+
``error_category`` discriminates the outcome: ``None`` for a
674+
successful attempt (the response-side fields are populated), a
675+
category string for a failed attempt (the response-side fields are
676+
``None`` — no response was received).
677+
678+
Field set:
679+
680+
- ``llm_attempt_index``: the call-level retry-attempt index, ``0``
681+
for the first attempt and ``0..N-1`` across the N attempts of a
682+
call-level retry. Distinct from ``attempt_index`` (the node-level
683+
retry index used for calling-span resolution); the two are
684+
independent.
685+
- identity / scoping (``invocation_id`` ... ``call_id``) and the
686+
request side (``input_messages`` / ``request_params`` /
687+
``request_extras`` / ``active_prompt`` / ``active_prompt_group``)
688+
mirror :class:`LlmCompletionEvent`, carried on every attempt.
689+
- response side (``response_id`` / ``response_model`` / ``usage`` /
690+
``finish_reason`` / ``output_content``): populated on a successful
691+
attempt; ``None`` on a failed attempt.
692+
- failure side (``error_category`` / ``error_message`` /
693+
``error_type``): populated on a failed attempt; ``None`` on a
694+
successful one.
695+
"""
696+
697+
invocation_id: str
698+
correlation_id: str | None
699+
node_name: str
700+
namespace: tuple[str, ...]
701+
attempt_index: int
702+
fan_out_index: int | None
703+
branch_name: str | None
704+
provider: str
705+
model: str
706+
call_id: str
707+
llm_attempt_index: int
708+
latency_ms: float | None
709+
input_messages: list[dict[str, Any]]
710+
request_params: Mapping[str, Any]
711+
request_extras: Mapping[str, Any]
712+
active_prompt: Any
713+
active_prompt_group: Any
714+
response_id: str | None = None
715+
response_model: str | None = None
716+
usage: "Usage | None" = None
717+
finish_reason: str | None = None
718+
output_content: str | None = None
719+
error_category: str | None = None
720+
error_message: str | None = None
721+
error_type: str | None = None
722+
caller_invocation_metadata: Mapping[str, AttributeValue] | None = None
723+
724+
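The docstring's discrimination rule (`error_category` is `None` on success, a category string on failure) can be exercised with a small sketch. `AttemptView` is a trimmed, hypothetical stand-in carrying only three of the event's fields, and `describe_attempt` plus the `"rate_limit"` category value are illustrative names, not part of the package.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AttemptView:
    """Trimmed, hypothetical stand-in for LlmRetryAttemptEvent."""

    llm_attempt_index: int
    finish_reason: Optional[str] = None
    error_category: Optional[str] = None


def describe_attempt(event: AttemptView) -> str:
    """Branch on error_category, the field that discriminates the outcome."""
    if event.error_category is None:
        # Successful attempt: response-side fields are populated.
        return f"attempt {event.llm_attempt_index}: ok ({event.finish_reason})"
    # Failed attempt: response-side fields are None, failure side is set.
    return f"attempt {event.llm_attempt_index}: failed ({event.error_category})"


first = describe_attempt(AttemptView(0, error_category="rate_limit"))
second = describe_attempt(AttemptView(1, finish_reason="stop"))
```

Checking `error_category is None` rather than testing the response fields keeps the branch aligned with the field the docstring names as the discriminator.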
659725
# Spec: pipeline-utilities §6.3 cause chain (proposal 0068). A ``carrier``
660726
# link is a graph-engine §4 ``node_exception`` wrapper the engine applies at a
661727
# non-node placement (§9.7 instance / §11.7 branch / §9.6 / §11.6 parent-node
@@ -758,6 +824,7 @@ class FailureIsolatedEvent:
758824
"InvocationStartedEvent",
759825
"LlmCompletionEvent",
760826
"LlmFailedEvent",
827+
"LlmRetryAttemptEvent",
761828
"MetadataAugmentationEvent",
762829
"NodeEvent",
763830
"ParallelBranchesEventConfig",

src/openarmature/graph/observer.py

Lines changed: 5 additions & 0 deletions
@@ -40,6 +40,7 @@
4040
InvocationStartedEvent,
4141
LlmCompletionEvent,
4242
LlmFailedEvent,
43+
LlmRetryAttemptEvent,
4344
MetadataAugmentationEvent,
4445
NodeEvent,
4546
)
@@ -55,6 +56,9 @@
5556
# typed LLM provider call event, dispatched on every successful LLM
5657
# completion), LlmFailedEvent (proposal 0058 typed LLM failure event,
5758
# dispatched alongside the §7 exception when provider.complete raises),
59+
# LlmRetryAttemptEvent (proposal 0050 per-attempt LLM span event,
60+
# python-internal, dispatched once per in-call attempt under call-level
61+
# retry to drive the per-attempt OTel span surface),
5862
# and FailureIsolatedEvent (proposal 0050 §6.3 framework-emitted event,
5963
# dispatched by FailureIsolationMiddleware when it catches an exception
6064
# escaping the inner chain and substitutes a degraded partial update).
@@ -65,6 +69,7 @@
6569
| InvocationCompletedEvent
6670
| LlmCompletionEvent
6771
| LlmFailedEvent
72+
| LlmRetryAttemptEvent
6873
| FailureIsolatedEvent
6974
)
7075
