LunarCommand · chris-colinsky · Jun 19, 2026
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -4,6 +4,17 @@ All notable changes to `openarmature-python` are documented in this file.
 
 The format follows [Keep a Changelog](https://keepachangelog.com/en/1.1.0/). The package follows [Semantic Versioning](https://semver.org/); pre-1.0 minor bumps may carry behavioral changes per [spec governance](https://github.com/LunarCommand/openarmature-spec/blob/main/GOVERNANCE.md).
 
+## [0.15.0] — 2026-06-18
+
+### Added
+
+- **Detached-trace invocation span** (proposal 0061, observability §4.4, spec v0.61.0). The OTel observer now synthesizes an `openarmature.invocation` span at the root of each detached trace (a detached subgraph and each detached fan-out instance), carrying the parent's shared `invocation_id` (detached mode is observer-side trace rendering, not a new run) and the detached unit's own `entry_node`; the detached subgraph / instance span nests under it. A raising detached subgraph surfaces ERROR plus the error category and an OTel exception event on both the parent dispatch span and the detached invocation span. This is observer-side only, with no graph-engine change; the Langfuse observer is unchanged (its Trace entity already plays the invocation-level-container role). Conformance fixtures 008 (rewritten) and 058 (newly wired) run in `test_observability`.
+- **Per-attempt LLM spans under call-level retry** (proposal 0050, observability §5.5 / llm-provider §7.1). Completes proposal 0050, which shipped `partial` in v0.14.0 (failure-isolation middleware and the `complete(retry=...)` loop landed then; the per-attempt span surface was deferred). Under call-level retry the OTel observer now emits one `openarmature.llm.complete` span per attempt, each carrying `openarmature.llm.attempt_index` (0-based, 0..N-1, and 0 for a no-retry call). An intermediate failed attempt's span carries ERROR status plus its error category and the request-side attributes; the final attempt's span carries the terminal outcome and, on success, the full response surface. A python-internal `LlmRetryAttemptEvent`, dispatched once per attempt, is the sole source of the OTel span; the terminal `LlmCompletionEvent` / `LlmFailedEvent` stay one per call (payload, latency, Langfuse Generation) and no longer drive the OTel span. Langfuse renders one terminal Generation per call, with the per-attempt detail on the OTel span surface only (a spec-side §8 clarification to pin this is tracked, non-blocking). `conformance.toml` flips proposal 0050 to `implemented`; the llm-provider per-attempt span fixtures 056-058 are driven through the provider plus the OTel observer, and the single-attempt observability fixture 057 is wired.
+
+### Changed
+
+- **Pinned spec advances v0.60.0 → v0.61.0** (proposal 0061, the detached-trace invocation span above). A single step this cycle; `conformance.toml` records proposal 0061 as `implemented`. Proposal 0050 needed no pin bump of its own (it was already within the pin from its v0.42.0 acceptance); its v0.14.0 `partial` entry flips to `implemented` with the per-attempt span surface above.
+
 ## [0.14.0] — 2026-06-17
 
 ### Added

diff --git a/conformance.toml b/conformance.toml
@@ -372,30 +372,34 @@ status = "implemented"
 since = "0.13.0"
 
 # Spec v0.42.0 (proposal 0050).  Retry & degradation primitives —
-# failure-isolation middleware (§6.3) + call-level retry (§7).  Both
-# primitives implemented across the v0.14.0 cycle:
-# FailureIsolationMiddleware (distinct FailureIsolatedEvent +
-# CaughtException) and the call-level ``retry`` parameter on
-# ``Provider.complete()`` — an in-call loop over transient §7 errors
-# reusing the §6.1 RetryConfig record.  ``partial`` because §7.1's
-# per-attempt span surface — N ``openarmature.llm.complete`` spans +
-# the ``openarmature.llm.attempt_index`` attribute — is DEFERRED: the
-# python LLM span is rendered from the typed event, which is
-# terminal-only per the graph-engine §6 mutual-exclusion contract, so
-# per-attempt spans require a dedicated within-call sub-event
-# (LlmRetryAttemptEvent) scoped to a future cycle.  Call-level retry
-# ships terminal-only: exactly one LlmCompletionEvent / LlmFailedEvent
-# per ``complete()`` call.  Failure-isolation conformance fixtures
-# (058-063) are all wired + passing: the FailureIsolatedEvent's
-# attempt_index reports the final / exhausting attempt per §6.3's
-# lineage-correlation rule (spec ruled this in the attempt-index coord
-# thread; RetryMiddleware now records the final attempt in a
-# terminal-attempt scope the outer isolation reads, rather than the
-# post-reset baseline).  ``partial`` is now solely about the
-# call-level-retry per-attempt span surface above.
+# failure-isolation middleware (§6.3) + call-level retry (§7),
+# including §7.1's per-attempt span surface.  Implemented across the
+# v0.14.0 + v0.15.0 cycles: FailureIsolationMiddleware (distinct
+# FailureIsolatedEvent + CaughtException) and the call-level ``retry``
+# parameter on ``Provider.complete()`` — an in-call loop over transient
+# §7 errors reusing the §6.1 RetryConfig record.  §7.1's per-attempt
+# span surface now ships: a call-level ``retry`` emits N
+# ``openarmature.llm.complete`` spans — one per attempt — each carrying
+# ``openarmature.llm.attempt_index`` (0-based, call-level, independent
+# of the node-level attempt_index).  A python-internal
+# LlmRetryAttemptEvent dispatched once per attempt is the SOLE source of
+# the OTel LLM span (including single no-retry calls, at index 0); the
+# terminal LlmCompletionEvent / LlmFailedEvent stay one-per-call
+# (payload, latency, Langfuse Generation, fixture-072 mutual exclusion)
+# and no longer drive the OTel span.  Langfuse renders one terminal
+# Generation per call.  llm-provider fixtures 056-058 (per-attempt
+# spans) are validated in tests/unit/test_observability_otel.py through
+# the provider + OTel observer; observability fixture 057
+# (single-attempt attempt_index) is wired in test_observability.
+# Failure-isolation fixtures (058-063) are all wired + passing: the
+# FailureIsolatedEvent's attempt_index reports the final / exhausting
+# attempt per §6.3's lineage-correlation rule (spec ruled this in the
+# attempt-index coord thread; RetryMiddleware now records the final
+# attempt in a terminal-attempt scope the outer isolation reads, rather
+# than the post-reset baseline).
 [proposals."0050"]
-status = "partial"
-since = "0.14.0"
+status = "implemented"
+since = "0.15.0"
 
 # Spec v0.43.0 (proposal 0051).  Langfuse trace.input/trace.output
 # implementation-surface caveat.  Purely textual: documents that the

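The conformance note's point that `openarmature.llm.attempt_index` is call-level and independent of the node-level `attempt_index` can be checked mechanically. A minimal sketch, assuming a trimmed `AttemptRecord` stand-in (hypothetical) for `LlmRetryAttemptEvent` and assuming that `call_id` is shared by every attempt of one `complete()` call: within each call the call-level index must run 0..N-1, and it restarts at 0 when middleware re-executes the node.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class AttemptRecord:
    """Hypothetical trimmed stand-in for LlmRetryAttemptEvent's index fields."""
    call_id: str
    attempt_index: int  # node-level: re-executions driven by retry middleware
    llm_attempt_index: int  # call-level: attempts inside one complete() call


def call_level_indices_ok(records: list[AttemptRecord]) -> bool:
    """Check each call's llm_attempt_index runs 0..N-1 in dispatch order.

    Assumes call_id is shared by all attempts of one complete() call.
    """
    by_call: dict[str, list[AttemptRecord]] = defaultdict(list)
    for record in records:
        by_call[record.call_id].append(record)
    for attempts in by_call.values():
        indices = [r.llm_attempt_index for r in attempts]
        if indices != list(range(len(indices))):
            return False  # gap, duplicate, or not restarted at 0
        if len({r.attempt_index for r in attempts}) != 1:
            return False  # one call must not straddle node re-executions
    return True


# Node retried once by middleware; each execution's call retries once.
records = [
    AttemptRecord("call-a", attempt_index=0, llm_attempt_index=0),
    AttemptRecord("call-a", attempt_index=0, llm_attempt_index=1),
    AttemptRecord("call-b", attempt_index=1, llm_attempt_index=0),
    AttemptRecord("call-b", attempt_index=1, llm_attempt_index=1),
]
assert call_level_indices_ok(records)
```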
diff --git a/docs/concepts/observability.md b/docs/concepts/observability.md
@@ -659,14 +659,17 @@ as nested spans.
 
 When an `OpenAIProvider` (or any [custom Provider](../model-providers/authoring.md)
 that wires the dispatch hook) is used inside a graph with `OTelObserver`
-attached, each `provider.complete()` call emits a dedicated span named
-`openarmature.llm.complete`, parented under the calling node's span.
-The span carries two attribute families.
+attached, each `provider.complete()` attempt emits a dedicated span
+named `openarmature.llm.complete`, parented under the calling node's
+span. A call without retry emits one span; a call-level `retry=` that
+retries emits [one span per attempt](#per-attempt-spans-under-call-level-retry).
+Each span carries two attribute families.
 
 **`openarmature.llm.*` (always on).** The framework's canonical
 namespace: model identifier, finish reason, token counts, prompt
-identity from `with_active_prompt(...)`, error category on failure.
-Set unconditionally whenever the LLM span itself emits.
+identity from `with_active_prompt(...)`, error category on failure, and
+`openarmature.llm.attempt_index` (the 0-based call-level attempt
+counter). Set unconditionally whenever the LLM span itself emits.
 
 **`gen_ai.*` (OpenTelemetry GenAI semantic conventions, default on).**
 Cross-vendor attribute names every LLM-aware backend reads
@@ -702,6 +705,28 @@ when an external auto-instrumentation library (OpenInference,
 `opentelemetry-instrumentation-openai`) is already the canonical
 source on your stack.
 
+#### Per-attempt spans under call-level retry
+
+[Call-level retry](llms.md#retrying-transient-failures)
+(`provider.complete(retry=...)`) retries transient provider errors
+inside a single call. Each attempt emits its own
+`openarmature.llm.complete` span tagged with
+`openarmature.llm.attempt_index` (0-based). A call that succeeds on the
+first try emits one span at `attempt_index` 0; a call that fails twice
+transiently before succeeding emits three spans (indices 0, 1, 2). Each
+failed attempt's span carries `ERROR` status plus
+`openarmature.error.category`; the final attempt's span carries the
+terminal outcome (`OK` on success, `ERROR` on an exhausted or
+non-transient failure).
+
+`openarmature.llm.attempt_index` is the **call-level** attempt counter,
+[independent of the node-level `attempt_index`](llms.md#call-level-vs-node-level-retry):
+the former counts attempts inside one `complete()` call, the latter
+counts node re-executions driven by retry *middleware*. A node retried
+once by middleware, whose `complete(retry=...)` call also retries once
+on each execution, produces node `attempt_index` 0 and 1 and, within
+each, `openarmature.llm.attempt_index` 0 and 1: four LLM spans in all.
+
 ### LLM payload attributes
 
 By default, LLM spans do **not** carry the messages sent or the
@@ -834,6 +859,14 @@ correctly; doing it from a `SpanProcessor.on_end` callback does
 not, because the framework has already called `span.end()` and the
 OTel SDK silently drops `set_attribute` on ended spans.
 
+For the `openarmature.llm.complete` span, the close event is not a
+`NodeEvent` but the per-attempt `LlmRetryAttemptEvent` that the
+observer renders that span from (one event, and one span, per attempt).
+An enricher scoped to that span (`span.name ==
+"openarmature.llm.complete"`) can read the attempt's outcome straight
+off it: `event.llm_attempt_index`, `event.error_category`,
+`event.usage`, `event.finish_reason`, and so on.
+
 Exceptions raised by an enricher are caught and warned, never
 propagated.
 
@@ -880,6 +913,16 @@ via `current_dispatch()`. See
 [Authoring providers](../model-providers/authoring.md) for the full
 pattern.
 
+The bundled `OpenAIProvider` additionally dispatches a python-internal
+`LlmRetryAttemptEvent` once per attempt, with or without
+[call-level retry](#per-attempt-spans-under-call-level-retry) (a no-retry
+call dispatches exactly one, at index 0); the OTel observer renders each
+per-attempt span from that event. The terminal `LlmCompletionEvent` /
+`LlmFailedEvent` above are unchanged: still one per call, still the
+stable surface for per-call consumption (token accounting, failure
+tracking). An observer that only cares about per-call outcomes can
+ignore `LlmRetryAttemptEvent`.
+
 #### Legacy sentinel-namespace pattern (compatibility surface)
 
 `openarmature.observability.LLM_NAMESPACE` and

diff --git a/src/openarmature/graph/__init__.py b/src/openarmature/graph/__init__.py
@@ -44,6 +44,7 @@
     InvocationStartedEvent,
     LlmCompletionEvent,
     LlmFailedEvent,
+    LlmRetryAttemptEvent,
     MetadataAugmentationEvent,
     NodeEvent,
 )
@@ -101,6 +102,7 @@
     "InvocationStartedEvent",
     "LlmCompletionEvent",
     "LlmFailedEvent",
+    "LlmRetryAttemptEvent",
     "MappingReferencesUndeclaredField",
     "MetadataAugmentationEvent",
     "Middleware",

diff --git a/src/openarmature/graph/events.py b/src/openarmature/graph/events.py
@@ -656,6 +656,72 @@ class LlmFailedEvent:
     caller_invocation_metadata: Mapping[str, AttributeValue] | None = None
 
 
+# Python-internal per-attempt LLM event. NOT a spec-normative event type
+# (unlike LlmCompletionEvent / LlmFailedEvent): it is the observer-side
+# vehicle for the observability §5.5 per-attempt span surface under
+# llm-provider §7.1 call-level retry. One is dispatched per in-call
+# attempt (including the single attempt of a no-retry call); the OTel
+# observer renders one openarmature.llm.complete span from each, while
+# the terminal LlmCompletionEvent / LlmFailedEvent stay one-per-call
+# (payload/latency, Langfuse mapping, the fixture-072 mutual exclusion).
+@dataclass(frozen=True)
+class LlmRetryAttemptEvent:
+    """One LLM-call attempt delivered to observers for per-attempt span
+    rendering.
+
+    Carries the full request-side surface plus that attempt's outcome.
+    ``error_category`` discriminates the outcome: ``None`` for a
+    successful attempt (the response-side fields are populated), a
+    category string for a failed attempt (the response-side fields are
+    ``None`` — no response was received).
+
+    Field set:
+
+    - ``llm_attempt_index``: the call-level retry-attempt index, ``0``
+      for the first attempt and ``0..N-1`` across the N attempts of a
+      call-level retry. Distinct from ``attempt_index`` (the node-level
+      retry index used for calling-span resolution); the two are
+      independent.
+    - identity / scoping (``invocation_id`` ... ``call_id``) and the
+      request side (``input_messages`` / ``request_params`` /
+      ``request_extras`` / ``active_prompt`` / ``active_prompt_group``)
+      mirror :class:`LlmCompletionEvent`, carried on every attempt.
+    - response side (``response_id`` / ``response_model`` / ``usage`` /
+      ``finish_reason`` / ``output_content``): populated on a successful
+      attempt; ``None`` on a failed attempt.
+    - failure side (``error_category`` / ``error_message`` /
+      ``error_type``): populated on a failed attempt; ``None`` on a
+      successful one.
+    """
+
+    invocation_id: str
+    correlation_id: str | None
+    node_name: str
+    namespace: tuple[str, ...]
+    attempt_index: int
+    fan_out_index: int | None
+    branch_name: str | None
+    provider: str
+    model: str
+    call_id: str
+    llm_attempt_index: int
+    latency_ms: float | None
+    input_messages: list[dict[str, Any]]
+    request_params: Mapping[str, Any]
+    request_extras: Mapping[str, Any]
+    active_prompt: Any
+    active_prompt_group: Any
+    response_id: str | None = None
+    response_model: str | None = None
+    usage: "Usage | None" = None
+    finish_reason: str | None = None
+    output_content: str | None = None
+    error_category: str | None = None
+    error_message: str | None = None
+    error_type: str | None = None
+    caller_invocation_metadata: Mapping[str, AttributeValue] | None = None
+
+
 # Spec: pipeline-utilities §6.3 cause chain (proposal 0068). A ``carrier``
 # link is a graph-engine §4 ``node_exception`` wrapper the engine applies at a
 # non-node placement (§9.7 instance / §11.7 branch / §9.6 / §11.6 parent-node
@@ -758,6 +824,7 @@ class FailureIsolatedEvent:
     "InvocationStartedEvent",
     "LlmCompletionEvent",
     "LlmFailedEvent",
+    "LlmRetryAttemptEvent",
     "MetadataAugmentationEvent",
     "NodeEvent",
     "ParallelBranchesEventConfig",

diff --git a/src/openarmature/graph/observer.py b/src/openarmature/graph/observer.py
@@ -40,6 +40,7 @@
     InvocationStartedEvent,
     LlmCompletionEvent,
     LlmFailedEvent,
+    LlmRetryAttemptEvent,
     MetadataAugmentationEvent,
     NodeEvent,
 )
@@ -55,6 +56,9 @@
 # typed LLM provider call event, dispatched on every successful LLM
 # completion), LlmFailedEvent (proposal 0058 typed LLM failure event,
 # dispatched alongside the §7 exception when provider.complete raises),
+# LlmRetryAttemptEvent (proposal 0050 per-attempt LLM span event,
+# python-internal, dispatched once per in-call attempt, including the
+# single attempt of a no-retry call, to drive the per-attempt OTel spans),
 # and FailureIsolatedEvent (proposal 0050 §6.3 framework-emitted event,
 # dispatched by FailureIsolationMiddleware when it catches an exception
 # escaping the inner chain and substitutes a degraded partial update).
@@ -65,6 +69,7 @@
     | InvocationCompletedEvent
     | LlmCompletionEvent
     | LlmFailedEvent
+    | LlmRetryAttemptEvent
     | FailureIsolatedEvent
 )