11 changes: 11 additions & 0 deletions CHANGELOG.md
@@ -4,6 +4,17 @@ All notable changes to `openarmature-python` are documented in this file.

The format follows [Keep a Changelog](https://keepachangelog.com/en/1.1.0/). The package follows [Semantic Versioning](https://semver.org/); pre-1.0 minor bumps may carry behavioral changes per [spec governance](https://github.com/LunarCommand/openarmature-spec/blob/main/GOVERNANCE.md).

## [0.15.0] — 2026-06-18

### Added

- **Detached-trace invocation span** (proposal 0061, observability §4.4, spec v0.61.0). The OTel observer now synthesizes an `openarmature.invocation` span at the root of each detached trace (a detached subgraph and each detached fan-out instance), carrying the parent's shared `invocation_id` (detached mode is observer-side trace rendering, not a new run) and the detached unit's own `entry_node`; the detached subgraph / instance span nests under it. A raising detached subgraph surfaces ERROR plus the error category and an OTel exception event on both the parent dispatch span and the detached invocation span. This is observer-side only, with no graph-engine change; the Langfuse observer is unchanged (its Trace entity already plays the invocation-level-container role). Conformance fixtures 008 (rewritten) and 058 (newly wired) run in `test_observability`.
- **Per-attempt LLM spans under call-level retry** (proposal 0050, observability §5.5 / llm-provider §7.1). Completes proposal 0050, which shipped `partial` in v0.14.0 (failure-isolation middleware and the `complete(retry=...)` loop landed then; the per-attempt span surface was deferred). Under call-level retry the OTel observer now emits one `openarmature.llm.complete` span per attempt, each carrying `openarmature.llm.attempt_index` (0-based, 0..N-1, and 0 for a no-retry call). An intermediate failed attempt's span carries ERROR status plus its error category and the request-side attributes; the final attempt's span carries the terminal outcome and, on success, the full response surface. A python-internal `LlmRetryAttemptEvent`, dispatched once per attempt, is the sole source of the OTel span; the terminal `LlmCompletionEvent` / `LlmFailedEvent` stay one per call (payload, latency, Langfuse Generation) and no longer drive the OTel span. Langfuse renders one terminal Generation per call, with the per-attempt detail on the OTel span surface only (a spec-side §8 clarification to pin this is tracked, non-blocking). `conformance.toml` flips proposal 0050 to `implemented`; the call-level fixtures 056-058 are driven through the provider plus OTel observer and the single-attempt observability fixture 057 is wired.

### Changed

- **Pinned spec advances v0.60.0 → v0.61.0** (proposal 0061, the detached-trace invocation span above). A single step this cycle; `conformance.toml` records proposal 0061 as `implemented`. Proposal 0050 needed no pin bump of its own (it was already within the pin from its v0.42.0 acceptance); its v0.14.0 `partial` entry flips to `implemented` with the per-attempt span surface above.

## [0.14.0] — 2026-06-17

### Added
50 changes: 27 additions & 23 deletions conformance.toml
@@ -372,30 +372,34 @@ status = "implemented"
since = "0.13.0"

# Spec v0.42.0 (proposal 0050). Retry & degradation primitives —
-# failure-isolation middleware (§6.3) + call-level retry (§7). Both
-# primitives implemented across the v0.14.0 cycle:
-# FailureIsolationMiddleware (distinct FailureIsolatedEvent +
-# CaughtException) and the call-level ``retry`` parameter on
-# ``Provider.complete()`` — an in-call loop over transient §7 errors
-# reusing the §6.1 RetryConfig record. ``partial`` because §7.1's
-# per-attempt span surface — N ``openarmature.llm.complete`` spans +
-# the ``openarmature.llm.attempt_index`` attribute — is DEFERRED: the
-# python LLM span is rendered from the typed event, which is
-# terminal-only per the graph-engine §6 mutual-exclusion contract, so
-# per-attempt spans require a dedicated within-call sub-event
-# (LlmRetryAttemptEvent) scoped to a future cycle. Call-level retry
-# ships terminal-only: exactly one LlmCompletionEvent / LlmFailedEvent
-# per ``complete()`` call. Failure-isolation conformance fixtures
-# (058-063) are all wired + passing: the FailureIsolatedEvent's
-# attempt_index reports the final / exhausting attempt per §6.3's
-# lineage-correlation rule (spec ruled this in the attempt-index coord
-# thread; RetryMiddleware now records the final attempt in a
-# terminal-attempt scope the outer isolation reads, rather than the
-# post-reset baseline). ``partial`` is now solely about the
-# call-level-retry per-attempt span surface above.
+# failure-isolation middleware (§6.3) + call-level retry (§7),
+# including §7.1's per-attempt span surface. Implemented across the
+# v0.14.0 + v0.15.0 cycles: FailureIsolationMiddleware (distinct
+# FailureIsolatedEvent + CaughtException) and the call-level ``retry``
+# parameter on ``Provider.complete()`` — an in-call loop over transient
+# §7 errors reusing the §6.1 RetryConfig record. §7.1's per-attempt
+# span surface now ships: a call-level ``retry`` emits N
+# ``openarmature.llm.complete`` spans — one per attempt — each carrying
+# ``openarmature.llm.attempt_index`` (0-based, call-level, independent
+# of the node-level attempt_index). A python-internal
+# LlmRetryAttemptEvent dispatched once per attempt is the SOLE source of
+# the OTel LLM span (including single no-retry calls, at index 0); the
+# terminal LlmCompletionEvent / LlmFailedEvent stay one-per-call
+# (payload, latency, Langfuse Generation, fixture-072 mutual exclusion)
+# and no longer drive the OTel span. Langfuse renders one terminal
+# Generation per call. llm-provider fixtures 056-058 (per-attempt
+# spans) are validated in tests/unit/test_observability_otel.py through
+# the provider + OTel observer; observability fixture 057
+# (single-attempt attempt_index) is wired in test_observability.
+# Failure-isolation fixtures (058-063) are all wired + passing: the
+# FailureIsolatedEvent's attempt_index reports the final / exhausting
+# attempt per §6.3's lineage-correlation rule (spec ruled this in the
+# attempt-index coord thread; RetryMiddleware now records the final
+# attempt in a terminal-attempt scope the outer isolation reads, rather
+# than the post-reset baseline).
[proposals."0050"]
-status = "partial"
-since = "0.14.0"
+status = "implemented"
+since = "0.15.0"

# Spec v0.43.0 (proposal 0051). Langfuse trace.input/trace.output
# implementation-surface caveat. Purely textual: documents that the
53 changes: 48 additions & 5 deletions docs/concepts/observability.md
@@ -659,14 +659,17 @@ as nested spans.

When an `OpenAIProvider` (or any [custom Provider](../model-providers/authoring.md)
that wires the dispatch hook) is used inside a graph with `OTelObserver`
-attached, each `provider.complete()` call emits a dedicated span named
-`openarmature.llm.complete`, parented under the calling node's span.
-The span carries two attribute families.
+attached, each `provider.complete()` attempt emits a dedicated span
+named `openarmature.llm.complete`, parented under the calling node's
+span. A call without retry emits one span; a call-level `retry=` that
+retries emits [one span per attempt](#per-attempt-spans-under-call-level-retry).
+Each span carries two attribute families.

**`openarmature.llm.*` (always on).** The framework's canonical
namespace: model identifier, finish reason, token counts, prompt
-identity from `with_active_prompt(...)`, error category on failure.
-Set unconditionally whenever the LLM span itself emits.
+identity from `with_active_prompt(...)`, error category on failure, and
+`openarmature.llm.attempt_index` (the 0-based call-level attempt
+counter). Set unconditionally whenever the LLM span itself emits.

**`gen_ai.*` (OpenTelemetry GenAI semantic conventions, default on).**
Cross-vendor attribute names every LLM-aware backend reads
@@ -702,6 +705,28 @@ when an external auto-instrumentation library (OpenInference,
`opentelemetry-instrumentation-openai`) is already the canonical
source on your stack.

#### Per-attempt spans under call-level retry

[Call-level retry](llms.md#retrying-transient-failures)
(`provider.complete(retry=...)`) retries transient provider errors
inside a single call. Each attempt emits its own
`openarmature.llm.complete` span tagged with
`openarmature.llm.attempt_index` (0-based). A call that succeeds on the
first try emits one span at `attempt_index` 0; a call that fails twice
transiently before succeeding emits three spans (indices 0, 1, 2). Each
failed attempt's span carries `ERROR` status plus
`openarmature.error.category`; the final attempt's span carries the
terminal outcome (`OK` on success, `ERROR` on an exhausted or
non-transient failure).

`openarmature.llm.attempt_index` is the **call-level** attempt counter,
[independent of the node-level `attempt_index`](llms.md#call-level-vs-node-level-retry):
the former counts attempts inside one `complete()` call, the latter
counts node re-executions driven by retry *middleware*. A node retried
once by middleware, each execution calling a provider that itself
retries once, produces node `attempt_index` 0/1 and, within each,
call-level `attempt_index` 0/1.
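
The span-per-attempt shape described above can be illustrated with a toy
retry loop. This is only a sketch and does not use the framework:
`TransientError`, `AttemptSpan`, `complete_with_retry`, and `flaky` are
hypothetical stand-ins, and the `"transient"` category string is a
made-up value.

```python
from __future__ import annotations

from dataclasses import dataclass


class TransientError(Exception):
    """Stand-in for a transient provider error (hypothetical name)."""


@dataclass(frozen=True)
class AttemptSpan:
    """Toy record standing in for one openarmature.llm.complete span."""

    attempt_index: int
    status: str  # "OK" or "ERROR"
    error_category: str | None = None


def complete_with_retry(call, max_attempts):
    """Run call() up to max_attempts times, recording one span per attempt."""
    spans = []
    for index in range(max_attempts):
        try:
            result = call()
        except TransientError:
            # A failed attempt still gets its own span, marked ERROR.
            spans.append(AttemptSpan(index, "ERROR", "transient"))
            continue
        spans.append(AttemptSpan(index, "OK"))
        return result, spans
    return None, spans  # retries exhausted: the last span already carries ERROR


def flaky(failures):
    """Build a call that raises TransientError `failures` times, then succeeds."""
    remaining = [failures]

    def call():
        if remaining[0] > 0:
            remaining[0] -= 1
            raise TransientError("simulated")
        return "response"

    return call


result, spans = complete_with_retry(flaky(2), max_attempts=3)
print([(s.attempt_index, s.status) for s in spans])
# [(0, 'ERROR'), (1, 'ERROR'), (2, 'OK')]
```

Two transient failures followed by a success give three records at
indices 0, 1, 2, matching the three-span case in the text; a call that
succeeds immediately gives a single record at index 0.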

### LLM payload attributes

By default, LLM spans do **not** carry the messages sent or the
@@ -834,6 +859,14 @@ correctly; doing it from a `SpanProcessor.on_end` callback does
not, because the framework has already called `span.end()` and the
OTel SDK silently drops `set_attribute` on ended spans.

For the `openarmature.llm.complete` span the close event is an
`LlmRetryAttemptEvent` (one per attempt) rather than a `NodeEvent`;
that is the per-attempt event the observer renders the LLM span from.
An enricher scoped to that span (`span.name ==
"openarmature.llm.complete"`) can read the attempt's outcome straight
off it: `event.llm_attempt_index`, `event.error_category`,
`event.usage`, `event.finish_reason`, and so on.
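
As a rough sketch of such an enricher, assuming it is a plain callable
that receives the span and its close event (the `(span, event)`
signature, the `FakeSpan` stand-in, and the `app.llm.*` attribute keys
are all assumptions made for illustration, not the framework's
documented API):

```python
from types import SimpleNamespace


class FakeSpan:
    """Minimal stand-in for an OTel span: a name plus set_attribute()."""

    def __init__(self, name):
        self.name = name
        self.attributes = {}

    def set_attribute(self, key, value):
        self.attributes[key] = value


def tag_retried_attempts(span, event):
    """Hypothetical enricher; the (span, event) signature is assumed."""
    if span.name != "openarmature.llm.complete":
        return  # not an LLM span, leave it alone
    # The "app.llm.*" keys are invented for this example.
    span.set_attribute("app.llm.is_retry", event.llm_attempt_index > 0)
    if event.error_category is not None:
        span.set_attribute("app.llm.failed_category", event.error_category)


# Simulate the second attempt (index 1) of a call, which succeeded.
span = FakeSpan("openarmature.llm.complete")
event = SimpleNamespace(llm_attempt_index=1, error_category=None)
tag_retried_attempts(span, event)
print(span.attributes)
# {'app.llm.is_retry': True}
```

Only `llm_attempt_index` and `error_category` are read here; both are
fields the per-attempt event carries according to the text above.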

Exceptions raised by an enricher are caught and warned, never
propagated.

@@ -880,6 +913,16 @@ via `current_dispatch()`. See
[Authoring providers](../model-providers/authoring.md) for the full
pattern.

Under [call-level retry](#per-attempt-spans-under-call-level-retry) the
bundled `OpenAIProvider` additionally dispatches a python-internal
`LlmRetryAttemptEvent` once per attempt; that is the event the OTel
observer renders each per-attempt span from (including the lone attempt
of a no-retry call, at index 0). The terminal `LlmCompletionEvent` /
`LlmFailedEvent` above are unchanged: still one per call, still the
stable surface for per-call consumption (token accounting, failure
tracking). An observer that only cares about per-call outcomes can
ignore `LlmRetryAttemptEvent`.
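
A per-call consumer of that kind might look like the sketch below. The
three event classes are trimmed stand-ins that borrow only the real
names (the real events carry many more fields), and both the `on_event`
method name and the order in which events arrive are assumptions for
illustration.

```python
from dataclasses import dataclass


# Stand-ins that only mimic the event names, not the real field sets.
@dataclass(frozen=True)
class LlmRetryAttemptEvent:
    call_id: str
    llm_attempt_index: int


@dataclass(frozen=True)
class LlmCompletionEvent:
    call_id: str


@dataclass(frozen=True)
class LlmFailedEvent:
    call_id: str


class PerCallCounter:
    """Hypothetical observer: counts per-call outcomes, skips per-attempt events."""

    def __init__(self):
        self.completed = 0
        self.failed = 0

    def on_event(self, event):  # method name is an assumption
        if isinstance(event, LlmCompletionEvent):
            self.completed += 1
        elif isinstance(event, LlmFailedEvent):
            self.failed += 1
        # LlmRetryAttemptEvent (and anything else) falls through untouched.


counter = PerCallCounter()
# One call that succeeded on its third attempt:
# three per-attempt events, one terminal event.
for event in [
    LlmRetryAttemptEvent("call-1", 0),
    LlmRetryAttemptEvent("call-1", 1),
    LlmRetryAttemptEvent("call-1", 2),
    LlmCompletionEvent("call-1"),
]:
    counter.on_event(event)
print(counter.completed, counter.failed)
# 1 0
```

The counter sees one completed call regardless of how many attempts it
took, which is the per-call view the terminal events are meant to
preserve.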

#### Legacy sentinel-namespace pattern (compatibility surface)

`openarmature.observability.LLM_NAMESPACE` and
2 changes: 2 additions & 0 deletions src/openarmature/graph/__init__.py
@@ -44,6 +44,7 @@
InvocationStartedEvent,
LlmCompletionEvent,
LlmFailedEvent,
LlmRetryAttemptEvent,
MetadataAugmentationEvent,
NodeEvent,
)
@@ -101,6 +102,7 @@
"InvocationStartedEvent",
"LlmCompletionEvent",
"LlmFailedEvent",
"LlmRetryAttemptEvent",
"MappingReferencesUndeclaredField",
"MetadataAugmentationEvent",
"Middleware",
67 changes: 67 additions & 0 deletions src/openarmature/graph/events.py
@@ -656,6 +656,72 @@ class LlmFailedEvent:
caller_invocation_metadata: Mapping[str, AttributeValue] | None = None


# Python-internal per-attempt LLM event. NOT a spec-normative event type
# (unlike LlmCompletionEvent / LlmFailedEvent): it is the observer-side
# vehicle for the observability §5.5 per-attempt span surface under
# llm-provider §7.1 call-level retry. One is dispatched per in-call
# attempt (including the single attempt of a no-retry call); the OTel
# observer renders one openarmature.llm.complete span from each, while
# the terminal LlmCompletionEvent / LlmFailedEvent stay one-per-call
# (payload/latency, Langfuse mapping, the fixture-072 mutual exclusion).
@dataclass(frozen=True)
class LlmRetryAttemptEvent:
"""One LLM-call attempt delivered to observers for per-attempt span
rendering.

Carries the full request-side surface plus that attempt's outcome.
``error_category`` discriminates the outcome: ``None`` for a
successful attempt (the response-side fields are populated), a
category string for a failed attempt (the response-side fields are
``None`` — no response was received).

Field set:

- ``llm_attempt_index``: the call-level retry-attempt index, ``0``
for the first attempt and ``0..N-1`` across the N attempts of a
call-level retry. Distinct from ``attempt_index`` (the node-level
retry index used for calling-span resolution); the two are
independent.
- identity / scoping (``invocation_id`` ... ``call_id``) and the
request side (``input_messages`` / ``request_params`` /
``request_extras`` / ``active_prompt`` / ``active_prompt_group``)
mirror :class:`LlmCompletionEvent`, carried on every attempt.
- response side (``response_id`` / ``response_model`` / ``usage`` /
``finish_reason`` / ``output_content``): populated on a successful
attempt; ``None`` on a failed attempt.
- failure side (``error_category`` / ``error_message`` /
``error_type``): populated on a failed attempt; ``None`` on a
successful one.
"""

invocation_id: str
correlation_id: str | None
node_name: str
namespace: tuple[str, ...]
attempt_index: int
fan_out_index: int | None
branch_name: str | None
provider: str
model: str
call_id: str
llm_attempt_index: int
latency_ms: float | None
input_messages: list[dict[str, Any]]
request_params: Mapping[str, Any]
request_extras: Mapping[str, Any]
active_prompt: Any
active_prompt_group: Any
response_id: str | None = None
response_model: str | None = None
usage: "Usage | None" = None
finish_reason: str | None = None
output_content: str | None = None
error_category: str | None = None
error_message: str | None = None
error_type: str | None = None
caller_invocation_metadata: Mapping[str, AttributeValue] | None = None


# Spec: pipeline-utilities §6.3 cause chain (proposal 0068). A ``carrier``
# link is a graph-engine §4 ``node_exception`` wrapper the engine applies at a
# non-node placement (§9.7 instance / §11.7 branch / §9.6 / §11.6 parent-node
@@ -758,6 +824,7 @@ class FailureIsolatedEvent:
"InvocationStartedEvent",
"LlmCompletionEvent",
"LlmFailedEvent",
"LlmRetryAttemptEvent",
"MetadataAugmentationEvent",
"NodeEvent",
"ParallelBranchesEventConfig",
5 changes: 5 additions & 0 deletions src/openarmature/graph/observer.py
@@ -40,6 +40,7 @@
InvocationStartedEvent,
LlmCompletionEvent,
LlmFailedEvent,
LlmRetryAttemptEvent,
MetadataAugmentationEvent,
NodeEvent,
)
@@ -55,6 +56,9 @@
# typed LLM provider call event, dispatched on every successful LLM
# completion), LlmFailedEvent (proposal 0058 typed LLM failure event,
# dispatched alongside the §7 exception when provider.complete raises),
# LlmRetryAttemptEvent (proposal 0050 per-attempt LLM span event,
# python-internal, dispatched once per in-call attempt, including the
# single attempt of a no-retry call, to drive the per-attempt OTel span
# surface),
# and FailureIsolatedEvent (proposal 0050 §6.3 framework-emitted event,
# dispatched by FailureIsolationMiddleware when it catches an exception
# escaping the inner chain and substitutes a degraded partial update).
@@ -65,6 +69,7 @@
| InvocationCompletedEvent
| LlmCompletionEvent
| LlmFailedEvent
| LlmRetryAttemptEvent
| FailureIsolatedEvent
)
