10 changes: 4 additions & 6 deletions CHANGELOG.md
@@ -8,22 +8,20 @@ The format follows [Keep a Changelog](https://keepachangelog.com/en/1.1.0/). The

### Added

- **Implicit prefix-cache wire-byte stability** (proposal 0047, spec v0.39.0). The OpenAI Chat Completions wire body is now byte-stable across equivalent OA inputs — equivalent calls produce byte-identical request bodies regardless of dict insertion order at every user-supplied-dict boundary (tool definitions including the top-level `function` record + the `parameters` JSON Schema, `response_format.json_schema.schema`, `RuntimeConfig` extras, `tool_call.arguments` JSON encoding). A new `_canonicalize_dict_keys` helper recursively sorts dict keys at every nesting level while preserving caller-supplied array ordering (the spec's split between "object keys MUST be sorted" and "array order MUST be preserved per caller-supplied order"). A top-level belt-and-suspenders canonicalization pass over the assembled body catches anything the per-field passes miss. Combined with the existing `Response.usage.cached_tokens` / `cache_creation_tokens` fields sourced from `prompt_tokens_details` (v0.12.0) and the OTel observer's `openarmature.llm.cache_read.input_tokens` + `openarmature.llm.cache_creation.input_tokens` attributes (also v0.12.0), this closes proposal 0047 end-to-end. Prompt-management §13 *Cross-variable substring stability* is satisfied by the existing Jinja2 `StrictUndefined` render path; pinned by a new test. Scope is the Chat Completions endpoint only — the OpenAI Responses API endpoint and the Anthropic / Gemini wire-format mappings are deferred (the providers aren't implemented in python today).
- **Implicit prefix-cache wire-byte stability** (proposal 0047, spec v0.39.0). Closes proposal 0047 end-to-end across three pieces all landing in v0.13.0: (1) `Response.usage.cached_tokens` / `cache_creation_tokens` fields sourced from the OpenAI `prompt_tokens_details` payload (PR #136); (2) the OTel observer emits `openarmature.llm.cache_read.input_tokens` and optional `openarmature.llm.cache_creation.input_tokens` when the corresponding usage field is populated (PR #140); (3) the OpenAI Chat Completions wire body is now byte-stable across equivalent OA inputs — equivalent calls produce byte-identical request bodies regardless of dict insertion order at every user-supplied-dict boundary (tool definitions including the top-level `function` record + the `parameters` JSON Schema, `response_format.json_schema.schema`, `RuntimeConfig` extras, `tool_call.arguments` JSON encoding) via a new `_canonicalize_dict_keys` helper that recursively sorts dict keys at every nesting level while preserving caller-supplied array ordering, plus a top-level belt-and-suspenders canonicalization pass over the assembled body (PR #145). Prompt-management §13 *Cross-variable substring stability* is satisfied by the existing Jinja2 `StrictUndefined` render path; pinned by a new test. Scope is the Chat Completions endpoint only — the OpenAI Responses API endpoint and the Anthropic / Gemini wire-format mappings are deferred (the providers aren't implemented in python today).
- **`LlmFailedEvent` typed event variant** (proposal 0058, spec v0.53.0). Carves LLM provider failures into a spec-normatively-typed event variant alongside `LlmCompletionEvent`. 17 mirrored identity / scoping / request-side fields + 3 failure-specific fields (`error_category` always-present from the llm-provider §7 normative category enumeration; optional `error_type` for vendor-specific detail or upstream exception class name; always-present `error_message`). `OpenAIProvider.complete()` emits the typed event alongside the §7 exception on both raise paths — adapter-caught provider exceptions AND pre-send validation raises. Caller-side exception flow unchanged; the exception still raises out of `complete()`. Mutually exclusive with `LlmCompletionEvent` on the same call. Both bundled observers (OTel + Langfuse) consume `LlmFailedEvent` directly: same `openarmature.llm.complete` span / Generation shape as the success path with ERROR status / level + `openarmature.error.category` attribute (OTel) / `error_category` as statusMessage (Langfuse), `start_time` back-dated by `latency_ms` so the failure duration reflects the time-to-raise.
- **`LlmCompletionEvent` extended with proposal 0057 request-side fields** (spec v0.51.0). The typed event now carries `input_messages`, `output_content`, `request_params`, `request_extras`, `active_prompt`, `active_prompt_group`, `call_id`, and `response_model` alongside the existing v0.49.0 fields. `request_id` renamed to `response_id` per the proposal's response-side naming. Inline image bytes in `input_messages` stay redacted per observability §5.5.5 — the OpenAI provider reuses the existing message-serialization helper for the projection. Observer-side privacy gates (OTel `disable_llm_payload`, Langfuse equivalents) apply at rendering, symmetric with the §5.5.1 span attribute path.
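
The key-sorting rule described in the proposal 0047 entry can be sketched as follows. This is a minimal illustration, not the project's `_canonicalize_dict_keys` implementation: the function name `canonicalize_dict_keys`, the sample tool definitions, and the compact `json.dumps` separators are assumptions made for the example, and dict keys are assumed to be strings, as they are in JSON.

```python
import json
from typing import Any


def canonicalize_dict_keys(value: Any) -> Any:
    """Hypothetical helper: sort dict keys at every level, keep list order."""
    if isinstance(value, dict):
        # Rebuild the dict in sorted-key order (keys assumed to be strings).
        return {key: canonicalize_dict_keys(value[key]) for key in sorted(value)}
    if isinstance(value, list):
        # Array order is caller-supplied and must be preserved as-is.
        return [canonicalize_dict_keys(item) for item in value]
    return value


# Two equivalent tool definitions that differ only in dict insertion order.
tool_a = {
    "name": "lookup",
    "parameters": {
        "type": "object",
        "required": ["q", "limit"],
        "properties": {"q": {"type": "string"}, "limit": {"type": "integer"}},
    },
}
tool_b = {
    "parameters": {
        "properties": {"limit": {"type": "integer"}, "q": {"type": "string"}},
        "required": ["q", "limit"],
        "type": "object",
    },
    "name": "lookup",
}

body_a = json.dumps(canonicalize_dict_keys(tool_a), separators=(",", ":"))
body_b = json.dumps(canonicalize_dict_keys(tool_b), separators=(",", ":"))
assert body_a == body_b  # byte-identical despite different insertion order
```

Passing `sort_keys=True` to `json.dumps` alone would give the same bytes at the encoding step; an explicit pass like this one is useful when the canonical form is needed as a Python object before the body is assembled.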

### Changed

- **Sentinel-namespace `NodeEvent` emission for LLM events retired entirely from `OpenAIProvider`** (proposal 0058 cleanup). The provider no longer dispatches the `("openarmature.llm.complete",)`-namespaced `NodeEvent`s on either outcome path; both success and failure flow through their respective typed variants exclusively. The `_make_llm_event` helper is removed. External custom observers that filtered LLM calls by `event.namespace == LLM_NAMESPACE` MUST migrate to `isinstance(event, LlmCompletionEvent)` for success and `isinstance(event, LlmFailedEvent)` for failure to keep receiving LLM-call notifications. `LlmEventPayload` and `LLM_NAMESPACE` remain in `openarmature.observability.llm_event` as a documented compatibility surface for custom providers that haven't migrated; neither is referenced by the bundled provider or observers anymore.
- **Pinned spec advances from v0.51.0 to v0.53.0** (absorbs proposals 0023 + 0058). Proposal 0023 (canonical state reducers) ships in spec v0.52.0 but is not implemented this cycle — `conformance.toml` marks 0023 as `not-yet`; fixtures 034–038 stay parser-deferred.
- **Pinned spec advances from v0.46.0 to v0.53.0** across the v0.13.0 cycle. Absorbs four implemented proposals (0047 — implicit prefix-cache wire-byte stability; 0049 — typed `LlmCompletionEvent`; 0057 — `LlmCompletionEvent` request-side field-set extension; 0058 — typed `LlmFailedEvent`) plus 0023 (canonical state reducers, v0.52.0) carried as `not-yet` in the manifest. Pin journey: v0.46.0 → v0.51.0 (PR #141 absorbs 0057) → v0.53.0 (PR #144 absorbs 0058; spec v0.52.0's 0023 entry rides along as `not-yet`). Fixtures 034–038 (0023) stay parser-deferred.
- **`tool_call.arguments` JSON encoding now uses `sort_keys=True`** (proposal 0047 §8 byte-stability requirement for caller-supplied dicts JSON-encoded into a string field). Functionally equivalent — the encoded string parses to the same dict — but byte-different from the previous insertion-order encoding. Downstream consumers that snapshot wire bodies (golden-file tests, audit logging, recorded fixtures) will see byte-different `tool_calls[].function.arguments` strings across this upgrade for any call whose argument dict was emitted in non-sorted insertion order before.
- **OTel and Langfuse observers drive the `openarmature.llm.complete` span / Generation observation lifecycle from the typed `LlmCompletionEvent`** (proposal 0049 + 0057, observability §5.5.7). Successful LLM-provider calls now open + close the OTel span and the Langfuse Generation in one shot at typed-event arrival, with `start_time` back-dated by `LlmCompletionEvent.latency_ms` so duration reflects the adapter-boundary measurement rather than dispatcher queue delay. The §5.5 attribute set and §8.4 Generation metadata are unchanged. (Failure paths land on `LlmFailedEvent` later in the same cycle — see the proposal 0058 entry above.)
- **`OpenAIProvider.complete()` no longer emits the sentinel `NodeEvent` pair on the success path** (v0.13.0 cleanup). The bundled OTel and Langfuse observers now consume the typed `LlmCompletionEvent` directly; the sentinel pair was kept on the success path through earlier releases for compatibility with pre-typed-event observers. External custom observers that filtered LLM calls by `event.namespace == LLM_NAMESPACE` MUST migrate to `isinstance(event, LlmCompletionEvent)` to continue seeing successful LLM calls. (The failure-path sentinel emission is retired entirely later in the same cycle — see the proposal 0058 entry above.)
- **`LangfuseClient` Protocol gains optional `start_time` / `end_time` timestamps** on `generation(...)` and the Generation/Span handles' `end(...)`. The Langfuse observer passes back-dated timestamps on the typed-event success path so the Langfuse UI shows the actual adapter-boundary duration. The SDK adapter handles v4 Langfuse SDK quirks transparently: `Langfuse.start_observation()` does NOT accept `start_time`, so back-dated generations are routed through the private `_otel_tracer.start_span(name=..., start_time=int_ns)` API (mirroring the SDK's own `create_event` precedent) and the resulting OTel span is wrapped in `LangfuseGeneration` directly; the non-back-dated path still uses `start_observation`. `LangfuseSpan.end()` is typed `Optional[int]` (nanoseconds), so the adapter converts the Protocol's `datetime` surface to int nanoseconds before forwarding. The `InMemoryLangfuseClient` stores both fields verbatim on `LangfuseObservation` for test assertions.
- **`OpenAIProvider(populate_caller_metadata=...)` default flipped from `False` to `True`.** The python implementation now populates `LlmCompletionEvent.caller_invocation_metadata` by default so the bundled OTel and Langfuse observers can emit the §5.6 `openarmature.user.<key>` span-attribute family without a separate opt-in. Pass `populate_caller_metadata=False` to suppress the snapshot when no downstream consumer needs it. The spec-defined opt-in mechanism is unchanged; only the python default flips.

### Added

- **`LlmCompletionEvent` extended with proposal 0057 request-side fields** (spec v0.51.0). The typed event now carries `input_messages`, `output_content`, `request_params`, `request_extras`, `active_prompt`, `active_prompt_group`, `call_id`, and `response_model` alongside the existing v0.49.0 fields. `request_id` renamed to `response_id` per the proposal's response-side naming. Inline image bytes in `input_messages` stay redacted per observability §5.5.5 — the OpenAI provider reuses the existing message-serialization helper for the projection. Observer-side privacy gates (OTel `disable_llm_payload`, Langfuse equivalents) apply at rendering, symmetric with the §5.5.1 span attribute path.

## [0.12.0] — 2026-06-05

Observability release. The pinned spec advances from v0.38.0 to v0.46.0, absorbing eight accepted proposals (0047-0054). Three ship as fully implemented this cycle: proposal 0048 grows a read-symmetric `get_invocation_metadata()` API + a §9 *Queryable observer pattern* concept doc section; proposal 0052 puts `openarmature.implementation.name` + `.version` attribution attributes on every OTel invocation span + every Langfuse Trace; proposal 0054 ships `CompiledGraph.drain_events_for(invocation_id, *, timeout)` as the architectural pair to 0048's §9.4 accumulator lifecycle. Two ship as textual-only acks (0051 Langfuse trace I/O caveat; 0053 §3.4 shared-parent boundary clarification). One Fixed: the retry middleware now resets the invocation-metadata ContextVar between attempts per §3.4. The production-observability example grows the queryable accumulator + drain_events_for pattern end-to-end so the new APIs have a runnable demo.
71 changes: 42 additions & 29 deletions conformance.toml
@@ -266,33 +266,38 @@ status = "implemented"
since = "0.11.0"

# Spec v0.39.0 (proposal 0047). Implicit prefix-cache wire-byte
# stability. Cross-capability proposal landed in v0.13.0 across
# three pieces: (1) ``Response.usage`` cache-stat fields
# (``cached_tokens`` / ``cache_creation_tokens``) sourced from the
# OpenAI ``prompt_tokens_details`` payload, with conditional emission
# stability. Cross-capability proposal landed end-to-end in the
# v0.13.0 cycle across three pieces, all post-v0.12.0:
# (1) ``Response.usage`` cache-stat fields (``cached_tokens`` /
# ``cache_creation_tokens``) sourced from the OpenAI
# ``prompt_tokens_details`` payload, with conditional emission
# preserved (absent-vs-zero distinction stays observable) — landed
# in the v0.12.0 cycle as the proposal's payload-side prerequisite;
# in PR #136 as the proposal's payload-side prerequisite;
# (2) OTel observer emits ``openarmature.llm.cache_read.input_tokens``
# (and optional ``openarmature.llm.cache_creation.input_tokens``)
# when the corresponding usage field is populated — also v0.12.0;
# (3) §8.1 intra-impl wire-byte canonicalization in the OpenAI
# adapter — landed here. The canonicalizer recursively sorts dict
# keys at every nesting level while preserving caller-supplied
# array order, applied at the four user-input boundaries
# when the corresponding usage field is populated — landed in
# PR #140; (3) §8.1 intra-impl wire-byte canonicalization in the
# OpenAI adapter — landed in PR #145. The canonicalizer recursively
# sorts dict keys at every nesting level while preserving caller-
# supplied array order, applied at the four user-input boundaries
# (``tool.parameters`` / ``tool.function`` record top-level per
# spec Q5, ``response_format.json_schema.schema``, ``RuntimeConfig``
# extras, ``tool_call.arguments`` JSON encoding) plus a top-level
# belt-and-suspenders pass over the assembled request body. Scope
# is the Chat Completions endpoint only; the OpenAI Responses API
# endpoint is deferred to a future cycle (no python consumer
# today). Prompt-management §13 cross-variable substring stability
# is satisfied by the existing Jinja2 ``StrictUndefined`` render
# path; pinned by ``tests/unit/test_prompts.py::
# belt-and-suspenders pass over the assembled request body.
# Downstream-observable wire-byte shift on
# ``tool_call.arguments``: the encoded string now uses
# ``sort_keys=True`` (functionally equivalent — parses to the same
# dict — but byte-different for golden-file / audit-snapshot
# consumers). Scope is the Chat Completions endpoint only; the
# OpenAI Responses API endpoint is deferred to a future cycle (no
# python consumer today). Prompt-management §13 cross-variable
# substring stability is satisfied by the existing Jinja2
# ``StrictUndefined`` render path; pinned by
# ``tests/unit/test_prompts.py::
# test_cross_variable_substring_stability_text_prompt`` and
# ``test_cross_variable_substring_stability_chat_prompt``.
# Anthropic / Gemini
# wire-byte conformance fixtures stay deferred — neither provider
# is implemented in python today.
# Anthropic / Gemini wire-byte conformance fixtures stay deferred
# — neither provider is implemented in python today.
[proposals."0047"]
status = "implemented"
since = "0.13.0"
@@ -344,16 +349,24 @@ status = "implemented"
since = "0.12.0"

# Spec v0.41.0 (proposal 0049). Typed LLM Completion Event — first
# typed event variant on the observer event union. Shipped in
# v0.13.0: provider dual-emits the typed event alongside the sentinel
# NodeEvent pair (success-only per spec scope); LlmCompletionEvent
# carries identity/scoping/outcome fields per the spec field table.
# Conformance fixtures 050-056 activated by the typed_event_collector
# harness directive. The OTel + Langfuse observers continue to drive
# their §5.5 / §8.4.4 surface off the sentinel NodeEvent pair during
# the dual-emit transition window; type-discrimination migration
# lands once the follow-on request-side-fields extension (proposal
# 0057) ships.
# typed event variant on the observer event union. Shipped fully in
# v0.13.0 across PRs #141 (typed-event definition + provider
# emission + 0057 field-set extension), #142 (OTel observer migration
# to type discrimination), #143 (Langfuse observer migration +
# success-side sentinel emission dropped), and #144 (0058 typed
# LlmFailedEvent + sentinel-namespace NodeEvent emission for LLM
# events retired entirely from the bundled OpenAIProvider).
# LlmCompletionEvent carries identity/scoping/outcome fields per
# the spec field table. Both bundled observers (OTel + Langfuse)
# consume the typed events via isinstance discrimination on both
# outcome paths. Conformance fixtures 050-056 activated by the
# typed_event_collector harness directive. Fixtures 057-068
# (proposal 0057 request-side fields) and 069-073 (proposal 0058
# typed failure event) stay parser-deferred pending the harness's
# typed_event_collector directive schema catch-up + the event_counts
# list directive introduced by fixture 071; behavior pinned by
# unit tests in tests/unit/test_llm_provider.py +
# test_observability_otel.py + test_observability_langfuse.py.
[proposals."0049"]
status = "implemented"
since = "0.13.0"
4 changes: 2 additions & 2 deletions docs/agent/non-obvious-shapes.md
@@ -127,7 +127,7 @@ Catching `Exception` works but is too broad; catching one hierarchy misses the o

### Filter `openarmature.*`-namespaced events when your observer only cares about user nodes

OA emits observer events under sentinel node-names for its own internal dispatch: `openarmature.llm.complete` for LLM provider calls (proposal 0024), `openarmature.checkpoint.migrate` for state-migration runs (proposal 0014), `openarmature.checkpoint.save` for checkpoint saves (proposal 0010). These events let the OTel / Langfuse observers emit LLM-provider spans, checkpoint-migrate spans, etc., but a custom observer that only cares about user-defined node activity sees them as noise:
OA emits observer events under sentinel node-names for some internal dispatch: `openarmature.checkpoint.migrate` for state-migration runs (proposal 0014) and `openarmature.checkpoint.save` for checkpoint saves (proposal 0010) ride on `NodeEvent` with a sentinel namespace. (LLM provider calls used to follow the same pattern but moved to typed `LlmCompletionEvent` / `LlmFailedEvent` variants in v0.13.0 per proposals 0049 + 0058 — those are filtered by `isinstance` instead.) The sentinel-namespace events let the OTel / Langfuse observers emit checkpoint-migrate spans, etc., but a custom observer that only cares about user-defined node activity sees them as noise:

```python
async def __call__(self, event: NodeEvent) -> None:
@@ -137,7 +137,7 @@ async def __call__(self, event: NodeEvent) -> None:
# … user-node handling
```

`event.namespace[0]` is the safest discriminator (the leaf `event.node_name` would also work for LLM events but won't match the checkpoint sentinels since those repurpose `node_name` differently). Don't try to filter on `current_invocation_id() is None`: OA-internal events are dispatched within the same invocation context as user-node events, so `invocation_id` is set for both; the namespace-prefix check is the stable contract.
`event.namespace[0]` is the safest discriminator. Don't try to filter on `current_invocation_id() is None`: OA-internal events are dispatched within the same invocation context as user-node events, so `invocation_id` is set for both; the namespace-prefix check is the stable contract.
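
A custom observer that wants only user-node activity after v0.13.0 might combine both discriminators, roughly as sketched here. The three event classes are simplified stand-ins defined for this example so that it runs on its own; their fields are a small subset chosen for illustration, and the `startswith("openarmature.")` check is an assumption about how the namespace-prefix test could be written, not the documented snippet.

```python
import asyncio
from dataclasses import dataclass, field


# Stand-in event types (hypothetical, reduced to the fields this sketch needs).
@dataclass(frozen=True)
class NodeEvent:
    namespace: tuple[str, ...]
    node_name: str


@dataclass(frozen=True)
class LlmCompletionEvent:
    call_id: str
    latency_ms: float


@dataclass(frozen=True)
class LlmFailedEvent:
    call_id: str
    error_category: str
    error_message: str


@dataclass
class UserNodeObserver:
    seen: list[str] = field(default_factory=list)

    async def __call__(self, event: object) -> None:
        # Typed LLM events are discriminated by type, not by namespace.
        if isinstance(event, (LlmCompletionEvent, LlmFailedEvent)):
            return
        if not isinstance(event, NodeEvent):
            return
        # Sentinel-namespace events (checkpoint save / migrate) are OA-internal.
        if event.namespace and event.namespace[0].startswith("openarmature."):
            return
        self.seen.append(event.node_name)


async def _demo() -> list[str]:
    observer = UserNodeObserver()
    events = [
        NodeEvent(namespace=("summarize",), node_name="summarize"),
        NodeEvent(namespace=("openarmature.checkpoint.save",), node_name="save"),
        LlmCompletionEvent(call_id="c1", latency_ms=12.5),
        LlmFailedEvent(call_id="c2", error_category="timeout", error_message="timed out"),
    ]
    for event in events:
        await observer(event)
    return observer.seen


seen_nodes = asyncio.run(_demo())
assert seen_nodes == ["summarize"]
```

An observer that does want LLM-call notifications would handle the two typed variants in the first branch instead of returning early.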

### Fan-out subgraphs that emit `list[X]` per instance produce `list[list[X]]` at `target_field`
