Commit 43a4ddc

Prepare v0.13.0 release: reconcile changelog, docs, and examples (#146)
* Correct v0.13.0 release narrative per spec review

Three blocking + three should-fix items spec flagged on the pre-tag review. All narrative; no code behavior change.

- 0047 CHANGELOG entry mis-attributed pieces 1+2 (Response.usage cache fields + OTel cache attributes) to v0.12.0. Verified via git: those landed in PRs #136 + #140 post-v0.12.0-tag, so all three pieces of 0047 ship in v0.13.0. Reframed.
- conformance.toml [proposals."0047"] leading-comment block had the same v0.12.0 mis-attribution. Same correction; added PR references for traceability.
- Unreleased section had two ### Added headings with the 0057 entry orphaned below ### Changed. Consolidated.
- Spec pin advance text undercounted the cycle journey (said v0.51.0 → v0.53.0; actual is v0.46.0 → v0.53.0 across three hops). Reframed and listed absorbed proposals inline.
- tool_call.arguments JSON encoding now uses sort_keys=True (functionally equivalent but byte-different for downstream snapshot consumers). Surfaced as its own ### Changed entry instead of buried in the 0047 ### Added.
- conformance.toml [proposals."0049"] leading-comment block grew the fixture-deferral surface (057-068 + 069-073 parser-deferred pending harness directive schema catch-up; behavior pinned by unit tests) per spec OQ2.

* Migrate LLM-event docs to typed-event-first

Three docs still pushed the legacy sentinel-namespace pattern as the primary path for custom observers consuming LLM events and custom providers emitting them. After v0.13.0 the bundled provider emits typed LlmCompletionEvent / LlmFailedEvent variants directly; the bundled OTel + Langfuse observers consume via isinstance discrimination. Rewrites:

- docs/concepts/observability.md: "Publishing LLM events for custom observers" → "Consuming LLM events in custom observers". Typed-event consumption shown as primary (isinstance branch on LlmCompletionEvent + LlmFailedEvent with the mutual-exclusion + field-set notes). Sentinel pattern demoted to a "Legacy sentinel-namespace pattern (compatibility surface)" subsection for downstream code interoperating with custom providers that haven't migrated.
- docs/model-providers/authoring.md: custom-provider emission sketch rewritten: dispatch LlmCompletionEvent on success, LlmFailedEvent alongside the §7 exception on failure. Shows the current-attempt-index / current-fan-out-index / etc. scoping fields the typed events carry. Calls out the mutual-exclusion + exception-flow-preservation contracts. Legacy sentinel pattern retained as a compatibility-surface callout for older providers.
- docs/agent/non-obvious-shapes.md: "filter openarmature.*-namespaced events" tip drops the openarmature.llm.complete example (v0.13.0 retired the sentinel pattern for LLM events); checkpoint sentinels stay since the tip is still applicable for those. Adjusted the follow-on paragraph mentioning LLM events.

mkdocs strict build clean.

* Regenerate AGENTS.md for typed-event doc migration

The non-obvious-shapes doc migration changed a generator source without regenerating the committed AGENTS.md. Bring it back in sync so the drift guard passes.

* Extend production-observability example for v0.13.0

Add an LlmFailureTracker observer that consumes the typed LlmFailedEvent and rolls up per-invocation error-category counts, and extend LlmUsageAccumulator to track cached_tokens and report a cache-hit ratio. The persist node now reports both rollups and the OTel formatter surfaces the cache-read attribute.

Also drop spec/proposal references and em dashes from the example's comments and walk-through, which carry no meaning for end users reading the code.

* Drop spec and proposal references from examples

Example comments and docstrings quoted proposal numbers and spec section refs that have no meaning to end users reading the code. Reword them to describe only the implementation behavior.

* Add tests for production-observability observers

The examples smoke test only proves the demo loads and its build_graph() compiles. Cover the two queryable observers the production-observability example ships: cache-token accumulation and the derived cache-hit ratio, failure-category counting, mutual exclusion between the success and failure events, the per-invocation bucket cleanup, and the OTel cache-read attribute. The persist-output check drives the real persist node offline.

* Guard legacy LLM observer snippet with NodeEvent check

The legacy sentinel-namespace observer example accessed event.namespace / event.pre_state without narrowing to NodeEvent. A real observer receives the full ObserverEvent union, where variants like InvocationCompletedEvent have no namespace, so the snippet would raise AttributeError. Add an isinstance(event, NodeEvent) guard so the copy-paste example is correct.
1 parent a6b6f26 commit 43a4ddc

11 files changed

Lines changed: 783 additions & 251 deletions


CHANGELOG.md

Lines changed: 4 additions & 6 deletions
@@ -8,22 +8,20 @@ The format follows [Keep a Changelog](https://keepachangelog.com/en/1.1.0/). The
88

99
### Added
1010

11-
- **Implicit prefix-cache wire-byte stability** (proposal 0047, spec v0.39.0). The OpenAI Chat Completions wire body is now byte-stable across equivalent OA inputs — equivalent calls produce byte-identical request bodies regardless of dict insertion order at every user-supplied-dict boundary (tool definitions including the top-level `function` record + the `parameters` JSON Schema, `response_format.json_schema.schema`, `RuntimeConfig` extras, `tool_call.arguments` JSON encoding). A new `_canonicalize_dict_keys` helper recursively sorts dict keys at every nesting level while preserving caller-supplied array ordering (the spec's split between "object keys MUST be sorted" and "array order MUST be preserved per caller-supplied order"). A top-level belt-and-suspenders canonicalization pass over the assembled body catches anything the per-field passes miss. Combined with the existing `Response.usage.cached_tokens` / `cache_creation_tokens` fields sourced from `prompt_tokens_details` (v0.12.0) and the OTel observer's `openarmature.llm.cache_read.input_tokens` + `openarmature.llm.cache_creation.input_tokens` attributes (also v0.12.0), this closes proposal 0047 end-to-end. Prompt-management §13 *Cross-variable substring stability* is satisfied by the existing Jinja2 `StrictUndefined` render path; pinned by a new test. Scope is the Chat Completions endpoint only — the OpenAI Responses API endpoint and the Anthropic / Gemini wire-format mappings are deferred (the providers aren't implemented in python today).
11+
- **Implicit prefix-cache wire-byte stability** (proposal 0047, spec v0.39.0). Closes proposal 0047 end-to-end across three pieces all landing in v0.13.0: (1) `Response.usage.cached_tokens` / `cache_creation_tokens` fields sourced from the OpenAI `prompt_tokens_details` payload (PR #136); (2) the OTel observer emits `openarmature.llm.cache_read.input_tokens` and optional `openarmature.llm.cache_creation.input_tokens` when the corresponding usage field is populated (PR #140); (3) the OpenAI Chat Completions wire body is now byte-stable across equivalent OA inputs — equivalent calls produce byte-identical request bodies regardless of dict insertion order at every user-supplied-dict boundary (tool definitions including the top-level `function` record + the `parameters` JSON Schema, `response_format.json_schema.schema`, `RuntimeConfig` extras, `tool_call.arguments` JSON encoding) via a new `_canonicalize_dict_keys` helper that recursively sorts dict keys at every nesting level while preserving caller-supplied array ordering, plus a top-level belt-and-suspenders canonicalization pass over the assembled body (PR #145). Prompt-management §13 *Cross-variable substring stability* is satisfied by the existing Jinja2 `StrictUndefined` render path; pinned by a new test. Scope is the Chat Completions endpoint only — the OpenAI Responses API endpoint and the Anthropic / Gemini wire-format mappings are deferred (the providers aren't implemented in python today).
1212
- **`LlmFailedEvent` typed event variant** (proposal 0058, spec v0.53.0). Carves LLM provider failures into a spec-normatively-typed event variant alongside `LlmCompletionEvent`. 17 mirrored identity / scoping / request-side fields + 3 failure-specific fields (`error_category` always-present from the llm-provider §7 normative category enumeration; optional `error_type` for vendor-specific detail or upstream exception class name; always-present `error_message`). `OpenAIProvider.complete()` emits the typed event alongside the §7 exception on both raise paths — adapter-caught provider exceptions AND pre-send validation raises. Caller-side exception flow unchanged; the exception still raises out of `complete()`. Mutually exclusive with `LlmCompletionEvent` on the same call. Both bundled observers (OTel + Langfuse) consume `LlmFailedEvent` directly: same `openarmature.llm.complete` span / Generation shape as the success path with ERROR status / level + `openarmature.error.category` attribute (OTel) / `error_category` as statusMessage (Langfuse), `start_time` back-dated by `latency_ms` so the failure duration reflects the time-to-raise.
13+
- **`LlmCompletionEvent` extended with proposal 0057 request-side fields** (spec v0.51.0). The typed event now carries `input_messages`, `output_content`, `request_params`, `request_extras`, `active_prompt`, `active_prompt_group`, `call_id`, and `response_model` alongside the existing v0.49.0 fields. `request_id` renamed to `response_id` per the proposal's response-side naming. Inline image bytes in `input_messages` stay redacted per observability §5.5.5 — the OpenAI provider reuses the existing message-serialization helper for the projection. Observer-side privacy gates (OTel `disable_llm_payload`, Langfuse equivalents) apply at rendering, symmetric with the §5.5.1 span attribute path.
1314

1415
### Changed
1516

1617
- **Sentinel-namespace `NodeEvent` emission for LLM events retired entirely from `OpenAIProvider`** (proposal 0058 cleanup). The provider no longer dispatches the `("openarmature.llm.complete",)`-namespaced `NodeEvent`s on either outcome path; both success and failure flow through their respective typed variants exclusively. The `_make_llm_event` helper is removed. External custom observers that filtered LLM calls by `event.namespace == LLM_NAMESPACE` MUST migrate to `isinstance(event, LlmCompletionEvent)` for success and `isinstance(event, LlmFailedEvent)` for failure to keep receiving LLM-call notifications. `LlmEventPayload` and `LLM_NAMESPACE` remain in `openarmature.observability.llm_event` as a documented compatibility surface for custom providers that haven't migrated; neither is referenced by the bundled provider or observers anymore.
17-
- **Pinned spec advances from v0.51.0 to v0.53.0** (absorbs proposals 0023 + 0058). Proposal 0023 (canonical state reducers) ships in spec v0.52.0 but is not implemented this cycle — `conformance.toml` marks 0023 as `not-yet`; fixtures 034–038 stay parser-deferred.
18+
- **Pinned spec advances from v0.46.0 to v0.53.0** across the v0.13.0 cycle. Absorbs four implemented proposals (0047 — implicit prefix-cache wire-byte stability; 0049 — typed `LlmCompletionEvent`; 0057 — `LlmCompletionEvent` request-side field-set extension; 0058 — typed `LlmFailedEvent`) plus 0023 (canonical state reducers, v0.52.0) carried as `not-yet` in the manifest. Pin journey: v0.46.0 → v0.51.0 (PR #141 absorbs 0057) → v0.53.0 (PR #144 absorbs 0058; spec v0.52.0's 0023 entry rides along as `not-yet`). Fixtures 034–038 (0023) stay parser-deferred.
19+
- **`tool_call.arguments` JSON encoding now uses `sort_keys=True`** (proposal 0047 §8 byte-stability requirement for caller-supplied dicts JSON-encoded into a string field). Functionally equivalent — the encoded string parses to the same dict — but byte-different from the previous insertion-order encoding. Downstream consumers that snapshot wire bodies (golden-file tests, audit logging, recorded fixtures) will see byte-different `tool_calls[].function.arguments` strings across this upgrade for any call whose argument dict was emitted in non-sorted insertion order before.
1820
- **OTel and Langfuse observers drive the `openarmature.llm.complete` span / Generation observation lifecycle from the typed `LlmCompletionEvent`** (proposal 0049 + 0057, observability §5.5.7). Successful LLM-provider calls now open + close the OTel span and the Langfuse Generation in one shot at typed-event arrival, with `start_time` back-dated by `LlmCompletionEvent.latency_ms` so duration reflects the adapter-boundary measurement rather than dispatcher queue delay. The §5.5 attribute set and §8.4 Generation metadata are unchanged. (Failure paths land on `LlmFailedEvent` later in the same cycle — see the proposal 0058 entry above.)
1921
- **`OpenAIProvider.complete()` no longer emits the sentinel `NodeEvent` pair on the success path** (v0.13.0 cleanup). The bundled OTel and Langfuse observers now consume the typed `LlmCompletionEvent` directly; the sentinel pair was kept on the success path through earlier releases for compatibility with pre-typed-event observers. External custom observers that filtered LLM calls by `event.namespace == LLM_NAMESPACE` MUST migrate to `isinstance(event, LlmCompletionEvent)` to continue seeing successful LLM calls. (The failure-path sentinel emission is retired entirely later in the same cycle — see the proposal 0058 entry above.)
2022
- **`LangfuseClient` Protocol gains optional `start_time` / `end_time` timestamps** on `generation(...)` and the Generation/Span handles' `end(...)`. The Langfuse observer passes back-dated timestamps on the typed-event success path so the Langfuse UI shows the actual adapter-boundary duration. The SDK adapter handles v4 Langfuse SDK quirks transparently: `Langfuse.start_observation()` does NOT accept `start_time`, so back-dated generations are routed through the private `_otel_tracer.start_span(name=..., start_time=int_ns)` API (mirroring the SDK's own `create_event` precedent) and the resulting OTel span is wrapped in `LangfuseGeneration` directly; the non-back-dated path still uses `start_observation`. `LangfuseSpan.end()` is typed `Optional[int]` (nanoseconds), so the adapter converts the Protocol's `datetime` surface to int nanoseconds before forwarding. The `InMemoryLangfuseClient` stores both fields verbatim on `LangfuseObservation` for test assertions.
2123
- **`OpenAIProvider(populate_caller_metadata=...)` default flipped from `False` to `True`.** The python implementation now populates `LlmCompletionEvent.caller_invocation_metadata` by default so the bundled OTel and Langfuse observers can emit the §5.6 `openarmature.user.<key>` span-attribute family without a separate opt-in. Pass `populate_caller_metadata=False` to suppress the snapshot when no downstream consumer needs it. The spec-defined opt-in mechanism is unchanged; only the python default flips.
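The back-dating described in the `LangfuseClient` and observer-lifecycle entries above can be illustrated with a small standalone sketch. It assumes only what those entries state: the span start is the arrival time minus `latency_ms`, and a `datetime` has to be converted to integer nanoseconds before it reaches an API that takes `int` timestamps. The helper names `backdated_start` and `to_epoch_ns` are hypothetical and are not part of the library.

```python
from datetime import datetime, timedelta, timezone


def backdated_start(end_time: datetime, latency_ms: float) -> datetime:
    """Derive a start time by subtracting the measured latency from the end time."""
    return end_time - timedelta(milliseconds=latency_ms)


def to_epoch_ns(moment: datetime) -> int:
    """Convert a timezone-aware datetime to integer nanoseconds since the Unix epoch.

    Integer arithmetic avoids the rounding that float seconds would introduce.
    A naive datetime raises TypeError on the subtraction below.
    """
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    delta = moment - epoch
    whole_seconds = delta.days * 86_400 + delta.seconds
    return whole_seconds * 1_000_000_000 + delta.microseconds * 1_000


# Hypothetical values: an event arriving at `end` that reports 250 ms of latency.
end = datetime(2026, 1, 1, 0, 0, 1, tzinfo=timezone.utc)
start = backdated_start(end, 250.0)
duration_ns = to_epoch_ns(end) - to_epoch_ns(start)
```

With these inputs the computed duration is 250 ms expressed in nanoseconds, which is the adapter-boundary duration rather than the time the event spent queued.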
2224

23-
### Added
24-
25-
- **`LlmCompletionEvent` extended with proposal 0057 request-side fields** (spec v0.51.0). The typed event now carries `input_messages`, `output_content`, `request_params`, `request_extras`, `active_prompt`, `active_prompt_group`, `call_id`, and `response_model` alongside the existing v0.49.0 fields. `request_id` renamed to `response_id` per the proposal's response-side naming. Inline image bytes in `input_messages` stay redacted per observability §5.5.5 — the OpenAI provider reuses the existing message-serialization helper for the projection. Observer-side privacy gates (OTel `disable_llm_payload`, Langfuse equivalents) apply at rendering, symmetric with the §5.5.1 span attribute path.
26-
2725
## [0.12.0] — 2026-06-05
2826

2927
Observability release. The pinned spec advances from v0.38.0 to v0.46.0, absorbing eight accepted proposals (0047-0054). Three ship as fully implemented this cycle: proposal 0048 grows a read-symmetric `get_invocation_metadata()` API + a §9 *Queryable observer pattern* concept doc section; proposal 0052 puts `openarmature.implementation.name` + `.version` attribution attributes on every OTel invocation span + every Langfuse Trace; proposal 0054 ships `CompiledGraph.drain_events_for(invocation_id, *, timeout)` as the architectural pair to 0048's §9.4 accumulator lifecycle. Two ship as textual-only acks (0051 Langfuse trace I/O caveat; 0053 §3.4 shared-parent boundary clarification). One Fixed: the retry middleware now resets the invocation-metadata ContextVar between attempts per §3.4. The production-observability example grows the queryable accumulator + drain_events_for pattern end-to-end so the new APIs have a runnable demo.
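The key canonicalization described in the proposal 0047 entries above (sort object keys at every nesting level, preserve caller-supplied array order, and encode `tool_call.arguments` with `sort_keys=True`) can be sketched as follows. This is an illustration under stated assumptions, not the library's `_canonicalize_dict_keys`: the function names are hypothetical, and the sketch assumes JSON-style dicts whose keys are all strings.

```python
import json
from typing import Any


def canonicalize_dict_keys(value: Any) -> Any:
    """Recursively sort dict keys; keep list order exactly as supplied."""
    if isinstance(value, dict):
        # Rebuilding the dict in sorted-key order fixes its serialization order.
        return {key: canonicalize_dict_keys(value[key]) for key in sorted(value)}
    if isinstance(value, list):
        # Array order is meaningful to the caller, so only recurse into items.
        return [canonicalize_dict_keys(item) for item in value]
    return value


def encode_tool_arguments(arguments: dict[str, Any]) -> str:
    """Encode an argument dict into a string field with a stable key order."""
    return json.dumps(arguments, sort_keys=True)


# Two equivalent inputs that differ only in dict insertion order.
first = {"b": 1, "a": {"d": [3, 1, 2], "c": None}}
second = {"a": {"c": None, "d": [3, 1, 2]}, "b": 1}

body_first = json.dumps(canonicalize_dict_keys(first))
body_second = json.dumps(canonicalize_dict_keys(second))
```

Both bodies serialize to the same bytes, while the inner list keeps its `[3, 1, 2]` order. This is also why snapshot consumers see a byte-level change on upgrade: an argument dict previously emitted in insertion order now serializes in sorted order, even though it parses to the same dict.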

conformance.toml

Lines changed: 42 additions & 29 deletions
@@ -266,33 +266,38 @@ status = "implemented"
266266
since = "0.11.0"
267267

268268
# Spec v0.39.0 (proposal 0047). Implicit prefix-cache wire-byte
269-
# stability. Cross-capability proposal landed in v0.13.0 across
270-
# three pieces: (1) ``Response.usage`` cache-stat fields
271-
# (``cached_tokens`` / ``cache_creation_tokens``) sourced from the
272-
# OpenAI ``prompt_tokens_details`` payload, with conditional emission
269+
# stability. Cross-capability proposal landed end-to-end in the
270+
# v0.13.0 cycle across three pieces, all post-v0.12.0:
271+
# (1) ``Response.usage`` cache-stat fields (``cached_tokens`` /
272+
# ``cache_creation_tokens``) sourced from the OpenAI
273+
# ``prompt_tokens_details`` payload, with conditional emission
273274
# preserved (absent-vs-zero distinction stays observable) — landed
274-
# in the v0.12.0 cycle as the proposal's payload-side prerequisite;
275+
# in PR #136 as the proposal's payload-side prerequisite;
275276
# (2) OTel observer emits ``openarmature.llm.cache_read.input_tokens``
276277
# (and optional ``openarmature.llm.cache_creation.input_tokens``)
277-
# when the corresponding usage field is populated — also v0.12.0;
278-
# (3) §8.1 intra-impl wire-byte canonicalization in the OpenAI
279-
# adapter — landed here. The canonicalizer recursively sorts dict
280-
# keys at every nesting level while preserving caller-supplied
281-
# array order, applied at the four user-input boundaries
278+
# when the corresponding usage field is populated — landed in
279+
# PR #140; (3) §8.1 intra-impl wire-byte canonicalization in the
280+
# OpenAI adapter — landed in PR #145. The canonicalizer recursively
281+
# sorts dict keys at every nesting level while preserving caller-
282+
# supplied array order, applied at the four user-input boundaries
282283
# (``tool.parameters`` / ``tool.function`` record top-level per
283284
# spec Q5, ``response_format.json_schema.schema``, ``RuntimeConfig``
284285
# extras, ``tool_call.arguments`` JSON encoding) plus a top-level
285-
# belt-and-suspenders pass over the assembled request body. Scope
286-
# is the Chat Completions endpoint only; the OpenAI Responses API
287-
# endpoint is deferred to a future cycle (no python consumer
288-
# today). Prompt-management §13 cross-variable substring stability
289-
# is satisfied by the existing Jinja2 ``StrictUndefined`` render
290-
# path; pinned by ``tests/unit/test_prompts.py::
286+
# belt-and-suspenders pass over the assembled request body.
287+
# Downstream-observable wire-byte shift on
288+
# ``tool_call.arguments``: the encoded string now uses
289+
# ``sort_keys=True`` (functionally equivalent — parses to the same
290+
# dict — but byte-different for golden-file / audit-snapshot
291+
# consumers). Scope is the Chat Completions endpoint only; the
292+
# OpenAI Responses API endpoint is deferred to a future cycle (no
293+
# python consumer today). Prompt-management §13 cross-variable
294+
# substring stability is satisfied by the existing Jinja2
295+
# ``StrictUndefined`` render path; pinned by
296+
# ``tests/unit/test_prompts.py::
291297
# test_cross_variable_substring_stability_text_prompt`` and
292298
# ``test_cross_variable_substring_stability_chat_prompt``.
293-
# Anthropic / Gemini
294-
# wire-byte conformance fixtures stay deferred — neither provider
295-
# is implemented in python today.
299+
# Anthropic / Gemini wire-byte conformance fixtures stay deferred
300+
# — neither provider is implemented in python today.
296301
[proposals."0047"]
297302
status = "implemented"
298303
since = "0.13.0"
@@ -344,16 +349,24 @@ status = "implemented"
344349
since = "0.12.0"
345350

346351
# Spec v0.41.0 (proposal 0049). Typed LLM Completion Event — first
347-
# typed event variant on the observer event union. Shipped in
348-
# v0.13.0: provider dual-emits the typed event alongside the sentinel
349-
# NodeEvent pair (success-only per spec scope); LlmCompletionEvent
350-
# carries identity/scoping/outcome fields per the spec field table.
351-
# Conformance fixtures 050-056 activated by the typed_event_collector
352-
# harness directive. The OTel + Langfuse observers continue to drive
353-
# their §5.5 / §8.4.4 surface off the sentinel NodeEvent pair during
354-
# the dual-emit transition window; type-discrimination migration
355-
# lands once the follow-on request-side-fields extension (proposal
356-
# 0057) ships.
352+
# typed event variant on the observer event union. Shipped fully in
353+
# v0.13.0 across PRs #141 (typed-event definition + provider
354+
# emission + 0057 field-set extension), #142 (OTel observer migration
355+
# to type discrimination), #143 (Langfuse observer migration +
356+
# success-side sentinel emission dropped), and #144 (0058 typed
357+
# LlmFailedEvent + sentinel-namespace NodeEvent emission for LLM
358+
# events retired entirely from the bundled OpenAIProvider).
359+
# LlmCompletionEvent carries identity/scoping/outcome fields per
360+
# the spec field table. Both bundled observers (OTel + Langfuse)
361+
# consume the typed events via isinstance discrimination on both
362+
# outcome paths. Conformance fixtures 050-056 activated by the
363+
# typed_event_collector harness directive. Fixtures 057-068
364+
# (proposal 0057 request-side fields) and 069-073 (proposal 0058
365+
# typed failure event) stay parser-deferred pending the harness's
366+
# typed_event_collector directive schema catch-up + the event_counts
367+
# list directive introduced by fixture 071; behavior pinned by
368+
# unit tests in tests/unit/test_llm_provider.py +
369+
# test_observability_otel.py + test_observability_langfuse.py.
357370
[proposals."0049"]
358371
status = "implemented"
359372
since = "0.13.0"
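The `isinstance` discrimination on both outcome paths mentioned in the comment block above might look like the following in a custom observer. This is a self-contained sketch: the two event classes are stand-ins defined locally so the example runs without the library, the `invocation_id` field and the category strings are assumptions, and `OutcomeCounter` is a hypothetical name. Only `latency_ms`, `error_category`, and `error_message` are taken from the changelog text.

```python
import asyncio
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class LlmCompletionEvent:
    """Local stand-in for the typed success event (field set assumed)."""
    invocation_id: str
    latency_ms: float


@dataclass(frozen=True)
class LlmFailedEvent:
    """Local stand-in for the typed failure event (field set assumed)."""
    invocation_id: str
    error_category: str
    error_message: str


class OutcomeCounter:
    """Hypothetical observer that tallies LLM outcomes by event type."""

    def __init__(self) -> None:
        self.successes = 0
        self.failures: Counter = Counter()

    async def __call__(self, event: object) -> None:
        # The two variants are mutually exclusive for a given call,
        # so each call lands in exactly one branch.
        if isinstance(event, LlmCompletionEvent):
            self.successes += 1
        elif isinstance(event, LlmFailedEvent):
            self.failures[event.error_category] += 1
        # Any other event variant is ignored by this observer.


async def _demo() -> OutcomeCounter:
    observer = OutcomeCounter()
    events = [
        LlmCompletionEvent("inv-1", 120.0),
        LlmFailedEvent("inv-1", "rate_limit", "too many requests"),
        LlmFailedEvent("inv-1", "rate_limit", "too many requests"),
        object(),  # stands in for an unrelated event variant
    ]
    for event in events:
        await observer(event)
    return observer


counter = asyncio.run(_demo())
```

Branching on the event type means the observer no longer depends on a namespace string, which is the migration the changelog asks external observers to make.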

docs/agent/non-obvious-shapes.md

Lines changed: 2 additions & 2 deletions
@@ -127,7 +127,7 @@ Catching `Exception` works but is too broad; catching one hierarchy misses the o
127127

128128
### Filter `openarmature.*`-namespaced events when your observer only cares about user nodes
129129

130-
OA emits observer events under sentinel node-names for its own internal dispatch: `openarmature.llm.complete` for LLM provider calls (proposal 0024), `openarmature.checkpoint.migrate` for state-migration runs (proposal 0014), `openarmature.checkpoint.save` for checkpoint saves (proposal 0010). These events let the OTel / Langfuse observers emit LLM-provider spans, checkpoint-migrate spans, etc., but a custom observer that only cares about user-defined node activity sees them as noise:
130+
OA emits observer events under sentinel node-names for some internal dispatch: `openarmature.checkpoint.migrate` for state-migration runs (proposal 0014) and `openarmature.checkpoint.save` for checkpoint saves (proposal 0010) ride on `NodeEvent` with a sentinel namespace. (LLM provider calls used to follow the same pattern but moved to typed `LlmCompletionEvent` / `LlmFailedEvent` variants in v0.13.0 per proposals 0049 + 0058 — those are filtered by `isinstance` instead.) The sentinel-namespace events let the OTel / Langfuse observers emit checkpoint-migrate spans, etc., but a custom observer that only cares about user-defined node activity sees them as noise:
131131

132132
```python
133133
async def __call__(self, event: NodeEvent) -> None:
@@ -137,7 +137,7 @@ async def __call__(self, event: NodeEvent) -> None:
137137
# … user-node handling
138138
```
139139

140-
`event.namespace[0]` is the safest discriminator (the leaf `event.node_name` would also work for LLM events but won't match the checkpoint sentinels since those repurpose `node_name` differently). Don't try to filter on `current_invocation_id() is None`: OA-internal events are dispatched within the same invocation context as user-node events, so `invocation_id` is set for both; the namespace-prefix check is the stable contract.
140+
`event.namespace[0]` is the safest discriminator. Don't try to filter on `current_invocation_id() is None`: OA-internal events are dispatched within the same invocation context as user-node events, so `invocation_id` is set for both; the namespace-prefix check is the stable contract.
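A minimal sketch of the namespace-prefix filter, including the `isinstance(event, NodeEvent)` narrowing that an observer needs when it receives the full event union, might look like this. The classes below are local stand-ins so the example runs on its own; their field sets and the `is_user_node_event` helper are assumptions for illustration, not the library's definitions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeEvent:
    """Local stand-in: only the fields this filter reads."""
    namespace: tuple[str, ...]
    node_name: str


@dataclass(frozen=True)
class InvocationCompletedEvent:
    """Local stand-in for a variant that has no namespace attribute."""
    invocation_id: str


def is_user_node_event(event: object) -> bool:
    """Return True only for node events that come from user-defined nodes."""
    # Narrow first: other variants in the union do not carry a namespace,
    # so touching event.namespace on them would raise AttributeError.
    if not isinstance(event, NodeEvent):
        return False
    # Sentinel events use an "openarmature."-prefixed first namespace segment.
    if event.namespace and event.namespace[0].startswith("openarmature."):
        return False
    return True


user_event = NodeEvent(("fetch_docs",), "fetch_docs")
checkpoint_event = NodeEvent(("openarmature.checkpoint.save",), "save")
completed_event = InvocationCompletedEvent("inv-1")
```

Here `user_event` passes the filter, while the checkpoint sentinel and the non-node variant are both skipped without raising.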
141141

142142
### Fan-out subgraphs that emit `list[X]` per instance produce `list[list[X]]` at `target_field`
143143
