Prepare v0.13.0 release: reconcile changelog, docs, and examples #146
Merged
Three blocking + three should-fix items spec flagged on the pre-tag review. All narrative; no code behavior change.

- 0047 CHANGELOG entry mis-attributed pieces 1+2 (Response.usage cache fields + OTel cache attributes) to v0.12.0. Verified via git: those landed in PRs #136 + #140 post-v0.12.0-tag, so all three pieces of 0047 ship in v0.13.0. Reframed.
- conformance.toml [proposals."0047"] leading-comment block had the same v0.12.0 mis-attribution. Same correction; added PR references for traceability.
- Unreleased section had two ### Added headings with the 0057 entry orphaned below ### Changed. Consolidated.
- Spec pin advance text undercounted the cycle journey (said v0.51.0 → v0.53.0; actual is v0.46.0 → v0.53.0 across three hops). Reframed and listed absorbed proposals inline.
- tool_call.arguments JSON encoding now uses sort_keys=True (functionally equivalent but byte-different for downstream snapshot consumers). Surfaced as its own ### Changed entry instead of buried in the 0047 ### Added.
- conformance.toml [proposals."0049"] leading-comment block grew the fixture-deferral surface (057-068 + 069-073 parser-deferred pending harness directive schema catch-up; behavior pinned by unit tests) per spec OQ2.
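A minimal stdlib illustration of why the sort_keys=True switch is "functionally equivalent but byte-different". The argument dict is made up, and the library's real encoding call may pass other options (separators, ensure_ascii) that this sketch does not model; it only contrasts default json.dumps ordering with sorted keys.

```python
import json

# Hypothetical tool-call arguments; the model happened to emit "units" first.
args = {"units": "metric", "city": "Paris"}

# Without sort_keys: json.dumps preserves dict insertion order.
before = json.dumps(args)
# With sort_keys=True: key order is canonicalized alphabetically.
after = json.dumps(args, sort_keys=True)

print(before)  # {"units": "metric", "city": "Paris"}
print(after)   # {"city": "Paris", "units": "metric"}

# Byte-different on the wire, so request-body snapshots change...
print(before == after)                          # False
# ...but identical once parsed, so tool execution is unaffected.
print(json.loads(before) == json.loads(after))  # True
```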
Three docs still pushed the legacy sentinel-namespace pattern as the primary path for custom observers consuming LLM events and custom providers emitting them. As of v0.13.0 the bundled provider emits typed LlmCompletionEvent / LlmFailedEvent variants directly; the bundled OTel + Langfuse observers consume them via isinstance discrimination.

Rewrites:

- docs/concepts/observability.md: "Publishing LLM events for custom observers" → "Consuming LLM events in custom observers". Typed-event consumption shown as primary (isinstance branch on LlmCompletionEvent + LlmFailedEvent with the mutual-exclusion + field-set notes). Sentinel pattern demoted to a "Legacy sentinel-namespace pattern (compatibility surface)" subsection for downstream code interoperating with custom providers that haven't migrated.
- docs/model-providers/authoring.md: custom-provider emission sketch rewritten: dispatch LlmCompletionEvent on success, LlmFailedEvent alongside the §7 exception on failure. Shows the current-attempt-index / current-fan-out-index / etc. scoping fields the typed events carry. Calls out the mutual-exclusion + exception-flow-preservation contracts. Legacy sentinel pattern retained as a compatibility-surface callout for older providers.
- docs/agent/non-obvious-shapes.md: "filter openarmature.*-namespaced events" tip drops the openarmature.llm.complete example (v0.13.0 retired the sentinel pattern for LLM events); checkpoint sentinels stay since the tip still applies to those. Adjusted the follow-on paragraph mentioning LLM events.

mkdocs strict build clean.
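A minimal sketch of the typed-event-first consumption path described above. The dataclasses are hypothetical stand-ins, not the openarmature classes (the real events carry more fields, the field names here are assumptions, and the real observer hook may be shaped differently, e.g. async); the point is only the isinstance branch replacing namespace-string parsing.

```python
from dataclasses import dataclass
from typing import Union


# Hypothetical stand-ins for the typed events (fields are illustrative).
@dataclass(frozen=True)
class LlmCompletionEvent:
    invocation_id: str
    model: str
    output_tokens: int


@dataclass(frozen=True)
class LlmFailedEvent:
    invocation_id: str
    model: str
    error_category: str


@dataclass(frozen=True)
class InvocationCompletedEvent:
    invocation_id: str


ObserverEvent = Union[LlmCompletionEvent, LlmFailedEvent, InvocationCompletedEvent]


class LlmEventLogger:
    """Custom observer that reacts only to the typed LLM events."""

    def __init__(self) -> None:
        self.lines: list[str] = []

    def __call__(self, event: ObserverEvent) -> None:
        # A given LLM call yields one of these two, never both.
        if isinstance(event, LlmCompletionEvent):
            self.lines.append(f"ok {event.model} out={event.output_tokens}")
        elif isinstance(event, LlmFailedEvent):
            self.lines.append(f"fail {event.model} {event.error_category}")
        # Every other variant falls through untouched.


logger = LlmEventLogger()
for event in (
    LlmCompletionEvent(invocation_id="inv-1", model="demo-model", output_tokens=12),
    LlmFailedEvent(invocation_id="inv-2", model="demo-model", error_category="rate_limit"),
    InvocationCompletedEvent(invocation_id="inv-1"),
):
    logger(event)

print(logger.lines)  # ['ok demo-model out=12', 'fail demo-model rate_limit']
```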
The non-obvious-shapes doc migration changed a generator source without regenerating the committed AGENTS.md. Bring it back in sync so the drift guard passes.
Add an LlmFailureTracker observer that consumes the typed LlmFailedEvent and rolls up per-invocation error-category counts, and extend LlmUsageAccumulator to track cached_tokens and report a cache-hit ratio. The persist node now reports both rollups and the OTel formatter surfaces the cache-read attribute. Also drop spec/proposal references and em dashes from the example's comments and walk-through, which carry no meaning for end users reading the code.
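One way the cache-hit rollup could work, sketched with hypothetical names (Usage, UsageAccumulator) rather than the example's real LlmUsageAccumulator. The ratio definition (cached input tokens over total input tokens) and treating cached_tokens=None as zero are both assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Usage:
    """Hypothetical stand-in for a response's usage record."""
    input_tokens: int
    output_tokens: int
    cached_tokens: Optional[int] = None  # None: provider reported no cache stat


class UsageAccumulator:
    """Rolls up token usage across calls and derives a cache-hit ratio."""

    def __init__(self) -> None:
        self.input_tokens = 0
        self.output_tokens = 0
        self.cached_tokens = 0

    def add(self, usage: Usage) -> None:
        self.input_tokens += usage.input_tokens
        self.output_tokens += usage.output_tokens
        # Tolerate cached_tokens=None by counting it as zero cached tokens.
        self.cached_tokens += usage.cached_tokens or 0

    def cache_hit_ratio(self) -> Optional[float]:
        # Assumed definition: fraction of input tokens served from cache.
        if self.input_tokens == 0:
            return None  # no input yet, so no meaningful ratio
        return self.cached_tokens / self.input_tokens


acc = UsageAccumulator()
acc.add(Usage(input_tokens=1000, output_tokens=50, cached_tokens=800))
acc.add(Usage(input_tokens=1000, output_tokens=40))  # no cache stat reported
print(acc.cached_tokens, acc.input_tokens)      # 800 2000
print(f"cache hit {acc.cache_hit_ratio():.0%}")  # cache hit 40%
```

Counting a missing stat as zero keeps every call in the denominator, which understates the ratio if some calls come from providers that never report cache usage; excluding those calls from the denominator is the other reasonable choice.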
Example comments and docstrings quoted proposal numbers and spec section refs that have no meaning to end users reading the code. Reword them to describe only the implementation behavior.
Pull request overview
Prepares the v0.13.0 release narrative by reconciling changelog + conformance notes and updating docs/examples to the typed LLM event model (LlmCompletionEvent / LlmFailedEvent) and the new cache-stat surface.
Changes:
- Reconciles release notes and conformance commentary for spec pinning, proposal attribution, and the `tool_call.arguments` wire-byte change.
- Updates observability docs and provider-authoring guidance to be typed-event-first, retaining the sentinel-namespace pattern as compatibility-only.
- Extends the production observability example to track cache-hit ratio and per-invocation LLM failure categories.
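The provider-authoring contract summarized above (a completion event on success, a failure event alongside the exception on failure, never both) can be sketched as below. Everything here is hypothetical: the event fields, the `send` and `dispatch` callables, and ProviderError stand in for whatever the real provider interface uses, and the scoping fields the real events carry are omitted.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass(frozen=True)
class LlmCompletionEvent:
    model: str
    text: str


@dataclass(frozen=True)
class LlmFailedEvent:
    model: str
    error_type: str


Event = Union[LlmCompletionEvent, LlmFailedEvent]


class ProviderError(Exception):
    """Hypothetical stand-in for the exception a provider raises on failure."""


def complete(
    model: str,
    prompt: str,
    send: Callable[[str], str],
    dispatch: Callable[[Event], None],
) -> str:
    try:
        text = send(prompt)
    except Exception as exc:
        # Failure path: emit LlmFailedEvent alongside the exception,
        # then re-raise so the caller's exception flow is unchanged.
        dispatch(LlmFailedEvent(model=model, error_type=type(exc).__name__))
        raise
    # Success path: exactly one LlmCompletionEvent. Dispatching outside the
    # try block means a failing observer is not misreported as an LLM failure.
    dispatch(LlmCompletionEvent(model=model, text=text))
    return text


events: list[Event] = []
result = complete("demo-model", "hi", lambda p: p.upper(), events.append)


def failing_send(prompt: str) -> str:
    raise ProviderError("upstream 503")


try:
    complete("demo-model", "hi", failing_send, events.append)
except ProviderError:
    failure_reraised = True
else:
    failure_reraised = False

print(result, failure_reraised)            # HI True
print([type(e).__name__ for e in events])  # ['LlmCompletionEvent', 'LlmFailedEvent']
```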
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| src/openarmature/AGENTS.md | Regenerated agent docs to reflect typed-event-first LLM observability guidance and updated sentinel filtering notes. |
| examples/tool-use/main.py | Removes spec/proposal references and clarifies tool_call_id round-trip requirement. |
| examples/production-observability/main.py | Adds cache-hit rollup via usage.cached_tokens and a LlmFailedEvent-driven per-invocation failure-category tracker. |
| examples/chat-with-multimodal/main.py | Removes spec/proposal references from the example header commentary. |
| docs/model-providers/authoring.md | Updates provider authoring guidance to emit typed LLM events on success/failure and documents mutual-exclusion/exception-flow contracts. |
| docs/examples/production-observability.md | Updates walkthrough to match the expanded example output (cache-hit ratio + failure attribution) and new versions/spec pin. |
| docs/concepts/observability.md | Migrates custom-observer guidance to typed LLM events and demotes sentinel namespace to a legacy compatibility surface. |
| docs/agent/non-obvious-shapes.md | Updates observer-event filtering guidance to reflect typed LLM events rather than LLM sentinel NodeEvents. |
| conformance.toml | Updates conformance commentary for proposal attribution and documents the downstream-observable tool_call.arguments encoding byte change. |
| CHANGELOG.md | Rewrites v0.13.0 “Unreleased” entries for correctness (attribution, spec pin journey, and tool_call.arguments change) and consolidates headings. |
The examples smoke test only proves the demo loads and its build_graph() compiles. Cover the two queryable observers the production-observability example ships: cache-token accumulation and the derived cache-hit ratio, failure-category counting, mutual exclusion between the success and failure events, the per-invocation bucket cleanup, and the OTel cache-read attribute. The persist-output check drives the real persist node offline.
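The failure-category counting and per-invocation bucket cleanup that these tests cover can be sketched like this. FailureTracker and `drain` are hypothetical names loosely modeled on the behavior the test plan describes (an unknown invocation id drains to an empty summary); the real LlmFailureTracker's method names, return type, and ordering rule are not shown in this PR, so the count-then-name ordering is an assumption.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class LlmFailedEvent:
    """Hypothetical stand-in; only the fields this sketch needs."""
    invocation_id: str
    error_category: str


class FailureTracker:
    """Counts LLM failure categories per invocation."""

    def __init__(self) -> None:
        self._buckets: defaultdict[str, Counter[str]] = defaultdict(Counter)

    def on_event(self, event: object) -> None:
        # Only failure events are counted; everything else is ignored.
        if isinstance(event, LlmFailedEvent):
            self._buckets[event.invocation_id][event.error_category] += 1

    def drain(self, invocation_id: str) -> list[tuple[str, int]]:
        # pop() removes the bucket so finished invocations don't accumulate.
        # An unknown id yields an empty Counter, i.e. an empty summary.
        bucket = self._buckets.pop(invocation_id, Counter())
        # Most frequent category first; ties broken by name for stable output.
        return sorted(bucket.items(), key=lambda kv: (-kv[1], kv[0]))


tracker = FailureTracker()
for event in (
    LlmFailedEvent("inv-1", "rate_limit"),
    LlmFailedEvent("inv-1", "timeout"),
    LlmFailedEvent("inv-1", "rate_limit"),
    LlmFailedEvent("inv-2", "auth"),
    "not an LLM event",  # ignored
):
    tracker.on_event(event)

summary = tracker.drain("inv-1")
print(summary)                   # [('rate_limit', 2), ('timeout', 1)]
print(tracker.drain("inv-1"))    # [] (bucket already cleaned up)
print(tracker.drain("unknown"))  # []
```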
The legacy sentinel-namespace observer example accessed event.namespace / event.pre_state without narrowing to NodeEvent. A real observer receives the full ObserverEvent union, where variants like InvocationCompletedEvent have no namespace, so the snippet would raise AttributeError. Add an isinstance(event, NodeEvent) guard so the copy-paste example is correct.
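A minimal sketch of the guard described above, using hypothetical stand-ins for NodeEvent and InvocationCompletedEvent (the real classes have more fields) and a made-up sentinel namespace. Narrowing to NodeEvent before touching `namespace` or `pre_state` is what keeps the observer from raising AttributeError on other union variants.

```python
from dataclasses import dataclass, field
from typing import Any, Union


@dataclass
class NodeEvent:
    """Hypothetical stand-in: node events carry a namespace and pre_state."""
    namespace: str
    pre_state: dict[str, Any] = field(default_factory=dict)


@dataclass
class InvocationCompletedEvent:
    """Hypothetical stand-in: note there is no namespace attribute."""
    invocation_id: str


ObserverEvent = Union[NodeEvent, InvocationCompletedEvent]


class SentinelObserver:
    def __init__(self, prefix: str = "openarmature.") -> None:
        self.prefix = prefix
        self.seen: list[tuple[str, dict[str, Any]]] = []

    def __call__(self, event: ObserverEvent) -> None:
        # Narrow first: reading event.namespace on an InvocationCompletedEvent
        # would raise AttributeError.
        if not isinstance(event, NodeEvent):
            return
        if event.namespace.startswith(self.prefix):
            self.seen.append((event.namespace, event.pre_state))


observer = SentinelObserver()
for event in (
    NodeEvent("openarmature.checkpoint.example", pre_state={"step": 1}),
    InvocationCompletedEvent("inv-1"),
    NodeEvent("my_graph.fetch"),
):
    observer(event)

print(observer.seen)  # [('openarmature.checkpoint.example', {'step': 1})]
```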
Summary
Documentation and changelog prep for the v0.13.0 release. The typed-event migration code (proposals 0047, 0049, 0057, 0058) already landed in PRs 141 through 145; this PR reconciles the release narrative and brings the docs and examples that lagged behind into sync. No changes to the library's runtime code: this is changelog, conformance manifest, docs, examples, and the generated `AGENTS.md`.

1. Changelog and conformance corrections

Spec flagged three accuracy gaps in a pre-release review of the `[Unreleased]` section. All three are fixed:

- `Response.usage` cache fields (PR 136) and the OTel cache attributes (PR 140) were originally credited to v0.12.0. Both actually landed this cycle, so the 0047 entry now records all three pieces (PRs 136, 140, 145) under v0.13.0.
- `tool_call.arguments` wire change. The switch to `sort_keys=True` now has its own `### Changed` entry, and `conformance.toml` notes the downstream-observable wire-byte shift for consumers that snapshot request bodies.

The two `### Added` headings in the `[Unreleased]` section are consolidated into one.

2. Docs migration to typed-event-first

Three docs still led with the legacy sentinel-namespace pattern for LLM events. They now lead with the typed `LlmCompletionEvent` / `LlmFailedEvent` variants and demote the sentinel pattern to a compatibility note:

- `docs/concepts/observability.md`: custom-observer consumption shows `isinstance` discrimination as the primary path, and calls out the success/failure mutual-exclusion contract.
- `docs/model-providers/authoring.md`: the custom-provider emission sketch dispatches the typed events on each outcome path.
- `docs/agent/non-obvious-shapes.md`: the LLM sentinel is dropped from the sentinel-events list (checkpoint sentinels stay).

`AGENTS.md` is regenerated to match its sources.

3. Examples

`production-observability` is extended to exercise the v0.13.0 surface end to end: a new `LlmFailureTracker` observer consumes `LlmFailedEvent` for per-invocation error-category rollups, `LlmUsageAccumulator` gains a cache-hit ratio from the new `cached_tokens` field, and the OTel formatter surfaces the cache-read attribute. The walkthrough doc gains matching "reading the output" commentary.

Test plan

- `uv run pytest`: 1244 passed, 355 skipped.
- `uv run ruff check` + `ruff format --check`: clean.
- `uv run pyright`: clean.
- `uv run mkdocs build`: clean.
- The `production-observability` example was run live against a real provider: the new usage line (with the cache segment), the failures line, and the `openarmature.llm.cache_read.input_tokens` span attribute all render correctly.
- `tests/test_production_observability_accumulators.py` (9 tests) locks the example's queryable-observer logic deterministically, covering the paths a happy-path live run cannot reach: the cache-hit ratio math with non-zero cached tokens, `cached=None` tolerance, failure-category counting and ordering, mutual exclusion between the success and failure events, per-invocation bucket cleanup, the real `persist`-node output, and the OTel cache-read attribute. The `persist` check drives the real node offline (an unknown invocation id makes `drain_events_for` return an empty summary, so no live provider call is needed).
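The cache-read span attribute checked in the test plan could be derived from a usage record roughly as follows. The function name is hypothetical and so is the choice to omit the attribute when `cached_tokens` is None; only the attribute key comes from this PR. With the OpenTelemetry Python API, a dict like this could be handed to `span.set_attributes(...)`.

```python
from typing import Optional

# Attribute key as named in this PR's test plan.
CACHE_READ_ATTR = "openarmature.llm.cache_read.input_tokens"


def cache_span_attributes(cached_tokens: Optional[int]) -> dict[str, int]:
    """Build the cache-read span attribute from a usage record's cached_tokens."""
    if cached_tokens is None:
        # Assumption: omit the attribute when the provider reported no cache
        # stat, so "unknown" is not recorded as "zero cache hits".
        return {}
    return {CACHE_READ_ATTR: cached_tokens}


print(cache_span_attributes(800))   # {'openarmature.llm.cache_read.input_tokens': 800}
print(cache_span_attributes(0))     # {'openarmature.llm.cache_read.input_tokens': 0}
print(cache_span_attributes(None))  # {}
```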