Skip to content

Commit e0c738d

Browse files
Extend production-observability example with accumulator pattern (#133)
* Extend production-observability with accumulator pattern Two pre-release polish items for the v0.12.0 cycle. CHANGELOG: the Unreleased Changed entry for the spec-pin advance originally said proposal 0052 "lands in a follow-on PR of this cycle" and 0054 "lands in a follow-on PR". Both have since landed (PRs 131 + 132). Rewrites the bullet to factually describe the final state: three proposals (0048, 0052, 0054) ship as fully implemented, two (0051, 0053) ship as textual-only. production-observability example: adds an LlmUsageAccumulator class plus a terminal persist node that demonstrates the queryable observer + drain_events_for pattern end-to-end. The accumulator subscribes to the LLM-namespace event stream, accumulates per- invocation token totals via current_invocation_id() bucket keys, and exposes convention-only get_bucket / drop methods. The persist node calls drain_events_for to synchronize on the deliver loop before reading the bucket so the rollup reflects every LLM call in the invocation, drops the bucket per the explicit-cleanup discipline, and prints a cost summary. The graph grows from respond -> END to respond -> persist -> END. Module-level singletons (_accumulator + _compiled_graph) keep the persist node closure-free and follow the existing _provider_instance precedent. Walkthrough doc updates the H1, overview, what-it-teaches list, captured-output sample, and reading-the-output walkthrough to cover the new pattern. * Surface invocation-span attribution attrs in OTel formatter The example's _format_otel_spans excluded the root openarmature.invocation span from its captured-output listing because two issues kept it from landing in the in-memory exporter and from showing usefully even when it did: 1. The OTel observer's shutdown() was never called, so the root invocation span stayed open and never moved into the exporter's finished-spans list. Adds otel_observer.shutdown() to the finally block after drain(), mirroring the pattern in the OTel unit tests. 2. The formatter's curated key set didn't include the invocation-level attributes the new span carries (openarmature.graph.entry_node, .spec_version, openarmature.implementation.name + .version). The formatter now picks the right key set based on span name: the invocation span surfaces its four invocation-level attrs only, inner-node spans surface the per-node + cross-cutting user.* + GenAI semconv attrs. Skipping cross-cutting attrs on the invocation line avoids repeating data that appears three more times below. Net visible change: the captured-OTel-spans block now opens with a [openarmature.invocation] line carrying implementation_name='openarmature-python' + implementation_version + spec_version + entry_node. Operators filtering traces by library version in Phoenix / Datadog / Honeycomb / Tempo / HyperDX read these directly from the root invocation span. Walkthrough doc's reading-the-output bullet now distinguishes the three OTel attribute families (invocation-level 5.1, cross-cutting 5.6, GenAI semconv) and explains why the invocation span only closes on observer shutdown(). * Tighten accumulator example per PR review Eight PR review threads, addressing four distinct issues. state.invocation_id -> current_invocation_id() in the example module docstring and walkthrough doc. The runnable persist() uses current_invocation_id() because State has no invocation_id field by default; the docstring snippets had drifted to the wrong shape. assert -> RuntimeError in persist(). The three runtime preconditions (_compiled_graph not None, _accumulator not None, current_invocation_id() not None) now raise explicit RuntimeError so the failure mode stays informative under python -O, which strips asserts and would otherwise produce silent None dereferences. InvocationCompletedEvent backstop cleanup in the accumulator. persist()'s drop is the fast path; if drain_events_for times out and the deliver loop later processes late-arriving LLM events, setdefault() would recreate a bucket that nothing ever cleans up. Adding InvocationCompletedEvent handling at the top of __call__ drops any leftover bucket on invocation completion. The drop is idempotent so it composes with persist()'s drop without harm. Defensive total_tokens derivation. LlmEventPayload makes all three usage fields optional; providers that emit prompt + completion but no total (anything non-OpenAI in practice) would leave bucket.total_tokens at zero while the sub-fields are correct. Now derives total from prompt + completion when total is None on the payload. build_graph() self-contained per the demo convention. Previously, persist() depended on _compiled_graph + _accumulator module globals that only main() populated, so a copy-pasting reader doing `graph = build_graph(); await graph.invoke(...)` would hit RuntimeError at persist time. build_graph() now owns the accumulator construction, the graph attach, and the global wiring. main() drops the duplicate construction and just attaches OTel + Langfuse on top. * Reconcile docstrings + comments with refactor Four stale-text findings from a second PR review pass — all caused by the previous review pass changing behavior without fully sweeping the surrounding documentation. Module docstring snippet now shows the full call shape: ``await graph.drain_events_for(current_invocation_id(), timeout=2.0)``, matching the runnable persist() pattern (the previous snippet stripped the await + timeout for brevity but under-described the API). The accumulator's drop() comment block was rewritten to describe the actual two-step lifecycle: fast-path explicit drop after read by the terminal node, plus the InvocationCompletedEvent backstop that the prior pass added. The old comment claimed "does NOT auto-drop on InvocationCompletedEvent" which directly contradicted the implementation. RuntimeError message in persist() now points readers at build_graph() instead of main() for the initialization pattern — the prior pass moved the singleton wiring into build_graph but left the error message pointing at the old call site. Walkthrough doc's "three observers attached at compile time" becomes "three observers attached before invoke", which is honest for both the build_graph-side accumulator attachment and the main-side OTel + Langfuse attachments. attach_observer happens after compile() in the OA API regardless of which function calls it.
1 parent a283e62 commit e0c738d

3 files changed

Lines changed: 339 additions & 37 deletions

File tree

CHANGELOG.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -8,7 +8,7 @@ The format follows [Keep a Changelog](https://keepachangelog.com/en/1.1.0/). The
88

99
### Changed
1010

11-
- **Pinned spec advanced from v0.38.0 to v0.46.0.** Submodule + `[tool.openarmature].spec_version` + `conformance.toml` `spec_pin` advance together. Absorbs eight new proposals (0047-0054) into the conformance manifest. Two of them ship as part of the v0.12.0 cycle as textual-only acknowledgments with no code change required: **proposal 0051** (observability §8.4.1 Langfuse `trace.input` / `trace.output` implementation-surface caveat — documents that vendor SDK round-trip is required to project caller-side trace I/O updates onto the wire; the v0.11.0 (proposal 0043) caller-hook shape already matches the documented behavior) and **proposal 0053** (observability §3.4 shared-parent boundary clarification — tightens the structural-shared-parent classification to predicate the invocation span on whether at least one fan-out or parallel-branches dispatch is on the augmenter's call-stack path; behavior already matches via fixtures 034 + 039). Two more ship as implemented in this cycle: **proposal 0048** (read-symmetric metadata + queryable observer pattern docs — see *Added* below) and **proposal 0052** (implementation attribution attributes — landing in a follow-on PR of this cycle). The remaining proposals are marked `not-yet` in the conformance manifest with roadmap targets: 0047 + 0049 (v0.13.0 LLM provider hardening batch), 0050 (v0.14.0 retry & reliability batch), 0054 (per-invocation observer event drain — bundled into this v0.12.0 cycle alongside 0048 per the §9.4 accumulator-lifecycle pairing; lands in a follow-on PR).
11+
- **Pinned spec advanced from v0.38.0 to v0.46.0.** Submodule + `[tool.openarmature].spec_version` + `conformance.toml` `spec_pin` advance together. Absorbs eight new proposals (0047-0054) into the conformance manifest. Two ship as textual-only acknowledgments with no code change required: **proposal 0051** (observability §8.4.1 Langfuse `trace.input` / `trace.output` implementation-surface caveat — documents that vendor SDK round-trip is required to project caller-side trace I/O updates onto the wire; the v0.11.0 (proposal 0043) caller-hook shape already matches the documented behavior) and **proposal 0053** (observability §3.4 shared-parent boundary clarification — tightens the structural-shared-parent classification to predicate the invocation span on whether at least one fan-out or parallel-branches dispatch is on the augmenter's call-stack path; behavior already matches via fixtures 034 + 039). Three ship as fully implemented this cycle: **proposal 0048** (read-symmetric metadata + queryable observer pattern docs — see *Added* below), **proposal 0052** (implementation attribution attributes — see *Added* below), and **proposal 0054** (per-invocation observer event drain — see *Added* below; bundled with 0048 as the §9.4 accumulator-lifecycle pair). The remaining proposals are marked `not-yet` in the conformance manifest with roadmap targets: 0047 + 0049 (v0.13.0 LLM provider hardening batch) and 0050 (v0.14.0 retry & reliability batch).
1212
- **README and docs homepage refreshed around reasons-to-choose.** Replaced the 10-bullet "Why OpenArmature" feature inventory in `README.md` with 5 differentiating reasons (LLM-infused workflows to agents on one engine; crash-safe resume by contract; destination-pluggable observability with OTel + Langfuse, no SaaS lock-in; compile-time topology checks; spec + conformance). The docs homepage (`docs/index.md`) card grid carries the same five plus a sixth card retained from the previous grid for async-first / LLM-agnostic: workflows-to-agents, crash-safe, pluggable observability, bad-graphs-don't-compile, parallelism (fan-out + parallel-branches + nested correctness), async-first.
1313
- **Docs sweep: stale references and em-dash normalization.** Fixed three definite stale references (`spec_version='0.26.0'` in the Langfuse example output now reads `'0.38.0'`; the dangling `v0.16.1` qualifier dropped from the parallel-branches concept page; `compiled.attach_observer` corrected to `graph.attach_observer` in `non-obvious-shapes.md` for variable-name consistency with the rest of the docs). Swept em dashes out of the user-facing docs (130 instances across 17 files) per the convention set during the patterns expansion. mkdocs strict build clean; no broken intra-docs links.
1414
- **The checkpointing-and-migration example grows a crash-and-resume drama.** The first invoke of the v1 graph now hits a simulated transient failure inside `size_crew` (raises a `RuntimeError` on its first attempt only). The example catches `NodeException` at the `invoke()` boundary, prints what's saved on disk (`define_objective`'s position is already in `completed_positions`), then re-invokes with `resume_invocation=<id>`. The retried `size_crew` succeeds, `draft_timeline` runs, and the pipeline finishes - dramatizing the synchronous-checkpoint-by-contract reliability claim from the README pitch. The existing v1->v2 migration phase rides on top of the crash-survived checkpoint, so both reliability stories compose in one demo. Walk-through doc rewritten to cover both phases.

docs/examples/production-observability.md

Lines changed: 56 additions & 11 deletions
Original file line numberDiff line numberDiff line change
@@ -1,4 +1,4 @@
1-
# Production observability with dual observers and timing middleware
1+
# Production observability with dual observers, timing middleware, and per-invocation cost rollup
22

33
!!! info "Source"
44
[https://github.com/LunarCommand/openarmature-python/blob/main/examples/production-observability/main.py](https://github.com/LunarCommand/openarmature-python/blob/main/examples/production-observability/main.py){target="_blank" rel="noopener"}
@@ -7,14 +7,19 @@ A single-turn lunar-mission Q&A endpoint instrumented the way you'd
77
ship it: BOTH OTel and Langfuse observers attached to the same
88
graph, caller hooks deriving domain-shaped `trace.input` /
99
`trace.output` from State, the built-in `TimingMiddleware`
10-
recording per-node duration, and multi-tenant caller-supplied
11-
metadata propagating to both observers in one `invoke()` call.
10+
recording per-node duration, multi-tenant caller-supplied
11+
metadata propagating to both observers in one `invoke()` call, AND
12+
a third queryable-accumulator observer that a terminal `persist`
13+
node reads at request scope after synchronizing on the deliver
14+
loop with `drain_events_for`.
1215

1316
## Overview
1417

15-
One node, one LLM call, two production-grade observability
16-
backends. The pipeline takes a question, calls the LLM, returns the
17-
answer. The interesting part is the observability wiring:
18+
Two nodes (`respond` then `persist`), one LLM call, three observers
19+
attached before invoke. The pipeline takes a question, calls the
20+
LLM, returns the answer, then synchronizes on the observer queue
21+
and rolls up token cost. The interesting part is the observability
22+
wiring:
1823

1924
- `OTelObserver` attached with an `InMemorySpanExporter`
2025
(production swaps this for `BatchSpanProcessor` +
@@ -77,6 +82,23 @@ sees the same logical events represented two ways.
7782
`InMemorySpanExporter` records every Span. Production
7883
deployments swap each for a real exporter / SDK adapter; the
7984
observer call surface doesn't change.
85+
- **Queryable accumulator + `drain_events_for`**
86+
([queryable observer pattern](../concepts/observability.md)).
87+
A third observer — `LlmUsageAccumulator` — subscribes to the
88+
same event stream but only records the LLM-namespace events
89+
carrying an `LlmEventPayload`. It accumulates per-invocation
90+
token totals in memory, indexed by `current_invocation_id()`.
91+
The terminal `persist` node calls
92+
`await graph.drain_events_for(current_invocation_id(), timeout=2.0)`
93+
to synchronize on the deliver loop, then reads the accumulator's
94+
bucket and drops it. Without the drain, the bucket might be
95+
missing the most-recent LLM event's tokens (the deliver loop
96+
hasn't reached them yet). The `Observer` protocol itself stays
97+
a single-callable shape; the accumulator just exposes its own
98+
read methods (`get_bucket` / `drop`) that the persist node knows
99+
about. This is the canonical shape for per-invocation cost
100+
attribution at request scope, replacing the round-trip-through-
101+
State workarounds that pre-v0.12.0 deployments used.
80102

81103
## How to run
82104

@@ -105,13 +127,15 @@ request id: <uuid>
105127
feature flag:v2-canary
106128
107129
[timing] respond: 1234.5ms (success)
130+
[persist] LLM usage: prompt=42, completion=38, total=80 across 1 call(s)
108131
answer: The primary objective of Apollo 11 was ...
109132
model: gpt-4o-mini-2024-07-18
110133
111134
--- captured OTel spans ---
112-
[openarmature.invocation] 1240.0ms openarmature.user.tenantId='demo-acme', ...
135+
[openarmature.invocation] 1240.0ms openarmature.graph.entry_node='respond', openarmature.graph.spec_version='0.46.0', openarmature.implementation.name='openarmature-python', openarmature.implementation.version='0.12.0'
113136
[respond] 1235.0ms openarmature.node.name='respond', openarmature.user.tenantId='demo-acme', ...
114-
[openarmature.llm.complete] 1200.0ms gen_ai.system='openai', gen_ai.usage.input_tokens=42, ...
137+
[openarmature.llm.complete] 1200.0ms openarmature.user.tenantId='demo-acme', gen_ai.system='openai', gen_ai.usage.input_tokens=42, ...
138+
[persist] 2.0ms openarmature.node.name='persist', openarmature.user.tenantId='demo-acme', ...
115139
116140
--- captured Langfuse trace ---
117141
Trace id=<uuid>
@@ -133,12 +157,33 @@ Trace id=<uuid>
133157
`TimingMiddleware` callback as soon as the respond chain returns.
134158
`outcome` is `"success"` here; a `ProviderRateLimit` would surface
135159
as `outcome="exception"` with `exception_category="provider_rate_limit"`.
160+
- **`[persist] LLM usage: ...`**: emitted by the `persist` node
161+
after it drains the deliver loop and reads the
162+
`LlmUsageAccumulator`'s bucket for this invocation. If the drain
163+
times out (slow / hung observer), the persist line is prefixed by
164+
a `[persist] drain incomplete: N events still pending after 2.0s`
165+
surface — the production version of that log would also flip an
166+
SLO-breach metric.
136167
- **OTel spans block**: one line per captured span, sorted by
137168
start time. The relevant attributes shown are a curated subset
138169
for readability; the full attribute set is on each `Span` object
139-
for any reader inspecting them programmatically. Note the
140-
`openarmature.user.*` attributes appearing on every span (the
141-
cross-cutting attribute propagation from `invoke(metadata=...)`).
170+
for any reader inspecting them programmatically. Note three
171+
attribute families worth telling apart:
172+
- The root `openarmature.invocation` span carries
173+
`openarmature.graph.spec_version` plus the
174+
`openarmature.implementation.name` / `.version` attribution
175+
attributes. These are invocation-span-only (per spec §5.1) —
176+
operators filtering by library version use these.
177+
- The `openarmature.user.*` attributes appear on every span,
178+
reflecting the cross-cutting propagation from
179+
`invoke(metadata=...)`.
180+
- `gen_ai.usage.*` lands on the LLM span only, sourced from the
181+
provider's wire response.
182+
183+
The invocation span only lands in the exporter after the OTel
184+
observer's `shutdown()` is called (closing the root span). The
185+
demo calls it after `drain()` in the `finally` block; production
186+
long-running processes call it at process exit.
142187
- **Langfuse trace block**: the same invocation as seen by the
143188
Langfuse data model. `trace.input` / `trace.output` come from the
144189
caller hooks (`{"question": ...}` / `{"answer": ..., "model": ...}`)

0 commit comments

Comments
 (0)