Commit c5761bb
observability: phase 6.1 PR-C.3 — prepare_sync + fixture 010 (#27)
* prepare-sync: add active-span ContextVar + protocol docstring
Step 1 of PR-C.3. Adds the engine-readable
``current_active_observer_span`` ContextVar in
``observability/correlation.py``, with the docstring the spec
recommended on inverted directionality (``observer→engine`` flow vs.
PR-A's ``engine→observer`` set). Typed as ``object | None`` so the base
package stays free of an OpenTelemetry import — the OTel observer
writes ``Span`` instances; the engine treats the value opaquely
and delegates the actual attach to a try-imported OTel helper.
Also extends the ``Observer`` Protocol docstring to document the
optional ``prepare_sync(event) -> None`` extension method:
opt-in via ``hasattr``, no subclass or runtime_checkable Protocol
required, engine calls only for ``"started"``-phase events with
the same isolation contract as the async path. Engine wiring +
OTel observer refactor land in subsequent steps.
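A minimal sketch of the step-1 shape, assuming nothing beyond the standard library. Only ``current_active_observer_span`` and ``prepare_sync`` are names from this commit; ``PlainObserver``, ``PreparingObserver`` and ``supports_prepare_sync`` are hypothetical and exist only to illustrate the ``hasattr``-style opt-in.

```python
from contextvars import ContextVar

# String annotation keeps the ``object | None`` typing without requiring
# any OpenTelemetry import (or a specific Python version) at runtime.
current_active_observer_span: "ContextVar[object | None]" = ContextVar(
    "current_active_observer_span", default=None
)


class PlainObserver:
    """Hypothetical async-only observer; it does not opt in to the sync hook."""

    async def __call__(self, event: object) -> None:
        pass


class PreparingObserver(PlainObserver):
    """Hypothetical observer that opts in simply by defining ``prepare_sync``."""

    def prepare_sync(self, event: object) -> None:
        # Publish an opaque value; the engine never inspects its type.
        current_active_observer_span.set(("span-for", event))


def supports_prepare_sync(observer: object) -> bool:
    # Duck-typed check: no subclass or runtime_checkable Protocol involved.
    return callable(getattr(observer, "prepare_sync", None))
```

The engine side only ever reads the ContextVar and treats the value as opaque, which is what keeps the base package import-free.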
* prepare-sync: engine wiring + phase-gated forward in _dispatch
Step 2 of PR-C.3. ``_dispatch`` now calls each subscribed
observer's optional ``prepare_sync(event)`` synchronously BEFORE
queueing for ``"started"``-phase events, with the same isolation
contract as the async path: ``warnings.warn`` on exception,
doesn't block queueing or subsequent events.
Phase-gated: forwarding to ``prepare_sync`` only fires when
``"started"`` is in the subscribed observer's ``phases`` set —
mirrors how ``deliver_loop`` filters async dispatch. A user who
explicitly subscribes only to ``{"completed"}`` gets neither the
sync prep nor the async started events, so the wrapper acts as a
uniform phase shield across both axes.
Hook is opt-in via ``hasattr`` — observers without
``prepare_sync`` are unaffected. OTel observer's ``prepare_sync``
method lands in step 3.
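The step-2 control flow can be sketched roughly as below. This is not the real ``_dispatch``; ``Event``, ``Subscription``, ``dispatch`` and the list standing in for the delivery queue are assumptions made for illustration. It shows the three properties described above: the phase gate, the ``hasattr``-style opt-in, and isolation via ``warnings.warn``.

```python
import warnings
from dataclasses import dataclass


@dataclass
class Event:
    phase: str
    node: str


@dataclass
class Subscription:
    observer: object
    phases: frozenset = frozenset({"started", "completed"})


def dispatch(event, subscriptions, queue):
    """Run optional sync prep for started events, then queue the event."""
    if event.phase == "started":
        for sub in subscriptions:
            if "started" not in sub.phases:
                continue  # phase gate: same filter the async path applies
            prepare = getattr(sub.observer, "prepare_sync", None)
            if prepare is None:
                continue  # observer did not opt in
            try:
                prepare(event)
            except Exception as exc:
                # Isolation: one failing observer must not block queueing
                # or the observers that follow it.
                warnings.warn(f"prepare_sync raised {exc!r}", RuntimeWarning)
    queue.append(event)
```

An observer subscribed only to ``{"completed"}`` is skipped by the gate even if it defines ``prepare_sync``.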
* prepare-sync: OTelObserver sync core + ContextVar publish
Step 3 of PR-C.3.
Renames ``_handle_started`` → ``_open_started_span`` and bakes
idempotency into it: a short-circuit at the top returns early if
a span already exists for the event's ``_StackKey``. That covers
the common case where ``prepare_sync`` opened the span
synchronously in the engine task and the async ``__call__`` later
re-fires for the same event — the second call becomes a true
no-op rather than opening a duplicate span. Observer-attached-late
and test paths that bypass ``prepare_sync`` still get the span
opened via ``__call__``'s fall-through.
Adds the public ``prepare_sync(event)`` method. Routing-gated
(only ``"started"``-phase non-LLM events qualify), it calls
``_open_started_span`` and then publishes the just-opened span via
``_set_active_observer_span``. The engine's ``innermost`` reads
the ContextVar in step 4 to attach the span into the OTel context
so logs emitted from inside the node body — even on the first
line, before any ``await`` — pick up the right trace_id/span_id
via the OTel ``LoggingHandler``.
The Token returned by ``_set_active_observer_span`` is discarded
on purpose: last-writer-wins is the documented contract — the
next ``prepare_sync`` call overwrites, and the task-local context
dies with the invocation task.
* prepare-sync: engine OTel attach around node bodies
Step 4 of PR-C.3.
Adds a try-imported OTel attach helper pair in compiled.py:
``_attach_active_observer_span`` reads
``current_active_observer_span`` (set synchronously by an
observer's ``prepare_sync`` before queueing) and splices the
span into the OTel context via
``opentelemetry.context.attach(set_span_in_context(span))``;
``_detach_active_observer_span`` pairs the detach in ``finally``.
Both ``_step_function_node``'s and ``_step_fan_out_node``'s
``innermost`` closures now attach right after
``_dispatch_started`` returns and detach in a ``finally`` around
the ``await node.run(...)`` / ``await node.run_with_context(...)``
call. That puts the attach scope around exactly the user-code
window — so logs emitted on the FIRST line of a node body, before
any ``await``, pick up the right ``trace_id``/``span_id`` via
OTel's ``LoggingHandler`` — and the detach fires before
``_dispatch_completed`` queues the completed event or the merge
runs.
The except branch binds the OTel names to ``None`` so pyright
narrows on ``if _otel_attach is None: ...`` rather than flagging
"possibly unbound." Engine stays no-OTel-dep at runtime: installs
without ``[otel]`` get a no-op attach/detach, the ContextVar
stays ``None``, and nothing changes. Drives the load-bearing
log-correlation cases landing in steps 5 and 6.
* prepare-sync: drive fixture 010 (log correlation)
Step 5 of PR-C.3.
Promotes ``010-otel-log-correlation`` from ``_DEFERRED_FIXTURES``
to ``_SUPPORTED_FIXTURES`` and adds a hand-built driver covering
both YAML sub-cases. Driver is hand-built rather than going
through the conformance adapter — fixture 010's ``emits_log:``
directive isn't an adapter primitive (the adapter recognizes
``update_pure``, ``subgraph``, etc., and silently ignores
anything else), and the sub-cases are small enough that
hand-built python is clearer than threading a new directive
through the adapter.
Sub-case 1 (``log_records_carry_trace_span_correlation_ids``):
two nodes ``a`` → ``b``, both emit a log on the FIRST line of
their body (before any ``await`` — the load-bearing case
``prepare_sync`` exists to cover). Asserts all logs share a
trace_id, each log's span_id matches the active node span at
emission, and all carry the invocation's correlation_id.
Sub-case 2 (``detached_subgraph_log_uses_detached_trace_id...``):
outer invocation has a detached subgraph; logs across the boundary
land in different traces but share the correlation_id. Outer log
fires from per-node middleware on the SubgraphNode wrapper
(SubgraphNode wrappers don't get ``prepare_sync`` per spec — the
inner detached node handles attach for itself). Asserts trace_ids
differ + correlation_id flows unchanged.
Helpers ``_setup_isolated_log_bridge`` and ``_restore_log_state``
snapshot/restore root-logger handler+filter+factory state so the
process-global ``install_log_bridge`` mutations don't bleed into
neighboring tests. ``_enable_test_logger_at_info`` walks the
fixture-010 logger up to ``INFO`` so YAML's ``level: INFO``
records actually flow through Python's logger-level filter to
the bridge handler — undone on exit.
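One way to express the snapshot/restore idea is a single context manager. The sketch below is not ``_setup_isolated_log_bridge`` / ``_restore_log_state`` themselves; ``isolated_root_logging`` and the logger name used in the example are hypothetical.

```python
import logging
from contextlib import contextmanager


@contextmanager
def isolated_root_logging(test_logger_name):
    """Snapshot process-global logging state and restore it on exit."""
    root = logging.getLogger()
    test_logger = logging.getLogger(test_logger_name)
    saved_handlers = list(root.handlers)
    saved_filters = list(root.filters)
    saved_factory = logging.getLogRecordFactory()
    saved_level = test_logger.level
    # Raise the test logger to INFO so INFO records pass the level check.
    test_logger.setLevel(logging.INFO)
    try:
        yield test_logger
    finally:
        root.handlers[:] = saved_handlers
        root.filters[:] = saved_filters
        logging.setLogRecordFactory(saved_factory)
        test_logger.setLevel(saved_level)
```

Anything a test installs on the root logger inside the ``with`` block (handlers, filters, a record factory) is dropped again when the block exits, including when the test fails.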
* prepare-sync: load-bearing first-line-log unit test
Step 6 of PR-C.3.
Adds ``test_log_on_first_line_of_node_body_carries_node_span``
under ``tests/unit/test_observability_otel.py``: a focused
single-node test that emits a log on the FIRST line of a node
body (before any ``await``) and asserts the resulting log record
carries the node span's ``trace_id`` AND ``span_id``.
This is the regression target ``prepare_sync`` exists to cover.
Without the synchronous engine-task observer prep:
- The engine queues the started event for async dispatch.
- The node body runs immediately in the engine task.
- A log emitted on the first line, before any ``await``, runs
before the OTel observer's ``__call__`` has fired on the worker
task, so the span isn't open yet, OTel's ``get_current_span()``
returns an invalid span, and the log lands with
``trace_id=0`` / ``span_id=0``.
With ``prepare_sync``, the observer creates the span synchronously
in the engine task BEFORE queueing, publishes it via the
``current_active_observer_span`` ContextVar, and the engine
attaches it to OTel context around the body. The first-line log
sees the right span. Lives in unit/ (not just buried in fixture
010's driver) so a regression points straight at
``prepare_sync``-related code.
Snapshot/restore the root logger's handlers, filters, factory,
and the test logger's level so process-global ``install_log_bridge``
state doesn't bleed into other tests.
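The ordering problem can be reproduced with plain asyncio, no OTel involved. In this toy model (all names hypothetical) a set of "open spans" stands in for the observer's state, and a queue-fed task stands in for the worker that runs the async ``__call__``.

```python
import asyncio


async def first_line_sees_span(use_prepare_sync: bool) -> bool:
    queue = asyncio.Queue()
    open_spans = set()

    async def observer_worker():
        node = await queue.get()
        open_spans.add(node)  # async path: span opens when the worker runs

    # Scheduling the worker does not run it; it waits for the engine
    # task to yield control at an await.
    worker = asyncio.create_task(observer_worker())

    # Engine task: optional synchronous prep, then queue the started event.
    if use_prepare_sync:
        open_spans.add("a")  # sync path: opened before queueing
    queue.put_nowait("a")

    # "First line of the node body": no await has happened yet.
    span_open_at_first_line = "a" in open_spans

    await worker  # only now does the worker get to run
    return span_open_at_first_line


without_prep = asyncio.run(first_line_sees_span(False))
with_prep = asyncio.run(first_line_sees_span(True))
```

Here ``without_prep`` is ``False`` and ``with_prep`` is ``True``: the worker cannot run until the engine task awaits, so only the synchronous path has the span open in time.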
* prepare-sync: clear active-span ContextVar after detach
PR-C.3 review fixup. The spec lifecycle reasoning cleared on
the coord thread covered only the happy path: "ContextVar gets
overwritten on the next prepare_sync." If a subsequent
prepare_sync raises or early-returns without publishing, for
any reason, the engine reads the previous node's span and
attaches it around the new node's body, producing wrong log
correlation.
Bound the "ContextVar is set" window to the node-body scope by
clearing it to None in innermost's finally right after the OTel
detach (both _step_function_node's and _step_fan_out_node's
paths). Between dispatches and during merge / completed-event
dispatch the ContextVar is now None, so a failing or
early-returning prepare_sync can't reveal a stale span when the
engine reads. Lifecycle ownership stays with the attach/detach
scope rather than fanning out across observers in _dispatch.
Updated current_active_observer_span's docstring to reflect the
narrower lifecycle.
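A small sketch of the failure mode and the fix, with the OTel attach/detach left out. ``run_node`` is a hypothetical stand-in for the ``innermost`` closure, and ``demo`` with its helpers exists only for the example.

```python
import asyncio
import warnings
from contextvars import ContextVar

current_active_observer_span = ContextVar(
    "current_active_observer_span", default=None
)


async def run_node(node_name, prepare_sync, body):
    try:
        prepare_sync(node_name)
    except Exception as exc:
        warnings.warn(f"prepare_sync raised {exc!r}", RuntimeWarning)
    span = current_active_observer_span.get()  # what the engine would attach
    try:
        return await body(span)
    finally:
        # Bound the "ContextVar is set" window to the node-body scope so
        # the next node can never read this node's span.
        current_active_observer_span.set(None)


async def demo():
    def publish(name):
        current_active_observer_span.set(f"span:{name}")

    def broken(name):
        raise RuntimeError("observer bug")

    async def body(span):
        return span

    seen_by_a = await run_node("a", publish, body)
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        seen_by_b = await run_node("b", broken, body)
    return seen_by_a, seen_by_b


result = asyncio.run(demo())
```

``result`` is ``("span:a", None)``. Without the ``set(None)`` in ``finally``, node ``b`` would have seen ``"span:a"``.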
* prepare-sync: detect & warn on async user implementations
PR-C.3 review fixup. The opt-in-via-hasattr contract means
pyright doesn't catch a user signature mismatch when a developer
assumes "all observer methods are async" and defines
``async def prepare_sync(...)``. Today the call silently returns
an unawaited coroutine — the prep work never runs and Python
emits a delayed "coroutine was never awaited" RuntimeWarning at
GC time, breaking log correlation in a way that's hard to trace
back to the observer.
In ``_dispatch``, after each ``prepare_sync(event)`` returns,
check ``inspect.isawaitable(result)``. On hit: close the
awaitable (suppresses the secondary RuntimeWarning) and emit an
explicit ``warnings.warn`` naming the misconfiguration so it
fails loudly at the call site. Post-call detection catches the
common ``async def`` case AND the rarer
lambda-returning-coroutine / ``functools.partial``-of-async
cases — one check, all forms covered.
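The post-call check can be sketched as follows. ``call_prepare_sync`` and the warning wording are assumptions; ``inspect.isawaitable`` and closing a never-started coroutine to avoid the "never awaited" warning are standard Python behavior.

```python
import inspect
import warnings


def call_prepare_sync(observer, event):
    """Call the hook; return False if it handed back an awaitable."""
    result = observer.prepare_sync(event)
    if not inspect.isawaitable(result):
        return True
    # Closing a never-started coroutine suppresses Python's own
    # "coroutine ... was never awaited" RuntimeWarning at GC time.
    close = getattr(result, "close", None)
    if callable(close):
        close()
    warnings.warn(
        f"{type(observer).__name__}.prepare_sync returned an awaitable "
        f"({type(result).__name__}); it must be a plain synchronous method",
        RuntimeWarning,
        stacklevel=2,
    )
    return False
```

Because the check inspects the return value rather than the function object, it does not need ``inspect.iscoroutinefunction`` and so also covers callables that merely return a coroutine.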
* prepare-sync: warn on close-cleanup failure (codeql)
PR-C.3 review fixup. The ``except Exception: pass`` after the
best-effort ``close_method()`` call tripped CodeQL's
``py/empty-except`` rule on two surfaces (code-quality + advanced
security). Cleanup is intentionally best-effort — a raise here
MUST NOT propagate or break sibling observers' dispatch — but
swallowing silently makes the rare cleanup-failure case
invisible.
Replace the empty pass with ``except Exception as close_error:``
followed by a ``warnings.warn`` mentioning the cleanup-failure.
Same isolation contract preserved (no propagation, no
sibling-blocking) but the swallow is now observable. CodeQL
``py/empty-except`` cleared on both surfaces.
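A minimal sketch of the observable swallow, assuming the cleanup is a ``close()`` looked up on the returned awaitable. ``close_best_effort`` and the message text are hypothetical; ``close_method`` and ``close_error`` follow the names quoted above.

```python
import warnings


def close_best_effort(awaitable):
    """Try to close the awaitable; never raise, but never swallow silently."""
    close_method = getattr(awaitable, "close", None)
    if close_method is None:
        return
    try:
        close_method()
    except Exception as close_error:
        # Same isolation contract as before (no propagation), but the
        # rare cleanup failure is now visible to the user.
        warnings.warn(
            "closing the awaitable returned by prepare_sync failed: "
            f"{close_error!r}",
            RuntimeWarning,
            stacklevel=2,
        )
```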
* prepare-sync: precise warn text + spec-derived test bodies
PR-C.3 review fixup.
- observer.py: rewrite the awaitable-from-prepare_sync warning.
The old text claimed "did NOT run", but a user returning an
``asyncio.Task`` / ``Future`` may have work in flight on the
loop — just not awaited at the prepare_sync call site. The
contract violation is "no guarantee the prep completes before
the node body," not "definitely doesn't run." Reworded to that
shape and included ``type(result).__name__`` so the user
can see which awaitable they returned at a glance.
- tests/conformance/test_observability.py: sub-case 1's driver
hardcoded the YAML message bodies ("node a executing" /
"node b executing") for record filtering and lookup, even
though it had already read ``emits_log.message`` from the
spec to drive the node body. That duplicated spec data and
made the test brittle to fixture wording changes. Derive a
``node_emit_messages`` map from ``nodes_spec`` up front; use
the values for both record filtering and ``by_body`` indexing.
Sub-case 2 already worked this way (uses ``outer_emit`` /
``inner_emit`` derived from spec); sub-case 1 now matches.
1 parent 9a29795 commit c5761bb
6 files changed
Lines changed: 783 additions & 73 deletions
File tree
- src/openarmature
- graph
- observability
- otel
- tests
- conformance
- unit