Activate fixture 037 case 5 (resume re-fire)#102
Merged
Conversation
Wires the langfuse conformance harness for the remaining decision-tree
case of proposal 0043's §8.4.1 trace.input/output sourcing fixture.
The two-phase resume flow (first invoke catches NodeException → resume
invoke completes) now runs end-to-end through new harness primitives:
- ``flaky: {fail_first_invocation_only: true, on_success: {...}}``
compact test seam in ``_build_node_body``.
- ``checkpointer: in_memory`` directive registers
``InMemoryCheckpointer`` on the graph builder.
- ``returns_state_snapshot`` added to ``_TRACE_IO_HOOK_REGISTRY``.
- ``_run_resume_case`` runs the two-phase flow + asserts both traces +
checks the §8.4.1 invariants (distinct trace ids, shared
correlation_id, first trace unchanged, hooks re-fire on resumed
trace).
Activation surfaced two engine bugs that PR #99 missed.
The first: ``InvocationCompletedEvent.final_state`` on the failure
path defaulted to ``starting_state``, but spec §8.4.1 *Resume
semantics* requires the failure-path ``trace.output`` hook to receive
"the partial final state captured at the failure point" (the most
recent successful step's post-merge state). Adds a new
``latest_state_box`` on ``_InvocationContext`` that the engine writes
after every successful step's ``state = step_result.state``
assignment; the outermost ``invoke()`` reads it in the finally-block
before falling back to ``starting_state``.
The second: ``latest_state_box`` MUST be per-context (unlike its
sibling ``final_node_box`` which shares by reference across subgraph
descents). An inner-subgraph step's success previously would
overwrite the outer box with an inner-typed state; on a subsequent
outer-level raise the outer ``trace.output`` hook would receive an
inner state when its signature expects the outer state class. Each
``descend_into_*`` method now omits ``latest_state_box`` from the
copy, so each level gets a fresh box.
Four new unit-test regressions pin the bug fix across all four
graph-descent shapes: flat, subgraph, fan-out instance, parallel-
branches branch. Each test wires a graph where an outer node
succeeds (outer_a_done=true) and a deeper raise propagates back; the
``trace_output_from_state`` hook MUST see the outer-state-typed
value with the success captured.
Cross-cap parser deferral for 037 stays in place — that parser
still doesn't model ``langfuse_trace`` shape. Activation lives in
the langfuse-specific harness only.
There was a problem hiding this comment.
Pull request overview
This PR activates the resume re-fire case for Langfuse trace input/output conformance fixture 037 and fixes failure-path state propagation so trace output hooks receive the correct partial state.
Changes:
- Adds resume-case Langfuse conformance harness support, including in-memory checkpointing, flaky-node behavior, and two-phase trace assertions.
- Tracks latest successful outer state for
InvocationCompletedEvent.final_stateon failure paths. - Adds regression tests for failure-path final state behavior across flat graphs and nested execution shapes.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
src/openarmature/graph/observer.py |
Adds per-context latest_state_box tracking to invocation context. |
src/openarmature/graph/compiled.py |
Uses latest successful state for failure-path invocation completion events. |
tests/conformance/test_observability_langfuse.py |
Extends Langfuse conformance harness for fixture 037 resume flow. |
tests/unit/test_observability_langfuse.py |
Adds regression coverage for partial final state and context isolation. |
CHANGELOG.md |
Documents fixture activation and failure-path fixes. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
PR #102 review caught two issues: The fan-out regression test's inner subgraph contained only a raising node, so under the original shared-`latest_state_box` bug no inner step would have successfully written to the box — the test would have passed without exercising the leak it was meant to guard. The inner subgraph now has two nodes: `inner_succeeds` writes `inner_done=true` (so the descent's _invoke writes inner state to the box) followed by `inner_raises`. Confirmed by temp-reverting the descend-omit-`latest_state_box` change and observing the test fail with the typed-state-mismatch assertion. CHANGELOG said "three regression tests" but enumerated four (flat, subgraph, fan-out, parallel-branches). Bumped the count to four.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Activates the resume re-fire case of conformance fixture
observability/037-langfuse-trace-input-output, completing proposal 0043's §8.4.1 contract coverage. The two-phase flow (first invoke catchesNodeException→ resume invoke completes) verifies that:correlation_idis preserved across both traces per §3.1Harness extensions added:
flaky: {fail_first_invocation_only: true, on_success: {...}}compact test seam;checkpointer: in_memorydirective wiresInMemoryCheckpointer;returns_state_snapshotadded to the trace-io hook registry; new_run_resume_caseruns the two-phase flow + invariant assertions.Engine bug fixes the activation surfaced
PR #99 (proposal 0043) left two failure-path bugs that fixture 037 case 5 exposed:
Bug 1 —
InvocationCompletedEvent.final_statedefaulted tostarting_stateon raises.Spec §8.4.1 Resume semantics requires the failure-path
trace_output_from_statehook to receive "the partial final state captured at the failure point" — the most recent successful step's post-merge state. The original implementation gave the hook the bare initial state instead. Now the engine tracks the latest post-merge state via alatest_state_boxon_InvocationContext, updated after every successful step'sstate = step_result.stateassignment; the outermostinvoke()'s finally-block reads the box before falling back tostarting_state.Bug 2 —
latest_state_boxneeded per-context isolation.Unlike its sibling
final_node_box(shared by reference because the spec wants the innermost failing node's name — the real culprit),latest_state_boxMUST isolate per level so the outermost Langfuse trace receives outer-state-typed values. Without isolation, a subgraph-internal step's success would overwrite the outer box with an inner-typed state; on a subsequent outer-level raise the outertrace.outputhook would receive an inner state when its signature expects the outer state class. Eachdescend_into_*method now omitslatest_state_boxfrom the copy.Test coverage matrix (failure paths)
Test plan
uv run pytest tests/conformance/test_observability_langfuse.py -v -k \"037\"— 1 passed, 8 deselecteduv run pytest tests/ -q --ignore=tests/integration— 998 passed, 203 skipped, 0 failed (up from 994; four engine-bug regressions + verified fixture 037 all 5 cases)uv run pyright src/openarmature— 0 errorsuv run ruff check src/ tests/ examples/— cleanlangfuse_traceshape. Activation lives in the langfuse-specific harness only.