Skip to content

Activate fixture 037 case 5 (resume re-fire)#102

Merged
chris-colinsky merged 2 commits into
mainfrom
feature/0043-fixture-037-resume-case
May 31, 2026
Merged

Activate fixture 037 case 5 (resume re-fire)#102
chris-colinsky merged 2 commits into
mainfrom
feature/0043-fixture-037-resume-case

Conversation

@chris-colinsky

Copy link
Copy Markdown
Member

Summary

Activates the resume re-fire case of conformance fixture observability/037-langfuse-trace-input-output, completing proposal 0043's §8.4.1 contract coverage. The two-phase flow (first invoke catches NodeException → resume invoke completes) verifies that:

  • Each invoke mints its own Langfuse Trace per §8.4.1
  • Caller hooks re-fire on the resumed trace
  • The first trace's fields are NOT mutated by the resume
  • correlation_id is preserved across both traces per §3.1

Harness extensions added: flaky: {fail_first_invocation_only: true, on_success: {...}} compact test seam; checkpointer: in_memory directive wires InMemoryCheckpointer; returns_state_snapshot added to the trace-io hook registry; new _run_resume_case runs the two-phase flow + invariant assertions.

Engine bug fixes the activation surfaced

PR #99 (proposal 0043) left two failure-path bugs that fixture 037 case 5 exposed:

Bug 1 — InvocationCompletedEvent.final_state defaulted to starting_state on raises.
Spec §8.4.1 Resume semantics requires the failure-path trace_output_from_state hook to receive "the partial final state captured at the failure point" — the most recent successful step's post-merge state. The original implementation gave the hook the bare initial state instead. Now the engine tracks the latest post-merge state via a latest_state_box on _InvocationContext, updated after every successful step's state = step_result.state assignment; the outermost invoke()'s finally-block reads the box before falling back to starting_state.

Bug 2 — latest_state_box needed per-context isolation.
Unlike its sibling final_node_box (shared by reference because the spec wants the innermost failing node's name — the real culprit), latest_state_box MUST isolate per level so the outermost Langfuse trace receives outer-state-typed values. Without isolation, a subgraph-internal step's success would overwrite the outer box with an inner-typed state; on a subsequent outer-level raise the outer trace.output hook would receive an inner state when its signature expects the outer state class. Each descend_into_* method now omits latest_state_box from the copy.

Test coverage matrix (failure paths)

Path Covered
Flat graph failure ✅ new
Subgraph inner failure ✅ new
Fan-out instance inner failure ✅ new
Parallel-branches inner failure ✅ new
Resume re-fire (fixture 037 case 5) ✅ new (conformance)

Test plan

  • uv run pytest tests/conformance/test_observability_langfuse.py -v -k \"037\" — 1 passed, 8 deselected
  • uv run pytest tests/ -q --ignore=tests/integration — 998 passed, 203 skipped, 0 failed (up from 994; four engine-bug regressions + verified fixture 037 all 5 cases)
  • uv run pyright src/openarmature — 0 errors
  • uv run ruff check src/ tests/ examples/ — clean
  • Cross-cap parser deferral for 037 stays in place — that parser still doesn't model langfuse_trace shape. Activation lives in the langfuse-specific harness only.

Wires the langfuse conformance harness for the remaining decision-tree
case of proposal 0043's §8.4.1 trace.input/output sourcing fixture.
The two-phase resume flow (first invoke catches NodeException → resume
invoke completes) now runs end-to-end through new harness primitives:

- ``flaky: {fail_first_invocation_only: true, on_success: {...}}``
  compact test seam in ``_build_node_body``.
- ``checkpointer: in_memory`` directive registers
  ``InMemoryCheckpointer`` on the graph builder.
- ``returns_state_snapshot`` added to ``_TRACE_IO_HOOK_REGISTRY``.
- ``_run_resume_case`` runs the two-phase flow + asserts both traces +
  checks the §8.4.1 invariants (distinct trace ids, shared
  correlation_id, first trace unchanged, hooks re-fire on resumed
  trace).

Activation surfaced two engine bugs that PR #99 missed.

The first: ``InvocationCompletedEvent.final_state`` on the failure
path defaulted to ``starting_state``, but spec §8.4.1 *Resume
semantics* requires the failure-path ``trace.output`` hook to receive
"the partial final state captured at the failure point" (the most
recent successful step's post-merge state).  Adds a new
``latest_state_box`` on ``_InvocationContext`` that the engine writes
after every successful step's ``state = step_result.state``
assignment; the outermost ``invoke()`` reads it in the finally-block
before falling back to ``starting_state``.

The second: ``latest_state_box`` MUST be per-context (unlike its
sibling ``final_node_box`` which shares by reference across subgraph
descents).  An inner-subgraph step's success previously would
overwrite the outer box with an inner-typed state; on a subsequent
outer-level raise the outer ``trace.output`` hook would receive an
inner state when its signature expects the outer state class.  Each
``descend_into_*`` method now omits ``latest_state_box`` from the
copy, so each level gets a fresh box.

Four new unit-test regressions pin the bug fix across all four
graph-descent shapes: flat, subgraph, fan-out instance, parallel-
branches branch.  Each test wires a graph where an outer node
succeeds (outer_a_done=true) and a deeper raise propagates back; the
``trace_output_from_state`` hook MUST see the outer-state-typed
value with the success captured.

Cross-cap parser deferral for 037 stays in place — that parser
still doesn't model ``langfuse_trace`` shape.  Activation lives in
the langfuse-specific harness only.
Copilot AI review requested due to automatic review settings May 31, 2026 03:50

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR activates the resume re-fire case for Langfuse trace input/output conformance fixture 037 and fixes failure-path state propagation so trace output hooks receive the correct partial state.

Changes:

  • Adds resume-case Langfuse conformance harness support, including in-memory checkpointing, flaky-node behavior, and two-phase trace assertions.
  • Tracks latest successful outer state for InvocationCompletedEvent.final_state on failure paths.
  • Adds regression tests for failure-path final state behavior across flat graphs and nested execution shapes.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
src/openarmature/graph/observer.py Adds per-context latest_state_box tracking to invocation context.
src/openarmature/graph/compiled.py Uses latest successful state for failure-path invocation completion events.
tests/conformance/test_observability_langfuse.py Extends Langfuse conformance harness for fixture 037 resume flow.
tests/unit/test_observability_langfuse.py Adds regression coverage for partial final state and context isolation.
CHANGELOG.md Documents fixture activation and failure-path fixes.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread tests/unit/test_observability_langfuse.py
Comment thread CHANGELOG.md Outdated
PR #102 review caught two issues:

The fan-out regression test's inner subgraph contained only a raising
node, so under the original shared-`latest_state_box` bug no inner
step would have successfully written to the box — the test would
have passed without exercising the leak it was meant to guard. The
inner subgraph now has two nodes: `inner_succeeds` writes
`inner_done=true` (so the descent's _invoke writes inner state to the
box) followed by `inner_raises`. Confirmed by temp-reverting the
descend-omit-`latest_state_box` change and observing the test fail
with the typed-state-mismatch assertion.

CHANGELOG said "three regression tests" but enumerated four (flat,
subgraph, fan-out, parallel-branches). Bumped the count to four.
@chris-colinsky chris-colinsky merged commit fafe911 into main May 31, 2026
6 checks passed
@chris-colinsky chris-colinsky deleted the feature/0043-fixture-037-resume-case branch May 31, 2026 04:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants