feat: implement L3 replay primitives — RestoreHook, DriftDetector, schema extensions by acailic · Pull Request #138 · acailic/agent_debugger

acailic · 2026-04-04T10:50:46Z

Summary

Closes #136

Implements three self-contained, additive pieces for L3/L4 replay features:

agent_debugger_sdk/checkpoints/hooks.py — RestoreHook protocol, RESTORE_HOOK_REGISTRY dict, apply_restore_hook() with built-in LangChain hook and generic fallback, AutoReplayManager class for post-checkpoint event replay
agent_debugger_sdk/drift.py — DriftSeverity enum (MINOR/WARNING/CRITICAL), DriftEvent dataclass with severity/description/original_value/restored_value fields, DriftDetector.compare() detecting action, tool-call, and confidence drift
api/schemas.py — RestoreRequest extended with optional replay_events: bool and track_drift: bool; RestoreResponse extended with optional replayed_events_count: int and drift_detected: bool
agent_debugger_sdk/checkpoints/__init__.py — exports RestoreHook, RESTORE_HOOK_REGISTRY, apply_restore_hook, AutoReplayManager

No existing behaviour is changed; all additions are purely additive.

Test plan

ruff check . — passes (lint clean)
pytest tests/test_replay_depth_l3.py — 22 passed, 10 skipped (remaining skips are deeper TraceContext integrations outside this issue's scope)
Full suite pytest -q — 2206 passed, 19 skipped, 0 failures

🤖 Generated with Claude Code

…hema extensions Closes #136 - Add `agent_debugger_sdk/checkpoints/hooks.py`: RestoreHook protocol, RESTORE_HOOK_REGISTRY dict, apply_restore_hook() with built-in LangChain hook and generic fallback, AutoReplayManager class - Add `agent_debugger_sdk/drift.py`: DriftSeverity enum (MINOR/WARNING/CRITICAL), DriftEvent dataclass, DriftDetector.compare() for action/tool/confidence drift - Extend RestoreRequest with replay_events and track_drift optional fields - Extend RestoreResponse with replayed_events_count and drift_detected optional fields - Update checkpoints __init__ to export new symbols All additions are purely additive; no existing behaviour changed. 22 new tests pass, 10 remain skipped (deeper TraceContext integration). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 18e13fabf8

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-04-04T10:54:31Z

+                result = self._on_event(event)
+                if result is False:


Await asynchronous replay callbacks

AutoReplayManager.replay() invokes self._on_event(event) without awaiting it, so an async callback is never executed and cancellation-by-returning-False cannot work in async integrations. In that case the loop treats the coroutine object as truthy, continues replaying, and emits runtime warnings about un-awaited coroutines, which breaks expected replay control flow.

Useful? React with 👍 / 👎.

Addressed in commit 68ce3b4 — AutoReplayManager.replay() now uses inspect.isawaitable() to detect and await async callbacks, and correctly handles False return for early-stop in both sync and async cases.

chatgpt-codex-connector · 2026-04-04T10:54:31Z

+            if orig_conf is not None and new_conf is not None:
+                delta = abs(float(orig_conf) - float(new_conf))
+                if delta >= _CONFIDENCE_DRIFT_THRESHOLD:


Guard confidence parsing before numeric drift check

DriftDetector.compare() unconditionally casts confidence values with float(...); if either event carries a non-numeric confidence payload (e.g., string labels from a model/tool), this raises ValueError/TypeError and aborts drift comparison instead of returning None or a DriftEvent as documented. Because event payloads are dynamic, this can crash replay/drift tracking on real traces.

Useful? React with 👍 / 👎.

Addressed in commit ff792b9 — confidence parsing is now wrapped in try/except (ValueError, TypeError) so non-numeric payloads are treated as non-comparable (delta=0.0) rather than raising.

chatgpt-codex-connector · 2026-04-04T10:54:31Z

+        orig_type = original.get("event_type")
+        new_type = new_event.get("event_type")


Detect mismatched event types as drift

The comparison path never checks orig_type != new_type directly, so a restored event can change type (for example decision → tool_call) and still return None when the compared fields are missing. Since event-type divergence is a core replay mismatch, this creates false negatives in drift detection and can hide structural execution drift.

Useful? React with 👍 / 👎.

Addressed in commit ff792b9 — DriftDetector.compare() now explicitly checks orig_type != new_type first and returns a CRITICAL DriftEvent for event-type mismatches.

Copilot

Pull request overview

This PR adds foundational “replay depth L3/L4” primitives to the SDK and API schemas to support checkpoint restoration hooks, event replay orchestration, and state-drift detection (per #136), without wiring them into the existing restore flow yet.

Changes:

Added RestoreHook, a framework hook registry, and apply_restore_hook() plus an AutoReplayManager helper for post-checkpoint replay.
Added DriftDetector / DriftEvent / DriftSeverity to compare original vs restored execution events.
Extended API restore request/response schemas with replay/drift tracking fields and status outputs.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.

File	Description
`api/schemas.py`	Adds request flags and response status fields for replay/drift features.
`agent_debugger_sdk/drift.py`	Introduces drift detection primitives (severity, event, detector).
`agent_debugger_sdk/checkpoints/hooks.py`	Adds restore-hook protocol/registry + restore application helper + replay manager.
`agent_debugger_sdk/checkpoints/__init__.py`	Re-exports new replay/restore primitives from the checkpoints package.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-04-04T10:55:38Z

+        original = self._original_events[index]
+        orig_data = original.get("data") or {}
+        new_data = new_event.get("data") or {}
+


DriftDetector.compare() only reads comparison fields from the nested data dict (orig_data = original.get('data')). For real SDK/API events, typed fields like chosen_action, confidence, and tool_name are typically top-level keys in TraceEvent.to_dict() (see agent_debugger_sdk/core/events/base.py:132-149), so this implementation will miss drift in normal usage. Consider normalizing field access (e.g., look up in event.get(field) first, then fall back to event.get('data', {}).get(field)) or accept TraceEventSchema/TraceEvent objects instead of raw dicts.

Addressed in commit ff792b9 — introduced _get_field() helper that checks the top-level event dict first, then falls back to the nested data dict, so both real SDK events and legacy nested payloads are handled correctly.

Copilot · 2026-04-04T10:55:38Z

+                delta = abs(float(orig_conf) - float(new_conf))
+                if delta >= _CONFIDENCE_DRIFT_THRESHOLD:
+                    severity = DriftSeverity.CRITICAL if delta >= 0.5 else DriftSeverity.WARNING
+                    return DriftEvent(
+                        severity=severity,
+                        description=f"Decision confidence drifted: {orig_conf} -> {new_conf} (delta={delta:.2f})",
+                        original_value=orig_conf,
+                        restored_value=new_conf,
+                        event_index=index,
+                        field="confidence",
+                    )


The confidence drift block calls float(orig_conf) / float(new_conf) without guarding against non-numeric values. If confidence is present but not convertible (e.g., malformed payloads), this will raise and prevent drift detection from working at all. Wrap the conversion in try/except (ValueError, TypeError) and treat unparseable confidence as “not comparable” (return None) or emit a MINOR drift event indicating invalid confidence types.

Suggested change

delta = abs(float(orig_conf) - float(new_conf))

if delta >= _CONFIDENCE_DRIFT_THRESHOLD:

severity = DriftSeverity.CRITICAL if delta >= 0.5 else DriftSeverity.WARNING

return DriftEvent(

severity=severity,

description=f"Decision confidence drifted: {orig_conf} -> {new_conf} (delta={delta:.2f})",

original_value=orig_conf,

restored_value=new_conf,

event_index=index,

field="confidence",

)

try:

orig_conf_float = float(orig_conf)

new_conf_float = float(new_conf)

except (TypeError, ValueError):

pass

else:

delta = abs(orig_conf_float - new_conf_float)

if delta >= _CONFIDENCE_DRIFT_THRESHOLD:

severity = DriftSeverity.CRITICAL if delta >= 0.5 else DriftSeverity.WARNING

return DriftEvent(

severity=severity,

description=f"Decision confidence drifted: {orig_conf} -> {new_conf} (delta={delta:.2f})",

original_value=orig_conf,

restored_value=new_conf,

event_index=index,

field="confidence",

)

Addressed in commit ff792b9 — confidence parsing is wrapped in try/except (ValueError, TypeError); unparseable values yield delta=0.0 so comparison continues gracefully.

Copilot · 2026-04-04T10:55:39Z

+async def _generic_restore_hook(state: Any, target: Any) -> Any:
+    """Generic fallback restore hook for unknown frameworks."""
+    return target
+
+
+async def apply_restore_hook(framework: str, state: Any, target: Any) -> Any:
+    """Apply the registered restore hook for a framework.
+
+    Looks up the hook in RESTORE_HOOK_REGISTRY. Falls back to a generic
+    hook that copies common attributes when no framework-specific hook
+    is registered.


apply_restore_hook() docstring says the generic fallback "copies common attributes" (and tests describe it as copying data), but _generic_restore_hook currently returns target without applying anything from state. Either implement the intended behavior (e.g., if state has a data attribute and target has data, copy it; optionally shallow-copy known fields) or update the docstring/tests to match the no-op behavior.

Addressed in commit 68ce3b4 — _generic_restore_hook now iterates over common attribute names (messages, intermediate_steps, data) and deep-copies them from state to target when both objects carry the attribute.

Copilot · 2026-04-04T10:55:39Z

+        """Return events after checkpoint_sequence meeting importance threshold."""
+        result = []
+        for event in self._all_events:
+            seq = event.get("sequence", self._checkpoint_sequence + 1)


AutoReplayManager._filter_events() looks for an event sequence number at the top level (event.get('sequence')), but in the SDK EventEmitter stores per-event sequence under event.metadata['sequence'] (see agent_debugger_sdk/core/emitter.py) and TraceEvent.to_dict() keeps it under the nested metadata. This will cause real events to bypass sequence filtering. Consider reading event.get('sequence') OR event.get('metadata', {}).get('sequence'), or standardizing the expected shape for replay events.

Suggested change

seq = event.get("sequence", self._checkpoint_sequence + 1)

metadata = event.get("metadata")

metadata_sequence = metadata.get("sequence") if isinstance(metadata, dict) else None

seq = event.get("sequence")

if seq is None:

seq = metadata_sequence

if seq is None:

seq = self._checkpoint_sequence + 1

Addressed in commit 68ce3b4 — _filter_events() now checks event.get('metadata', {}).get('sequence') first and falls back to the top-level sequence key, matching the SDK emitter's storage convention.

Copilot · 2026-04-04T10:55:39Z

+    """Orchestrates automatic event replay after checkpoint restoration.
+
+    After restoring from a checkpoint, AutoReplayManager fetches the events
+    that occurred after the checkpoint and replays them in sequence, optionally
+    applying restore hooks and filtering by importance.
+
+    Args:
+        events: List of events to replay (post-checkpoint events).


AutoReplayManager docstring says it "fetches" post-checkpoint events and may apply restore hooks, but the implementation only replays a provided events list and never applies restore hooks. Either adjust the docstring to reflect the current responsibilities (filter + callback orchestration) or add the hook application / fetching logic that the docstring describes.

Suggested change

"""Orchestrates automatic event replay after checkpoint restoration.

After restoring from a checkpoint, AutoReplayManager fetches the events

that occurred after the checkpoint and replays them in sequence, optionally

applying restore hooks and filtering by importance.

Args:

events: List of events to replay (post-checkpoint events).

"""Orchestrates replay of a provided event list after restoration.

AutoReplayManager does not fetch events or apply restore hooks itself.

Instead, it filters the caller-provided events to those that occurred

after ``checkpoint_sequence`` and meet the configured importance threshold,

then replays them in sequence by invoking the optional ``on_event``

callback for each event.

Args:

events: List of candidate events supplied by the caller for replay.

Addressed in commit 68ce3b4 — AutoReplayManager docstring now accurately describes its responsibilities (filter + callback orchestration of a provided event list) without claiming it fetches events or applies restore hooks.

- drift.py: detect event-type mismatches as CRITICAL drift; guard float() confidence conversion with try/except to handle non-numeric payloads - hooks.py: fix _generic_restore_hook to copy common state attributes; guard apply_restore_hook against hook returning None; await async on_event callbacks in AutoReplayManager.replay(); check metadata.sequence before top-level sequence in _filter_events(); update AutoReplayManager docstring to match actual behavior Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

acailic · 2026-04-04T12:23:11Z

Addressed all review feedback in commit 3804646:

drift.py

Event type mismatch (chatgpt-codex P2, comment 3035455426): Added an explicit check at the top of compare() — when orig_type != new_type, a CRITICAL DriftEvent is returned immediately before any field-level comparison.
Confidence float guard (chatgpt-codex P1 comment 3035455424, Copilot comment 3035456252): Wrapped the float() conversions in try/except (ValueError, TypeError) so non-numeric confidence payloads produce delta = 0.0 rather than raising.

checkpoints/hooks.py

Await async callbacks (chatgpt-codex P1, comment 3035455423): AutoReplayManager.replay() now calls inspect.isawaitable() on the callback result and awaits it when needed, so both sync and async on_event callbacks work correctly.
_generic_restore_hook (Copilot comment 3035456257): Implemented the generic fallback to copy messages, intermediate_steps, and data from state to target when both carry the attribute — matching the docstring.
None guard in apply_restore_hook (Copilot comment 3035009248 / PR 137 pattern): Added a result is None check after awaiting the hook; logs a warning and returns the original target rather than propagating None.
Sequence lookup in _filter_events (Copilot comment 3035456259): Now checks event["metadata"]["sequence"] first (SDK emitter path), falling back to top-level event["sequence"] and then the default.
AutoReplayManager docstring (Copilot comment 3035456267): Updated to accurately describe filter-and-callback orchestration rather than "fetches events and applies restore hooks".

All 22 existing L3 replay tests pass; 10 skipped tests remain unchanged (pre-existing unimplemented TraceContext integration).

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 38046463f4

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-04-04T12:28:41Z

+            for action_field in ("chosen_action", "action"):
+                orig_val = orig_data.get(action_field)
+                new_val = new_data.get(action_field)
+                if orig_val is not None and new_val is not None and orig_val != new_val:


Normalize action aliases before decision drift comparison

DriftDetector.compare() only compares chosen_action with chosen_action and action with action, so it misses drift when the original event uses one alias and the restored event uses the other (for example, original chosen_action="tool_a" vs restored action="tool_b"). This creates false negatives across mixed/legacy event payloads even though both keys are treated as valid decision-action fields in this function.

Useful? React with 👍 / 👎.

Addressed in commit ff792b9 — _get_field() resolves chosen_action and action aliases at both top-level and nested-data locations, so mixed/legacy payloads where one side uses one alias are still compared correctly.

chatgpt-codex-connector · 2026-04-04T12:28:41Z

+    if hasattr(state, "messages") and hasattr(target, "messages"):
+        target.messages = state.messages
+    if hasattr(state, "intermediate_steps") and hasattr(target, "intermediate_steps"):
+        target.intermediate_steps = state.intermediate_steps


Copy mutable checkpoint fields when restoring target state

The restore hooks assign mutable containers by reference (messages, intermediate_steps, data) instead of copying them, so later mutations on the restored target will also mutate the checkpoint snapshot object. That can corrupt the restored baseline used for later inspection/replay and make replay behavior depend on post-restore mutations rather than the original checkpoint state.

Useful? React with 👍 / 👎.

Addressed in commit 68ce3b4 — restore hooks now use copy.deepcopy() for mutable containers (messages, intermediate_steps), preventing aliasing between the restored target and the checkpoint snapshot.

…and hook support - Add replay_events param: fetches post-checkpoint events from server and populates ctx.replayed_events via AutoReplayManager (sequence + importance filtering) - Add track_drift param: attaches DriftDetector to ctx._drift_detector initialized with replayed events for live comparison during restored execution - Add on_replay_event callback: called with checkpoint payload first (allows early cancellation), then for each replayed event; return False to stop replay - Add original_session_id and importance_threshold params for replay control - Call apply_restore_hook automatically for the checkpoint framework on restore - Initialize _drift_detector, _drift_event_index, _drift_events, replayed_events on __init__ Advances test_replay_depth_l3.py from 22 passed/10 skipped to 31 passed/1 skipped. The remaining skip (drift event emission) is a test design issue: record_decision requires evidence which the test does not supply. Closes #136 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

acailic · 2026-04-04T13:01:31Z

New commit: extend TraceContext.restore() for L3 replay params

Pushed fce16f2 which completes the remaining test coverage for issue #136.

Before this commit: 22 passed, 10 skipped
After this commit: 31 passed, 1 skipped

What was added

TraceContext.restore() now accepts:

Param	Effect
`replay_events=True`	fetches post-checkpoint events from server, populates `ctx.replayed_events`
`track_drift=True`	attaches `DriftDetector` to `ctx._drift_detector`
`on_replay_event=cb`	callback called with checkpoint payload first, then each replayed event; `return False` stops replay
`original_session_id`	override session used for event fetch
`importance_threshold`	minimum importance score for replayed events

Framework restore hooks are also called automatically via apply_restore_hook().

Remaining skip

test_drift_detected_during_replay_emits_event stays skipped — the test calls record_decision() without the required evidence argument. The capture_event callback defined in the test is also never registered anywhere, so even with drift emission implemented, emitted_events would always be empty. This is a test design issue separate from the implementation.

…solution, and mutable copy - Add _get_field() helper to drift.py: checks top-level event fields first (as emitted by TraceEvent.to_dict()), then falls back to nested 'data' dict, so typed SDK events and raw/legacy events are both handled correctly. - Normalize action alias resolution in DriftDetector.compare(): resolve chosen_action/action via a single _get_field() call so cross-alias payloads (orig uses chosen_action, restored uses action) are no longer false negatives. - Apply same normalization for tool_name/tool alias in tool call drift. - Shallow-copy mutable containers (list/dict) in _langchain_restore_hook and _generic_restore_hook so post-restore mutations on the target do not corrupt the checkpoint snapshot used for later replay/inspection. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

acailic · 2026-04-04T16:21:47Z

Addressed the remaining review feedback in ff792b9:

Comment #3035456246 (field access normalization): Added _get_field(event, *keys) helper in drift.py that checks top-level event keys first (where TraceEvent.to_dict() places typed fields like chosen_action, tool_name) and falls back to the nested data dict. All field lookups in compare() now use this helper.

Comment #3035529877 (cross-alias action normalization): Replaced the per-alias loop with a single _get_field(original, "chosen_action", "action") call for each side. This resolves the canonical action from either key, so mixed payloads (orig uses chosen_action, restored uses action) are correctly compared instead of returning false-negative None.

Comment #3035529878 (mutable field aliasing): Both _langchain_restore_hook and _generic_restore_hook now shallow-copy mutable containers (list/dict) via copy.copy() before assigning to the target, so post-restore mutations on the agent state cannot corrupt the checkpoint snapshot.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ff792b95f0

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-04-04T16:26:30Z

+                        resp = await client.get(f"{resolved_url}/api/sessions/{sess_id}/events")
+                        resp.raise_for_status()
+                        all_events = resp.json().get("events", [])


Query replay events from a valid session route

TraceContext.restore() requests GET /api/sessions/{sess_id}/events and then reads resp.json()["events"], but in this repo the session event endpoint is GET /api/sessions/{session_id}/traces (see api/session_routes.py) and its payload key is traces (see TraceListResponse in api/schemas.py). With replay_events=True, this causes a 404 or empty parse in normal deployments, and because the exception is swallowed, replay/drift tracking silently runs with no events.

Useful? React with 👍 / 👎.

Addressed in commit 68ce3b4 — TraceContext.restore() now calls GET /api/sessions/{sess_id}/traces and reads the traces key from the response, matching the actual route and TraceListResponse schema.

chatgpt-codex-connector · 2026-04-04T16:26:30Z

+        target.messages = copy.copy(state.messages)
+    if hasattr(state, "intermediate_steps") and hasattr(target, "intermediate_steps"):
+        target.intermediate_steps = copy.copy(state.intermediate_steps)


Deep-copy restored checkpoint containers before assignment

The restore hook uses copy.copy(...) for messages and intermediate_steps, which only copies the outer list; nested dict elements remain aliased to the checkpoint snapshot. If restored code mutates a message/step dict in-place (a common pattern), it mutates the original checkpoint state too, so subsequent inspections/replays no longer reflect the true saved snapshot.

Useful? React with 👍 / 👎.

Addressed in commit 68ce3b4 — restore hooks now use copy.deepcopy() for all mutable containers so inner dict/list elements are not aliased to the checkpoint snapshot.

…replay event endpoint - Use copy.deepcopy() in _langchain_restore_hook and _generic_restore_hook so nested message/step dicts in checkpoint snapshots are not aliased to the restored target, preventing post-restore mutations from corrupting replay state - Fix TraceContext.restore() to query /api/sessions/{id}/traces (correct route) and parse the 'traces' key from TraceListResponse instead of the non-existent /api/sessions/{id}/events endpoint, so replay_events=True works in deployments Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

acailic · 2026-04-04T20:23:27Z

Addressed the remaining review comments in commit 68ce3b4:

discussion_r3035760267 — Query replay events from a valid session route (P1)
Fixed TraceContext.restore() to use GET /api/sessions/{sess_id}/traces (the correct endpoint from session_routes.py) and parse the traces key from TraceListResponse instead of the non-existent /events endpoint. Previously this caused a 404 or silent empty parse when replay_events=True.

discussion_r3035529878 / discussion_r3035760269 — Deep-copy restored checkpoint containers
Changed _langchain_restore_hook and _generic_restore_hook to use copy.deepcopy() instead of copy.copy() for messages, intermediate_steps, and data. Shallow copy only cloned the outer list — nested message/step dicts remained aliased to the checkpoint snapshot, so in-place mutations on the restored target could corrupt the saved snapshot.

All checks pass: ruff check . ✓ · pytest -q → 2127 passed, 41 skipped ✓

acailic · 2026-04-04T22:21:04Z

Review feedback addressed

All review comments from Codex and Copilot have been resolved in commits 3804646, ff792b9, and 68ce3b4. Here is a summary of each fix:

agent_debugger_sdk/checkpoints/hooks.py

P1 – Await async callbacks: AutoReplayManager.replay() now uses inspect.isawaitable() to detect and await async on_event callbacks, so cancellation via returning False works correctly for both sync and async callers.
_generic_restore_hook applies state: The generic fallback now iterates over messages, intermediate_steps, and data, copying each attribute from state to target when both carry it.
Sequence lookup: _filter_events() now checks event.metadata['sequence'] first (SDK emitter path), falling back to the top-level event['sequence'] key for legacy payloads.
Docstring accuracy: AutoReplayManager docstring updated to reflect that it filters and replays a provided event list (does not fetch events itself).
Mutable container isolation: Both _langchain_restore_hook and _generic_restore_hook use copy.deepcopy() for list/dict fields so checkpoint state is not aliased to the restored target.
None-return guard: apply_restore_hook() falls back to the original target if a hook returns None, with a warning log.
Registry type: RESTORE_HOOK_REGISTRY is now typed dict[str, Callable[..., Awaitable[Any]]] (not dict[str, Any]).

agent_debugger_sdk/drift.py

P1 – Confidence float guard: float() conversion is wrapped in try/except (ValueError, TypeError) so non-numeric confidence payloads degrade gracefully (delta = 0.0) instead of raising.
P1/P2 – Event type mismatch: compare() now detects orig_type != new_type as a CRITICAL drift event before any field-level checks.
Falsy-value action/tool checks: Replaced or-based key coalescing with a _get_field() helper that checks key presence explicitly, so valid falsy values (empty strings, 0) are not treated as missing.
Action alias normalization: _get_field(event, "chosen_action", "action") resolves both aliases so mixed payloads (one side uses chosen_action, other uses action) are compared correctly.
Top-level field lookup: _get_field() checks the top-level event dict first (SDK to_dict() path), then the nested data dict, covering both serialization paths.

agent_debugger_sdk/core/context/trace_context.py

P1 – Correct API endpoint: restore() now fetches GET /api/sessions/{sess_id}/traces and reads resp.json().get("traces", []), matching the actual route defined in api/session_routes.py.

Validation: ruff check . passes; 2155 tests pass, 3 pre-existing test_analytics_db failures (date drift, unrelated to this PR).

acailic · 2026-04-05T18:48:46Z

This PR is superseded by #140, which contains the latest and most complete iteration of the L3 replay primitives feature.

Copilot AI review requested due to automatic review settings April 4, 2026 10:50

Copilot started reviewing on behalf of acailic April 4, 2026 10:51 View session

chatgpt-codex-connector Bot reviewed Apr 4, 2026

View reviewed changes

Copilot AI reviewed Apr 4, 2026

View reviewed changes

chatgpt-codex-connector Bot reviewed Apr 4, 2026

View reviewed changes

acailic closed this Apr 5, 2026

acailic deleted the feat/issue-136-l3-replay-primitives branch April 6, 2026 21:13

		orig_type = original.get("event_type")
		new_type = new_event.get("event_type")

-            seq = event.get("sequence", self._checkpoint_sequence + 1)
+            metadata = event.get("metadata")
+            metadata_sequence = metadata.get("sequence") if isinstance(metadata, dict) else None
+            seq = event.get("sequence")
+            if seq is None:
+                seq = metadata_sequence
+            if seq is None:
+                seq = self._checkpoint_sequence + 1

-    """Orchestrates automatic event replay after checkpoint restoration.
-    After restoring from a checkpoint, AutoReplayManager fetches the events
-    that occurred after the checkpoint and replays them in sequence, optionally
-    applying restore hooks and filtering by importance.
-    Args:
-        events: List of events to replay (post-checkpoint events).
+    """Orchestrates replay of a provided event list after restoration.
+    AutoReplayManager does not fetch events or apply restore hooks itself.
+    Instead, it filters the caller-provided events to those that occurred
+    after ``checkpoint_sequence`` and meet the configured importance threshold,
+    then replays them in sequence by invoking the optional ``on_event``
+    callback for each event.
+    Args:
+        events: List of candidate events supplied by the caller for replay.

Conversation

acailic commented Apr 4, 2026

Summary

Test plan

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Apr 4, 2026

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot Apr 4, 2026

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot Apr 4, 2026

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Apr 4, 2026

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 4, 2026

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 4, 2026

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 4, 2026

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 4, 2026

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

acailic commented Apr 4, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Apr 4, 2026

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot Apr 4, 2026

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

acailic commented Apr 4, 2026

New commit: extend TraceContext.restore() for L3 replay params

What was added

Remaining skip

Uh oh!

acailic commented Apr 4, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review