feat: capture memcache snapshot and tool log on session failure #217

Closed
sheeki03 wants to merge 6 commits into GitHubSecurityLab:main from sheeki03:feat/session-failure-forensics

Problem

When mark_failed() is called after a crash or retry exhaustion, only the error string is saved. There is no record of what the agent found (memcache state) or what tools it called (tool log). Post-mortem inspection requires re-running the entire workflow.

Depends on: #216 (auto-save scaffolding)

Changes

Session model (session.py)

Adds two new fields to TaskflowSession:

  • memcache_snapshot: dict[str, Any] — full memcache state at failure time
  • tool_log_snapshot: list[dict[str, Any]] — auto-save tool log entries at failure time

Both default to empty (backward-compatible with existing session JSON files). mark_failed() accepts optional memcache_snapshot and tool_log_snapshot parameters.
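
As a rough illustration of the backward-compatible defaults described above, here is a minimal sketch. It uses a plain dataclass as a stand-in; `SessionSketch`, `from_json`, and the `session_id`, `status`, and `error` fields are hypothetical names for this example, and the real TaskflowSession in session.py may be modelled differently.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class SessionSketch:
    """Hypothetical stand-in for TaskflowSession (not the real model)."""

    session_id: str
    status: str = "running"
    error: str = ""
    # New fields default to empty so older session JSON still loads.
    memcache_snapshot: dict[str, Any] = field(default_factory=dict)
    tool_log_snapshot: list[dict[str, Any]] = field(default_factory=list)

    def mark_failed(
        self,
        error: str,
        memcache_snapshot: Optional[dict[str, Any]] = None,
        tool_log_snapshot: Optional[list[dict[str, Any]]] = None,
    ) -> None:
        """Record the failure; snapshots are optional and keep their defaults if omitted."""
        self.status = "failed"
        self.error = error
        if memcache_snapshot is not None:
            self.memcache_snapshot = memcache_snapshot
        if tool_log_snapshot is not None:
            self.tool_log_snapshot = tool_log_snapshot

    @classmethod
    def from_json(cls, text: str) -> "SessionSketch":
        """Load a session; keys missing from the JSON fall back to field defaults."""
        return cls(**json.loads(text))
```

Loading a JSON document written before this change (one with no snapshot keys) yields empty snapshots rather than an error, which is the property the backward-compatibility tests check.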

Backend snapshot methods

Adds snapshot_state() to both memcache backends:

  • SqliteBackend: Queries all distinct keys, returns merged values. For _log:-prefixed keys, uses get_log() when available (PR #4 adds it) or falls back to get_state().
  • MemcacheDictionaryFileBackend: Inflates from disk, returns a copy.deepcopy() of the in-memory dict. Deep copy prevents callers from accidentally mutating backend state through nested references.
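
The two snapshot_state() behaviours above might look roughly like the sketch below. Both classes are hypothetical stand-ins: the `kv` table schema, JSON-encoded values, and the `set_state`/`get_state` signatures are assumptions made for illustration, not the backends' actual storage layout.

```python
import copy
import json
import sqlite3
from typing import Any, Optional


class DictBackendSketch:
    """Hypothetical stand-in for the dictionary-file backend (disk I/O omitted)."""

    def __init__(self, state: Optional[dict[str, Any]] = None) -> None:
        self._state: dict[str, Any] = state if state is not None else {}

    def snapshot_state(self) -> dict[str, Any]:
        # Deep copy so callers cannot mutate backend state via nested objects.
        return copy.deepcopy(self._state)


class SqliteBackendSketch:
    """Hypothetical stand-in for the SQLite backend; the schema is assumed."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        conn.execute("CREATE TABLE IF NOT EXISTS kv (key TEXT PRIMARY KEY, value TEXT)")

    def set_state(self, key: str, value: Any) -> None:
        self.conn.execute(
            "INSERT OR REPLACE INTO kv (key, value) VALUES (?, ?)",
            (key, json.dumps(value)),
        )

    def get_state(self, key: str) -> Any:
        row = self.conn.execute("SELECT value FROM kv WHERE key = ?", (key,)).fetchone()
        return json.loads(row[0]) if row else None

    def snapshot_state(self) -> dict[str, Any]:
        # Collect keys first so the cursor is exhausted before further queries.
        keys = [r[0] for r in self.conn.execute("SELECT DISTINCT key FROM kv")]
        snapshot: dict[str, Any] = {}
        for key in keys:
            # Prefer get_log() for "_log:" keys if the backend provides it.
            get_log = getattr(self, "get_log", None) if key.startswith("_log:") else None
            snapshot[key] = get_log(key) if callable(get_log) else self.get_state(key)
        return snapshot
```

The `getattr` lookup mirrors the "when available" wording: this sketch defines no get_log(), so "_log:" keys fall back to get_state().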

Runner wiring (runner.py)

Both mark_failed call sites (retry exhaustion and must_complete failure) now capture:

  1. _snapshot_memcache_state() — instantiates the appropriate backend and calls snapshot_state()
  2. read_tool_log(_auto_save_dir) — reads the NDJSON auto-save log
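
The two capture steps can be pictured as a small helper that gathers both snapshots before calling mark_failed. This is only a sketch: `fail_with_forensics` and `FakeSession` are hypothetical names, and the callables are passed in explicitly so the example is self-contained, whereas runner.py calls its own functions at each site.

```python
from typing import Any, Callable


class FakeSession:
    """Hypothetical stand-in that records what mark_failed received."""

    def __init__(self) -> None:
        self.failure: dict[str, Any] = {}

    def mark_failed(self, error, memcache_snapshot=None, tool_log_snapshot=None) -> None:
        self.failure = {
            "error": error,
            "memcache_snapshot": memcache_snapshot,
            "tool_log_snapshot": tool_log_snapshot,
        }


def fail_with_forensics(
    session: Any,
    error: str,
    snapshot_memcache: Callable[[], dict[str, Any]],
    read_tool_log: Callable[[str], list[dict[str, Any]]],
    auto_save_dir: str,
) -> None:
    """Capture memcache state and the tool log, then mark the session failed."""
    memcache = snapshot_memcache()           # step 1: backend snapshot_state()
    tool_log = read_tool_log(auto_save_dir)  # step 2: NDJSON auto-save log
    session.mark_failed(
        error,
        memcache_snapshot=memcache,
        tool_log_snapshot=tool_log,
    )
```

Sharing one helper between the retry-exhaustion and must_complete paths is a design choice of this sketch, not something the PR states.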

Tests

  • TestSessionForensics (3 tests): round-trip with snapshots, backward-compatible without snapshots, old session JSON without new fields loads
  • TestSnapshotStateSqlite (2 tests): all keys returned, empty DB
  • TestSnapshotStateDictFile (2 tests): deep copy verified (nested mutation doesn't affect backend), empty state

Adds _tool_call_counter, _auto_save_interval, _auto_save_dir,
_write_auto_save, and _read_tool_log to run_main. When AUTO_SAVE_DIR
and AUTO_SAVE_INTERVAL are set, tool results are periodically appended
to an NDJSON log file. Disabled by default (interval=0).

Moves write_auto_save() and read_tool_log() from closures inside
run_main() to module-level functions with explicit parameters. Tests
now exercise the real implementation instead of duplicating the logic.

- Add encoding="utf-8" to open() in write_auto_save and read_tool_log
- Catch ValueError on non-numeric AUTO_SAVE_INTERVAL with fallback to 0
- Soften docstring from "crash-safe" to "append-only"

Adds memcache_snapshot and tool_log_snapshot fields to TaskflowSession.
On mark_failed, the runner captures current memcache state via
snapshot_state() and the auto-save tool log for post-mortem inspection.
Adds snapshot_state() to sqlite and dictionary_file backends.
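
For reviewers unfamiliar with the auto-save log mentioned in the commit notes, a minimal sketch of an append-only NDJSON writer and reader follows, together with the non-numeric interval fallback. The file name `tool_log.ndjson`, the `parse_interval` helper, and the exact signatures are assumptions for illustration; the module-level functions in the PR may differ.

```python
import json
import os
from typing import Any, Optional

LOG_NAME = "tool_log.ndjson"  # hypothetical file name, not taken from the PR


def parse_interval(raw: Optional[str]) -> int:
    """Parse AUTO_SAVE_INTERVAL; unset or non-numeric values disable auto-save (0)."""
    if not raw:
        return 0
    try:
        return int(raw)
    except ValueError:
        return 0


def write_auto_save(auto_save_dir: str, entry: dict[str, Any]) -> None:
    """Append one tool result as a single JSON line (append-only, not crash-safe)."""
    os.makedirs(auto_save_dir, exist_ok=True)
    path = os.path.join(auto_save_dir, LOG_NAME)
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry) + "\n")


def read_tool_log(auto_save_dir: str) -> list[dict[str, Any]]:
    """Read every entry from the log; a missing file yields an empty list."""
    path = os.path.join(auto_save_dir, LOG_NAME)
    if not os.path.exists(path):
        return []
    entries: list[dict[str, Any]] = []
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:  # skip blank lines
                entries.append(json.loads(line))
    return entries
```

One JSON object per line keeps each append independent of earlier content, so reading back is a line-by-line parse.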