[evolution] fix(#399): add SessionDB (state.db) scan path to introspection_extract #401

Closed
Da-Mikey wants to merge 1 commit into Lexus2016:main from Da-Mikey:fix/399-sessiondb-introspection

Conversation

@Da-Mikey

Contributor

[evolution]

Summary

The `build_digest()` function in `introspection_extract` previously scanned only `*.jsonl` transcripts and `request_dump_*.json` snapshots. On installations using the SQLite SessionDB (`state.db`), the `messages` table holds >90% of real sessions while on-disk files contain only error snapshots, which left the introspection pipeline blind to most sessions and biased toward errors.

What changed

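  • **Added `_sessions_from_db(db_path, cutoff)`**: reads the `messages` table grouped by `session_id` and ordered by `timestamp`, casting rows to the message-dict shape `scan_messages()` expects (parses `tool_calls` from JSON text for assistant turns, passes `tool_call_id` for tool results).
  • **Added `db_path` parameter to `build_digest()`**: a third scan path runs after the JSONL and request_dump scans, feeding each DB session through `scan_messages()` identically.
  • Auto-detection: `main()` and `evolution_trace_miner.py` now pass `state.db` from `HERMES_HOME` to `build_digest()`.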
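The DB scan path described above could look roughly like the sketch below. This is not the PR's code. The name `sessions_from_db_sketch` is hypothetical, the column types are assumed, `cutoff` is assumed to be a Unix timestamp in the same units as the `timestamp` column, and filtering individual messages (rather than whole sessions) by cutoff is also an assumption.

```python
import json
import sqlite3


def sessions_from_db_sketch(db_path, cutoff):
    """Return {session_id: [message dicts]} for rows with timestamp >= cutoff.

    Hypothetical helper: assumes a `messages` table with the columns
    session_id, role, content, tool_calls (JSON text), tool_call_id, timestamp.
    """
    sessions = {}
    conn = sqlite3.connect(db_path)
    try:
        conn.row_factory = sqlite3.Row
        rows = conn.execute(
            "SELECT session_id, role, content, tool_calls, tool_call_id "
            "FROM messages WHERE timestamp >= ? "
            "ORDER BY session_id, timestamp",
            (cutoff,),
        )
        for row in rows:
            # Normalise NULL content to an empty string.
            msg = {"role": row["role"], "content": row["content"] or ""}
            if row["role"] == "assistant" and row["tool_calls"]:
                try:
                    # tool_calls is stored as JSON text; decode it to a list.
                    msg["tool_calls"] = json.loads(row["tool_calls"])
                except json.JSONDecodeError:
                    pass  # keep the message, drop the malformed tool_calls
            if row["role"] == "tool" and row["tool_call_id"]:
                msg["tool_call_id"] = row["tool_call_id"]
            sessions.setdefault(row["session_id"], []).append(msg)
    finally:
        conn.close()
    return sessions
```

Each value in the returned dict is one session's ordered message list, which a caller could hand to `scan_messages()` the same way the file-based paths do.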

Before/After

| Metric | Before | After |
|---|---|---|
| Sessions scanned (7d) | 11 (error-biased request dumps) | 338 (full conversation DB) |
| Database sessions visible | 0 | >90% of real sessions |

Files changed

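  • `introspection_extract`: +100 lines (new `_sessions_from_db()` function, `db_path` param, `main()` update)
  • `evolution_trace_miner.py`: +4 lines (pass `db_path` to `build_digest()`)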

All 16 existing tests pass.

…on_extract

The build_digest() function previously scanned only *.jsonl transcripts and
request_dump_*.json snapshots from the sessions directory. On installations
that persist conversation history via the SQLite SessionDB (state.db), the
messages table holds >90% of real sessions, while on-disk files contain only
error snapshots -- making the entire introspection pipeline blind and error-biased.

Changes:
1. Added _sessions_from_db(db_path, cutoff) -- reads the messages table
   grouped by session_id, ordered by timestamp, casting each row to the
   same message-dict shape scan_messages() consumes (role, content,
   tool_calls parsed from JSON text, tool_call_id).
2. Added optional db_path parameter to build_digest() -- when provided and
   the file exists, each session from the DB is fed through scan_messages()
   and aggregated identically to the file-based paths.
3. Updated main() and evolution_trace_miner.py to auto-detect
   state.db under HERMES_HOME and pass it to build_digest().
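The auto-detection in step 3 might be sketched as follows. The helper name `detect_state_db` is hypothetical. Returning `None` when `HERMES_HOME` is unset is an assumption, since the source does not say what default location the real code uses, if any.

```python
import os
from pathlib import Path


def detect_state_db(env=None):
    """Return the path to state.db under HERMES_HOME, or None if absent.

    Hypothetical helper: `env` defaults to os.environ and exists only so
    the lookup can be exercised without touching the real environment.
    """
    env = os.environ if env is None else env
    home = env.get("HERMES_HOME")
    if not home:
        return None  # assumption: no fallback directory when unset
    candidate = Path(home) / "state.db"
    # Only report the DB when the file exists, so callers can skip the
    # DB scan path on installations that have no SessionDB.
    return candidate if candidate.is_file() else None
```

A caller would then pass the result as `db_path` to `build_digest()`, leaving the file-based scans unchanged when it is `None`.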

Before: 11 sessions scanned (from error-biased request dumps only)
After:  338 sessions scanned (from the real conversation database)

Resolves Lexus2016#399
@Da-Mikey

Contributor Author

Closing — superseded by PR #402 (merged 9526b75). The SessionDB scan path and timeout gating fixes from #399/#400 are now on main. Thanks for the original contribution @Da-Mikey!

@Da-Mikey Da-Mikey closed this Jun 20, 2026