[FIX] introspection_extract.py ignores SessionDB (state.db) — blind to >90% of real sessions #399

## Problem (observed in real usage)

The `introspection_extract.py` pre-extract script scans only on-disk files (`*.jsonl` transcripts and `request_dump_*.json` snapshots) under `~/.hermes/sessions/`. However, on installations that persist sessions via the SQLite `SessionDB` (`state.db`), the actual conversation history lives in the `messages` table — NOT in on-disk files. No `*.jsonl` files exist, and `request_dump_*.json` files are only error snapshots (written when a provider request fails).
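To check which of the two storage layouts an install uses, a small diagnostic can count what the extractor's current globs would find and report whether a SessionDB file is present. This is a minimal sketch: `describe_session_sources` is a hypothetical helper, and the location of `state.db` is an assumption passed in by the caller, because the issue does not say where the file lives.

```python
from pathlib import Path


def describe_session_sources(sessions_dir: Path, db_path: Path) -> dict:
    """Report what the existing file globs see versus whether a SessionDB exists.

    Hypothetical helper; `db_path` is an assumed location for state.db.
    """
    has_dir = sessions_dir.is_dir()
    # Same patterns build_digest() uses today.
    jsonl = list(sessions_dir.glob("*.jsonl")) if has_dir else []
    dumps = list(sessions_dir.glob("request_dump_*.json")) if has_dir else []
    return {
        "jsonl_transcripts": len(jsonl),
        "request_dumps": len(dumps),
        "sessiondb_present": db_path.is_file(),
    }
```

On an affected install this would report zero transcripts, a handful of request dumps, and `sessiondb_present: True`.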

### Evidence (aggregated, anonymized)

- Frequency: 100% of runs on SessionDB-backed installs. Over the last 7 days, the extractor reported `sessions_scanned: 11` (from request dumps only), while the SessionDB holds **108 active sessions** (4,502 messages) for the same window. The pipeline sees roughly 10% (11 of 108) of real sessions, and that slice is biased toward error cases.
- Tool/area involved: `scripts/introspection_extract.py` → `build_digest()` → `sessions_dir.glob("*.jsonl")` and `sessions_dir.glob("request_dump_*.json")`
- Failure shape: `sessions_dir` contains zero `.jsonl` files and only error-snapshot `request_dump_*.json`; `state.db` table `messages` (schema: id, session_id, role, content, tool_call_id, tool_calls, tool_name, timestamp, ...) is never queried. Every prior introspection cycle operated on this biased 10% sample.
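The coverage gap in the first bullet can be reproduced by counting distinct `session_id` values in `messages` for the same window and comparing that against the extractor's `sessions_scanned`. A minimal sketch, assuming `timestamp` holds Unix epoch seconds and that a session counts as "active" if it has at least one message in the window; both helper names are hypothetical.

```python
import sqlite3
import time
from typing import Optional, Tuple


def sessiondb_window_counts(db_path: str, days: int = 7,
                            now: Optional[float] = None) -> Tuple[int, int]:
    """Return (distinct sessions, messages) in `messages` over the last `days` days.

    Assumes `timestamp` is Unix epoch seconds; adapt the cutoff if the
    column actually stores ISO-8601 strings.
    """
    cutoff = (time.time() if now is None else now) - days * 86400
    conn = sqlite3.connect(db_path)
    try:
        sessions, messages = conn.execute(
            "SELECT COUNT(DISTINCT session_id), COUNT(*) "
            "FROM messages WHERE timestamp >= ?",
            (cutoff,),
        ).fetchone()
    finally:
        conn.close()
    return sessions, messages


def coverage(scanned: int, total_sessions: int) -> float:
    """Fraction of real sessions the extractor actually saw."""
    return scanned / total_sessions if total_sessions else 0.0
```

With the figures above, `coverage(11, 108)` comes out at about 0.10.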

### Impact on real tasks

The entire self-improvement loop is working on a sample of <10% of sessions, and that sample is **error-biased** (request dumps are only written on failures). Successful sessions — where the agent works well and where patterns of efficiency, misunderstanding, and capability gaps actually live — are invisible. Introspection cannot find what it cannot see. Every pattern detection, issue filing, and realized-impact verification based on this data is working from a non-representative sample.

### Proposed direction

Add a third scan path to `build_digest()` that reads from the SessionDB SQLite database (`state.db` → `messages` table) when it exists. The table already carries `role`, `content`, `tool_calls`, `tool_name`, `tool_call_id`, and `timestamp` — the same fields `scan_messages()` already consumes. Group by `session_id`, order by `timestamp`, and pass each session's messages through the existing `scan_messages()`. This makes the extractor see 100% of sessions instead of <10%.
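The proposed third scan path could look roughly like the generator below. It is a sketch under assumptions the issue does not confirm: `tool_calls` is stored as JSON text, `timestamp` sorts chronologically, and `scan_messages()` accepts a list of dicts keyed by the column names. `iter_sessiondb_sessions` is a hypothetical name.

```python
import json
import sqlite3
from itertools import groupby
from pathlib import Path
from typing import Iterator, List, Optional, Tuple


def iter_sessiondb_sessions(db_path: str,
                            since: Optional[float] = None
                            ) -> Iterator[Tuple[str, List[dict]]]:
    """Yield (session_id, messages) from state.db, one session at a time."""
    # Read-only URI so the extractor never writes to the live database.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    conn.row_factory = sqlite3.Row
    try:
        sql = ("SELECT session_id, role, content, tool_calls, tool_name, "
               "tool_call_id, timestamp FROM messages")
        params: tuple = ()
        if since is not None:
            # Filters per message, so a session straddling `since` is partial.
            sql += " WHERE timestamp >= ?"
            params = (since,)
        # Sorting by session first lets groupby stream contiguous sessions;
        # `id` breaks ties between messages sharing a timestamp.
        sql += " ORDER BY session_id, timestamp, id"
        rows = conn.execute(sql, params)
        for session_id, group in groupby(rows, key=lambda r: r["session_id"]):
            messages = []
            for r in group:
                msg = {"role": r["role"], "content": r["content"],
                       "timestamp": r["timestamp"]}
                if r["tool_calls"]:
                    msg["tool_calls"] = json.loads(r["tool_calls"])  # assumed JSON text
                if r["tool_name"]:
                    msg["tool_name"] = r["tool_name"]
                if r["tool_call_id"]:
                    msg["tool_call_id"] = r["tool_call_id"]
                messages.append(msg)
            yield session_id, messages
    finally:
        conn.close()
```

Each yielded `(session_id, messages)` pair would then be handed to the existing `scan_messages()` inside `build_digest()`, alongside the two file-based paths. Sorting by `session_id` before grouping lets `itertools.groupby` build one session's message list at a time rather than materializing every message up front.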

### Value

- Impact: 1.0 — the entire introspection pipeline is blind to >90% of real sessions
- Effort: 0.4 — add a SQLite cursor path reusing the existing `scan_messages()` function
- Priority Score: 1.68
