Commit aa3d023

feat: SessionStore.list_session_summaries — batch summary fetch for list-all (#847)
## Problem

`list_sessions_from_store()` calls `store.list_sessions()` for IDs, then `store.load()` **per session** to derive title/first-prompt/branch/etc. With N sessions that's N full-transcript round-trips just to render a list. #846 (`load_range`) cut bytes-per-session but not round-trips: 500 sessions still meant 500+ store calls.

## Public API change (additive only, no breaking changes)

```python
# New TypedDict (3 fields; `data` is opaque, stores persist verbatim)
class SessionSummaryEntry(TypedDict):
    session_id: str
    mtime: int
    data: dict[str, Any]

# New helper: adapters call this from append()
def fold_session_summary(
    prev: SessionSummaryEntry | None,
    key: SessionKey,
    entries: list[SessionStoreEntry],
) -> SessionSummaryEntry: ...

# New optional Protocol method (default: raise NotImplementedError)
class SessionStore(Protocol):
    async def list_session_summaries(self, project_key: str) -> list[SessionSummaryEntry]: ...
```

**Unchanged:** `SessionStore.{append,load,list_sessions,delete,list_subkeys}`, `list_sessions_from_store()` signature/return, `get_session_info_from_store()`, `SDKSessionInfo`, `SessionStoreListEntry`, `InMemorySessionStore` public methods. Stores that don't implement `list_session_summaries` work exactly as before.

## Approach

Every field the list view needs is **append-incremental**: 4 are set-once from the first entries (`is_sidechain`, `created_at`, `cwd`, `first_prompt`), 7 are last-write-wins from the tail (`custom_title`, `ai_title`, `last_prompt`, `summary_hint`, `git_branch`, `tag`, `mtime`), 0 require a full scan. So a store can maintain a small summary record alongside each session, updated inside `append()` with the entries already in hand, with no re-reads.

This PR adds:

- **`SessionSummaryEntry`** (TypedDict): 3-field record (`session_id`, `mtime`, opaque `data`). Stores persist it verbatim and never interpret `data`.
- **`fold_session_summary(prev, key, entries)`**: pure helper that folds new entries into the previous summary. Adapters call this from `append()` so derivation logic lives in one place (no per-adapter drift). `created_at` latches the first *parseable* entry timestamp, a documented divergence from the lite-parse path only when the very first entry lacks a timestamp (never happens in CLI-produced transcripts).
- **`SessionStore.list_session_summaries(project_key)`**: optional Protocol method returning all summaries for a project in one call.
- **Fast path in `list_sessions_from_store()`**: when the store implements `list_session_summaries`, build a unified slot list (summary-derived slots + gap-fill placeholders for sessions present in `list_sessions()` but lacking a sidecar), sort by `mtime`, apply `offset`/`limit`, then `load()` only the placeholders that landed in the page. Summary-backed sidechain/empty sessions are pre-filtered *before* pagination so they don't consume page positions (matching disk/slow-path filter-then-paginate); only gap-fill placeholders that resolve to `None` after load can short-page, so a store with complete sidecars never short-pages. `load()` count is bounded by page size, not total missing; there are zero `load()` calls when sidecars are complete. Gap-fill is best-effort: if the store lacks `list_sessions` it's skipped with a debug log. Otherwise falls back to the existing per-session `load()` path (bounded at 16 concurrent).
- **`InMemorySessionStore`** reference impl (entry-timestamp-derived `mtime` so fast/slow paths sort on one clock) + conformance contract #14.

`get_session_info_from_store()` (single session) is unchanged; a full `load()` is fine there.

## Correctness

`tests/test_session_summary.py::TestParityWithLiteParse` proves `summary_entry_to_sdk_info(fold_session_summary(...))` produces the same `SDKSessionInfo` as the existing `_parse_session_info_from_lite` batch path on the same entry stream.
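The fast path described in the list above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code in `_internal/sessions.py`: `Slot`, `paginate_fast_path`, and the `load_info` callback are hypothetical names, the summary pre-filter is reduced to "info is `None`", the staleness check is omitted, and newest-first ordering is an assumption (the description only says "sort by `mtime`").

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Slot:
    """One candidate row; `info` is None for a gap-fill placeholder (hypothetical type)."""

    session_id: str
    mtime: int
    info: Optional[str] = None


def paginate_fast_path(
    summaries: dict[str, tuple[int, Optional[str]]],  # id -> (mtime, info or None if filtered out)
    listed: dict[str, int],  # id -> mtime, as a list_sessions() stand-in
    load_info: Callable[[str], Optional[str]],  # stand-in for a full load() + parse
    offset: int = 0,
    limit: Optional[int] = None,
) -> list[str]:
    # Summary-backed sessions are filtered BEFORE pagination, so sidechain or
    # empty sessions never consume a page position.
    slots = [Slot(sid, m, info) for sid, (m, info) in summaries.items() if info is not None]
    # Gap-fill: sessions that are listed but have no sidecar become placeholders.
    slots += [Slot(sid, m) for sid, m in listed.items() if sid not in summaries]
    # Assumption: newest first.
    slots.sort(key=lambda s: s.mtime, reverse=True)
    page = slots[offset:] if limit is None else slots[offset : offset + limit]
    out: list[str] = []
    for slot in page:
        # Only placeholders that landed in the page trigger a load.
        info = slot.info if slot.info is not None else load_info(slot.session_id)
        if info is not None:
            out.append(info)
    return out


calls: list[str] = []


def fake_load(session_id: str) -> Optional[str]:
    calls.append(session_id)
    return session_id.upper() + "-loaded"


page = paginate_fast_path(
    summaries={"a": (30, "A"), "b": (20, None)},
    listed={"a": 30, "b": 20, "c": 25, "d": 10},
    load_info=fake_load,
    limit=2,
)
assert page == ["A", "C-loaded"]
assert calls == ["c"]  # "d" fell outside the page, so it was never loaded
```

With complete sidecars the placeholder list is empty and `load_info` is never called, which is the "zero `load()` calls" property the description claims.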
## For reviewers

- `_internal/session_summary.py`: fold logic, esp. first-prompt skip filter (mirrors `_extract_first_prompt_from_head`)
- `_internal/sessions.py` fast-path block: fallback semantics
- `types.py`: public surface (`SessionSummaryEntry`, `fold_session_summary`, `list_session_summaries`)

Supersedes #846. TS port: anthropics/claude-cli-internal#30520.
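To make the "append-incremental" claim concrete, here is a toy fold, assuming a trimmed-down field set. `toy_fold` and `LAST_WINS` are hypothetical stand-ins and not the SDK's `fold_session_summary`; the sketch only shows that set-once and last-wins fields give the same result whether entries arrive in one batch or several.

```python
from __future__ import annotations

from typing import Any

# Hypothetical, trimmed-down field map; the real module tracks more fields.
LAST_WINS = {"customTitle": "custom_title", "gitBranch": "git_branch"}


def toy_fold(prev: dict[str, Any] | None, entries: list[dict[str, Any]]) -> dict[str, Any]:
    """Fold a batch into a copy of `prev` without re-reading earlier entries."""
    data = dict(prev or {})
    for entry in entries:
        # Set-once: the first entry carrying a usable value wins.
        cwd = entry.get("cwd")
        if "cwd" not in data and isinstance(cwd, str) and cwd:
            data["cwd"] = cwd
        # Last-wins: every later string value overwrites the earlier one.
        for src, dst in LAST_WINS.items():
            val = entry.get(src)
            if isinstance(val, str):
                data[dst] = val
    return data


batch1 = [{"cwd": "/repo", "gitBranch": "main"}]
batch2 = [{"cwd": "/elsewhere", "gitBranch": "feature", "customTitle": "Fix login"}]

incremental = toy_fold(toy_fold(None, batch1), batch2)
one_shot = toy_fold(None, batch1 + batch2)
assert incremental == one_shot
assert incremental == {"cwd": "/repo", "git_branch": "feature", "custom_title": "Fix login"}
```

Because the fold copies `prev` instead of mutating it, an adapter can retry a failed write with the same inputs.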
1 parent a05af6b commit aa3d023

8 files changed

Lines changed: 1481 additions & 62 deletions

src/claude_agent_sdk/__init__.py

Lines changed: 4 additions & 0 deletions
@@ -37,6 +37,7 @@
     tag_session_via_store,
 )
 from ._internal.session_store import InMemorySessionStore, project_key_for_directory
+from ._internal.session_summary import fold_session_summary
 from ._internal.sessions import (
     get_session_info,
     get_session_info_from_store,
@@ -109,6 +110,7 @@
     SessionStore,
     SessionStoreEntry,
     SessionStoreListEntry,
+    SessionSummaryEntry,
     SettingSource,
     StopHookInput,
     StreamEvent,
@@ -602,8 +604,10 @@ async def call_tool(name: str, arguments: dict[str, Any]) -> Any:
     "SessionStore",
     "SessionStoreEntry",
     "SessionStoreListEntry",
+    "SessionSummaryEntry",
     "SessionListSubkeysKey",
     "InMemorySessionStore",
+    "fold_session_summary",
     "MirrorErrorMessage",
     "project_key_for_directory",
     # Session listing (SessionStore-backed async variants)

src/claude_agent_sdk/_internal/session_store.py

Lines changed: 42 additions & 1 deletion
@@ -12,7 +12,9 @@
     SessionStore,
     SessionStoreEntry,
     SessionStoreListEntry,
+    SessionSummaryEntry,
 )
+from .session_summary import fold_session_summary
 from .sessions import project_key_for_directory

 __all__ = [
@@ -41,11 +43,42 @@ class InMemorySessionStore(SessionStore):
     def __init__(self) -> None:
         self._store: dict[str, list[SessionStoreEntry]] = {}
         self._mtimes: dict[str, int] = {}
+        self._summaries: dict[tuple[str, str], SessionSummaryEntry] = {}
+        self._last_mtime = 0
+
+    def _next_mtime(self) -> int:
+        """Storage write time for this adapter, in Unix epoch ms.
+
+        Guaranteed strictly monotonically increasing across calls within the
+        process so back-to-back appends always produce distinct mtimes (real
+        storage backends — file mtime on modern filesystems, S3
+        LastModified, Postgres updated_at — get this property for free from
+        their commit ordering).
+        """
+        now_ms = int(time.time() * 1000)
+        if now_ms <= self._last_mtime:
+            now_ms = self._last_mtime + 1
+        self._last_mtime = now_ms
+        return now_ms

     async def append(self, key: SessionKey, entries: list[SessionStoreEntry]) -> None:
         k = _key_to_string(key)
         self._store.setdefault(k, []).extend(entries)
-        self._mtimes[k] = int(time.time() * 1000)
+        now_ms = self._next_mtime()
+        # Maintain the per-session summary sidecar incrementally so
+        # list_session_summaries() never re-reads. Subagent subpaths don't
+        # contribute to the main session's summary.
+        if key.get("subpath") is None:
+            sk = (key["project_key"], key["session_id"])
+            folded = fold_session_summary(self._summaries.get(sk), key, entries)
+            # Stamp the sidecar with this adapter's storage write time — the
+            # SAME clock list_sessions() exposes below. SessionSummaryEntry.
+            # mtime is contractually storage write time (not entry time), so
+            # the fast-path staleness check (summary.mtime < list_sessions
+            # mtime) works correctly.
+            folded["mtime"] = now_ms
+            self._summaries[sk] = folded
+        self._mtimes[k] = now_ms

     async def load(self, key: SessionKey) -> list[SessionStoreEntry] | None:
         entries = self._store.get(_key_to_string(key))
@@ -64,6 +97,11 @@ async def list_sessions(self, project_key: str) -> list[SessionStoreListEntry]:
                 )
         return results

+    async def list_session_summaries(
+        self, project_key: str
+    ) -> list[SessionSummaryEntry]:
+        return [s for (pk, _), s in self._summaries.items() if pk == project_key]
+
     async def delete(self, key: SessionKey) -> None:
         k = _key_to_string(key)
         self._store.pop(k, None)
@@ -72,6 +110,7 @@ async def delete(self, key: SessionKey) -> None:
         # transcripts, metadata) so they aren't orphaned. A targeted delete
         # with an explicit subpath removes only that one entry.
         if key.get("subpath") is None:
+            self._summaries.pop((key["project_key"], key["session_id"]), None)
             prefix = f"{key['project_key']}/{key['session_id']}/"
             for store_key in [sk for sk in self._store if sk.startswith(prefix)]:
                 self._store.pop(store_key, None)
@@ -103,6 +142,8 @@ def clear(self) -> None:
         """Test helper — clear all stored data."""
         self._store.clear()
         self._mtimes.clear()
+        self._summaries.clear()
+        self._last_mtime = 0


 def file_path_to_session_key(file_path: str, projects_dir: str) -> SessionKey | None:
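The strictly increasing millisecond clock that `_next_mtime()` introduces can be tried in isolation. The sketch below lifts the same three-line bump into a hypothetical `MonotonicMsClock` class (not part of the SDK) and checks that a tight loop, which will usually hit the same wall-clock millisecond many times, still yields distinct, ordered stamps.

```python
import time


class MonotonicMsClock:
    """Hypothetical standalone version of the strictly increasing ms clock."""

    def __init__(self) -> None:
        self._last = 0

    def next(self) -> int:
        now_ms = int(time.time() * 1000)
        # If the wall clock has not advanced (or went backwards), bump by 1 ms
        # so two consecutive calls can never return the same value.
        if now_ms <= self._last:
            now_ms = self._last + 1
        self._last = now_ms
        return now_ms


clock = MonotonicMsClock()
stamps = [clock.next() for _ in range(1000)]
assert all(a < b for a, b in zip(stamps, stamps[1:]))
```

One consequence of this design is that under sustained bursts the stamps can run slightly ahead of wall-clock time. That seems acceptable for an in-memory reference store where ordering matters more than absolute accuracy.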
src/claude_agent_sdk/_internal/session_summary.py

Lines changed: 233 additions & 0 deletions
@@ -0,0 +1,233 @@
+"""Incremental session-summary derivation for :class:`SessionStore` adapters.
+
+:func:`fold_session_summary` lets a store maintain a per-session
+:class:`SessionSummaryEntry` sidecar incrementally inside ``append()`` so
+``list_sessions_from_store()`` can fetch all metadata in a single
+``list_session_summaries()`` call instead of N per-session ``load()`` calls.
+
+Every derived field is append-incremental (set-once or last-wins) so adapters
+never need to re-read previously appended entries.
+"""
+
+from __future__ import annotations
+
+from datetime import datetime
+from typing import Any, cast
+
+from ..types import (
+    SDKSessionInfo,
+    SessionKey,
+    SessionStoreEntry,
+    SessionSummaryEntry,
+)
+from .sessions import _COMMAND_NAME_RE, _SKIP_FIRST_PROMPT_PATTERN
+
+__all__ = ["fold_session_summary", "summary_entry_to_sdk_info"]
+
+
+# Map of JSONL entry keys → SessionSummaryEntry keys for last-wins string
+# fields. Each appended entry overwrites the previous value when present.
+_LAST_WINS_FIELDS: dict[str, str] = {
+    "customTitle": "custom_title",
+    "aiTitle": "ai_title",
+    "lastPrompt": "last_prompt",
+    "summary": "summary_hint",
+    "gitBranch": "git_branch",
+}
+
+
+def _iso_to_epoch_ms(ts: Any) -> int | None:
+    """Parse an ISO-8601 timestamp string to Unix epoch milliseconds."""
+    if not isinstance(ts, str):
+        return None
+    try:
+        # Python 3.10's fromisoformat doesn't support trailing 'Z'
+        norm = ts.replace("Z", "+00:00") if ts.endswith("Z") else ts
+        return int(datetime.fromisoformat(norm).timestamp() * 1000)
+    except ValueError:
+        return None
+
+
+def _entry_text_blocks(entry: dict[str, Any]) -> list[str]:
+    """Extract text strings from a ``type=="user"`` entry's message content."""
+    message = entry.get("message")
+    if not isinstance(message, dict):
+        return []
+    content = message.get("content")
+    texts: list[str] = []
+    if isinstance(content, str):
+        texts.append(content)
+    elif isinstance(content, list):
+        for block in content:
+            if (
+                isinstance(block, dict)
+                and block.get("type") == "text"
+                and isinstance(block.get("text"), str)
+            ):
+                texts.append(block["text"])
+    return texts
+
+
+def _fold_first_prompt(data: dict[str, Any], entry: dict[str, Any]) -> None:
+    """Replicate ``_extract_first_prompt_from_head`` for a single parsed entry.
+
+    Mutates ``data`` in place: sets ``first_prompt`` + ``first_prompt_locked``
+    on a real match, or stashes a ``command_fallback`` for slash-command
+    messages. Skips tool_result, isMeta, isCompactSummary, and auto-generated
+    patterns.
+    """
+    if data.get("first_prompt_locked"):
+        return
+    if entry.get("type") != "user":
+        return
+    if entry.get("isMeta") is True or entry.get("isCompactSummary") is True:
+        return
+    # Skip tool_result-carrying user messages.
+    message = entry.get("message")
+    if isinstance(message, dict):
+        content = message.get("content")
+        if isinstance(content, list) and any(
+            isinstance(b, dict) and b.get("type") == "tool_result" for b in content
+        ):
+            return
+
+    for raw in _entry_text_blocks(entry):
+        result = raw.replace("\n", " ").strip()
+        if not result:
+            continue
+        cmd_match = _COMMAND_NAME_RE.search(result)
+        if cmd_match:
+            if not data.get("command_fallback"):
+                data["command_fallback"] = cmd_match.group(1)
+            continue
+        if _SKIP_FIRST_PROMPT_PATTERN.match(result):
+            continue
+        if len(result) > 200:
+            result = result[:200].rstrip() + "\u2026"
+        data["first_prompt"] = result
+        data["first_prompt_locked"] = True
+        return
+
+
+def fold_session_summary(
+    prev: SessionSummaryEntry | None,
+    key: SessionKey,
+    entries: list[SessionStoreEntry],
+) -> SessionSummaryEntry:
+    """Fold a batch of appended entries into the running summary for ``key``.
+
+    Stores call this from inside ``append()`` to keep a
+    :class:`SessionSummaryEntry` sidecar up to date without re-reading the
+    transcript. ``prev`` is the previous summary for the same key (or ``None``
+    for the first append).
+
+    Do not call this for keys with a ``subpath`` — subagent transcripts must
+    not contribute to the main session's summary. Guard with
+    ``if key.get("subpath") is None:`` before calling.
+
+    All derived state lives in the opaque ``data`` dict; stores persist it
+    verbatim and do not interpret it.
+
+    ``mtime`` is NOT touched by the fold — it is the sidecar's storage
+    write time and must be stamped by the adapter after persisting. It has
+    to share a clock with the ``mtime`` returned by
+    :meth:`SessionStore.list_sessions` for the same session (typically file
+    mtime, S3 ``LastModified``, Postgres ``updated_at``, or whatever native
+    timestamp the adapter surfaces); deriving it from entry ISO timestamps
+    would make every batched-write sidecar appear strictly older than the
+    session's current mtime, defeating the fast-path staleness check. For a
+    new session (``prev is None``) the fold returns ``mtime=0`` as a
+    placeholder; the adapter is expected to overwrite it.
+
+    ``created_at`` latches the first parseable entry timestamp; the disk
+    lite-parse only inspects the first line, so for streams whose first
+    entry lacks a timestamp (does not occur in CLI-produced transcripts)
+    the fold path yields a non-``None`` ``created_at`` where lite-parse
+    yields ``None``.
+    """
+    if prev is not None:
+        summary: SessionSummaryEntry = {
+            "session_id": prev["session_id"],
+            "mtime": prev["mtime"],
+            "data": dict(prev["data"]),
+        }
+    else:
+        summary = {"session_id": key["session_id"], "mtime": 0, "data": {}}
+    data = summary["data"]
+
+    for raw in entries:
+        # SessionStoreEntry is a permissive TypedDict; widen to a plain dict
+        # so .get() of unknown keys type-checks.
+        entry = cast("dict[str, Any]", raw)
+
+        ms = _iso_to_epoch_ms(entry.get("timestamp"))
+
+        if "is_sidechain" not in data:
+            data["is_sidechain"] = entry.get("isSidechain") is True
+        if "created_at" not in data and ms is not None:
+            data["created_at"] = ms
+
+        if "cwd" not in data:
+            cwd = entry.get("cwd")
+            if isinstance(cwd, str) and cwd:
+                data["cwd"] = cwd
+
+        _fold_first_prompt(data, entry)
+
+        for src, dst in _LAST_WINS_FIELDS.items():
+            val = entry.get(src)
+            if isinstance(val, str):
+                data[dst] = val
+
+        if entry.get("type") == "tag":
+            tag_val = entry.get("tag")
+            if isinstance(tag_val, str) and tag_val:
+                data["tag"] = tag_val
+            else:
+                # Empty string or absent tag clears the tag.
+                data.pop("tag", None)
+
+    return summary
+
+
+def summary_entry_to_sdk_info(
+    entry: SessionSummaryEntry, project_path: str | None
+) -> SDKSessionInfo | None:
+    """Convert a :class:`SessionSummaryEntry` to :class:`SDKSessionInfo`.
+
+    Returns ``None`` for sidechain sessions or sessions with no extractable
+    summary, matching ``_parse_session_info_from_lite``'s filtering.
+    """
+    data = entry["data"]
+    if data.get("is_sidechain"):
+        return None
+
+    first_prompt = (
+        data.get("first_prompt")
+        if data.get("first_prompt_locked")
+        else data.get("command_fallback")
+    ) or None
+    custom_title = data.get("custom_title") or data.get("ai_title") or None
+    summary = (
+        custom_title
+        or data.get("last_prompt")
+        or data.get("summary_hint")
+        or first_prompt
+    )
+    if not summary:
+        return None
+
+    return SDKSessionInfo(
+        session_id=entry["session_id"],
+        summary=summary,
+        last_modified=entry["mtime"],
+        # file_size is a JSONL byte count — meaningful only for the local-disk
+        # path (see SDKSessionInfo.file_size). Stores have no equivalent.
+        file_size=None,
+        custom_title=custom_title,
+        first_prompt=first_prompt,
+        git_branch=data.get("git_branch") or None,
+        cwd=data.get("cwd") or project_path or None,
+        tag=data.get("tag") or None,
+        created_at=data.get("created_at"),
+    )
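The fallback order that `summary_entry_to_sdk_info` uses to pick a display summary is easy to misread in diff form, so here it is as a small standalone sketch. `pick_summary` is a hypothetical helper that copies only the precedence chain from the diff above. It takes the opaque `data` dict directly and does not build an `SDKSessionInfo`.

```python
from __future__ import annotations

from typing import Any


def pick_summary(data: dict[str, Any]) -> str | None:
    """Hypothetical helper mirroring the fallback order shown in the diff."""
    # A locked first prompt beats a slash-command fallback.
    first_prompt = (
        data.get("first_prompt")
        if data.get("first_prompt_locked")
        else data.get("command_fallback")
    ) or None
    # A user-set title beats an AI-generated one.
    title = data.get("custom_title") or data.get("ai_title") or None
    return title or data.get("last_prompt") or data.get("summary_hint") or first_prompt


assert pick_summary({"ai_title": "Refactor", "last_prompt": "run tests"}) == "Refactor"
assert pick_summary({"first_prompt": "hi", "first_prompt_locked": True, "last_prompt": "bye"}) == "bye"
assert pick_summary({"command_fallback": "compact"}) == "compact"
assert pick_summary({}) is None  # such a session is dropped from the listing
```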
