Commit 3202869

Add feed-memex command: live MCP memory via memex-mvp integration
End-to-end verified on real data: 16 sessions fed into memex's inbox (watched by chokidar), all 1381 dialogue messages indexed in SQLite + FTS5, and memex_search returns hits from Cowork sessions immediately. Example search hit: 'SberBusiness' → cowork-77f2b3f4 transcript.

The most-automated workflow described on slide 11 of the strategy deck is now real.

WHAT IT DOES

`claude-backup feed-memex` finds every Code + Cowork session, parses each via our existing parser (real-format-aware, dialogue-only filter, thinking-signature stripping), and writes a clean JSONL to ~/.memex/inbox/ in the FLAT shape memex's claude-code parser expects:

    {"role":"user","content":"...","timestamp":"...","id":"..."}
    {"role":"assistant","content":"...","timestamp":"...","id":"..."}

Filename convention:

    code-<8-char-id>.jsonl      Claude Code sessions
    cowork-<8-char-id>.jsonl    Claude Cowork sessions

Memex picks them up via chokidar within seconds, indexes them via SQLite + FTS5, and exposes them to any MCP-compatible AI agent (Cursor, Cline, Claude Code, Continue, Zed) through memex_search / memex_recent / memex_get_conversation.

WHY JSONL NOT MARKDOWN

Investigated both. Markdown would require a new parser in memex AND would lose the structured timestamp, role, and id fields (they would have to be recovered by regex from heading markers, which is fragile). Memex's existing claude-code parser already reads flat top-level role/content/timestamp, so feeding clean JSONL composes with zero patches in memex (cowork-prefix support is a tiny bonus patch).

WHY NOT JUST SYMLINK RAW JSONL

Memex's existing claude-code parser was written against the original spec (top-level role/content). Real Claude Code uses nested message.role/content, which memex's parser silently drops as 'no text'. Symlinking raw JSONL would feed memex an empty index. Pre-processing in claude-backup uses our real-format-aware parser, applies dialogue_text() to filter tool_use / thinking / signatures, then emits flat-format JSONL that memex can index.

DESIGN CHOICES IN SERVICE OF AUTOMATION

- Stable msg ids (sha1 of role+timestamp+content[:200]) so re-runs of feed-memex don't duplicate (memex's UNIQUE(source, conversation_id, msg_id) constraint handles the rest).
- Dialogue-only filter: tool_use / tool_result / thinking blocks are all stripped, plus role gating (user/assistant only). The role gate drops legacy flat tool_result messages with string content that previously slipped through is_dialogue_message.
- Two-source aware: the scanner already discovers both Code and Cowork; the filename prefix ('code-' vs 'cowork-') lets memex tag the source correctly when running with the cowork-prefix patch.

CODE

- cli.py: feed_memex_cmd, _to_memex_record helper, DEFAULT_MEMEX_INBOX constant; --dry-run and --inbox flags.
- 8 new tests covering: file creation, prefix conventions for both sources, JSONL record shape (matches memex parser expectations), noise stripping (tool_use, thinking, signatures), msg id stability across runs, dry-run behaviour, missing-memex error path. 114/114 tests pass.

DOCS

README + README.ru gain a new 'Live MCP memory' subsection under the handoff section, explaining the Cursor-agent workflow and the filename convention.
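The idempotency argument in the commit message (stable ids plus a UNIQUE constraint) can be illustrated with a small sketch. The hashing recipe follows the message; the table layout and the `INSERT OR IGNORE` statement are assumptions made for illustration, since memex's actual schema and insert logic are not shown in this commit.

```python
import hashlib
import sqlite3


def stable_msg_id(role: str, timestamp: str, text: str) -> str:
    """Hash role, timestamp and the first 200 characters of the text."""
    seed = f"{role}|{timestamp}|{text[:200]}"
    return hashlib.sha1(seed.encode("utf-8")).hexdigest()[:16]


def feed(conn: sqlite3.Connection, source: str, conversation_id: str, records) -> int:
    """Insert records, skipping rows that collide on the UNIQUE constraint.

    Returns the number of rows in the table afterwards.
    """
    for role, timestamp, text in records:
        conn.execute(
            "INSERT OR IGNORE INTO messages VALUES (?, ?, ?, ?, ?)",
            (source, conversation_id, stable_msg_id(role, timestamp, text), role, text),
        )
    return conn.execute("SELECT COUNT(*) FROM messages").fetchone()[0]


# Hypothetical table; only the UNIQUE triple is taken from the commit message.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE messages (
        source TEXT, conversation_id TEXT, msg_id TEXT, role TEXT, content TEXT,
        UNIQUE (source, conversation_id, msg_id)
    )
    """
)
records = [
    ("user", "2026-01-01T00:00:00Z", "hi"),
    ("assistant", "2026-01-01T00:00:05Z", "hello"),
]
rows_after_first = feed(conn, "cowork", "cowork-77f2b3f4", records)
rows_after_second = feed(conn, "cowork", "cowork-77f2b3f4", records)
```

Because the id depends only on role, timestamp, and a text prefix, feeding the same session twice yields the same ids, and the second pass adds no rows.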
1 parent dee20d0 commit 3202869

4 files changed

Lines changed: 363 additions & 1 deletion

README.md

Lines changed: 13 additions & 0 deletions
@@ -223,6 +223,19 @@ Workflow:
 
 A 200-message session typically packs into 80–200 KB of text — fine for context windows of any modern hosted assistant.
 
+### Live MCP memory — `feed-memex`
+
+If you run [memex-mvp](https://github.com/parallelclaw/memex-mvp) (a separate local MCP server), `claude-backup feed-memex` writes a clean dialogue-only JSONL of every session into memex's inbox folder (`~/.memex/inbox/`). Memex picks them up via `chokidar`, indexes via SQLite + FTS5, and exposes them through MCP to **any compatible AI agent** — Cursor, Cline, Claude Code, Continue, Zed.
+
+```bash
+claude-backup feed-memex # write all sessions to ~/.memex/inbox/
+claude-backup feed-memex --dry-run # show what would be written
+```
+
+Output is idempotent — re-run anytime, memex dedupes by stable msg_id. Once set up, your Cursor agent can just `memex_search("the migration we discussed in April")` and surface real results from your past Code/Cowork conversations. **Zero paste.** Zero context-switching.
+
+Filename convention: `code-<8char>.jsonl` for Claude Code sessions, `cowork-<8char>.jsonl` for Cowork. Memex distinguishes the two via the prefix and tags them with separate `source` values, so you can filter `memex_search` by source.
+
 ### Rescue bundle (`rescue`) — the banned-user escape hatch
 
 `handoff` is for one session. `rescue` is for **all of them at once** — built specifically for the situation where Anthropic suspends your account and you need to keep working. Your local files survive (Anthropic only revokes API access, not your disk), so you can package the lot and hand it to a different AI provider.
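The filename convention in the README addition above lends itself to a small consumer-side check. The following is a minimal sketch, assuming a hypothetical helper named `split_inbox_name`; it is not part of claude-backup or memex, and it only encodes the `code-` / `cowork-` prefix and the 8-character short id described in the README text.

```python
from pathlib import Path

# Prefixes named in the README; anything else is treated as foreign.
KNOWN_PREFIXES = ("code", "cowork")


def split_inbox_name(filename: str) -> tuple[str, str]:
    """Split 'cowork-77f2b3f4.jsonl' into ('cowork', '77f2b3f4').

    Raises ValueError when the name does not follow the convention.
    """
    path = Path(filename)
    prefix, separator, short_id = path.stem.partition("-")
    if (
        path.suffix != ".jsonl"
        or not separator
        or prefix not in KNOWN_PREFIXES
        or len(short_id) != 8
    ):
        raise ValueError(f"not a feed-memex inbox file: {filename!r}")
    return prefix, short_id
```

A watcher that receives files from several producers could use a check like this to decide which `source` tag to apply and to ignore unrelated files.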

README.ru.md

Lines changed: 13 additions & 0 deletions
@@ -223,6 +223,19 @@ Workflow:
 
 A 200-message session usually fits in 80–200 KB of text, which is fine for the context window of any modern hosted assistant.
 
+### Live MCP memory: `feed-memex`
+
+If you have [memex-mvp](https://github.com/parallelclaw/memex-mvp) installed (a separate local MCP server), the `claude-backup feed-memex` command writes clean dialogue-only JSONL for each session into memex's inbox (`~/.memex/inbox/`). Memex picks the files up via `chokidar`, indexes them in SQLite + FTS5, and gives access through MCP to **any compatible AI agent**: Cursor, Cline, Claude Code, Continue, Zed.
+
+```bash
+claude-backup feed-memex # write all sessions to ~/.memex/inbox/
+claude-backup feed-memex --dry-run # show what would be written
+```
+
+Output is idempotent: run it as often as you like, memex deduplicates by stable msg_id. Once set up, your Cursor agent can call `memex_search("the migration we talked about in April")` and get real results from past Code/Cowork conversations. **No pasting.** No context switching.
+
+Naming convention: `code-<8char>.jsonl` for Claude Code sessions, `cowork-<8char>.jsonl` for Cowork. Memex tells them apart by the prefix and tags them with different `source` values, so you can filter `memex_search` by source.
+
 ### Rescue bundle (`rescue`): an escape hatch for the banned user
 
 `handoff` is for one session. `rescue` is for **all of them at once**, built specifically for the situation where Anthropic has blocked your account and you need to keep working. Local files survive the ban (Anthropic blocks only API access, not your disk), so you can package everything and hand it to another AI provider.

claude_backup/cli.py

Lines changed: 115 additions & 1 deletion
@@ -2,6 +2,8 @@
 
 from __future__ import annotations
 
+import hashlib
+import json
 import sys
 from pathlib import Path
 
@@ -16,7 +18,7 @@
     render_rescue_index,
     render_rescue_readme,
 )
-from .parser import parse_session
+from .parser import dialogue_text, is_dialogue_message, parse_session
 from .scanner import (
     ProjectInfo,
     SOURCE_CODE,
@@ -320,6 +322,118 @@ def rescue_cmd(ctx: click.Context, output: Path | None, lang: str) -> None:
     )
 
 
+DEFAULT_MEMEX_INBOX = Path.home() / ".memex" / "inbox"
+
+
+@main.command(
+    "feed-memex",
+    help=(
+        "Write clean dialogue-only JSONL exports of every session to "
+        "memex's inbox folder (default ~/.memex/inbox/). Memex (a local "
+        "MCP server, see github.com/parallelclaw/memex-mvp) will index "
+        "them via FTS5 and expose them to any MCP-compatible AI agent "
+        "(Cursor, Cline, Claude Code, Continue, Zed) for live querying."
+    ),
+)
+@click.option(
+    "--inbox",
+    type=click.Path(path_type=Path),
+    default=DEFAULT_MEMEX_INBOX,
+    show_default=True,
+    help="Memex inbox path.",
+)
+@click.option(
+    "--dry-run",
+    is_flag=True,
+    default=False,
+    help="Show what would be written without writing.",
+)
+@click.pass_context
+def feed_memex_cmd(ctx: click.Context, inbox: Path, dry_run: bool) -> None:
+    if not inbox.parent.exists() and not dry_run:
+        click.echo(
+            f"Memex install not found at {inbox.parent}. Install it first:\n"
+            " https://github.com/parallelclaw/memex-mvp",
+            err=True,
+        )
+        sys.exit(1)
+    if not dry_run:
+        inbox.mkdir(parents=True, exist_ok=True)
+
+    projects = _safe_scan(ctx.obj.get("claude_home"), ctx.obj.get("cowork_home"))
+    written = 0
+    skipped = 0
+    total_msgs = 0
+
+    for project in projects:
+        for session in project.sessions:
+            if session.jsonl_path is None or not session.jsonl_path.exists():
+                continue
+
+            short_id = session.session_id[:8]
+            prefix = "cowork" if session.source == SOURCE_COWORK else "code"
+            target = inbox / f"{prefix}-{short_id}.jsonl"
+
+            messages = parse_session(session.jsonl_path)
+            dialogue = [
+                m
+                for m in messages
+                if m.role in ("user", "assistant") and is_dialogue_message(m)
+            ]
+            if not dialogue:
+                skipped += 1
+                continue
+
+            lines = [
+                _to_memex_record(m, session.session_id, prefix)
+                for m in dialogue
+            ]
+
+            if dry_run:
+                click.echo(
+                    f"would write: {target.name} ({len(lines)} msgs · "
+                    f"{session.title or session.first_prompt or '(untitled)'})"
+                )
+            else:
+                with target.open("w", encoding="utf-8") as f:
+                    for line in lines:
+                        f.write(json.dumps(line, ensure_ascii=False) + "\n")
+            written += 1
+            total_msgs += len(lines)
+
+    if dry_run:
+        click.echo(f"\nDry run. Would write {written} files; skipped {skipped} empty.")
+        return
+
+    click.echo(
+        f"\n✅ Fed memex: {written} files, {total_msgs} dialogue messages."
+        f"\n inbox: {inbox}"
+        f"\n memex (if running) will pick them up via chokidar within seconds."
+        f"\n Re-run anytime — output is idempotent (memex dedupes by msg_id)."
+    )
+
+
+def _to_memex_record(msg, session_id: str, prefix: str) -> dict:
+    """Serialize a Message into the flat shape memex's claude-code parser expects.
+
+    Memex looks at top-level `role`, `content` (string), `timestamp`, `id`.
+    We use a stable hash for `id` so re-feeding deduplicates via memex's
+    UNIQUE constraint on (source, conversation_id, msg_id).
+    """
+    text = dialogue_text(msg)
+    msg_id_seed = f"{msg.role}|{msg.timestamp}|{text[:200]}"
+    msg_id = hashlib.sha1(msg_id_seed.encode("utf-8")).hexdigest()[:16]
+    record = {
+        "role": msg.role,
+        "content": text,
+        "timestamp": msg.timestamp,
+        "id": f"{prefix}-{session_id[:8]}-{msg_id}",
+    }
+    if msg.model:
+        record["model"] = msg.model
+    return record
+
+
 def _project_subdir(project: ProjectInfo) -> Path:
     """Pick the directory name for a project under <output>/<source>/.
 
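The commit message explains that raw transcripts cannot simply be symlinked because real sessions nest `role` and `content` under `message`, while memex reads them at the top level. A minimal sketch of that flattening step follows. It is not the project's `parse_session` / `dialogue_text` implementation: the function name `flatten_line` is hypothetical, and the exact raw layout (a `message` object whose `content` is either a string or a list of typed blocks) is an assumption based on the block types the commit names.

```python
import json
from typing import Optional

DIALOGUE_ROLES = ("user", "assistant")


def flatten_line(raw_line: str) -> Optional[dict]:
    """Return a flat record, or None when the line carries no dialogue text."""
    raw = json.loads(raw_line)
    # Assumed layout: nested under "message"; fall back to the flat legacy shape.
    nested = raw.get("message")
    message = nested if isinstance(nested, dict) else raw
    role = message.get("role")
    if role not in DIALOGUE_ROLES:
        return None  # role gating: anything but user/assistant is dropped
    content = message.get("content")
    if isinstance(content, str):
        text = content
    elif isinstance(content, list):
        # Keep only text blocks; tool_use, tool_result and thinking are skipped.
        text = "\n".join(
            block.get("text", "")
            for block in content
            if isinstance(block, dict) and block.get("type") == "text"
        )
    else:
        text = ""
    if not text.strip():
        return None
    return {"role": role, "content": text, "timestamp": raw.get("timestamp", "")}


sample = json.dumps(
    {
        "timestamp": "2026-01-01T00:00:00Z",
        "message": {
            "role": "assistant",
            "content": [
                {"type": "thinking", "thinking": "plan", "signature": "abc"},
                {"type": "text", "text": "Done."},
                {"type": "tool_use", "name": "Bash", "input": {}},
            ],
        },
    }
)
flat = flatten_line(sample)
```

A parser that only looks at top-level `role` and `content` would find neither key in `sample`, which is consistent with the "empty index" outcome the commit message describes.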

tests/test_feed_memex.py

Lines changed: 222 additions & 0 deletions
@@ -0,0 +1,222 @@
+"""Tests for the `feed-memex` command — writes clean JSONL to memex's inbox.
+
+Memex is a separate local MCP server (github.com/parallelclaw/memex-mvp).
+This command produces dialogue-only JSONL files in memex's inbox folder
+in the flat format memex's existing claude-code parser expects:
+
+    {"role":"user","content":"...","timestamp":"...","id":"..."}
+    {"role":"assistant","content":"...","timestamp":"...","id":"..."}
+
+Each session becomes one .jsonl file named `code-<short>.jsonl` or
+`cowork-<short>.jsonl`. The prefix lets memex distinguish sources.
+"""
+
+from __future__ import annotations
+
+import json
+from pathlib import Path
+
+from click.testing import CliRunner
+
+from claude_backup.cli import main
+
+
+def test_feed_memex_creates_jsonl_per_session(
+    claude_home: Path, tmp_path: Path
+) -> None:
+    runner = CliRunner()
+    inbox = tmp_path / "memex" / "inbox"
+    inbox.parent.mkdir(parents=True, exist_ok=True)
+    result = runner.invoke(
+        main,
+        [
+            "--claude-home",
+            str(claude_home),
+            "feed-memex",
+            "--inbox",
+            str(inbox),
+        ],
+    )
+    assert result.exit_code == 0, result.output
+    assert inbox.is_dir()
+    files = list(inbox.glob("*.jsonl"))
+    assert len(files) >= 1
+
+
+def test_feed_memex_uses_code_prefix_for_code_sessions(
+    claude_home: Path, tmp_path: Path
+) -> None:
+    runner = CliRunner()
+    inbox = tmp_path / "memex" / "inbox"
+    inbox.parent.mkdir(parents=True, exist_ok=True)
+    result = runner.invoke(
+        main,
+        [
+            "--claude-home",
+            str(claude_home),
+            "feed-memex",
+            "--inbox",
+            str(inbox),
+        ],
+    )
+    assert result.exit_code == 0, result.output
+    code_files = list(inbox.glob("code-*.jsonl"))
+    assert len(code_files) >= 1, "expected at least one code-*.jsonl file"
+
+
+def test_feed_memex_uses_cowork_prefix_for_cowork_sessions(
+    claude_home: Path, tmp_path: Path
+) -> None:
+    """Test fixture has Cowork sessions in fixtures-cowork/. Once it's in the
+    cowork-home, the feed-memex command should produce cowork-* files."""
+    cowork_root = Path(__file__).parent / "fixtures-cowork"
+    runner = CliRunner()
+    inbox = tmp_path / "memex" / "inbox"
+    inbox.parent.mkdir(parents=True, exist_ok=True)
+    result = runner.invoke(
+        main,
+        [
+            "--claude-home",
+            str(claude_home),
+            "--cowork-home",
+            str(cowork_root),
+            "feed-memex",
+            "--inbox",
+            str(inbox),
+        ],
+    )
+    assert result.exit_code == 0, result.output
+    cowork_files = list(inbox.glob("cowork-*.jsonl"))
+    assert len(cowork_files) >= 1, "expected at least one cowork-*.jsonl file"
+
+
+def test_feed_memex_record_shape_matches_memex_parser(
+    claude_home: Path, tmp_path: Path
+) -> None:
+    """Each line must have top-level `role` + `content` (string) + `timestamp`
+    + `id`. That's what memex's claude-code parser reads (server.js:215-263)."""
+    runner = CliRunner()
+    inbox = tmp_path / "inbox"
+    inbox.mkdir()
+    runner.invoke(
+        main,
+        [
+            "--claude-home",
+            str(claude_home),
+            "feed-memex",
+            "--inbox",
+            str(inbox),
+        ],
+    )
+    files = list(inbox.glob("code-*.jsonl"))
+    assert files, "no code-* files written"
+
+    with files[0].open("r", encoding="utf-8") as f:
+        for line in f:
+            line = line.strip()
+            if not line:
+                continue
+            rec = json.loads(line)
+            assert "role" in rec and isinstance(rec["role"], str)
+            assert "content" in rec and isinstance(rec["content"], str)
+            assert rec["content"].strip(), "empty content should not be written"
+            assert "id" in rec
+            # timestamp may be empty string if missing in source
+            assert "timestamp" in rec
+
+
+def test_feed_memex_strips_tool_use_and_thinking(
+    claude_home: Path, tmp_path: Path
+) -> None:
+    """Ensures the output is dialogue-only — no tool_use, tool_result, or
+    thinking signature noise pollutes the memex index."""
+    runner = CliRunner()
+    inbox = tmp_path / "inbox"
+    inbox.mkdir()
+    runner.invoke(
+        main,
+        [
+            "--claude-home",
+            str(claude_home),
+            "feed-memex",
+            "--inbox",
+            str(inbox),
+        ],
+    )
+    for f in inbox.glob("*.jsonl"):
+        content = f.read_text(encoding="utf-8")
+        assert "[tool_use:" not in content
+        assert "tool_result" not in content
+        assert '"signature"' not in content
+
+
+def test_feed_memex_msg_ids_stable_across_runs(
+    claude_home: Path, tmp_path: Path
+) -> None:
+    """Re-running feed-memex must produce identical msg ids so memex's
+    UNIQUE(source, conversation_id, msg_id) dedupe works on re-import."""
+    runner = CliRunner()
+    inbox = tmp_path / "inbox"
+    inbox.mkdir()
+
+    runner.invoke(
+        main,
+        ["--claude-home", str(claude_home), "feed-memex", "--inbox", str(inbox)],
+    )
+    sample_file = next(inbox.glob("code-*.jsonl"))
+    first_run = sample_file.read_text(encoding="utf-8")
+
+    # Re-run
+    runner.invoke(
+        main,
+        ["--claude-home", str(claude_home), "feed-memex", "--inbox", str(inbox)],
+    )
+    second_run = sample_file.read_text(encoding="utf-8")
+    assert first_run == second_run, "msg ids must be deterministic"
+
+
+def test_feed_memex_dry_run_does_not_write(
+    claude_home: Path, tmp_path: Path
+) -> None:
+    runner = CliRunner()
+    inbox = tmp_path / "inbox"
+    inbox.parent.mkdir(parents=True, exist_ok=True)
+    result = runner.invoke(
+        main,
+        [
+            "--claude-home",
+            str(claude_home),
+            "feed-memex",
+            "--inbox",
+            str(inbox),
+            "--dry-run",
+        ],
+    )
+    assert result.exit_code == 0, result.output
+    assert "Dry run" in result.output or "would write" in result.output.lower()
+    # No files written
+    assert not inbox.exists() or list(inbox.glob("*.jsonl")) == []
+
+
+def test_feed_memex_errors_when_memex_dir_missing(tmp_path: Path) -> None:
+    """If the memex parent doesn't exist (memex not installed), error clearly."""
+    runner = CliRunner()
+    fake_inbox = tmp_path / "no-memex-here" / "inbox"
+    # Create empty Claude Code home so scan succeeds
+    claude_home = tmp_path / "code"
+    (claude_home / "fake-project").mkdir(parents=True)
+    (claude_home / "fake-project" / "x.jsonl").write_text(
+        '{"role":"user","content":"hi","timestamp":"2026-01-01T00:00:00Z"}\n'
+    )
+    result = runner.invoke(
+        main,
+        [
+            "--claude-home",
+            str(claude_home),
+            "feed-memex",
+            "--inbox",
+            str(fake_inbox),
+        ],
+    )
+    assert result.exit_code == 1
+    assert "memex" in result.output.lower()
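The record-shape assertions in the tests above can also be packaged as a standalone check for spot-inspecting an inbox by hand. This is a sketch only: `find_shape_problems` is a hypothetical helper, not part of the test suite or of memex, and the required keys are the four the commit says memex's parser reads.

```python
import json
from pathlib import Path

REQUIRED_KEYS = ("role", "content", "timestamp", "id")


def find_shape_problems(path: Path) -> list[str]:
    """Return human-readable problems; an empty list means the file looks fine."""
    problems = []
    with path.open("r", encoding="utf-8") as handle:
        for number, line in enumerate(handle, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines, as the tests do
            record = json.loads(line)
            missing = [key for key in REQUIRED_KEYS if key not in record]
            if missing:
                problems.append(f"line {number}: missing {', '.join(missing)}")
                continue
            if record["role"] not in ("user", "assistant"):
                problems.append(f"line {number}: unexpected role {record['role']!r}")
            if not isinstance(record["content"], str) or not record["content"].strip():
                problems.append(f"line {number}: content must be a non-empty string")
    return problems
```

Running it over every `*.jsonl` file in the inbox after a `feed-memex` run gives a quick sanity check that does not need pytest or the fixtures.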
