
feat(scripts): cross-process persistence mode for recall benchmark (Run 3)#1209

Merged
silversurfer562 merged 1 commit into main from claude/quirky-mestorf-5a26a8
Jul 1, 2026


@silversurfer562

Member

Summary

Follow-up to #1208, closing the starter's recommended next step: Runs 1–2 of the memory recall benchmark captured and queried within the same PersonalMemory instance/process, leaving cross-process persistence an untested assumption. This adds a --phase persistence mode that captures the corpus in one subprocess, lets it exit, then evaluates from a brand-new PersonalMemory instance in a second subprocess against the same on-disk global_root.
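The capture-then-evaluate flow described above can be sketched roughly as follows. This is a minimal illustration, not the code in scripts/memory_recall_eval.py: the child script, the `store.json` and `result.json` file names, and `run_persistence_check()` are all hypothetical, and a plain JSON file stands in for PersonalMemory, whose API is not shown in this PR.

```python
import json
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical child program. A JSON file stands in for the real
# file-backed store; this is NOT the PersonalMemory API.
CHILD = r'''
import json, sys
from pathlib import Path

phase, root = sys.argv[1], Path(sys.argv[2])
store = root / "store.json"
if phase == "capture":
    # Process 1: write the corpus to disk, then exit.
    store.write_text(json.dumps({"k1": "first note", "k2": "second note"}))
else:
    # Process 2: a fresh interpreter that can only see what is on disk.
    data = json.loads(store.read_text())
    (root / "result.json").write_text(json.dumps({"recalled": sorted(data)}))
'''


def run_persistence_check(root: Path) -> dict:
    """Run capture and evaluate in two separate, sequential subprocesses."""
    for phase in ("capture", "evaluate"):
        # check=True raises if a child exits non-zero; each child has fully
        # exited before the next one starts, so no in-memory state carries over.
        subprocess.run([sys.executable, "-c", CHILD, phase, str(root)], check=True)
    return json.loads((root / "result.json").read_text())


def demo() -> dict:
    with tempfile.TemporaryDirectory() as tmp:
        return run_persistence_check(Path(tmp))
```

Because the second child starts from a new interpreter, anything it recalls must have come from the shared directory, which is the property Run 3 is meant to measure.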

Result (logged as Run 3 in the spec's decisions.md)

Identical to Run 2 in every dimension — hit@1 18/18 (100%), hit@3 18/18 (100%), byte-identical positive/negative score distributions. Recall is fully file-backed and survives process death; the "probably fine mechanically" assumption is now a measured fact.
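For readers unfamiliar with the metric, hit@k is conventionally the number of queries whose expected item appears among the top k ranked results. The sketch below assumes that conventional definition; the function names and the sample data are illustrative only and are not taken from the benchmark script or its corpus.

```python
from typing import Sequence, Tuple


def hit_at_k(ranked_ids: Sequence[str], expected_id: str, k: int) -> bool:
    """True if the expected item is among the first k ranked results."""
    return expected_id in ranked_ids[:k]


def hit_rate(results: Sequence[Tuple[Sequence[str], str]], k: int) -> Tuple[int, int]:
    """Return (hits, total) over (ranked_ids, expected_id) pairs."""
    hits = sum(hit_at_k(ranked, expected, k) for ranked, expected in results)
    return hits, len(results)


# Made-up example data, not the benchmark's queries.
sample = [
    (["a", "b", "c"], "a"),  # expected item ranked first
    (["b", "a", "c"], "a"),  # expected item ranked second
    (["x", "y", "z"], "x"),  # expected item ranked first
]
```

On this sample, `hit_rate(sample, 1)` gives `(2, 3)` and `hit_rate(sample, 3)` gives `(3, 3)`. A report such as "hit@1 18/18" corresponds to the same (hits, total) pair under this definition.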

Changes

  • scripts/memory_recall_eval.py: split run_benchmark() into capture_corpus() + evaluate(); add --phase {all,persistence,capture,evaluate} CLI (default all preserves Runs 1–2 behavior exactly). Subprocess results pass via a JSON file, not stdout — attune_rag's structlog lines print to stdout and corrupt inline JSON.
  • docs/specs/memory-recall-eval/decisions.md: Run 3 entry with method, numbers, and verdict.
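One way the --phase switch and the file-based result handoff could look is sketched below. Only the --phase flag and its four choices come from this PR; the `--results-file` flag, `emit_results()`, and the result payload are assumptions made for illustration, and the printed log line merely imitates the kind of stdout noise that makes inline JSON unreliable.

```python
import argparse
import json
import tempfile
from pathlib import Path
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="recall benchmark (sketch)")
    parser.add_argument(
        "--phase",
        choices=["all", "persistence", "capture", "evaluate"],
        default="all",  # the default keeps the single-process behavior
    )
    # Hypothetical flag: where a child process should write its results.
    parser.add_argument("--results-file", type=Path, default=None)
    return parser


def emit_results(results: dict, results_file: Optional[Path]) -> None:
    # Stand-in for log lines that share stdout with the program's own output.
    print("[info] some log line on stdout")
    if results_file is not None:
        # The parent reads this file, so stdout noise cannot corrupt the JSON.
        results_file.write_text(json.dumps(results))


def main(argv: List[str]) -> None:
    args = build_parser().parse_args(argv)
    # Placeholder payload; a real run would carry the measured metrics.
    emit_results({"phase": args.phase}, args.results_file)
```

Under these assumptions a parent process would call the script with something like `--phase evaluate --results-file /tmp/run.json` and then parse that file, rather than trying to pick a JSON document out of mixed stdout.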

Verification

Both modes run green locally with identical output:

  • --phase all (default): hit@1 18/18, hit@3 18/18 — matches Run 2
  • --phase persistence: hit@1 18/18, hit@3 18/18 — same distributions

🤖 Generated with Claude Code

…un 3)

Adds --phase {all,persistence,capture,evaluate} to
scripts/memory_recall_eval.py. Persistence mode captures the corpus in
one subprocess, lets it exit, then evaluates from a brand-new
PersonalMemory instance in a second subprocess against the same
on-disk global root.

Result: identical to Run 2 in every dimension (hit@1 18/18, hit@3
18/18, same score distributions) - recall is fully file-backed and
survives process death. Logged as Run 3 in
docs/specs/memory-recall-eval/decisions.md.

Results pass between subprocesses via a JSON file rather than stdout
because attune_rag's structlog output prints to stdout.
@vercel

vercel Bot commented Jul 1, 2026

Contributor

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| attune-ai.dev | Ready | Preview, Comment | Jul 1, 2026 10:42pm |
| website | Ready | Preview, Comment | Jul 1, 2026 10:42pm |

@codecov

codecov Bot commented Jul 1, 2026


Codecov Report

✅ All modified and coverable lines are covered by tests.


@silversurfer562 silversurfer562 merged commit 225a756 into main Jul 1, 2026
36 of 37 checks passed
@silversurfer562 silversurfer562 deleted the claude/quirky-mestorf-5a26a8 branch July 1, 2026 23:23

Labels

documentation Improvements or additions to documentation


1 participant