feat(scripts): cross-process persistence mode for recall benchmark (Run 3) by silversurfer562 · Pull Request #1209 · Smart-AI-Memory/attune-ai

silversurfer562 · 2026-07-01T22:41:58Z

Summary

Follow-up to #1208 that closes the starter's recommended next step. Runs 1–2 of the memory recall benchmark captured and queried within the same PersonalMemory instance and process, which left cross-process persistence as an untested assumption. This PR adds a --phase persistence mode that captures the corpus in one subprocess, lets it exit, then evaluates from a brand-new PersonalMemory instance in a second subprocess against the same on-disk global_root.
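The capture-then-evaluate flow described above can be sketched roughly as follows. This is a minimal illustration, not the code in scripts/memory_recall_eval.py: the real PersonalMemory API is not shown in this PR, so a plain JSON file under a temporary directory stands in for the on-disk global_root, and the names CAPTURE_SRC, EVALUATE_SRC, and run_persistence_check are hypothetical.

```python
import json
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical stand-in for the capture phase: writes a tiny corpus to disk, then exits.
CAPTURE_SRC = """
import json, sys
from pathlib import Path
root = Path(sys.argv[1])
corpus = {"m1": "prefers tabs", "m2": "uses pytest"}
(root / "memories.json").write_text(json.dumps(corpus))
"""

# Hypothetical stand-in for the evaluate phase: a fresh process that can only
# see what the first process left on disk. It prints a log line to stdout on
# purpose and reports its result through a separate JSON file.
EVALUATE_SRC = """
import json, sys
from pathlib import Path
root, results_path = Path(sys.argv[1]), Path(sys.argv[2])
store = json.loads((root / "memories.json").read_text())
print("log: store loaded")  # stdout noise, ignored by the parent
results_path.write_text(json.dumps({"recalled": len(store)}))
"""


def run_persistence_check(root: Path) -> dict:
    """Run capture and evaluate in two separate Python processes."""
    results_path = root / "results.json"
    # First subprocess: capture, then exit (check=True raises on failure).
    subprocess.run([sys.executable, "-c", CAPTURE_SRC, str(root)], check=True)
    # Second subprocess: evaluate against the same directory.
    subprocess.run(
        [sys.executable, "-c", EVALUATE_SRC, str(root), str(results_path)],
        check=True,
        capture_output=True,
    )
    return json.loads(results_path.read_text())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(run_persistence_check(Path(tmp)))
```

Because the two child processes share nothing except the directory, anything the second one recalls must have come from disk, which is the property this mode is meant to measure.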

Result (logged as Run 3 in the spec's decisions.md)

Identical to Run 2 in every dimension: hit@1 18/18 (100%), hit@3 18/18 (100%), and byte-identical positive/negative score distributions. Recall is fully file-backed and survives process death, so the "probably fine mechanically" assumption is now a measured fact.
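For readers unfamiliar with the metric, here is a small sketch of how hit@k counts such as 18/18 are typically computed. It assumes hit@k means "the expected memory appears among the top k ranked results", which this PR does not define explicitly. The function names and the sample data are illustrative only.

```python
from typing import Dict, List, Sequence, Tuple


def hit_at_k(ranked_ids: Sequence[str], expected_id: str, k: int) -> bool:
    """True if the expected item is among the first k ranked results."""
    return expected_id in ranked_ids[:k]


def summarize(
    results: List[Tuple[Sequence[str], str]], ks: Sequence[int] = (1, 3)
) -> Dict[int, Tuple[int, int]]:
    """Map each k to (number of hits, number of queries)."""
    total = len(results)
    return {
        k: (sum(hit_at_k(ranked, expected, k) for ranked, expected in results), total)
        for k in ks
    }


if __name__ == "__main__":
    # Made-up rankings for three queries, each expecting memory "a".
    sample = [
        (["a", "b"], "a"),            # hit at rank 1
        (["b", "a", "c"], "a"),       # hit at rank 2
        (["x", "y", "z", "a"], "a"),  # miss within the top 3
    ]
    for k, (hits, total) in summarize(sample).items():
        print(f"hit@{k} {hits}/{total}")
```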

Changes

scripts/memory_recall_eval.py: split run_benchmark() into capture_corpus() + evaluate(); add a --phase {all,persistence,capture,evaluate} CLI option (the default, all, preserves Runs 1–2 behavior exactly). Subprocess results are passed via a JSON file rather than stdout, because attune_rag's structlog lines print to stdout and would corrupt inline JSON.
docs/specs/memory-recall-eval/decisions.md: Run 3 entry with method, numbers, and verdict.
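A rough sketch of what a --phase option with these four choices, plus file-based result passing, might look like with argparse. Only the --phase flag, its choices, and its default come from this PR. The --results-file flag and the helper names are assumptions made for illustration and may not match the actual script.

```python
import argparse
import json
from pathlib import Path

# Phase names as listed in the PR description.
PHASES = ("all", "persistence", "capture", "evaluate")


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Memory recall benchmark (sketch)")
    # Defaulting to "all" keeps the single-process behavior unchanged.
    parser.add_argument("--phase", choices=PHASES, default="all")
    # Hypothetical flag: where a child process writes its results.
    parser.add_argument("--results-file", type=Path, default=None)
    return parser


def write_results(path: Path, results: dict) -> None:
    """Child side: persist results to a file instead of printing them."""
    path.write_text(json.dumps(results))


def read_results(path: Path) -> dict:
    """Parent side: read results without touching the child's stdout."""
    return json.loads(path.read_text())


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(f"selected phase: {args.phase}")
```

The file-based hand-off sidesteps the problem noted above: if log lines and a JSON payload share stdout, calling json.loads on the captured output raises json.JSONDecodeError unless the caller first separates the two.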

Verification

Both modes run green locally with identical output:

--phase all (default): hit@1 18/18, hit@3 18/18 — matches Run 2
--phase persistence: hit@1 18/18, hit@3 18/18 — same distributions

🤖 Generated with Claude Code

…un 3) Adds --phase {all,persistence,capture,evaluate} to scripts/memory_recall_eval.py. Persistence mode captures the corpus in one subprocess, lets it exit, then evaluates from a brand-new PersonalMemory instance in a second subprocess against the same on-disk global root. Result: identical to Run 2 in every dimension (hit@1 18/18, hit@3 18/18, same score distributions) - recall is fully file-backed and survives process death. Logged as Run 3 in docs/specs/memory-recall-eval/decisions.md. Results pass between subprocesses via a JSON file rather than stdout because attune_rag's structlog output prints to stdout.

vercel · 2026-07-01T22:42:00Z

Project	Deployment	Actions	Updated (UTC)
attune-ai.dev	Ready	Preview, Comment	Jul 1, 2026 10:42pm
website	Ready	Preview, Comment	Jul 1, 2026 10:42pm

codecov · 2026-07-01T22:50:14Z

Codecov Report

✅ All modified and coverable lines are covered by tests.

github-actions Bot added the documentation Improvements or additions to documentation label Jul 1, 2026

vercel Bot deployed to Preview – attune-ai.dev July 1, 2026 22:42 View deployment

vercel Bot deployed to Preview – website July 1, 2026 22:42 View deployment

silversurfer562 merged commit 225a756 into main Jul 1, 2026
36 of 37 checks passed

silversurfer562 deleted the claude/quirky-mestorf-5a26a8 branch July 1, 2026 23:23
