docs(memory): first real curated-memory dogfood — friction entries + find_similar spot-check (#1210)

silversurfer562 · web-flow · commit 97bfcb8e671d · 2026-07-01T19:43:37.000-04:00
First real add_finding() captures through the 4 curated NodeTypes (PR #1207), written to ~/.attune/memory/curated_graph.json and receipt-verified from a fresh instance. Logs three friction points (repo-tracked cwd-relative default path; RELATED_TO direction defaults dropping the symmetric read; find_similar signature + threshold=0.5 muting realistic queries) plus the clean fits, and closes the session-starter's find_similar sanity-check item in the recall-eval spec (alive, not query()-dead — default tuning issue).
diff --git a/docs/specs/memory-nodetype-friction-log/decisions.md b/docs/specs/memory-nodetype-friction-log/decisions.md
@@ -36,5 +36,83 @@ mistake a pre-existing implementation gap for a taxonomy-fit problem.
 
 ---
 
-_(no friction entries yet — log starts on first real `add_finding()`
-call using one of the 4 new types)_
+## 2026-07-01 — First real captures: 4 nodes, one per type
+
+**What was recorded** (real session findings, written to
+`~/.attune/memory/curated_graph.json` via the real public API, receipt
+verified by reloading from a brand-new `MemoryGraph` instance):
+
+- `PROJECT_CONTEXT` — "PersonalMemory recall is file-backed and
+  survives process death" (the Run 3 verdict, PR #1209)
+- `REFERENCE` — pointer to the canonical benchmark numbers
+  (`docs/specs/memory-recall-eval/decisions.md`)
+- `FEEDBACK` — "prefer real round-trip tests over mocks when touching
+  attune.memory" (Patrick-endorsed, both #1208 bugs were mock-blind)
+- `USER_CONTEXT` — Patrick's stated active priority (2026-07-01): make
+  the memory system genuinely work for the agent
+- Two `RELATED_TO` edges (`REFERENCE → PROJECT_CONTEXT`,
+  `FEEDBACK → PROJECT_CONTEXT`)
+
+**Clean fits:** the 4-type taxonomy matched all four captures with no
+forcing — nothing needed a fifth type, nothing straddled two types.
+`status="active"` persisted and reloaded correctly (the #1208 fix
+working live). `severity` staying unset read naturally. Tags carried
+fine. `workflow=""` for curated nodes worked as documented.
+
+**Friction A — storage location (the biggest one).** `MemoryGraph`'s
+default path is `patterns/memory_graph.json`: **cwd-relative and
+git-tracked**. Curated *cross-session* memory written through the
+default would either get committed into the repo or stranded inside
+whichever worktree the session happened to run in (this repo runs many
+parallel worktrees). Workaround: explicit
+`path=~/.attune/memory/curated_graph.json`. Signal: curated memory
+needs a durable default home outside the repo, distinct from
+workflow-findings graphs — the current default is shaped for
+per-project workflow state, not cross-session memory.
+
+**Friction B — `RELATED_TO` direction ergonomics.** The natural
+`[[link]]` usage — add an edge, then ask the *target* node for related
+nodes — silently returns `[]`: `RELATED_TO` is declared symmetric
+(`REVERSE_EDGE_TYPES` maps it to itself) but `add_edge` defaults
+`bidirectional=False` and `find_related` defaults
+`direction="outgoing"`, so symmetry exists only if the writer or
+reader remembers to ask for it. Workaround: `direction="both"` at read
+time (verified: returns both linked nodes). Signal: for curated
+memory, symmetric edge types should traverse both directions by
+default; the current defaults quietly drop half the link graph.
+
+**Friction C — `find_similar` signature + threshold (also closes the
+session-starter's #3 sanity-check).** Two parts. (a) Signature
+inconsistency: `find_related` takes a node id, `find_similar` takes a
+finding *dict* — first call attempt passed an id and died with
+`AttributeError: 'str' object has no attribute 'get'`. (b) The default
+`threshold=0.5` over Jaccard word-overlap mutes realistic queries:
+"recall benchmark persistence numbers" scored 0.301 against the
+project-context node and 0.125 against the reference node — both
+filtered out at the default; only a near-verbatim name clears 0.5
+(verbatim self-match = 1.0, so this is NOT the `PersonalMemory.query()`
+dead-path class — the mechanism works, the default tuning makes it
+effectively silent). Workaround: `threshold≈0.25`, or use
+`PersonalMemory.query()` for text recall. Cross-referenced in
+`docs/specs/memory-recall-eval/decisions.md`.
+
+**Minor:** `EdgeType` lives in `attune.memory.edges`, not
+`attune.memory.nodes` (where the curated-memory docstring pointing at
+it lives) — cost one failed import.
+
+---
+
+## Adjacent observations (not R1-scope — different subsystem)
+
+- **2026-07-01 — cross-project recall noise in the stash/recall
+  surface.** A session resume in this repo surfaced "recent findings
+  from this project" that included precious-metals-conversation
+  findings ("silver volatility over 20 years", "consider reinvesting
+  dividends") plus one entry too context-free to act on ("Identity
+  rewrite may be more done than the memory note implies").
+  Recency-ranked recall with no topical/project gate pulls whatever
+  was stashed last, and project attribution leaked across sessions.
+  This is the stash/recall-hook subsystem, not `add_finding()`, so it
+  is logged here as adjacent evidence only — but it is the same
+  product question this spec exists to answer (does recalled memory
+  read as trustworthy?), and today the answer on that surface was no.
diff --git a/docs/specs/memory-recall-eval/decisions.md b/docs/specs/memory-recall-eval/decisions.md
@@ -130,3 +130,27 @@ a measured fact. The single-process default (`--phase all`) reproduces
 the same numbers, so the two methodologies are interchangeable for
 future runs; use `--phase persistence` when the change under test
 touches serialization or file layout.
+
+## 2026-07-01 — Sanity check: `MemoryGraph.find_similar` (deferred non-goal, spot-checked)
+
+The non-goals deferred `MemoryGraph` recall to a future experiment;
+given Run 1 found `PersonalMemory.query()` silently dead, a cheap
+spot-check was worth it. Done during the first real curated-memory
+captures (see
+`docs/specs/memory-nodetype-friction-log/decisions.md`, Friction C):
+
+- **Not the dead-path class.** A verbatim node name self-matches at
+  score 1.0 — the mechanism (Jaccard word-overlap over
+  name/description, type/file bonuses) works.
+- **But the default `threshold=0.5` mutes realistic queries.** A
+  natural paraphrase ("recall benchmark persistence numbers") scored
+  0.301 / 0.125 against nodes it should plausibly find — both filtered
+  at the default. Callers get `[]` for anything short of near-verbatim
+  names, which *reads* like the query() bug from the outside.
+- **Workaround:** pass `threshold≈0.25`, or use
+  `PersonalMemory.query()` for text recall.
+
+No full ground-truthed benchmark run for `find_similar` yet — this was
+a spot-check, not Run 4. If curated-graph usage grows (friction-log
+spec), rerun this corpus's methodology against `find_similar` with a
+tuned threshold.