Stale FTS index entries persist after reset --reindex and reindex --search #765

@BunkerBuster73

Summary

After a sequence of write_note(overwrite=True) failures, manual filesystem recovery, and basic-memory reset --reindex, the FTS search index continues to return entries that no longer correspond to records in the entities table. search returns these phantoms, but read_note and fetch correctly report Document not found for the same identifiers. Re-running basic-memory reindex --search does not clean them up.

Environment

  • Basic Memory version: 0.20.3 (latest stable, installed via uv tool)
  • Python: 3.14.3
  • OS: macOS (Apple Silicon, M3)
  • MCP transport: stdio (default), launched via Claude Desktop
  • Config: env: test, semantic_search_enabled: true, semantic_embedding_model: bge-small-en-v1.5, two local projects (main and note), single shared SQLite DB at ~/.basic-memory/memory.db
  • Note: project had last_sync: null and ~332 unindexed files prior to the incident (preexisting tech debt)

Steps to reproduce (approximate — full repro not attempted)

The exact trigger sequence in our case:

  1. Database in a state where basic-memory status shows hundreds of +new files (sync never completed).
  2. From an MCP client, call write_note(directory="notes/ai", title="<TitleA>", overwrite=True) intending to update an existing note in notes/ai/.
  3. The tool succeeds but writes to a different existing file at notes/progetti/<unrelated-name>.md instead of the intended target. The previous content of that file is destroyed and replaced with <TitleA> content. (This is itself a possible secondary issue with permalink/title fuzzy resolution under unsynced state.)
  4. Restore the affected file from filesystem backup (Time Machine).
  5. Manually rm the originally-intended file (which had been created in the wrong location anyway).
  6. Run basic-memory reset --reindex (confirmed y). Output reports "Database reset complete" and "Rebuilding search index for 2 project(s)" with embedding model download.
  7. Run basic-memory status. It reports "No changes".
  8. Run basic-memory reindex --search for good measure. Reports "Full-text search index rebuilt" for both projects.

Expected behavior

After reset --reindex plus reindex --search, the FTS index should contain entries only for files that currently exist in the filesystem and are present in the entities table. Titles in search results should match the current frontmatter.title (or auto-derived title) of the underlying entity.

Actual behavior

search returns phantom entries:

  • A permalink for a file that was rm'd before reset still appears in search results, with its pre-incident title.
  • A permalink whose underlying file was restored from Time Machine appears in search results with the wrong title (the title of the unrelated content that briefly overwrote it earlier).

read_note and fetch on the same identifiers correctly return:

{"metadata": {"error": "Document not found"}}

…or, for the file that exists with restored content, return the correct content with the correct frontmatter title — but the search index continues to advertise the stale title.

So: entities table is consistent with filesystem, FTS index is not.

Diagnostic evidence

$ basic-memory --version
Basic Memory CLI, version 0.20.3

$ basic-memory status
note: Status
└── No changes

Search returns phantom — file does not exist on disk:

$ # via MCP: search("<query>") returns id
$ # "notes/ai/<file-A-permalink>"
$ # with title "<file-A old title>"

$ ls "/Users/<me>/note-basic-memory/notes/ai/" | grep -i <file-A-prefix>
<file-A-current>.md

Note: only the current file exists; the old permalink file was rm'd.

fetch on the phantom permalink:

$ # via MCP: fetch("notes/ai/<file-A-permalink>")
$ # returns: {"metadata": {"error": "Document not found"}}

But search still returns it.

Workarounds attempted (none successful for the FTS phantoms)

  1. basic-memory reset --reindex — phantoms persist
  2. basic-memory reindex --search — phantoms persist
  3. touch <file> on affected files to bump mtime — status reports No changes (sync seems to verify by content checksum, not mtime, so touch is a no-op)
  4. Killing zombie basic-memory mcp processes from prior Claude Desktop sessions (we had 10 accumulated; killed 7) — no effect on FTS state
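On workaround 3: the sketch below illustrates why touch would be a no-op if sync compares content checksums. It is not Basic Memory's code. SHA-256 is an assumed stand-in for whatever checksum sync actually uses, and the function names are hypothetical.

```python
import hashlib
import os
import tempfile
from pathlib import Path


def content_checksum(path: Path) -> str:
    """SHA-256 of the file bytes (assumed stand-in for the real checksum)."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def touch_changes(path: Path) -> tuple[bool, bool]:
    """Bump mtime like `touch`; return (mtime_changed, checksum_changed)."""
    before_mtime = path.stat().st_mtime_ns
    before_sum = content_checksum(path)
    # Move atime/mtime forward by 60 seconds without modifying the contents.
    new_time = before_mtime + 60 * 1_000_000_000
    os.utime(path, ns=(new_time, new_time))
    mtime_changed = path.stat().st_mtime_ns != before_mtime
    checksum_changed = content_checksum(path) != before_sum
    return mtime_changed, checksum_changed


def demo_touch() -> tuple[bool, bool]:
    """Run the check against a throwaway note in a temporary directory."""
    with tempfile.TemporaryDirectory() as tmp:
        note = Path(tmp) / "note.md"
        note.write_text("---\ntitle: Example\n---\nbody\n", encoding="utf-8")
        return touch_changes(note)
```

The mtime moves while the checksum stays the same, so a checksum-based change detector has nothing to report.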

We have not yet tried pkill -9 -f "basic-memory mcp" + Claude Desktop full restart, which would force a fresh MCP server. That might or might not clear it depending on whether the FTS phantoms are persisted to the SQLite DB or held in MCP server in-memory cache.

Hypotheses (offered tentatively)

  • The FTS rebuild may be reading from a snapshot/cache that wasn't invalidated by reset. Possibly the reset drops entities and FTS tables but leaves a parallel structure (vector store / embedding cache) that the search path consults first.
  • Alternatively, when write_note(overwrite=True) writes to an unintended target due to fuzzy permalink/title resolution under desynced state, it may insert FTS rows that are never linked to a proper entity row, and these orphan FTS rows survive entities-driven rebuilds.
  • The fact that v0.20.3 already includes #675 (fix(core): exclude stale entity rows from embedding coverage stats) suggests the maintainer is already aware of related stale-row issues; this may be an adjacent edge case not yet covered.
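To test the orphan-row hypothesis directly, the search table could be compared against the entities table at the SQLite level. The sketch below runs against an in-memory demo database. The table names search_index and entity and the columns permalink and title are placeholders, not the confirmed Basic Memory schema, and would need to be adjusted before pointing this at a copy of ~/.basic-memory/memory.db.

```python
import sqlite3


def find_orphans(conn: sqlite3.Connection) -> list[tuple[str, str]]:
    """Index rows whose permalink has no matching entity row."""
    return conn.execute(
        """
        SELECT s.permalink, s.title
        FROM search_index AS s
        LEFT JOIN entity AS e ON e.permalink = s.permalink
        WHERE e.permalink IS NULL
        ORDER BY s.permalink
        """
    ).fetchall()


def find_stale_titles(conn: sqlite3.Connection) -> list[tuple[str, str, str]]:
    """Index rows whose title differs from the entity's current title."""
    return conn.execute(
        """
        SELECT s.permalink, s.title, e.title
        FROM search_index AS s
        JOIN entity AS e ON e.permalink = s.permalink
        WHERE s.title <> e.title
        ORDER BY s.permalink
        """
    ).fetchall()


def build_demo() -> sqlite3.Connection:
    """In-memory database reproducing the two symptoms from this report."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE entity (permalink TEXT PRIMARY KEY, title TEXT)")
    try:
        conn.execute("CREATE VIRTUAL TABLE search_index USING fts5(permalink, title)")
    except sqlite3.OperationalError:
        # FTS5 not compiled into this SQLite build; a plain table shows the same queries.
        conn.execute("CREATE TABLE search_index (permalink TEXT, title TEXT)")
    conn.executemany(
        "INSERT INTO entity (permalink, title) VALUES (?, ?)",
        [("notes/ai/current", "Current"), ("notes/progetti/restored", "Restored title")],
    )
    conn.executemany(
        "INSERT INTO search_index (permalink, title) VALUES (?, ?)",
        [
            ("notes/ai/current", "Current"),
            ("notes/ai/removed", "Old title"),  # file was rm'd
            ("notes/progetti/restored", "Overwritten title"),  # stale title
        ],
    )
    conn.commit()
    return conn
```

If either query returns rows against the real database after a reset, the phantoms are persisted in SQLite rather than held in an MCP server's memory.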

Suggested next steps

If the maintainer can confirm whether reset is supposed to fully truncate the FTS virtual table (and any embedding tables), that would help narrow the hypothesis. A --vacuum or --full-rebuild flag for reset that also drops/recreates FTS5 virtual tables and embedding storage might be a useful escape hatch for users in this state.
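For concreteness, here is a minimal sketch of what such a full rebuild could do at the SQLite level: drop the search table outright, recreate it, and repopulate it only from current entity rows. The table and column names are hypothetical, the function is not an existing Basic Memory API, and embedding storage is left out.

```python
import sqlite3


def create_index(conn: sqlite3.Connection) -> None:
    """Create the (hypothetical) search table, preferring FTS5 when available."""
    try:
        conn.execute("CREATE VIRTUAL TABLE search_index USING fts5(permalink, title)")
    except sqlite3.OperationalError:
        # FTS5 not compiled into this SQLite build; fall back to a plain table.
        conn.execute("CREATE TABLE search_index (permalink TEXT, title TEXT)")


def full_rebuild(conn: sqlite3.Connection) -> int:
    """Drop and recreate the search table, then refill it from entity rows."""
    conn.execute("DROP TABLE IF EXISTS search_index")
    create_index(conn)
    with conn:  # commit the repopulation as one transaction
        conn.execute(
            "INSERT INTO search_index (permalink, title) "
            "SELECT permalink, title FROM entity"
        )
    return conn.execute("SELECT COUNT(*) FROM search_index").fetchone()[0]


def demo_rebuild() -> list[tuple[str, str]]:
    """Seed a phantom row, rebuild, and return what the index holds afterwards."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE entity (permalink TEXT PRIMARY KEY, title TEXT)")
    conn.execute(
        "INSERT INTO entity (permalink, title) VALUES (?, ?)",
        ("notes/ai/current", "Current title"),
    )
    create_index(conn)
    # Simulate a phantom row left over from before the reset.
    conn.execute(
        "INSERT INTO search_index (permalink, title) VALUES (?, ?)",
        ("notes/ai/removed", "Old title"),
    )
    conn.commit()
    full_rebuild(conn)
    return conn.execute("SELECT permalink, title FROM search_index").fetchall()
```

Because the table is dropped rather than updated row by row, nothing written before the rebuild can survive it, regardless of how the orphan rows were created.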

Thank you for the great work on this project.
