[BUG] Bug: no such module: vec0 when running bm reindex --embeddings #829

@gangshi-github

Bug: no such module: vec0 when running bm reindex --embeddings

Environment

  • OS: Windows 11
  • Python: 3.12.10
  • basic-memory: 0.20.3
  • sqlite-vec: v0.1.9
  • Installation: uv tool install basic-memory --with sqlite-vec

Description

Running basic-memory reindex --embeddings fails with:

OperationalError: (sqlite3.OperationalError) no such module: vec0
[SQL: DELETE FROM search_vector_embeddings WHERE rowid IN (
  SELECT id FROM search_vector_chunks 
  WHERE project_id = ? AND entity_id NOT IN (SELECT id FROM entity WHERE project_id = ?)
)]
[parameters: (2, 2)]

Root Cause

In search_service.py, the reindex_vectors() method calls _purge_stale_search_rows() before sync_entity_vectors_batch():

async def reindex_vectors(self, progress_callback=None) -> dict:
    entities = await self.entity_repository.find_all()
    entity_ids = [entity.id for entity in entities]

    # BUG: This runs BEFORE sqlite-vec extension is loaded!
    await self._purge_stale_search_rows()

    # _ensure_sqlite_vec_loaded is called inside this
    batch_result = await self.repository.sync_entity_vectors_batch(
        entity_ids,
        progress_callback=progress_callback,
    )

_purge_stale_search_rows() executes DELETE FROM search_vector_embeddings (a vec0 virtual table) without first calling _ensure_sqlite_vec_loaded(). The sqlite-vec extension is only loaded lazily inside sync_entity_vectors_batch() -> _ensure_vector_tables() -> _ensure_sqlite_vec_loaded().
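The error text itself can be reproduced with only the standard library, which may help confirm that the message comes from SQLite not knowing the vec0 module on a given connection rather than from the data. This is a minimal sketch, not the basic-memory code path: it creates a vec0 table instead of deleting from an existing one, and the table name demo and the column definition are made up for illustration.

```python
import sqlite3

# A plain connection with no extension loaded, similar to the state
# the purge step runs in before sqlite-vec has been loaded.
conn = sqlite3.connect(":memory:")

try:
    # "demo" and its column are hypothetical; any statement that needs
    # the vec0 module fails the same way when the module is not registered.
    conn.execute("CREATE VIRTUAL TABLE demo USING vec0(embedding float[4])")
    error_message = None
except sqlite3.OperationalError as exc:
    error_message = str(exc)
finally:
    conn.close()

print(error_message)
```

On a stock Python build this prints the same "no such module: vec0" message as the traceback above, since the module only exists on connections where the extension has been loaded.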

Additional Issue: bm reset doesn't fully clean vec0 tables

After basic-memory reset, the vec0 virtual tables (search_vector_embeddings, search_vector_embeddings_chunks, search_vector_embeddings_info, search_vector_embeddings_rowids, search_vector_embeddings_vector_chunks00) remain in the database because the reset operation also doesn't load the sqlite-vec extension before attempting to drop them. This means subsequent reindex operations still encounter the stale vec0 tables.
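One way to check for leftovers after a reset is to read sqlite_master, which does not need the extension because it only lists schema entries. The sketch below is an assumption-laden diagnostic, not part of basic-memory: the helper name leftover_vector_tables is hypothetical, and ordinary tables in an in-memory database stand in for the real vec0 table and its shadow tables.

```python
import sqlite3

PREFIX = "search_vector_embeddings"


def leftover_vector_tables(conn, prefix=PREFIX):
    """Return names of tables in the schema that start with the given prefix."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    # Filter in Python: "_" is a wildcard in SQL LIKE, so startswith is safer.
    return [name for (name,) in rows if name.startswith(prefix)]


conn = sqlite3.connect(":memory:")
# Ordinary tables simulate what a reset might leave behind.
conn.execute("CREATE TABLE search_vector_embeddings_chunks (id INTEGER)")
conn.execute("CREATE TABLE search_vector_embeddings_rowids (id INTEGER)")
conn.execute("CREATE TABLE entity (id INTEGER)")

leftovers = leftover_vector_tables(conn)
print(leftovers)
conn.close()
```

Against a real database, the same helper would be called on a connection to the basic-memory database file; a non-empty result after a reset would be consistent with the behavior described above.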

Verification

I confirmed that sqlite-vec loads correctly in both sync and async modes:

# Sync sqlite3 - works
import sqlite3, sqlite_vec
conn = sqlite3.connect(":memory:")
conn.enable_load_extension(True)
conn.load_extension(sqlite_vec.loadable_path())
print(conn.execute("SELECT vec_version()").fetchone())  # ('v0.1.9',)

# aiosqlite via SQLAlchemy - works
import asyncio
from sqlalchemy import text
from sqlalchemy.ext.asyncio import create_async_engine, AsyncSession

async def main():
    engine = create_async_engine("sqlite+aiosqlite:///:memory:")
    async with AsyncSession(engine) as session:
        async_connection = await session.connection()
        raw_connection = await async_connection.get_raw_connection()
        # Unwrap to the underlying aiosqlite connection to load the extension
        driver_connection = raw_connection.driver_connection
        await driver_connection.enable_load_extension(True)
        await driver_connection.load_extension(sqlite_vec.loadable_path())
        result = await session.execute(text("SELECT vec_version()"))
        print(result.fetchone())  # ('v0.1.9',)

asyncio.run(main())

So the extension itself is fine - the bug is purely that _purge_stale_search_rows() doesn't ensure it's loaded before querying vec0 tables.

Suggested Fix

Add a call to _ensure_sqlite_vec_loaded() in _purge_stale_search_rows() before it operates on the vec0 virtual tables:

async def _purge_stale_search_rows(self) -> None:
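    # db.scoped_session is used below; assumed here to come from basic_memory.db
    from basic_memory import db
    from basic_memory.repository.sqlite_search_repository import SQLiteSearchRepository
    from sqlalchemy import text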

    project_id = self.repository.project_id
    stale_entity_filter = (
        "entity_id NOT IN (SELECT id FROM entity WHERE project_id = :project_id)"
    )
    params = {"project_id": project_id}

    # Delete stale search_index rows
    await self.repository.execute_query(...)

    # SQLite vec has no CASCADE - must delete embeddings before chunks
    if isinstance(self.repository, SQLiteSearchRepository):
        # FIX: Ensure sqlite-vec extension is loaded
        async with db.scoped_session(self.repository.session_maker) as vec_session:
            await self.repository._ensure_sqlite_vec_loaded(vec_session)

        await self.repository.execute_query(
            text(
                "DELETE FROM search_vector_embeddings WHERE rowid IN ("
                "SELECT id FROM search_vector_chunks "
                # f-string so that {stale_entity_filter} is actually interpolated
                f"WHERE project_id = :project_id AND {stale_entity_filter})"
            ),
            params,
        )
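The ordering problem and the proposed fix can be illustrated without a database. The following is a minimal sketch under stated assumptions: FakeVectorRepository and all of its method names are hypothetical stand-ins, not the basic-memory classes. Every operation that touches the vector tables calls an idempotent load guard first, so the order in which the caller invokes them no longer matters.

```python
import asyncio


class FakeVectorRepository:
    """Hypothetical stand-in for a repository that needs an extension loaded."""

    def __init__(self):
        self.extension_loaded = False
        self.calls = []

    async def ensure_extension_loaded(self):
        # Idempotent guard: load once, then return early on later calls.
        if self.extension_loaded:
            return
        self.calls.append("load_extension")
        self.extension_loaded = True

    async def purge_stale_rows(self):
        # The guard runs before any statement that needs the extension.
        await self.ensure_extension_loaded()
        self.calls.append("purge")

    async def sync_vectors(self):
        await self.ensure_extension_loaded()
        self.calls.append("sync")


async def reindex(repo):
    # Same order as reindex_vectors(): purge first, then sync.
    await repo.purge_stale_rows()
    await repo.sync_vectors()
    return repo.calls


calls = asyncio.run(reindex(FakeVectorRepository()))
print(calls)
```

The recorded call list shows the load happening once, before the purge. In the real code the guard would also need to run on the same connection that later executes the DELETE, since SQLite loads extensions per connection.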

Workaround

  1. Disable semantic search in config.json: "semantic_search_enabled": false
  2. Use bm reindex --search (full-text only, no embeddings)
