Add a defer_embeddings option that stores content without computing
embeddings or FTS entries, so callers (e.g. a dashboard upload) can add
files instantly without an embedding model and index them later from a
background process.
- memory_set_option('defer_embeddings', 1): memory_add_* functions only
store content in dbmem_content; requires save_content=1
- memory_embed_pending([limit]): embeds pending rows in batches, one
SAVEPOINT per file, so an interrupted worker can be safely retried;
rekeys rows whose stored hash no longer matches the current
preserve_duplicate_paths scope
- memory_pending_count(): number of rows awaiting embeddings, for
progress reporting
- memory_list_files(): file nodes now include an "indexed" boolean
- content that parses to zero chunks (e.g. whitespace-only) now gets a
zero-length sentinel row in dbmem_vault marking it as processed, so it
leaves the pending state and memory_reindex stops re-parsing it
(sqlite-vector >= 0.9.80 skips undersized blobs during scans)
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

API.md: 58 additions, 1 deletion
@@ -311,6 +311,7 @@ Indexes caller-provided file content without reading from the filesystem.
- With `preserve_duplicate_paths=1`, an empty `content` value and a trailing slash in `path` create an explicit empty directory marker, for example `memory_add_content('dirname/', '')`
- Directory markers are stored in `dbmem_content` with a trailing slash path, are shown as directories by `memory_list_files()`, and are not indexed for search
- Directory marker paths cannot contain non-empty content and cannot conflict with a file path of the same name
- With `defer_embeddings=1`, content is stored without computing embeddings or FTS entries (no embedding model required); generate them later with `memory_embed_pending()`
- Available even when compiled with `DBMEM_OMIT_IO`
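On the upload side, a host application enables deferred mode on its connection and stores files as they arrive. The following is a minimal sketch using Python's standard `sqlite3` module. It assumes the memory extension is already loaded into the connection. `store_upload` is a hypothetical helper. Setting `save_content` through `memory_set_option()` is an assumption based on the documented requirement that deferred mode needs `save_content=1`.

```python
import sqlite3


def store_upload(conn: sqlite3.Connection, path: str, content: str) -> None:
    """Store one uploaded file in deferred mode (hypothetical helper).

    Assumes the memory extension is already loaded into `conn`. No embedding
    model is needed: only dbmem_content is written.
    """
    # Deferred mode requires save_content=1; setting it through
    # memory_set_option is an assumption based on the options table.
    conn.execute("SELECT memory_set_option('save_content', 1)").fetchone()
    conn.execute("SELECT memory_set_option('defer_embeddings', 1)").fetchone()
    # Bound parameters carry real newlines, so no SQL escaping is needed.
    conn.execute("SELECT memory_add_content(?, ?)", (path, content)).fetchone()
    conn.commit()
```

Nothing is embedded at this point, so the call costs little more than the write itself. The row stays pending until a worker processes it.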

**Example:**
@@ -433,11 +434,12 @@ Returns a JSON tree with the indexed directories and files stored in `dbmem_cont
- Directory nodes are derived from indexed file paths and explicit directory markers
- Path separators are normalized to `/` in the returned JSON
- Sibling nodes are sorted with directories first, then files; each group is alphabetical
- File nodes include an `indexed` boolean: `false` while content is waiting for embedding generation (see `defer_embeddings` and `memory_embed_pending()`), `true` otherwise
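A dashboard can use the `indexed` flag to mark files that are still waiting. The sketch below walks the JSON returned by `memory_list_files()`. This section does not document the exact node shape, so several details are assumptions to check against real output. These include the `name` and `children` keys and whether the top level is a list or a single node. The helper name `pending_paths` is hypothetical.

```python
import json


def pending_paths(tree_json: str) -> list[str]:
    """Collect paths of file nodes whose `indexed` flag is false.

    Assumed node shape (hypothetical): every node has a "name", directory
    nodes carry a "children" list, and file nodes carry an "indexed" boolean.
    """
    root = json.loads(tree_json)
    nodes = root if isinstance(root, list) else [root]
    pending: list[str] = []

    def walk(node: dict, prefix: str) -> None:
        path = f"{prefix}{node['name']}"
        if "children" in node:  # directory: recurse with "/" separators
            for child in node["children"]:
                walk(child, path + "/")
        elif node.get("indexed") is False:  # file still awaiting embeddings
            pending.append(path)

    for node in nodes:
        walk(node, "")
    return pending
```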

#### `memory_embed_pending([limit])`

Generates embeddings and FTS entries for content stored without them (see the `defer_embeddings` option).

**Parameters:**
| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `limit` | INTEGER | No | Maximum number of pending content rows to process in this call (must be positive). When omitted, all pending rows are processed |

**Returns:** INTEGER - Number of pending content rows processed

**Notes:**
- Requires an embedding model configured with `memory_set_model()` or loaded from persisted provider/model settings
- A content row is pending when it has a non-empty stored `value` and no `dbmem_vault` entries
- Each row is processed in its own SAVEPOINT transaction, so a row is either fully indexed or untouched; a failed or interrupted call can simply be retried, and other connections can observe per-file progress while a batch is running
- Content whose parsing produces no chunks (e.g. whitespace-only text) is marked as processed so it is not retried
- Designed for background workers: call in a loop with a small `limit` and poll `memory_pending_count()` to report progress
- Returns 0 when nothing is pending

**Example:**
```sql
-- store content instantly, without embeddings
SELECT memory_set_option('defer_embeddings', 1);
SELECT memory_add_content('docs/api.md', '# API' || char(10) || 'Uploaded from the dashboard.');

-- later, from a background process: embed in batches of 10
SELECT memory_embed_pending(10);

-- or process the whole backlog in one call
SELECT memory_embed_pending();
```
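The background-worker note describes a loop of small batches. The following is a minimal sketch with Python's `sqlite3`. It assumes the extension is loaded and an embedding model is configured on the connection. `drain_pending` is a hypothetical helper, not part of the extension.

```python
import sqlite3


def drain_pending(conn: sqlite3.Connection, batch_size: int = 10) -> int:
    """Embed pending rows in small batches until none are left.

    Hypothetical worker helper. Assumes the memory extension is loaded into
    `conn` and an embedding model is configured. Returns rows processed.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")  # limit must be > 0
    processed = 0
    while True:
        (done,) = conn.execute(
            "SELECT memory_embed_pending(?)", (batch_size,)
        ).fetchone()
        if done == 0:  # documented: returns 0 when nothing is pending
            return processed
        processed += done
        (left,) = conn.execute("SELECT memory_pending_count()").fetchone()
        print(f"embedded {processed} rows so far, {left} still pending")
```

Small batches keep each call short. Each row commits in its own SAVEPOINT, so stopping the worker between or during batches leaves at most the in-flight row pending. If uploads keep arriving, the loop simply keeps draining until a batch returns 0.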

---

#### `memory_pending_count()`

Returns the number of content rows waiting for embedding generation.

**Parameters:** None

**Returns:** INTEGER - Number of pending content rows

**Notes:**
- Counts rows with a non-empty stored `value` and no `dbmem_vault` entries
- Useful for progress reporting: `1 - pending/total` while a `memory_embed_pending()` loop is running
- Empty files and directory markers are never counted as pending

**Example:**
```sql
SELECT memory_pending_count();
```
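The progress formula above leaves `total` open. This sketch assumes `total` is the value of `memory_pending_count()` sampled once before the worker loop starts, and `pending` is the current count. Under that assumption, new uploads arriving mid-run can push `pending` above `total`, so the result is clamped. The function name is hypothetical.

```python
def progress(pending: int, total: int) -> float:
    """Fraction of a backlog completed, per the documented 1 - pending/total.

    Assumption: `total` is memory_pending_count() sampled once before the
    memory_embed_pending() loop starts; `pending` is the current count.
    """
    if total <= 0:
        return 1.0  # nothing was pending when the snapshot was taken
    # Uploads arriving mid-run can push pending above total; clamp to [0, 1].
    return max(0.0, min(1.0, 1 - pending / total))
```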

---

### `memory_search`

A virtual table for performing hybrid semantic search.

| `cache_max_entries` | INTEGER | 0 | Max cache entries (0 = no limit). When exceeded, oldest entries are evicted |
| `search_oversample` | INTEGER | 0 | Search oversampling multiplier (0 = no oversampling). When set, retrieves N * multiplier candidates from each index before merging down to N final results |
| `preserve_duplicate_paths` | INTEGER | 0 | Preserve distinct logical paths for identical or empty content. When enabled, `dbmem_content.hash` is path-scoped and identifies an entry rather than only the raw content |
| `defer_embeddings` | INTEGER | 0 | Store content without computing embeddings or FTS entries. Deferred content is invisible to search until processed with `memory_embed_pending()` or `memory_reindex()`. Requires `save_content=1` |

README.md: 19 additions, 0 deletions
@@ -245,6 +245,25 @@ Directory markers are listed as directories, materialized as directories by `mem

This makes all sync functions safe to call repeatedly (for example, on a cron schedule or at agent startup) with minimal overhead.

## Deferred Embeddings

For interactive workflows (e.g. a dashboard upload) where content should appear immediately and embeddings can be computed later by a background process, enable deferred mode (it requires `save_content=1`):

```sql
-- store content instantly: no embedding model needed, nothing is computed
SELECT memory_set_option('defer_embeddings', 1);
SELECT memory_add_content('docs/api.md', '# API' || char(10) || 'Uploaded from the dashboard.');

-- pending files are visible right away ("indexed":false in the JSON tree)
SELECT memory_list_files();

-- later, from a background worker: embed in batches and report progress
SELECT memory_embed_pending(10); -- returns rows processed in this batch
SELECT memory_pending_count();   -- rows still waiting
```

Deferred content is stored in `dbmem_content` but is invisible to `memory_search` until it is embedded. Each file is embedded in its own transaction, so a file is either fully indexed or still pending. An interrupted worker can simply be restarted, and other connections can watch progress while a batch runs.
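Because a failed or interrupted batch leaves unfinished files pending rather than half-indexed, a worker can treat errors as retryable. This is a sketch under two assumptions. First, errors raised by `memory_embed_pending()` surface in Python as `sqlite3.OperationalError`, as SQLite errors from SQL functions and lock contention generally do. Second, the helper name and linear backoff policy are illustrative only.

```python
import sqlite3
import time


def embed_batch_with_retry(conn: sqlite3.Connection, batch_size: int = 10,
                           max_attempts: int = 5, backoff: float = 0.5) -> int:
    """Run one memory_embed_pending() batch, retrying on errors (hypothetical helper).

    Retrying is safe because each file is embedded in its own transaction:
    a failed attempt leaves unfinished files pending.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            (done,) = conn.execute(
                "SELECT memory_embed_pending(?)", (batch_size,)
            ).fetchone()
            return done
        except sqlite3.OperationalError:
            # Assumed error surface: lock contention or a provider failure
            # reported by the extension as an SQLite error.
            if attempt == max_attempts:
                raise
            time.sleep(backoff * attempt)  # linear backoff between attempts
    raise AssertionError("unreachable")
```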

## Agent Memory Sync

Multiple agents can share and merge knowledge without any coordination. Each agent works independently with its own local SQLite database, syncing through a shared [SQLiteCloud](https://sqlitecloud.io/) managed database when connectivity is available.