Add the preserve_duplicate_paths option for virtual-file/editor workflows that need distinct logical paths even when content is identical or empty.
When enabled with SELECT memory_set_option('preserve_duplicate_paths', 1), storage hashes are scoped by path so dbmem_content can keep separate rows while the embedding cache still reuses chunk embeddings by text.
Fix empty content handling so memory_add_content() and memory_add_file() can store zero-length entries without producing chunks, and keep default deduplication behavior unchanged when the option is 0.
Document the option, bump the extension version to 1.3.2, and cover default dedupe, duplicate preservation, and empty file/content behavior with unit tests.
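To make the path-scoping idea in the description concrete, here is a minimal Python sketch. It is not the extension's code: the SHA-256 hash, the length-prefixed path, and the helper names `storage_hash` and `embedding_cache_key` are all assumptions chosen for illustration. It only shows how a path-scoped storage hash can keep entries apart while a cache keyed by chunk text still matches.

```python
import hashlib


def storage_hash(path: str, content: str, preserve_duplicate_paths: bool = False) -> str:
    """Hypothetical storage hash: content-only by default, path-scoped when the option is on."""
    h = hashlib.sha256()
    if preserve_duplicate_paths:
        encoded_path = path.encode("utf-8")
        # Length-prefix the path so different (path, content) pairs cannot
        # collapse into the same byte stream.
        h.update(len(encoded_path).to_bytes(8, "big"))
        h.update(encoded_path)
    h.update(content.encode("utf-8"))
    return h.hexdigest()


def embedding_cache_key(chunk_text: str) -> str:
    """Hypothetical cache key: depends on the chunk text only, never on the path."""
    return hashlib.sha256(chunk_text.encode("utf-8")).hexdigest()
```

With the flag off, two empty files at different paths hash identically and would be deduplicated. With the flag on, they hash differently, yet any chunk text they share still maps to the same cache key.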
API.md: 7 additions, 2 deletions
@@ -35,7 +35,7 @@ sqlite-memory enables semantic search over text content stored in SQLite. It:
 ## Sync Behavior

-All `memory_add_*` functions use **content-hash change detection** to avoid redundant embedding computation. Each piece of content is hashed before processing — if the hash already exists in the database, the content is skipped.
+By default, all `memory_add_*` functions use **content-hash change detection** to avoid redundant embedding computation. Each piece of content is hashed before processing — if the hash already exists in the database, the content is skipped. Set `preserve_duplicate_paths=1` to store distinct logical paths even when their content is identical or empty.
 | `cache_max_entries` | INTEGER | 0 | Max cache entries (0 = no limit). When exceeded, oldest entries are evicted |
 | `search_oversample` | INTEGER | 0 | Search oversampling multiplier (0 = no oversampling). When set, retrieves N * multiplier candidates from each index before merging down to N final results |
+| `preserve_duplicate_paths` | INTEGER | 0 | Preserve distinct logical paths for identical or empty content. When enabled, `dbmem_content.hash` is path-scoped and identifies an entry rather than only the raw content |
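The new option row can be exercised from a host program. Per the commit description, it is enabled with `SELECT memory_set_option('preserve_duplicate_paths', 1)`. A minimal Python sketch follows, assuming `conn` is a `sqlite3` connection on which the sqlite-memory extension is already loaded (loading is not shown) and using a hypothetical helper name:

```python
import sqlite3


def enable_preserve_duplicate_paths(conn: sqlite3.Connection) -> None:
    """Turn on path-preserving storage for this connection.

    Assumes the sqlite-memory extension is already loaded on `conn`, so the
    SQL function memory_set_option() is available.
    """
    # Statement taken from the commit description; the return value of
    # memory_set_option() is not documented here, so it is ignored.
    conn.execute("SELECT memory_set_option('preserve_duplicate_paths', 1)").fetchone()
```

Calling the helper before any `memory_add_*` call keeps the setting explicit at the start of a session. Whether the option persists across connections is not stated in this change.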
README.md: 10 additions, 1 deletion
@@ -210,7 +210,7 @@ memories = recall("what's the project timeline")
 ## Intelligent Sync

-All `memory_add_*` functions use content-hash change detection to avoid redundant work:
+By default, all `memory_add_*` functions use content-hash change detection to avoid redundant work:

 - **`memory_add_text`**: Computes a hash of the content. If the same content was already indexed, it is skipped entirely. No duplicate embeddings are ever created.
 - **`memory_add_file`**: Reads the file and hashes its content. If the file was previously indexed with different content, the old entry (chunks, embeddings, FTS) is atomically replaced. Unchanged files are skipped. Absolute file paths are stored as portable logical suffixes, while the original local path is retained only in local metadata.
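The add, replace, or skip decision described for `memory_add_file` can be sketched in a few lines of Python. This is an illustration only: the in-memory `index` dictionary, the SHA-256 hash, and the function name `sync_file` are assumptions, not the extension's storage layout or hash function.

```python
import hashlib


def sync_file(index: dict, path: str, data: bytes) -> str:
    """Return 'added', 'replaced', or 'skipped' for one file.

    `index` is a hypothetical stand-in that maps a logical path to the hash
    of the content last indexed for that path.
    """
    new_hash = hashlib.sha256(data).hexdigest()
    old_hash = index.get(path)
    if old_hash == new_hash:
        # Unchanged content: nothing to re-chunk or re-embed.
        return "skipped"
    index[path] = new_hash
    # A missing entry means a new file; a different hash means the old
    # entry would be replaced.
    return "added" if old_hash is None else "replaced"
```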
@@ -219,6 +219,14 @@ All `memory_add_*` functions use content-hash change detection to avoid redundan
 1. **Cleanup**: Removes database entries for files that no longer exist on disk
 2. **Scan**: Recursively processes all matching files - adding new ones, replacing modified ones, and skipping unchanged ones. Stored paths are relative to the scanned directory root, with local provenance retained only in local metadata.

+For virtual-file or editor workflows that need separate logical paths even when content is identical or empty, enable path-preserving storage:
+In this mode, `dbmem_content.hash` identifies the stored entry and is scoped by path.
+
 `memory_add_text()`, `memory_add_file()`, and `memory_add_content()` each run inside a SQLite SAVEPOINT transaction. `memory_add_directory()` performs its cleanup pass transactionally and then processes each file in its own transaction. If one file fails, that file rolls back cleanly and previously-committed files remain valid; there are no partially-indexed rows or orphaned chunk/FTS entries for the failed file.

 This makes all sync functions safe to call repeatedly - for example, on a cron schedule or at agent startup - with minimal overhead.
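The per-file rollback behavior described above follows the general SQLite SAVEPOINT pattern, which can be sketched with Python's `sqlite3` module. The tables `demo_content` and `demo_chunks` and the function `add_items` are hypothetical stand-ins, not the extension's schema or API. The sketch only demonstrates that a failure partway through one item leaves no partial rows, while earlier items stay committed.

```python
import sqlite3


def add_items(conn: sqlite3.Connection, items):
    """Insert each (path, body) pair inside its own SAVEPOINT; return failed paths."""
    failed = []
    for path, body in items:
        conn.execute("SAVEPOINT add_item")
        try:
            conn.execute(
                "INSERT INTO demo_content(path, body) VALUES (?, ?)", (path, body)
            )
            # demo_chunks.chunk is NOT NULL, so a None body fails here,
            # after the content row has already been written.
            conn.execute(
                "INSERT INTO demo_chunks(path, chunk) VALUES (?, ?)", (path, body)
            )
        except sqlite3.Error:
            # Undo everything this item wrote; earlier items are untouched.
            conn.execute("ROLLBACK TO add_item")
            failed.append(path)
        finally:
            # Releasing the outermost savepoint commits (or ends the rolled-back work).
            conn.execute("RELEASE add_item")
    return failed


# isolation_level=None lets the code manage transactions explicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE demo_content(path TEXT PRIMARY KEY, body TEXT)")
conn.execute("CREATE TABLE demo_chunks(path TEXT, chunk TEXT NOT NULL)")

failed = add_items(conn, [("a.md", "alpha"), ("b.md", None), ("c.md", "gamma")])
stored = [row[0] for row in conn.execute("SELECT path FROM demo_content ORDER BY path")]
```

Here the second item fails on its chunk insert, so its content row is rolled back along with it, and the first and third items remain stored.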