Commit bc0ae1f

Merge branch 'main' into add-platform-builds
2 parents 81a343c + 8ab9114 commit bc0ae1f

9 files changed: +1003 -89 lines changed

.gitignore

Lines changed: 4 additions & 9 deletions
@@ -47,12 +47,7 @@ yarn-error.log*
 
 # System
 .DS_Store
-Thumbs.db
-CLAUDE.md
-
-# Dart/Flutter
-.dart_tool/
-pubspec.lock
-.flutter-plugins
-.flutter-plugins-dependencies
-packages/flutter/native_libraries/
+test/unittest.dSYM/Contents/Info.plist
+test/unittest.dSYM/Contents/Resources/DWARF/unittest
+test/unittest.dSYM/Contents/Resources/Relocations/aarch64/unittest.yml
+/build

API.md

Lines changed: 94 additions & 22 deletions
@@ -5,6 +5,7 @@ A SQLite extension that provides semantic memory capabilities with hybrid search
 ## Table of Contents
 
 - [Overview](#overview)
+- [Sync Behavior](#sync-behavior)
 - [Loading the Extension](#loading-the-extension)
 - [SQL Functions](#sql-functions)
   - [General Functions](#general-functions)
@@ -29,6 +30,31 @@ sqlite-memory enables semantic search over text content stored in SQLite. It:
 
 ---
 
+## Sync Behavior
+
+All `memory_sync_*` functions use **content-hash change detection** to avoid redundant embedding computation. Each piece of content is hashed before processing — if the hash already exists in the database, the content is skipped.
+
+### Change Detection
+
+| Scenario | Behavior |
+|----------|----------|
+| New content | Chunked, embedded, and indexed |
+| Unchanged content | Skipped (hash match) |
+| Modified file | Old entry atomically deleted, new content reindexed |
+| Deleted file | Entry removed during directory sync |
+
+### Transactional Safety
+
+Every sync operation is wrapped in a SQLite **SAVEPOINT** transaction. If any step fails (embedding error, disk issue, constraint violation), the entire operation rolls back. This guarantees:
+
+- **No partially-indexed files** — content is either fully indexed or not at all
+- **No orphaned chunks** — embeddings and FTS entries are always consistent with `dbmem_content`
+- **Safe to retry** — a failed sync leaves the database in its previous valid state
+
+This makes all sync functions idempotent and safe to call repeatedly (e.g., on a schedule or at application startup).
+
+---
+
 ## Loading the Extension
 
 ### Dynamic Loading (Recommended)
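Note on the Sync Behavior hunk above: the following is a minimal sketch of content-hash change detection combined with a SAVEPOINT, written against Python's standard `sqlite3` module. It is not the extension's implementation. The `content` and `chunks` tables, the `sync_item` function, the SHA-256 hash, and the paragraph-based chunking are all assumptions made for illustration.

```python
import hashlib
import sqlite3


def create_schema(conn):
    # Hypothetical tables; the real extension uses its own dbmem_* schema.
    conn.execute("CREATE TABLE content (key TEXT PRIMARY KEY, hash TEXT NOT NULL)")
    conn.execute("CREATE TABLE chunks (key TEXT, body TEXT, embedding TEXT)")


def sync_item(conn, key, text, embed):
    """Index `text` under `key`. Returns True if anything was (re)indexed."""
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    row = conn.execute("SELECT hash FROM content WHERE key = ?", (key,)).fetchone()
    if row is not None and row[0] == digest:
        return False  # unchanged content: skip embedding entirely

    conn.execute("SAVEPOINT sync_item")
    try:
        # Modified content: remove the old entry and its chunks first.
        conn.execute("DELETE FROM chunks WHERE key = ?", (key,))
        conn.execute("DELETE FROM content WHERE key = ?", (key,))
        conn.execute("INSERT INTO content (key, hash) VALUES (?, ?)", (key, digest))
        for chunk in text.split("\n\n"):  # naive chunking, for illustration only
            conn.execute(
                "INSERT INTO chunks (key, body, embedding) VALUES (?, ?, ?)",
                (key, chunk, embed(chunk)),
            )
    except Exception:
        # Undo every statement since the savepoint, then close it.
        conn.execute("ROLLBACK TO sync_item")
        conn.execute("RELEASE sync_item")
        raise
    conn.execute("RELEASE sync_item")
    return True
```

The connection is expected to be opened with `isolation_level=None` so that the `sqlite3` module does not start its own implicit transactions around the savepoint. If `embed` raises partway through, the previously indexed chunks are restored, which is the "safe to retry" property described in the hunk.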
@@ -174,9 +200,9 @@ SELECT memory_get_option('provider');
 
 ### Memory Management Functions
 
-#### `memory_add_text(content TEXT [, context TEXT])`
+#### `memory_sync_text(content TEXT [, context TEXT])`
 
-Adds text content to memory.
+Syncs text content to memory. Duplicate content (same hash) is skipped automatically.
 
 **Parameters:**
 | Parameter | Type | Required | Description |
@@ -189,23 +215,24 @@ Adds text content to memory.
 **Notes:**
 - Content is chunked based on `max_tokens` and `overlay_tokens` settings
 - Each chunk is embedded and stored in `dbmem_vault`
-- Content hash prevents duplicate storage
+- Content hash prevents duplicate storage — calling with the same content is a no-op
+- Runs inside a SAVEPOINT transaction (see [Sync Behavior](#sync-behavior))
 - Sets `created_at` timestamp automatically
 
 **Example:**
 ```sql
 -- Add text without context
-SELECT memory_add_text('SQLite is a C-language library that implements a small, fast, self-contained SQL database engine.');
+SELECT memory_sync_text('SQLite is a C-language library that implements a small, fast, self-contained SQL database engine.');
 
 -- Add text with context
-SELECT memory_add_text('Important meeting notes from 2024-01-15...', 'meetings');
+SELECT memory_sync_text('Important meeting notes from 2024-01-15...', 'meetings');
 ```
 
 ---
 
-#### `memory_add_file(path TEXT [, context TEXT])`
+#### `memory_sync_file(path TEXT [, context TEXT])`
 
-Adds a file to memory.
+Syncs a file to memory. Unchanged files are skipped; modified files are atomically replaced.
 
 **Parameters:**
 | Parameter | Type | Required | Description |
@@ -218,39 +245,51 @@ Adds a file to memory.
 **Notes:**
 - Only processes files matching configured extensions (default: `md,mdx`)
 - File path is stored in `dbmem_content.path`
+- If the file was previously indexed with different content, the old entry (chunks, embeddings, FTS) is deleted and new content is reindexed — all within a single SAVEPOINT transaction (see [Sync Behavior](#sync-behavior))
 - Not available when compiled with `DBMEM_OMIT_IO`
 
 **Example:**
 ```sql
-SELECT memory_add_file('/docs/readme.md');
-SELECT memory_add_file('/docs/api.md', 'documentation');
+SELECT memory_sync_file('/docs/readme.md');
+SELECT memory_sync_file('/docs/api.md', 'documentation');
 ```
 
 ---
 
-#### `memory_add_directory(path TEXT [, context TEXT])`
+#### `memory_sync_directory(path TEXT [, context TEXT])`
 
-Recursively adds all matching files from a directory.
+Synchronizes a directory with memory. Adds new files, reindexes modified files, and removes entries for deleted files.
 
 **Parameters:**
 | Parameter | Type | Required | Description |
 |-----------|------|----------|-------------|
 | `path` | TEXT | Yes | Full path to the directory |
 | `context` | TEXT | No | Optional context label applied to all files |
 
-**Returns:** INTEGER - Number of files processed
+**Returns:** INTEGER - Number of new files processed
 
 **Notes:**
 - Recursively scans subdirectories
 - Only processes files matching configured extensions
+- **Phase 1 — Cleanup**: Removes entries for files that no longer exist on disk
+- **Phase 2 — Scan**: Processes all matching files:
+  - **New files** are chunked, embedded, and added to the index
+  - **Unchanged files** are skipped (content hash match)
+  - **Modified files** have their old entries atomically replaced with new content
+- Each file is processed inside its own SAVEPOINT transaction (see [Sync Behavior](#sync-behavior))
+- Safe to call repeatedly — only changed content triggers embedding computation
 - Not available when compiled with `DBMEM_OMIT_IO`
 
 **Example:**
 ```sql
-SELECT memory_add_directory('/path/to/docs');
--- Returns: 42 (number of files added)
+SELECT memory_sync_directory('/path/to/docs');
+-- Returns: 42 (number of new files processed)
+
+SELECT memory_sync_directory('/project/notes', 'project-notes');
 
-SELECT memory_add_directory('/project/notes', 'project-notes');
+-- Safe to call again — unchanged files are skipped
+SELECT memory_sync_directory('/path/to/docs');
+-- Returns: 0 (nothing changed)
 ```
 
 ---
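Note on the `memory_sync_directory` hunk above: the two phases in the new notes can be sketched in a few lines of Python. This is an illustrative model under stated assumptions, not the extension's code: the index is a plain dict from path to content hash, `sync_directory` is a hypothetical name, SHA-256 stands in for whatever hash the extension uses, and chunking and embedding are reduced to a comment.

```python
import hashlib
import os


def sync_directory(index, root, extensions=(".md", ".mdx")):
    """Sync files under `root` into `index`, a dict mapping path -> content hash.

    Returns counts per outcome; the documented function reports only new files.
    """
    stats = {"new": 0, "modified": 0, "unchanged": 0, "removed": 0}

    # Phase 1 (cleanup): drop entries whose file no longer exists on disk.
    prefix = os.path.join(root, "")
    for path in [p for p in index if p.startswith(prefix)]:
        if not os.path.isfile(path):
            del index[path]
            stats["removed"] += 1

    # Phase 2 (scan): hash every matching file and compare with the index.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(extensions):
                continue
            path = os.path.join(dirpath, name)
            with open(path, "rb") as handle:
                digest = hashlib.sha256(handle.read()).hexdigest()
            if path not in index:
                stats["new"] += 1
            elif index[path] != digest:
                stats["modified"] += 1
            else:
                stats["unchanged"] += 1
                continue  # hash match: nothing to re-embed
            index[path] = digest  # a real indexer would chunk and embed here
    return stats
```

Calling it twice in a row on an untouched directory reports every file as unchanged on the second call, which mirrors the `-- Returns: 0 (nothing changed)` line in the SQL example.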
@@ -319,6 +358,7 @@ Deletes all memories from the database.
 **Notes:**
 - Clears `dbmem_content`, `dbmem_vault`, and `dbmem_vault_fts`
 - Does not delete settings from `dbmem_settings`
+- Does not clear the embedding cache (`dbmem_cache`)
 - Uses SAVEPOINT transaction for atomicity
 
 **Example:**
@@ -328,6 +368,35 @@ SELECT memory_clear();
 
 ---
 
+#### `memory_cache_clear([provider TEXT, model TEXT])`
+
+Clears the embedding cache.
+
+**Parameters:**
+| Parameter | Type | Required | Description |
+|-----------|------|----------|-------------|
+| `provider` | TEXT | No | Provider name to clear cache for |
+| `model` | TEXT | No | Model name to clear cache for |
+
+**Returns:** INTEGER - Number of cache entries deleted
+
+**Notes:**
+- With 0 arguments: clears the entire embedding cache
+- With 2 arguments: clears cache entries for a specific provider/model combination
+- The embedding cache stores computed embeddings keyed by (text hash, provider, model) to avoid redundant computation
+- Safe to call at any time — does not affect stored memories
+
+**Example:**
+```sql
+-- Clear entire cache
+SELECT memory_cache_clear();
+
+-- Clear cache for a specific provider/model
+SELECT memory_cache_clear('openai', 'text-embedding-3-small');
+```
+
+---
+
 ### `memory_search`
 
 A virtual table for performing hybrid semantic search.
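Note on the `memory_cache_clear` hunk above: a cache keyed by (text hash, provider, model) with an optional entry limit could look roughly like the sketch below. Everything here is an assumption for illustration: the `cache` table layout, the function names, SHA-256 as the hash, and insertion order as the meaning of "oldest". The actual `dbmem_cache` schema is not shown in this diff.

```python
import hashlib
import sqlite3


def open_cache():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE cache ("
        " seq INTEGER PRIMARY KEY AUTOINCREMENT,"
        " hash TEXT, provider TEXT, model TEXT, embedding TEXT,"
        " UNIQUE (hash, provider, model))"
    )
    return conn


def cached_embed(conn, text, provider, model, embed, max_entries=0):
    """Return the cached embedding for `text`, computing it on a miss."""
    key = (hashlib.sha256(text.encode("utf-8")).hexdigest(), provider, model)
    row = conn.execute(
        "SELECT embedding FROM cache WHERE hash = ? AND provider = ? AND model = ?",
        key,
    ).fetchone()
    if row is not None:
        return row[0]
    embedding = embed(text)
    conn.execute(
        "INSERT INTO cache (hash, provider, model, embedding) VALUES (?, ?, ?, ?)",
        key + (embedding,),
    )
    if max_entries > 0:  # 0 means no limit
        # Keep only the newest `max_entries` rows; older ones are evicted.
        conn.execute(
            "DELETE FROM cache WHERE seq NOT IN"
            " (SELECT seq FROM cache ORDER BY seq DESC LIMIT ?)",
            (max_entries,),
        )
    return embedding


def cache_clear(conn, provider=None, model=None):
    """Delete all entries (no arguments) or one provider/model pair (both)."""
    if (provider is None) != (model is None):
        raise ValueError("pass both provider and model, or neither")
    if provider is None:
        cursor = conn.execute("DELETE FROM cache")
    else:
        cursor = conn.execute(
            "DELETE FROM cache WHERE provider = ? AND model = ?", (provider, model)
        )
    return cursor.rowcount
```

Because the model name is part of the key, switching embedding models never returns a vector computed by a different model; the two-argument form of `cache_clear` then removes the stale entries for the old pair.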
@@ -395,6 +464,9 @@ AND context = 'meetings';
 | `text_weight` | REAL | 0.5 | Weight for FTS in scoring |
 | `min_score` | REAL | 0.7 | Minimum score threshold for results |
 | `update_access` | INTEGER | 1 | Update last_accessed on search |
+| `embedding_cache` | INTEGER | 1 | Cache embeddings to avoid redundant computation |
+| `cache_max_entries` | INTEGER | 0 | Max cache entries (0 = no limit). When exceeded, oldest entries are evicted |
+| `search_oversample` | INTEGER | 0 | Search oversampling multiplier (0 = no oversampling). When set, retrieves N * multiplier candidates from each index before merging down to N final results |
 
 ---
 
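Note on the `search_oversample` row above: the effect of oversampling is easiest to see on a toy merge of two ranked lists. The sketch below assumes a simple weighted sum of vector and FTS scores. That scoring formula, the `hybrid_search` function, and the `vector_weight` parameter name are assumptions for illustration, not the extension's documented behavior.

```python
def hybrid_search(vector_hits, text_hits, limit, oversample=0,
                  vector_weight=0.5, text_weight=0.5):
    """Merge two ranked lists of (doc_id, score) into at most `limit` results."""
    # With oversampling, take limit * multiplier candidates from each index.
    per_index = limit * oversample if oversample > 0 else limit
    combined = {}
    for doc_id, score in vector_hits[:per_index]:
        combined[doc_id] = combined.get(doc_id, 0.0) + vector_weight * score
    for doc_id, score in text_hits[:per_index]:
        combined[doc_id] = combined.get(doc_id, 0.0) + text_weight * score
    # Highest combined score first; ties broken by id for determinism.
    ranked = sorted(combined.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:limit]


vector_hits = [("a", 0.9), ("b", 0.8)]
text_hits = [("b", 0.9), ("c", 0.1)]
top_without = hybrid_search(vector_hits, text_hits, limit=1)
top_with = hybrid_search(vector_hits, text_hits, limit=1, oversample=2)
```

Without oversampling, document `b` is seen only by the text index, so it ties with `a` and loses the tie-break. With a multiplier of 2 it is seen by both indexes, its two partial scores add up, and it becomes the top result.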
@@ -404,7 +476,7 @@ The extension tracks two timestamps for each memory:
 
 ### `created_at`
 
-- Set automatically when content is added via `memory_add_text`, `memory_add_file`, or `memory_add_directory`
+- Set automatically when content is added via `memory_sync_text`, `memory_sync_file`, or `memory_sync_directory`
 - Stored as Unix timestamp (seconds since 1970-01-01 00:00:00 UTC)
 - Never updated after initial creation
 
@@ -445,8 +517,8 @@ SELECT memory_set_option('max_tokens', 512);
 SELECT memory_set_option('min_score', 0.75);
 
 -- Add content
-SELECT memory_add_text('SQLite is a C library that provides a lightweight disk-based database.', 'sqlite-docs');
-SELECT memory_add_directory('/docs/sqlite', 'sqlite-docs');
+SELECT memory_sync_text('SQLite is a C library that provides a lightweight disk-based database.', 'sqlite-docs');
+SELECT memory_sync_directory('/docs/sqlite', 'sqlite-docs');
 
 -- Search
 SELECT path, snippet, ranking
@@ -474,9 +546,9 @@ SELECT memory_clear();
 
 ```sql
 -- Add memories with different contexts
-SELECT memory_add_text('Meeting notes...', 'meetings');
-SELECT memory_add_text('API documentation...', 'api-docs');
-SELECT memory_add_text('Tutorial content...', 'tutorials');
+SELECT memory_sync_text('Meeting notes...', 'meetings');
+SELECT memory_sync_text('API documentation...', 'api-docs');
+SELECT memory_sync_text('Tutorial content...', 'tutorials');
 
 -- Search within a context
 SELECT * FROM memory_search
@@ -546,6 +618,6 @@ Errors can be caught using standard SQLite error handling mechanisms.
 
 ```sql
 -- Example error handling in application code
-SELECT memory_add_text(123); -- Error: expects TEXT parameter
+SELECT memory_sync_text(123); -- Error: expects TEXT parameter
 SELECT memory_delete('abc'); -- Error: expects INTEGER parameter
 ```

CLAUDE.md

Lines changed: 65 additions & 0 deletions
@@ -0,0 +1,65 @@
+# CLAUDE.md
+
+Behavioral guidelines to reduce common LLM coding mistakes. Merge with project-specific instructions as needed.
+
+**Tradeoff:** These guidelines bias toward caution over speed. For trivial tasks, use judgment.
+
+## 1. Think Before Coding
+
+**Don't assume. Don't hide confusion. Surface tradeoffs.**
+
+Before implementing:
+- State your assumptions explicitly. If uncertain, ask.
+- If multiple interpretations exist, present them - don't pick silently.
+- If a simpler approach exists, say so. Push back when warranted.
+- If something is unclear, stop. Name what's confusing. Ask.
+
+## 2. Simplicity First
+
+**Minimum code that solves the problem. Nothing speculative.**
+
+- No features beyond what was asked.
+- No abstractions for single-use code.
+- No "flexibility" or "configurability" that wasn't requested.
+- No error handling for impossible scenarios.
+- If you write 200 lines and it could be 50, rewrite it.
+
+Ask yourself: "Would a senior engineer say this is overcomplicated?" If yes, simplify.
+
+## 3. Surgical Changes
+
+**Touch only what you must. Clean up only your own mess.**
+
+When editing existing code:
+- Don't "improve" adjacent code, comments, or formatting.
+- Don't refactor things that aren't broken.
+- Match existing style, even if you'd do it differently.
+- If you notice unrelated dead code, mention it - don't delete it.
+
+When your changes create orphans:
+- Remove imports/variables/functions that YOUR changes made unused.
+- Don't remove pre-existing dead code unless asked.
+
+The test: Every changed line should trace directly to the user's request.
+
+## 4. Goal-Driven Execution
+
+**Define success criteria. Loop until verified.**
+
+Transform tasks into verifiable goals:
+- "Add validation" → "Write tests for invalid inputs, then make them pass"
+- "Fix the bug" → "Write a test that reproduces it, then make it pass"
+- "Refactor X" → "Ensure tests pass before and after"
+
+For multi-step tasks, state a brief plan:
+```
+1. [Step] → verify: [check]
+2. [Step] → verify: [check]
+3. [Step] → verify: [check]
+```
+
+Strong success criteria let you loop independently. Weak criteria ("make it work") require constant clarification.
+
+---
+
+**These guidelines are working if:** fewer unnecessary changes in diffs, fewer rewrites due to overcomplication, and clarifying questions come before implementation rather than after mistakes.
