This repository was archived by the owner on Feb 18, 2026. It is now read-only.

Commit f3d2b1f

feat(diskann_vtab): implement Phase 1 virtual table with MATCH search
Rewrites diskann_vtab.c with proper xCreate/xConnect split, MATCH-based ANN search with k and LIMIT support, ROWID lookup for DELETE, shadow table protection (iVersion=3 + xShadowName), and xDestroy for DROP. Key fixes: two-pass xBestIndex to handle SQLite's non-deterministic constraint ordering, SAVEPOINT tolerance for vtab context (SQLITE_BUSY), and %w quoting for table names in ROWID lookup SQL. 145 tests (126 C API + 19 vtab), ASan + Valgrind clean.
1 parent fcfdf55 commit f3d2b1f

8 files changed

Lines changed: 1625 additions & 325 deletions

README.md

Lines changed: 97 additions & 32 deletions
@@ -55,7 +55,9 @@ npm install better-sqlite3
 
 ## Quick Start
 
-### With @photostructure/sqlite
+### Virtual Table Interface (Recommended)
+
+The virtual table interface provides standard SQL operations with full query planner integration:
 
 ```typescript
 import { DatabaseSync } from "@photostructure/sqlite";
@@ -64,26 +66,44 @@ import { loadDiskAnnExtension } from "@photostructure/sqlite-diskann";
 const db = new DatabaseSync(":memory:", { allowExtension: true });
 loadDiskAnnExtension(db);
 
-// Create index for 128-dimensional vectors
+// Create virtual table for 128-dimensional vectors
 db.exec(`
-SELECT diskann_create('my_index', 'my_db', 128, 64, 1.2, 32);
+CREATE VIRTUAL TABLE embeddings USING diskann(
+dimension=128,
+metric=cosine
+)
 `);
 
-// Insert vector
+// Insert vectors with explicit rowid
 const vector = new Float32Array(128);
-db.prepare("SELECT diskann_insert(?, ?, ?)").run("my_index", "my_db", 1, vector);
+db.prepare("INSERT INTO embeddings(rowid, vector) VALUES (?, ?)").run(1, vector);
 
-// Search for 10 nearest neighbors
+// Search for 10 nearest neighbors using MATCH operator
 const results = db
 .prepare(
 `
-SELECT rowid, distance
-FROM diskann_search('my_index', 'my_db', ?, 10, 100)
-`
+SELECT rowid, distance
+FROM embeddings
+WHERE vector MATCH ? AND k = 10
+`
 )
 .all(vector);
+
+// Delete a vector
+db.prepare("DELETE FROM embeddings WHERE rowid = ?").run(1);
+
+// Drop the entire index
+db.exec("DROP TABLE embeddings");
 ```
 
+**Virtual table features**:
+
+- Standard SQL INSERT/DELETE/DROP operations
+- MATCH operator for ANN search with `k` parameter
+- LIMIT support for capping results
+- Automatic shadow table management
+- Full transactional consistency
+
 ### With better-sqlite3
 
 ```typescript
@@ -93,13 +113,21 @@ import { loadDiskAnnExtension } from "@photostructure/sqlite-diskann";
 const db = new Database(":memory:");
 loadDiskAnnExtension(db);
 
-// Now you can use DiskANN functions
+// Create virtual table
 db.exec(`
 CREATE VIRTUAL TABLE embeddings USING diskann(
 dimension=512,
 metric=cosine
 )
 `);
+
+// Insert and search work the same as above
+const vector = new Float32Array(512);
+db.prepare("INSERT INTO embeddings(rowid, vector) VALUES (?, ?)").run(1, vector);
+
+const results = db
+.prepare("SELECT rowid, distance FROM embeddings WHERE vector MATCH ? AND k = 10")
+.all(vector);
 ```
 
 ### With node:sqlite (Node 22.5+, experimental)
@@ -111,15 +139,48 @@ import { loadDiskAnnExtension } from "@photostructure/sqlite-diskann";
 const db = new DatabaseSync(":memory:", { allowExtension: true });
 loadDiskAnnExtension(db);
 
-// Now you can use DiskANN functions
+// Create virtual table
 db.exec(`
 CREATE VIRTUAL TABLE embeddings USING diskann(
 dimension=512,
 metric=cosine
 )
 `);
+
+// Insert and search work the same as above
+const vector = new Float32Array(512);
+db.prepare("INSERT INTO embeddings(rowid, vector) VALUES (?, ?)").run(1, vector);
+
+const results = db
+.prepare("SELECT rowid, distance FROM embeddings WHERE vector MATCH ? AND k = 10")
+.all(vector);
 ```
 
+### C API (Advanced)
+
+For direct C API usage, the lower-level functions are still available:
+
+```c
+// Create index
+diskann_create_index(db, "main", "my_index", &config);
+
+// Open index
+DiskAnnIndex *idx;
+diskann_open_index(db, "main", "my_index", &idx);
+
+// Insert vector
+diskann_insert(idx, rowid, vector, dims);
+
+// Search
+DiskAnnResult results[10];
+int count = diskann_search(idx, query, dims, 10, results);
+
+// Close
+diskann_close_index(idx);
+```
+
+See [`src/diskann.h`](./src/diskann.h) for full C API documentation.
+
 ## Why DiskANN?
 
 Most SQLite vector extensions either:
@@ -134,32 +195,36 @@ See [`_research/sqlite-vector-options.md`](./_research/sqlite-vector-options.md)
 
 ## API Reference
 
-### C API
+### Virtual Table SQL
 
-```c
-// Create index
-int diskann_create(const char *index_name, const char *db_name,
-int vector_dim, int max_neighbors,
-float pruning_alpha, int search_list_size);
+```sql
+-- Create index for N-dimensional vectors
+CREATE VIRTUAL TABLE table_name USING diskann(
+dimension=N, -- Required: vector dimensionality
+metric=euclidean|cosine|dot, -- Optional: distance metric (default: cosine)
+max_degree=64, -- Optional: max graph degree (default: 64)
+build_search_list_size=100 -- Optional: search quality (default: 100)
+);
 
-// Insert vector
-int diskann_insert(const char *index_name, const char *db_name,
-sqlite3_int64 rowid, const float *vector);
+-- Insert vector (rowid required, no auto-increment)
+INSERT INTO table_name(rowid, vector) VALUES (?, ?);
 
-// Search
-int diskann_search(const char *index_name, const char *db_name,
-const float *query, int k, int search_list_size,
-diskann_search_result **results, int *result_count);
+-- Search for k nearest neighbors using MATCH operator
+SELECT rowid, distance
+FROM table_name
+WHERE vector MATCH ? AND k = ?
+LIMIT ?; -- Optional: caps result count
 
-// Delete vector
-int diskann_delete(const char *index_name, const char *db_name,
-sqlite3_int64 rowid);
+-- Delete vector
+DELETE FROM table_name WHERE rowid = ?;
 
-// Destroy index
-int diskann_destroy(const char *index_name, const char *db_name);
+-- Drop entire index
+DROP TABLE table_name;
 ```
 
-Full API: [`src/diskann.h`](./src/diskann.h)
+### C API
+
+For advanced usage, see [`src/diskann.h`](./src/diskann.h) for the full C API.
 
 ## Building from Source

@@ -171,8 +236,8 @@ sudo apt-get install build-essential clang-tidy valgrind
 make all
 
 # Test
-make test # C unit tests (126 tests)
-make test-stress # Stress tests (300k/100k vectors, ~30 min)
+make test # C unit tests
+make test-stress # Stress tests (~30 min)
 make asan # AddressSanitizer
 make valgrind # Memory leak detection
 npm test # TypeScript tests

_todo/20260210-virtual-table-with-filtering.md

Lines changed: 4 additions & 4 deletions
@@ -9,7 +9,7 @@ Implement a SQLite virtual table for DiskANN with metadata columns and filtered
 - [x] Research & Planning
 - [x] Test Design
 - [x] Implementation Design
-- [ ] Test-First Development
+- [x] Test-First Development (Phase 1 COMPLETE ✅, Phase 2+ remaining)
 - [ ] Implementation
 - [ ] Integration
 - [ ] Cleanup & Documentation
@@ -48,8 +48,8 @@ Build phases execute sequentially. Each has its own 8-phase lifecycle:
 
 | Phase | TPP | Tests | Description |
 | --------- | ----------------------------------------- | --------- | ---------------------------------------------- |
-| 0 | `20260210-vtab-phase0-entry-points.md` | 0 (infra) | Consolidate entry points, extract shared utils |
-| 1 | `20260210-vtab-phase1-basic-vtab.md` | 19 | CREATE/INSERT/SEARCH/DELETE via SQL |
+| 0 DONE | `20260210-vtab-phase0-entry-points.md` | 0 (infra) | Consolidate entry points, extract shared utils |
+| 1 DONE | `20260210-vtab-phase1-basic-vtab.md` | 19 | CREATE/INSERT/SEARCH/DELETE via SQL |
 | 2 | `20260210-vtab-phase2-metadata.md` | 13 | Metadata columns, schema persistence |
 | 3 | `20260210-vtab-phase3-filtered-search.md` | 16 | Filter during beam search, C API + SQL |
 | **Total** | | **48** | |
@@ -65,7 +65,7 @@ Phase 4 (Polish — TS bindings, JSON vectors, README) is tracked inline below.
 - **HIDDEN columns and SELECT \*.** HIDDEN cols don't appear in `SELECT *` but can be referenced by name. Metadata cols (NOT hidden) appear in `SELECT *`.
 - **Filtered-DiskANN paper + Microsoft Rust:** Non-matching nodes MUST still be visited (graph bridges). Filter only gates top-K insertion. `search_ctx_mark_visited()` sets `visited=1` and adds to visited list BEFORE the filter check.
 - **Beam width heuristic:** `max(search_list * 2, k * 4)` for filtered search. Tune from recall test results.
-- **xBestIndex argv assignment must be conditional.** Assign argvIndex sequentially for each present constraint (MATCH, K, LIMIT, filters). xFilter unpacks based on idxNum bitmask, not fixed positions.
+- **xBestIndex argv assignment must match xFilter consumption order.** SQLite presents constraints in arbitrary order. Use a two-pass approach: pass 1 records constraint positions, pass 2 assigns argvIndex in the fixed order xFilter expects (MATCH, K, LIMIT, ROWID, then filters). Assigning sequentially as encountered causes argv order mismatches that silently break xFilter.
 
 ## Solutions

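The two-pass `xBestIndex` rule recorded in the todo notes above can be illustrated with a small model. This is a minimal sketch in TypeScript, not the C code from `diskann_vtab.c`: the names `planTwoPass`, `buildArgv`, and `unpack`, the `Constraint` shape, and the bit values in `BIT` are all hypothetical, and filter constraints are left out. It relies only on the documented SQLite behavior that a constraint given `argvIndex = n` (with `n > 0`) arrives in `xFilter` as `argv[n - 1]`.

```typescript
// Hypothetical constraint kinds, in the fixed order xFilter is assumed to consume them.
type Kind = "MATCH" | "K" | "LIMIT" | "ROWID";
const CONSUME_ORDER: Kind[] = ["MATCH", "K", "LIMIT", "ROWID"];

// One bit per kind. Illustrative values only, not the extension's real idxNum encoding.
const BIT: Record<Kind, number> = { MATCH: 1, K: 2, LIMIT: 4, ROWID: 8 };

interface Constraint { kind: Kind; usable: boolean; }
interface Plan { idxNum: number; argvIndex: number[]; }

// Model of xBestIndex: constraints arrive in whatever order SQLite chooses.
function planTwoPass(constraints: Constraint[]): Plan {
  // Pass 1: record the position of each usable constraint.
  const position = new Map<Kind, number>();
  constraints.forEach((c, i) => {
    if (c.usable && !position.has(c.kind)) position.set(c.kind, i);
  });

  // Pass 2: assign 1-based argvIndex values in the fixed consumption order.
  const argvIndex = constraints.map(() => 0); // 0 means "not passed to xFilter"
  let idxNum = 0;
  let next = 1;
  for (const kind of CONSUME_ORDER) {
    const i = position.get(kind);
    if (i === undefined) continue;
    argvIndex[i] = next++;
    idxNum |= BIT[kind];
  }
  return { idxNum, argvIndex };
}

// Model of SQLite building argv: the value for slot n lands in argv[n - 1].
function buildArgv(plan: Plan, values: unknown[]): unknown[] {
  const argv: unknown[] = [];
  plan.argvIndex.forEach((slot, i) => {
    if (slot > 0) argv[slot - 1] = values[i];
  });
  return argv;
}

// Model of xFilter: walk the same fixed order, consuming argv only for set bits.
function unpack(idxNum: number, argv: unknown[]): Partial<Record<Kind, unknown>> {
  const out: Partial<Record<Kind, unknown>> = {};
  let next = 0;
  for (const kind of CONSUME_ORDER) {
    if (idxNum & BIT[kind]) out[kind] = argv[next++];
  }
  return out;
}

// Constraints presented as K, LIMIT, MATCH still unpack correctly.
const demoPlan = planTwoPass([
  { kind: "K", usable: true },
  { kind: "LIMIT", usable: true },
  { kind: "MATCH", usable: true },
]);
console.log(demoPlan.argvIndex); // [ 2, 3, 1 ]
console.log(unpack(demoPlan.idxNum, buildArgv(demoPlan, [10, 5, "query-vector"])));
```

Because both sides walk the same `CONSUME_ORDER`, the bitmask alone tells the filter side which slots are present. Assigning slots in arrival order would instead give K slot 1 in this example, so the filter side would read the `k` value where it expects the query vector.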