@@ -21,6 +21,8 @@ A standalone SQLite extension implementing the [DiskANN algorithm](https://githu
2121- Incremental insert/delete support
2222- Cross-platform: Linux, macOS, Windows (x64, arm64)
2323
24+ **For smaller datasets** (< 100k vectors), consider [@photostructure/sqlite-vec](https://github.com/photostructure/sqlite-vec), which uses exact brute-force search and requires no index building.
25+
2426## Database Compatibility
2527
2628This package works with multiple SQLite library implementations through duck typing:
@@ -58,7 +60,9 @@ npm install better-sqlite3
5860
5961## Quick Start
6062
61- ### Virtual Table Interface (Recommended)
63+ 📖 **[Complete Usage Guide](./USAGE.md)** - Detailed examples, metadata filtering, performance tips
64+
65+ ### Basic Example
6266
6367The virtual table interface provides standard SQL operations with full query planner integration:
6468
@@ -101,361 +105,12 @@ db.exec("DROP TABLE embeddings");
101105
102106**Virtual table features**:
103107
104- - Standard SQL INSERT/DELETE/DROP operations
105- - MATCH operator for ANN search with `k` parameter
106- - LIMIT support for capping results
107- - Automatic shadow table management
108- - Full transactional consistency
109-
110- ### With better-sqlite3
111-
112- ``` typescript
113- import Database from "better-sqlite3";
114- import { loadDiskAnnExtension } from "@photostructure/sqlite-diskann";
115-
116- const db = new Database(":memory:");
117- loadDiskAnnExtension(db);
118-
119- // Create virtual table
120- db.exec(`
121-   CREATE VIRTUAL TABLE embeddings USING diskann(
122-     dimension=512,
123-     metric=cosine
124-   )
125- `);
126-
127- // Insert and search work the same as above
128- const vector = new Float32Array(512);
129- db.prepare("INSERT INTO embeddings(rowid, vector) VALUES (?, ?)").run(1, vector);
130-
131- const results = db
132-   .prepare("SELECT rowid, distance FROM embeddings WHERE vector MATCH ? AND k = 10")
133-   .all(vector);
134- ```
135-
136- ### With node:sqlite (Node 22.5+, experimental)
137-
138- ``` typescript
139- import { DatabaseSync } from "node:sqlite";
140- import { loadDiskAnnExtension } from "@photostructure/sqlite-diskann";
141-
142- const db = new DatabaseSync(":memory:", { allowExtension: true });
143- loadDiskAnnExtension(db);
144-
145- // Create virtual table
146- db.exec(`
147-   CREATE VIRTUAL TABLE embeddings USING diskann(
148-     dimension=512,
149-     metric=cosine
150-   )
151- `);
152-
153- // Insert and search work the same as above
154- const vector = new Float32Array(512);
155- db.prepare("INSERT INTO embeddings(rowid, vector) VALUES (?, ?)").run(1, vector);
156-
157- const results = db
158-   .prepare("SELECT rowid, distance FROM embeddings WHERE vector MATCH ? AND k = 10")
159-   .all(vector);
160- ```
161-
162- ## Metadata Columns and Filtered Search
163-
164- Add metadata columns to enable filtered vector search. Filters are evaluated **during** graph traversal using the Filtered-DiskANN algorithm, not as a separate pre-filter or post-filter step.
165-
166- ### Creating an Index with Metadata
167-
168- ``` typescript
169- import { DatabaseSync } from "@photostructure/sqlite";
170- import { loadDiskAnnExtension } from "@photostructure/sqlite-diskann";
171-
172- const db = new DatabaseSync(":memory:", { allowExtension: true });
173- loadDiskAnnExtension(db);
174-
175- // Create index with metadata columns
176- db.exec(`
177-   CREATE VIRTUAL TABLE photos USING diskann(
178-     dimension=512,
179-     metric=cosine,
180-     category TEXT,
181-     year INTEGER,
182-     score REAL
183-   )
184- `);
185- ```
186-
187- **Supported column types**: `TEXT`, `INTEGER`, `REAL`, `BLOB`
188-
189- **Reserved names**: Cannot use `vector`, `distance`, `k`, or `rowid` as metadata column names
190-
191- ### Inserting Vectors with Metadata
192-
193- ``` typescript
194- const embedding = new Float32Array(512); // Your vector embedding
195-
196- db.prepare(
197-   "INSERT INTO photos(rowid, vector, category, year, score) VALUES (?, ?, ?, ?, ?)"
198- ).run(1, embedding, "landscape", 2024, 0.95);
199-
200- db.prepare(
201-   "INSERT INTO photos(rowid, vector, category, year, score) VALUES (?, ?, ?, ?, ?)"
202- ).run(2, embedding, "portrait", 2023, 0.87);
203- ```
204-
205- ### Searching with Metadata Filters
206-
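207- Metadata filters are evaluated **during beam search**, not as a post-filter. A post-filter could discard most of the `k` candidates after the search has finished; filtering during traversal keeps recall high even with selective filters.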
208-
209- ``` typescript
210- const query = new Float32Array(512);
211-
212- // Filter by category
213- const landscapes = db
214-   .prepare(
215-     `
216-     SELECT rowid, distance, category, year
217-     FROM photos
218-     WHERE vector MATCH ? AND k = 10 AND category = 'landscape'
219-   `
220-   )
221-   .all(query);
222-
223- // Multiple filters
224- const recent = db
225-   .prepare(
226-     `
227-     SELECT rowid, distance, category, year, score
228-     FROM photos
229-     WHERE vector MATCH ? AND k = 10
230-       AND category = 'landscape'
231-       AND year >= 2023
232-       AND score > 0.8
233-   `
234-   )
235-   .all(query);
236-
237- // Range filters
238- const filtered = db
239-   .prepare(
240-     `
241-     SELECT rowid, distance, category
242-     FROM photos
243-     WHERE vector MATCH ? AND k = 10 AND year BETWEEN 2020 AND 2024
244-   `
245-   )
246-   .all(query);
247- ```
248-
249- **Supported filter operators**: `=`, `!=`, `<`, `<=`, `>`, `>=`, `BETWEEN`, `IN`
250-
251- ### TypeScript Helper Functions
252-
253- ``` typescript
254- import { createDiskAnnIndex } from "@photostructure/sqlite-diskann";
255-
256- // Create index with metadata columns
257- createDiskAnnIndex(db, "photos", {
258-   dimension: 512,
259-   metric: "cosine",
260-   metadataColumns: [
261-     { name: "category", type: "TEXT" },
262-     { name: "year", type: "INTEGER" },
263-     { name: "score", type: "REAL" },
264-   ],
265- });
266-
267- // Insert using raw SQL for metadata
268- const vec = new Float32Array(512);
269- db.prepare("INSERT INTO photos(rowid, vector, category, year) VALUES (?, ?, ?, ?)").run(
270-   1,
271-   vec,
272-   "landscape",
273-   2024
274- );
275-
276- // Search with filters (use raw SQL)
277- const results = db
278-   .prepare(
279-     `
280-     SELECT rowid, distance, category, year
281-     FROM photos
282-     WHERE vector MATCH ? AND k = 10 AND category = ?
283-   `
284-   )
285-   .all(vec, "landscape");
286- ```
287-
288- ## MATCH Operator Syntax
289-
290- The `MATCH` operator triggers ANN search. It must be combined with the `k` parameter.
291-
292- ### Basic Search
293-
294- ``` sql
295- SELECT rowid, distance
296- FROM embeddings
297- WHERE vector MATCH <vector_blob> AND k = <neighbor_count>
298- ```
299-
300- - `vector MATCH <blob>`: Triggers ANN search with the query vector (must be a BLOB)
301- - `k = <number>`: Number of nearest neighbors to return
302- - Results are automatically sorted by distance (ascending)
303-
304- ### With LIMIT
305-
306- ``` sql
307- -- LIMIT caps result rows, not search beam width
308- SELECT rowid, distance
309- FROM embeddings
310- WHERE vector MATCH ? AND k = 100
311- LIMIT 10 -- Returns closest 10 of the 100 candidates
312- ```
313-
314- **Note**: `k` sets how many nearest-neighbor candidates the search collects (larger values trade speed for quality); `LIMIT` only caps how many of those rows are returned.
315-
316- ### With Metadata Filters
317-
318- ``` sql
319- -- Filters are evaluated DURING graph traversal (Filtered-DiskANN)
320- SELECT rowid, distance, category, year
321- FROM photos
322- WHERE vector MATCH ? AND k = 50 AND category = 'landscape' AND year > 2020
323- ```
324-
325- **How filtering works**:
326-
327- 1. Graph traversal follows edges through matching and non-matching nodes alike (non-matching nodes act as bridges)
328- 2. Only matching nodes are added to the top-k results
329- 3. Non-matching nodes are still expanded, so matching nodes behind them stay reachable
330- 4. The search returns up to k matching results
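The four steps above can be sketched on a toy in-memory graph. This is a simplified illustration, not the extension's implementation: the `GraphNode` type, the `filteredSearch` function, and the `beamWidth` parameter are hypothetical names introduced here, and the real index stores its graph in shadow tables.

```typescript
// Hypothetical toy node type for illustration only.
interface GraphNode {
  id: number;
  vector: number[];
  category: string;
  neighbors: number[];
}

interface Hit {
  id: number;
  distance: number;
}

// Squared Euclidean distance between two equal-length vectors.
function squaredDistance(a: number[], b: number[]): number {
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const d = (a[i] ?? 0) - (b[i] ?? 0);
    sum += d * d;
  }
  return sum;
}

// Best-first traversal with a bounded frontier. Every reachable node is
// expanded regardless of the filter; only matching nodes become results.
function filteredSearch(
  graph: Map<number, GraphNode>,
  entryId: number,
  query: number[],
  k: number,
  beamWidth: number,
  matches: (node: GraphNode) => boolean
): Hit[] {
  const results: Hit[] = [];
  const entry = graph.get(entryId);
  if (!entry) return results;

  const visited = new Set<number>([entryId]);
  let frontier: Hit[] = [{ id: entryId, distance: squaredDistance(entry.vector, query) }];

  while (frontier.length > 0) {
    const current = frontier.shift(); // closest pending candidate
    if (!current) break;
    const node = graph.get(current.id);
    if (!node) continue;

    // Step 2: only matching nodes are added to the results.
    if (matches(node)) results.push(current);

    // Steps 1 and 3: expand neighbors even when this node did not match.
    for (const neighborId of node.neighbors) {
      if (visited.has(neighborId)) continue;
      const neighbor = graph.get(neighborId);
      if (!neighbor) continue;
      visited.add(neighborId);
      frontier.push({ id: neighborId, distance: squaredDistance(neighbor.vector, query) });
    }

    // Keep only the closest candidates in the frontier.
    frontier.sort((a, b) => a.distance - b.distance);
    frontier = frontier.slice(0, beamWidth);
  }

  // Step 4: return up to k matching results, closest first.
  results.sort((a, b) => a.distance - b.distance);
  return results.slice(0, k);
}

// Node 3 is the only "landscape" and is reachable only through
// the non-matching nodes 1 and 2.
const graph = new Map<number, GraphNode>([
  [1, { id: 1, vector: [0, 0], category: "portrait", neighbors: [2] }],
  [2, { id: 2, vector: [1, 0], category: "portrait", neighbors: [1, 3] }],
  [3, { id: 3, vector: [2, 0], category: "landscape", neighbors: [2] }],
]);

const hits = filteredSearch(graph, 1, [0, 0], 1, 4, (n) => n.category === "landscape");
console.log(hits);
```

Had the traversal skipped non-matching nodes, the search would have stopped at node 1 and returned nothing, which is the failure mode step 3 avoids.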
331-
332- ### Invalid Queries
333-
334- ``` sql
335- -- ❌ Missing k parameter
336- SELECT rowid, distance FROM embeddings WHERE vector MATCH ?
337-
338- -- ❌ k without MATCH
339- SELECT rowid, distance FROM embeddings WHERE k = 10
340-
341- -- ❌ Wrong column type (vector must be BLOB, not TEXT)
342- SELECT rowid, distance FROM embeddings WHERE vector MATCH '[1.0, 2.0, ...]' AND k = 10
343- ```
344-
345- ## Performance Tips
346-
347- ### Index Metadata Columns
348-
349- For fast filtered search, create SQLite indexes on metadata columns you filter by:
350-
351- ``` sql
352- -- Create index with metadata columns
353- CREATE VIRTUAL TABLE photos USING diskann(
354-   dimension=512, metric=cosine, category TEXT, year INTEGER
355- );
356-
357- -- Add index on frequently filtered columns in the shadow table
358- -- Shadow table name pattern: {tableName}_attrs
359- CREATE INDEX idx_photos_category ON photos_attrs(category);
360- CREATE INDEX idx_photos_year ON photos_attrs(year);
361- CREATE INDEX idx_photos_combined ON photos_attrs(category, year);
362- ```
363-
364- **Why**: Metadata is stored in a shadow table named `{tableName}_attrs` (e.g., `photos_attrs` for a table named `photos`). SQLite indexes on this shadow table speed up metadata lookups for filtered queries.
365-
366- **When to index**:
367-
368- - ✅ Columns used in WHERE clauses (e.g., `category = 'landscape'`)
369- - ✅ High-cardinality columns (many unique values)
370- - ✅ Selective filters (< 50% of rows match)
371- - ❌ Low-cardinality columns (e.g., boolean flags)
372- - ❌ Columns rarely used in filters
373-
374- ### Tuning Search Parameters
375-
376- ``` sql
377- -- Create index with tuned parameters
378- CREATE VIRTUAL TABLE embeddings USING diskann(
379-   dimension=512,
380-   metric=cosine,
381-   max_degree=64,              -- Graph connectivity (default: 64)
382-   build_search_list_size=100  -- Beam width during insert (default: 100)
383- );
384- ```
385-
386- - **`max_degree`**: Higher values improve recall but increase memory and index size
387-   - Default: 64
388-   - Range: 16-128
389-   - Recommendation: 64 for most use cases
390-
391- - **`build_search_list_size`**: Higher values improve index quality but slow down inserts
392-   - Default: 100
393-   - Range: 50-200
394-   - Recommendation: 100 for balanced performance
395-
396- ### Vector Format
397-
398- Use `Float32Array` for best performance:
399-
400- ``` typescript
401- // ✅ Good - direct binary encoding
402- const vec = new Float32Array(512);
403- db.prepare("INSERT INTO embeddings(rowid, vector) VALUES (?, ?)").run(1, vec);
404-
405- // ✅ Also good - automatic conversion
406- const vecArray = [0.1, 0.2, 0.3 /* ... */]; // number[]
407- insertVector(db, "embeddings", 1, vecArray); // Converts to Float32Array internally
408- ```
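For drivers that expect a byte buffer rather than a typed array, the conversion can be done by hand. The sketch below assumes the index reads the BLOB as packed 32-bit floats in the platform's native byte order (little-endian on common hardware); `toVectorBlob` and `fromVectorBlob` are hypothetical helper names, not part of this package.

```typescript
// Hypothetical helper: pack a number[] into the raw bytes of a Float32Array.
function toVectorBlob(values: number[]): Uint8Array {
  const floats = Float32Array.from(values);
  // View the same memory as bytes; no copy is made here.
  return new Uint8Array(floats.buffer, floats.byteOffset, floats.byteLength);
}

// Hypothetical helper: read a BLOB back into a Float32Array.
function fromVectorBlob(blob: Uint8Array): Float32Array {
  if (blob.byteLength % 4 !== 0) {
    throw new RangeError("BLOB length must be a multiple of 4 bytes");
  }
  // Copy first so the float view starts at a 4-byte-aligned offset.
  const copy = blob.slice();
  return new Float32Array(copy.buffer, copy.byteOffset, copy.byteLength / 4);
}

// Values chosen to be exactly representable in float32.
const blob = toVectorBlob([0.5, -2, 8]);
const roundTrip = fromVectorBlob(blob);
console.log(blob.byteLength, Array.from(roundTrip));
```

Each dimension takes 4 bytes, so a 512-dimension vector is a 2048-byte BLOB.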
409-
410- ### Batch Operations
411-
412- Use transactions for bulk inserts:
413-
414- ``` typescript
415- db.exec("BEGIN TRANSACTION");
416- const stmt = db.prepare("INSERT INTO embeddings(rowid, vector) VALUES (?, ?)");
417- for (let i = 0; i < 10000; i++) {
418-   stmt.run(i, vectors[i]); // vectors: Float32Array[] prepared by the caller
419- }
420- db.exec("COMMIT");
421- ```
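The loop above leaves the transaction open if an insert throws. One way to guard against that is a small wrapper that rolls back on error. This is a sketch under the assumption that the database object exposes `exec(sql)`, as both drivers shown earlier do; `withTransaction` and `ExecDb` are hypothetical names, and the stub database below only records statements so the example runs without SQLite.

```typescript
// Minimal shape this sketch needs from a database handle (assumption).
interface ExecDb {
  exec(sql: string): unknown;
}

// Run `work` inside a transaction; commit on success, roll back on error.
function withTransaction<T>(db: ExecDb, work: () => T): T {
  db.exec("BEGIN");
  try {
    const result = work();
    db.exec("COMMIT");
    return result;
  } catch (err) {
    db.exec("ROLLBACK");
    throw err; // surface the original failure to the caller
  }
}

// Stub database that records the SQL it receives.
const log: string[] = [];
const fakeDb: ExecDb = {
  exec: (sql) => {
    log.push(sql);
  },
};

try {
  withTransaction(fakeDb, () => {
    throw new Error("insert failed");
  });
} catch {
  // The error is rethrown after ROLLBACK has been issued.
}
console.log(log);
```

With a real database, the body passed to `withTransaction` would contain the prepared-statement loop from the example above.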
422-
423- ### C API (Advanced)
424-
425- For direct C API usage, the lower-level functions are still available:
426-
427- ``` c
428- // Create index
429- diskann_create_index(db, "main", "my_index", &config);
430-
431- // Open index
432- DiskAnnIndex *idx;
433- diskann_open_index(db, "main", "my_index", &idx);
434-
435- // Insert vector
436- diskann_insert(idx, rowid, vector, dims);
437-
438- // Search
439- DiskAnnResult results[10];
440- int count = diskann_search(idx, query, dims, 10, results);
441-
442- // Close
443- diskann_close_index(idx);
444- ```
445-
446- See [`src/diskann.h`](./src/diskann.h) for full C API documentation.
447-
448- ## Why DiskANN?
449-
450- Most SQLite vector extensions either:
451-
452- - Use brute-force (doesn't scale to millions of vectors)
453- - Require separate index files (no transactional consistency, crash recovery)
454- - Have licensing restrictions (Elastic License, etc.)
455-
456- DiskANN stores the entire graph index inside SQLite using shadow tables, providing true ACID guarantees and single-file databases.
457-
458- See [`_research/sqlite-vector-options.md`](./_research/sqlite-vector-options.md) for comparison with alternatives.
108+ See [USAGE.md](./USAGE.md) for:
109+ - Examples with better-sqlite3 and node:sqlite
110+ - Metadata columns and filtered search
111+ - MATCH operator syntax and query patterns
112+ - Performance tuning and optimization tips
113+ - C API usage
459114
460115## API Reference
461116
@@ -531,8 +186,6 @@ Derived from libSQL's DiskANN implementation:
531186- Copyright 2024 the libSQL authors
532187- Copyright 2026 PhotoStructure Inc.
533188
534- See [LICENSE](./LICENSE) for full text.
535-
536189## Links
537190
538191- [DiskANN Paper (Microsoft Research)](https://proceedings.neurips.cc/paper/2019/file/09853c7fb1d3f8ee67a61b6bf4a7f8e6-Paper.pdf)