This repository was archived by the owner on Feb 18, 2026. It is now read-only.

Commit 4fb2021

feat(docs): update documentation to include usage guide and enhance TypeScript API reference
1 parent 922d939 commit 4fb2021

3 files changed

Lines changed: 404 additions & 359 deletions

README.md

Lines changed: 11 additions & 358 deletions
@@ -21,6 +21,8 @@ A standalone SQLite extension implementing the [DiskANN algorithm](https://githu
2121
- Incremental insert/delete support
2222
- Cross-platform: Linux, macOS, Windows (x64, arm64)
2323

24+
**For smaller datasets** (< 100k vectors), consider [@photostructure/sqlite-vec](https://github.com/photostructure/sqlite-vec) which uses exact brute-force search and requires no index building.
25+
2426
## Database Compatibility
2527

2628
This package works with multiple SQLite library implementations through duck typing:
@@ -58,7 +60,9 @@ npm install better-sqlite3
5860

5961
## Quick Start
6062

61-
### Virtual Table Interface (Recommended)
63+
📖 **[Complete Usage Guide](./USAGE.md)** - Detailed examples, metadata filtering, performance tips
64+
65+
### Basic Example
6266

6367
The virtual table interface provides standard SQL operations with full query planner integration:
6468

@@ -101,361 +105,12 @@ db.exec("DROP TABLE embeddings");
101105

102106
**Virtual table features**:
103107

104-
- Standard SQL INSERT/DELETE/DROP operations
105-
- MATCH operator for ANN search with `k` parameter
106-
- LIMIT support for capping results
107-
- Automatic shadow table management
108-
- Full transactional consistency
109-
110-
### With better-sqlite3
111-
112-
```typescript
113-
import Database from "better-sqlite3";
114-
import { loadDiskAnnExtension } from "@photostructure/sqlite-diskann";
115-
116-
const db = new Database(":memory:");
117-
loadDiskAnnExtension(db);
118-
119-
// Create virtual table
120-
db.exec(`
121-
CREATE VIRTUAL TABLE embeddings USING diskann(
122-
dimension=512,
123-
metric=cosine
124-
)
125-
`);
126-
127-
// Insert and search work the same as above
128-
const vector = new Float32Array(512);
129-
db.prepare("INSERT INTO embeddings(rowid, vector) VALUES (?, ?)").run(1, vector);
130-
131-
const results = db
132-
.prepare("SELECT rowid, distance FROM embeddings WHERE vector MATCH ? AND k = 10")
133-
.all(vector);
134-
```
135-
136-
### With node:sqlite (Node 22.5+, experimental)
137-
138-
```typescript
139-
import { DatabaseSync } from "node:sqlite";
140-
import { loadDiskAnnExtension } from "@photostructure/sqlite-diskann";
141-
142-
const db = new DatabaseSync(":memory:", { allowExtension: true });
143-
loadDiskAnnExtension(db);
144-
145-
// Create virtual table
146-
db.exec(`
147-
CREATE VIRTUAL TABLE embeddings USING diskann(
148-
dimension=512,
149-
metric=cosine
150-
)
151-
`);
152-
153-
// Insert and search work the same as above
154-
const vector = new Float32Array(512);
155-
db.prepare("INSERT INTO embeddings(rowid, vector) VALUES (?, ?)").run(1, vector);
156-
157-
const results = db
158-
.prepare("SELECT rowid, distance FROM embeddings WHERE vector MATCH ? AND k = 10")
159-
.all(vector);
160-
```
161-
162-
## Metadata Columns and Filtered Search
163-
164-
Add metadata columns to enable filtered vector search. Filters are evaluated **during** graph traversal using the Filtered-DiskANN algorithm - not before or after search.
165-
166-
### Creating an Index with Metadata
167-
168-
```typescript
169-
import { DatabaseSync } from "@photostructure/sqlite";
170-
import { loadDiskAnnExtension } from "@photostructure/sqlite-diskann";
171-
172-
const db = new DatabaseSync(":memory:", { allowExtension: true });
173-
loadDiskAnnExtension(db);
174-
175-
// Create index with metadata columns
176-
db.exec(`
177-
CREATE VIRTUAL TABLE photos USING diskann(
178-
dimension=512,
179-
metric=cosine,
180-
category TEXT,
181-
year INTEGER,
182-
score REAL
183-
)
184-
`);
185-
```
186-
187-
**Supported column types**: `TEXT`, `INTEGER`, `REAL`, `BLOB`
188-
189-
**Reserved names**: Cannot use `vector`, `distance`, `k`, or `rowid` as metadata column names
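The type and naming rules above can be checked before the statement is built. The following is a minimal sketch, not the package's `createDiskAnnIndex`: `buildCreateTableSql` and `MetadataColumn` are hypothetical names, and both the identifier pattern and the case-insensitive reserved-name check are assumptions made for illustration.

```typescript
// Hypothetical helper: validates metadata columns against the rules listed
// above, then builds the CREATE VIRTUAL TABLE statement.
// Not part of the @photostructure/sqlite-diskann API.

type ColumnType = "TEXT" | "INTEGER" | "REAL" | "BLOB";

interface MetadataColumn {
  name: string;
  type: ColumnType;
}

const RESERVED_NAMES = new Set(["vector", "distance", "k", "rowid"]);
const SUPPORTED_TYPES = new Set<string>(["TEXT", "INTEGER", "REAL", "BLOB"]);
// Assumption: only plain identifiers are accepted, so they are safe to interpolate.
const IDENTIFIER = /^[A-Za-z_][A-Za-z0-9_]*$/;

function buildCreateTableSql(
  table: string,
  dimension: number,
  metric: string,
  columns: MetadataColumn[] = []
): string {
  if (!IDENTIFIER.test(table)) throw new Error(`invalid table name: ${table}`);
  if (!IDENTIFIER.test(metric)) throw new Error(`invalid metric: ${metric}`);
  if (!Number.isInteger(dimension) || dimension <= 0) {
    throw new Error(`dimension must be a positive integer, got ${dimension}`);
  }

  const parts = [`dimension=${dimension}`, `metric=${metric}`];
  for (const col of columns) {
    if (!IDENTIFIER.test(col.name)) {
      throw new Error(`invalid column name: ${col.name}`);
    }
    // Assumption: reserved names are compared case-insensitively.
    if (RESERVED_NAMES.has(col.name.toLowerCase())) {
      throw new Error(`reserved column name: ${col.name}`);
    }
    if (!SUPPORTED_TYPES.has(col.type)) {
      throw new Error(`unsupported column type: ${col.type}`);
    }
    parts.push(`${col.name} ${col.type}`);
  }
  return `CREATE VIRTUAL TABLE ${table} USING diskann(${parts.join(", ")})`;
}
```

The returned string can be passed to `db.exec(...)` in the same way as the hand-written statement above.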
190-
191-
### Inserting Vectors with Metadata
192-
193-
```typescript
194-
const embedding = new Float32Array(512); // Your vector embedding
195-
196-
db.prepare(
197-
"INSERT INTO photos(rowid, vector, category, year, score) VALUES (?, ?, ?, ?, ?)"
198-
).run(1, embedding, "landscape", 2024, 0.95);
199-
200-
db.prepare(
201-
"INSERT INTO photos(rowid, vector, category, year, score) VALUES (?, ?, ?, ?, ?)"
202-
).run(2, embedding, "portrait", 2023, 0.87);
203-
```
204-
205-
### Searching with Metadata Filters
206-
207-
Metadata filters are evaluated **during beam search**, not as a post-filter. This ensures correct recall even with selective filters.
208-
209-
```typescript
210-
const query = new Float32Array(512);
211-
212-
// Filter by category
213-
const landscapes = db
214-
.prepare(
215-
`
216-
SELECT rowid, distance, category, year
217-
FROM photos
218-
WHERE vector MATCH ? AND k = 10 AND category = 'landscape'
219-
`
220-
)
221-
.all(query);
222-
223-
// Multiple filters
224-
const recent = db
225-
.prepare(
226-
`
227-
SELECT rowid, distance, category, year, score
228-
FROM photos
229-
WHERE vector MATCH ? AND k = 10
230-
AND category = 'landscape'
231-
AND year >= 2023
232-
AND score > 0.8
233-
`
234-
)
235-
.all(query);
236-
237-
// Range filters
238-
const filtered = db
239-
.prepare(
240-
`
241-
SELECT rowid, distance, category
242-
FROM photos
243-
WHERE vector MATCH ? AND k = 10 AND year BETWEEN 2020 AND 2024
244-
`
245-
)
246-
.all(query);
247-
```
248-
249-
**Supported filter operators**: `=`, `!=`, `<`, `<=`, `>`, `>=`, `BETWEEN`, `IN`
250-
251-
### TypeScript Helper Functions
252-
253-
```typescript
254-
import { createDiskAnnIndex } from "@photostructure/sqlite-diskann";
255-
256-
// Create index with metadata columns
257-
createDiskAnnIndex(db, "photos", {
258-
dimension: 512,
259-
metric: "cosine",
260-
metadataColumns: [
261-
{ name: "category", type: "TEXT" },
262-
{ name: "year", type: "INTEGER" },
263-
{ name: "score", type: "REAL" },
264-
],
265-
});
266-
267-
// Insert using raw SQL for metadata
268-
const vec = new Float32Array(512);
269-
db.prepare("INSERT INTO photos(rowid, vector, category, year) VALUES (?, ?, ?, ?)").run(
270-
1,
271-
vec,
272-
"landscape",
273-
2024
274-
);
275-
276-
// Search with filters (use raw SQL)
277-
const results = db
278-
.prepare(
279-
`
280-
SELECT rowid, distance, category, year
281-
FROM photos
282-
WHERE vector MATCH ? AND k = 10 AND category = ?
283-
`
284-
)
285-
.all(vec, "landscape");
286-
```
287-
288-
## MATCH Operator Syntax
289-
290-
The `MATCH` operator triggers ANN search. It must be combined with the `k` parameter.
291-
292-
### Basic Search
293-
294-
```sql
295-
SELECT rowid, distance
296-
FROM embeddings
297-
WHERE vector MATCH <vector_blob> AND k = <neighbor_count>
298-
```
299-
300-
- `vector MATCH <blob>`: Triggers ANN search with the query vector (must be BLOB)
301-
- `k = <number>`: Number of nearest neighbors to return
302-
- Results are automatically sorted by distance (ascending)
303-
304-
### With LIMIT
305-
306-
```sql
307-
-- LIMIT caps result rows, not search beam width
308-
SELECT rowid, distance
309-
FROM embeddings
310-
WHERE vector MATCH ? AND k = 100
311-
LIMIT 10 -- Returns closest 10 of the 100 candidates
312-
```
313-
314-
**Note**: `k` sets how many neighbors the search looks for and, with it, the search beam width (quality); `LIMIT` only caps how many of those rows are returned.
315-
316-
### With Metadata Filters
317-
318-
```sql
319-
-- Filters are evaluated DURING graph traversal (Filtered-DiskANN)
320-
SELECT rowid, distance, category, year
321-
FROM photos
322-
WHERE vector MATCH ? AND k = 50 AND category = 'landscape' AND year > 2020
323-
```
324-
325-
**How filtering works**:
326-
327-
1. Graph traversal follows edges through every candidate node, whether or not it matches the filter (edges act as bridges)
328-
2. Only matching nodes are added to the top-k results
329-
3. Non-matching nodes are still traversed (to reach matching nodes elsewhere)
330-
4. Returns up to k matching results
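The steps above can be illustrated on a toy graph. This is a simplified sketch, not the extension's implementation or the full Filtered-DiskANN algorithm: `GraphNode` and `filteredSearch` are hypothetical names, distances to the query are assumed to be precomputed, and for brevity the loop expands every reachable node, whereas a real beam search bounds the candidate list.

```typescript
// Simplified illustration of filter-aware graph traversal.
// All names and the graph layout are hypothetical.

interface GraphNode {
  id: number;
  distance: number; // precomputed distance from this node to the query
  matches: boolean; // whether the node passes the metadata filter
  neighbors: number[]; // outgoing graph edges
}

function filteredSearch(
  graph: Map<number, GraphNode>,
  entryId: number,
  k: number
): number[] {
  const visited = new Set<number>([entryId]);
  const frontier: number[] = [entryId];
  const results: GraphNode[] = [];

  while (frontier.length > 0) {
    // Expand the closest unexpanded candidate first.
    frontier.sort((a, b) => graph.get(a)!.distance - graph.get(b)!.distance);
    const node = graph.get(frontier.shift()!)!;

    // Only matching nodes are eligible for the result set.
    if (node.matches) results.push(node);

    // Non-matching nodes are still expanded: they may bridge to matches.
    for (const next of node.neighbors) {
      if (!visited.has(next) && graph.has(next)) {
        visited.add(next);
        frontier.push(next);
      }
    }
  }

  // Return up to k matching ids, closest first.
  return results
    .sort((a, b) => a.distance - b.distance)
    .slice(0, k)
    .map((n) => n.id);
}
```

In a graph where the two nodes closest to the query fail the filter, a post-filter over the top two candidates would return nothing, while this traversal still reaches matching nodes through them.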
331-
332-
### Invalid Queries
333-
334-
```sql
335-
-- ❌ Missing k parameter
336-
SELECT rowid, distance FROM embeddings WHERE vector MATCH ?
337-
338-
-- ❌ k without MATCH
339-
SELECT rowid, distance FROM embeddings WHERE k = 10
340-
341-
-- ❌ Wrong value type (query vector must be a BLOB, not TEXT)
342-
SELECT rowid, distance FROM embeddings WHERE vector MATCH '[1.0, 2.0, ...]' AND k = 10
343-
```
344-
345-
## Performance Tips
346-
347-
### Index Metadata Columns
348-
349-
For fast filtered search, create SQLite indexes on metadata columns you filter by:
350-
351-
```sql
352-
-- Create index with metadata columns
353-
CREATE VIRTUAL TABLE photos USING diskann(
354-
dimension=512, metric=cosine, category TEXT, year INTEGER
355-
);
356-
357-
-- Add index on frequently filtered columns in the shadow table
358-
-- Shadow table name pattern: {tableName}_attrs
359-
CREATE INDEX idx_photos_category ON photos_attrs(category);
360-
CREATE INDEX idx_photos_year ON photos_attrs(year);
361-
CREATE INDEX idx_photos_combined ON photos_attrs(category, year);
362-
```
363-
364-
**Why**: Metadata is stored in a shadow table named `{tableName}_attrs` (e.g., `photos_attrs` for a table named `photos`). SQLite indexes on this shadow table speed up the pre-filtering step before beam search.
365-
366-
**When to index**:
367-
368-
- ✅ Columns used in WHERE clauses (e.g., `category = 'landscape'`)
369-
- ✅ High-cardinality columns (many unique values)
370-
- ✅ Selective filters (< 50% of rows match)
371-
- ❌ Low-cardinality columns (e.g., boolean flags)
372-
- ❌ Columns rarely used in filters
373-
374-
### Tuning Search Parameters
375-
376-
```sql
377-
-- Create index with tuned parameters
378-
CREATE VIRTUAL TABLE embeddings USING diskann(
379-
dimension=512,
380-
metric=cosine,
381-
max_degree=64, -- Graph connectivity (default: 64)
382-
build_search_list_size=100 -- Beam width during insert (default: 100)
383-
);
384-
```
385-
386-
- **`max_degree`**: Higher values improve recall but increase memory and index size
387-
- Default: 64
388-
- Range: 16-128
389-
- Recommendation: 64 for most use cases
390-
391-
- **`build_search_list_size`**: Higher values improve index quality but slow down inserts
392-
- Default: 100
393-
- Range: 50-200
394-
- Recommendation: 100 for balanced performance
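When these options come from configuration, they can be checked against the ranges above before the `CREATE VIRTUAL TABLE` statement is assembled. A minimal sketch follows; `tuningClause` and `checkRange` are hypothetical helpers, and whether the extension itself rejects out-of-range values is not stated here, so the bounds are treated as guidance only.

```typescript
// Hypothetical helpers: apply the documented defaults and ranges for
// max_degree and build_search_list_size. Not part of the package API.

interface TuningOptions {
  maxDegree?: number;
  buildSearchListSize?: number;
}

function checkRange(name: string, value: number, min: number, max: number): number {
  if (!Number.isInteger(value) || value < min || value > max) {
    throw new RangeError(`${name} must be an integer in ${min}-${max}, got ${value}`);
  }
  return value;
}

function tuningClause(opts: TuningOptions = {}): string {
  // Defaults and ranges are taken from the list above.
  const maxDegree = checkRange("max_degree", opts.maxDegree ?? 64, 16, 128);
  const listSize = checkRange(
    "build_search_list_size",
    opts.buildSearchListSize ?? 100,
    50,
    200
  );
  return `max_degree=${maxDegree}, build_search_list_size=${listSize}`;
}
```

The resulting fragment can be appended after `dimension=...` and `metric=...` inside `diskann(...)`.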
395-
396-
### Vector Format
397-
398-
Use `Float32Array` for best performance:
399-
400-
```typescript
401-
// ✅ Good - direct binary encoding
402-
const vec = new Float32Array(512);
403-
db.prepare("INSERT INTO embeddings(rowid, vector) VALUES (?, ?)").run(1, vec);
404-
405-
// ✅ Also good - automatic conversion
406-
const vecArray = [0.1, 0.2, 0.3 /* , ... */]; // number[]
407-
insertVector(db, "embeddings", 1, vecArray); // Converts to Float32Array internally
408-
```
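"Direct binary encoding" here means the raw float32 bytes of the array are what gets stored as the BLOB. The sketch below shows that round trip in plain TypeScript; `toVectorBlob` and `fromVectorBlob` are hypothetical names, and the platform-native float32 byte layout is an assumption rather than a documented guarantee of the extension.

```typescript
// Hypothetical helpers illustrating the byte-level view of a vector.
// Assumes platform-native float32 layout (4 bytes per component).

/** Narrow numbers to float32 and expose the underlying bytes. */
function toVectorBlob(values: ArrayLike<number>): Uint8Array {
  const floats = Float32Array.from(values); // copies and narrows to float32
  return new Uint8Array(floats.buffer, floats.byteOffset, floats.byteLength);
}

/** Reinterpret BLOB bytes as float32 components. */
function fromVectorBlob(blob: Uint8Array): Float32Array {
  if (blob.byteLength % 4 !== 0) {
    throw new Error("blob length must be a multiple of 4 bytes");
  }
  // Copy first so the view is 4-byte aligned regardless of the blob's offset.
  const copy = blob.slice();
  return new Float32Array(copy.buffer, copy.byteOffset, copy.byteLength / 4);
}
```

A 512-dimension vector therefore occupies 512 × 4 = 2048 bytes.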
409-
410-
### Batch Operations
411-
412-
Use transactions for bulk inserts:
413-
414-
```typescript
415-
db.exec("BEGIN TRANSACTION");
416-
const stmt = db.prepare("INSERT INTO embeddings(rowid, vector) VALUES (?, ?)");
417-
for (let i = 0; i < 10000; i++) {
418-
stmt.run(i, vectors[i]);
419-
}
420-
db.exec("COMMIT");
421-
```
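The loop above leaves the transaction open if an insert throws. One way to guard against that, sketched here with a hypothetical `withTransaction` helper that relies only on the `exec` method used throughout these examples, is to roll back on failure and rethrow:

```typescript
// Hypothetical helper: wraps work in BEGIN/COMMIT and rolls back on error.
// Only assumes the database object exposes exec(sql).

interface ExecDb {
  exec(sql: string): unknown;
}

function withTransaction<T>(db: ExecDb, work: () => T): T {
  db.exec("BEGIN TRANSACTION");
  try {
    const result = work();
    db.exec("COMMIT");
    return result;
  } catch (err) {
    // Undo partial inserts, then surface the original error to the caller.
    db.exec("ROLLBACK");
    throw err;
  }
}
```

The bulk insert then becomes `withTransaction(db, () => { for (...) stmt.run(i, vectors[i]); })`. better-sqlite3 also provides its own `db.transaction()` wrapper, which may be preferable when only that driver is targeted.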
422-
423-
### C API (Advanced)
424-
425-
For direct C API usage, the lower-level functions are still available:
426-
427-
```c
428-
// Create index
429-
diskann_create_index(db, "main", "my_index", &config);
430-
431-
// Open index
432-
DiskAnnIndex *idx;
433-
diskann_open_index(db, "main", "my_index", &idx);
434-
435-
// Insert vector
436-
diskann_insert(idx, rowid, vector, dims);
437-
438-
// Search
439-
DiskAnnResult results[10];
440-
int count = diskann_search(idx, query, dims, 10, results);
441-
442-
// Close
443-
diskann_close_index(idx);
444-
```
445-
446-
See [`src/diskann.h`](./src/diskann.h) for full C API documentation.
447-
448-
## Why DiskANN?
449-
450-
Most SQLite vector extensions either:
451-
452-
- Use brute-force (doesn't scale to millions of vectors)
453-
- Require separate index files (no transactional consistency, crash recovery)
454-
- Have licensing restrictions (Elastic License, etc.)
455-
456-
DiskANN stores the entire graph index inside SQLite using shadow tables, providing true ACID guarantees and single-file databases.
457-
458-
See [`_research/sqlite-vector-options.md`](./_research/sqlite-vector-options.md) for comparison with alternatives.
108+
See [USAGE.md](./USAGE.md) for:
109+
- Examples with better-sqlite3 and node:sqlite
110+
- Metadata columns and filtered search
111+
- MATCH operator syntax and query patterns
112+
- Performance tuning and optimization tips
113+
- C API usage
459114

460115
## API Reference
461116

@@ -531,8 +186,6 @@ Derived from libSQL's DiskANN implementation:
531186
- Copyright 2024 the libSQL authors
532187
- Copyright 2026 PhotoStructure Inc.
533188

534-
See [LICENSE](./LICENSE) for full text.
535-
536189
## Links
537190

538191
- [DiskANN Paper (Microsoft Research)](https://proceedings.neurips.cc/paper/2019/file/09853c7fb1d3f8ee67a61b6bf4a7f8e6-Paper.pdf)
