The vss extension adds HNSW (Hierarchical Navigable Small Worlds) indexing for approximate nearest-neighbor search on fixed-size float arrays. This makes react-native-duckdb the first mobile database with native on-device vector search.
Add vss to your extension list:
{
"react-native-duckdb": {
"build": {
"extensions": ["core_functions", "vss"]
}
}
}For Expo, use the plugin config in app.json:
["react-native-duckdb", { "extensions": ["core_functions", "vss"] }]Load the extension before creating indexes:
LOAD 'vss';Store embedding vectors using DuckDB's fixed-size FLOAT[N] array type:
CREATE TABLE embeddings (
id INTEGER,
label VARCHAR,
vec FLOAT[384]
);
INSERT INTO embeddings VALUES
(1, 'hello world', [0.1, 0.2, ...]::FLOAT[384]);The dimension N must be fixed at table creation time. Use FLOAT[N], not FLOAT[].
-- L2 / euclidean distance (default)
CREATE INDEX idx ON embeddings USING HNSW (vec);
-- Cosine distance (recommended for normalized embeddings)
CREATE INDEX idx_cosine ON embeddings USING HNSW (vec)
WITH (metric = 'cosine');
-- Inner product
CREATE INDEX idx_ip ON embeddings USING HNSW (vec)
WITH (metric = 'ip');Use the distance function that matches your index metric, with ORDER BY ... LIMIT N:
-- Cosine distance
SELECT id, label,
array_cosine_distance(vec, $query_vec) AS distance
FROM embeddings
ORDER BY distance
LIMIT 10;
-- L2 / euclidean distance
SELECT id, label,
array_distance(vec, $query_vec) AS distance
FROM embeddings
ORDER BY distance
LIMIT 10;
-- Negative inner product
SELECT id, label,
array_negative_inner_product(vec, $query_vec) AS distance
FROM embeddings
ORDER BY distance
LIMIT 10;| Metric | SQL function | Index parameter | Best for |
|---|---|---|---|
| Cosine | array_cosine_distance(a, b) |
metric = 'cosine' |
Normalized embeddings (sentence-transformers, OpenAI) |
| L2 (Euclidean) | array_distance(a, b) |
metric = 'l2sq' |
Raw feature vectors, image embeddings |
| Inner Product | array_negative_inner_product(a, b) |
metric = 'ip' |
Maximum inner product search, dot-product similarity |
Use EXPLAIN to confirm the query plan includes HNSW_INDEX_SCAN:
EXPLAIN SELECT id FROM embeddings
ORDER BY array_cosine_distance(vec, [0.1, 0.2, ...]::FLOAT[384])
LIMIT 10;
-- Look for HNSW_INDEX_SCAN in the outputIf you see a sequential scan instead, check that:
- The distance function matches the index metric
- The query includes a
LIMITclause - The
ORDER BYexpression uses the indexed column
Store document embeddings from sentence-transformers or OpenAI, then find the most semantically similar documents to a query:
SELECT id, title,
array_cosine_distance(embedding, $query_vec) AS distance
FROM documents
ORDER BY distance
LIMIT 10;Retrieve relevant context chunks for local LLM inference without network calls. Store your knowledge base embeddings on-device and query them with zero latency:
SELECT chunk_text FROM knowledge_base
ORDER BY array_cosine_distance(embedding, $question_vec)
LIMIT 5;Find similar items based on feature vectors -- product recommendations, content suggestions, or user-to-user similarity, all computed locally:
SELECT product_id, name,
array_distance(features, $user_vec) AS distance
FROM products
ORDER BY distance
LIMIT 20;LOAD 'vss';
CREATE TABLE items (
id INTEGER PRIMARY KEY,
title VARCHAR,
embedding FLOAT[4]
);
INSERT INTO items VALUES
(1, 'Red shoes', [0.9, 0.1, 0.0, 0.2]::FLOAT[4]),
(2, 'Blue jacket', [0.1, 0.8, 0.3, 0.1]::FLOAT[4]),
(3, 'Red hat', [0.85, 0.15, 0.05, 0.25]::FLOAT[4]),
(4, 'Green pants', [0.2, 0.3, 0.9, 0.1]::FLOAT[4]);
CREATE INDEX items_vec_idx ON items USING HNSW (embedding)
WITH (metric = 'cosine');
-- Find items most similar to "Red shoes"
SELECT id, title,
array_cosine_distance(embedding, [0.9, 0.1, 0.0, 0.2]::FLOAT[4]) AS distance
FROM items
ORDER BY distance
LIMIT 3;HNSW indexes are in-memory only. Indexes are not persisted across database close/reopen. They must be recreated each session. For in-memory databases this is a non-issue since the data itself is also ephemeral.
LIMIT clause required for acceleration. HNSW index acceleration only activates when the query includes ORDER BY <distance_function>(...) LIMIT N. Without LIMIT, DuckDB falls back to brute-force sequential scanning.
Distance function must match index metric. A cosine index only accelerates array_cosine_distance() queries. Using array_distance() on a cosine index falls back to brute-force. Always pair the correct function with the correct metric.
Fixed-size arrays only. Vector columns must use FLOAT[N] (fixed-size), not FLOAT[] (variable-size). DuckDB will reject HNSW index creation on variable-size arrays.
Binary size impact. Minimal -- VSS is header-only C++ (usearch + simsimd libraries vendored in the extension). No external system dependencies.
Part of react-native-duckdb -- see API Reference for the full API.