Vector Similarity Search

The vss extension adds HNSW (Hierarchical Navigable Small Worlds) indexing for approximate nearest-neighbor search on fixed-size float arrays. This makes react-native-duckdb the first mobile database with native on-device vector search.

Enable the extension

Add vss to your extension list:

{
  "react-native-duckdb": {
    "build": {
      "extensions": ["core_functions", "vss"]
    }
  }
}

For Expo, use the plugin config in app.json:

["react-native-duckdb", { "extensions": ["core_functions", "vss"] }]

Load the extension before creating indexes:

LOAD 'vss';

Create tables with vector columns

Store embedding vectors using DuckDB's fixed-size FLOAT[N] array type:

CREATE TABLE embeddings (
    id INTEGER,
    label VARCHAR,
    vec FLOAT[384]
);

INSERT INTO embeddings VALUES
    (1, 'hello world', [0.1, 0.2, ...]::FLOAT[384]);

The dimension N must be fixed at table creation time. Use FLOAT[N], not FLOAT[].

Create HNSW indexes

-- L2 / euclidean distance (default)
CREATE INDEX idx ON embeddings USING HNSW (vec);

-- Cosine distance (recommended for normalized embeddings)
CREATE INDEX idx_cosine ON embeddings USING HNSW (vec)
    WITH (metric = 'cosine');

-- Inner product
CREATE INDEX idx_ip ON embeddings USING HNSW (vec)
    WITH (metric = 'ip');

Query nearest neighbors

Use the distance function that matches your index metric, with ORDER BY ... LIMIT N:

-- Cosine distance
SELECT id, label,
    array_cosine_distance(vec, $query_vec) AS distance
FROM embeddings
ORDER BY distance
LIMIT 10;

-- L2 / euclidean distance
SELECT id, label,
    array_distance(vec, $query_vec) AS distance
FROM embeddings
ORDER BY distance
LIMIT 10;

-- Negative inner product
SELECT id, label,
    array_negative_inner_product(vec, $query_vec) AS distance
FROM embeddings
ORDER BY distance
LIMIT 10;

Distance metrics

Metric	SQL function	Index parameter	Best for
Cosine	`array_cosine_distance(a, b)`	`metric = 'cosine'`	Normalized embeddings (sentence-transformers, OpenAI)
L2 (Euclidean)	`array_distance(a, b)`	`metric = 'l2sq'`	Raw feature vectors, image embeddings
Inner Product	`array_negative_inner_product(a, b)`	`metric = 'ip'`	Maximum inner product search, dot-product similarity

Verify index usage

Use EXPLAIN to confirm the query plan includes HNSW_INDEX_SCAN:

EXPLAIN SELECT id FROM embeddings
ORDER BY array_cosine_distance(vec, [0.1, 0.2, ...]::FLOAT[384])
LIMIT 10;
-- Look for HNSW_INDEX_SCAN in the output

If you see a sequential scan instead, check that:

The distance function matches the index metric
The query includes a LIMIT clause
The ORDER BY expression uses the indexed column

Use cases

Semantic search

Store document embeddings from sentence-transformers or OpenAI, then find the most semantically similar documents to a query:

SELECT id, title,
    array_cosine_distance(embedding, $query_vec) AS distance
FROM documents
ORDER BY distance
LIMIT 10;

On-device RAG

Retrieve relevant context chunks for local LLM inference without network calls. Store your knowledge base embeddings on-device and query them with zero latency:

SELECT chunk_text FROM knowledge_base
ORDER BY array_cosine_distance(embedding, $question_vec)
LIMIT 5;

Recommendations

Find similar items based on feature vectors -- product recommendations, content suggestions, or user-to-user similarity, all computed locally:

SELECT product_id, name,
    array_distance(features, $user_vec) AS distance
FROM products
ORDER BY distance
LIMIT 20;

Full example

LOAD 'vss';

CREATE TABLE items (
    id INTEGER PRIMARY KEY,
    title VARCHAR,
    embedding FLOAT[4]
);

INSERT INTO items VALUES
    (1, 'Red shoes',    [0.9, 0.1, 0.0, 0.2]::FLOAT[4]),
    (2, 'Blue jacket',  [0.1, 0.8, 0.3, 0.1]::FLOAT[4]),
    (3, 'Red hat',      [0.85, 0.15, 0.05, 0.25]::FLOAT[4]),
    (4, 'Green pants',  [0.2, 0.3, 0.9, 0.1]::FLOAT[4]);

CREATE INDEX items_vec_idx ON items USING HNSW (embedding)
    WITH (metric = 'cosine');

-- Find items most similar to "Red shoes"
SELECT id, title,
    array_cosine_distance(embedding, [0.9, 0.1, 0.0, 0.2]::FLOAT[4]) AS distance
FROM items
ORDER BY distance
LIMIT 3;

Pitfalls

HNSW indexes are in-memory only. Indexes are not persisted across database close/reopen. They must be recreated each session. For in-memory databases this is a non-issue since the data itself is also ephemeral.

LIMIT clause required for acceleration. HNSW index acceleration only activates when the query includes ORDER BY <distance_function>(...) LIMIT N. Without LIMIT, DuckDB falls back to brute-force sequential scanning.

Distance function must match index metric. A cosine index only accelerates array_cosine_distance() queries. Using array_distance() on a cosine index falls back to brute-force. Always pair the correct function with the correct metric.

Fixed-size arrays only. Vector columns must use FLOAT[N] (fixed-size), not FLOAT[] (variable-size). DuckDB will reject HNSW index creation on variable-size arrays.

Binary size impact. Minimal -- VSS is header-only C++ (usearch + simsimd libraries vendored in the extension). No external system dependencies.

Part of react-native-duckdb -- see API Reference for the full API.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Vector Similarity Search

Enable the extension

Create tables with vector columns

Create HNSW indexes

Query nearest neighbors

Distance metrics

Verify index usage

Use cases

Semantic search

On-device RAG

Recommendations

Full example

Pitfalls

FilesExpand file tree

vss.md

Latest commit

History

vss.md

File metadata and controls

Vector Similarity Search

Enable the extension

Create tables with vector columns

Create HNSW indexes

Query nearest neighbors

Distance metrics

Verify index usage

Use cases

Semantic search

On-device RAG

Recommendations

Full example

Pitfalls