Without an index, pgvector performs an exact nearest-neighbor scan — it computes the distance between the query vector and every row in the table. This is accurate but O(n). For tables with millions of embeddings, sequential scans take seconds or longer per query.
Vector indexes trade a small amount of accuracy (recall) for dramatically faster queries. Both IVFFlat and HNSW are approximate nearest neighbor (ANN) algorithms — they may miss some true nearest neighbors but return results orders of magnitude faster.
Exam tip: If a question asks about "exact nearest neighbor search" in pgvector, the answer involves no index (sequential scan). Any index-backed query is approximate.
| Term | Definition |
|---|---|
| Recall | Fraction of true nearest neighbors returned by the ANN query (1.0 = perfect) |
| Latency | Time to execute the query |
| QPS | Queries per second the system can sustain |
Higher recall requires more computation (more probes, larger ef_search), which increases latency. The goal is to find the sweet spot for your workload — typically recall >= 0.95 is acceptable for most applications.
- Build phase: k-means clustering divides all vectors into
listsclusters based on proximity - Each cluster stores a centroid (center point) and a flat list of all vectors assigned to that cluster
- Query phase: The query vector is compared to all centroids, the closest
probesclusters are selected, and only vectors within those clusters are scanned
Build: vectors → k-means → lists clusters with centroids
Query: query_vec → find closest centroids → scan probes clusters → return top-k
| Parameter | Description | Guideline |
|---|---|---|
lists |
Number of clusters to create | < 1M rows: rows / 1000; >= 1M rows: sqrt(rows) |
-- Example: 500K rows → lists = 500
CREATE INDEX ON items USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 500);
-- Example: 4M rows → lists = 2000 (sqrt(4000000) ≈ 2000)
CREATE INDEX ON items USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 2000);| Parameter | Default | Description |
|---|---|---|
ivfflat.probes |
1 | Number of clusters to scan at query time |
-- Increase probes for better recall (default 1 is too low for production)
SET ivfflat.probes = 10;
-- Guideline: probes = sqrt(lists) is a good starting point
-- For lists = 500: probes ≈ 22
-- For lists = 2000: probes ≈ 45- Requires data before building: k-means needs vectors to compute centroids. Building on an empty table creates an empty index — queries will return no results.
- Not incremental: New rows inserted after index creation are added to the closest cluster, but clusters are never rebalanced. Over time, clusters become uneven and recall degrades.
- Rebuild periodically: After significant data changes (>20% new data), drop and recreate the index to rebalance clusters.
- Faster build time: k-means is computationally cheaper than HNSW graph construction.
Exam tip: If a question describes a scenario where "new data is continuously inserted and recall has degraded over time," the answer involves rebuilding the IVFFlat index.
- Build phase: Constructs a multi-layer graph where each vector is a node
- Top layers are sparse (few nodes, long-range connections for coarse navigation)
- Bottom layer is dense (all nodes, short-range connections for fine-grained search)
- Query phase: Search starts at the top layer, greedily navigates toward the query vector, then descends to lower layers for refinement
Layer 3: o -------- o (few nodes, long jumps)
Layer 2: o --- o --- o --- o
Layer 1: o - o - o - o - o - o - o
Layer 0: o o o o o o o o o o o o o (all nodes, short connections)
| Parameter | Default | Description |
|---|---|---|
m |
16 | Maximum number of connections per node per layer |
ef_construction |
64 | Size of the dynamic candidate list during index build |
-- Default parameters (good starting point)
CREATE INDEX ON items USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);
-- Higher quality index (slower build, better recall)
CREATE INDEX ON items USING hnsw (embedding vector_cosine_ops)
WITH (m = 32, ef_construction = 128);Parameter effects:
| Change | Build Time | Index Size | Query Recall | Query Speed |
|---|---|---|---|---|
Higher m |
Slower | Larger | Better | Slightly slower |
Higher ef_construction |
Slower | Same | Better | No effect |
| Parameter | Default | Description |
|---|---|---|
hnsw.ef_search |
40 | Size of the dynamic candidate list during query |
-- Increase ef_search for better recall
SET hnsw.ef_search = 100;
-- ef_search must be >= the number of results requested (LIMIT)
-- If LIMIT 50, ef_search must be at least 50- Can build on an empty table: The graph structure accommodates incremental inserts without degradation.
- Incremental by design: New vectors are inserted into the graph at all appropriate layers. No periodic rebuilds needed.
- Higher memory during build: Constructing the graph requires more working memory than IVFFlat k-means.
- Larger index size: Each node stores
mconnections per layer, making HNSW indexes larger than IVFFlat. - Better recall out of the box: HNSW typically achieves higher recall than IVFFlat with default parameters.
Exam tip: HNSW is the recommended index type for most workloads on Azure Database for PostgreSQL. Choose IVFFlat only when build time is the primary constraint.
| Property | IVFFlat | HNSW |
|---|---|---|
| Algorithm | k-means clustering + flat scan | Multi-layer navigable graph |
| Build on empty table | No (needs data for k-means) | Yes |
| Incremental inserts | Degrades over time | Maintains quality |
| Build time | Faster | Slower (2-3x typical) |
| Index size | Smaller | Larger (2-3x typical) |
| Default recall | Lower (probes=1) | Higher (ef_search=40) |
| Best recall achievable | High (with many probes) | Very high |
| Rebuild needed | Yes, after significant data changes | No |
| Query latency | Good (depends on probes) | Better (typically faster) |
| Memory during build | Moderate | Higher |
- Data is loaded in bulk and rarely changes
- Index build time is a constraint (large dataset, tight maintenance window)
- You can schedule periodic index rebuilds
- Acceptable recall with tuned probes
- Data arrives incrementally (streaming inserts)
- Best possible recall is required
- Cannot tolerate recall degradation between rebuilds
- Query latency is the primary concern
The operator class in the CREATE INDEX statement must match the distance operator used in queries. If they do not match, PostgreSQL falls back to a sequential scan.
| Distance Operator | Operator Class | Metric | ORDER BY Direction |
|---|---|---|---|
<-> |
vector_l2_ops |
L2 (Euclidean) distance | ASC (smaller = closer) |
<=> |
vector_cosine_ops |
Cosine distance | ASC (0 = identical) |
<#> |
vector_ip_ops |
Negative inner product | ASC (most negative = highest similarity) |
-- Index for cosine distance queries
CREATE INDEX idx_cosine ON items USING hnsw (embedding vector_cosine_ops);
-- This query USES the index:
SELECT * FROM items ORDER BY embedding <=> query_vec LIMIT 10;
-- This query DOES NOT use the index (wrong operator):
SELECT * FROM items ORDER BY embedding <-> query_vec LIMIT 10;Exam tip: A common exam scenario presents an index created with vector_cosine_ops but a query using <-> (L2 distance). The index is not used because the operator class does not match. Always verify operator class alignment.
| Metric | Best For | Notes |
|---|---|---|
Cosine distance (<=>) |
Text embeddings, semantic search | Insensitive to vector magnitude; most common for NLP |
L2 distance (<->) |
Image embeddings, spatial data | Sensitive to magnitude; measures absolute distance |
Inner product (<#>) |
Pre-normalized vectors, recommendation | Equivalent to cosine for normalized vectors; fastest computation |
Exam tip: Azure OpenAI embeddings are L2-normalized. For normalized vectors, cosine distance and inner product produce the same ranking. Cosine distance is the conventional choice for text search.
| Parameter | Default | Recommendation | Effect |
|---|---|---|---|
maintenance_work_mem |
64 MB | 1-2 GB for large indexes | Memory available for index build operations |
max_parallel_maintenance_workers |
2 | 4-8 (based on available cores) | Parallel workers for index builds (HNSW only in pgvector 0.7+) |
-- Increase memory for a large HNSW build
SET maintenance_work_mem = '2GB';
SET max_parallel_maintenance_workers = 7;
-- Build the index (uses increased resources)
CREATE INDEX ON items USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);
-- Reset to defaults after build
RESET maintenance_work_mem;
RESET max_parallel_maintenance_workers;Build time depends on dataset size, dimensions, index parameters, and available compute. Rough guidelines for 1536-dimension vectors on a Memory Optimized E4ds (4 vCPUs):
| Rows | IVFFlat Build | HNSW Build |
|---|---|---|
| 100K | ~10 seconds | ~30 seconds |
| 1M | ~2 minutes | ~5 minutes |
| 10M | ~20 minutes | ~60 minutes |
Exam tip: For large datasets (>1M rows), increase maintenance_work_mem before building HNSW indexes. Insufficient memory causes the build to spill to disk and slow down dramatically.
For initial data loads, build the index after inserting all data, not before:
-- Step 1: Create table without index
CREATE TABLE items (
id SERIAL PRIMARY KEY,
content TEXT,
embedding vector(1536)
);
-- Step 2: Bulk load all data
COPY items (content, embedding) FROM STDIN;
-- ... load data ...
-- Step 3: Build index on populated table
SET maintenance_work_mem = '2GB';
CREATE INDEX ON items USING hnsw (embedding vector_cosine_ops);
RESET maintenance_work_mem;This is faster because:
- IVFFlat computes better centroids with all data present
- HNSW inserts are slightly faster during bulk graph construction than incremental inserts
- No index maintenance overhead during the COPY operation
-- Check if an index is being used
EXPLAIN ANALYZE
SELECT id, content, embedding <=> '[0.1, 0.2, ...]' AS distance
FROM items
ORDER BY embedding <=> '[0.1, 0.2, ...]'
LIMIT 10;What to look for in the output:
| Plan Node | Meaning |
|---|---|
Index Scan using idx_hnsw |
HNSW index is being used |
Index Scan using idx_ivfflat |
IVFFlat index is being used |
Seq Scan |
No index used (sequential scan) |
Sort |
Results are being sorted after scan (index not covering the ORDER BY) |
- No index exists on the vector column
- Wrong operator class: Index uses
vector_cosine_opsbut query uses<->(L2) - Missing ORDER BY ... LIMIT: Vector indexes require
ORDER BY distance LIMIT kto be used - Planner chooses seq scan: For small tables, PostgreSQL may decide a sequential scan is faster
- Filter before ordering: A restrictive WHERE clause may cause the planner to filter first, then sort
-- Force index usage for testing (not recommended for production)
SET enable_seqscan = off;
EXPLAIN ANALYZE SELECT ... ORDER BY embedding <=> query LIMIT 10;
SET enable_seqscan = on;Exam tip: Vector indexes in pgvector only accelerate ORDER BY ... LIMIT k queries. A query without LIMIT or without ORDER BY the distance operator will not use the vector index.
Hybrid search uses both semantic similarity (vector) and keyword matching (full-text search) to improve result relevance.
-- Add a tsvector column for full-text search
ALTER TABLE items ADD COLUMN content_tsv tsvector
GENERATED ALWAYS AS (to_tsvector('english', content)) STORED;
CREATE INDEX ON items USING GIN (content_tsv);
-- Hybrid query: combine vector similarity with text relevance
SELECT id, content,
embedding <=> query_vec AS vector_distance,
ts_rank(content_tsv, plainto_tsquery('english', 'azure openai')) AS text_rank
FROM items
WHERE content_tsv @@ plainto_tsquery('english', 'azure openai')
ORDER BY embedding <=> query_vec
LIMIT 10;| Strategy | Approach | Tradeoff |
|---|---|---|
| Pre-filter | WHERE category = 'ai' before vector search |
Fewer vectors to search; may miss results if filter is too restrictive |
| Post-filter | Vector search first, then filter results | Better recall; may return fewer than k results after filtering |
-- Pre-filter: filter first, then vector search
-- Works well when the filter is not too selective
SELECT id, content
FROM items
WHERE category = 'ai'
ORDER BY embedding <=> query_vec
LIMIT 10;
-- Post-filter approach (subquery)
SELECT * FROM (
SELECT id, content, category
FROM items
ORDER BY embedding <=> query_vec
LIMIT 100 -- over-fetch to account for filtering
) sub
WHERE category = 'ai'
LIMIT 10;Exam tip: Pre-filtering with a WHERE clause on indexed columns (B-tree) combined with vector ORDER BY can leverage partial index scans. If the filter is very selective, PostgreSQL may choose a sequential scan because fewer rows need to be checked.
For very large tables (>10M vectors), consider partitioning by a non-vector column:
-- Range partition by date
CREATE TABLE items (
id SERIAL,
content TEXT,
embedding vector(1536),
created_at DATE
) PARTITION BY RANGE (created_at);
CREATE TABLE items_2024 PARTITION OF items
FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');
-- Each partition gets its own vector index
CREATE INDEX ON items_2024 USING hnsw (embedding vector_cosine_ops);Benefits:
- Smaller indexes per partition (faster builds, better cache utilization)
- Queries that include a partition key in WHERE skip irrelevant partitions
- Old partitions can be dropped without rebuilding indexes
Reducing embedding dimensions lowers storage, index size, and query time:
# Use the dimensions parameter with text-embedding-3-* models
response = client.embeddings.create(
input=["text to embed"],
model="text-embedding-3-small",
dimensions=512 # reduced from default 1536
)| Dimensions | Storage per Vector | Index Size Impact | Recall Impact |
|---|---|---|---|
| 1536 (default) | 6,152 bytes | Baseline | Baseline |
| 768 | 3,080 bytes | ~50% smaller | Slight decrease |
| 512 | 2,056 bytes | ~33% of baseline | Moderate decrease |
| 256 | 1,032 bytes | ~17% of baseline | Noticeable decrease |
Exam tip: Only text-embedding-3-small and text-embedding-3-large support the dimensions parameter. text-embedding-ada-002 always outputs 1536 dimensions.
| Feature | pgvector 0.5.x | pgvector 0.7.x |
|---|---|---|
| Max dimensions | 2,000 | 16,000 |
| HNSW index | Yes | Yes |
| IVFFlat index | Yes | Yes |
| Parallel HNSW build | No | Yes |
| Halfvec (float16) type | No | Yes |
| Sparse vectors (sparsevec) | No | Yes |
pgvector 0.7+ supports halfvec (float16 vectors) which halve storage at the cost of precision:
-- Create a halfvec column (half the storage of vector)
CREATE TABLE items_half (
id SERIAL PRIMARY KEY,
embedding halfvec(1536)
);
-- Indexes work the same way
CREATE INDEX ON items_half USING hnsw (embedding halfvec_cosine_ops);| Type | Storage per Value | Storage per 1536-dim Vector |
|---|---|---|
vector (float32) |
4 bytes | 6,152 bytes |
halfvec (float16) |
2 bytes | 3,080 bytes |
Exam tip: halfvec uses different operator classes: halfvec_l2_ops, halfvec_cosine_ops, halfvec_ip_ops. Do not mix vector_*_ops with halfvec columns.