Vector Search Optimization - Study Notes

1. Vector Index Fundamentals

Why Vector Indexes Matter

Without an index, pgvector performs an exact nearest-neighbor scan — it computes the distance between the query vector and every row in the table. This is accurate but O(n). For tables with millions of embeddings, sequential scans take seconds or longer per query.

Vector indexes trade a small amount of accuracy (recall) for dramatically faster queries. Both IVFFlat and HNSW are approximate nearest neighbor (ANN) algorithms — they may miss some true nearest neighbors but return results orders of magnitude faster.

Exam tip: If a question asks about "exact nearest neighbor search" in pgvector, the answer involves no index (sequential scan). Any index-backed query is approximate.

Recall vs Latency Tradeoff

Term	Definition
Recall	Fraction of true nearest neighbors returned by the ANN query (1.0 = perfect)
Latency	Time to execute the query
QPS	Queries per second the system can sustain

Higher recall requires more computation (more probes, larger ef_search), which increases latency. The goal is to find the sweet spot for your workload — typically recall >= 0.95 is acceptable for most applications.

2. IVFFlat (Inverted File with Flat Compression)

How IVFFlat Works

Build phase: k-means clustering divides all vectors into lists clusters based on proximity
Each cluster stores a centroid (center point) and a flat list of all vectors assigned to that cluster
Query phase: The query vector is compared to all centroids, the closest probes clusters are selected, and only vectors within those clusters are scanned

Build: vectors → k-means → lists clusters with centroids
Query: query_vec → find closest centroids → scan probes clusters → return top-k

IVFFlat Build Parameters

Parameter	Description	Guideline
`lists`	Number of clusters to create	< 1M rows: `rows / 1000`; >= 1M rows: `sqrt(rows)`

-- Example: 500K rows → lists = 500
CREATE INDEX ON items USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 500);

-- Example: 4M rows → lists = 2000 (sqrt(4000000) ≈ 2000)
CREATE INDEX ON items USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 2000);

IVFFlat Query Parameters

Parameter	Default	Description
`ivfflat.probes`	1	Number of clusters to scan at query time

-- Increase probes for better recall (default 1 is too low for production)
SET ivfflat.probes = 10;

-- Guideline: probes = sqrt(lists) is a good starting point
-- For lists = 500: probes ≈ 22
-- For lists = 2000: probes ≈ 45

IVFFlat Key Characteristics

Requires data before building: k-means needs vectors to compute centroids. Building on an empty table creates an empty index — queries will return no results.
Not incremental: New rows inserted after index creation are added to the closest cluster, but clusters are never rebalanced. Over time, clusters become uneven and recall degrades.
Rebuild periodically: After significant data changes (>20% new data), drop and recreate the index to rebalance clusters.
Faster build time: k-means is computationally cheaper than HNSW graph construction.

Exam tip: If a question describes a scenario where "new data is continuously inserted and recall has degraded over time," the answer involves rebuilding the IVFFlat index.

3. HNSW (Hierarchical Navigable Small World)

How HNSW Works

Build phase: Constructs a multi-layer graph where each vector is a node
Top layers are sparse (few nodes, long-range connections for coarse navigation)
Bottom layer is dense (all nodes, short-range connections for fine-grained search)
Query phase: Search starts at the top layer, greedily navigates toward the query vector, then descends to lower layers for refinement

Layer 3:  o -------- o  (few nodes, long jumps)
Layer 2:  o --- o --- o --- o
Layer 1:  o - o - o - o - o - o - o
Layer 0:  o o o o o o o o o o o o o  (all nodes, short connections)

HNSW Build Parameters

Parameter	Default	Description
`m`	16	Maximum number of connections per node per layer
`ef_construction`	64	Size of the dynamic candidate list during index build

-- Default parameters (good starting point)
CREATE INDEX ON items USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);

-- Higher quality index (slower build, better recall)
CREATE INDEX ON items USING hnsw (embedding vector_cosine_ops)
WITH (m = 32, ef_construction = 128);

Parameter effects:

Change	Build Time	Index Size	Query Recall	Query Speed
Higher `m`	Slower	Larger	Better	Slightly slower
Higher `ef_construction`	Slower	Same	Better	No effect

HNSW Query Parameters

Parameter	Default	Description
`hnsw.ef_search`	40	Size of the dynamic candidate list during query

-- Increase ef_search for better recall
SET hnsw.ef_search = 100;

-- ef_search must be >= the number of results requested (LIMIT)
-- If LIMIT 50, ef_search must be at least 50

HNSW Key Characteristics

Can build on an empty table: The graph structure accommodates incremental inserts without degradation.
Incremental by design: New vectors are inserted into the graph at all appropriate layers. No periodic rebuilds needed.
Higher memory during build: Constructing the graph requires more working memory than IVFFlat k-means.
Larger index size: Each node stores m connections per layer, making HNSW indexes larger than IVFFlat.
Better recall out of the box: HNSW typically achieves higher recall than IVFFlat with default parameters.

Exam tip: HNSW is the recommended index type for most workloads on Azure Database for PostgreSQL. Choose IVFFlat only when build time is the primary constraint.

4. IVFFlat vs HNSW Comparison

Property	IVFFlat	HNSW
Algorithm	k-means clustering + flat scan	Multi-layer navigable graph
Build on empty table	No (needs data for k-means)	Yes
Incremental inserts	Degrades over time	Maintains quality
Build time	Faster	Slower (2-3x typical)
Index size	Smaller	Larger (2-3x typical)
Default recall	Lower (probes=1)	Higher (ef_search=40)
Best recall achievable	High (with many probes)	Very high
Rebuild needed	Yes, after significant data changes	No
Query latency	Good (depends on probes)	Better (typically faster)
Memory during build	Moderate	Higher

When to Choose IVFFlat

Data is loaded in bulk and rarely changes
Index build time is a constraint (large dataset, tight maintenance window)
You can schedule periodic index rebuilds
Acceptable recall with tuned probes

When to Choose HNSW

Data arrives incrementally (streaming inserts)
Best possible recall is required
Cannot tolerate recall degradation between rebuilds
Query latency is the primary concern

5. Distance Metrics and Operator Classes

Operator Classes for Index Creation

The operator class in the CREATE INDEX statement must match the distance operator used in queries. If they do not match, PostgreSQL falls back to a sequential scan.

Distance Operator	Operator Class	Metric	ORDER BY Direction
`<->`	`vector_l2_ops`	L2 (Euclidean) distance	ASC (smaller = closer)
`<=>`	`vector_cosine_ops`	Cosine distance	ASC (0 = identical)
`<#>`	`vector_ip_ops`	Negative inner product	ASC (most negative = highest similarity)

-- Index for cosine distance queries
CREATE INDEX idx_cosine ON items USING hnsw (embedding vector_cosine_ops);

-- This query USES the index:
SELECT * FROM items ORDER BY embedding <=> query_vec LIMIT 10;

-- This query DOES NOT use the index (wrong operator):
SELECT * FROM items ORDER BY embedding <-> query_vec LIMIT 10;

Exam tip: A common exam scenario presents an index created with vector_cosine_ops but a query using <-> (L2 distance). The index is not used because the operator class does not match. Always verify operator class alignment.

Choosing a Distance Metric

Metric	Best For	Notes
Cosine distance (`<=>`)	Text embeddings, semantic search	Insensitive to vector magnitude; most common for NLP
L2 distance (`<->`)	Image embeddings, spatial data	Sensitive to magnitude; measures absolute distance
Inner product (`<#>`)	Pre-normalized vectors, recommendation	Equivalent to cosine for normalized vectors; fastest computation

Exam tip: Azure OpenAI embeddings are L2-normalized. For normalized vectors, cosine distance and inner product produce the same ranking. Cosine distance is the conventional choice for text search.

6. Index Build Performance

Server Parameters for Index Builds

Parameter	Default	Recommendation	Effect
`maintenance_work_mem`	64 MB	1-2 GB for large indexes	Memory available for index build operations
`max_parallel_maintenance_workers`	2	4-8 (based on available cores)	Parallel workers for index builds (HNSW only in pgvector 0.7+)

-- Increase memory for a large HNSW build
SET maintenance_work_mem = '2GB';
SET max_parallel_maintenance_workers = 7;

-- Build the index (uses increased resources)
CREATE INDEX ON items USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);

-- Reset to defaults after build
RESET maintenance_work_mem;
RESET max_parallel_maintenance_workers;

Build Time Estimates

Build time depends on dataset size, dimensions, index parameters, and available compute. Rough guidelines for 1536-dimension vectors on a Memory Optimized E4ds (4 vCPUs):

Rows	IVFFlat Build	HNSW Build
100K	~10 seconds	~30 seconds
1M	~2 minutes	~5 minutes
10M	~20 minutes	~60 minutes

Exam tip: For large datasets (>1M rows), increase maintenance_work_mem before building HNSW indexes. Insufficient memory causes the build to spill to disk and slow down dramatically.

Batch Build Strategy

For initial data loads, build the index after inserting all data, not before:

-- Step 1: Create table without index
CREATE TABLE items (
    id SERIAL PRIMARY KEY,
    content TEXT,
    embedding vector(1536)
);

-- Step 2: Bulk load all data
COPY items (content, embedding) FROM STDIN;
-- ... load data ...

-- Step 3: Build index on populated table
SET maintenance_work_mem = '2GB';
CREATE INDEX ON items USING hnsw (embedding vector_cosine_ops);
RESET maintenance_work_mem;

This is faster because:

IVFFlat computes better centroids with all data present
HNSW inserts are slightly faster during bulk graph construction than incremental inserts
No index maintenance overhead during the COPY operation

7. Query Performance Analysis

Using EXPLAIN ANALYZE

-- Check if an index is being used
EXPLAIN ANALYZE
SELECT id, content, embedding <=> '[0.1, 0.2, ...]' AS distance
FROM items
ORDER BY embedding <=> '[0.1, 0.2, ...]'
LIMIT 10;

What to look for in the output:

Plan Node	Meaning
`Index Scan using idx_hnsw`	HNSW index is being used
`Index Scan using idx_ivfflat`	IVFFlat index is being used
`Seq Scan`	No index used (sequential scan)
`Sort`	Results are being sorted after scan (index not covering the ORDER BY)

Common Reasons for Sequential Scan

No index exists on the vector column
Wrong operator class: Index uses vector_cosine_ops but query uses <-> (L2)
Missing ORDER BY ... LIMIT: Vector indexes require ORDER BY distance LIMIT k to be used
Planner chooses seq scan: For small tables, PostgreSQL may decide a sequential scan is faster
Filter before ordering: A restrictive WHERE clause may cause the planner to filter first, then sort

-- Force index usage for testing (not recommended for production)
SET enable_seqscan = off;
EXPLAIN ANALYZE SELECT ... ORDER BY embedding <=> query LIMIT 10;
SET enable_seqscan = on;

Exam tip: Vector indexes in pgvector only accelerate ORDER BY ... LIMIT k queries. A query without LIMIT or without ORDER BY the distance operator will not use the vector index.

8. Hybrid Search

Combining Vector and Full-Text Search

Hybrid search uses both semantic similarity (vector) and keyword matching (full-text search) to improve result relevance.

-- Add a tsvector column for full-text search
ALTER TABLE items ADD COLUMN content_tsv tsvector
    GENERATED ALWAYS AS (to_tsvector('english', content)) STORED;
CREATE INDEX ON items USING GIN (content_tsv);

-- Hybrid query: combine vector similarity with text relevance
SELECT id, content,
    embedding <=> query_vec AS vector_distance,
    ts_rank(content_tsv, plainto_tsquery('english', 'azure openai')) AS text_rank
FROM items
WHERE content_tsv @@ plainto_tsquery('english', 'azure openai')
ORDER BY embedding <=> query_vec
LIMIT 10;

Pre-filtering vs Post-filtering

Strategy	Approach	Tradeoff
Pre-filter	`WHERE category = 'ai'` before vector search	Fewer vectors to search; may miss results if filter is too restrictive
Post-filter	Vector search first, then filter results	Better recall; may return fewer than k results after filtering

-- Pre-filter: filter first, then vector search
-- Works well when the filter is not too selective
SELECT id, content
FROM items
WHERE category = 'ai'
ORDER BY embedding <=> query_vec
LIMIT 10;

-- Post-filter approach (subquery)
SELECT * FROM (
    SELECT id, content, category
    FROM items
    ORDER BY embedding <=> query_vec
    LIMIT 100  -- over-fetch to account for filtering
) sub
WHERE category = 'ai'
LIMIT 10;

Exam tip: Pre-filtering with a WHERE clause on indexed columns (B-tree) combined with vector ORDER BY can leverage partial index scans. If the filter is very selective, PostgreSQL may choose a sequential scan because fewer rows need to be checked.

9. Scaling Strategies

Partitioning

For very large tables (>10M vectors), consider partitioning by a non-vector column:

-- Range partition by date
CREATE TABLE items (
    id SERIAL,
    content TEXT,
    embedding vector(1536),
    created_at DATE
) PARTITION BY RANGE (created_at);

CREATE TABLE items_2024 PARTITION OF items
    FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');

-- Each partition gets its own vector index
CREATE INDEX ON items_2024 USING hnsw (embedding vector_cosine_ops);

Benefits:

Smaller indexes per partition (faster builds, better cache utilization)
Queries that include a partition key in WHERE skip irrelevant partitions
Old partitions can be dropped without rebuilding indexes

Dimension Reduction

Reducing embedding dimensions lowers storage, index size, and query time:

# Use the dimensions parameter with text-embedding-3-* models
response = client.embeddings.create(
    input=["text to embed"],
    model="text-embedding-3-small",
    dimensions=512  # reduced from default 1536
)

Dimensions	Storage per Vector	Index Size Impact	Recall Impact
1536 (default)	6,152 bytes	Baseline	Baseline
768	3,080 bytes	~50% smaller	Slight decrease
512	2,056 bytes	~33% of baseline	Moderate decrease
256	1,032 bytes	~17% of baseline	Noticeable decrease

Exam tip: Only text-embedding-3-small and text-embedding-3-large support the dimensions parameter. text-embedding-ada-002 always outputs 1536 dimensions.

10. pgvector Version Considerations

Version-Specific Features

Feature	pgvector 0.5.x	pgvector 0.7.x
Max dimensions	2,000	16,000
HNSW index	Yes	Yes
IVFFlat index	Yes	Yes
Parallel HNSW build	No	Yes
Halfvec (float16) type	No	Yes
Sparse vectors (sparsevec)	No	Yes

Halfvec for Reduced Storage

pgvector 0.7+ supports halfvec (float16 vectors) which halve storage at the cost of precision:

-- Create a halfvec column (half the storage of vector)
CREATE TABLE items_half (
    id SERIAL PRIMARY KEY,
    embedding halfvec(1536)
);

-- Indexes work the same way
CREATE INDEX ON items_half USING hnsw (embedding halfvec_cosine_ops);

Type	Storage per Value	Storage per 1536-dim Vector
`vector` (float32)	4 bytes	6,152 bytes
`halfvec` (float16)	2 bytes	3,080 bytes

Exam tip: halfvec uses different operator classes: halfvec_l2_ops, halfvec_cosine_ops, halfvec_ip_ops. Do not mix vector_*_ops with halfvec columns.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Vector Search Optimization - Study Notes

1. Vector Index Fundamentals

Why Vector Indexes Matter

Recall vs Latency Tradeoff

2. IVFFlat (Inverted File with Flat Compression)

How IVFFlat Works

IVFFlat Build Parameters

IVFFlat Query Parameters

IVFFlat Key Characteristics

3. HNSW (Hierarchical Navigable Small World)

How HNSW Works

HNSW Build Parameters

HNSW Query Parameters

HNSW Key Characteristics

4. IVFFlat vs HNSW Comparison

When to Choose IVFFlat

When to Choose HNSW

5. Distance Metrics and Operator Classes

Operator Classes for Index Creation

Choosing a Distance Metric

6. Index Build Performance

Server Parameters for Index Builds

Build Time Estimates

Batch Build Strategy

7. Query Performance Analysis

Using EXPLAIN ANALYZE

Common Reasons for Sequential Scan

8. Hybrid Search

Combining Vector and Full-Text Search

Pre-filtering vs Post-filtering

9. Scaling Strategies

Partitioning

Dimension Reduction

10. pgvector Version Considerations

Version-Specific Features

Halfvec for Reduced Storage

FilesExpand file tree

notes.md

Latest commit

History

notes.md

File metadata and controls

Vector Search Optimization - Study Notes

1. Vector Index Fundamentals

Why Vector Indexes Matter

Recall vs Latency Tradeoff

2. IVFFlat (Inverted File with Flat Compression)

How IVFFlat Works

IVFFlat Build Parameters

IVFFlat Query Parameters

IVFFlat Key Characteristics

3. HNSW (Hierarchical Navigable Small World)

How HNSW Works

HNSW Build Parameters

HNSW Query Parameters

HNSW Key Characteristics

4. IVFFlat vs HNSW Comparison

When to Choose IVFFlat

When to Choose HNSW

5. Distance Metrics and Operator Classes

Operator Classes for Index Creation

Choosing a Distance Metric

6. Index Build Performance

Server Parameters for Index Builds

Build Time Estimates

Batch Build Strategy

7. Query Performance Analysis

Using EXPLAIN ANALYZE

Common Reasons for Sequential Scan

8. Hybrid Search

Combining Vector and Full-Text Search

Pre-filtering vs Post-filtering

9. Scaling Strategies

Partitioning

Dimension Reduction

10. pgvector Version Considerations

Version-Specific Features

Halfvec for Reduced Storage