Skip to content

Latest commit

 

History

History
237 lines (179 loc) · 6.47 KB

File metadata and controls

237 lines (179 loc) · 6.47 KB

Cheat Sheet - Vector Search Optimization

Index Creation Commands

HNSW (Recommended)

-- Default parameters (good starting point)
CREATE INDEX idx_hnsw ON items
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);

-- High-quality index (slower build, better recall)
CREATE INDEX idx_hnsw_hq ON items
USING hnsw (embedding vector_cosine_ops)
WITH (m = 32, ef_construction = 128);

-- L2 distance index
CREATE INDEX idx_hnsw_l2 ON items
USING hnsw (embedding vector_l2_ops);

-- Inner product index
CREATE INDEX idx_hnsw_ip ON items
USING hnsw (embedding vector_ip_ops);

-- Halfvec index (pgvector 0.7+)
CREATE INDEX idx_hnsw_half ON items
USING hnsw (embedding halfvec_cosine_ops);

IVFFlat

-- Guideline: lists = rows/1000 for <1M rows, sqrt(rows) for >=1M rows

-- 100K rows → lists = 100
CREATE INDEX idx_ivfflat ON items
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);

-- 4M rows → lists = 2000
CREATE INDEX idx_ivfflat_large ON items
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 2000);

Drop and Rebuild

-- Drop an index
DROP INDEX IF EXISTS idx_hnsw;

-- Rebuild IVFFlat after significant data changes
DROP INDEX idx_ivfflat;
CREATE INDEX idx_ivfflat ON items
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 500);

-- Rebuild concurrently (no table lock, slower build)
CREATE INDEX CONCURRENTLY idx_hnsw ON items
USING hnsw (embedding vector_cosine_ops);

Query-Time Parameter Tuning

-- IVFFlat: increase probes for better recall (default = 1)
SET ivfflat.probes = 10;
-- Guideline: start with sqrt(lists)

-- HNSW: increase ef_search for better recall (default = 40)
SET hnsw.ef_search = 100;
-- Must be >= LIMIT value

-- Set per-transaction (resets after transaction ends)
BEGIN;
SET LOCAL ivfflat.probes = 20;
SELECT * FROM items ORDER BY embedding <=> $1 LIMIT 10;
COMMIT;

-- Set per-session
SET ivfflat.probes = 10;  -- lasts until connection closes

-- Reset to defaults
RESET ivfflat.probes;
RESET hnsw.ef_search;

Index Build Optimization

-- Increase memory for large index builds
SET maintenance_work_mem = '2GB';

-- Use parallel workers for HNSW builds (pgvector 0.7+)
SET max_parallel_maintenance_workers = 7;

-- Build the index
CREATE INDEX ON items USING hnsw (embedding vector_cosine_ops);

-- Reset after build
RESET maintenance_work_mem;
RESET max_parallel_maintenance_workers;

EXPLAIN ANALYZE Patterns

-- Check index usage
EXPLAIN ANALYZE
SELECT id, content
FROM items
ORDER BY embedding <=> '[0.1, 0.2, ...]'
LIMIT 10;

-- Good output: "Index Scan using idx_hnsw"
-- Bad output: "Seq Scan" (index not used)

-- Force index usage for testing
SET enable_seqscan = off;
EXPLAIN ANALYZE SELECT ...;
SET enable_seqscan = on;

Distance Operator ↔ Operator Class Matrix

Query Operator CREATE INDEX Operator Class Metric
<-> vector_l2_ops L2 / Euclidean
<=> vector_cosine_ops Cosine distance
<#> vector_ip_ops Negative inner product
<-> (halfvec) halfvec_l2_ops L2 (float16)
<=> (halfvec) halfvec_cosine_ops Cosine (float16)
<#> (halfvec) halfvec_ip_ops Inner product (float16)

Hybrid Search Snippets

-- Full-text search column (generated, auto-updates)
ALTER TABLE items ADD COLUMN content_tsv tsvector
    GENERATED ALWAYS AS (to_tsvector('english', content)) STORED;
CREATE INDEX ON items USING GIN (content_tsv);

-- Hybrid: keyword filter + vector ranking
SELECT id, content,
    embedding <=> $1 AS distance
FROM items
WHERE content_tsv @@ plainto_tsquery('english', 'search terms')
ORDER BY embedding <=> $1
LIMIT 10;

-- Pre-filter by metadata + vector ranking
SELECT id, content
FROM items
WHERE category = 'ai'
ORDER BY embedding <=> $1
LIMIT 10;

Python — Query-Time Tuning

import psycopg
from pgvector.psycopg import register_vector

conn = psycopg.connect("host=... dbname=... sslmode=require")
register_vector(conn)

# Set query-time parameters
with conn.cursor() as cur:
    cur.execute("SET hnsw.ef_search = 100")
    cur.execute(
        "SELECT id, content, embedding <=> %s AS distance "
        "FROM items ORDER BY embedding <=> %s LIMIT 10",
        (query_vec, query_vec)
    )
    results = cur.fetchall()

Important Defaults

Index Build Defaults

Setting Default
HNSW m 16
HNSW ef_construction 64
IVFFlat lists Must be specified
maintenance_work_mem 64 MB
max_parallel_maintenance_workers 2

Query-Time Defaults

Setting Default
HNSW ef_search 40
IVFFlat probes 1

Parameter Tuning Guidelines

Parameter Starting Point Effect of Increase
IVFFlat lists rows/1000 (< 1M), sqrt(rows) (>= 1M) More clusters, finer partitioning
IVFFlat probes sqrt(lists) Better recall, slower queries
HNSW m 16 Better recall, larger index, slower build
HNSW ef_construction 64 Better recall, slower build
HNSW ef_search 40 Better recall, slower queries

Common Exam Gotchas

Index Selection Gotchas

  1. IVFFlat cannot build on an empty table. k-means needs data for centroids. An empty-table IVFFlat index produces no results.
  2. HNSW can build on empty tables. The graph accommodates incremental inserts.
  3. IVFFlat needs periodic rebuilds after significant data changes; HNSW does not.
  4. HNSW is recommended for most workloads on Azure Flexible Server.

Operator Class Gotchas

  1. Operator class must match the query operator. Index with vector_cosine_ops only accelerates <=> queries, not <->.
  2. halfvec uses separate operator classes (halfvec_cosine_ops, not vector_cosine_ops).

Query Pattern Gotchas

  1. Vector indexes require ORDER BY ... LIMIT k. Without LIMIT, the planner does a sequential scan.
  2. probes = 1 is the IVFFlat default — far too low for production. Always increase probes.
  3. ef_search must be >= LIMIT. If LIMIT 50 but ef_search is 40, results are incomplete.

Build Performance Gotchas

  1. Default maintenance_work_mem (64 MB) is too low for large indexes. Increase to 1-2 GB before building.
  2. Parallel HNSW build requires pgvector 0.7+. Earlier versions build single-threaded.
  3. Build index AFTER bulk loading data. Inserting into an indexed table is slower than loading then indexing.