Benchmarking Guide

This guide explains how to use CrossVector's benchmarking tool to measure and compare performance across different database backends and embedding providers.

Overview

The benchmark tool (scripts/benchmark.py) provides comprehensive performance testing for:

  • 4 Database Backends: PgVector, AstraDB, Milvus, ChromaDB
  • 2 Embedding Providers: OpenAI, Gemini
  • 7 Operation Types: Bulk create, individual create, vector search, metadata search, Query DSL operators, updates, deletes

Quick Start

Basic Usage

# Test all backends with both embeddings (10 documents)
python scripts/benchmark.py --num-docs 10

# Full benchmark with 1000 documents
python scripts/benchmark.py

# Test specific configuration
python scripts/benchmark.py --backends pgvector milvus --embedding-providers openai --num-docs 100

Command Line Options

python scripts/benchmark.py [OPTIONS]

Options:
  --num-docs INT                    Number of documents to test (default: 1000)
  --backends NAME [NAME ...]        Specific backends: pgvector, astradb, milvus, chroma
  --embedding-providers NAME        Embedding providers: openai, gemini
  --skip-slow                       Skip slow cloud backends (astradb, milvus) for faster testing
  --search-limit INT                Number of results to return in search operations (default: 100)
  --collection-name STR             Custom collection name (default: auto-generated with UUID8)
  --timeout INT                     Timeout per backend test in seconds (default: 60)
  --output PATH                     Output file path (default: benchmark.md)
  --use-fixtures PATH               Path to pre-generated fixtures JSON file
  --add-vectors                     Generate and add vectors to fixture documents

What Gets Measured

1. Upsert Performance

Measures throughput for batch document upsert with automatic embedding generation.

Metrics:

  • Duration (seconds)
  • Throughput (docs/sec)
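
For illustration, a bulk-create measurement reduces to timing one batch call and dividing by the document count. The sketch below is not CrossVector's actual API; engine and its bulk_create method are hypothetical stand-ins for the backend interface used by scripts/benchmark.py.

import time

def measure_bulk_create(engine, documents):
    """Time one batch upsert and derive throughput (a sketch, not the tool's code)."""
    start = time.perf_counter()
    engine.bulk_create(documents)  # hypothetical batch call; embeddings generated inside
    duration = time.perf_counter() - start
    return {"duration_sec": duration, "docs_per_sec": len(documents) / duration}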

2. Individual Create Performance

Tests single document creation with embedding generation.

Metrics:

  • Average duration per document

3. Vector Search Performance

Semantic similarity search using embedded queries.

Metrics:

  • Average query duration (10 queries tested)
  • Queries per second
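
To picture this step, the average can be taken over a fixed set of query strings, assuming a hypothetical engine.search call (the real signature may differ):

import statistics
import time

def measure_search(engine, queries):
    """Run each query once; report mean latency and derived queries/sec."""
    durations = []
    for query in queries:  # e.g., 10 query strings
        start = time.perf_counter()
        engine.search(query)  # hypothetical vector-search call
        durations.append(time.perf_counter() - start)
    avg = statistics.mean(durations)
    return {"avg_query_sec": avg, "queries_per_sec": 1.0 / avg}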

4. Metadata-Only Search

Filtering without vector similarity (if supported by the backend).

Metrics:

  • Average query duration
  • Support status

5. Query DSL Operators

Tests all 10 Query DSL operators:

  • Comparison: eq, ne, gt, gte, lt, lte
  • Array: in, nin
  • Logical: and (&), or (|)

Metrics:

  • Average operator query duration
  • Number of operators successfully tested
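
A per-operator loop along these lines captures both metrics. The dict-style filter expressions are purely illustrative and are not CrossVector's Query DSL syntax:

import statistics
import time

# Hypothetical filters, one per operator; the real DSL syntax differs.
OPERATOR_FILTERS = {
    "eq": {"views": {"eq": 100}},
    "ne": {"views": {"ne": 100}},
    "gt": {"views": {"gt": 50}},
    "gte": {"views": {"gte": 50}},
    "lt": {"views": {"lt": 500}},
    "lte": {"views": {"lte": 500}},
    "in": {"tag": {"in": ["a", "b"]}},
    "nin": {"tag": {"nin": ["c"]}},
    "and": {"and": [{"views": {"gt": 50}}, {"tag": {"eq": "a"}}]},
    "or": {"or": [{"views": {"gt": 50}}, {"tag": {"eq": "a"}}]},
}

def measure_operators(engine):
    """Time one representative query per operator (engine.filter is hypothetical)."""
    durations = []
    for name, flt in OPERATOR_FILTERS.items():
        start = time.perf_counter()
        engine.filter(flt)
        durations.append(time.perf_counter() - start)
    return {"avg_operator_sec": statistics.mean(durations), "tested": len(durations)}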

6. Update Operations

Document update performance.

Metrics:

  • Average update duration (100 updates tested)

7. Delete Operations

Batch deletion throughput.

Metrics:

  • Duration
  • Throughput (docs/sec)

Prerequisites

Required Environment Variables

Embedding Providers (at least one required):

# OpenAI
export OPENAI_API_KEY=sk-...

# Gemini
export GEMINI_API_KEY=AI...

Database Backends (optional, will skip if not configured):

# PgVector
export PGVECTOR_HOST=localhost
export PGVECTOR_PORT=5432
export PGVECTOR_DATABASE=vector_db
export PGVECTOR_USER=postgres
export PGVECTOR_PASSWORD=postgres
# Or use connection string:
export PGVECTOR_CONNECTION_STRING=postgresql://user:pass@host:port/db

# AstraDB
export ASTRADB_API_ENDPOINT=https://...apps.astra.datastax.com
export ASTRADB_APPLICATION_TOKEN=AstraCS:...

# Milvus
export MILVUS_API_ENDPOINT=https://...
export MILVUS_API_TOKEN=...

# ChromaDB (optional for cloud)
export CHROMA_HOST=api.trychroma.com
export CHROMA_API_KEY=ck-...
export CHROMA_TENANT=...
export CHROMA_DATABASE=Test
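
Before a long run, a small preflight script (a sketch, not part of the tool) can confirm which of the variables above are actually set in your shell:

import os

# Variable names taken from the sections above; Chroma entries apply to cloud only.
REQUIRED = {
    "openai": ["OPENAI_API_KEY"],
    "gemini": ["GEMINI_API_KEY"],
    "pgvector": ["PGVECTOR_HOST", "PGVECTOR_DATABASE"],  # or PGVECTOR_CONNECTION_STRING
    "astradb": ["ASTRADB_API_ENDPOINT", "ASTRADB_APPLICATION_TOKEN"],
    "milvus": ["MILVUS_API_ENDPOINT", "MILVUS_API_TOKEN"],
    "chroma": ["CHROMA_HOST", "CHROMA_API_KEY"],
}

for name, keys in REQUIRED.items():
    missing = [k for k in keys if not os.environ.get(k)]
    print(f"{name:>9}: {'ok' if not missing else 'missing ' + ', '.join(missing)}")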

Running Benchmarks

Recommended Workflow

Step 1: Quick Verification (1-2 minutes)

Test that everything is configured correctly:

python scripts/benchmark.py --num-docs 1 --backends pgvector --embedding-providers openai

Step 2: Fast Comparison (5-10 minutes)

Compare all backends with small dataset:

python scripts/benchmark.py --num-docs 10

This runs 8 combinations (4 backends × 2 embeddings) with 10 documents each.

Step 3: Production Benchmark (30-60 minutes)

Full performance test with larger dataset:

python scripts/benchmark.py --num-docs 1000 --output benchmark_full.md

Note: This will:

  • Make on the order of 1,000 API calls to embedding providers
  • Take 30-60 minutes depending on network and API rate limits
  • Cost approximately $0.10-0.20 in API fees

Targeted Benchmarks

Test Specific Backend

# Only PgVector with both embeddings
python scripts/benchmark.py --backends pgvector --num-docs 100

Test Specific Embedding

# All backends with only OpenAI
python scripts/benchmark.py --embedding-providers openai --num-docs 100

Compare Two Backends

# PgVector vs Milvus
python scripts/benchmark.py --backends pgvector milvus --num-docs 100

Understanding Results

Output Format

Results are saved as a markdown file (default: benchmark.md) with:

  1. Performance Summary Table - Quick comparison across all combinations
  2. Detailed Results - Individual metrics for each backend+embedding pair
  3. Notes - Configuration and methodology details

Example Output

## Performance Summary

| Backend | Embedding | Model | Dim | Upsert | Search (avg) | Update (avg) | Delete (batch) | Status |
|---------|-----------|-------|-----|--------|--------------|--------------|----------------|--------|
| pgvector | openai | text-embedding-3-small | 1536 | 7.06s | 21.26ms | 6.21ms | 22.63ms | OK |
| astradb | openai | text-embedding-3-small | 1536 | 18.89s | 23.86s | 1.11s | 15.15s | OK |
| milvus | openai | text-embedding-3-small | 1536 | 7.94s | 654.43ms | 569.52ms | 2.17s | OK |
| chroma | openai | text-embedding-3-small | 1536 | 17.08s | 654.76ms | 1.23s | 4.73s | OK |
| pgvector | gemini | models/gemini-embedding-001 | 1536 | 6.65s | 18.72ms | 6.40ms | 20.25ms | OK |
| astradb | gemini | models/gemini-embedding-001 | 1536 | 11.25s | 6.71s | 903.37ms | 15.05s | OK |
| milvus | gemini | models/gemini-embedding-001 | 1536 | 6.14s | 571.90ms | 561.38ms | 1.91s | OK |
| chroma | gemini | models/gemini-embedding-001 | 1536 | 18.93s | 417.28ms | 1.24s | 4.63s | OK |

Interpreting Metrics

Upsert (bulk create):

  • Lower duration = better
  • Higher throughput (docs/sec) = better
  • Gemini typically slower due to API rate limits

Search:

  • Lower average duration = better
  • Milvus typically fastest for vector search
  • Gemini-backed searches were often faster than OpenAI-backed ones in the runs above

Updates & Deletes:

  • Lower duration = better
  • PgVector typically fast for updates due to SQL efficiency

Query DSL Operators:

  • Should test 10/10 operators successfully
  • Duration typically <1ms for metadata operations

Performance Tips

For Better Results

  1. Stable Network: Run benchmarks on a stable network connection
  2. Isolated Environment: Avoid running other heavy processes
  3. Warm-up: The first run may be slower due to cold starts
  4. Multiple Runs: Run 2-3 times and use median values for important decisions

API Rate Limits

Be aware of rate limits:

  • OpenAI: 3,500 requests/min (Tier 2)
  • Gemini: 1,500 requests/min (free tier)

For large benchmarks (--num-docs 1000+), the tool will automatically pace requests.
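
If you pace calls in a script of your own, a minimal sleep-based limiter like the following keeps a loop under a fixed requests-per-minute budget (a sketch; this is not the tool's internal mechanism):

import time

def paced(items, max_per_minute):
    """Yield items no faster than max_per_minute."""
    interval = 60.0 / max_per_minute
    for item in items:
        start = time.perf_counter()
        yield item
        elapsed = time.perf_counter() - start
        if elapsed < interval:
            time.sleep(interval - elapsed)

# Example: stay safely under Gemini's free-tier 1,500 requests/min
# for text in paced(texts, max_per_minute=1400):
#     embed(text)  # hypothetical embedding call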

Avoiding quota errors

  • If you see Gemini RESOURCE_EXHAUSTED, rerun with --embedding-providers openai or reduce --num-docs.
  • To avoid embedding API calls entirely, provide fixtures with vectors (e.g., --use-fixtures scripts/benchmark/data/openai_3.json --add-vectors) or let the tool generate static vectors when configured.
  • Keep long runs to a single backend to reduce concurrent calls (e.g., --backends pgvector).

Comparing Before/After Changes

When optimizing performance:

# Before changes
python scripts/benchmark.py --num-docs 100 --output benchmark_before.md

# Make your changes to code

# After changes
python scripts/benchmark.py --num-docs 100 --output benchmark_after.md

# Compare the two markdown files
diff benchmark_before.md benchmark_after.md

Or use a markdown diff tool for better visualization.
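
For a more targeted comparison than a raw diff, a short script can pull the summary-table rows out of each report and print them side by side. This sketch assumes both files contain the pipe-delimited summary table shown earlier; adjust the parsing if the layout changes:

def summary_rows(path):
    """Map (backend, embedding) to the cells of its summary-table row."""
    rows = {}
    with open(path) as f:
        for line in f:
            cells = [c.strip() for c in line.strip().strip("|").split("|")]
            if len(cells) >= 9 and cells[0] not in ("", "Backend") and "---" not in cells[0]:
                rows[(cells[0], cells[1])] = cells
    return rows

before = summary_rows("benchmark_before.md")
after = summary_rows("benchmark_after.md")
for key in sorted(before.keys() & after.keys()):
    print(key, "upsert:", before[key][4], "->", after[key][4])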

Troubleshooting

Backend Not Available

If you see:

AstraDB not available: Missing ASTRADB_API_ENDPOINT

Solution: Set the required environment variables. If they are absent, the backend is skipped automatically.

Embedding API Errors

If you see rate limit errors:

bulk_create failed: Rate limit exceeded

Solutions:

  • Reduce --num-docs
  • Wait and retry
  • Check API quota/billing

Slow Performance

If benchmarks are unexpectedly slow:

  • Check network latency to database
  • Verify database is not under load
  • Check API rate limits aren't being hit
  • Try reducing --num-docs for faster iterations

Advanced Usage

Custom Test Data

Modify scripts/benchmark.py to use custom test data:

# In generate_documents() function
SAMPLE_TEXTS = [
    "Your custom text 1",
    "Your custom text 2",
    # ...
]
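
If your corpus lives in a file, a loader along these lines (my_texts.json is a hypothetical file containing a JSON array of strings) keeps the rest of the script unchanged:

import json

with open("my_texts.json") as f:  # hypothetical corpus file
    SAMPLE_TEXTS = json.load(f)   # expects a JSON array of strings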

Adding Custom Metrics

Extend benchmark_backend() method to add custom metrics:

# In BenchmarkRunner.benchmark_backend()
# After existing benchmarks, add:

# Custom metric
print("\nCustom Metric...")
duration, result = benchmark_operation("custom", lambda: engine.custom_operation())
results["custom_metric"] = {"duration": duration}

Cost Estimation

Approximate costs for running benchmarks:

| Documents | OpenAI Cost | Gemini Cost | Total Time |
|-----------|-------------|-------------|------------|
| 10        | $0.001      | Free        | 2-5 min    |
| 100       | $0.01       | Free        | 10-15 min  |
| 1000      | $0.10       | Free        | 30-60 min  |

Note: Costs are approximate and depend on:

  • Embedding model used
  • Document text length
  • Current API pricing

For Gemini, the free tier typically covers benchmarking needs.
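
As a back-of-envelope check, embedding cost scales linearly with token volume. The token count and price below are assumptions for illustration and cover embedding calls only; substitute current pricing for your model:

# Illustrative numbers only; not actual pricing
num_docs = 1000
tokens_per_doc = 200        # assumed average document length
price_per_million = 0.02    # assumed $/1M tokens for a small embedding model

cost = num_docs * tokens_per_doc / 1_000_000 * price_per_million
print(f"~${cost:.4f} in embedding fees for {num_docs} documents")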

Best Practices

  1. Start Small: Always test with --num-docs 10 first
  2. Document Results: Save benchmark outputs with timestamps
  3. Consistent Environment: Run on same machine/network for comparisons
  4. Version Control: Commit benchmark results with code changes
  5. CI/CD Integration: Consider running small benchmarks in CI for regression testing

Examples

Example 1: Quick Backend Comparison

# Compare PgVector and Milvus with 50 docs
python scripts/benchmark.py --backends pgvector milvus --num-docs 50

Example 2: Embedding Provider Comparison

# Test which embedding is faster for your use case
python scripts/benchmark.py --backends pgvector --num-docs 200

Example 3: Pre-Release Validation

# Full benchmark before major release
python scripts/benchmark.py --num-docs 1000 --output release_v1.0_benchmark.md

Example 4: Query Performance Focus

# Test with more documents to stress search performance
python scripts/benchmark.py --backends milvus --num-docs 5000

Contributing

Found a performance issue or want to add a new benchmark metric? See Contributing Guide.