This document describes the performance optimizations made to the FAISS vector search service and how to tune them for different environments.
Current Setting: default_batch_size=128 (optimized for typical systems)
How to adjust:
# In backend/services/metadata_loader.py or when initializing FAISSVectorSearch:
search = FAISSVectorSearch(default_batch_size=256) # Larger batch for GPU/high-memory systems
search = FAISSVectorSearch(default_batch_size=64) # Smaller batch for low-memory systems

Batch Size Recommendations:
| System | RAM | GPU | Recommended Batch Size |
|---|---|---|---|
| Low-end (CPU only) | <8GB | No | 32-64 |
| Mid-range (CPU only) | 8-16GB | No | 96-128 |
| High-end (CPU only) | 16GB+ | No | 128-256 |
| GPU-enabled | Any | Yes | 256-512 |
Performance Impact:
- Larger batch = faster throughput, more memory
- Smaller batch = slower throughput, less memory
- Sweet spot for CPU: 128
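
The recommendations above can be expressed as a small lookup. This is only a sketch: `recommend_batch_size_range` is a hypothetical helper, not part of `FAISSVectorSearch`, and treating exactly 16 GB as "high-end" is an assumption, since the table lists 16 GB in two rows.

```python
from typing import Tuple


def recommend_batch_size_range(ram_gb: float, has_gpu: bool) -> Tuple[int, int]:
    """Return the (low, high) batch-size range from the recommendations table."""
    if has_gpu:
        return (256, 512)   # GPU-enabled, any amount of RAM
    if ram_gb < 8:
        return (32, 64)     # low-end, CPU only
    if ram_gb < 16:
        return (96, 128)    # mid-range, CPU only
    return (128, 256)       # high-end, CPU only (assumption: 16 GB counts here)


low, high = recommend_batch_size_range(ram_gb=12, has_gpu=False)
print(f"Suggested default_batch_size: {low}-{high}")  # Suggested default_batch_size: 96-128
```

Start at the low end of the range and raise the value only while memory usage stays comfortable.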
Current Behavior (embedding cache): Automatic
- Enabled by default
- Built incrementally during indexing
- Persisted to disk for reuse
- ~1 MB per 30,000 indicators
To disable caching (if storage is limited):
# In faiss_vector_search.py _save_index()
# Comment out: self._save_embedding_cache()

Cache Hit Rate:
- First run: 0% (building cache)
- Second run: 10-20% (typical for metadata)
- Subsequent runs: Consistent, depends on data changes
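
To illustrate why the first run reports 0% and later runs only hit on unchanged texts, here is a minimal sketch of a text-keyed cache. `EmbeddingCacheSketch` and `fake_embed` are hypothetical names, and hashing the text with SHA-256 is an assumption; the service's actual cache key may differ.

```python
import hashlib
from typing import Callable, Dict, List


class EmbeddingCacheSketch:
    """Hypothetical in-memory cache keyed by a hash of the input text."""

    def __init__(self, embed_fn: Callable[[str], List[float]]) -> None:
        self._embed_fn = embed_fn
        self._store: Dict[str, List[float]] = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(text: str) -> str:
        return hashlib.sha256(text.encode("utf-8")).hexdigest()

    def get(self, text: str) -> List[float]:
        key = self._key(text)
        if key in self._store:
            self.hits += 1
        else:
            self.misses += 1
            self._store[key] = self._embed_fn(text)  # compute only on a miss
        return self._store[key]

    def hit_rate(self) -> float:
        checks = self.hits + self.misses
        return 100.0 * self.hits / checks if checks else 0.0


def fake_embed(text: str) -> List[float]:
    # Stand-in for a real embedding model, used only for this illustration.
    return [float(len(text))]


cache = EmbeddingCacheSketch(fake_embed)
for text in ["GDP growth", "CPI inflation"]:          # first run: cache is empty
    cache.get(text)
print(f"first run: {cache.hit_rate():.1f}%")           # first run: 0.0%

cache.hits = cache.misses = 0                           # counters restart per run
for text in ["GDP growth", "CPI inflation (revised)", "Unemployment rate"]:
    cache.get(text)                                     # only unchanged text hits
print(f"second run: {cache.hit_rate():.1f}%")          # second run: 33.3%
```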
File Structure:
backend/data/faiss_index/
├── economic_indicators.index # FAISS index (~45 MB)
├── economic_indicators_metadata.json # Indicator metadata (~200 KB)
├── economic_indicators_id_map.json # ID mapping (~50 KB)
└── economic_indicators_embedding_cache.json # Embeddings (~1 MB)
Storage Requirements:
- Total: ~50 MB for 30,000+ indicators
- Index growth: ~1.4 KB per indicator
- Cache growth: ~32 bytes per unique embedding
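
The growth figures above give a quick way to project disk usage. The sketch below assumes 1 KB = 1,000 bytes and 1 MB = 10^6 bytes, and it ignores the smaller metadata and ID-map files; `estimate_storage_mb` is a hypothetical helper.

```python
from typing import Dict

INDEX_BYTES_PER_INDICATOR = 1400   # "~1.4 KB per indicator" (assuming 1 KB = 1,000 bytes)
CACHE_BYTES_PER_EMBEDDING = 32     # "~32 bytes per unique embedding"


def estimate_storage_mb(num_indicators: int) -> Dict[str, float]:
    """Project index and cache size in MB from the per-item growth figures."""
    index_mb = num_indicators * INDEX_BYTES_PER_INDICATOR / 1e6
    cache_mb = num_indicators * CACHE_BYTES_PER_EMBEDDING / 1e6
    return {"index_mb": round(index_mb, 1), "cache_mb": round(cache_mb, 2)}


print(estimate_storage_mb(31_725))  # {'index_mb': 44.4, 'cache_mb': 1.02}
```

The result for 31,725 indicators is in the same range as the ~45 MB index and ~1 MB cache listed in the file structure.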
Look for these metrics in logs:
# Batch size confirmation
Default batch size: 128
# Throughput (texts/sec)
Embedding throughput: 203.1 texts/sec
# Target: >100 texts/sec with batch 128
# Cache statistics
Cache stats: 5234 hits / 31725 checks (16.5% hit rate)
Duplicates skipped: 2847
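
If you want to track these values over time, the cache statistics line can be parsed with a regular expression. This sketch assumes the log line keeps exactly the format shown above; `parse_cache_stats` is a hypothetical helper.

```python
import re
from typing import Optional, Tuple

CACHE_STATS_RE = re.compile(
    r"Cache stats: (\d+) hits / (\d+) checks \(([\d.]+)% hit rate\)"
)


def parse_cache_stats(line: str) -> Optional[Tuple[int, int, float]]:
    """Return (hits, checks, reported_hit_rate), or None if the line does not match."""
    match = CACHE_STATS_RE.search(line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2)), float(match.group(3))


line = "Cache stats: 5234 hits / 31725 checks (16.5% hit rate)"
hits, checks, reported = parse_cache_stats(line)
computed = round(100.0 * hits / checks, 1)  # recompute the rate as a sanity check
print(hits, checks, reported, computed)     # 5234 31725 16.5 16.5
```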
Check optimization metrics:
# Get current stats
curl http://localhost:3001/api/health | jq '.faiss_stats'
# Expected output:
{
"default_batch_size": 128,
"cache_entries": 26847,
"cache_hits": 5234,
"cache_misses": 26491,
"cache_hit_rate": 16.5,
"duplicates_skipped": 2847
}

Possible causes of slow indexing:
- Batch size too small (default should be 128)
- System low on RAM (causing swapping)
- Disk I/O bottleneck (slow storage)
- CPU underpowered
Solutions:
# Increase batch size if you have RAM
search = FAISSVectorSearch(default_batch_size=256)
# Or reduce batch size if running out of memory
search = FAISSVectorSearch(default_batch_size=64)

Possible causes of high memory usage:
- Cache is very large (unlikely with 30k indicators)
- Model not unloaded after indexing
- Other processes consuming memory
Solutions:
- Disable cache if needed (see "To disable caching" above)
- Ensure no other indexing operations are running
- Check system memory with `free -h`
Possible causes of a low cache hit rate:
- Cache file deleted
- Index cleared but cache not cleared
- Different texts being embedded
Solutions:
# Check if cache file exists
ls -lh backend/data/faiss_index/*cache*
# Verify cache loading in logs
grep "Loaded.*cached embeddings" /tmp/backend-production.log

Batch Size: 128
Model: all-MiniLM-L6-v2 (384-dim)
Indicators: 31,725
Indexing Time: ~156 seconds (2.6 minutes)
Throughput: ~203 texts/sec
Cache Hit Rate: ~16.5% (first re-index)
Total Memory: ~50 MB
| Indicators | Expected Time |
|---|---|
| 1,000 | 8 seconds |
| 5,000 | 25 seconds |
| 10,000 | 50 seconds |
| 31,725 | 156 seconds (2.6 min) |
| 100,000 | ~500 seconds (8.3 min) |
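
The larger rows of this table follow closely from the measured throughput of ~203 texts/sec. The sketch below assumes throughput stays constant as the dataset grows; `estimate_indexing_seconds` is a hypothetical helper. The smallest rows take longer than this model predicts, presumably because of fixed startup costs.

```python
def estimate_indexing_seconds(num_indicators: int, texts_per_sec: float = 203.0) -> float:
    """Estimate indexing time assuming a constant embedding throughput."""
    if texts_per_sec <= 0:
        raise ValueError("texts_per_sec must be positive")
    return num_indicators / texts_per_sec


for n in (10_000, 31_725, 100_000):
    print(f"{n:>7,} indicators: ~{estimate_indexing_seconds(n):.0f} s")
```

To plan for different hardware, substitute the "Embedding throughput" value from your own logs for the default.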
# Use a different sentence-transformer model
search = FAISSVectorSearch(
model_name="sentence-transformers/all-mpnet-base-v2", # 768-dim, slower
default_batch_size=64 # Reduce batch for larger embeddings
)

Model Comparison:
| Model | Dimension | Speed | Quality |
|---|---|---|---|
| all-MiniLM-L6-v2 | 384 | ⚡⚡⚡ | ⭐⭐⭐⭐ |
| all-mpnet-base-v2 | 768 | ⚡⚡ | ⭐⭐⭐⭐⭐ |
| all-distilroberta-v1 | 768 | ⚡⚡ | ⭐⭐⭐⭐ |
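
Embedding dimension also drives memory and index size. As a rough sketch, assuming vectors are stored uncompressed as float32 (4 bytes per value), which may not match the index type actually in use, the raw vector footprint scales linearly with dimension. `raw_vector_mb` is a hypothetical helper.

```python
FLOAT32_BYTES = 4  # size of one float32 value


def raw_vector_mb(num_vectors: int, dim: int) -> float:
    """Size in MB (10^6 bytes) of uncompressed float32 vectors."""
    return num_vectors * dim * FLOAT32_BYTES / 1e6


for name, dim in [("all-MiniLM-L6-v2", 384), ("all-mpnet-base-v2", 768)]:
    print(f"{name}: {raw_vector_mb(31_725, dim):.1f} MB")
# all-MiniLM-L6-v2: 48.7 MB
# all-mpnet-base-v2: 97.5 MB
```

Under this assumption, moving to a 768-dim model roughly doubles vector storage, which is one reason to pair it with a smaller batch size.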
# If a GPU is available, sentence-transformers uses it automatically
# No configuration needed; just ensure CUDA and a CUDA-enabled PyTorch build are installed
# To check GPU usage:
# nvidia-smi (during indexing)

When to run indexing:
- During low-traffic periods (e.g., 2-4 AM)
- When system has plenty of free RAM
- When using cached embeddings (subsequent runs)
Search performance:
- Search queries take <5 ms (unchanged by the optimizations)
- Vector search is not a bottleneck
- Focus optimization efforts on indexing only
Disk I/O:
- During indexing, disk I/O happens at start/end
- Mid-indexing is CPU-bound (not disk-bound)
- Ensure `/tmp` and `backend/data/` are on fast storage
Cache security:
- Cache contains only pre-computed embedding vectors
- No sensitive data in embeddings
- Safe to share, archive, or version control
- JSON is a data-only format, so loading the cache cannot execute code
Cache format migration:
- Changed from pickle (code execution risk on load) to JSON (safe)
- Automatic migration from old format
- No sensitive data exposed
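
A JSON-backed cache can be as simple as the sketch below. The function names, the flat `{key: vector}` layout, and the write-then-rename step are assumptions for illustration; the service's actual `_save_embedding_cache()` may work differently. Unlike `pickle.load`, `json.load` only ever produces plain data (dicts, lists, numbers, strings).

```python
import json
import os
import tempfile
from typing import Dict, List


def save_cache_json(cache: Dict[str, List[float]], path: str) -> None:
    """Write the cache to a temp file, then rename it so readers never see a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(cache, handle)
    os.replace(tmp_path, path)


def load_cache_json(path: str) -> Dict[str, List[float]]:
    """Load the cache, returning an empty dict when no cache file exists yet."""
    if not os.path.exists(path):
        return {}
    with open(path, "r", encoding="utf-8") as handle:
        return json.load(handle)


with tempfile.TemporaryDirectory() as workdir:
    cache_path = os.path.join(workdir, "embedding_cache.json")
    save_cache_json({"gdp-growth": [0.25, -0.5, 0.125]}, cache_path)
    print(load_cache_json(cache_path))  # {'gdp-growth': [0.25, -0.5, 0.125]}
```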
Related documentation:
- FAISS Optimization Summary - Detailed changes
- Vector Search Module - API reference
- Metadata Loader - Integration point
If problems persist:
- Check logs: `/tmp/backend-production.log`
- Review test results: `python scripts/test_faiss_optimization.py`
- Monitor stats: `curl http://localhost:3001/api/health`
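
For automated monitoring, the health endpoint can also be polled from Python. This is a sketch under the assumption that the response contains a `faiss_stats` object with the fields shown in the expected output earlier in this guide; `fetch_faiss_stats` and `check_stats` are hypothetical helpers, and the warning rules are illustrative only.

```python
import json
import urllib.request
from typing import Any, Dict, List


def fetch_faiss_stats(url: str = "http://localhost:3001/api/health") -> Dict[str, Any]:
    """Fetch the health endpoint and return its faiss_stats object."""
    with urllib.request.urlopen(url, timeout=5) as response:
        return json.load(response)["faiss_stats"]


def check_stats(stats: Dict[str, Any]) -> List[str]:
    """Return human-readable warnings; an empty list means nothing looks off."""
    warnings: List[str] = []
    checks = stats["cache_hits"] + stats["cache_misses"]
    if checks == 0:
        return ["no cache checks recorded yet"]
    computed = 100.0 * stats["cache_hits"] / checks
    if abs(computed - stats["cache_hit_rate"]) > 0.1:
        warnings.append(
            f"reported hit rate {stats['cache_hit_rate']} differs from computed {computed:.1f}"
        )
    if stats["cache_hits"] == 0:
        warnings.append("0% hit rate: cache file may be missing or not loaded")
    return warnings


# Sample values copied from the expected output in this guide.
sample = {
    "default_batch_size": 128,
    "cache_entries": 26847,
    "cache_hits": 5234,
    "cache_misses": 26491,
    "cache_hit_rate": 16.5,
    "duplicates_skipped": 2847,
}
print(check_stats(sample))  # []
```

Against a running backend, call `check_stats(fetch_faiss_stats())` instead of using the sample.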
Last Updated: November 22, 2025
Version: 1.0 (Initial optimization)