Complete reference for all configuration options in the Ask Render Anything Assistant.
- Environment Variables
- Pipeline Configuration
- Model Selection
- RAG Configuration
- Quality Thresholds
- Performance Tuning
The following environment variables must be set for the application to run:
# OpenAI API key for embeddings and GPT models
OPENAI_API_KEY=sk-proj-...
# Anthropic API key for Claude models
ANTHROPIC_API_KEY=sk-ant-...
# Logfire token for observability
LOGFIRE_TOKEN=...
# PostgreSQL database URL (with pgvector extension)
DATABASE_URL=postgresql://user:password@host:5432/dbname

These have sensible defaults but can be customized:
# Quality and Iteration Settings
QUALITY_THRESHOLD=85 # Minimum quality score (0-100)
ACCURACY_THRESHOLD=90 # Minimum accuracy score (0-100)
MAX_ITERATIONS=3 # Max refinement attempts
MAX_TOKENS=2000 # Answer generation token limit
# RAG Settings
RAG_TOP_K=10 # Number of documents to retrieve
SIMILARITY_THRESHOLD=0.75 # Minimum similarity score (0-1)
BM25_WEIGHT=0.4 # Weight for BM25 in hybrid search (0-1)
# Embedding Settings
EMBEDDING_MODEL=text-embedding-3-small # OpenAI embedding model
EMBEDDING_DIMENSIONS=1536 # Embedding vector dimensions
# Model Selection
ANSWER_MODEL=claude-sonnet-4-5-20250929 # Primary answer generation model
CLAIMS_MODEL=gpt-5.4-mini # Claims extraction model
ACCURACY_MODEL=claude-sonnet-4-20250514 # Accuracy checking model
EVAL_MODEL_OPENAI=gpt-5.4-mini # OpenAI evaluator model
EVAL_MODEL_ANTHROPIC=claude-sonnet-4-20250514 # Anthropic evaluator model
# Performance Settings
TIMEOUT_SECONDS=30 # Per-stage timeout
ENABLE_CACHING=true # Cache embeddings
LOG_LEVEL=INFO # Logging verbosity (DEBUG, INFO, WARNING, ERROR)
# Frontend Settings (for deployment)
VITE_API_URL=http://localhost:8000 # Backend API URL
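How the backend consumes these variables is not shown here; as a rough sketch (assuming plain `os.getenv` rather than any particular settings library), required keys can be validated at startup and optional ones read with the defaults listed above:

```python
import os

# Hypothetical startup helper -- names and defaults mirror the variables above.
REQUIRED = ["OPENAI_API_KEY", "ANTHROPIC_API_KEY", "LOGFIRE_TOKEN", "DATABASE_URL"]

def load_settings() -> dict:
    missing = [key for key in REQUIRED if not os.getenv(key)]
    if missing:
        # Fail fast instead of erroring midway through the pipeline.
        raise RuntimeError(f"Missing required environment variables: {', '.join(missing)}")

    return {
        "quality_threshold": int(os.getenv("QUALITY_THRESHOLD", "85")),
        "max_iterations": int(os.getenv("MAX_ITERATIONS", "3")),
        "rag_top_k": int(os.getenv("RAG_TOP_K", "10")),
        "similarity_threshold": float(os.getenv("SIMILARITY_THRESHOLD", "0.75")),
        "enable_caching": os.getenv("ENABLE_CACHING", "true").lower() == "true",
    }
```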
Edit backend/config.py to customize pipeline behavior:

class PipelineConfig:
# Quality thresholds
QUALITY_THRESHOLD = 85 # Minimum acceptable quality score
ACCURACY_THRESHOLD = 90 # Minimum technical accuracy
AGREEMENT_THRESHOLD = 10 # Max score difference between evaluators
# Performance tuning
MAX_ITERATIONS = 3 # Maximum refinement attempts
MAX_TOKENS = 2000 # Output token limit for generation
TIMEOUT_SECONDS = 30 # Per-stage timeout in seconds
# RAG configuration
RAG_TOP_K = 10 # Number of documents to retrieve
SIMILARITY_THRESHOLD = 0.75 # Minimum similarity score
BM25_WEIGHT = 0.4 # BM25 weight in hybrid search (0-1)
EMBEDDING_MODEL = "text-embedding-3-small"
EMBEDDING_DIMENSIONS = 1536
# Model selection
ANSWER_MODEL = "claude-sonnet-4-5-20250929"
CLAIMS_MODEL = "gpt-5.4-mini"
ACCURACY_MODEL = "claude-sonnet-4-20250514"
EVAL_MODELS = ["gpt-5.4-mini", "claude-sonnet-4-20250514"]

# Embedding models
"text-embedding-3-small" # Recommended, good balance of cost/quality
"text-embedding-3-large" # Higher quality, more expensive
"text-embedding-ada-002" # Legacy model, not recommended
# Chat models
"gpt-5.4-mini" # Latest, recommended
"gpt-4-turbo" # Previous generation
"gpt-3.5-turbo" # Cheapest, lower quality"claude-sonnet-4-5-20250929" # Latest Sonnet, best quality
"claude-sonnet-4-20250514" # Previous Sonnet version
"claude-haiku-4" # Fast and cheap
"claude-opus-4" # Most capable, most expensiveCost-Optimized:
ANSWER_MODEL = "claude-haiku-4" # Cheapest Claude
CLAIMS_MODEL = "gpt-3.5-turbo" # Cheapest OpenAI
ACCURACY_MODEL = "gpt-5.4-mini" # Balance
EVAL_MODEL_OPENAI = "gpt-3.5-turbo"
EVAL_MODEL_ANTHROPIC = "claude-haiku-4"Quality-Optimized:
ANSWER_MODEL = "claude-opus-4" # Best Anthropic
CLAIMS_MODEL = "gpt-5.4-mini" # Best OpenAI for structured output
ACCURACY_MODEL = "claude-sonnet-4-5-20250929"
EVAL_MODEL_OPENAI = "gpt-5.4-mini"
EVAL_MODEL_ANTHROPIC = "claude-opus-4"Balanced (Default):
ANSWER_MODEL = "claude-sonnet-4-5-20250929" # Good balance
CLAIMS_MODEL = "gpt-5.4-mini" # Fast and cheap
ACCURACY_MODEL = "claude-sonnet-4-20250514" # Reliable
EVAL_MODEL_OPENAI = "gpt-5.4-mini"
EVAL_MODEL_ANTHROPIC = "claude-sonnet-4-20250514"# Number of documents to retrieve
RAG_TOP_K = 10 # Increase for more context, decrease for faster retrieval
# Minimum similarity threshold (0-1)
SIMILARITY_THRESHOLD = 0.75 # Lower = more permissive, higher = more strict
# BM25 weight (0-1)
# 0 = pure semantic search
# 1 = pure BM25 lexical search
# 0.4 = 60% semantic, 40% BM25 (recommended)
BM25_WEIGHT = 0.4
# RRF constant (for combining rankings)
RRF_K = 60 # Lower = more weight to top-ranked items
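The exact fusion code lives in the backend, but the interplay of BM25_WEIGHT and RRF_K can be illustrated with a small weighted reciprocal-rank-fusion sketch (the function and document IDs here are hypothetical, not the project's actual implementation):

```python
def weighted_rrf(semantic_ids, bm25_ids, bm25_weight=0.4, k=60):
    """Combine two ranked lists of document IDs with weighted Reciprocal Rank Fusion."""
    scores = {}
    for rank, doc_id in enumerate(semantic_ids, start=1):
        scores[doc_id] = scores.get(doc_id, 0.0) + (1 - bm25_weight) / (k + rank)
    for rank, doc_id in enumerate(bm25_ids, start=1):
        scores[doc_id] = scores.get(doc_id, 0.0) + bm25_weight / (k + rank)
    # A smaller k widens the gap between rank 1 and rank 2, i.e. more weight on top hits.
    return sorted(scores, key=scores.get, reverse=True)

# Example: doc "a" ranks first semantically, "b" first lexically.
print(weighted_rrf(["a", "b", "c"], ["b", "d", "a"])[:3])
```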
EMBEDDING_MODEL = "text-embedding-3-small"
# Embedding dimensions
EMBEDDING_DIMENSIONS = 1536 # Must match the model (1536 for text-embedding-3-small, 3072 for text-embedding-3-large)
# Batch size for embedding generation
EMBEDDING_BATCH_SIZE = 100
# Enable embedding caching
ENABLE_EMBEDDING_CACHE = True
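A rough sketch of what EMBEDDING_BATCH_SIZE and the cache flag imply, assuming the official `openai` Python client and a simple in-memory cache keyed by content hash (the real script may differ):

```python
import hashlib
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
_cache: dict[str, list[float]] = {}

def embed_texts(texts, model="text-embedding-3-small", batch_size=100, use_cache=True):
    """Embed texts in batches, skipping anything already cached."""
    results = {}
    pending = []
    for text in texts:
        key = hashlib.sha256(text.encode()).hexdigest()
        if use_cache and key in _cache:
            results[text] = _cache[key]
        else:
            pending.append(text)

    for i in range(0, len(pending), batch_size):
        batch = pending[i:i + batch_size]
        response = client.embeddings.create(model=model, input=batch)
        for text, item in zip(batch, response.data):
            key = hashlib.sha256(text.encode()).hexdigest()
            _cache[key] = item.embedding
            results[text] = item.embedding
    return results
```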
Edit data/scripts/generate_embeddings.py:

# Chunk size (characters)
CHUNK_SIZE = 1000 # Larger = more context, fewer chunks
# Chunk overlap (characters)
CHUNK_OVERLAP = 200 # Overlap between chunks for continuity
# Minimum chunk size
MIN_CHUNK_SIZE = 100 # Discard very small chunks
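For reference, the chunking parameters behave roughly like the following sliding-window sketch (a simplified character-based splitter, not necessarily the script's exact logic):

```python
def chunk_text(text, chunk_size=1000, overlap=200, min_chunk_size=100):
    """Split text into overlapping chunks; drop fragments below min_chunk_size."""
    chunks = []
    step = chunk_size - overlap  # each chunk starts `step` characters after the previous one
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if len(chunk) >= min_chunk_size:
            chunks.append(chunk)
    return chunks

# A 2,500-character document yields chunks starting at 0, 800, 1600, and 2400;
# the final 100-character fragment is kept only because it meets MIN_CHUNK_SIZE.
```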
# Minimum quality score to accept answer (0-100)
QUALITY_THRESHOLD = 85
# Lower = fewer iterations, faster responses, lower quality
# Higher = more iterations, slower responses, higher quality
# Minimum accuracy score (0-100)
ACCURACY_THRESHOLD = 90
# How technically accurate must the answer be?
# Maximum score difference between evaluators (0-100)
AGREEMENT_THRESHOLD = 10
# If evaluators disagree by more than this, trigger refinement
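Taken together, the three thresholds amount to an acceptance check along these lines (a hypothetical helper; the pipeline's real logic may differ in detail):

```python
def needs_refinement(quality, accuracy, openai_score, anthropic_score,
                     quality_threshold=85, accuracy_threshold=90, agreement_threshold=10):
    """Return True if the answer should go through another refinement pass."""
    disagreement = abs(openai_score - anthropic_score)
    return (
        quality < quality_threshold
        or accuracy < accuracy_threshold
        or disagreement > agreement_threshold
    )

# Evaluator scores of 88 and 92 agree within 10 points, so only the
# quality and accuracy thresholds decide the outcome.
```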
# Maximum number of refinement iterations
MAX_ITERATIONS = 3
# Higher = better quality, higher cost
# Lower = faster responses, lower cost
# Minimum improvement required to continue iterating
MIN_IMPROVEMENT = 5 # Score must improve by at least 5 points
# Early stopping if score is very high
EXCELLENT_THRESHOLD = 95 # Stop iterating if score exceeds this
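The iteration settings combine into a stopping rule roughly like this (a sketch assuming the score after each pass is available; MIN_IMPROVEMENT and EXCELLENT_THRESHOLD are the values above):

```python
def should_continue(scores, max_iterations=3, min_improvement=5, excellent_threshold=95):
    """Decide whether to run another refinement pass given the score after each pass."""
    if len(scores) >= max_iterations:
        return False                      # hard cap on refinement attempts
    if scores[-1] >= excellent_threshold:
        return False                      # already excellent, stop early
    if len(scores) >= 2 and scores[-1] - scores[-2] < min_improvement:
        return False                      # diminishing returns, stop iterating
    return True

# Scores [78, 86] -> continue (improved by 8); [78, 81] -> stop (only +3).
```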
To reduce latency:

# Reduce token limits
MAX_TOKENS = 1500 # Down from 2000
# Reduce retrieval count
RAG_TOP_K = 5 # Down from 10
# Use faster models
ANSWER_MODEL = "claude-haiku-4" # Faster than Sonnet
# Disable less critical stages (not recommended)
ENABLE_CLAIMS_VERIFICATION = False
ENABLE_ACCURACY_CHECK = False

To reduce costs:

# Use cheaper models
ANSWER_MODEL = "claude-haiku-4"
CLAIMS_MODEL = "gpt-3.5-turbo"
ACCURACY_MODEL = "gpt-5.4-mini"
# Reduce token limits
MAX_TOKENS = 1000
# Lower quality threshold (fewer iterations)
QUALITY_THRESHOLD = 75
# Limit iterations
MAX_ITERATIONS = 1
# Enable aggressive caching
ENABLE_CACHING = True
CACHE_TTL_SECONDS = 3600 # 1 hour

To maximize quality:

# Use best models
ANSWER_MODEL = "claude-opus-4"
CLAIMS_MODEL = "gpt-5.4-mini"
ACCURACY_MODEL = "claude-sonnet-4-5-20250929"
# Increase token limits
MAX_TOKENS = 3000
# Higher quality threshold
QUALITY_THRESHOLD = 90
# More iterations allowed
MAX_ITERATIONS = 5
# More RAG context
RAG_TOP_K = 15

Pipeline times out:
# Increase timeout
TIMEOUT_SECONDS = 60
# Or reduce token limit
MAX_TOKENS = 1500

High iteration rate:
# Lower quality threshold
QUALITY_THRESHOLD = 75
# Or improve RAG retrieval
RAG_TOP_K = 15
SIMILARITY_THRESHOLD = 0.70

Costs too high:
# Use cheaper models
ANSWER_MODEL = "claude-haiku-4"
CLAIMS_MODEL = "gpt-3.5-turbo"
# Reduce iterations
MAX_ITERATIONS = 1
QUALITY_THRESHOLD = 75
# Reduce token limit
MAX_TOKENS = 1000

Low quality scores:
# Use better models
ANSWER_MODEL = "claude-sonnet-4-5-20250929"
# More context
RAG_TOP_K = 15
# More output tokens
MAX_TOKENS = 2500
# More iterations
MAX_ITERATIONS = 5

Related documentation:

- Pipeline Guide - Detailed pipeline stages
- Observability Guide - Monitoring and instrumentation
- Hybrid Search Deep-Dive - Technical details on retrieval