Skip to content

Latest commit

 

History

History
379 lines (276 loc) · 9.11 KB

File metadata and controls

379 lines (276 loc) · 9.11 KB

Configuration Guide

Complete reference for all configuration options in the Ask Render Anything Assistant.

Table of Contents


Environment Variables

Required Variables

These must be set for the application to run:

# OpenAI API key for embeddings and GPT models
OPENAI_API_KEY=sk-proj-...

# Anthropic API key for Claude models
ANTHROPIC_API_KEY=sk-ant-...

# Logfire token for observability
LOGFIRE_TOKEN=...

# PostgreSQL database URL (with pgvector extension)
DATABASE_URL=postgresql://user:password@host:5432/dbname

Optional Variables

These have sensible defaults but can be customized:

# Quality and Iteration Settings
QUALITY_THRESHOLD=85                     # Minimum quality score (0-100)
ACCURACY_THRESHOLD=90                    # Minimum accuracy score (0-100)
MAX_ITERATIONS=3                         # Max refinement attempts
MAX_TOKENS=2000                          # Answer generation token limit

# RAG Settings
RAG_TOP_K=10                             # Number of documents to retrieve
SIMILARITY_THRESHOLD=0.75                # Minimum similarity score (0-1)
BM25_WEIGHT=0.4                          # Weight for BM25 in hybrid search (0-1)

# Embedding Settings
EMBEDDING_MODEL=text-embedding-3-small   # OpenAI embedding model
EMBEDDING_DIMENSIONS=1536                # Embedding vector dimensions

# Model Selection
ANSWER_MODEL=claude-sonnet-4-5-20250929  # Primary answer generation model
CLAIMS_MODEL=gpt-5.4-mini                # Claims extraction model
ACCURACY_MODEL=claude-sonnet-4-20250514  # Accuracy checking model
EVAL_MODEL_OPENAI=gpt-5.4-mini           # OpenAI evaluator model
EVAL_MODEL_ANTHROPIC=claude-sonnet-4-20250514  # Anthropic evaluator model

# Performance Settings
TIMEOUT_SECONDS=30                       # Per-stage timeout
ENABLE_CACHING=true                      # Cache embeddings
LOG_LEVEL=INFO                           # Logging verbosity (DEBUG, INFO, WARNING, ERROR)

# Frontend Settings (for deployment)
VITE_API_URL=http://localhost:8000       # Backend API URL

Pipeline Configuration

Edit backend/config.py to customize pipeline behavior:

Basic Configuration

class PipelineConfig:
    # Quality thresholds
    QUALITY_THRESHOLD = 85           # Minimum acceptable quality score
    ACCURACY_THRESHOLD = 90          # Minimum technical accuracy
    AGREEMENT_THRESHOLD = 10         # Max score difference between evaluators
    
    # Performance tuning
    MAX_ITERATIONS = 3               # Maximum refinement attempts
    MAX_TOKENS = 2000                # Output token limit for generation
    TIMEOUT_SECONDS = 30             # Per-stage timeout in seconds
    
    # RAG configuration
    RAG_TOP_K = 10                   # Number of documents to retrieve
    SIMILARITY_THRESHOLD = 0.75      # Minimum similarity score
    BM25_WEIGHT = 0.4                # BM25 weight in hybrid search (0-1)
    EMBEDDING_MODEL = "text-embedding-3-small"
    EMBEDDING_DIMENSIONS = 1536
    
    # Model selection
    ANSWER_MODEL = "claude-sonnet-4-5-20250929"
    CLAIMS_MODEL = "gpt-5.4-mini"
    ACCURACY_MODEL = "claude-sonnet-4-20250514"
    EVAL_MODELS = ["gpt-5.4-mini", "claude-sonnet-4-20250514"]

Model Selection

Available Models

OpenAI Models

# Embedding models
"text-embedding-3-small"    # Recommended, good balance of cost/quality
"text-embedding-3-large"    # Higher quality, more expensive
"text-embedding-ada-002"    # Legacy model, not recommended

# Chat models
"gpt-5.4-mini"              # Latest, recommended
"gpt-4-turbo"               # Previous generation
"gpt-3.5-turbo"             # Cheapest, lower quality

Anthropic Models

"claude-sonnet-4-5-20250929"  # Latest Sonnet, best quality
"claude-sonnet-4-20250514"    # Previous Sonnet version
"claude-haiku-4"              # Fast and cheap
"claude-opus-4"               # Most capable, most expensive

Model Selection Strategy

Cost-Optimized:

ANSWER_MODEL = "claude-haiku-4"          # Cheapest Claude
CLAIMS_MODEL = "gpt-3.5-turbo"           # Cheapest OpenAI
ACCURACY_MODEL = "gpt-5.4-mini"          # Balance
EVAL_MODEL_OPENAI = "gpt-3.5-turbo"
EVAL_MODEL_ANTHROPIC = "claude-haiku-4"

Quality-Optimized:

ANSWER_MODEL = "claude-opus-4"           # Best Anthropic
CLAIMS_MODEL = "gpt-5.4-mini"            # Best OpenAI for structured output
ACCURACY_MODEL = "claude-sonnet-4-5-20250929"
EVAL_MODEL_OPENAI = "gpt-5.4-mini"
EVAL_MODEL_ANTHROPIC = "claude-opus-4"

Balanced (Default):

ANSWER_MODEL = "claude-sonnet-4-5-20250929"  # Good balance
CLAIMS_MODEL = "gpt-5.4-mini"                # Fast and cheap
ACCURACY_MODEL = "claude-sonnet-4-20250514"  # Reliable
EVAL_MODEL_OPENAI = "gpt-5.4-mini"
EVAL_MODEL_ANTHROPIC = "claude-sonnet-4-20250514"

RAG Configuration

Hybrid Search Parameters

# Number of documents to retrieve
RAG_TOP_K = 10  # Increase for more context, decrease for faster retrieval

# Minimum similarity threshold (0-1)
SIMILARITY_THRESHOLD = 0.75  # Lower = more permissive, higher = more strict

# BM25 weight (0-1)
# 0 = pure semantic search
# 1 = pure BM25 lexical search
# 0.4 = 60% semantic, 40% BM25 (recommended)
BM25_WEIGHT = 0.4

# RRF constant (for combining rankings)
RRF_K = 60  # Lower = more weight to top-ranked items

Embedding Configuration

# Embedding model
EMBEDDING_MODEL = "text-embedding-3-small"

# Embedding dimensions
EMBEDDING_DIMENSIONS = 1536  # Must match model

# Batch size for embedding generation
EMBEDDING_BATCH_SIZE = 100

# Enable embedding caching
ENABLE_EMBEDDING_CACHE = True

Document Chunking

Edit data/scripts/generate_embeddings.py:

# Chunk size (characters)
CHUNK_SIZE = 1000  # Larger = more context, fewer chunks

# Chunk overlap (characters)
CHUNK_OVERLAP = 200  # Overlap between chunks for continuity

# Minimum chunk size
MIN_CHUNK_SIZE = 100  # Discard very small chunks

Quality Thresholds

Score Thresholds

# Minimum quality score to accept answer (0-100)
QUALITY_THRESHOLD = 85  
# Lower = fewer iterations, faster responses, lower quality
# Higher = more iterations, slower responses, higher quality

# Minimum accuracy score (0-100)
ACCURACY_THRESHOLD = 90
# How technically accurate must the answer be?

# Maximum score difference between evaluators (0-100)
AGREEMENT_THRESHOLD = 10
# If evaluators disagree by more than this, trigger refinement

Iteration Control

# Maximum number of refinement iterations
MAX_ITERATIONS = 3
# Higher = better quality, higher cost
# Lower = faster responses, lower cost

# Minimum improvement required to continue iterating
MIN_IMPROVEMENT = 5  # Score must improve by at least 5 points

# Early stopping if score is very high
EXCELLENT_THRESHOLD = 95  # Stop iterating if score exceeds this

Performance Tuning

Latency Optimization

# Reduce token limits
MAX_TOKENS = 1500  # Down from 2000

# Reduce retrieval count
RAG_TOP_K = 5  # Down from 10

# Use faster models
ANSWER_MODEL = "claude-haiku-4"  # Faster than Sonnet

# Disable less critical stages (not recommended)
ENABLE_CLAIMS_VERIFICATION = False
ENABLE_ACCURACY_CHECK = False

Cost Optimization

# Use cheaper models
ANSWER_MODEL = "claude-haiku-4"
CLAIMS_MODEL = "gpt-3.5-turbo"
ACCURACY_MODEL = "gpt-5.4-mini"

# Reduce token limits
MAX_TOKENS = 1000

# Lower quality threshold (fewer iterations)
QUALITY_THRESHOLD = 75

# Limit iterations
MAX_ITERATIONS = 1

# Enable aggressive caching
ENABLE_CACHING = True
CACHE_TTL_SECONDS = 3600  # 1 hour

Quality Optimization

# Use best models
ANSWER_MODEL = "claude-opus-4"
CLAIMS_MODEL = "gpt-5.4-mini"
ACCURACY_MODEL = "claude-sonnet-4-5-20250929"

# Increase token limits
MAX_TOKENS = 3000

# Higher quality threshold
QUALITY_THRESHOLD = 90

# More iterations allowed
MAX_ITERATIONS = 5

# More RAG context
RAG_TOP_K = 15

Troubleshooting

Common Issues

Pipeline times out:

# Increase timeout
TIMEOUT_SECONDS = 60

# Or reduce token limit
MAX_TOKENS = 1500

High iteration rate:

# Lower quality threshold
QUALITY_THRESHOLD = 75

# Or improve RAG retrieval
RAG_TOP_K = 15
SIMILARITY_THRESHOLD = 0.70

Costs too high:

# Use cheaper models
ANSWER_MODEL = "claude-haiku-4"
CLAIMS_MODEL = "gpt-3.5-turbo"

# Reduce iterations
MAX_ITERATIONS = 1
QUALITY_THRESHOLD = 75

# Reduce token limit
MAX_TOKENS = 1000

Low quality scores:

# Use better models
ANSWER_MODEL = "claude-sonnet-4-5-20250929"

# More context
RAG_TOP_K = 15

# More output tokens
MAX_TOKENS = 2500

# More iterations
MAX_ITERATIONS = 5

Related Documentation