LLM Provider Support

Valence supports multiple LLM providers for testing real AI systems.

Supported Providers

Stub (Default)

Built-in deterministic model for testing the framework itself
No API key required
Usage: --model stub

OpenAI

Models: GPT-3.5, GPT-4, GPT-4o series
Environment: OPENAI_API_KEY
Usage: --model openai:gpt-4o or --model openai:gpt-3.5-turbo

Anthropic

Models: Claude 3 Haiku, Sonnet, Opus; Claude 3.5 Sonnet
Environment: ANTHROPIC_API_KEY
Usage: --model anthropic:claude-3-5-sonnet-20241022

Azure OpenAI

Azure-hosted OpenAI models
Environment variables:
- AZURE_OPENAI_KEY
- AZURE_OPENAI_ENDPOINT
- AZURE_OPENAI_DEPLOYMENT
Usage: --model azure-openai:gpt-4
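The --model values above share a provider:model shape, with stub as a bare name. As a rough illustration, the sketch below splits such a string and reports which of the environment variables listed above are still unset. parse_model_spec, missing_env, and REQUIRED_ENV are hypothetical names used only for this example; they are not Valence's actual API.

```python
import os

# Environment variables each provider needs, as listed above.
REQUIRED_ENV = {
    "stub": [],
    "openai": ["OPENAI_API_KEY"],
    "anthropic": ["ANTHROPIC_API_KEY"],
    "azure-openai": [
        "AZURE_OPENAI_KEY",
        "AZURE_OPENAI_ENDPOINT",
        "AZURE_OPENAI_DEPLOYMENT",
    ],
}


def parse_model_spec(spec):
    """Split a --model value into (provider, model_name).

    Hypothetical helper: "stub" has no model part; everything else
    is assumed to look like "provider:model".
    """
    provider, _, model = spec.partition(":")
    if provider not in REQUIRED_ENV:
        raise ValueError(f"unknown provider: {provider!r}")
    if provider != "stub" and not model:
        raise ValueError(f"expected provider:model, got {spec!r}")
    return provider, model or None


def missing_env(provider, environ=None):
    """Return the required variables that are unset or empty."""
    environ = os.environ if environ is None else environ
    return [name for name in REQUIRED_ENV[provider] if not environ.get(name)]
```

Checking missing_env(provider) before starting a run is one way to fail early instead of on the first API call.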

Setup

Install Dependencies

pip install "valence-evals[llm]"
# or separately:
pip install openai anthropic

Set API Keys

Environment variables:

export OPENAI_API_KEY="your-openai-key"
export ANTHROPIC_API_KEY="your-anthropic-key"
export AZURE_OPENAI_KEY="your-azure-key"
export AZURE_OPENAI_ENDPOINT="https://your-resource.openai.azure.com"
export AZURE_OPENAI_DEPLOYMENT="your-deployment-name"

Or .env file:

OPENAI_API_KEY=your-openai-key
ANTHROPIC_API_KEY=your-anthropic-key
AZURE_OPENAI_KEY=your-azure-key
AZURE_OPENAI_ENDPOINT=https://your-resource.openai.azure.com
AZURE_OPENAI_DEPLOYMENT=your-deployment-name
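This document does not say how the .env file is read, so the following is only a standard-library sketch of the usual convention: KEY=value lines are copied into the environment, and variables that are already exported take precedence. parse_env_lines and load_env_file are hypothetical helpers, not functions provided by Valence.

```python
import os


def parse_env_lines(lines):
    """Parse KEY=value lines, skipping blanks and # comments."""
    values = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first "=" only, so values may contain "=".
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes.
        values[key.strip()] = value.strip().strip("\"'")
    return values


def load_env_file(path=".env", environ=None):
    """Copy values from a .env file into environ.

    Variables that are already set are left untouched.
    """
    environ = os.environ if environ is None else environ
    with open(path, encoding="utf-8") as handle:
        for key, value in parse_env_lines(handle).items():
            environ.setdefault(key, value)
    return environ
```

Keeping the .env file out of version control avoids committing API keys by accident.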

Usage Examples

OpenAI Evaluation

valence run \
  --model openai:gpt-4o \
  --seeds ./seeds.json \
  --packs ./packs/ \
  --out ./runs/openai-test/

Anthropic Claude

valence run \
  --model anthropic:claude-3-5-sonnet-20241022 \
  --seeds ./seeds.json \
  --packs ./packs/ \
  --out ./runs/claude-test/

Development Testing

valence run \
  --model stub \
  --seeds ./seeds.json \
  --packs ./packs/ \
  --out ./runs/test/

Model Parameters

Default settings (configurable in code):

temperature: 0.7 (randomness)
max_tokens: 500 (response length)
timeout: 30 seconds (API timeout)
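To make "configurable in code" more concrete, here is a minimal sketch of how these three defaults could be represented and overridden. ModelParams is a hypothetical class written for this example; the actual configuration object in Valence may be named and structured differently.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ModelParams:
    """Hypothetical container mirroring the documented defaults."""

    temperature: float = 0.7  # sampling randomness
    max_tokens: int = 500     # upper bound on response length
    timeout: float = 30.0     # seconds before an API call is abandoned

    def __post_init__(self):
        # Reject values that no provider would accept.
        if self.temperature < 0:
            raise ValueError("temperature must be non-negative")
        if self.max_tokens <= 0:
            raise ValueError("max_tokens must be positive")
        if self.timeout <= 0:
            raise ValueError("timeout must be positive")


DEFAULTS = ModelParams()

# Lower the temperature for more repeatable evaluation runs,
# leaving the other defaults unchanged.
repeatable = replace(DEFAULTS, temperature=0.0)
```

A frozen dataclass keeps the defaults from being mutated mid-run; each override produces a new object.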

Error Handling

Missing API keys: Clear error message
API failures: Logged and recorded in results
Network timeouts: Graceful handling
Failed responses: Marked with error status
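The behaviour listed above can be pictured as a wrapper that turns every failure into a result record rather than aborting the run. The sketch below is an assumption about the general pattern only: call_with_status, the generate callable, and the record fields (status, error, response) are hypothetical and do not reflect Valence's actual result format.

```python
import logging

logger = logging.getLogger("valence.sketch")


def call_with_status(generate, prompt):
    """Call generate(prompt) and return a result record.

    Failures are logged and captured in the record instead of
    stopping the whole run.
    """
    try:
        text = generate(prompt)
    except TimeoutError as exc:
        logger.warning("timeout for prompt %r: %s", prompt, exc)
        return {"prompt": prompt, "status": "error", "error": f"timeout: {exc}"}
    except Exception as exc:  # any other API failure
        logger.error("API failure for prompt %r: %s", prompt, exc)
        return {"prompt": prompt, "status": "error", "error": str(exc)}
    return {"prompt": prompt, "status": "ok", "response": text}
```

Because errors end up in the results, a later analysis step can count or filter them instead of silently losing those prompts.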

Cost Management

Model Selection by Cost

Cheapest: gpt-3.5-turbo, claude-3-haiku
Balanced: gpt-4o-mini, claude-3-sonnet
Premium: gpt-4o, claude-3-opus

Cost Control Tips

Use cheaper models for initial testing
Limit mutations with --max-gens 1 during development
Monitor usage through provider dashboards
Use stub model for detector validation

Testing Strategy

1. Develop with the stub model (free)
2. Validate with a lower-cost model (for example, gpt-4o-mini)
3. Run final tests against the target model
4. Monitor production with the appropriate model tier
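The first three stages can be scripted so that each one differs only in its model and output directory. The sketch below assembles valence run argument lists from the flags shown in this document (--model, --seeds, --packs, --out, --max-gens). build_run_command and staged_commands are hypothetical helpers, the output directories are placeholders, and openai:gpt-4o-mini is assumed to follow the same provider:model form as the other examples.

```python
def build_run_command(model, out_dir, seeds="./seeds.json",
                      packs="./packs/", max_gens=None):
    """Assemble a `valence run` argument list from the documented flags."""
    cmd = ["valence", "run", "--model", model,
           "--seeds", seeds, "--packs", packs, "--out", out_dir]
    if max_gens is not None:
        # Limit mutations to keep development runs cheap.
        cmd += ["--max-gens", str(max_gens)]
    return cmd


def staged_commands(target_model):
    """Return one command per stage: develop, validate, final."""
    return [
        build_run_command("stub", "./runs/dev/", max_gens=1),
        build_run_command("openai:gpt-4o-mini", "./runs/validate/", max_gens=1),
        build_run_command(target_model, "./runs/final/"),
    ]
```

Each list can be passed to subprocess.run(cmd, check=True) so that a failing stage stops the sequence before the more expensive model is called.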
