Valence supports multiple LLM providers for testing real AI systems.
- Built-in deterministic model for testing the framework itself
- No API key required
- Usage: --model stub
- Models: GPT-3.5, GPT-4, GPT-4o series
- Environment: OPENAI_API_KEY
- Usage: --model openai:gpt-4o or --model openai:gpt-3.5-turbo
- Models: Claude 3 Haiku, Sonnet, Opus
- Environment: ANTHROPIC_API_KEY
- Usage: --model anthropic:claude-3-5-sonnet-20241022
- Azure-hosted OpenAI models
- Environment variables: AZURE_OPENAI_KEY, AZURE_OPENAI_ENDPOINT, AZURE_OPENAI_DEPLOYMENT
- Usage: --model azure-openai:gpt-4
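
The `--model` values above follow a `provider:model` pattern, with `stub` as the one bare name. The sketch below shows one way such a string could be split and validated. It is an illustration only, not Valence's own parser: `parse_model_spec`, `ModelSpec`, and `KNOWN_PROVIDERS` are hypothetical names.

```python
from typing import NamedTuple, Optional

# Provider names taken from the usage lines above.
KNOWN_PROVIDERS = {"stub", "openai", "anthropic", "azure-openai"}


class ModelSpec(NamedTuple):
    provider: str
    model: Optional[str]  # None for the stub provider


def parse_model_spec(spec: str) -> ModelSpec:
    """Split a 'provider:model' string into its parts (hypothetical helper)."""
    provider, sep, model = spec.partition(":")
    if provider not in KNOWN_PROVIDERS:
        raise ValueError(f"unknown provider: {provider!r}")
    if provider == "stub":
        # The stub needs no model name; any suffix is ignored in this sketch.
        return ModelSpec("stub", None)
    if not sep or not model:
        raise ValueError(f"expected '{provider}:<model-name>', got {spec!r}")
    return ModelSpec(provider, model)


print(parse_model_spec("openai:gpt-4o"))
print(parse_model_spec("stub"))
```

Splitting on the first colon only keeps any later colons inside the model name.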
pip install valence-evals[llm]
# or separately:
pip install openai anthropic

Environment variables:
export OPENAI_API_KEY="your-openai-key"
export ANTHROPIC_API_KEY="your-anthropic-key"
export AZURE_OPENAI_KEY="your-azure-key"
export AZURE_OPENAI_ENDPOINT="https://your-resource.openai.azure.com"
export AZURE_OPENAI_DEPLOYMENT="your-deployment-name"

Or .env file:
OPENAI_API_KEY=your-openai-key
ANTHROPIC_API_KEY=your-anthropic-key
AZURE_OPENAI_KEY=your-azure-key
AZURE_OPENAI_ENDPOINT=https://your-resource.openai.azure.com
AZURE_OPENAI_DEPLOYMENT=your-deployment-name

valence run \
--model openai:gpt-4o \
--seeds ./seeds.json \
--packs ./packs/ \
--out ./runs/openai-test/

valence run \
--model anthropic:claude-3-5-sonnet-20241022 \
--seeds ./seeds.json \
--packs ./packs/ \
--out ./runs/claude-test/

valence run \
--model stub \
--seeds ./seeds.json \
--packs ./packs/ \
--out ./runs/test/

Default settings (configurable in code):
- temperature: 0.7 (randomness)
- max_tokens: 500 (response length)
- timeout: 30 seconds (API timeout)
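
The source says these defaults are configurable in code but does not show how. As a rough sketch, they could be held in a small immutable settings object. The class name `GenerationSettings` and the field name `timeout_s` are assumptions made for illustration, not Valence's API; only the three default values come from the list above.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class GenerationSettings:
    """Hypothetical container for the documented defaults."""
    temperature: float = 0.7  # sampling randomness
    max_tokens: int = 500     # cap on response length
    timeout_s: float = 30.0   # API timeout in seconds

    def __post_init__(self) -> None:
        # Reject values that no provider would accept.
        if self.temperature < 0:
            raise ValueError("temperature must be non-negative")
        if self.max_tokens <= 0:
            raise ValueError("max_tokens must be positive")
        if self.timeout_s <= 0:
            raise ValueError("timeout_s must be positive")


defaults = GenerationSettings()
# Derive a variant without mutating the defaults, e.g. for repeatable runs.
deterministic = replace(defaults, temperature=0.0)
print(defaults)
print(deterministic)
```
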
- Missing API keys: Clear error message
- API failures: Logged and recorded in results
- Network timeouts: Graceful handling
- Failed responses: Marked with error status
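
The behaviours listed above can be approximated with a pre-flight key check and a wrapper that turns exceptions into an error record. This is a minimal sketch and may not match how Valence implements it: `REQUIRED_ENV`, `missing_keys`, `call_with_error_status`, and the record layout are hypothetical. Only the environment variable names come from this document.

```python
import logging
import os
from typing import Callable, Dict, List, Mapping

logger = logging.getLogger("llm_errors_sketch")

# Environment variables each provider needs, as listed in this document.
REQUIRED_ENV: Dict[str, List[str]] = {
    "stub": [],
    "openai": ["OPENAI_API_KEY"],
    "anthropic": ["ANTHROPIC_API_KEY"],
    "azure-openai": [
        "AZURE_OPENAI_KEY",
        "AZURE_OPENAI_ENDPOINT",
        "AZURE_OPENAI_DEPLOYMENT",
    ],
}


def missing_keys(provider: str, env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the required variables that are unset or empty."""
    return [name for name in REQUIRED_ENV[provider] if not env.get(name)]


def call_with_error_status(call: Callable[[], str]) -> Dict[str, str]:
    """Run one model call; log failures and return them as error records."""
    try:
        return {"status": "ok", "response": call()}
    except TimeoutError as exc:
        logger.warning("request timed out: %s", exc)
        return {"status": "error", "error": f"timeout: {exc}"}
    except Exception as exc:  # any other API failure
        logger.error("API call failed: %s", exc)
        return {"status": "error", "error": str(exc)}


absent = missing_keys("openai", env={})
if absent:
    print("Missing environment variables:", ", ".join(absent))
```

Returning a record instead of raising lets a run continue past a single failed request while the failure still appears in the results.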
- Cheapest: gpt-3.5-turbo, claude-3-haiku
- Balanced: gpt-4o-mini, claude-3-sonnet
- Premium: gpt-4o, claude-3-opus
- Use cheaper models for initial testing
- Limit mutations with --max-gens 1 during development
- Monitor usage through provider dashboards
- Use stub model for detector validation
- Develop with stub model (free)
- Validate with cheap models (gpt-4o-mini)
- Final testing with target model
- Production monitoring with appropriate model tier
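
The first three workflow stages can be scripted by building one `valence run` command per stage. The sketch below only assembles and prints the commands, using flags that appear in this document. The output directories, the helper `valence_run_args`, the spec `openai:gpt-4o-mini`, and the choice of target model are assumptions for illustration.

```python
import shlex
from typing import List, Optional


def valence_run_args(model: str, out_dir: str,
                     max_gens: Optional[int] = None) -> List[str]:
    """Build the argument list for one `valence run` (hypothetical helper)."""
    args = [
        "valence", "run",
        "--model", model,
        "--seeds", "./seeds.json",
        "--packs", "./packs/",
        "--out", out_dir,
    ]
    if max_gens is not None:
        args += ["--max-gens", str(max_gens)]
    return args


TARGET_MODEL = "openai:gpt-4o"  # assumption: substitute the model you ship with

stages = [
    # 1. Develop against the free stub, with mutations limited.
    valence_run_args("stub", "./runs/dev-stub/", max_gens=1),
    # 2. Validate on a cheap model; keeping --max-gens 1 here is a cost choice.
    valence_run_args("openai:gpt-4o-mini", "./runs/validate/", max_gens=1),
    # 3. Final testing on the target model, with no mutation limit.
    valence_run_args(TARGET_MODEL, "./runs/final/"),
]

for args in stages:
    print(shlex.join(args))
```

To execute a stage rather than print it, pass the list to `subprocess.run(args, check=True)`.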