
Commit 9d00e74

Dylan Huang committed "update for new strategy" (1 parent: c63ad89)

1 file changed

Lines changed: 23 additions & 31 deletions


README.md

@@ -2,61 +2,53 @@

 [![PyPI - Version](https://img.shields.io/pypi/v/eval-protocol)](https://pypi.org/project/eval-protocol/)

-**Eval Protocol (EP) is the open-source standard and toolkit for practicing Eval-Driven Development.**
+**The open-source toolkit for building your internal model leaderboard.**

-Building with AI is different. Traditional software is deterministic, but AI systems are probabilistic. How do you ship new features without causing silent regressions? How do you prove a new prompt is actually better?
-
-The answer is a new engineering discipline: **Eval-Driven Development (EDD)**. It adapts the rigor of Test-Driven Development for the uncertain world of AI. With EDD, you define your AI's desired behavior as a suite of executable tests, creating a safety net that allows you to innovate with confidence.
-
-EP provides a consistent way to write evals, store traces, and analyze results.
-
-<p align="center">
-<img src="https://raw.githubusercontent.com/eval-protocol/python-sdk/refs/heads/main/assets/ui.png" alt="UI" />
-<br>
-<sub><b>Log Viewer: Monitor your evaluation rollouts in real time.</b></sub>
-</p>
+When you have multiple AI models to choose from—different versions, providers, or configurations—how do you know which one is best for your use case?

 ## Quick Example

-Here's a simple test function that checks if a model's response contains **bold** text formatting:
+Compare models on a simple formatting task:

 ```python test_bold_format.py
 from eval_protocol.models import EvaluateResult, EvaluationRow, Message
-from eval_protocol.pytest import SingleTurnRolloutProcessor, evaluation_test
+from eval_protocol.pytest import default_single_turn_rollout_processor, evaluation_test

 @evaluation_test(
     input_messages=[
         [
-            Message(role="system", content="You are a helpful assistant. Use bold text to highlight important information."),
-            Message(role="user", content="Explain why **evaluations** matter for building AI agents. Make it dramatic!"),
+            Message(role="system", content="Use bold text to highlight important information."),
+            Message(role="user", content="Explain why evaluations matter for AI agents. Make it dramatic!"),
         ],
     ],
-    completion_params=[{"model": "accounts/fireworks/models/llama-v3p1-8b-instruct"}],
-    rollout_processor=SingleTurnRolloutProcessor(),
+    model=[
+        "fireworks_ai/accounts/fireworks/models/llama-v3p1-8b-instruct",
+        "openai/gpt-4",
+        "anthropic/claude-3-sonnet"
+    ],
+    rollout_processor=default_single_turn_rollout_processor,
     mode="pointwise",
 )
 def test_bold_format(row: EvaluationRow) -> EvaluationRow:
-    """
-    Simple evaluation that checks if the model's response contains bold text.
-    """
-
+    """Check if the model's response contains bold text."""
     assistant_response = row.messages[-1].content

-    # Check if response contains **bold** text
-    has_bold = "**" in assistant_response
+    if assistant_response is None:
+        row.evaluation_result = EvaluateResult(score=0.0, reason="No response")
+        return row

-    if has_bold:
-        result = EvaluateResult(score=1.0, reason="✅ Response contains bold text")
-    else:
-        result = EvaluateResult(score=0.0, reason="❌ No bold text found")
+    has_bold = "**" in str(assistant_response)
+    score = 1.0 if has_bold else 0.0
+    reason = "Contains bold text" if has_bold else "No bold text found"

-    row.evaluation_result = result
+    row.evaluation_result = EvaluateResult(score=score, reason=reason)
     return row
 ```

-## Documentation
+## 📚 Resources

-See our [documentation](https://evalprotocol.io) for more details.
+- **[Documentation](https://evalprotocol.io)** - Complete guides and API reference
+- **[Discord](https://discord.com/channels/1137072072808472616/1400975572405850155)** - Community discussions

 ## Installation

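The updated scoring logic in `test_bold_format` can be exercised without calling any model. The following is a minimal sketch that restates it as a plain function. `score_bold` is a hypothetical name and is not part of `eval_protocol`. It returns a `(score, reason)` tuple in place of an `EvaluateResult`, so it runs with only the standard library.

```python
from typing import Optional, Tuple


def score_bold(assistant_response: Optional[object]) -> Tuple[float, str]:
    """Hypothetical standalone restatement of the bold-text check in test_bold_format.

    Mirrors the committed logic: a missing response scores 0.0. Otherwise the
    response is converted with str() and scores 1.0 if it contains "**".
    """
    if assistant_response is None:
        return 0.0, "No response"

    has_bold = "**" in str(assistant_response)
    score = 1.0 if has_bold else 0.0
    reason = "Contains bold text" if has_bold else "No bold text found"
    return score, reason


if __name__ == "__main__":
    print(score_bold("Evaluations are **essential**."))  # (1.0, 'Contains bold text')
    print(score_bold("Evaluations are essential."))      # (0.0, 'No bold text found')
    print(score_bold(None))                              # (0.0, 'No response')
```

The check is a plain substring test for `**`, so an unmatched `**` or a `***` horizontal rule also scores 1.0. A stricter check would look for a matched pair of markers.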

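The updated README presents this example as a comparison of the three models listed under `model=`. One simple way to turn per-row pointwise scores into a leaderboard is to average the scores for each model and then sort. The sketch below assumes the scores have already been collected as `(model, score)` pairs. `rank_models` is a hypothetical helper, and EP's own aggregation and reporting may work differently.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def rank_models(results: Iterable[Tuple[str, float]]) -> List[Tuple[str, float]]:
    """Hypothetical helper: average pointwise scores per model, best first.

    `results` is assumed to be (model_name, score) pairs, one per evaluated row.
    Ties are broken alphabetically by model name so the ordering is stable.
    """
    totals: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for model, score in results:
        totals[model] += score
        counts[model] += 1

    averages = [(model, totals[model] / counts[model]) for model in totals]
    # Sort by descending mean score, then by name for a deterministic order.
    return sorted(averages, key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Placeholder model names and scores for illustration only; not real results.
    scores = [
        ("model-a", 1.0), ("model-a", 0.0),
        ("model-b", 1.0), ("model-b", 1.0),
        ("model-c", 0.0),
    ]
    for name, mean in rank_models(scores):
        print(f"{name}: {mean:.2f}")
```

Averaging gives every evaluated row the same weight. With a single input conversation, as in the README example, each model's mean may rest on one 0/1 score. The ranking would need more inputs before it carries much signal.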