Commit a2cb35b

add quick example
Author: Dylan Huang
1 parent 19e8272, commit a2cb35b

1 file changed: README.md (38 additions, 0 deletions)

@@ -5,6 +5,44 @@
 EP is an open protocol that standardizes how developers author evals for large
 language model (LLM) applications.
 
+## Quick Example
+
+Here's a simple test function that checks if a model's response contains **bold** text formatting:
+
+```python test_bold_format.py
+from eval_protocol.models import EvaluateResult, EvaluationRow, Message
+from eval_protocol.pytest import default_single_turn_rollout_processor, evaluation_test
+
+@evaluation_test(
+    input_messages=[
+        [
+            Message(role="system", content="You are a helpful assistant. Use bold text to highlight important information."),
+            Message(role="user", content="Explain why **evaluations** matter for building AI agents. Make it dramatic!"),
+        ],
+    ],
+    model=["accounts/fireworks/models/llama-v3p1-8b-instruct"],
+    rollout_processor=default_single_turn_rollout_processor,
+    mode="pointwise",
+)
+def test_bold_format(row: EvaluationRow) -> EvaluationRow:
+    """
+    Simple evaluation that checks if the model's response contains bold text.
+    """
+
+    assistant_response = row.messages[-1].content
+
+    # Check if response contains **bold** text
+    has_bold = "**" in assistant_response
+
+    if has_bold:
+        result = EvaluateResult(score=1.0, reason="✅ Response contains bold text")
+    else:
+        result = EvaluateResult(score=0.0, reason="❌ No bold text found")
+
+    row.evaluation_result = result
+    return row
+```
+
 ## Documentation
 
 See our [documentation](https://evalprotocol.io) for more details.
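
The check added in this commit treats any occurrence of `**` as bold, so a response such as `2 ** 3` would also score 1.0. As a possible refinement, the sketch below isolates the scoring logic in plain Python and requires a paired `**...**` span on a single line. The names `BOLD_PATTERN`, `has_bold`, and `score_bold` are hypothetical and not part of `eval_protocol`; the regex is a rough approximation of Markdown strong emphasis, not a full parser.

```python
import re
from typing import Optional, Tuple

# Hypothetical helpers for illustration; not part of eval_protocol.
# Matches "**text**" where the text neither starts nor ends with whitespace.
# "." does not match newlines, so a bold span must sit on a single line.
BOLD_PATTERN = re.compile(r"\*\*(?=\S)(.+?)(?<=\S)\*\*")


def has_bold(text: Optional[str]) -> bool:
    """Return True if text contains at least one paired **bold** span."""
    if not text:
        # Covers None and the empty string.
        return False
    return BOLD_PATTERN.search(text) is not None


def score_bold(text: Optional[str]) -> Tuple[float, str]:
    """Map the bold check to a (score, reason) pair, mirroring the README example."""
    if has_bold(text):
        return 1.0, "Response contains bold text"
    return 0.0, "No bold text found"
```

Inside the test function from the diff, the pair returned by `score_bold(row.messages[-1].content)` could be passed to `EvaluateResult(score=..., reason=...)`. Handling `None` here also avoids a `TypeError` if the last message has no content.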
