EP is an open protocol that standardizes how developers author evals for large
language model (LLM) applications.

## Quick Example

Here's a simple test function that checks if a model's response contains **bold** text formatting:

```python test_bold_format.py
# Message is imported because the decorator arguments below construct Message objects.
from eval_protocol.models import EvaluateResult, EvaluationRow, Message
from eval_protocol.pytest import default_single_turn_rollout_processor, evaluation_test


@evaluation_test(
    input_messages=[
        [
            Message(role="system", content="You are a helpful assistant. Use bold text to highlight important information."),
            Message(role="user", content="Explain why **evaluations** matter for building AI agents. Make it dramatic!"),
        ],
    ],
    model=["accounts/fireworks/models/llama-v3p1-8b-instruct"],
    rollout_processor=default_single_turn_rollout_processor,
    mode="pointwise",
)
def test_bold_format(row: EvaluationRow) -> EvaluationRow:
    """
    Simple evaluation that checks if the model's response contains bold text.
    """

    # The last message in the row is expected to be the model's reply;
    # fall back to an empty string so the substring check never sees None.
    assistant_response = row.messages[-1].content or ""

    # Check if response contains **bold** text
    has_bold = "**" in assistant_response

    if has_bold:
        result = EvaluateResult(score=1.0, reason="✅ Response contains bold text")
    else:
        result = EvaluateResult(score=0.0, reason="❌ No bold text found")

    row.evaluation_result = result
    return row
```
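
The substring check in `test_bold_format` counts any `**` as bold, including an unpaired marker. If stricter matching is wanted, one option is a small standalone helper like the sketch below. The name `contains_bold` and the regular expression are illustrative assumptions, not part of `eval_protocol`; the pattern only approximates Markdown's bold rules by requiring a matched pair of `**` around text that does not start or end with whitespace.

```python
import re
from typing import Optional

# Hypothetical helper, not part of eval_protocol.
# Matches **text** where text is non-empty and has no leading or
# trailing whitespace. This approximates, but does not fully implement,
# Markdown's strong-emphasis rules.
_BOLD_PATTERN = re.compile(r"\*\*(?=\S)(.+?)(?<=\S)\*\*", re.DOTALL)


def contains_bold(text: Optional[str]) -> bool:
    """Return True if text contains at least one paired **bold** span."""
    if not text:
        return False
    return _BOLD_PATTERN.search(text) is not None
```

Under these assumptions, `has_bold = contains_bold(assistant_response)` could stand in for the substring check, so that a stray `**` in the response no longer scores 1.0.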

## Documentation

See our [documentation](https://evalprotocol.io) for more details.