fern/observability/evals-quickstart.mdx
+10 -2 (10 additions & 2 deletions)
@@ -576,7 +576,7 @@ For complex validation criteria beyond pattern matching, use AI-powered judges t
 ```
 You are an LLM-Judge. Evaluate ONLY the last assistant message in the mock conversation: {{messages[-1]}}.
 
-Include the full conversation history for context: {{messages[0:-1]}}
+Include the full conversation history for context: {{messages}}
 
 Decision rule:
 - PASS if ALL "pass criteria" are satisfied AND NONE of the "fail criteria" are triggered.
@@ -596,6 +596,12 @@ Output format: respond with exactly one word: pass or fail
 - No additional text
 ```
 
+<Note>
+**Template variables:**
+- `{{messages}}` - The entire conversation history (all messages exchanged)
+- `{{messages[-1]}}` - The last assistant message only
+</Note>
+
 ### Example: Evaluate helpfulness and tone
 
 <Tabs>
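The hunks above replace `{{messages[0:-1]}}` with `{{messages}}`. A minimal sketch of the difference, using Python list slicing purely as an analogy: the diff does not confirm that the template engine follows Python indexing semantics, and the sample conversation is hypothetical.

```python
# Python list slicing as an analogy for the template expressions in the diff.
# Assumption: the template indices behave like Python indices (not confirmed by the source).
# The conversation below is hypothetical sample data.
conversation = [
    {"role": "user", "content": "Hi, I need to reschedule my appointment."},
    {"role": "assistant", "content": "Sure, which day works best for you?"},
    {"role": "user", "content": "Thursday afternoon."},
    {"role": "assistant", "content": "Done. You are booked for Thursday at 2 pm."},
]

last_message = conversation[-1]    # analogous to {{messages[-1]}}
history_only = conversation[0:-1]  # analogous to the removed {{messages[0:-1]}}
full_history = conversation        # analogous to the new {{messages}}

print(len(history_only))             # 3: everything except the last message
print(len(full_history))             # 4: the entire conversation
print(last_message in full_history)  # True
print(last_message in history_only)  # False
```

Under that assumption, the new `{{messages}}` context includes the message being judged, whereas the old slice excluded it.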
@@ -630,7 +636,7 @@ curl -X POST "https://api.vapi.ai/eval" \
 "model": "gpt-4o",
 "messages": [{
 "role": "system",
-"content": "You are an LLM-Judge. Evaluate ONLY the last assistant message: {{messages[-1]}}.\n\nInclude context: {{messages[0:-1]}}\n\nDecision rule:\n- PASS if ALL pass criteria are met AND NO fail criteria are triggered.\n- Otherwise FAIL.\n\nPass criteria:\n- Response acknowledges the user request\n- Response offers specific help or next steps\n- Tone is professional and friendly\n\nFail criteria (any triggers FAIL):\n- Response is rude or dismissive\n- Response ignores the user request\n- Response provides no actionable information\n\nOutput format: respond with exactly one word: pass or fail"
+"content": "You are an LLM-Judge. Evaluate ONLY the last assistant message: {{messages[-1]}}.\n\nInclude context: {{messages}}\n\nDecision rule:\n- PASS if ALL pass criteria are met AND NO fail criteria are triggered.\n- Otherwise FAIL.\n\nPass criteria:\n- Response acknowledges the user request\n- Response offers specific help or next steps\n- Tone is professional and friendly\n\nFail criteria (any triggers FAIL):\n- Response is rude or dismissive\n- Response ignores the user request\n- Response provides no actionable information\n\nOutput format: respond with exactly one word: pass or fail"
 }]
 }
 }
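For readers who prefer to assemble the judge prompt programmatically rather than hand-escape it inside a curl body, here is a hedged sketch. It builds only the `model`/`messages` fragment visible in the hunk above; the enclosing request body is not shown in this diff and is deliberately left out. The names `JUDGE_PROMPT` and `judge_fragment` are hypothetical.

```python
import json

# Plain (non-f) string so the {{...}} template placeholders are kept literally.
# Text copied from the "+" line of the hunk above.
JUDGE_PROMPT = (
    "You are an LLM-Judge. Evaluate ONLY the last assistant message: {{messages[-1]}}.\n\n"
    "Include context: {{messages}}\n\n"
    "Decision rule:\n"
    "- PASS if ALL pass criteria are met AND NO fail criteria are triggered.\n"
    "- Otherwise FAIL.\n\n"
    "Pass criteria:\n"
    "- Response acknowledges the user request\n"
    "- Response offers specific help or next steps\n"
    "- Tone is professional and friendly\n\n"
    "Fail criteria (any triggers FAIL):\n"
    "- Response is rude or dismissive\n"
    "- Response ignores the user request\n"
    "- Response provides no actionable information\n\n"
    "Output format: respond with exactly one word: pass or fail"
)

# Only the fragment shown in the diff; the surrounding request fields are
# not visible in this hunk, so they are not reproduced here.
judge_fragment = {
    "model": "gpt-4o",
    "messages": [{"role": "system", "content": JUDGE_PROMPT}],
}

# json.dumps takes care of escaping newlines and quotes for the request body.
print(json.dumps(judge_fragment, indent=2))
```

Serializing with `json.dumps` avoids manual `\n` escaping, which is where a stale `{{messages[0:-1]}}` reference is easy to overlook.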
@@ -1366,11 +1372,13 @@ Run multiple evals sequentially to validate all greeting scenarios.