Commit ce29cf2

sebbycorp and claude authored and committed
Restructure docs and clean up homepage

- Remove emojis from homepage headings, nav, and buttons
- Fix How It Works section: remove extra margin, add connecting lines between steps, constrain paragraph width
- Add "Read the Docs" CTA button to hero and footer
- Restructure docs into new flow: Quick Start, Integrations, Custom Evaluators, UI Walkthrough, Advanced, FAQ
- Consolidate old pages (ci-cd, configuration, examples, mcp-server, web-ui) into new Integrations page
- Add Custom Evaluators page with matching modes, SDK examples, and link to repo guide
- Add FAQ page with common questions
- Add Advanced page with config reference and API tables

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent f4a65fa commit ce29cf2

13 files changed

Lines changed: 719 additions & 606 deletions

content/docs/_index.md

Lines changed: 1 addition & 0 deletions
@@ -1,3 +1,4 @@
 ---
 title: "Documentation"
+description: "Everything you need to get started with AgentEvals."
 ---

content/docs/advanced.md

Lines changed: 123 additions & 0 deletions
@@ -0,0 +1,123 @@
---
title: "Advanced"
weight: 5
description: "Advanced configuration, API reference, and deep-dive resources."
---

## Configuration Reference

### Eval Set Configuration

```yaml
# agentevals.yaml
version: "1"

trace_sources:
  - type: otlp
    port: 4318
    protocol: http

  - type: jaeger
    path: ./traces/*.json

llm:
  provider: openai
  model: gpt-4o
  temperature: 0.0

output:
  format: table  # table, json, junit
  verbose: false
```

### Evaluator Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `trajectory_match_mode` | `"strict"` \| `"unordered"` \| `"subset"` \| `"superset"` | `"strict"` | How to compare trajectories |
| `tool_args_match_mode` | `"exact"` \| `"ignore"` \| `"subset"` \| `"superset"` | `"exact"` | How to match tool arguments |
| `tool_args_match_overrides` | `Dict[str, ...]` | `None` | Custom matchers per tool |
| `model` | `str` | `None` | LLM model for judge evaluators |
| `continuous` | `bool` | `False` | Float (0–1) vs boolean scoring |
| `use_reasoning` | `bool` | `True` | Include reasoning in output |
| `few_shot_examples` | `List[FewShotExample]` | `None` | Example evaluations for LLM judge |
| `feedback_key` | `str` | `"trajectory_accuracy"` | Key name for evaluation results |
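
The four `trajectory_match_mode` values are easier to compare side by side. The sketch below is an illustration only, not the library's implementation: it assumes a trajectory can be reduced to a plain list of tool names, and `trajectories_match` is a hypothetical helper introduced here. Real trajectories carry full messages and tool arguments, which `tool_args_match_mode` governs separately.

```python
from collections import Counter
from typing import List


def trajectories_match(outputs: List[str], reference: List[str], mode: str = "strict") -> bool:
    """Compare two lists of tool names under a matching mode (hypothetical helper)."""
    out_counts, ref_counts = Counter(outputs), Counter(reference)
    if mode == "strict":
        # Same tool calls in the same order.
        return outputs == reference
    if mode == "unordered":
        # Same tool calls, in any order.
        return out_counts == ref_counts
    if mode == "subset":
        # Every call in outputs also appears in the reference.
        return not (out_counts - ref_counts)
    if mode == "superset":
        # Every call in the reference also appears in outputs.
        return not (ref_counts - out_counts)
    raise ValueError(f"unknown mode: {mode!r}")


reference = ["search", "fetch", "summarize"]
print(trajectories_match(["fetch", "search", "summarize"], reference, "strict"))     # False
print(trajectories_match(["fetch", "search", "summarize"], reference, "unordered"))  # True
print(trajectories_match(["search", "fetch"], reference, "subset"))                  # True
print(trajectories_match(["search", "fetch"], reference, "superset"))                # False
```

Under these assumed semantics, `"subset"` tolerates an agent that skips reference steps, while `"superset"` tolerates extra steps as long as every reference step is present.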

### Environment Variables

| Variable | Description | Default |
|----------|-------------|---------|
| `AGENTEVALS_CONFIG` | Path to config file | `./agentevals.yaml` |
| `OPENAI_API_KEY` | OpenAI API key for LLM judge | (none) |
| `ANTHROPIC_API_KEY` | Anthropic API key for LLM judge | (none) |
| `AGENTEVALS_LOG_LEVEL` | Log level (debug, info, warn, error) | `info` |
| `AGENTEVALS_OUTPUT_FORMAT` | Output format override | `table` |
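
As a rough sketch of how the defaults in the table above might be applied, the snippet below resolves the three non-secret variables from an environment mapping. The function `resolve_settings` and its validation rules are assumptions made for illustration and are not taken from the tool's source. The variable names, defaults, and allowed values come from the tables and the YAML comment in this page.

```python
import os
from typing import Dict, Mapping

# Defaults copied from the Environment Variables table.
DEFAULTS = {
    "AGENTEVALS_CONFIG": "./agentevals.yaml",
    "AGENTEVALS_LOG_LEVEL": "info",
    "AGENTEVALS_OUTPUT_FORMAT": "table",
}
VALID_LOG_LEVELS = {"debug", "info", "warn", "error"}
VALID_OUTPUT_FORMATS = {"table", "json", "junit"}


def resolve_settings(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Return each setting from env, falling back to the documented default (hypothetical helper)."""
    settings = {name: env.get(name, default) for name, default in DEFAULTS.items()}
    if settings["AGENTEVALS_LOG_LEVEL"] not in VALID_LOG_LEVELS:
        raise ValueError(f"invalid log level: {settings['AGENTEVALS_LOG_LEVEL']!r}")
    if settings["AGENTEVALS_OUTPUT_FORMAT"] not in VALID_OUTPUT_FORMATS:
        raise ValueError(f"invalid output format: {settings['AGENTEVALS_OUTPUT_FORMAT']!r}")
    return settings


# Passing an explicit mapping keeps the example independent of the real environment.
print(resolve_settings({"AGENTEVALS_LOG_LEVEL": "debug"}))
```

Taking the environment as a parameter rather than reading `os.environ` directly makes the lookup easy to exercise in tests.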

## Deep-Dive Documentation

For comprehensive coverage of specific topics, see the repository docs:

- [Trajectory Match Evaluators](https://github.com/agentevals-dev/agentevals#trajectory-match-evaluators): Full reference for all matching modes and configuration
- [LLM-as-Judge Evaluators](https://github.com/agentevals-dev/agentevals#llm-as-judge-evaluators): Custom prompts, few-shot examples, continuous scoring
- [Graph Trajectory Evaluators](https://github.com/agentevals-dev/agentevals#graph-trajectory-evaluators): LangGraph integration and trajectory extraction
- [LangSmith Integration](https://github.com/agentevals-dev/agentevals#langsmith-integration): Running evaluations with pytest, Vitest, and LangSmith experiments
- [Custom Evaluators Guide](https://github.com/agentevals-dev/agentevals/blob/main/docs/custom-evaluators.md): Writing domain-specific evaluators

## Async Support

Both Python and TypeScript support fully async evaluators:

### Python

```python
from agentevals.trajectory.match import (
    create_async_trajectory_match_evaluator
)
from agentevals.trajectory.llm import (
    create_async_trajectory_llm_as_judge
)

# `trajectory` and `reference` are assumed to be defined elsewhere, and the
# `await` calls below must run inside an async function (or an async REPL).

# Async trajectory match
async_match = create_async_trajectory_match_evaluator(
    trajectory_match_mode="strict"
)
result = await async_match(
    outputs=trajectory,
    reference_outputs=reference
)

# Async LLM-as-judge
async_judge = create_async_trajectory_llm_as_judge(
    model="openai:o3-mini"
)
result = await async_judge(
    outputs=trajectory,
    reference_outputs=reference
)
```
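
One reason to reach for the async variants is to run many evaluations concurrently. The sketch below shows the general pattern with `asyncio.gather`. It uses a hypothetical stand-in coroutine, `fake_evaluator`, instead of the real evaluators so that it runs without the library or an API key. The result shape (`key`, `score`) is an assumption for illustration.

```python
import asyncio
from typing import Any, Dict, List, Tuple


async def fake_evaluator(*, outputs: List[str], reference_outputs: List[str]) -> Dict[str, Any]:
    """Hypothetical stand-in for an async evaluator; not part of agentevals."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    return {"key": "trajectory_accuracy", "score": outputs == reference_outputs}


async def evaluate_all(cases: List[Tuple[List[str], List[str]]]) -> List[Dict[str, Any]]:
    # Schedule every evaluation at once; gather returns results in input order.
    tasks = [fake_evaluator(outputs=o, reference_outputs=r) for o, r in cases]
    return await asyncio.gather(*tasks)


cases = [
    (["search", "fetch"], ["search", "fetch"]),
    (["fetch"], ["search", "fetch"]),
]
results = asyncio.run(evaluate_all(cases))
print([r["score"] for r in results])  # [True, False]
```

With a real LLM judge, the same pattern would overlap the network round trips, though provider rate limits may call for bounding concurrency (for example with `asyncio.Semaphore`).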

## API Reference

### Python Public API

| Function | Description |
|----------|-------------|
| `create_trajectory_match_evaluator()` | Create a configurable trajectory match evaluator |
| `create_async_trajectory_match_evaluator()` | Async variant |
| `create_trajectory_llm_as_judge()` | Evaluate trajectory quality with an LLM judge |
| `create_async_trajectory_llm_as_judge()` | Async variant |
| `create_graph_trajectory_llm_as_judge()` | LLM-as-judge for LangGraph workflows |
| `create_async_graph_trajectory_llm_as_judge()` | Async variant |
| `graph_trajectory_strict_match()` | Strict match for graph execution steps |
| `extract_langgraph_trajectory_from_thread()` | Extract trajectory from a LangGraph thread |
| `extract_langgraph_trajectory_from_snapshots()` | Extract trajectory from state snapshots |

### TypeScript Public API

| Function | Description |
|----------|-------------|
| `createTrajectoryMatchEvaluator()` | Create a configurable trajectory match evaluator |
| `createTrajectoryLLMAsJudge()` | Evaluate trajectory quality with an LLM judge |
| `createGraphTrajectoryLLMAsJudge()` | LLM-as-judge for LangGraph workflows |
| `extractLangGraphTrajectoryFromThread()` | Extract trajectory from a LangGraph thread |
| `extractLangGraphTrajectoryFromSnapshots()` | Extract trajectory from state snapshots |

content/docs/ci-cd.md

Lines changed: 0 additions & 129 deletions
This file was deleted.

content/docs/configuration.md

Lines changed: 0 additions & 109 deletions
This file was deleted.
