<h2>Evaluation that matches how agents actually run.</h2>
- <p>Traditional evals re-run entire workflows. AgentEvals scores the traces you already collect, so you can measure behavior in realistic conditions.</p>
+ <p class="eyebrow">Why agentevals</p>
+ <h2>Evaluate behavior from the telemetry you already collect</h2>
+ <p>
+ Score agents against consistent rubrics using OpenTelemetry traces rather than replaying runs.
+ Keep evaluations close to your production workflows and compare changes over time.
+ </p>
  </div>
-
  <div class="feature-grid">
  <article class="feature-card">
- <div class="feature-icon">◉</div>
- <h3>Trace-native evaluation</h3>
- <p>Built on OpenTelemetry traces so you can evaluate real production-like runs without replaying agent execution.</p>
+ <h3>No reruns</h3>
+ <p>Use recorded traces to evaluate real executions after the fact.</p>
  </article>
  <article class="feature-card">
- <div class="feature-icon">◈</div>
- <h3>Flexible scoring</h3>
- <p>Combine built-in evaluators with custom Python logic to measure correctness, tool usage, memory behavior, and more.</p>
+ <h3>Behavior-first scoring</h3>
+ <p>Measure task completion, tool use quality, handoffs, latency, and more.</p>
  </article>
  <article class="feature-card">
- <div class="feature-icon">◎</div>
- <h3>Works in your workflow</h3>
- <p>Run locally with the CLI, automate in CI/CD, or explore results visually in the web UI.</p>
+ <h3>Built on OpenTelemetry</h3>
+ <p>Plug into existing observability pipelines instead of inventing a parallel eval stack.</p>
  </article>
  </div>
- </section>
+ </div>
+ </section>

- <section class="workflow section">
+ <section class="section alt">
+ <div class="container">
  <div class="section-header">
- <span class="section-label">How it works</span>
- <h2>From traces to scores in three steps.</h2>
+ <p class="eyebrow">How it works</p>
+ <h2>Two ways to evaluate</h2>
  </div>
-
- <div class="workflow-steps">
- <div class="workflow-step">
- <span class="step-number">01</span>
- <h3>Collect traces</h3>
- <p>Instrument your agent with OpenTelemetry and emit traces for prompts, tool calls, memory operations, and outputs.</p>
- </div>
- <div class="workflow-step">
- <span class="step-number">02</span>
- <h3>Define evaluators</h3>
- <p>Choose built-in evaluators or create your own to score the behaviors that matter for your agent.</p>
- </div>
- <div class="workflow-step">
- <span class="step-number">03</span>
- <h3>Run evaluations</h3>
- <p>Score trace datasets through the CLI or web UI and compare results across prompts, models, or tool strategies.</p>
- </div>
+ <div class="steps-grid two-up">
+ <article class="step-card">
+ <span class="step-number">1</span>
+ <h3>CLI workflow</h3>
+ <p>
+ Run evaluations locally or in CI with config files and reproducible commands.
+ </p>
+ <a href="/docs/quick-start/">Open the CLI guide →</a>
+ </article>
+ <article class="step-card">
+ <span class="step-number">2</span>
+ <h3>Web workflow</h3>
+ <p>
+ Explore traces, inspect scores, and review rubric results in the browser.
+ </p>
+ <a href="/docs/ui-walkthrough/">Open the Web guide →</a>
+ </article>
  </div>
- </section>
+ </div>
+ </section>

- <section class="docs-preview section">
+ <section class="section">
+ <div class="container">
  <div class="section-header">
- <span class="section-label">Docs</span>
- <h2>Start with the path that fits your workflow.</h2>
+ <p class="eyebrow">Docs</p>
+ <h2>Start where you are</h2>
  </div>
-
  <div class="docs-grid">
- {{ range where .Site.RegularPages "Section" "docs" }}
- <a class="doc-card" href="{{ .RelPermalink }}">
- <div>
- <h3>{{ .Title }}</h3>
- <p>{{ .Description }}</p>
- </div>
- <span class="doc-arrow">→</span>
- </a>
- {{ end }}
+ <a class="docs-card" href="/docs/quick-start/">
+ <h3>Quick start</h3>
+ <p>Install agentevals, run your first scoring pass, and inspect the output.</p>
+ </a>
+ <a class="docs-card" href="/docs/integrations/">
+ <h3>Integrations</h3>
+ <p>Connect agentevals with your existing tracing and observability stack.</p>