
Commit 3c7450a

Fix landing page evaluation methods copy
1 parent 5b37a7f commit 3c7450a

1 file changed

Lines changed: 122 additions & 205 deletions


layouts/index.html

@@ -1,224 +1,141 @@
1-
{{ define "main" }}
1+
<!DOCTYPE html>
2+
<html lang="en">
3+
<head>
4+
<meta charset="UTF-8">
5+
<meta name="viewport" content="width=device-width, initial-scale=1.0">
6+
<title>{{ .Site.Title }}</title>
7+
{{ partial "head.html" . }}
8+
</head>
9+
<body>
10+
<div class="bg-grid"></div>
211

3-
<!-- Navigation -->
4-
<nav class="nav">
5-
<a href="{{ "/" | relURL }}" class="nav-logo">
6-
<img src="{{ "images/logo-color.png" | relURL }}" alt="AgentEvals" class="logo-dark">
7-
<img src="{{ "images/logo-light.png" | relURL }}" alt="AgentEvals" class="logo-light">
8-
</a>
9-
<button class="nav-toggle" onclick="document.querySelector('.nav-links').classList.toggle('active')" aria-label="Menu">&#9776;</button>
10-
<div class="nav-links">
11-
<a href="#features">Features</a>
12-
<a href="#how-it-works">How It Works</a>
13-
<a href="#interfaces">Interfaces</a>
14-
<a href="#get-started">Get Started</a>
15-
<a href="{{ "docs/" | relURL }}">Docs</a>
16-
<a href="{{ "evaluators/" | relURL }}">Evaluators</a>
17-
<a href="{{ .Site.Params.discord }}" target="_blank">Discord</a>
18-
<a href="{{ .Site.Params.github }}" target="_blank" class="btn-sm">GitHub</a>
19-
<button class="theme-toggle" onclick="toggleTheme()" aria-label="Toggle theme">
20-
<span class="icon-sun">&#9728;</span>
21-
<span class="icon-moon">&#9790;</span>
22-
</button>
23-
</div>
24-
</nav>
25-
26-
<!-- Hero -->
27-
<section class="hero">
28-
<div class="hero-content">
29-
<div class="hero-logo">
30-
<img src="{{ "images/logo-color.png" | relURL }}" alt="AgentEvals" class="logo-dark">
31-
<img src="{{ "images/logo-color-transparent.png" | relURL }}" alt="AgentEvals" class="logo-light">
32-
</div>
33-
<h1>Ship Agents <span class="highlight">Reliably</span></h1>
34-
<p>Benchmark your agents before they hit production. AgentEvals scores performance and inference quality from OpenTelemetry traces — no re-runs, no guesswork.</p>
35-
<div class="hero-buttons">
36-
<a href="{{ .Site.Params.github }}" target="_blank" class="btn btn-primary">
37-
<svg class="icon" viewBox="0 0 24 24" fill="currentColor"><path d="M12 0c-6.626 0-12 5.373-12 12 0 5.302 3.438 9.8 8.207 11.387.599.111.793-.261.793-.577v-2.234c-3.338.726-4.033-1.416-4.033-1.416-.546-1.387-1.333-1.756-1.333-1.756-1.089-.745.083-.729.083-.729 1.205.084 1.839 1.237 1.839 1.237 1.07 1.834 2.807 1.304 3.492.997.107-.775.418-1.305.762-1.604-2.665-.305-5.467-1.334-5.467-5.931 0-1.311.469-2.381 1.236-3.221-.124-.303-.535-1.524.117-3.176 0 0 1.008-.322 3.301 1.23.957-.266 1.983-.399 3.003-.404 1.02.005 2.047.138 3.006.404 2.291-1.552 3.297-1.23 3.297-1.23.653 1.653.242 2.874.118 3.176.77.84 1.235 1.911 1.235 3.221 0 4.609-2.807 5.624-5.479 5.921.43.372.823 1.102.823 2.222v3.293c0 .319.192.694.801.576 4.765-1.589 8.199-6.086 8.199-11.386 0-6.627-5.373-12-12-12z"/></svg>
38-
View on GitHub
39-
</a>
40-
<a href="{{ "docs/" | relURL }}" class="btn btn-secondary">Read the Docs</a>
41-
<a href="{{ .Site.Params.discord }}" target="_blank" class="btn btn-secondary">
42-
Join Discord
43-
</a>
44-
</div>
45-
</div>
46-
</section>
47-
48-
<!-- Features -->
49-
<section id="features" class="container">
50-
<div class="section-header">
51-
<h2>Why AgentEvals?</h2>
52-
<p>Evaluate agent behavior from real traces, not synthetic replays.</p>
53-
</div>
54-
<div class="features-grid">
55-
<div class="feature-card">
56-
<div class="feature-icon">&#x1f50d;</div>
57-
<h3>Trace-Based Evaluation</h3>
58-
<p>Parse OTLP streams and Jaeger JSON traces to evaluate agent behavior directly from production or test telemetry data.</p>
59-
</div>
60-
<div class="feature-card">
61-
<div class="feature-icon">&#x26a1;</div>
62-
<h3>No Re-Running Required</h3>
63-
<p>Score agent behavior from existing traces. No need to replay expensive LLM calls or wait for agent re-execution.</p>
64-
</div>
65-
<div class="feature-card">
66-
<div class="feature-icon">&#x1f3af;</div>
67-
<h3>Golden Eval Sets</h3>
68-
<p>Define expected behaviors as golden eval sets and score traces against them using ADK's evaluation framework.</p>
69-
</div>
70-
<div class="feature-card">
71-
<div class="feature-icon">&#x1f4ca;</div>
72-
<h3>Trajectory Matching</h3>
73-
<p>Compare agent trajectories with strict, unordered, subset, or superset matching modes for flexible evaluation.</p>
74-
</div>
75-
<div class="feature-card">
76-
<div class="feature-icon">&#x1f916;</div>
77-
<h3>LLM-as-Judge</h3>
78-
<p>Use LLM-powered evaluation for nuanced scoring of agent behavior without requiring reference trajectories.</p>
79-
</div>
80-
<div class="feature-card">
81-
<div class="feature-icon">&#x1f6e0;</div>
82-
<h3>CI/CD Integration</h3>
83-
<p>Run evaluations in your pipeline with the CLI. Gate deployments on agent behavior quality scores.</p>
84-
</div>
85-
<div class="feature-card">
86-
<div class="feature-icon">&#x1f9e9;</div>
87-
<h3>Custom Evaluators</h3>
88-
<p>Write custom scoring logic in Python, JavaScript, or any language. Share and discover evaluators through the community registry.</p>
89-
</div>
90-
</div>
91-
</section>
92-
93-
<!-- How It Works -->
94-
<section id="how-it-works" class="how-it-works">
95-
<div class="container">
96-
<div class="section-header">
97-
<h2>How It Works</h2>
98-
<p>Three steps from traces to scores.</p>
99-
</div>
100-
<div class="steps">
101-
<div class="step">
102-
<div class="step-number">1</div>
103-
<h3>Collect Traces</h3>
104-
<p>Instrument your agent with OpenTelemetry or export Jaeger JSON traces from your observability platform.</p>
12+
<header class="hero">
13+
<div class="hero-content">
14+
<div class="hero-badge">
15+
<span class="badge-dot"></span>
16+
Open source • Python SDK • OpenTelemetry native
10517
</div>
106-
<div class="step">
107-
<div class="step-number">2</div>
108-
<h3>Define Eval Sets</h3>
109-
<p>Create golden evaluation sets that describe expected agent behaviors, tool calls, and trajectories.</p>
18+
<h1>Score your AI agent behavior from traces.</h1>
19+
<p class="hero-subtitle">
20+
AgentEvals is the open-source Python framework for scoring AI agent performance and behavior
21+
from OpenTelemetry traces. Test prompts, tools, memory, and workflows without re-running your agents.
22+
</p>
23+
<div class="hero-cta">
24+
<a href="/docs/quick-start/" class="btn btn-primary">Quick Start</a>
25+
<a href="https://github.com/agentevals-dev/agentevals" class="btn btn-secondary" target="_blank" rel="noopener">GitHub</a>
11026
</div>
111-
<div class="step">
112-
<div class="step-number">3</div>
113-
<h3>Score &amp; Report</h3>
114-
<p>Run evaluations via CLI or Web UI. Get detailed scores and pass/fail results.</p>
27+
<div class="hero-meta">
28+
<span>CLI</span>
29+
<span>Custom Evaluators</span>
30+
<span>Web UI</span>
31+
<span>CI/CD</span>
11532
</div>
11633
</div>
117-
</div>
118-
</section>
34+
</header>
11935

120-
<!-- Interfaces -->
121-
<section id="interfaces" class="interfaces">
122-
<div class="container">
123-
<div class="section-header">
124-
<h2>Three Ways to Evaluate</h2>
125-
<p>Choose the interface that fits your workflow.</p>
126-
</div>
127-
<div class="interfaces-grid interfaces-grid-2">
128-
<div class="interface-card">
129-
<div class="interface-icon">&#x2328;</div>
130-
<h3>CLI</h3>
131-
<p>Script evaluations and integrate into CI/CD pipelines. Pipe in traces, get scores out. Built for automation.</p>
132-
</div>
133-
<div class="interface-card">
134-
<div class="interface-icon">&#x1f5a5;</div>
135-
<h3>Web UI</h3>
136-
<p>Visually inspect traces and interactively evaluate agent behavior. Browse results, compare runs, and drill into details.</p>
36+
<main>
37+
<section class="features section">
38+
<div class="section-header">
39+
<span class="section-label">Why AgentEvals</span>
40+
<h2>Evaluation that matches how agents actually run.</h2>
41+
<p>Traditional evals re-run entire workflows. AgentEvals scores the traces you already collect, so you can measure behavior in realistic conditions.</p>
13742
</div>
138-
</div>
139-
</div>
140-
</section>
14143

142-
<!-- Custom Evaluators CTA -->
143-
<section class="evaluators-cta">
144-
<div class="container">
145-
<div class="evaluators-cta-inner">
146-
<div class="evaluators-cta-text">
147-
<h2>Build Your Own Evaluators</h2>
148-
<p>Write custom scoring logic in Python, JavaScript, or any language. Share it with the community through our evaluator registry.</p>
44+
<div class="feature-grid">
45+
<article class="feature-card">
46+
<div class="feature-icon"></div>
47+
<h3>Trace-native evaluation</h3>
48+
<p>Built on OpenTelemetry traces so you can evaluate real production-like runs without replaying agent execution.</p>
49+
</article>
50+
<article class="feature-card">
51+
<div class="feature-icon"></div>
52+
<h3>Flexible scoring</h3>
53+
<p>Combine built-in evaluators with custom Python logic to measure correctness, tool usage, memory behavior, and more.</p>
54+
</article>
55+
<article class="feature-card">
56+
<div class="feature-icon"></div>
57+
<h3>Works in your workflow</h3>
58+
<p>Run locally with the CLI, automate in CI/CD, or explore results visually in the web UI.</p>
59+
</article>
14960
</div>
150-
<div class="evaluators-cta-actions">
151-
<a href="{{ "evaluators/" | relURL }}" class="btn btn-primary">Browse Evaluators</a>
152-
<a href="https://github.com/agentevals-dev/evaluators#contributing-an-evaluator" target="_blank" class="btn btn-secondary">Submit Your Own</a>
61+
</section>
62+
63+
<section class="workflow section">
64+
<div class="section-header">
65+
<span class="section-label">How it works</span>
66+
<h2>From traces to scores in three steps.</h2>
15367
</div>
154-
</div>
155-
</div>
156-
</section>
15768

158-
<!-- Get Started -->
159-
<section id="get-started" class="code-section">
160-
<div class="container">
161-
<div class="section-header">
162-
<h2>Get Started</h2>
163-
<p>Up and running in seconds.</p>
164-
</div>
165-
<div class="code-block">
166-
<div class="code-header">
167-
<div class="code-dots">
168-
<span></span><span></span><span></span>
69+
<div class="workflow-steps">
70+
<div class="workflow-step">
71+
<span class="step-number">01</span>
72+
<h3>Collect traces</h3>
73+
<p>Instrument your agent with OpenTelemetry and emit traces for prompts, tool calls, memory operations, and outputs.</p>
74+
</div>
75+
<div class="workflow-step">
76+
<span class="step-number">02</span>
77+
<h3>Define evaluators</h3>
78+
<p>Choose built-in evaluators or create your own to score the behaviors that matter for your agent.</p>
79+
</div>
80+
<div class="workflow-step">
81+
<span class="step-number">03</span>
82+
<h3>Run evaluations</h3>
83+
<p>Score trace datasets through the CLI or web UI and compare results across prompts, models, or tool strategies.</p>
16984
</div>
170-
<span class="code-label">terminal</span>
17185
</div>
172-
<div class="code-body">
173-
<pre><span class="comment"># Install from release wheel</span>
174-
<span class="cmd">pip</span> install agentevals-&lt;version&gt;-py3-none-any.whl
86+
</section>
17587

176-
<span class="comment"># Run an evaluation against a trace</span>
177-
<span class="cmd">agentevals</span> run samples/helm.json \
178-
<span class="flag">--eval-set</span> <span class="string">samples/eval_set_helm.json</span> \
179-
<span class="flag">-m</span> <span class="string">tool_trajectory_avg_score</span>
180-
181-
<span class="comment"># Start the web UI</span>
182-
<span class="cmd">agentevals</span> serve
88+
<section class="docs-preview section">
89+
<div class="section-header">
90+
<span class="section-label">Docs</span>
91+
<h2>Start with the path that fits your workflow.</h2>
92+
</div>
18393

184-
</pre>
94+
<div class="docs-grid">
95+
{{ range where .Site.RegularPages "Section" "docs" }}
96+
<a class="doc-card" href="{{ .RelPermalink }}">
97+
<div>
98+
<h3>{{ .Title }}</h3>
99+
<p>{{ .Description }}</p>
100+
</div>
101+
<span class="doc-arrow"></span>
102+
</a>
103+
{{ end }}
185104
</div>
186-
</div>
187-
</div>
188-
</section>
105+
</section>
189106

190-
<!-- CTA -->
191-
<section class="cta">
192-
<div class="cta-content">
193-
<h2>Start Evaluating Your Agents</h2>
194-
<p>Open source. Trace-driven. No re-runs needed.</p>
195-
<div class="cta-buttons">
196-
<a href="{{ .Site.Params.github }}" target="_blank" class="btn btn-primary">
197-
<svg class="icon" viewBox="0 0 24 24" fill="currentColor"><path d="M12 0c-6.626 0-12 5.373-12 12 0 5.302 3.438 9.8 8.207 11.387.599.111.793-.261.793-.577v-2.234c-3.338.726-4.033-1.416-4.033-1.416-.546-1.387-1.333-1.756-1.333-1.756-1.089-.745.083-.729.083-.729 1.205.084 1.839 1.237 1.839 1.237 1.07 1.834 2.807 1.304 3.492.997.107-.775.418-1.305.762-1.604-2.665-.305-5.467-1.334-5.467-5.931 0-1.311.469-2.381 1.236-3.221-.124-.303-.535-1.524.117-3.176 0 0 1.008-.322 3.301 1.23.957-.266 1.983-.399 3.003-.404 1.02.005 2.047.138 3.006.404 2.291-1.552 3.297-1.23 3.297-1.23.653 1.653.242 2.874.118 3.176.77.84 1.235 1.911 1.235 3.221 0 4.609-2.807 5.624-5.479 5.921.43.372.823 1.102.823 2.222v3.293c0 .319.192.694.801.576 4.765-1.589 8.199-6.086 8.199-11.386 0-6.627-5.373-12-12-12z"/></svg>
198-
GitHub
199-
</a>
200-
<a href="{{ .Site.Params.discord }}" target="_blank" class="btn btn-secondary">
201-
Join Discord
202-
</a>
203-
</div>
204-
</div>
205-
</section>
107+
<section class="usage section">
108+
<div class="section-header">
109+
<span class="section-label">Usage</span>
110+
<h2>Two ways to evaluate.</h2>
111+
<p>Use the CLI for fast, scriptable scoring or the Web UI for visual exploration of evaluation results.</p>
112+
</div>
206113

207-
<!-- Footer -->
208-
<footer class="footer">
209-
<div class="footer-content">
210-
<a href="{{ "/" | relURL }}" class="footer-logo">
211-
<img src="{{ "images/logo-color.png" | relURL }}" alt="AgentEvals" class="logo-dark">
212-
<img src="{{ "images/logo-light.png" | relURL }}" alt="AgentEvals" class="logo-light">
213-
</a>
214-
<div class="footer-links">
215-
<a href="{{ "docs/" | relURL }}">Docs</a>
216-
<a href="{{ .Site.Params.github }}" target="_blank">GitHub</a>
217-
<a href="{{ .Site.Params.discord }}" target="_blank">Discord</a>
218-
<a href="https://github.com/agentregistry-dev/" target="_blank">AgentRegistry</a>
219-
</div>
220-
<span class="footer-copy">&copy; {{ now.Year }} AgentEvals. Open source under Apache 2.0.</span>
221-
</div>
222-
</footer>
114+
<div class="usage-grid">
115+
<article class="usage-card">
116+
<h3>CLI</h3>
117+
<p>Run evaluations locally or in CI with straightforward commands and structured outputs.</p>
118+
<pre><code>agentevals eval run config.yaml</code></pre>
119+
</article>
120+
<article class="usage-card">
121+
<h3>Web UI</h3>
122+
<p>Inspect trace datasets, compare runs, and review evaluator outputs in a visual interface.</p>
123+
<pre><code>agentevals ui</code></pre>
124+
</article>
125+
</div>
126+
</section>
223127

224-
{{ end }}
128+
<section class="cta section">
129+
<div class="cta-card">
130+
<span class="section-label">Get started</span>
131+
<h2>Bring evaluation into your agent development loop.</h2>
132+
<p>Install AgentEvals, connect your traces, and start measuring how your agent behaves in the real world.</p>
133+
<div class="hero-cta">
134+
<a href="/docs/quick-start/" class="btn btn-primary">Read the docs</a>
135+
<a href="https://github.com/agentevals-dev/agentevals" class="btn btn-secondary" target="_blank" rel="noopener">View on GitHub</a>
136+
</div>
137+
</div>
138+
</section>
139+
</main>
140+
</body>
141+
</html>
