Commit 1908cdc

sebbycorp and claude authored and committed
Remove MCP references and improve code block contrast

- Remove MCP server sections from integrations, advanced, FAQ, and quick-start docs
- Change "Three Ways to Evaluate" to "Two Ways" on landing page
- Remove MCP Server interface card from landing page
- Bump code block text to --text-primary for better readability
- Use --bg-secondary for docs code blocks for more contrast

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent db37679 commit 1908cdc

6 files changed

Lines changed: 12 additions & 73 deletions

content/docs/advanced.md

Lines changed: 0 additions & 19 deletions
@@ -25,25 +25,6 @@ While the server is running (`agentevals serve`), interactive API documentation
 
 The OTLP receiver (port 4318) serves its own docs at `http://localhost:4318/docs`.
 
-## MCP Server Tools
-
-| Tool | Requires `serve` | Description |
-|------|:---:|-------------|
-| `list_metrics` | yes | List available metrics |
-| `evaluate_traces` | no | Evaluate local trace files (OTLP or Jaeger) |
-| `list_sessions` | yes | List streaming sessions |
-| `summarize_session` | yes | Structured summary of a session's tool calls |
-| `evaluate_sessions` | yes | Evaluate sessions against a golden reference |
-
-## Claude Code Skills
-
-Two slash-command workflows in `.claude/skills/`, available automatically in repos with the agentevals config:
-
-| Skill | What it does |
-|-------|-------------|
-| `/eval` | Score traces or compare sessions against a golden reference |
-| `/inspect` | Turn-by-turn narrative of a live session with anomaly detection |
-
 ## Development
 
 ```bash

content/docs/faq.md

Lines changed: 1 addition & 1 deletion
@@ -14,7 +14,7 @@ However, if you're iterating on your agents locally, you can point your agents t
 
 AgentCore's evaluation integration (via `strands-agents-evals`) also couples agent execution with evaluation. It re-invokes the agent for each test case, converts the resulting OTel spans to AWS's ADOT format, and scores them against 4 built-in evaluators (Helpfulness, Accuracy, Harmfulness, Relevance) via a cloud API call. This means you need an AWS account, valid credentials, and network access for every evaluation.
 
-agentevals takes a different approach: it scores pre-recorded traces locally without re-running anything. It works with standard Jaeger JSON and OTLP formats from any framework, supports open-ended metrics (tool trajectory matching, LLM-based judges, custom scorers), and ships with a CLI, web UI, and MCP server. No cloud dependency required.
+agentevals takes a different approach: it scores pre-recorded traces locally without re-running anything. It works with standard Jaeger JSON and OTLP formats from any framework, supports open-ended metrics (tool trajectory matching, LLM-based judges, custom scorers), and ships with a CLI and web UI. No cloud dependency required.
 
 ## What trace formats are supported?
 

content/docs/integrations.md

Lines changed: 2 additions & 38 deletions
@@ -1,10 +1,10 @@
 ---
 title: "Integrations & Use Cases"
 weight: 2
-description: "Zero-code, SDK, CLI/CI, and MCP integration patterns."
+description: "Zero-code, SDK, and CLI/CI integration patterns."
 ---
 
-AgentEvals can be used in multiple ways depending on your workflow. Evaluate agents with zero code via OTel, programmatically via the SDK, in CI pipelines with the CLI, or conversationally through the MCP server.
+AgentEvals can be used in multiple ways depending on your workflow. Evaluate agents with zero code via OTel, programmatically via the SDK, or in CI pipelines with the CLI.
 
 > For detailed, working examples covering all integration patterns, see the [examples directory](https://github.com/agentevals-dev/agentevals/tree/main/examples) in the repository.
 
@@ -127,39 +127,3 @@ jobs:
 "
 ```
 
----
-
-## MCP Server
-
-Exposes evaluation tools to MCP clients. A `.mcp.json` at the project root lets Claude Code pick it up automatically.
-
-### Available Tools
-
-| Tool | Requires `serve` | Description |
-|------|:---:|-------------|
-| `list_metrics` | yes | List available metrics |
-| `evaluate_traces` | no | Evaluate local trace files (OTLP or Jaeger) |
-| `list_sessions` | yes | List streaming sessions |
-| `summarize_session` | yes | Structured summary of a session's tool calls |
-| `evaluate_sessions` | yes | Evaluate sessions against a golden reference |
-
-### Setup
-
-```bash
-# Start the MCP server
-uv run agentevals mcp
-
-# Custom server URL
-AGENTEVALS_SERVER_URL=http://localhost:9000 uv run agentevals mcp
-```
-
-The React UI and MCP server share the same in-memory session state and can run simultaneously.
-
-### Claude Code Skills
-
-Two slash-command workflows are available in repos with `.claude/skills/`:
-
-| Skill | What it does |
-|-------|-------------|
-| `/eval` | Score traces or compare sessions against a golden reference |
-| `/inspect` | Turn-by-turn narrative of a live session with anomaly detection |

content/docs/quick-start.md

Lines changed: 2 additions & 2 deletions
@@ -11,7 +11,7 @@ Grab a wheel from the [releases page](https://github.com/agentevals-dev/agenteva
 ```bash
 pip install agentevals-<version>-py3-none-any.whl
 
-# For MCP server and live streaming support:
+# For live streaming support:
 pip install "agentevals-<version>-py3-none-any.whl[live]"
 ```
 
@@ -61,6 +61,6 @@ Live-streamed traces appear in the "Local Dev" tab, grouped by session ID.
 
 ## What's Next
 
-- [Integrations](/docs/integrations/) — Zero-code, SDK, CLI/CI, and MCP integration patterns
+- [Integrations](/docs/integrations/) — Zero-code, SDK, and CLI/CI integration patterns
 - [Custom Evaluators](/docs/custom-evaluators/) — Build your own evaluators
 - [UI Walkthrough](/docs/ui-walkthrough/) — Deep dive into the web UI

layouts/index.html

Lines changed: 3 additions & 9 deletions
@@ -106,7 +106,7 @@ <h3>Define Eval Sets</h3>
 <div class="step">
 <div class="step-number">3</div>
 <h3>Score &amp; Report</h3>
-<p>Run evaluations via CLI, Web UI, or MCP server. Get detailed scores and pass/fail results.</p>
+<p>Run evaluations via CLI or Web UI. Get detailed scores and pass/fail results.</p>
 </div>
 </div>
 </div>
@@ -116,7 +116,7 @@ <h3>Score &amp; Report</h3>
 <section id="interfaces" class="interfaces">
 <div class="container">
 <div class="section-header">
-<h2>Three Ways to Evaluate</h2>
+<h2>Two Ways to Evaluate</h2>
 <p>Choose the interface that fits your workflow.</p>
 </div>
 <div class="interfaces-grid">
@@ -130,11 +130,6 @@ <h3>CLI</h3>
 <h3>Web UI</h3>
 <p>Visually inspect traces and interactively evaluate agent behavior. Browse results, compare runs, and drill into details.</p>
 </div>
-<div class="interface-card">
-<div class="interface-icon">&#x1f50c;</div>
-<h3>MCP Server</h3>
-<p>Run evaluations directly from Claude Code conversations. The MCP server integrates agentevals into your AI workflow.</p>
-</div>
 </div>
 </div>
 </section>
@@ -181,8 +176,7 @@ <h2>Get Started</h2>
 <span class="comment"># Start the web UI</span>
 <span class="cmd">agentevals</span> serve
 
-<span class="comment"># Start the MCP server</span>
-<span class="cmd">agentevals</span> mcp</pre>
+</pre>
 </div>
 </div>
 </div>
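
The MCP removals above span four docs pages and the landing page, so a quick scan for leftover mentions can help confirm nothing was missed. The following is a minimal sketch only: `find_mcp_mentions` and the sample file contents are hypothetical and not part of agentevals or this commit.

```python
import re

# Hypothetical helper (not part of agentevals): flags lines that still
# mention "MCP" as a standalone word, case-insensitively. This also
# catches `.mcp.json` and `agentevals mcp`.
MCP_PATTERN = re.compile(r"\bmcp\b", re.IGNORECASE)


def find_mcp_mentions(files):
    """Return (filename, line_number, stripped_line) for each MCP mention.

    `files` maps a filename to the full text of that file.
    """
    hits = []
    for name, text in files.items():
        for lineno, line in enumerate(text.splitlines(), start=1):
            if MCP_PATTERN.search(line):
                hits.append((name, lineno, line.strip()))
    return hits


# Illustrative input only; real usage would read the files from disk.
sample = {
    "faq.md": "ships with a CLI and web UI.\nNo cloud dependency required.",
    "old-faq.md": "ships with a CLI, web UI, and MCP server.",
}
for hit in find_mcp_mentions(sample):
    print(hit)
```

An empty result for the post-commit files would suggest the removal is complete, at least for plain-text mentions.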

static/css/style.css

Lines changed: 4 additions & 4 deletions
@@ -531,7 +531,7 @@ section {
 font-family: var(--font-mono);
 font-size: 0.85rem;
 line-height: 1.7;
-color: var(--text-secondary);
+color: var(--text-primary);
 }
 
 .code-body .cmd {
@@ -559,7 +559,7 @@ section {
 }
 
 .code-body .comment {
-color: var(--text-muted);
+color: var(--text-secondary);
 }
 
 /* Evaluators CTA (homepage) */
@@ -926,7 +926,7 @@ section {
 }
 
 .docs-article pre {
-background: var(--bg-card);
+background: var(--bg-secondary);
 border: 1px solid var(--border);
 border-radius: 8px;
 padding: 1.25rem;
@@ -940,7 +940,7 @@ section {
 padding: 0;
 font-size: 0.85rem;
 line-height: 1.7;
-color: var(--text-secondary);
+color: var(--text-primary);
 }
 
 .docs-article table {
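
The stylesheet changes swap one color variable for another to raise contrast, but the diff does not show what the variables resolve to. One way to check such a change is the WCAG contrast ratio between text and background. The sketch below uses the WCAG 2.x relative-luminance formula. The hex values are placeholders chosen for illustration and are not taken from this stylesheet.

```python
def _linearize(channel):
    """Convert an 8-bit sRGB channel to linear light (WCAG 2.x formula)."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color):
    """Relative luminance of a '#rrggbb' color, from 0.0 (black) to 1.0 (white)."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linearize(r) + 0.7152 * _linearize(g) + 0.0722 * _linearize(b)


def contrast_ratio(foreground, background):
    """WCAG contrast ratio, from 1.0 (identical) to 21.0 (black on white)."""
    l1 = relative_luminance(foreground)
    l2 = relative_luminance(background)
    lighter, darker = max(l1, l2), min(l1, l2)
    return (lighter + 0.05) / (darker + 0.05)


# Placeholder palette (assumed values, NOT read from static/css/style.css).
TEXT_PRIMARY = "#e6edf3"
TEXT_SECONDARY = "#8b949e"
BG_SECONDARY = "#161b22"

before = contrast_ratio(TEXT_SECONDARY, BG_SECONDARY)
after = contrast_ratio(TEXT_PRIMARY, BG_SECONDARY)
print(f"secondary text: {before:.2f}:1, primary text: {after:.2f}:1")
```

Substituting the real values of `--text-primary`, `--text-secondary`, and `--bg-secondary` would show whether the new pairing clears the WCAG AA threshold of 4.5:1 for body text.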
