Commit db37679

sebbycorp and claude authored and committed
revert: undo custom evaluators feature, MCP removal, and code block contrast changes
Reverts changes from 5b37a7f and subsequent commits that built on it.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent 212036f commit db37679

6 files changed

Lines changed: 283 additions & 151 deletions

content/docs/advanced.md

Lines changed: 19 additions & 0 deletions
@@ -25,6 +25,25 @@ While the server is running (`agentevals serve`), interactive API documentation
 
 The OTLP receiver (port 4318) serves its own docs at `http://localhost:4318/docs`.
 
+## MCP Server Tools
+
+| Tool | Requires `serve` | Description |
+|------|:---:|-------------|
+| `list_metrics` | yes | List available metrics |
+| `evaluate_traces` | no | Evaluate local trace files (OTLP or Jaeger) |
+| `list_sessions` | yes | List streaming sessions |
+| `summarize_session` | yes | Structured summary of a session's tool calls |
+| `evaluate_sessions` | yes | Evaluate sessions against a golden reference |
+
+## Claude Code Skills
+
+Two slash-command workflows in `.claude/skills/`, available automatically in repos with the agentevals config:
+
+| Skill | What it does |
+|-------|-------------|
+| `/eval` | Score traces or compare sessions against a golden reference |
+| `/inspect` | Turn-by-turn narrative of a live session with anomaly detection |
+
 ## Development
 
 ```bash

content/docs/faq.md

Lines changed: 1 addition & 1 deletion
@@ -14,7 +14,7 @@ However, if you're iterating on your agents locally, you can point your agents t
 
 AgentCore's evaluation integration (via `strands-agents-evals`) also couples agent execution with evaluation. It re-invokes the agent for each test case, converts the resulting OTel spans to AWS's ADOT format, and scores them against 4 built-in evaluators (Helpfulness, Accuracy, Harmfulness, Relevance) via a cloud API call. This means you need an AWS account, valid credentials, and network access for every evaluation.
 
-agentevals takes a different approach: it scores pre-recorded traces locally without re-running anything. It works with standard Jaeger JSON and OTLP formats from any framework, supports open-ended metrics (tool trajectory matching, LLM-based judges, custom scorers), and ships with a CLI and web UI. No cloud dependency required.
+agentevals takes a different approach: it scores pre-recorded traces locally without re-running anything. It works with standard Jaeger JSON and OTLP formats from any framework, supports open-ended metrics (tool trajectory matching, LLM-based judges, custom scorers), and ships with a CLI, web UI, and MCP server. No cloud dependency required.
 
 ## What trace formats are supported?
 

content/docs/integrations.md

Lines changed: 38 additions & 2 deletions
@@ -1,10 +1,10 @@
 ---
 title: "Integrations & Use Cases"
 weight: 2
-description: "Zero-code, SDK, and CLI/CI integration patterns."
+description: "Zero-code, SDK, CLI/CI, and MCP integration patterns."
 ---
 
-AgentEvals can be used in multiple ways depending on your workflow. Evaluate agents with zero code via OTel, programmatically via the SDK, or in CI pipelines with the CLI.
+AgentEvals can be used in multiple ways depending on your workflow. Evaluate agents with zero code via OTel, programmatically via the SDK, in CI pipelines with the CLI, or conversationally through the MCP server.
 
 > For detailed, working examples covering all integration patterns, see the [examples directory](https://github.com/agentevals-dev/agentevals/tree/main/examples) in the repository.
 
@@ -127,3 +127,39 @@ jobs:
 "
 ```
 
+---
+
+## MCP Server
+
+Exposes evaluation tools to MCP clients. A `.mcp.json` at the project root lets Claude Code pick it up automatically.
+
+### Available Tools
+
+| Tool | Requires `serve` | Description |
+|------|:---:|-------------|
+| `list_metrics` | yes | List available metrics |
+| `evaluate_traces` | no | Evaluate local trace files (OTLP or Jaeger) |
+| `list_sessions` | yes | List streaming sessions |
+| `summarize_session` | yes | Structured summary of a session's tool calls |
+| `evaluate_sessions` | yes | Evaluate sessions against a golden reference |
+
+### Setup
+
+```bash
+# Start the MCP server
+uv run agentevals mcp
+
+# Custom server URL
+AGENTEVALS_SERVER_URL=http://localhost:9000 uv run agentevals mcp
+```
+
+The React UI and MCP server share the same in-memory session state and can run simultaneously.
+
+### Claude Code Skills
+
+Two slash-command workflows are available in repos with `.claude/skills/`:
+
+| Skill | What it does |
+|-------|-------------|
+| `/eval` | Score traces or compare sessions against a golden reference |
+| `/inspect` | Turn-by-turn narrative of a live session with anomaly detection |
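
The MCP Server section added to `integrations.md` mentions a `.mcp.json` at the project root but does not show its contents. The following is a minimal sketch of what such a file might look like, assuming Claude Code's project-scoped `mcpServers` layout. The server label `agentevals` is a hypothetical name, the `env` entry is optional and simply reuses the custom URL from the Setup commands, and the repository's actual file may differ.

```bash
# Sketch only: the repository's real .mcp.json is not part of this diff.
# "agentevals" is a hypothetical server label; the command mirrors
# `uv run agentevals mcp` from the Setup section.
cat > .mcp.json <<'EOF'
{
  "mcpServers": {
    "agentevals": {
      "command": "uv",
      "args": ["run", "agentevals", "mcp"],
      "env": { "AGENTEVALS_SERVER_URL": "http://localhost:9000" }
    }
  }
}
EOF
```

Dropping the `env` block should leave the server on its default URL, on the assumption that `AGENTEVALS_SERVER_URL` is only needed for the custom-URL case shown in Setup.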

content/docs/quick-start.md

Lines changed: 2 additions & 2 deletions
@@ -11,7 +11,7 @@ Grab a wheel from the [releases page](https://github.com/agentevals-dev/agenteva
 ```bash
 pip install agentevals-<version>-py3-none-any.whl
 
-# For live streaming support:
+# For MCP server and live streaming support:
 pip install "agentevals-<version>-py3-none-any.whl[live]"
 ```
 
@@ -61,6 +61,6 @@ Live-streamed traces appear in the "Local Dev" tab, grouped by session ID.
 
 ## What's Next
 
-- [Integrations](/docs/integrations/) — Zero-code, SDK, and CLI/CI integration patterns
+- [Integrations](/docs/integrations/) — Zero-code, SDK, CLI/CI, and MCP integration patterns
 - [Custom Evaluators](/docs/custom-evaluators/) — Build your own evaluators
 - [UI Walkthrough](/docs/ui-walkthrough/) — Deep dive into the web UI
