Closed
75 changes: 43 additions & 32 deletions content/docs/quick-start.md
@@ -1,66 +1,77 @@
---
title: "Quick Start"
weight: 1
description: "Get up and running with AgentEvals in under 5 minutes."
description: "Get started with agentevals in minutes"
---

## Installation
# Quick Start

Grab a wheel from the [releases page](https://github.com/agentevals-dev/agentevals/releases). The **core** wheel has the CLI and REST API. The **bundle** wheel adds streaming and the embedded web UI.
Get from zero to your first evaluation in under 5 minutes.

```bash
pip install agentevals-<version>-py3-none-any.whl
## 1. Install

# For MCP server and live streaming support:
pip install "agentevals-<version>-py3-none-any.whl[live]"
```bash
npm install -g @agentevals/agentv
```

**From source** with `uv` or Nix:
Verify the installation:

```bash
uv sync
# or: nix develop .
agentv --version
```

See [DEVELOPMENT.md](https://github.com/agentevals-dev/agentevals/blob/main/DEVELOPMENT.md) for build instructions.
## 2. Create an Eval

## CLI Quick Start
Create a file named `EVAL.yaml`:

Run an evaluation against a sample trace:
```yaml
suite: customer-support-evals
version: 1

```bash
uv run agentevals run samples/helm.json \
--eval-set samples/eval_set_helm.json \
-m tool_trajectory_avg_score
cases:
- name: tool_usage_validation
target: support-bot
criteria: Agent should use search_docs before answering policy questions
evaluators:
- type: tool_trajectory
expected_sequence: [search_docs, format_answer]
allow_extra_steps: true
```
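To make the `tool_trajectory` settings above concrete, here is a minimal Python sketch of one plausible reading of `expected_sequence` and `allow_extra_steps`: the expected tools must appear in order, and additional calls are tolerated only when the flag is true. This is an assumption about the semantics, not the tool's actual implementation, and the function name `matches_trajectory` is hypothetical.

```python
from typing import Sequence


def matches_trajectory(
    actual: Sequence[str],
    expected: Sequence[str],
    allow_extra_steps: bool = True,
) -> bool:
    """Hypothetical check; an assumed reading of the YAML fields, not agentv's real logic."""
    if not allow_extra_steps:
        # Strict mode: the recorded calls must equal the expected sequence exactly.
        return list(actual) == list(expected)
    # Lenient mode: expected must be an in-order subsequence of actual.
    # `step in remaining` advances the iterator past each match, which enforces ordering.
    remaining = iter(actual)
    return all(step in remaining for step in expected)
```

Under this reading, a trace that calls `search_docs`, then some other tool, then `format_answer` would pass with `allow_extra_steps: true`, while a trace that calls `format_answer` before `search_docs` would not.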

List available evaluators:
## 3. Run Your Eval

Execute your evaluation suite:

```bash
uv run agentevals evaluator list
agentv run --eval EVAL.yaml
```

## Live UI Quick Start
## 4. View Results

Start the server with the embedded web UI:
Get detailed results in the terminal:

```bash
agentevals serve
agentv run --eval EVAL.yaml --format table
```

Open `http://localhost:8001` to upload traces and eval sets, select metrics, and view results with interactive span trees.

**From source** (two terminals):
Export as JSON for CI/CD or further processing:

```bash
uv run agentevals serve --dev # Terminal 1
cd ui && npm install && npm run dev # Terminal 2 → http://localhost:5173
agentv run --eval EVAL.yaml --format json > results.json
```

Live-streamed traces appear in the "Local Dev" tab, grouped by session ID.
## 5. Try the Examples

Explore sample evaluation suites:

```bash
npx agentv run --eval examples/customer-support/EVAL.yaml
npx agentv run --eval examples/code-review/EVAL.yaml
```

## What's Next
## Next Steps

- [Integrations](/docs/integrations/) — Zero-code, SDK, CLI/CI, and MCP integration patterns
- [Custom Evaluators](/docs/custom-evaluators/) — Build your own evaluators
- [UI Walkthrough](/docs/ui-walkthrough/) — Deep dive into the web UI
- **Learn the YAML format** → [Configuration](/docs/configuration/)
- **See more examples** → [Examples](/docs/examples/)
- **Set up CI/CD** → [CI/CD Integration](/docs/ci-cd/)
- **Use the MCP server** → [MCP Server](/docs/mcp-server/)