|
1 | 1 | --- |
2 | 2 | title: "Quick Start" |
3 | 3 | weight: 1 |
4 | | -description: "Get up and running with AgentEvals in under 5 minutes." |
| 4 | +description: "Get started with agentevals in minutes" |
5 | 5 | --- |
6 | 6 |
|
7 | | -## Installation |
| 7 | +# Quick Start |
8 | 8 |
|
9 | | -Grab a wheel from the [releases page](https://github.com/agentevals-dev/agentevals/releases). The **core** wheel has the CLI and REST API. The **bundle** wheel adds streaming and the embedded web UI. |
| 9 | +Get from zero to your first evaluation in under 5 minutes. |
10 | 10 |
|
11 | | -```bash |
12 | | -pip install agentevals-<version>-py3-none-any.whl |
| 11 | +## 1. Install |
13 | 12 |
|
14 | | -# For MCP server and live streaming support: |
15 | | -pip install "agentevals-<version>-py3-none-any.whl[live]" |
| 13 | +```bash |
| 14 | +npm install -g @agentevals/agentv |
16 | 15 | ``` |
17 | 16 |
|
18 | | -**From source** with `uv` or Nix: |
| 17 | +Verify the installation: |
19 | 18 |
|
20 | 19 | ```bash |
21 | | -uv sync |
22 | | -# or: nix develop . |
| 20 | +agentv --version |
23 | 21 | ``` |
24 | 22 |
|
25 | | -See [DEVELOPMENT.md](https://github.com/agentevals-dev/agentevals/blob/main/DEVELOPMENT.md) for build instructions. |
| 23 | +## 2. Create an Eval |
26 | 24 |
|
27 | | -## CLI Quick Start |
| 25 | +Create a file named `EVAL.yaml`: |
28 | 26 |
|
29 | | -Run an evaluation against a sample trace: |
| 27 | +```yaml |
| 28 | +suite: customer-support-evals |
| 29 | +version: 1 |
30 | 30 |
|
31 | | -```bash |
32 | | -uv run agentevals run samples/helm.json \ |
33 | | - --eval-set samples/eval_set_helm.json \ |
34 | | - -m tool_trajectory_avg_score |
| 31 | +cases: |
| 32 | + - name: tool_usage_validation |
| 33 | + target: support-bot |
| 34 | + criteria: Agent should use search_docs before answering policy questions |
| 35 | + evaluators: |
| 36 | + - type: tool_trajectory |
| 37 | + expected_sequence: [search_docs, format_answer] |
| 38 | + allow_extra_steps: true |
35 | 39 | ``` |
36 | 40 |
|
37 | | -List available evaluators: |
| 41 | +## 3. Run Your Eval |
| 42 | + |
| 43 | +Execute your evaluation suite: |
38 | 44 |
|
39 | 45 | ```bash |
40 | | -uv run agentevals evaluator list |
| 46 | +agentv run --eval EVAL.yaml |
41 | 47 | ``` |
42 | 48 |
|
43 | | -## Live UI Quick Start |
| 49 | +## 4. View Results |
44 | 50 |
|
45 | | -Start the server with the embedded web UI: |
| 51 | +Get detailed results in the terminal: |
46 | 52 |
|
47 | 53 | ```bash |
48 | | -agentevals serve |
| 54 | +agentv run --eval EVAL.yaml --format table |
49 | 55 | ``` |
50 | 56 |
|
51 | | -Open `http://localhost:8001` to upload traces and eval sets, select metrics, and view results with interactive span trees. |
52 | | - |
53 | | -**From source** (two terminals): |
| 57 | +Export as JSON for CI/CD or further processing: |
54 | 58 |
|
55 | 59 | ```bash |
56 | | -uv run agentevals serve --dev # Terminal 1 |
57 | | -cd ui && npm install && npm run dev # Terminal 2 → http://localhost:5173 |
| 60 | +agentv run --eval EVAL.yaml --format json > results.json |
58 | 61 | ``` |
59 | 62 |
|
60 | | -Live-streamed traces appear in the "Local Dev" tab, grouped by session ID. |
| 63 | +## 5. Try the Examples |
| 64 | + |
| 65 | +Explore sample evaluation suites: |
| 66 | + |
| 67 | +```bash |
| 68 | +npx agentv run --eval examples/customer-support/EVAL.yaml |
| 69 | +npx agentv run --eval examples/code-review/EVAL.yaml |
| 70 | +``` |
61 | 71 |
|
62 | | -## What's Next |
| 72 | +## Next Steps |
63 | 73 |
|
64 | | -- [Integrations](/docs/integrations/) — Zero-code, SDK, CLI/CI, and MCP integration patterns |
65 | | -- [Custom Evaluators](/docs/custom-evaluators/) — Build your own evaluators |
66 | | -- [UI Walkthrough](/docs/ui-walkthrough/) — Deep dive into the web UI |
| 74 | +- **Learn the YAML format** → [Configuration](/docs/configuration/) |
| 75 | +- **See more examples** → [Examples](/docs/examples/) |
| 76 | +- **Set up CI/CD** → [CI/CD Integration](/docs/ci-cd/) |
| 77 | +- **Use the MCP server** → [MCP Server](/docs/mcp-server/) |
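
For the "Export as JSON for CI/CD" step in the new text, here is a minimal sketch of how a CI job might gate on `results.json`. The diff does not show the output format of `agentv run --format json`, so the schema here (a top-level `results` list whose entries carry `name` and `passed`) is an assumption, as are the helper names `failed_cases` and `exit_code` and the sample case `other_case`. Adjust the field names to whatever the tool actually emits.

```python
import json

# Hypothetical sample of what results.json might contain; the real
# schema is not documented in this diff.
SAMPLE = """
{
  "results": [
    {"name": "tool_usage_validation", "passed": true},
    {"name": "other_case", "passed": false}
  ]
}
"""


def failed_cases(report: dict) -> list:
    """Return the names of cases whose assumed `passed` flag is not true."""
    return [
        case.get("name", "<unnamed>")
        for case in report.get("results", [])
        if not case.get("passed", False)
    ]


def exit_code(report: dict) -> int:
    """Map a report to a process exit code: 0 if every case passed, else 1."""
    return 1 if failed_cases(report) else 0


report = json.loads(SAMPLE)
for name in failed_cases(report):
    print(f"FAILED: {name}")
print("exit code:", exit_code(report))
```

In a pipeline, the same functions could be pointed at the real file with `json.load(open("results.json"))` and the result passed to `sys.exit`, so that a failing case fails the build.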