CHANGELOG.md (1 addition, 0 deletions)
@@ -19,6 +19,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- `FileResultLogger` now accepts `pathlib.Path` for the `output_dir` argument and has an `overwrite` argument to prevent overwriting existing log files.
- The `Benchmark` class now has a `fail_on_setup_error` flag that raises errors observed during task setup (PR: #10).
- The `Evaluator` class now has a `filter_traces` base method to conveniently adapt the same evaluator to different entities in the traces (PR: #10).
+- Improved Quick Start Guide in `docs/getting-started/quickstart.md` (PR: #10).
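The `overwrite` behaviour described in the `FileResultLogger` entry can be pictured with a minimal stand-alone sketch. This is not MASEval's implementation: the class `SketchFileLogger`, its `log` method, and the file name `results.jsonl` are hypothetical, and only the `output_dir` and `overwrite` argument names are taken from the changelog entry.

```python
import json
import tempfile
from pathlib import Path
from typing import Union


class SketchFileLogger:
    """Hypothetical stand-in for illustration; NOT the MASEval class."""

    def __init__(self, output_dir: Union[str, Path], overwrite: bool = False) -> None:
        # Accept either a string or a pathlib.Path and normalise to Path.
        self.output_dir = Path(output_dir)
        self.output_dir.mkdir(parents=True, exist_ok=True)
        self.path = self.output_dir / "results.jsonl"  # hypothetical file name
        # Refuse to clobber an existing log unless explicitly allowed.
        if self.path.exists() and not overwrite:
            raise FileExistsError(f"{self.path} exists; pass overwrite=True to replace it")
        self.path.write_text("")  # start from an empty log file

    def log(self, record: dict) -> None:
        # Append one JSON object per line.
        with self.path.open("a", encoding="utf-8") as fh:
            fh.write(json.dumps(record) + "\n")


def demo() -> list:
    """Exercise the guard in a temporary directory and report what happened."""
    observed = []
    with tempfile.TemporaryDirectory() as tmp:
        logger = SketchFileLogger(Path(tmp))  # pathlib.Path input
        logger.log({"task": "t1", "score": 1.0})
        try:
            SketchFileLogger(tmp)  # same directory, overwrite left at False
        except FileExistsError:
            observed.append("blocked")
        SketchFileLogger(tmp, overwrite=True)  # explicit opt-in succeeds
        observed.append("replaced")
    return observed
```

Defaulting `overwrite` to `False` in this sketch is a design assumption: it makes the destructive path an explicit opt-in, which matches the stated goal of preventing accidental loss of existing logs.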
-This guide will help you get started with MASEval.
+This guide introduces the core concepts of MASEval and helps you get started quickly.
## Installation
@@ -23,48 +23,190 @@ This includes all core functionality for defining benchmarks, tasks, and evaluat
### Optional Dependencies
-Install additional integrations based on your agent framework and tooling. These can also be installed separately, but are offered here for convenience.
-
-**Agent Frameworks:**
+Install additional integrations based on your agent framework. For example,
For comprehensive documentation on how to piece together the library's components, including detailed explanations of the execution lifecycle, setup methods, and best practices, see the [`Benchmark`](../reference/benchmark.md) class documentation.
+Work through the examples listed above.
+3. **Explore the [`examples/five_a_day_benchmark/`](https://github.com/parameterlab/MASEval/tree/main/examples/five_a_day_benchmark) folder** for tool implementations, evaluators, and the CLI script (`five_a_day_benchmark.py`).
+4. **Build your own benchmark** using the patterns you've learned.