Anyone! We had a few groups in mind when building MASEval:
- Benchmark Developers: Researchers proposing new benchmarks for multi-agent systems can use MASEval to handle all the boilerplate.
- Benchmark Consumers: Researchers studying multi-agent systems can use MASEval as a unified interface across different benchmarks.
- System Comparison: Developers who want to test different agentic systems against each other can do so with MASEval.
- Check this documentation.
- If the feature does not exist, please open an issue on GitHub. Feature requests are welcome.
- Consider implementing it yourself. Check out the contributing guide for details.
No. MASEval works well for single-agent systems too. We designed the library to handle the complexity of multi-agent systems, but single-agent evaluation is fully supported. You can even run model comparisons, for example GPT against Claude.
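To make the single-agent case concrete, here is a minimal sketch of a model comparison on a shared task list. It does not use MASEval's API: `Task`, `accuracy`, `compare`, `model_a`, and `model_b` are hypothetical names invented for this illustration, and the two model functions are stand-ins for real calls to a model provider.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical types for illustration only; these are NOT MASEval's API.
# A "system" is anything that maps a prompt to an answer string.
System = Callable[[str], str]


@dataclass
class Task:
    prompt: str
    expected: str


def accuracy(system: System, tasks: List[Task]) -> float:
    """Fraction of tasks whose output exactly matches the expected answer."""
    if not tasks:
        return 0.0
    correct = sum(1 for t in tasks if system(t.prompt).strip() == t.expected)
    return correct / len(tasks)


def compare(systems: Dict[str, System], tasks: List[Task]) -> Dict[str, float]:
    """Score every system on the same task list."""
    return {name: accuracy(fn, tasks) for name, fn in systems.items()}


# Stand-ins for real model calls (assumed; swap in actual clients).
def model_a(prompt: str) -> str:
    return {"2+2": "4", "capital of France": "Paris"}.get(prompt, "")


def model_b(prompt: str) -> str:
    return {"2+2": "4"}.get(prompt, "")


tasks = [Task("2+2", "4"), Task("capital of France", "Paris")]
scores = compare({"model_a": model_a, "model_b": model_b}, tasks)
print(scores)
```

Because each system is just a callable from prompt to answer, the same loop would accept a single model, a single agent, or a wrapper around a whole multi-agent system.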