FAQ

Q: Who is this library for?

Anyone! We had a few groups in mind when building MASEval.

Benchmark Developers: Researchers proposing new benchmarks for multi-agent systems can use MASEval to handle all the boilerplate.
Benchmark Consumers: Researchers studying multi-agent systems can use MASEval as a unified interface across different benchmarks.
System Comparison: Developers who want to test different agentic systems against each other can do so with MASEval.

Q: I am looking for a specific feature, but I cannot find it.

First, check this documentation.
If the feature does not exist, please open an issue on GitHub. Feature requests are welcome.
You can also implement it yourself. See the contributing guide for details.

Q: Can I only test multi-agent systems?

No. MASEval works well for single-agent systems too. We designed the library to handle the complexity of multi-agent systems, but single-agent evaluation is fully supported. You can even run model comparisons, for example GPT against Claude.
