parameterlab · cemde · Feb 3, 2026 · Jan 19, 2026 · Jan 19, 2026 · Jan 19, 2026
diff --git a/AGENTS.md b/AGENTS.md
@@ -387,6 +387,82 @@ def calculate_average(numbers: list) -> float:
     """
 ```
 
+### mkdocs Rendering
+
+This project uses mkdocstrings to render docstrings as HTML. Follow these rules to ensure proper rendering:
+
+**Lists require a blank line before them:**
+
+```python
+# Bad - renders as one paragraph
+"""Subclasses must provide:
+- method_one(): Description
+- method_two(): Description
+"""
+
+# Good - renders as proper bullet list
+"""Subclasses must provide:
+
+- `method_one()` - Description
+- `method_two()` - Description
+"""
+```
+
+**Return descriptions must be single-line** (multi-line creates multiple table rows):
+
+```python
+# Bad
+"""
+Returns:
+    TerminationReason indicating why is_done() returns True,
+    or NOT_TERMINATED if the interaction is still ongoing.
+"""
+
+# Good
+"""
+Returns:
+    Why `is_done()` returns True, or `NOT_TERMINATED` if still ongoing.
+"""
+```
+
+**For dictionary returns, document fields in the docstring body** using "Output fields:":
+
+```python
+# Bad - creates multiple table rows in Returns
+"""
+Returns:
+    Dictionary containing:
+    - `name` - User identifier
+    - `profile` - User profile data
+"""
+
+# Good - fields in body, single-line Returns
+"""
+Gather execution traces from this user.
+
+Output fields:
+
+- `name` - User identifier
+- `profile` - User profile data
+- `message_count` - Number of messages in history
+
+Returns:
+    Dictionary containing user state and interaction data.
+"""
+```
+
+**HTML-like strings must be in backticks** (otherwise stripped as HTML):
+
+```python
+# Bad - </stop> disappears
+"""Uses "</stop>" to signal satisfaction."""
+
+# Good
+"""Uses `"</stop>"` to signal satisfaction."""
+```
+
+**Use backticks for code references** - method names, parameters, and values: `` `is_done()` ``, `` `stop_tokens` ``, `` `None` ``
+
 ## Early-Release Status
 
 **This project is early-release. Clean, maintainable code is the priority - not backwards compatibility.**
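
Reviewer note on the AGENTS.md hunk: the rendering rules above can be combined in a single docstring. The sketch below shows this, together with a small check for the blank-line-before-list rule. Both `gather_traces` and `find_lists_missing_blank_line` are hypothetical names used for illustration only; neither is part of MASEval or mkdocstrings, and the check is a rough heuristic that does not handle wrapped list items.

```python
import inspect
from typing import Any, Dict, List


def gather_traces(name: str, profile: Dict[str, Any], history: List[str]) -> Dict[str, Any]:
    """Gather execution traces from this user.

    Uses `"</stop>"` to detect the end of a conversation.

    Output fields:

    - `name` - User identifier
    - `profile` - User profile data
    - `message_count` - Number of messages in history

    Returns:
        Dictionary containing user state and interaction data.
    """
    # Hypothetical body: just assembles the fields named in the docstring.
    return {"name": name, "profile": profile, "message_count": len(history)}


def find_lists_missing_blank_line(docstring: str) -> List[int]:
    """Return 1-based line numbers where a bullet list starts without a blank line before it."""
    lines = inspect.cleandoc(docstring).splitlines()
    problems = []
    for i, line in enumerate(lines):
        if not line.lstrip().startswith("- "):
            continue  # not a list item
        prev = lines[i - 1] if i > 0 else ""
        # A list item is fine if it follows a blank line or another list item.
        if prev.strip() and not prev.lstrip().startswith("- "):
            problems.append(i + 1)
    return problems


BAD = """Subclasses must provide:
- method_one(): Description
- method_two(): Description
"""

GOOD = """Subclasses must provide:

- `method_one()` - Description
- `method_two()` - Description
"""
```

Running the check on `BAD` flags the first list item, while `GOOD` and the docstring of `gather_traces` pass. A check like this could run in CI, but whether that is worthwhile is a project decision.
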

diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -9,8 +9,22 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ### Added
 
+**Interface**
+
+- CAMEL-AI integration: `CamelAgentAdapter` and `CamelLLMUser` for evaluating CAMEL-AI ChatAgent-based systems (PR: #22)
+- Added `CamelAgentUser` for using a CAMEL ChatAgent as the user in agent-to-agent evaluation (PR: #22)
+- Added `camel_role_playing_execution_loop()` for benchmarks using CAMEL's RolePlaying semantics (PR: #22)
+- Added `CamelRolePlayingTracer` and `CamelWorkforceTracer` for capturing orchestration-level traces from CAMEL's multi-agent systems (PR: #22)
+
 ### Changed
 
+**Interface**
+
+- Renamed framework-specific `LLMUser` subclasses for clarity (PR: #22):
+  - `SmolAgentUser` → `SmolAgentLLMUser`
+  - `LangGraphUser` → `LangGraphLLMUser`
+  - `LlamaIndexUser` → `LlamaIndexLLMUser`
+
 ### Fixed
 
 ### Removed
@@ -126,7 +140,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 **Interface**
 
-- [LlamaIndex](https://github.com/run-llama/llama_index) integration: `LlamaIndexAgentAdapter` and `LlamaIndexUser` for evaluating LlamaIndex workflow-based agents (PR: #7)
+- [LlamaIndex](https://github.com/run-llama/llama_index) integration: `LlamaIndexAgentAdapter` and `LlamaIndexLLMUser` for evaluating LlamaIndex workflow-based agents (PR: #7)
 - The `logs` property inside `SmolAgentAdapter` and `LanggraphAgentAdapter` are now properly filled. (PR: #3)
 
 **Examples**

diff --git a/docs/getting-started/faq.md b/docs/getting-started/faq.md
@@ -1,5 +1,19 @@
 # FAQ
 
-## Q: Test
+## Q: Who is this library for?
 
-## A: Test
+Anyone! We had a few groups in mind when building MASEval.
+
+1. **Benchmark Developers**: Researchers proposing new benchmarks for multi-agent systems can use MASEval to handle all the boilerplate.
+2. **Benchmark Consumers**: Researchers studying multi-agent systems can use MASEval as a unified interface across different benchmarks.
+3. **System Comparison**: Developers who want to test different agentic systems against each other can do so with MASEval.
+
+## Q: I am looking for a specific feature, but I cannot find it.
+
+1. Check this documentation.
+2. If the feature does not exist, please [open an issue on GitHub](https://github.com/parameterlab/MASEval/issues/new). Feature requests are welcome.
+3. Consider implementing it yourself. Check out the [contributing guide](contributing.md) for details.
+
+## Q: Can I only test multi-agent systems?
+
+No. MASEval works well for single-agent systems too. We designed the library to handle the complexity of multi-agent systems, but single-agent evaluation is fully supported. You can even run model comparisons, for example GPT against Claude.
diff --git a/docs/interface/agents/camel.md b/docs/interface/agents/camel.md
@@ -0,0 +1,34 @@
+# CAMEL-AI
+
+Adapter for the CAMEL-AI multi-agent framework.
+
+- [Documentation](https://docs.camel-ai.org/)
+- [Code Repository](https://github.com/camel-ai/camel)
+
+## Installation
+
+```bash
+pip install maseval[camel]
+```
+
+Alternatively, install camel-ai directly:
+
+```bash
+pip install camel-ai
+```
+
+## API Reference
+
+[:material-github: View source](https://github.com/parameterlab/maseval/blob/main/maseval/interface/agents/camel.py){ .md-source-file }
+
+::: maseval.interface.agents.camel.CamelAgentAdapter
+
+::: maseval.interface.agents.camel.CamelLLMUser
+
+::: maseval.interface.agents.camel.CamelAgentUser
+
+::: maseval.interface.agents.camel.camel_role_playing_execution_loop
+
+::: maseval.interface.agents.camel.CamelRolePlayingTracer
+
+::: maseval.interface.agents.camel.CamelWorkforceTracer
diff --git a/docs/interface/agents/langgraph.md b/docs/interface/agents/langgraph.md
@@ -23,4 +23,4 @@ pip install langgraph
 
 ::: maseval.interface.agents.langgraph.LangGraphAgentAdapter
 
-::: maseval.interface.agents.langgraph.LangGraphUser
+::: maseval.interface.agents.langgraph.LangGraphLLMUser
diff --git a/docs/interface/agents/llamaindex.md b/docs/interface/agents/llamaindex.md
@@ -23,4 +23,4 @@ pip install llama-index-core
 
 ::: maseval.interface.agents.llamaindex.LlamaIndexAgentAdapter
 
-::: maseval.interface.agents.llamaindex.LlamaIndexUser
+::: maseval.interface.agents.llamaindex.LlamaIndexLLMUser
diff --git a/docs/interface/agents/smolagents.md b/docs/interface/agents/smolagents.md
@@ -23,7 +23,7 @@ pip install smolagents
 
 ::: maseval.interface.agents.smolagents.SmolAgentAdapter
 
-::: maseval.interface.agents.smolagents.SmolAgentUser
+::: maseval.interface.agents.smolagents.SmolAgentLLMUser
 
 [:material-github: View source](https://github.com/parameterlab/maseval/blob/main/maseval/interface/agents/smolagents_optional.py){ .md-source-file }
 

diff --git a/docs/reference/environment.md b/docs/reference/environment.md
@@ -14,4 +14,4 @@ Some agent adapters expose helper tools or user-simulation tools that can be use
 
 ::: maseval.interface.agents.smolagents.SmolAgentAdapter
 
-::: maseval.interface.agents.smolagents.SmolAgentUser
+::: maseval.interface.agents.smolagents.SmolAgentLLMUser
diff --git a/docs/reference/user.md b/docs/reference/user.md
@@ -1,27 +1,35 @@
 # User
 
-In many real-world applications, Multi-Agent Systems (MAS) are designed to interact with human users to accomplish tasks. To effectively benchmark such systems, it is crucial to have a standardized way to simulate these interactions. The `User` class in MASEval provides this capability by acting as a programmable, LLM-driven user that can engage with the MAS in a realistic manner.
+In many real-world applications, Multi-Agent Systems (MAS) are designed to interact with human users to accomplish tasks. To effectively benchmark such systems, it is crucial to have a standardized way to simulate these interactions. MASEval provides this capability through a `User` hierarchy: the abstract `User` base class defines the interface, while `LLMUser` provides an LLM-driven implementation that can engage with the MAS in a realistic manner.
 
-The User is initialized with a persona and a scenario, both of which are typically defined within a Task. This tight integration allows for dynamic and context-aware simulations. For example, a Task might generate a random birthdate for the user. This birthdate is then passed to both the `User` and the `Evaluator`. The User will use this information in its conversation with the MAS, and the `Evaluator` will check if the MAS correctly processes and remembers this information. This mechanism enables the creation of sophisticated and reliable benchmarks that can assess the interactive capabilities of a MAS.
+The `LLMUser` is initialized with a persona and a scenario, both of which are typically defined within a Task. This tight integration allows for dynamic and context-aware simulations. For example, a Task might generate a random birthdate for the user. This birthdate is then passed to both the `LLMUser` and the `Evaluator`. The user will use this information in its conversation with the MAS, and the `Evaluator` will check if the MAS correctly processes and remembers this information. This mechanism enables the creation of sophisticated and reliable benchmarks that can assess the interactive capabilities of a MAS.
 
 [:material-github: View source](https://github.com/parameterlab/maseval/blob/main/maseval/core/user.py){ .md-source-file }
 
 ::: maseval.core.user.User
 
-::: maseval.core.user.AgenticUser
+::: maseval.core.user.LLMUser
+
+::: maseval.core.user.AgenticLLMUser
 
 ## Interfaces
 
 Some integrations provide convenience user/tool implementations for specific agent frameworks. For example:
 
 [:material-github: View source](https://github.com/parameterlab/maseval/blob/main/maseval/interface/agents/smolagents.py){ .md-source-file }
 
-::: maseval.interface.agents.smolagents.SmolAgentUser
+::: maseval.interface.agents.smolagents.SmolAgentLLMUser
 
 [:material-github: View source](https://github.com/parameterlab/maseval/blob/main/maseval/interface/agents/langgraph.py){ .md-source-file }
 
-::: maseval.interface.agents.langgraph.LangGraphUser
+::: maseval.interface.agents.langgraph.LangGraphLLMUser
 
 [:material-github: View source](https://github.com/parameterlab/maseval/blob/main/maseval/interface/agents/llamaindex.py){ .md-source-file }
 
-::: maseval.interface.agents.llamaindex.LlamaIndexUser
+::: maseval.interface.agents.llamaindex.LlamaIndexLLMUser
+
+[:material-github: View source](https://github.com/parameterlab/maseval/blob/main/maseval/interface/agents/camel.py){ .md-source-file }
+
+::: maseval.interface.agents.camel.CamelLLMUser
+
+::: maseval.interface.agents.camel.CamelAgentUser
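
Reviewer note on the `User` hierarchy described in this hunk: the split between an abstract interface and a concrete implementation can be pictured roughly as below. This is a stand-in sketch, not MASEval's actual code. `SketchUser`, `ScriptedUser`, and `make_ask_user_tool` are hypothetical names, the constructor arguments are assumptions, and the canned replies take the place of the LLM call that an `LLMUser` would make. Only the method names `respond()` and `is_done()` are taken from the diff.

```python
from abc import ABC, abstractmethod
from typing import Callable, List


class SketchUser(ABC):
    """Stand-in for an abstract user interface (not MASEval's actual class)."""

    @abstractmethod
    def respond(self, question: str) -> str:
        """Return the user's reply to a question from the agent."""

    @abstractmethod
    def is_done(self) -> bool:
        """Report whether the user considers the interaction finished."""


class ScriptedUser(SketchUser):
    """Replays canned replies; an LLM-backed user would query a model instead."""

    def __init__(self, replies: List[str], max_turns: int = 5) -> None:
        self._replies = list(replies)
        self._max_turns = max_turns  # assumed turn limit, for illustration
        self._turns = 0

    def respond(self, question: str) -> str:
        self._turns += 1
        if self._replies:
            return self._replies.pop(0)
        return "</stop>"  # nothing left to say

    def is_done(self) -> bool:
        return self._turns >= self._max_turns or not self._replies


def make_ask_user_tool(user: SketchUser) -> Callable[[str], str]:
    """Wrap a user as a plain callable tool, as the example benchmarks do."""

    def ask_user(question: str) -> str:
        """Ask the user a question and return the reply."""
        return user.respond(question)

    return ask_user
```

Because the tool only depends on `respond()`, any user implementation that honours the interface can be swapped in without touching the tool wrapper.
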
diff --git a/examples/macs_benchmark/macs_benchmark.py b/examples/macs_benchmark/macs_benchmark.py
@@ -148,7 +148,7 @@ class UserInputTool(SmolagentsTool):
             output_type = "string"
 
             def forward(self, question: str) -> str:
-                return user.simulate_response(question)
+                return user.respond(question)
 
         return UserInputTool()
 
@@ -393,7 +393,7 @@ def get_tool(self):
 
         def user_input(question: str) -> str:
             """Ask the user a question to understand their complete requirements."""
-            return self.simulate_response(question)
+            return self.respond(question)
 
         return StructuredTool.from_function(
             func=user_input,

diff --git a/examples/tau2_benchmark/tau2_benchmark.py b/examples/tau2_benchmark/tau2_benchmark.py
@@ -146,7 +146,7 @@ def get_tool(self) -> Dict[str, Any]:
 
         def ask_user(question: str) -> str:
             """Ask the customer a question to clarify their request or get additional information."""
-            return self.simulate_response(question)
+            return self.respond(question)
 
         return {"ask_user": ask_user}
 
@@ -345,7 +345,7 @@ class UserInputTool(SmolagentsTool):
             output_type = "string"
 
             def forward(self, question: str) -> str:
-                return user.simulate_response(question)
+                return user.respond(question)
 
         return UserInputTool()
 
@@ -506,7 +506,7 @@ def get_tool(self):
 
         def user_input(question: str) -> str:
             """Ask the customer a question to clarify their request."""
-            return self.simulate_response(question)
+            return self.respond(question)
 
         return StructuredTool.from_function(
             func=user_input,

diff --git a/maseval/__init__.py b/maseval/__init__.py
@@ -33,7 +33,7 @@
     UserSimulatorError,
 )
 from .core.model import ModelAdapter, ChatResponse
-from .core.user import User, TerminationReason
+from .core.user import User, LLMUser, AgenticLLMUser, TerminationReason
 from .core.evaluator import Evaluator
 from .core.history import MessageHistory, ToolInvocationHistory
 from .core.tracing import TraceableMixin
@@ -75,6 +75,8 @@
     "UserSimulatorError",
     # User simulation
     "User",
+    "LLMUser",
+    "AgenticLLMUser",
     "TerminationReason",
     # Evaluation
     "Evaluator",

diff --git a/maseval/benchmark/macs/macs.py b/maseval/benchmark/macs/macs.py
@@ -48,6 +48,7 @@ def get_model_adapter(self, model_id, **kwargs):
 from maseval import (
     AgentAdapter,
     Benchmark,
+    User,
     Environment,
     Evaluator,
     MessageHistory,
@@ -56,7 +57,7 @@ def get_model_adapter(self, model_id, **kwargs):
     TaskExecutionStatus,
     ToolInvocationHistory,
     ToolLLMSimulator,
-    User,
+    LLMUser,
     AgentError,
     EnvironmentError,
     validate_arguments_from_schema,
@@ -456,10 +457,10 @@ def _compute_gsr(self, report: List[Dict[str, Any]]) -> Tuple[float, float]:
 # =============================================================================
 
 
-class MACSUser(User):
+class MACSUser(LLMUser):
     """MACS-specific user simulator with conversation limits.
 
-    Extends the base User class with MACS-specific behavior:
+    Extends the LLMUser class with MACS-specific behavior:
     - Maximum 5 turns of interaction (as per MACS paper)
     - </stop> token detection for natural conversation ending
     - User profile and scenario-aware responses

diff --git a/maseval/benchmark/tau2/tau2.py b/maseval/benchmark/tau2/tau2.py
@@ -61,7 +61,7 @@ def get_model_adapter(self, model_id, **kwargs):
 from typing import Any, Dict, List, Optional, Sequence, Tuple, Callable
 
 from maseval import AgentAdapter, Benchmark, Evaluator, ModelAdapter, Task, User
-from maseval.core.user import AgenticUser
+from maseval.core.user import AgenticLLMUser
 from maseval.core.callback import BenchmarkCallback
 
 from maseval.benchmark.tau2.environment import Tau2Environment
@@ -73,10 +73,10 @@ def get_model_adapter(self, model_id, **kwargs):
 # =============================================================================
 
 
-class Tau2User(AgenticUser):
+class Tau2User(AgenticLLMUser):
     """Tau2-specific user simulator with customer service personas.
 
-    Extends the AgenticUser class with tau2-specific behavior:
+    Extends the AgenticLLMUser class with tau2-specific behavior:
     - Customer personas from user_scenario
     - Domain-aware responses (airline, retail, telecom)
     - Multi-turn interaction support
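
Reviewer note on migration: downstream code that used the old identifiers needs the same renames that this diff applies. A minimal sketch of a text-level migration helper follows. The mapping is taken from the hunks above; the helper itself (`RENAMES`, `migrate_source`) is hypothetical and not shipped with MASEval. It deliberately leaves `User` alone, since `User` still exists as the abstract base class and whether a subclass should switch to `LLMUser` is a per-class decision.

```python
import re

# Old name -> new name, as applied in this diff.
RENAMES = {
    "SmolAgentUser": "SmolAgentLLMUser",
    "LangGraphUser": "LangGraphLLMUser",
    "LlamaIndexUser": "LlamaIndexLLMUser",
    "AgenticUser": "AgenticLLMUser",
    "simulate_response": "respond",
}

# Whole-word match so that longer identifiers containing an old name are untouched.
_PATTERN = re.compile(r"\b(" + "|".join(map(re.escape, RENAMES)) + r")\b")


def migrate_source(text: str) -> str:
    """Replace every old identifier in `text` with its new name."""
    return _PATTERN.sub(lambda match: RENAMES[match.group(1)], text)
```

The substitution is idempotent, so running it twice over the same file is harmless. Being purely textual, it will also rewrite matches inside comments and strings, so the result should be reviewed before committing.
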