You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
* added proper logging to `LanggraphAgentAdapter`
* added proper logging to `SmolAgentAdapter`
* updated testing for agents
* fixed some naming of agent wrapper to adapter
* fixed typing error in release script
* removed non-contractual functions from AgentAdapter
Copy file name to clipboardExpand all lines: CONTRIBUTING.md
+5-4Lines changed: 5 additions & 4 deletions
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -23,7 +23,7 @@ The `maseval` package is designed with a strict separation between its core logi
23
23
24
24
1.**`maseval/core`**: This is the heart of the library. It contains the essential logic and **must not** have any optional dependencies. It should be fully functional with a minimal installation.
25
25
26
-
2.**`maseval/interface`**: This contains adapters and wrappers for other multi-agent frameworks (like `crewai`, `langgraph`, etc.). All dependencies for these integrations are optional.
26
+
2.**`maseval/interface`**: This contains adapters for other multi-agent frameworks (like `crewai`, `langgraph`, etc.). All dependencies for these integrations are optional.
27
27
28
28
> [!WARNING]
29
29
> Code in `maseval/core`**must never** import from `maseval/interface`. This separation is critical to keep the core package lightweight and dependency-free. Breaking this rule will cause the library to fail.
@@ -197,11 +197,11 @@ The pipeline automatically performs the following tasks:
197
197
198
198
### 6. Implementing Framework Adapters
199
199
200
-
When creating wrappers for external agent frameworks (in `maseval/interface/agents/`), follow these best practices to ensure consistency and reliability:
200
+
When creating adapters for external agent frameworks (in `maseval/interface/agents/`), follow these best practices to ensure consistency and reliability:
201
201
202
202
#### Message History Pattern
203
203
204
-
**Always use the framework's native message storage as the source of truth.** Do not cache converted messages in the wrapper, as this can lead to inconsistencies if the framework's internal state changes.
204
+
**Always use the framework's native message storage as the source of truth.** Do not cache converted messages in the adapter, as this can lead to inconsistencies if the framework's internal state changes.
205
205
206
206
**Correct Pattern** (SmolAgents example):
207
207
@@ -256,13 +256,14 @@ When adding support for a new framework:
256
256
-[ ] Add conditional import in `maseval/interface/agents/__init__.py`
257
257
-[ ] Write integration tests in `tests/test_interface/`
258
258
-[ ] Update documentation with usage examples
259
+
-[ ] Provide a `logs` property inside the `AgentAdapter`.
259
260
260
261
#### Framework-Specific Patterns
261
262
262
263
**Pattern 1: Persistent State (smolagents)**
263
264
264
265
```python
265
-
classMyFrameworkWrapper(AgentAdapter):
266
+
classMyFrameworkAdapter(AgentAdapter):
266
267
defget_messages(self) -> MessageHistory:
267
268
"""Dynamically fetch from framework's internal storage."""
268
269
# Get from framework (e.g., agent.memory, agent.messages)
Copy file name to clipboardExpand all lines: README.md
+1-1Lines changed: 1 addition & 1 deletion
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -27,7 +27,7 @@ Analogous to pytest for testing or MLflow for ML experimentation, MASEval focuse
27
27
28
28
-**Task-Specific Configurations:** Each benchmark task is a self-contained evaluation unit with its own instructions, environment state, success criteria, and custom evaluation logic. One task might measure success by environment state changes, another by programmatic output validation.
29
29
30
-
-**Framework Agnostic by Design:** MASEval is intentionally unopinionated about agent frameworks, model providers, and system architectures. Simple, standardized interfaces and wrappers enable any agent system to be evaluated without modification to the core library.
30
+
-**Framework Agnostic by Design:** MASEval is intentionally unopinionated about agent frameworks, model providers, and system architectures. Simple, standardized interfaces and adapters enable any agent system to be evaluated without modification to the core library.
31
31
32
32
-**Lifecycle Hooks via Callbacks:** Inject custom logic at any point in the evaluation lifecycle (e.g., on_run_start, on_task_start, on_agent_step_end) through a callback system. This enables extensibility without modifying core evaluation logic.
@@ -190,9 +193,9 @@ Messages use OpenAI's chat completion format:
190
193
}
191
194
```
192
195
193
-
## Custom Agent Wrappers
196
+
## Custom Agent Adapters
194
197
195
-
If you're implementing a custom wrapper, the framework handles message storage automatically via `get_messages()`. Just ensure your `_run_agent()` method returns a `MessageHistory`:
198
+
If you're implementing a custom adapter, the framework handles message storage automatically via `get_messages()`. Just ensure your `_run_agent()` method returns a `MessageHistory`:
196
199
197
200
```python
198
201
from maseval import AgentAdapter, MessageHistory
@@ -211,13 +214,13 @@ class MyAgentAdapter(AgentAdapter):
211
214
return history
212
215
```
213
216
214
-
See the [Agent Wrapper guide](../reference/agent.md) for details on implementing custom wrappers.
217
+
See the [AgentAdapter guide](../reference/agent.md) for details on implementing custom adapters.
215
218
216
219
## Tips
217
220
218
221
**For debugging**: Use `verbose=True` to see traces in real-time.
219
222
220
-
**For benchmarks**: Clear history between tasks with `wrapper.clear_message_history()`.
223
+
**For benchmarks**: Create a new adapter instance for each task to ensure clean conversation history.
221
224
222
225
**For multi-agent systems**: Use a shared tracer and `get_conversations_by_agent()` to analyze each agent separately.
Copy file name to clipboardExpand all lines: docs/index.md
+1-1Lines changed: 1 addition & 1 deletion
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -24,7 +24,7 @@ More details in the [Quickstart](getting-started/quickstart.md)
24
24
25
25
-**Task-Specific Configurations:** Each benchmark task is a self-contained evaluation unit with its own instructions, environment state, success criteria, and custom evaluation logic. One task might measure success by environment state changes, another by programmatic output validation.
26
26
27
-
-**Framework Agnostic by Design:** MASEval is intentionally unopinionated about agent frameworks, model providers, and system architectures. Simple, standardized interfaces and wrappers enable any agent system to be evaluated without modification to the core library.
27
+
-**Framework Agnostic by Design:** MASEval is intentionally unopinionated about agent frameworks, model providers, and system architectures. Simple, standardized interfaces and adapters enable any agent system to be evaluated without modification to the core library.
28
28
29
29
-**Lifecycle Hooks via Callbacks:** Inject custom logic at any point in the evaluation lifecycle (e.g., `on_run_start`, `on_task_start`, `on_agent_step_end`) through a callback system. This enables extensibility without modifying core evaluation logic.
0 commit comments