Skip to content

Commit d196cdd

Browse files
authored
Improved Logging of AgentAdapter (#3)
* added proper logging to `LanggraphAgentAdapter` * added proper logging to `SmolAgentAdapter` * updated testing for agents * fixed some naming of agent wrapper to adapter * fixed typing error in release script * removed non-contractual functions from AgentAdapter
1 parent db8ef59 commit d196cdd

28 files changed

Lines changed: 995 additions & 670 deletions

.github/scripts/extract_changelog.py

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -19,6 +19,7 @@ def extract_section(version: str, changelog_path: Path) -> str:
1919
if not match:
2020
print(f"No changelog entry found for version {version}", file=sys.stderr)
2121
sys.exit(1)
22+
assert match is not None
2223
return match.group(0).strip()
2324

2425

AGENTS.md

Lines changed: 6 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -102,12 +102,12 @@ uv remove <package-name>
102102

103103
**Framework Adapter Pattern:**
104104

105-
When implementing wrappers for external frameworks, **always use the framework's native message storage as the source of truth**:
105+
When implementing adapters for external frameworks, **always use the framework's native message storage as the source of truth**:
106106

107107
**Pattern 1: Persistent State (smolagents)**
108108

109109
```python
110-
class MyFrameworkWrapper(AgentAdapter):
110+
class MyFrameworkAdapter(AgentAdapter):
111111
def get_messages(self) -> MessageHistory:
112112
"""Dynamically fetch from framework's internal storage."""
113113
# Get from framework (e.g., agent.memory, agent.messages)
@@ -236,3 +236,7 @@ For lists and dictionaries, use `Dict[...,...]`, `List[...]`, `Sequence[...]` et
236236
- DO NOT publicly distribute code or data
237237
- DO NOT publish without explicit permission
238238
- DO NOT share copyrighted third-party benchmark data
239+
240+
## Changelog
241+
242+
When the task is completed, add your changes to the Changelog.

CHANGELOG.md

Lines changed: 6 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -9,12 +9,18 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
99

1010
### Added
1111

12+
- The `logs` property inside `SmolAgentAdapter` and `LanggraphAgentAdapter` are now properly filled. (PR: #3)
13+
1214
### Changed
1315

1416
### Fixed
1517

18+
- Consistent naming of agent `adapter` over `wrapper` (PR: #3)
19+
1620
### Removed
1721

22+
- Removed `set_message_history`, `append_message_history` and `clear_message_history` for `AgentAdapter` and subclasses. (PR: #3)
23+
1824
## [0.1.2] - 2025-11-18
1925

2026
### Added

CONTRIBUTING.md

Lines changed: 5 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -23,7 +23,7 @@ The `maseval` package is designed with a strict separation between its core logi
2323

2424
1. **`maseval/core`**: This is the heart of the library. It contains the essential logic and **must not** have any optional dependencies. It should be fully functional with a minimal installation.
2525

26-
2. **`maseval/interface`**: This contains adapters and wrappers for other multi-agent frameworks (like `crewai`, `langgraph`, etc.). All dependencies for these integrations are optional.
26+
2. **`maseval/interface`**: This contains adapters for other multi-agent frameworks (like `crewai`, `langgraph`, etc.). All dependencies for these integrations are optional.
2727

2828
> [!WARNING]
2929
> Code in `maseval/core` **must never** import from `maseval/interface`. This separation is critical to keep the core package lightweight and dependency-free. Breaking this rule will cause the library to fail.
@@ -197,11 +197,11 @@ The pipeline automatically performs the following tasks:
197197
198198
### 6. Implementing Framework Adapters
199199

200-
When creating wrappers for external agent frameworks (in `maseval/interface/agents/`), follow these best practices to ensure consistency and reliability:
200+
When creating adapters for external agent frameworks (in `maseval/interface/agents/`), follow these best practices to ensure consistency and reliability:
201201

202202
#### Message History Pattern
203203

204-
**Always use the framework's native message storage as the source of truth.** Do not cache converted messages in the wrapper, as this can lead to inconsistencies if the framework's internal state changes.
204+
**Always use the framework's native message storage as the source of truth.** Do not cache converted messages in the adapter, as this can lead to inconsistencies if the framework's internal state changes.
205205

206206
**Correct Pattern** (SmolAgents example):
207207

@@ -256,13 +256,14 @@ When adding support for a new framework:
256256
- [ ] Add conditional import in `maseval/interface/agents/__init__.py`
257257
- [ ] Write integration tests in `tests/test_interface/`
258258
- [ ] Update documentation with usage examples
259+
- [ ] Provide a `logs` property inside the `AgentAdapter`.
259260

260261
#### Framework-Specific Patterns
261262

262263
**Pattern 1: Persistent State (smolagents)**
263264

264265
```python
265-
class MyFrameworkWrapper(AgentAdapter):
266+
class MyFrameworkAdapter(AgentAdapter):
266267
def get_messages(self) -> MessageHistory:
267268
"""Dynamically fetch from framework's internal storage."""
268269
# Get from framework (e.g., agent.memory, agent.messages)

README.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -27,7 +27,7 @@ Analogous to pytest for testing or MLflow for ML experimentation, MASEval focuse
2727

2828
- **Task-Specific Configurations:** Each benchmark task is a self-contained evaluation unit with its own instructions, environment state, success criteria, and custom evaluation logic. One task might measure success by environment state changes, another by programmatic output validation.
2929

30-
- **Framework Agnostic by Design:** MASEval is intentionally unopinionated about agent frameworks, model providers, and system architectures. Simple, standardized interfaces and wrappers enable any agent system to be evaluated without modification to the core library.
30+
- **Framework Agnostic by Design:** MASEval is intentionally unopinionated about agent frameworks, model providers, and system architectures. Simple, standardized interfaces and adapters enable any agent system to be evaluated without modification to the core library.
3131

3232
- **Lifecycle Hooks via Callbacks:** Inject custom logic at any point in the evaluation lifecycle (e.g., on_run_start, on_task_start, on_agent_step_end) through a callback system. This enables extensibility without modifying core evaluation logic.
3333

docs/guides/config-gathering.md

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -124,8 +124,8 @@ class MyBenchmark(Benchmark):
124124
def setup_agents(self, agent_data, environment, task, user):
125125
model = MyModelAdapter(...)
126126
agent = MyAgent(model=model)
127-
wrapper = AgentAdapter(agent, "agent")
128-
return [wrapper], {"agent": wrapper}
127+
adapter = AgentAdapter(agent, "agent")
128+
return [adapter], {"agent": adapter}
129129
# ... other methods
130130

131131
# Run benchmark

docs/guides/message-tracing.md

Lines changed: 24 additions & 21 deletions
Original file line numberDiff line numberDiff line change
@@ -16,7 +16,7 @@ MASEval provides message tracing to capture agent conversations during benchmark
1616

1717
## Core Concepts
1818

19-
**`MessageHistory`**: OpenAI-compatible message storage that all agent wrappers use internally.
19+
**`MessageHistory`**: OpenAI-compatible message storage that all agent adapters use internally.
2020

2121
**`AgentAdapter.get_messages()`**: Standard method to retrieve conversation history from any wrapped agent.
2222

@@ -26,17 +26,17 @@ MASEval provides message tracing to capture agent conversations during benchmark
2626

2727
### Accessing Message History
2828

29-
Every agent wrapper exposes message history through `get_messages()`:
29+
Every agent adapter exposes message history through `get_messages()`:
3030

3131
```python
32-
from maseval.interface.agents import SmolAgentsWrapper
32+
from maseval.interface.agents import SmolAgentAdapter
3333

3434
# Create and run your agent
35-
wrapper = SmolAgentsWrapper(agent, name="researcher")
36-
result = wrapper.run("What's the capital of France?")
35+
agent_adapter = SmolAgentAdapter(agent, name="researcher")
36+
result = agent_adapter.run("What's the capital of France?")
3737

3838
# Get the conversation
39-
messages = wrapper.get_messages()
39+
messages = agent_adapter.get_messages()
4040

4141
# Inspect messages
4242
for msg in messages:
@@ -45,18 +45,21 @@ for msg in messages:
4545
print(f" Tools called: {[tc['function']['name'] for tc in msg['tool_calls']]}")
4646
```
4747

48-
### Clearing History Between Tasks
48+
### Fresh Conversations for Multiple Tasks
4949

50-
In benchmarks, you typically want to clear history before each new task:
50+
In benchmarks, you typically want a fresh agent instance for each task:
5151

5252
```python
5353
# In your benchmark loop
5454
for task in benchmark.tasks:
55-
wrapper.clear_message_history() # Reset for new task
56-
result = wrapper.run(task.query)
55+
# Create a new adapter instance for each task
56+
agent_adapter = YourAgentAdapter(agent_instance=agent, name="task_agent")
57+
result = agent_adapter.run(task.query)
5758
evaluate(result, task.ground_truth)
5859
```
5960

61+
This ensures each task starts with a clean slate and avoids conversation history contamination.
62+
6063
## Using the Tracing Callback
6164

6265
For multi-agent systems or when you need to collect conversations from many runs, use `MessageTracingAgentCallback`:
@@ -68,12 +71,12 @@ from maseval.core.callbacks import MessageTracingAgentCallback
6871
tracer = MessageTracingAgentCallback()
6972

7073
# Attach to your agent(s)
71-
wrapper = SmolAgentsWrapper(agent, name="assistant", callbacks=[tracer])
74+
agent_adapter = SmolAgentAdapter(agent, name="assistant", callbacks=[tracer])
7275

7376
# Run tasks
74-
wrapper.run("Task 1")
75-
wrapper.run("Task 2")
76-
wrapper.run("Task 3")
77+
agent_adapter.run("Task 1")
78+
agent_adapter.run("Task 2")
79+
agent_adapter.run("Task 3")
7780

7881
# Get all conversations
7982
conversations = tracer.get_all_conversations()
@@ -93,8 +96,8 @@ Share one tracer across multiple agents to collect all conversations:
9396
tracer = MessageTracingAgentCallback()
9497

9598
# Attach to multiple agents
96-
agent1 = SmolAgentsWrapper(agent1, name="researcher", callbacks=[tracer])
97-
agent2 = SmolAgentsWrapper(agent2, name="writer", callbacks=[tracer])
99+
agent1 = SmolAgentAdapter(agent1, name="researcher", callbacks=[tracer])
100+
agent2 = SmolAgentAdapter(agent2, name="writer", callbacks=[tracer])
98101

99102
# Run both agents
100103
agent1.run("Research topic X")
@@ -119,7 +122,7 @@ tracer = MessageTracingAgentCallback()
119122

120123
for batch in task_batches:
121124
for task in batch:
122-
wrapper.run(task.query)
125+
agent_adapter.run(task.query)
123126

124127
# Process this batch
125128
conversations = tracer.get_all_conversations()
@@ -190,9 +193,9 @@ Messages use OpenAI's chat completion format:
190193
}
191194
```
192195

193-
## Custom Agent Wrappers
196+
## Custom Agent Adapters
194197

195-
If you're implementing a custom wrapper, the framework handles message storage automatically via `get_messages()`. Just ensure your `_run_agent()` method returns a `MessageHistory`:
198+
If you're implementing a custom adapter, the framework handles message storage automatically via `get_messages()`. Just ensure your `_run_agent()` method returns a `MessageHistory`:
196199

197200
```python
198201
from maseval import AgentAdapter, MessageHistory
@@ -211,13 +214,13 @@ class MyAgentAdapter(AgentAdapter):
211214
return history
212215
```
213216

214-
See the [Agent Wrapper guide](../reference/agent.md) for details on implementing custom wrappers.
217+
See the [AgentAdapter guide](../reference/agent.md) for details on implementing custom adapters.
215218

216219
## Tips
217220

218221
**For debugging**: Use `verbose=True` to see traces in real-time.
219222

220-
**For benchmarks**: Clear history between tasks with `wrapper.clear_message_history()`.
223+
**For benchmarks**: Create a new adapter instance for each task to ensure clean conversation history.
221224

222225
**For multi-agent systems**: Use a shared tracer and `get_conversations_by_agent()` to analyze each agent separately.
223226

docs/index.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -24,7 +24,7 @@ More details in the [Quickstart](getting-started/quickstart.md)
2424

2525
- **Task-Specific Configurations:** Each benchmark task is a self-contained evaluation unit with its own instructions, environment state, success criteria, and custom evaluation logic. One task might measure success by environment state changes, another by programmatic output validation.
2626

27-
- **Framework Agnostic by Design:** MASEval is intentionally unopinionated about agent frameworks, model providers, and system architectures. Simple, standardized interfaces and wrappers enable any agent system to be evaluated without modification to the core library.
27+
- **Framework Agnostic by Design:** MASEval is intentionally unopinionated about agent frameworks, model providers, and system architectures. Simple, standardized interfaces and adapters enable any agent system to be evaluated without modification to the core library.
2828

2929
- **Lifecycle Hooks via Callbacks:** Inject custom logic at any point in the evaluation lifecycle (e.g., `on_run_start`, `on_task_start`, `on_agent_step_end`) through a callback system. This enables extensibility without modifying core evaluation logic.
3030

maseval/core/agent.py

Lines changed: 7 additions & 36 deletions
Original file line numberDiff line numberDiff line change
@@ -1,16 +1,16 @@
11
from abc import ABC, abstractmethod
2-
from typing import List, Any, Optional, Union, Dict
2+
from typing import List, Any, Optional, Dict
33

44
from .callback import AgentCallback
5-
from .history import MessageHistory, RoleType
5+
from .history import MessageHistory
66
from .tracing import TraceableMixin
77
from .config import ConfigurableMixin
88

99

1010
class AgentAdapter(ABC, TraceableMixin, ConfigurableMixin):
1111
"""Wraps an agent from any framework to provide a standard interface.
1212
13-
This wrapper provides:
13+
This Adapter provides:
1414
- Unified execution interface via `run()`
1515
- Callback hooks for monitoring
1616
- Message history management via getter/setter
@@ -101,35 +101,6 @@ def get_messages(self) -> MessageHistory:
101101
"""
102102
return self.messages if self.messages is not None else MessageHistory()
103103

104-
def set_message_history(self, history: MessageHistory) -> None:
105-
"""Set the message history.
106-
107-
This is typically called by _run_agent() implementations after executing
108-
the agent, but can also be used to inject or modify history.
109-
110-
Args:
111-
history: The MessageHistory to set
112-
"""
113-
self.messages = history
114-
115-
def clear_message_history(self) -> None:
116-
"""Clear the message history."""
117-
self.messages = None
118-
119-
def append_to_message_history(self, role: Union[RoleType, str], content: Union[str, List[Any]], **kwargs) -> None:
120-
"""Append a message to the history.
121-
122-
If no history exists, creates a new one.
123-
124-
Args:
125-
role: The message role ("user", "assistant", "system", "tool")
126-
content: The message content (string or list of content parts)
127-
**kwargs: Additional fields (name, metadata, timestamp, etc.)
128-
"""
129-
if self.messages is None:
130-
self.messages = MessageHistory()
131-
self.messages.add_message(role, content, **kwargs) # type: ignore
132-
133104
def gather_traces(self) -> dict[str, Any]:
134105
"""Gather execution traces from this agent.
135106
@@ -148,7 +119,7 @@ def gather_traces(self) -> dict[str, Any]:
148119
149120
How to use:
150121
This method is automatically called by Benchmark during trace collection.
151-
Framework-specific wrappers can extend this to include additional data:
122+
Framework-specific adapters can extend this to include additional data:
152123
153124
```python
154125
def gather_traces(self) -> dict[str, Any]:
@@ -181,12 +152,12 @@ def gather_config(self) -> dict[str, Any]:
181152
- gathered_at: ISO timestamp
182153
- name: Agent name
183154
- agent_type: Underlying agent framework class name
184-
- wrapper_type: The specific wrapper class (e.g., SmolAgentAdapter)
155+
- adapter_type: The specific adapter class (e.g., SmolAgentAdapter)
185156
- callbacks: List of callback class names attached to this agent
186157
187158
How to use:
188159
This method is automatically called by Benchmark during config collection.
189-
Framework-specific wrappers can extend this to include additional data:
160+
Framework-specific adapters can extend this to include additional data:
190161
191162
```python
192163
def gather_config(self) -> dict[str, Any]:
@@ -200,7 +171,7 @@ def gather_config(self) -> dict[str, Any]:
200171
**super().gather_config(),
201172
"name": self.name,
202173
"agent_type": type(self.agent).__name__,
203-
"wrapper_type": type(self).__name__,
174+
"adapter_type": type(self).__name__,
204175
"callbacks": [type(cb).__name__ for cb in self.callbacks],
205176
}
206177

maseval/core/benchmark.py

Lines changed: 7 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -64,8 +64,8 @@ def setup_environment(self, agent_data, task):
6464
6565
def setup_agents(self, agent_data, environment, task, user):
6666
agent = MyAgent(model=agent_data["model"])
67-
wrapper = AgentAdapter(agent, "agent")
68-
return [wrapper], {"agent": wrapper}
67+
agent_adapter = AgentAdapter(agent, "agent")
68+
return [agent_adapter], {"agent": agent_adapter}
6969
7070
def run_agents(self, agents, task, environment):
7171
return agents[0].run(task.query)
@@ -258,10 +258,10 @@ def setup_agents(self, agent_data, environment, task, user):
258258
259259
# Create agent (auto-registered when returned)
260260
agent = MyAgent(model=model)
261-
wrapper = AgentAdapter(agent, "agent1")
261+
agent_adapter = AgentAdapter(agent, "agent1")
262262
263263
# Environment and user are also auto-registered
264-
return [wrapper], {"agent1": wrapper}
264+
return [agent_adapter], {"agent1": agent_adapter}
265265
```
266266
267267
Traces and configs are automatically collected before evaluation via
@@ -673,12 +673,12 @@ def setup_agents(self, agent_data, environment, task, user):
673673
model=model,
674674
managed_agents=[w.agent for w in workers.values()]
675675
)
676-
orchestrator_wrapper = AgentAdapter(orchestrator, "orchestrator")
676+
orchestrator_adapter = AgentAdapter(orchestrator, "orchestrator")
677677
678678
# Return orchestrator to run, but all agents for monitoring
679679
# All agents auto-registered for tracing
680-
all_agents = {"orchestrator": orchestrator_wrapper, **workers}
681-
return [orchestrator_wrapper], all_agents
680+
all_agents = {"orchestrator": orchestrator_adapter, **workers}
681+
return [orchestrator_adapter], all_agents
682682
```
683683
"""
684684
pass

0 commit comments

Comments
 (0)