Commit e5ec08d

Improvement of Non-Seeding Workflow (#27)
Simplify seeding API with null-safe pattern

Change seed_generator from Optional[SeedGenerator] to SeedGenerator in all benchmark setup methods. When seeding is disabled (seed=None), derive_seed() returns None instead of the generator being None. This eliminates conditional checks throughout the codebase - the same code works whether seeding is enabled or disabled.

- Update Benchmark base class and all setup method signatures
- Update DefaultSeedGenerator to accept global_seed=None
- Update all benchmarks (GAIA2, MACS, MultiAgentBench, Tau2)
- Update seeding documentation and examples
- Update all tests to use new pattern
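The null-safe pattern described in the commit message can be sketched roughly as below. This is a minimal illustration and not the maseval implementation: `SketchSeedGenerator`, its constructor arguments, and the SHA-256 derivation are all hypothetical stand-ins chosen only to show why callers no longer need an `if seed_generator is not None:` branch.

```python
import hashlib
from typing import Optional


class SketchSeedGenerator:
    """Hypothetical stand-in for a seed generator (not the maseval class)."""

    def __init__(self, global_seed: Optional[int] = None, path: str = ""):
        self.global_seed = global_seed
        self.path = path

    def child(self, name: str) -> "SketchSeedGenerator":
        # Extend the hierarchical path, e.g. "agents" -> "agents/workers".
        new_path = f"{self.path}/{name}" if self.path else name
        return SketchSeedGenerator(self.global_seed, new_path)

    def derive_seed(self, name: str) -> Optional[int]:
        # Seeding disabled: return None instead of forcing callers to branch.
        if self.global_seed is None:
            return None
        full_path = f"{self.path}/{name}" if self.path else name
        # Assumed derivation scheme: hash the global seed together with the path.
        digest = hashlib.sha256(f"{self.global_seed}:{full_path}".encode()).digest()
        return int.from_bytes(digest[:4], "big")


def setup_agents(seed_generator: SketchSeedGenerator) -> Optional[int]:
    # The same code path runs whether seeding is enabled or disabled.
    return seed_generator.child("agents").derive_seed("orchestrator")


seeded = setup_agents(SketchSeedGenerator(global_seed=42))
unseeded = setup_agents(SketchSeedGenerator(global_seed=None))
```

Here `seeded` is a deterministic integer and `unseeded` is `None`; a component that accepts `seed=None` can take either value without further checks.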
1 parent 153a5be commit e5ec08d

36 files changed

Lines changed: 781 additions & 546 deletions

CHANGELOG.md

Lines changed: 4 additions & 0 deletions
@@ -49,6 +49,10 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ### Changed
 
+**Core**
+
+- Simplified seeding API: `seed_generator` parameter in setup methods is now always non-None (`SeedGenerator` instead of `Optional[SeedGenerator]`). When seeding is disabled (`seed=None`), `derive_seed()` returns `None` instead of raising an error. This eliminates all `if seed_generator is not None:` conditional checks - the same code path works whether seeding is enabled or disabled. (PR: #27)
+
 **Benchmarks**
 
 - `MACSBenchmark` and `Tau2Benchmark` benchmarks now actively use the seeding system by deriving seeds for model adapters. Seeds are passed to agents, user simulators, tool simulators, and LLM-based evaluators for reproducible runs. (PR: #26)

docs/benchmark/gaia2.md

Lines changed: 1 addition & 1 deletion
@@ -46,7 +46,7 @@ configure_model_ids(tasks, evaluator_model_id="gpt-4o")
 
 # Create your framework-specific benchmark subclass
 class MyGaia2Benchmark(Gaia2Benchmark):
-    def setup_agents(self, agent_data, environment, task, user, seed_generator=None):
+    def setup_agents(self, agent_data, environment, task, user, seed_generator):
         tools = environment.create_tools()
         # Create your agent with these tools
         ...

docs/guides/seeding.md

Lines changed: 56 additions & 40 deletions
@@ -38,13 +38,34 @@ results = benchmark.run(tasks, agent_data=config)
 
 This creates a `DefaultSeedGenerator` internally and passes it to all setup methods.
 
+### Disabling Seeding
+
+If you don't need seeding, you can simply ignore the seed generators. However, in workflows where you mix seeded and non-seeded runs, you can disable seeding without writing `if/else` statements to check whether a seed is provided.
+
+To disable seeding, omit the `seed` parameter when creating your `Benchmark` or `DefaultSeedGenerator` (or pass `seed=None`):
+
+1. A `DefaultSeedGenerator(global_seed=None)` is still created internally
+2. Setup methods still receive a `seed_generator` parameter
+3. `derive_seed()` returns `None` instead of an integer
+
+```python
+class MyBenchmark(Benchmark):
+    ...
+    def setup_agents(self, agent_data, environment, task, user, seed_generator):
+        # Always works - seed_generator is never None
+        agent = MyAgent(seed=seed_generator("agents/orchestrator"))
+        ...
+
+# No seed = seeding disabled
+benchmark = MyBenchmark(seed=None)
+```
+
 ### Using Seeds in Setup Methods
 
-All setup methods receive an optional `seed_generator` parameter. Use it to derive seeds for your components:
+All setup methods receive a `seed_generator` parameter. Use it to derive seeds for your components. When seeding is disabled (no `seed` passed to benchmark), `derive_seed()` returns `None`:
 
 ```python
 from maseval import Benchmark, SeedGenerator
-from typing import Optional
 
 class MyBenchmark(Benchmark):
     def setup_agents(
@@ -53,18 +74,16 @@ class MyBenchmark(Benchmark):
         environment,
         task,
         user,
-        seed_generator: Optional[SeedGenerator] = None,
+        seed_generator: SeedGenerator,
     ):
         # Derive a seed for your agent using hierarchical paths
-        agent_seed = None
-        if seed_generator is not None:
-            # Use child() to create logical namespaces - results in "agents/orchestrator"
-            agent_gen = seed_generator.child("agents")
-            agent_seed = agent_gen.derive_seed("orchestrator")
-
-        # Pass seed to model adapter
-        model = self.get_model_adapter(model_id, seed=agent_seed)
-        agent = MyAgent(model=model)
+        # Returns None if seeding is disabled (global_seed=None)
+        # Use child() to create logical namespaces - results in "agents/orchestrator"
+        agent_gen = seed_generator.child("agents")
+        agent_seed = agent_gen.derive_seed("orchestrator")
+
+        # Pass seed directly to your agent
+        agent = MyAgent(seed=agent_seed)
         # ... rest of setup
 ```
 
@@ -75,18 +94,17 @@ Seeds are derived from hierarchical paths, so `derive_seed("orchestrator")` with
 When running multiple repetitions of the same task, you may want some components to vary while others remain constant. The `per_repetition` flag controls this:
 
 ```python
-def setup_agents(self, agent_data, environment, task, user, seed_generator=None):
-    if seed_generator is not None:
-        # Use child() to group agent seeds under "agents/" namespace
-        agent_gen = seed_generator.child("agents")
+def setup_agents(self, agent_data, environment, task, user, seed_generator):
+    # Use child() to group agent seeds under "agents/" namespace
+    agent_gen = seed_generator.child("agents")
 
-        # Varies per repetition - different seed for rep 0, 1, 2, ...
-        # Results in path: "agents/experimental"
-        experimental_seed = agent_gen.derive_seed("experimental", per_repetition=True)
+    # Varies per repetition - different seed for rep 0, 1, 2, ...
+    # Results in path: "agents/experimental"
+    experimental_seed = agent_gen.derive_seed("experimental", per_repetition=True)
 
-        # Constant across repetitions - same seed for rep 0, 1, 2, ...
-        # Results in path: "agents/baseline"
-        baseline_seed = agent_gen.derive_seed("baseline", per_repetition=False)
+    # Constant across repetitions - same seed for rep 0, 1, 2, ...
+    # Results in path: "agents/baseline"
+    baseline_seed = agent_gen.derive_seed("baseline", per_repetition=False)
 ```
 
 **Use cases:**
@@ -101,26 +119,24 @@ def setup_agents(self, agent_data, environment, task, user, seed_generator=None)
 For complex systems with many components, use `child()` to create hierarchical namespaces:
 
 ```python
-def setup_environment(self, agent_data, task, seed_generator=None):
-    if seed_generator is not None:
-        # Create a child generator for environment components
-        env_gen = seed_generator.child("environment")
-
-        # Further nest tools under "environment/tools/"
-        tools_gen = env_gen.child("tools")
-        weather_seed = tools_gen.derive_seed("weather")  # "environment/tools/weather"
-        search_seed = tools_gen.derive_seed("search")  # "environment/tools/search"
-
-def setup_agents(self, agent_data, environment, task, user, seed_generator=None):
-    if seed_generator is not None:
-        # Create a child generator for agents
-        agent_gen = seed_generator.child("agents")
+def setup_environment(self, agent_data, task, seed_generator):
+    # Create a child generator for environment components
+    env_gen = seed_generator.child("environment")
+
+    # Further nest tools under "environment/tools/"
+    tools_gen = env_gen.child("tools")
+    weather_seed = tools_gen.derive_seed("weather")  # "environment/tools/weather"
+    search_seed = tools_gen.derive_seed("search")  # "environment/tools/search"
+
+def setup_agents(self, agent_data, environment, task, user, seed_generator):
+    # Create a child generator for agents
+    agent_gen = seed_generator.child("agents")
 
-        orchestrator_seed = agent_gen.derive_seed("orchestrator")  # "agents/orchestrator"
+    orchestrator_seed = agent_gen.derive_seed("orchestrator")  # "agents/orchestrator"
 
-        # Nest workers under "agents/workers/"
-        worker_gen = agent_gen.child("workers")
-        analyst_seed = worker_gen.derive_seed("analyst")  # "agents/workers/analyst"
+    # Nest workers under "agents/workers/"
+    worker_gen = agent_gen.child("workers")
+    analyst_seed = worker_gen.derive_seed("analyst")  # "agents/workers/analyst"
 ```
 
 Child generators share the same seed log, so all derived seeds are recorded together.
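
One way to picture the shared seed log is the sketch below. It is an assumption-laden illustration rather than the maseval code: `LoggingSeedGenerator`, its `log` attribute, the hash-based derivation, and the choice to record a path even when its seed is `None` are all hypothetical.

```python
import hashlib
from typing import Dict, Optional


class LoggingSeedGenerator:
    """Hypothetical sketch of a generator whose children share one seed log."""

    def __init__(
        self,
        global_seed: Optional[int],
        path: str = "",
        log: Optional[Dict[str, Optional[int]]] = None,
    ):
        self.global_seed = global_seed
        self.path = path
        # Children receive the parent's dict object, so every derivation
        # is recorded in the same log.
        self.log = log if log is not None else {}

    def _full_path(self, name: str) -> str:
        return f"{self.path}/{name}" if self.path else name

    def child(self, name: str) -> "LoggingSeedGenerator":
        return LoggingSeedGenerator(self.global_seed, self._full_path(name), self.log)

    def derive_seed(self, name: str) -> Optional[int]:
        full_path = self._full_path(name)
        if self.global_seed is None:
            seed = None  # seeding disabled
        else:
            digest = hashlib.sha256(f"{self.global_seed}:{full_path}".encode()).digest()
            seed = int.from_bytes(digest[:4], "big")
        self.log[full_path] = seed  # assumed logging behavior
        return seed


root = LoggingSeedGenerator(global_seed=7)
root.child("environment").child("tools").derive_seed("weather")
root.child("agents").derive_seed("orchestrator")
recorded_paths = sorted(root.log)
```

After these calls, `recorded_paths` holds both `"agents/orchestrator"` and `"environment/tools/weather"`, even though the seeds were derived through different child generators.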

examples/five_a_day_benchmark/five_a_day_benchmark.ipynb

Lines changed: 1 addition & 1 deletion
@@ -523,7 +523,7 @@
    "id": "70c66cd0",
    "metadata": {},
    "outputs": [],
-   "source": "class FiveADayBenchmark(Benchmark):\n \"\"\"5-A-Day benchmark with multi-agent support.\"\"\"\n\n def setup_environment(self, agent_data: Dict[str, Any], task: Task, seed_generator: Optional[SeedGenerator] = None) -> Environment:\n \"\"\"Create environment from task data.\"\"\"\n task_data = {\n \"environment_data\": task.environment_data,\n \"query\": task.query,\n \"evaluation_data\": task.evaluation_data,\n \"metadata\": task.metadata,\n }\n\n environment = FiveADayEnvironment(task_data)\n\n # Register all tools for tracing\n for tool_name, tool_adapter in environment.get_tools().items():\n self.register(\"tools\", tool_name, tool_adapter)\n\n return environment\n\n def setup_agents(\n self,\n agent_data: Dict[str, Any],\n environment: Environment,\n task: Task,\n user=None,\n seed_generator: Optional[SeedGenerator] = None,\n ) -> tuple[list[SmolAgentAdapter], Dict[str, SmolAgentAdapter]]:\n \"\"\"Create multi-agent system with orchestrator and specialists.\n\n If seed_generator is provided, seeds are derived for each agent\n using the benchmark's seeding system with hierarchical paths.\n \"\"\"\n # Build seeds dict if seed_generator is available\n # Use child(\"agents\") to create logical paths like \"agents/primary_agent\"\n seeds = None\n if seed_generator is not None:\n agent_gen = seed_generator.child(\"agents\")\n seeds = {}\n for agent_spec in agent_data[\"agents\"]:\n seeds[agent_spec[\"agent_id\"]] = agent_gen.derive_seed(agent_spec[\"agent_id\"])\n\n agents_to_run, agents_to_monitor = build_agents(agent_data, environment, seeds)\n\n # Create adapters for the primary agent(s) to run\n adapters_to_run = [SmolAgentAdapter(agent, agent.name) for agent in agents_to_run]\n\n # This ensures all agent traces are collected by the benchmark\n all_agents = {agent.name: agent for agent in agents_to_run} | agents_to_monitor\n adapters_to_monitor = {name: SmolAgentAdapter(agent, name) for name, agent in all_agents.items()}\n return adapters_to_run, adapters_to_monitor\n\n def setup_evaluators(self, environment, task, agents, user, seed_generator: Optional[SeedGenerator] = None) -> Sequence[Evaluator]:\n \"\"\"Create evaluators based on task's evaluation criteria.\"\"\"\n if not task.evaluation_data[\"evaluators\"]:\n return []\n\n evaluator_instances = []\n for name in task.evaluation_data[\"evaluators\"]:\n evaluator_class = getattr(evaluators, name)\n evaluator_instances.append(evaluator_class(task, environment, user))\n\n return evaluator_instances\n\n def run_agents(self, agents: Sequence[AgentAdapter], task: Task, environment: Environment, query: str) -> Sequence[Any]:\n \"\"\"Execute agents and return their final answers.\"\"\"\n answers = [agent.run(query) for agent in agents]\n return answers\n\n def get_model_adapter(self, model_id: str, **kwargs) -> ModelAdapter:\n \"\"\"Return a model adapter for benchmark components that need LLM access.\n\n This benchmark doesn't use simulated tools, user simulators, or LLM judges,\n so this method is not called during execution.\n \"\"\"\n raise NotImplementedError(\"This benchmark doesn't use model adapters for tools/users/evaluators.\")\n\n def evaluate(\n self,\n evaluators: Sequence[Evaluator],\n agents: Dict[str, AgentAdapter],\n final_answer: Any,\n traces: Dict[str, Any],\n ) -> list[Dict[str, Any]]:\n \"\"\"Evaluate agent performance.\"\"\"\n results = []\n for evaluator in evaluators:\n filtered_traces = evaluator.filter_traces(traces)\n results.append(evaluator(filtered_traces, final_answer))\n return results"
+   "source": "class FiveADayBenchmark(Benchmark):\n \"\"\"5-A-Day benchmark with multi-agent support.\"\"\"\n\n def setup_environment(self, agent_data: Dict[str, Any], task: Task, seed_generator: SeedGenerator) -> Environment:\n \"\"\"Create environment from task data.\"\"\"\n task_data = {\n \"environment_data\": task.environment_data,\n \"query\": task.query,\n \"evaluation_data\": task.evaluation_data,\n \"metadata\": task.metadata,\n }\n\n environment = FiveADayEnvironment(task_data)\n\n # Register all tools for tracing\n for tool_name, tool_adapter in environment.get_tools().items():\n self.register(\"tools\", tool_name, tool_adapter)\n\n return environment\n\n def setup_agents(\n self,\n agent_data: Dict[str, Any],\n environment: Environment,\n task: Task,\n user,\n seed_generator: SeedGenerator,\n ) -> tuple[list[SmolAgentAdapter], Dict[str, SmolAgentAdapter]]:\n \"\"\"Create multi-agent system with orchestrator and specialists.\n\n Seeds are derived for each agent using the benchmark's seeding system\n with hierarchical paths. derive_seed() returns None if seeding is disabled.\n \"\"\"\n # Build seeds dict using seed_generator\n # Use child(\"agents\") to create logical paths like \"agents/primary_agent\"\n agent_gen = seed_generator.child(\"agents\")\n seeds = {}\n for agent_spec in agent_data[\"agents\"]:\n seeds[agent_spec[\"agent_id\"]] = agent_gen.derive_seed(agent_spec[\"agent_id\"])\n\n agents_to_run, agents_to_monitor = build_agents(agent_data, environment, seeds)\n\n # Create adapters for the primary agent(s) to run\n adapters_to_run = [SmolAgentAdapter(agent, agent.name) for agent in agents_to_run]\n\n # This ensures all agent traces are collected by the benchmark\n all_agents = {agent.name: agent for agent in agents_to_run} | agents_to_monitor\n adapters_to_monitor = {name: SmolAgentAdapter(agent, name) for name, agent in all_agents.items()}\n return adapters_to_run, adapters_to_monitor\n\n def setup_evaluators(self, environment, task, agents, user, seed_generator: SeedGenerator) -> Sequence[Evaluator]:\n \"\"\"Create evaluators based on task's evaluation criteria.\"\"\"\n if not task.evaluation_data[\"evaluators\"]:\n return []\n\n evaluator_instances = []\n for name in task.evaluation_data[\"evaluators\"]:\n evaluator_class = getattr(evaluators, name)\n evaluator_instances.append(evaluator_class(task, environment, user))\n\n return evaluator_instances\n\n def run_agents(self, agents: Sequence[AgentAdapter], task: Task, environment: Environment, query: str) -> Sequence[Any]:\n \"\"\"Execute agents and return their final answers.\"\"\"\n answers = [agent.run(query) for agent in agents]\n return answers\n\n def get_model_adapter(self, model_id: str, **kwargs) -> ModelAdapter:\n \"\"\"Return a model adapter for benchmark components that need LLM access.\n\n This benchmark doesn't use simulated tools, user simulators, or LLM judges,\n so this method is not called during execution.\n \"\"\"\n raise NotImplementedError(\"This benchmark doesn't use model adapters for tools/users/evaluators.\")\n\n def evaluate(\n self,\n evaluators: Sequence[Evaluator],\n agents: Dict[str, AgentAdapter],\n final_answer: Any,\n traces: Dict[str, Any],\n ) -> list[Dict[str, Any]]:\n \"\"\"Evaluate agent performance.\"\"\"\n results = []\n for evaluator in evaluators:\n filtered_traces = evaluator.filter_traces(traces)\n results.append(evaluator(filtered_traces, final_answer))\n return results"
   },
   {
    "cell_type": "markdown",

examples/five_a_day_benchmark/five_a_day_benchmark.py

Lines changed: 10 additions & 11 deletions
@@ -729,7 +729,7 @@ class FiveADayBenchmark(Benchmark):
     Supports single-agent and multi-agent (orchestrator+specialist) configurations.
     """
 
-    def setup_environment(self, agent_data: Dict[str, Any], task: Task, seed_generator: Optional[SeedGenerator] = None) -> Environment:
+    def setup_environment(self, agent_data: Dict[str, Any], task: Task, seed_generator: SeedGenerator) -> Environment:
         """Create environment from task data."""
         # Pass full task data to environment
         task_data = {
@@ -753,8 +753,8 @@ def setup_agents(
         agent_data: Dict[str, Any],
         environment: Environment,
         task: Task,
-        user=None,
-        seed_generator: Optional[SeedGenerator] = None,
+        user,
+        seed_generator: SeedGenerator,
     ) -> tuple[List[AgentAdapter], Dict[str, AgentAdapter]]:
         """Create framework-specific agent with tools from environment.
 
@@ -775,14 +775,13 @@ def setup_agents(
         primary_spec = next(a for a in agents_specs if a["agent_id"] == primary_agent_id)
         specialist_specs = [a for a in agents_specs if a["agent_id"] != primary_agent_id]
 
-        # Derive seeds for agents using seed_generator if available
+        # Derive seeds for agents using seed_generator
         # Use child("agents") to create logical paths like "agents/primary_agent"
-        seeds = None
-        if seed_generator is not None:
-            agent_gen = seed_generator.child("agents")
-            seeds = {primary_spec["agent_id"]: agent_gen.derive_seed(primary_spec["agent_id"])}
-            for spec in specialist_specs:
-                seeds[spec["agent_id"]] = agent_gen.derive_seed(spec["agent_id"])
+        # derive_seed() returns None if seeding is disabled
+        agent_gen = seed_generator.child("agents")
+        seeds = {primary_spec["agent_id"]: agent_gen.derive_seed(primary_spec["agent_id"])}
+        for spec in specialist_specs:
+            seeds[spec["agent_id"]] = agent_gen.derive_seed(spec["agent_id"])
 
         # Build agent using unified interface - now returns (primary_adapter, all_adapters_dict)
         builder = get_agent_builder(framework, agent_type)
@@ -791,7 +790,7 @@ def setup_agents(
         # Return primary adapter to run, and all adapters for trace registration
         return [primary_adapter], all_adapters_dict
 
-    def setup_evaluators(self, environment, task, agents, user, seed_generator: Optional[SeedGenerator] = None) -> Sequence[Evaluator]:
+    def setup_evaluators(self, environment, task, agents, user, seed_generator: SeedGenerator) -> Sequence[Evaluator]:
         """Create evaluators based on task's evaluation_data.evaluators list."""
         if not task.evaluation_data["evaluators"]:
             return []

examples/macs_benchmark/macs_benchmark.py

Lines changed: 4 additions & 4 deletions
@@ -179,7 +179,7 @@ def setup_user(
         agent_data: Dict[str, Any],
         environment: Environment,
         task: Task,
-        seed_generator: Optional[SeedGenerator] = None,
+        seed_generator: SeedGenerator,
     ) -> SmolagentsMACSUser:
         """Create smolagents-compatible user simulator.
 
@@ -210,7 +210,7 @@ def setup_agents(
         environment: MACSEnvironment,  # type: ignore[override]
         task: Task,
         user: Optional[User],
-        seed_generator: Optional[SeedGenerator] = None,
+        seed_generator: SeedGenerator,
     ) -> Tuple[List[AgentAdapter], Dict[str, AgentAdapter]]:
         """Create smolagents multi-agent hierarchy.
 
@@ -435,7 +435,7 @@ def setup_user(
         agent_data: Dict[str, Any],
         environment: Environment,
         task: Task,
-        seed_generator: Optional[SeedGenerator] = None,
+        seed_generator: SeedGenerator,
     ) -> LangGraphMACSUser:
         """Create langgraph-compatible user simulator.
 
@@ -466,7 +466,7 @@ def setup_agents(
         environment: MACSEnvironment,  # type: ignore[override]
         task: Task,
         user: Optional[User],
-        seed_generator: Optional[SeedGenerator] = None,
+        seed_generator: SeedGenerator,
     ) -> Tuple[List[AgentAdapter], Dict[str, AgentAdapter]]:
         """Create langgraph multi-agent hierarchy.
 
