CHANGELOG.md

@@ -7,6 +7,11 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

## [Unreleased]
### Fixed

- Fixed GAIA2 multi-turn notification loop: `wait_for_notification()` no longer terminates the agent prematurely, enabling correct behavior for `time` and `adaptability` scenarios that require the agent to wait for simulation events and resume (PR: #PR_NUMBER_PLACEHOLDER)
- Added `Gaia2Environment.poll_notifications()` convenience method for custom agent implementations to drain the notification queue without needing ARE-internal imports (PR: #PR_NUMBER_PLACEHOLDER)

GAIA2 uses an **event-driven** multi-turn architecture rather than user-driven turns. Unlike Tau2, where a user simulator drives the multi-turn interaction, GAIA2 scenarios contain scheduled events (e.g., "calendar events added at t=240s", "friend replies at t=300s") that the agent must wait for and react to.

The benchmark invokes the agent **once**. The agent handles multi-turn interaction internally via the notification loop:

1. The agent calls `SystemApp__wait_for_notification(timeout=N)` as a normal tool.
2. The ARE environment processes scheduled events, advances simulation time, and queues the resulting notifications, all synchronously during the tool call.
3. The tool returns. The agent's loop continues (it does **not** terminate).
4. Before the next LLM call, the agent calls `environment.poll_notifications()` to retrieve messages that arrived during the wait.
5. The agent injects those messages into its context and continues reasoning.
6. Eventually the agent calls `AgentUserInterface__send_message_to_user`, which is the **only** termination signal.
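The six-step loop above can be sketched as a small, self-contained simulation. This is a minimal illustration under stated assumptions, not the ARE implementation: `FakeEnvironment`, its `schedule` and `call_tool` methods, `run_agent`, and `scripted_policy` are all hypothetical stand-ins. Only the method name `poll_notifications()` and the two tool names come from the text above.

```python
import heapq
from dataclasses import dataclass, field


@dataclass
class FakeEnvironment:
    """Hypothetical stand-in for the ARE environment; not the real API."""

    now: float = 0.0                                # simulated clock, in seconds
    scheduled: list = field(default_factory=list)  # min-heap of (time, message)
    queue: list = field(default_factory=list)      # notifications not yet polled

    def schedule(self, at: float, message: str) -> None:
        heapq.heappush(self.scheduled, (at, message))

    def call_tool(self, name: str, **kwargs) -> str:
        if name == "SystemApp__wait_for_notification":
            # Step 2: inside the tool call, advance simulated time to the next
            # due event (or to the timeout) and queue the resulting messages.
            deadline = self.now + kwargs["timeout"]
            if self.scheduled and self.scheduled[0][0] <= deadline:
                self.now = self.scheduled[0][0]
                while self.scheduled and self.scheduled[0][0] <= self.now:
                    self.queue.append(heapq.heappop(self.scheduled)[1])
            else:
                self.now = deadline  # timed out with nothing new
            return f"waited until t={self.now:g}s"
        if name == "AgentUserInterface__send_message_to_user":
            return "sent"
        raise ValueError(f"unknown tool: {name}")

    def poll_notifications(self) -> list:
        # Step 4: drain everything queued since the last poll.
        drained, self.queue = self.queue, []
        return drained


def run_agent(env, policy, max_steps: int = 20) -> list:
    """Loop until send_message_to_user; wait_for_notification never ends it."""
    context = []
    for _ in range(max_steps):
        # Steps 4-5: inject messages that arrived during the previous wait.
        for message in env.poll_notifications():
            context.append(("notification", message))
        tool, args = policy(context)  # stands in for the LLM call
        result = env.call_tool(tool, **args)  # steps 1-3: an ordinary tool call
        context.append(("tool", tool, result))
        if tool == "AgentUserInterface__send_message_to_user":
            return context  # step 6: the only termination signal
    raise RuntimeError("agent did not send a message within max_steps")


def scripted_policy(context):
    """Hypothetical policy: keep waiting until a notification shows up, then reply."""
    if any(entry[0] == "notification" for entry in context):
        return "AgentUserInterface__send_message_to_user", {"content": "Your friend replied."}
    return "SystemApp__wait_for_notification", {"timeout": 120}


env = FakeEnvironment()
env.schedule(300, "friend replied")  # cf. "friend replies at t=300s"
trace = run_agent(env, scripted_policy)
```

In this run the policy calls `wait_for_notification` three times (returning at t=120s, t=240s, and t=300s) before the reply is polled; none of those calls ends the loop.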
123
+
124
+
### What custom agents must implement
125
+
126
+
The ARE tools handle all environment-side mechanics automatically (event processing, time advancement, notification queuing). No callbacks or hooks required. Custom agents must handle two things:
127
+
128
+
**1. Do not terminate on `wait_for_notification`.** Treat it as a regular tool call. Only terminate on `AgentUserInterface__send_message_to_user`.
129
+
130
+
**2. Poll notifications between steps.** After `wait_for_notification` returns, new messages are in the queue. Call `environment.poll_notifications()` to drain them:
131
+
132
+
```python
# Between agent steps (e.g., before each LLM call):
notifications = environment.poll_notifications()
```

```python
    """Create a model that waits for notification then terminates.

    wait_for_notification is NOT a termination signal; the agent must
    continue its loop. This fixture provides two responses: the wait call
    followed by the real termination call (send_message_to_user).
    """
    return DummyModelAdapter(
        model_id="test-wait-model",
        responses=[
            'Thought: I need to wait for a notification.\n\nAction:\n{"action": "SystemApp__wait_for_notification", "action_input": {"timeout_seconds": 30}}<end_action>',
```
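The fixture scripts two responses because the agent loop must keep running after the wait call. The harness below is a minimal sketch of that behavior, assuming the same text format as the fixture (a JSON object after `Action:` terminated by `<end_action>`). `ScriptedModel`, `parse_action`, and `run` are hypothetical helpers, not the `DummyModelAdapter` API, and the second response string is invented for illustration because the fixture's own second response is not shown here.

```python
import json

TERMINAL_TOOL = "AgentUserInterface__send_message_to_user"


class ScriptedModel:
    """Hypothetical test double: replays canned responses in order."""

    def __init__(self, responses):
        self.responses = list(responses)
        self.calls = 0

    def generate(self, prompt: str) -> str:
        response = self.responses[self.calls]  # IndexError if the script runs out
        self.calls += 1
        return response


def parse_action(text: str) -> dict:
    """Extract the JSON between 'Action:' and '<end_action>'."""
    body = text.split("Action:", 1)[1].split("<end_action>", 1)[0]
    return json.loads(body)


def run(model, max_steps: int = 10) -> list:
    actions = []
    for _ in range(max_steps):
        action = parse_action(model.generate(prompt="(context goes here)"))
        actions.append(action["action"])
        # wait_for_notification is an ordinary tool call; only the terminal tool stops the loop.
        if action["action"] == TERMINAL_TOOL:
            return actions
    raise RuntimeError("no termination signal")


model = ScriptedModel([
    'Thought: I need to wait for a notification.\n\nAction:\n{"action": "SystemApp__wait_for_notification", "action_input": {"timeout_seconds": 30}}<end_action>',
    # Invented second response, for illustration only.
    'Thought: The event arrived.\n\nAction:\n{"action": "AgentUserInterface__send_message_to_user", "action_input": {"content": "Done."}}<end_action>',
])
actions = run(model)
```

A loop that treated the wait call as terminal would return after the first model call and never consume the second response; here both are used and `model.calls` ends at 2.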