fix(groupchat): route WorkflowEvent payloads + enforce framework max_rounds for agent-framework 1.3.0

Prachig-Microsoft · Copilot · Prachig-Microsoft · commit 42902ab23491 · 2026-06-13T14:33:34.000+05:30
In agent-framework 1.3.0, `workflow.run(stream=True)` only yields
`WorkflowEvent` instances. `AgentResponseUpdate` is wrapped inside
`event.data` for `type=="output"` events. The two types are unrelated
(verified by MRO), so the previous `isinstance(event, AgentResponseUpdate)`
gate from the b260107 era was permanently dead in 1.3.0. As a result every
orchestrator-side safety guard inside that branch silently no-opped:

* per-agent loop detection
* Coordinator finish=true detection
* max_rounds enforcement
* streaming callback dispatch
* manager-instruction extraction

That is why production runs hit the framework's own 100-iteration runner
cap as `RuntimeError("Runner did not converge after 100 iterations")`
even after the recent identity-resolution patch (which only touched code
that never executed).
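
A minimal sketch of why the old gate could never match, using stand-in
classes rather than the real agent-framework types (the class bodies and
the `old_gate` / `new_gate` helper names are assumptions made for
illustration only):

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass
class AgentResponseUpdate:
    """Stand-in for the payload type; NOT the real agent-framework class."""

    text: str = ""


@dataclass
class WorkflowEvent:
    """Stand-in for the wrapper type; shares no base class with the payload."""

    type: str
    data: Any = None
    executor_id: str | None = None


def old_gate(event: object) -> bool:
    # Pre-fix check: tests the wrapper itself against the payload type.
    return isinstance(event, AgentResponseUpdate)


def new_gate(event: object) -> bool:
    # Post-fix check: match the wrapper first, then inspect what it carries.
    return (
        isinstance(event, WorkflowEvent)
        and event.type == "output"
        and isinstance(event.data, AgentResponseUpdate)
    )


# The stream only ever yields wrappers, so the old gate sees a WorkflowEvent.
event = WorkflowEvent(
    type="output",
    data=AgentResponseUpdate("hi"),
    executor_id="Coordinator",
)
gate_results = (old_gate(event), new_gate(event))
```

With these stand-ins the old gate rejects every streamed event while the
new gate accepts the wrapped update, which is the failure mode described
above.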

Three coordinated fixes:

1. Replace the dead `isinstance(event, AgentResponseUpdate)` gate with
   `isinstance(event, WorkflowEvent) and event.type == "output"` and
   inspect `event.data` / `event.executor_id` to distinguish
   per-participant streaming chunks (`executor_id` matches one of
   `self.agents` and `data` is an `AgentResponseUpdate`) from the
   framework orchestrator's final output (`list[Message]` or a custom
   result object).

2. Add an `executor_id` parameter to `_handle_agent_update` so identity
   resolves first from the WorkflowEvent wrapper's `executor_id` (always
   populated from `AgentExecutor.id`, i.e. the agent's name), then falls
   back to `event.author_name`, then to the legacy `event.agent_id`.
   This matches the approach already used by Content Processing Solution.

3. Pass `max_rounds=self.max_rounds` and `intermediate_outputs=True`
   to `GroupChatBuilder`:
   - `max_rounds` gives the framework itself a clean termination
     ceiling so even if our orchestrator-side guards miss, the workflow
     halts cleanly instead of crashing at the runner's 100-iteration cap.
   - `intermediate_outputs=True` is required for each participant's
     `yield_output(AgentResponseUpdate)` call to surface as a workflow
     `output` event. Without this, only the orchestrator's final yield
     reaches our streaming loop and the per-agent guards above never run.
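
The routing in fix 1 and the identity precedence in fix 2 can be sketched
roughly as follows. This is an illustration under stated assumptions, not
the orchestrator's code: the two dataclasses are stand-ins for the
framework types, and `normalize_executor_id`, `resolve_agent_name`, and
`classify` are hypothetical free functions. In particular, the body of
the real `_normalize_executor_id` is not part of this commit; the version
here only assumes it strips a `groupchat_agent:` prefix.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass
class AgentResponseUpdate:
    """Stand-in payload; NOT the real agent-framework class."""

    text: str = ""
    author_name: str | None = None
    agent_id: str | None = None


@dataclass
class WorkflowEvent:
    """Stand-in wrapper; NOT the real agent-framework class."""

    type: str
    data: Any = None
    executor_id: str | None = None


def normalize_executor_id(executor_id: str) -> str:
    # Assumption: normalization only drops the "groupchat_agent:" prefix.
    return executor_id.removeprefix("groupchat_agent:")


def resolve_agent_name(
    update: AgentResponseUpdate, executor_id: str | None = None
) -> str:
    # Tier 1: the wrapper's executor_id, when the caller supplies it.
    if executor_id:
        return normalize_executor_id(executor_id)
    # Tier 2: author_name; tier 3: legacy agent_id (normalized).
    author_name = getattr(update, "author_name", None)
    return author_name or normalize_executor_id(
        getattr(update, "agent_id", None) or ""
    )


def classify(event: object, agents: set[str]) -> str:
    # Anything that is not a workflow "output" event is ignored.
    if not isinstance(event, WorkflowEvent) or event.type != "output":
        return "skip"
    src_executor = normalize_executor_id(event.executor_id or "")
    # Per-participant chunk: known agent AND an update payload.
    if isinstance(event.data, AgentResponseUpdate) and src_executor in agents:
        return "agent_update"
    # Everything else is treated as the orchestrator's final output.
    return "final_output"
```

For example, under these assumptions an `output` event from
`groupchat_agent:Coordinator` carrying an update is classified as
`agent_update` when `Coordinator` is a known agent, while an `output`
event carrying a plain list is classified as `final_output`.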

Tests:
* Existing termination/loop-detection tests still pass (handler now has
  3-tier identity resolution with backward-compat for `author_name`).
* Added `test_handle_agent_update_prefers_executor_id_over_author_name`
  to lock in the new precedence.
* Added `test_handle_agent_update_strips_executor_id_prefix` to cover
  the `groupchat_agent:Coordinator` framework prefix.
* Full suite: 833 passed (was 831; +2 new tests).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
diff --git a/src/processor/src/libs/agent_framework/groupchat_orchestrator.py b/src/processor/src/libs/agent_framework/groupchat_orchestrator.py
@@ -543,9 +543,32 @@ async def run_stream(
                             termination_type="hard_timeout",
                         )
 
-                if isinstance(event, AgentResponseUpdate):
+                # In agent-framework 1.3.0, ``workflow.run(stream=True)`` yields
+                # only ``WorkflowEvent`` instances; ``AgentResponseUpdate`` is
+                # wrapped inside ``WorkflowEvent.data`` for ``type=="output"``
+                # events. The previous ``isinstance(event, AgentResponseUpdate)``
+                # check from the b260107 era is permanently dead in 1.3.0
+                # because the two types are unrelated. We now dispatch on
+                # ``WorkflowEvent.type`` and inspect ``event.data`` /
+                # ``event.executor_id`` to route per-participant streaming
+                # chunks vs the orchestrator's final output.
+                if not isinstance(event, WorkflowEvent) or event.type != "output":
+                    continue
+
+                data = event.data
+                src_executor = self._normalize_executor_id(event.executor_id or "")
+
+                # Per-participant streaming chunk. Requires
+                # ``intermediate_outputs=True`` on the GroupChatBuilder so the
+                # underlying executors' ``yield_output(AgentResponseUpdate)``
+                # calls surface as workflow events rather than being swallowed.
+                if (
+                    isinstance(data, AgentResponseUpdate)
+                    and src_executor in self.agents
+                ):
                     await self._handle_agent_update(
-                        event,
+                        data,
+                        executor_id=event.executor_id,
                         stream_callback=on_agent_response_stream,
                         callback=on_agent_response,
                     )
@@ -565,22 +588,23 @@ async def run_stream(
                     # If the Coordinator requested finish=true, stop immediately.
                     if self._termination_requested:
                         break
-                elif event.type == "output":
-                    event: WorkflowEvent
-                    # Complete last agent's response before finishing
-                    if self._last_executor_id and self._current_agent_response:
-                        await self._complete_agent_response(
-                            self._last_executor_id, on_agent_response
-                        )
 
-                    # Extract final conversation from output
-                    if isinstance(event.data, list):
-                        conversation = event.data
-                        self._conversation = conversation  # Update instance variable
-                    else:
-                        # Handle custom result objects with conversation attribute
-                        conversation = getattr(event.data, "conversation", [])
-                        self._conversation = conversation  # Update instance variable
+                    continue
+
+                # Final orchestrator output: complete any buffered agent
+                # response and capture the conversation.
+                if self._last_executor_id and self._current_agent_response:
+                    await self._complete_agent_response(
+                        self._last_executor_id, on_agent_response
+                    )
+
+                if isinstance(data, list):
+                    conversation = data
+                    self._conversation = conversation  # Update instance variable
+                else:
+                    # Handle custom result objects with conversation attribute
+                    conversation = getattr(data, "conversation", [])
+                    self._conversation = conversation  # Update instance variable
 
             # Backfill tool usage from the final conversation (more reliable than streaming updates)
             # AgentResponseUpdate may stream text only; tool calls are represented as FunctionCallContent
@@ -715,6 +739,7 @@ async def run_stream(
     async def _handle_agent_update(
         self,
         event: AgentResponseUpdate,
+        executor_id: str | None = None,
         stream_callback: AgentResponseStreamCallback | None = None,
         callback: AgentResponseCallback | None = None,
     ) -> None:
@@ -726,19 +751,21 @@ async def _handle_agent_update(
         2. On agent switch, complete previous agent's response
         3. Trigger callback with complete response
         4. Handle tool calls separately from text streaming
+
+        Agent identity resolution priority:
+          1. ``executor_id`` from the wrapping ``WorkflowEvent`` (always
+             populated by the workflow runner from ``AgentExecutor.id`` which
+             is the agent's name). This is the primary source in 1.3.0.
+          2. ``event.author_name`` (set by 1.3.0's ``map_chat_to_agent_update``).
+          3. ``event.agent_id`` (legacy; not populated in 1.3.0).
         """
-        # NOTE: In agent-framework 1.3.0, ``AgentResponseUpdate.agent_id`` is no
-        # longer populated by ``map_chat_to_agent_update`` (only ``author_name``
-        # is set, from the agent's name). Reading ``event.agent_id`` alone
-        # silently yielded an empty string, which made every downstream identity
-        # check (loop detection, coordinator termination signal extraction,
-        # manager-instruction parsing) silently no-op. Prefer ``author_name``
-        # and fall back to ``agent_id`` only for older shapes. Use ``getattr``
-        # so older event types without ``author_name`` still work.
-        author_name = getattr(event, "author_name", None)
-        agent_name = author_name or self._normalize_executor_id(
-            getattr(event, "agent_id", None) or ""
-        )
+        if executor_id:
+            agent_name = self._normalize_executor_id(executor_id)
+        else:
+            author_name = getattr(event, "author_name", None)
+            agent_name = author_name or self._normalize_executor_id(
+                getattr(event, "agent_id", None) or ""
+            )
         await self._start_agent_if_needed(agent_name, stream_callback, callback)
         self._append_text_chunk(event)
         await self._process_tool_calls(event, agent_name, stream_callback)
@@ -1237,10 +1264,24 @@ async def _build_groupchat(self) -> Workflow:
             and name != self.get_result_generator_name()
         ]
 
+        # ``max_rounds`` is enforced at the framework level so the workflow
+        # halts cleanly even if our orchestrator-side guards miss an event
+        # shape. Without this, the framework's default behavior is "continue
+        # indefinitely" (see GroupChatBuilder docstring) until the workflow
+        # runner hits its own 100-iteration cap and raises
+        # ``RuntimeError("Runner did not converge after 100 iterations")``.
+        #
+        # ``intermediate_outputs=True`` surfaces each participant's
+        # ``yield_output(AgentResponseUpdate)`` call as a workflow ``output``
+        # event. Without this, only the orchestrator's final yield reaches
+        # our streaming loop, which means per-agent loop detection, finish
+        # signal extraction, and streaming callbacks all silently no-op.
         return (
             GroupChatBuilder(
                 orchestrator_agent=coordinator,
                 participants=participants,
+                max_rounds=self.max_rounds,
+                intermediate_outputs=True,
             )
             .build()
         )
diff --git a/src/processor/src/tests/unit/libs/agent_framework/test_groupchat_orchestrator_termination.py b/src/processor/src/tests/unit/libs/agent_framework/test_groupchat_orchestrator_termination.py
@@ -227,9 +227,69 @@ async def _run():
         await orch._complete_agent_response("Chief Architect", callback=None)
 
         assert orch._forced_termination_requested is True, (
-            "Loop detection failed to fire after 3 identical Coordinator "
-            "selections via _handle_agent_update; agent identity resolution "
-            "is broken."
-        )
-
+            "Loop detection failed to fire after 3 identical Coordinator "
+            "selections via _handle_agent_update; agent identity resolution "
+            "is broken."
+        )
+
+    asyncio.run(_run())
+
+
+def test_handle_agent_update_prefers_executor_id_over_author_name():
+    """In agent-framework 1.3.0, the workflow runner always wraps payloads in
+    a ``WorkflowEvent`` whose ``executor_id`` is the ``AgentExecutor.id``
+    (= the agent's name). This is the most reliable identity source - more
+    reliable than ``author_name`` which may differ if the agent runtime
+    rewrites the chat author. The handler must prefer ``executor_id`` when
+    provided.
+    """
+
+    async def _run():
+        orch = _make_orchestrator()
+
+        # author_name disagrees with the framework executor_id on purpose.
+        event = _AgentResponseUpdateStub(
+            author_name="SomethingElse",
+            agent_id=None,
+        )
+
+        await orch._handle_agent_update(
+            event,
+            executor_id="Coordinator",
+            stream_callback=None,
+            callback=None,
+        )  # type: ignore[arg-type]
+
+        assert orch._last_executor_id == "Coordinator", (
+            "executor_id from the WorkflowEvent wrapper must take precedence "
+            "over event.author_name; otherwise downstream coordinator checks "
+            "may resolve to the wrong agent."
+        )
+
+    asyncio.run(_run())
+
+
+def test_handle_agent_update_strips_executor_id_prefix():
+    """``GroupChatBuilder`` may wrap executor ids with a
+    ``groupchat_agent:Coordinator`` prefix. ``_normalize_executor_id`` must
+    strip it so the agent name compares cleanly against ``coordinator_name``.
+    """
+
+    async def _run():
+        orch = _make_orchestrator()
+
+        event = _AgentResponseUpdateStub(author_name=None, agent_id=None)
+
+        await orch._handle_agent_update(
+            event,
+            executor_id="groupchat_agent:Coordinator",
+            stream_callback=None,
+            callback=None,
+        )  # type: ignore[arg-type]
+
+        assert orch._last_executor_id == "Coordinator", (
+            "_normalize_executor_id must strip the framework prefix so "
+            "agent identity matches the configured coordinator_name."
+        )
+
     asyncio.run(_run())