feat: Add judge evaluation support to agent graphs
Implement spec AIRUNNER 2.1.3 and GRAPH 1.3.1. The agent graph runner
now captures per-node input/output pairs on
AgentGraphRunnerResult.eval_requests without dispatching any judges
itself. ManagedAgentGraph consumes those requests to fire judge
evaluations as a single background asyncio Task surfaced on
ManagedGraphResult.evaluations.
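A minimal sketch of that hand-off, not the actual implementation. The
EvalRequest field names follow this message; the field types, run_judge,
_evaluate_all and start_evaluations are hypothetical names used only for
illustration.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, List


@dataclass
class EvalRequest:
    # Field names come from the commit message; the types are assumptions.
    node_key: str
    input: Any
    output: Any


async def run_judge(request: EvalRequest) -> dict:
    # Hypothetical stand-in for a real judge call.
    await asyncio.sleep(0)
    return {"node_key": request.node_key, "passed": request.output is not None}


async def _evaluate_all(requests: List[EvalRequest]) -> List[dict]:
    # No requests: resolve to an empty list without calling any judge.
    if not requests:
        return []
    return list(await asyncio.gather(*(run_judge(r) for r in requests)))


def start_evaluations(requests: List[EvalRequest]) -> "asyncio.Task[List[dict]]":
    # Always returns a single background Task, even for an empty request list.
    # Must be called while an event loop is running.
    return asyncio.create_task(_evaluate_all(requests))


async def demo() -> tuple:
    empty = await start_evaluations([])
    one = await start_evaluations([EvalRequest("root", "hi", "hello")])
    return empty, one
```

Because the runner only records requests and the caller owns the Task,
the caller can await the evaluations later or ignore them entirely.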
- Add EvalRequest dataclass (node_key, input, output).
- AgentGraphRunnerResult.eval_requests is populated for nodes whose
AIAgentConfig has a judge_configuration with at least one judge.
- ManagedGraphResult.evaluations is now always an asyncio Task; when
no eval_requests exist it resolves immediately to an empty list.
- LangGraph runner emits one EvalRequest per node activation that is
not a functional-tool-loop step. Responses whose only tool calls
are handoff tools still emit one. Per-run isolation: the eval_requests
list is built locally in run() and passed through make_node_fn so
concurrent calls do not share state.
- OpenAI runner extracts eval_requests from result.new_items, pairing
each agent's final message with the prompt that triggered the
activation (user input for the root, source agent's last message
for downstream nodes via HandoffOutputItem).
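The per-run isolation described for the LangGraph runner can be sketched
as below. This is an illustration under assumptions: the names run and
make_node_fn come from this message, but their signatures, the tuple
used in place of EvalRequest, and the upper-casing "agent" are
hypothetical.

```python
import asyncio
from typing import Awaitable, Callable, List, Tuple

# Stand-in for EvalRequest: (node_key, input, output).
Request = Tuple[str, str, str]


def make_node_fn(
    node_key: str, eval_requests: List[Request]
) -> Callable[[str], Awaitable[str]]:
    # The list is captured by the closure, so each node function appends
    # only to the list owned by the run() call that created it.
    async def node_fn(text: str) -> str:
        await asyncio.sleep(0)  # yield so concurrent runs interleave
        output = text.upper()   # hypothetical stand-in for the agent call
        eval_requests.append((node_key, text, output))
        return output

    return node_fn


async def run(user_input: str) -> List[Request]:
    eval_requests: List[Request] = []  # built locally, never shared
    node = make_node_fn("root", eval_requests)
    await node(user_input)
    return eval_requests


async def demo() -> List[List[Request]]:
    # Two concurrent runs each see only their own requests.
    return list(await asyncio.gather(run("a"), run("b")))
```

Passing the list explicitly keeps the state visible at the call site,
which is the property the ContextVar-based accumulator did not have.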
Re-implements PR #142 (merged then reverted) without the in-runner
evaluator dispatch or the ContextVar-based task accumulator.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>