[feat][evaluation] agent evaluator #421
Merged
Implement async run and debug functionality for agent evaluator type, including:
- Add new async methods in evaluator service interface
- Implement async handlers in evaluator service
- Add request/response types for async operations
- Add agent evaluator version conversion logic
Implement methods to fetch async debug evaluator results across service layers. Added new request/response types and implemented the functionality in evaluator source services while maintaining backward compatibility for unsupported evaluator types.
Add evaluator record creation in AsyncRunEvaluator and AsyncDebugEvaluator methods to track async evaluation runs. This provides better observability and audit trail for async operations.
…outes
Remove the unused async debug result endpoint and its related request/response structs.
Add missing API routes for evaluator record endpoints.
Remove unused async debug result code, including the handler, service interface, entity, and mock implementations, to clean up the codebase.
The method was not implemented and always returned an error, indicating it was not actually needed for the evaluator services. This cleanup removes dead code from the interface and implementations.
Add new error code for agent evaluator run failures to handle configuration issues
…ed to current agent and haven't matched with AG-UI message protocol
…into feat/agent_evaluator Change-Id: Ibc6898290abb64cf78c0c999b42e6b03843446e5
Add getAllEvalSetFields method to fetch omitted content for evaluation set fields. This ensures complete field data is available for agent evaluator type.
- Implement agent evaluator version batch get functionality
- Add agent evaluator input data building and async execution in turn evaluation
- Introduce async run and debug endpoints for agent evaluators
- Support built-in agent evaluators with auth bypass
… formatting
- Remove unnecessary type casting in ConvertAgentConfigDO2DTO
- Reformat function parameters for better readability in asyncCallEvaluator
- Clean up struct field alignment in GetAsyncDebugEvaluatorInvokeResultResponse
- Fix import reference and variable naming in eval_openapi_app.go
Ensure metric emission captures the correct error state by wrapping in anonymous functions. Also add test cases for error scenarios in async evaluator calls.
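The commit does not show the code, but the fix it describes matches a well-known Go rule: arguments to a deferred call are evaluated when the `defer` statement executes, not when the deferred call runs. The sketch below contrasts the two forms. `emitMetric`, `callEager`, and `callWrapped` are hypothetical names for this example and are not taken from the PR.

```go
package main

import (
	"errors"
	"fmt"
)

// emitted records the error state seen by each emission (stand-in for a metrics client).
var emitted []bool

// emitMetric is a hypothetical metric hook; it only records whether the call failed.
func emitMetric(isErr bool) {
	emitted = append(emitted, isErr)
}

// callEager shows the pitfall: the argument err != nil is evaluated at the
// defer statement, while err is still nil.
func callEager(fail bool) (err error) {
	defer emitMetric(err != nil)
	if fail {
		err = errors.New("evaluator call failed")
	}
	return err
}

// callWrapped wraps the emission in an anonymous function, so err is read
// when the deferred function runs, after the named result has been set.
func callWrapped(fail bool) (err error) {
	defer func() {
		emitMetric(err != nil)
	}()
	if fail {
		err = errors.New("evaluator call failed")
	}
	return err
}

func main() {
	_ = callEager(true)
	_ = callWrapped(true)
	fmt.Println(emitted) // prints [false true]
}
```

Both functions fail, yet only the wrapped form reports the failure to the metric hook.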
Update test case to properly set InputFields and EvaluateTargetOutputFields with targetFields instead of empty maps to match expected behavior
- Add tests for evaluator version validation and input/output schemas
- Implement test cases for evaluator agent delegation and setters
- Add round-trip conversion tests for evaluator data structures
- Include test coverage for evaluator HTTP info and run config
- Add tests for evaluator async handlers and report functionality
- Implement test cases for evaluator output data conversion
- Add tests for evaluator content conversion and skill configs
Align struct fields and test case parameters consistently.
Remove trailing whitespace and ensure proper spacing.
Use the new method in expt_run_item_turn_impl to determine async evaluation
Add test cases for agent evaluator functionality including auth checks, creation, updating drafts, and debugging operations. Tests cover success and failure scenarios with config switch validations.
Add test cases to verify different scenarios for updating evaluator record results, including nil output data, nil evaluator result, score handling, and DAO error cases
dsf86 previously approved these changes on Mar 12, 2026
HearyShen commented on Mar 12, 2026
HymanShi approved these changes on Mar 12, 2026
HearyShen commented on Mar 12, 2026
dsf86 approved these changes on Mar 12, 2026
What type of PR is this?
Check the PR title
(Optional) Translate the PR title into Chinese
(Optional) More detailed description for this PR(en: English/zh: Chinese)
en:
zh(optional):
(Optional) Which issue(s) this PR fixes