
[feat][evaluation] agent evaluator#421

Merged
HearyShen merged 77 commits into main from feat/agent_evaluator
Mar 12, 2026

@HearyShen HearyShen commented Feb 5, 2026

Collaborator

What type of PR is this?

Check the PR title

  • This PR title matches the format: [<type>][<scope>] <description>. For example: [fix][backend] flaky fix
  • The description of this PR title is user-oriented and clear enough for others to understand.
  • Add documentation if the current PR requires user awareness at the usage level.
  • This PR is written in English. PRs not in English will not be reviewed.

(Optional) Translate the PR title into Chinese

(Optional) More detailed description for this PR(en: English/zh: Chinese)

en:
zh(optional):

(Optional) Which issue(s) this PR fixes

codecov Bot commented Feb 5, 2026

Codecov Report

❌ Patch coverage is 84.85607% with 121 lines in your changes missing coverage. Please review.

Files with missing lines                               Patch %  Lines
...odules/evaluation/domain/service/evaluator_impl.go  82.57%   14 Missing and 9 partials ⚠️
...nd/modules/evaluation/application/evaluator_app.go  85.41%   15 Missing and 6 partials ⚠️
...ation/application/convertor/evaluator/evaluator.go  87.70%   13 Missing and 2 partials ⚠️
...api/handler/coze/loop/apis/eval_open_apiservice.go  0.00%    9 Missing ⚠️
...aluation/domain/service/expt_run_item_turn_impl.go  89.88%   5 Missing and 4 partials ⚠️
...kend/modules/evaluation/domain/entity/evaluator.go  83.67%   8 Missing ⚠️
...ation/convertor/evaluator/evaluator_output_data.go  91.02%   4 Missing and 3 partials ⚠️
backend/infra/redis/redis.go                           0.00%    6 Missing ⚠️
...d/modules/evaluation/application/experiment_app.go  85.00%   3 Missing and 3 partials ⚠️
.../evaluation/infra/repo/evaluator/evaluator_impl.go  54.54%   2 Missing and 3 partials ⚠️
... and 4 more

@@            Coverage Diff             @@
##             main     #421      +/-   ##
==========================================
+ Coverage   73.95%   74.30%   +0.34%     
==========================================
  Files         628      629       +1     
  Lines       65195    65939     +744     
==========================================
+ Hits        48216    48993     +777     
+ Misses      13697    13667      -30     
+ Partials     3282     3279       -3     
Flag Coverage Δ
unittests 74.30% <84.85%> (+0.34%) ⬆️
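
The figures in this report can be cross-checked with a little arithmetic. The sketch below is only an illustration, not Codecov's implementation: the helper names `coveragePercent` and `patchLines` are made up for this example, and it assumes that coverage is hits divided by total lines and that Lines = Hits + Misses + Partials. Under those assumptions the head coverage works out to about 74.30%, the base to about 73.96% (the report shows 73.95%, which suggests truncation rather than rounding), and the patch size implied by "121 lines missing at 84.85607%" is about 799 lines.

```go
package main

import "fmt"

// coveragePercent returns hits as a percentage of total lines.
// Assumption: this is how the headline coverage number is derived.
func coveragePercent(hits, lines int) float64 {
	return 100 * float64(hits) / float64(lines)
}

// patchLines recovers the total number of changed lines from the
// number of uncovered lines and the reported patch coverage.
func patchLines(missing int, patchPct float64) float64 {
	return float64(missing) / (1 - patchPct/100)
}

func main() {
	// Numbers copied from the coverage diff above.
	fmt.Printf("base coverage: %.4f%%\n", coveragePercent(48216, 65195))
	fmt.Printf("head coverage: %.4f%%\n", coveragePercent(48993, 65939))
	fmt.Printf("patch lines:   %.1f\n", patchLines(121, 84.85607))

	// The deltas are consistent: +777 hits, -30 misses, -3 partials.
	fmt.Println("line delta:", 777-30-3)
}
```

The line delta printed at the end matches the +744 in the Lines row, so the three categories account for every added line.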

Files with missing lines                               Coverage Δ
...nd/api/handler/coze/loop/apis/evaluator_service.go  62.06% <100.00%> (+9.89%) ⬆️
...modules/evaluation/application/eval_openapi_app.go  92.55% <100.00%> (+0.16%) ⬆️
backend/modules/evaluation/domain/entity/common.go     87.23% <ø> (ø)
...dules/evaluation/domain/entity/evaluator_record.go  100.00% <ø> (ø)
...valuation/domain/entity/evaluator_version_agent.go  100.00% <100.00%> (ø)
...ckend/modules/evaluation/domain/entity/expt_run.go  68.44% <100.00%> (+2.70%) ⬆️
backend/modules/evaluation/domain/entity/param.go      64.28% <ø> (ø)
...tion/infra/repo/evaluator/evaluator_record_impl.go  93.84% <100.00%> (+0.86%) ⬆️
backend/modules/evaluation/domain/entity/expt.go       86.20% <60.00%> (-1.60%) ⬇️
backend/modules/evaluation/pkg/conf/evaluator.go       60.29% <0.00%> (-0.90%) ⬇️
... and 12 more

... and 4 files with indirect coverage changes


Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 6089fb6...fbc9641.


@HearyShen HearyShen changed the title from "[evaluation] agent evaluator" to "[feat][evaluation] agent evaluator" Feb 5, 2026
HearyShen and others added 26 commits February 5, 2026 14:40
Implement async run and debug functionality for agent evaluator type, including:
- Add new async methods in evaluator service interface
- Implement async handlers in evaluator service
- Add request/response types for async operations
- Add agent evaluator version conversion logic
Implement methods to fetch async debug evaluator results across service layers. Added new request/response types and implemented the functionality in evaluator source services while maintaining backward compatibility for unsupported evaluator types.
Add evaluator record creation in AsyncRunEvaluator and AsyncDebugEvaluator methods to track async evaluation runs. This provides better observability and audit trail for async operations.
…outes

Remove unused async debug result endpoint and its related request/response structs
Add missing api routes for evaluator record endpoints
Remove unused async debug result code, including the handler, service interface, entity, and mock implementations, to clean up the codebase
The method was not implemented and always returned an error, indicating it was not actually needed for the evaluator services. This cleanup removes dead code from the interface and implementations.
Add new error code for agent evaluator run failures to handle configuration issues
Change-Id: Iab4a819e956ce2ba7521d381d5adcd671c7f5221
Change-Id: Ie837fdef6255afcd2ea292d372594e2bbc2b190b
Change-Id: Ief52edc654e55c162b117307fae75423144e4ad3
Change-Id: Icf87dbf107860aba8c5f24513835897239c8a890
Change-Id: Ibfbe0d2ebcd63d93a19ee2a90ea9782e087b6e49
…ed to current agent and haven't matched with AG-UI message protocol
…ed to current agent and haven't matched with AG-UI message protocol
Change-Id: I8fea5b657938b46d264349d057b87d733e59f431
…into feat/agent_evaluator

Change-Id: Ibc6898290abb64cf78c0c999b42e6b03843446e5
Change-Id: I9f4673c1280102a9b8cefdf33b8ca592ea0290a4
HearyShen added 21 commits March 9, 2026 11:40
Add getAllEvalSetFields method to fetch omitted content for evaluation set fields. This ensures complete field data is available for agent evaluator type.
- Implement agent evaluator version batch get functionality
- Add agent evaluator input data building and async execution in turn evaluation
- Introduce async run and debug endpoints for agent evaluators
- Support built-in agent evaluators with auth bypass
… formatting

- Remove unnecessary type casting in ConvertAgentConfigDO2DTO
- Reformat function parameters for better readability in asyncCallEvaluator
- Clean up struct field alignment in GetAsyncDebugEvaluatorInvokeResultResponse
- Fix import reference and variable naming in eval_openapi_app.go
Ensure metric emission captures the correct error state by wrapping in anonymous functions. Also add test cases for error scenarios in async evaluator calls.
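
The fix described above matches a common Go pitfall: the arguments of a deferred call are evaluated when the `defer` statement runs, not when the deferred function executes. The sketch below shows the difference under that reading. The names `emitMetric`, `callEager`, `callWrapped`, and `errCall` are hypothetical and are not taken from the PR.

```go
package main

import (
	"errors"
	"fmt"
)

// emitted records what the stand-in metric emitter was told.
var emitted []bool

// emitMetric is a hypothetical emitter; it records whether the call
// was reported as an error.
func emitMetric(isErr bool) { emitted = append(emitted, isErr) }

var errCall = errors.New("evaluator call failed")

// callEager defers the emitter directly. The argument err != nil is
// evaluated at the defer statement, while err is still nil, so the
// metric always reports success.
func callEager() (err error) {
	defer emitMetric(err != nil)
	err = errCall
	return err
}

// callWrapped defers an anonymous function. The closure reads err when
// it runs, after the named result has been set, so the metric sees the
// real outcome.
func callWrapped() (err error) {
	defer func() { emitMetric(err != nil) }()
	err = errCall
	return err
}

func main() {
	_ = callEager()
	_ = callWrapped()
	fmt.Println(emitted)
}
```

Both functions return the same error; only the wrapped version reports it to the emitter. The named result `err` matters here, because the closure needs a variable that still holds the final value when the function returns.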
Update test case to properly set InputFields and EvaluateTargetOutputFields with targetFields instead of empty maps to match expected behavior
- Add tests for evaluator version validation and input/output schemas
- Implement test cases for evaluator agent delegation and setters
- Add round-trip conversion tests for evaluator data structures
- Include test coverage for evaluator HTTP info and run config
- Add tests for evaluator async handlers and report functionality
- Implement test cases for evaluator output data conversion
- Add tests for evaluator content conversion and skill configs
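
A round-trip conversion test of the kind listed above converts a domain object to its transport form and back, then checks that nothing was lost. The sketch below illustrates the idea. `ConvertAgentConfigDO2DTO` is named in an earlier commit message, but its signature, the inverse `ConvertAgentConfigDTO2DO`, and the `AgentConfigDO` and `AgentConfigDTO` types with their fields are assumptions made for this example.

```go
package main

import (
	"fmt"
	"reflect"
)

// AgentConfigDO is a hypothetical domain object.
type AgentConfigDO struct {
	AgentName string
	Skills    []string
}

// AgentConfigDTO is a hypothetical transport object with an optional name.
type AgentConfigDTO struct {
	AgentName *string
	Skills    []string
}

// ConvertAgentConfigDO2DTO copies a domain object into its DTO form.
func ConvertAgentConfigDO2DTO(do *AgentConfigDO) *AgentConfigDTO {
	if do == nil {
		return nil
	}
	name := do.AgentName
	return &AgentConfigDTO{
		AgentName: &name,
		Skills:    append([]string(nil), do.Skills...),
	}
}

// ConvertAgentConfigDTO2DO is the assumed inverse conversion.
func ConvertAgentConfigDTO2DO(dto *AgentConfigDTO) *AgentConfigDO {
	if dto == nil {
		return nil
	}
	do := &AgentConfigDO{Skills: append([]string(nil), dto.Skills...)}
	if dto.AgentName != nil {
		do.AgentName = *dto.AgentName
	}
	return do
}

// roundTrips reports whether DO -> DTO -> DO preserves the value.
func roundTrips(do *AgentConfigDO) bool {
	back := ConvertAgentConfigDTO2DO(ConvertAgentConfigDO2DTO(do))
	return reflect.DeepEqual(do, back)
}

func main() {
	cfg := &AgentConfigDO{AgentName: "judge", Skills: []string{"search"}}
	fmt.Println(roundTrips(cfg))
	fmt.Println(roundTrips(nil))
}
```

One edge case worth covering in such tests: with this copying style, an empty but non-nil `Skills` slice comes back as nil, which `reflect.DeepEqual` treats as a difference.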
Align struct fields and test case parameters consistently
Remove trailing whitespace and ensure proper spacing
Use the new method in expt_run_item_turn_impl to determine async evaluation
Add test cases for agent evaluator functionality including auth checks, creation, updating drafts, and debugging operations. Tests cover success and failure scenarios with config switch validations.
Add test cases to verify different scenarios for updating evaluator record results, including nil output data, nil evaluator result, score handling, and DAO error cases
dsf86 previously approved these changes Mar 12, 2026
Comment thread backend/modules/evaluation/application/evaluator_app.go
Comment thread backend/modules/evaluation/infra/repo/evaluator/mysql/evaluator.go
@HearyShen HearyShen merged commit bb61e2f into main Mar 12, 2026
17 checks passed
@HearyShen HearyShen deleted the feat/agent_evaluator branch March 12, 2026 07:52

3 participants