[feat][evaluation] agent evaluator by HearyShen · Pull Request #421 · coze-dev/coze-loop

HearyShen · 2026-02-05T04:50:46Z

What type of PR is this?

Check the PR title

This PR title matches the format: [<type>][<scope>] <description>. For example: [fix][backend] flaky fix
The description of this PR title is user-oriented and clear enough for others to understand.
Add documentation if the current PR requires user awareness at the usage level.
This PR is written in English. PRs not in English will not be reviewed.

(Optional) Translate the PR title into Chinese

(Optional) More detailed description for this PR(en: English/zh: Chinese)

en:
zh(optional):

(Optional) Which issue(s) this PR fixes

codecov · 2026-02-05T05:11:35Z

Codecov Report

❌ Patch coverage is 84.85607% with 121 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| ...odules/evaluation/domain/service/evaluator_impl.go | 82.57% | 14 Missing and 9 partials ⚠️ |
| ...nd/modules/evaluation/application/evaluator_app.go | 85.41% | 15 Missing and 6 partials ⚠️ |
| ...ation/application/convertor/evaluator/evaluator.go | 87.70% | 13 Missing and 2 partials ⚠️ |
| ...api/handler/coze/loop/apis/eval_open_apiservice.go | 0.00% | 9 Missing ⚠️ |
| ...aluation/domain/service/expt_run_item_turn_impl.go | 89.88% | 5 Missing and 4 partials ⚠️ |
| ...kend/modules/evaluation/domain/entity/evaluator.go | 83.67% | 8 Missing ⚠️ |
| ...ation/convertor/evaluator/evaluator_output_data.go | 91.02% | 4 Missing and 3 partials ⚠️ |
| backend/infra/redis/redis.go | 0.00% | 6 Missing ⚠️ |
| ...d/modules/evaluation/application/experiment_app.go | 85.00% | 3 Missing and 3 partials ⚠️ |
| .../evaluation/infra/repo/evaluator/evaluator_impl.go | 54.54% | 2 Missing and 3 partials ⚠️ |

... and 4 more

@@            Coverage Diff             @@
##             main     #421      +/-   ##
==========================================
+ Coverage   73.95%   74.30%   +0.34%     
==========================================
  Files         628      629       +1     
  Lines       65195    65939     +744     
==========================================
+ Hits        48216    48993     +777     
+ Misses      13697    13667      -30     
+ Partials     3282     3279       -3

| Flag | Coverage Δ | |
| --- | --- | --- |
| unittests | `74.30% <84.85%> (+0.34%)` | ⬆️ |

Flags with carried forward coverage won't be shown. Click here to find out more.

| Files with missing lines | Coverage Δ | |
| --- | --- | --- |
| ...nd/api/handler/coze/loop/apis/evaluator_service.go | `62.06% <100.00%> (+9.89%)` | ⬆️ |
| ...modules/evaluation/application/eval_openapi_app.go | `92.55% <100.00%> (+0.16%)` | ⬆️ |
| backend/modules/evaluation/domain/entity/common.go | `87.23% <ø> (ø)` | |
| ...dules/evaluation/domain/entity/evaluator_record.go | `100.00% <ø> (ø)` | |
| ...valuation/domain/entity/evaluator_version_agent.go | `100.00% <100.00%> (ø)` | |
| ...ckend/modules/evaluation/domain/entity/expt_run.go | `68.44% <100.00%> (+2.70%)` | ⬆️ |
| backend/modules/evaluation/domain/entity/param.go | `64.28% <ø> (ø)` | |
| ...tion/infra/repo/evaluator/evaluator_record_impl.go | `93.84% <100.00%> (+0.86%)` | ⬆️ |
| backend/modules/evaluation/domain/entity/expt.go | `86.20% <60.00%> (-1.60%)` | ⬇️ |
| backend/modules/evaluation/pkg/conf/evaluator.go | `60.29% <0.00%> (-0.90%)` | ⬇️ |

... and 12 more

... and 4 files with indirect coverage changes

Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 6089fb6...fbc9641. Read the comment docs.
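
The totals in the coverage diff can be cross-checked by hand: lines should equal hits + misses + partials, and coverage is hits divided by lines. The sketch below redoes that arithmetic in Go using only the numbers from the diff. The type and function names are made up for illustration, and truncating to two decimals is an assumption that happens to reproduce the reported figures (rounding would give 73.96% for the base), not a statement about how Codecov is configured for this repository.

```go
package main

import "fmt"

// report holds the three line counts listed for one commit in the diff.
type report struct {
	hits, misses, partials int64
}

// lines is the total number of tracked lines.
func (r report) lines() int64 { return r.hits + r.misses + r.partials }

// basisPoints returns coverage in hundredths of a percent, truncated by
// integer division, so 7395 means 73.95%.
func (r report) basisPoints() int64 { return r.hits * 10000 / r.lines() }

// deltaBasisPoints returns the coverage change from base to head in
// hundredths of a percent. It works on the exact fractions and truncates
// once at the end, rather than subtracting two already-truncated values.
func deltaBasisPoints(base, head report) int64 {
	num := head.hits*base.lines() - base.hits*head.lines()
	den := head.lines() * base.lines()
	return num * 10000 / den
}

// percent formats non-negative basis points as a two-decimal percentage.
func percent(bp int64) string { return fmt.Sprintf("%d.%02d%%", bp/100, bp%100) }

func main() {
	base := report{hits: 48216, misses: 13697, partials: 3282}
	head := report{hits: 48993, misses: 13667, partials: 3279}

	fmt.Println(base.lines(), percent(base.basisPoints())) // 65195 73.95%
	fmt.Println(head.lines(), percent(head.basisPoints())) // 65939 74.30%
	fmt.Println(head.lines()-base.lines(), percent(deltaBasisPoints(base, head))) // 744 0.34%
}
```

The line delta also balances: +777 hits, -30 misses and -3 partials add up to the +744 lines shown in the diff.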

Implement async run and debug functionality for agent evaluator type, including:
- Add new async methods in evaluator service interface
- Implement async handlers in evaluator service
- Add request/response types for async operations
- Add agent evaluator version conversion logic

Implement methods to fetch async debug evaluator results across service layers. Added new request/response types and implemented the functionality in evaluator source services while maintaining backward compatibility for unsupported evaluator types.

- Implement agent evaluator version batch get functionality
- Add agent evaluator input data building and async execution in turn evaluation
- Introduce async run and debug endpoints for agent evaluators
- Support built-in agent evaluators with auth bypass

… formatting
- Remove unnecessary type casting in ConvertAgentConfigDO2DTO
- Reformat function parameters for better readability in asyncCallEvaluator
- Clean up struct field alignment in GetAsyncDebugEvaluatorInvokeResultResponse
- Fix import reference and variable naming in eval_openapi_app.go

- Add tests for evaluator version validation and input/output schemas
- Implement test cases for evaluator agent delegation and setters
- Add round-trip conversion tests for evaluator data structures
- Include test coverage for evaluator HTTP info and run config
- Add tests for evaluator async handlers and report functionality
- Implement test cases for evaluator output data conversion
- Add tests for evaluator content conversion and skill configs

HearyShen added 2 commits February 5, 2026 12:48

init cozeloop idl for agent evaluator

a37b94a

init cozeloop idl for agent evaluator

f60e53b

HearyShen changed the title ~~[evaluation] agent evaluator~~ [feat][evaluation] agent evaluator Feb 5, 2026

HearyShen and others added 26 commits February 5, 2026 14:40

init cozeloop idl for agent evaluator

5e52d26

init cozeloop idl for agent evaluator

6cd39ff

init cozeloop idl for agent evaluator

6f4abab

Merge branch 'main' into feat/agent_evaluator

7c8f305

EvaluatorExtraOutputContent require uri for reporting and url for signed display

1a484d0

remove ext in AgentEvaluatorVersion DO

9b7ba91

Merge branch 'main' into feat/agent_evaluator

5f71dc3

update mockgen and codegen

a3acc66

feat(evaluator): add record creation for async evaluator runs

50be157

Add evaluator record creation in AsyncRunEvaluator and AsyncDebugEvaluator methods to track async evaluation runs. This provides better observability and audit trail for async operations.
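
A rough sketch of the idea in this commit message, persisting a record before the async work is handed off so that every run stays traceable, might look like the following. Everything here is hypothetical: `RecordStore`, `AsyncRun`, `Report`, and the status values are illustrative stand-ins, not the types or method signatures used in coze-loop, and the in-memory map is not safe for concurrent use.

```go
package main

import (
	"errors"
	"fmt"
)

// RecordStatus is an illustrative status enum, not the project's own.
type RecordStatus string

const (
	StatusRunning RecordStatus = "running"
	StatusSuccess RecordStatus = "success"
	StatusFailed  RecordStatus = "failed"
)

// Record is a pared-down stand-in for an evaluator record.
type Record struct {
	ID                 int64
	EvaluatorVersionID int64
	Status             RecordStatus
	Score              *float64
}

// RecordStore is an in-memory stand-in for the record repository.
type RecordStore struct {
	nextID  int64
	records map[int64]*Record
}

func NewRecordStore() *RecordStore {
	return &RecordStore{records: make(map[int64]*Record)}
}

// Create stores a new record in the running state and returns it.
func (s *RecordStore) Create(versionID int64) *Record {
	s.nextID++
	rec := &Record{ID: s.nextID, EvaluatorVersionID: versionID, Status: StatusRunning}
	s.records[rec.ID] = rec
	return rec
}

// Get looks up a record by ID.
func (s *RecordStore) Get(id int64) (*Record, bool) {
	rec, ok := s.records[id]
	return rec, ok
}

// AsyncRun creates the record first, then hands the work to start.
// The record exists even if start fails or no result is ever reported.
func AsyncRun(s *RecordStore, versionID int64, start func(recordID int64) error) (int64, error) {
	rec := s.Create(versionID)
	if err := start(rec.ID); err != nil {
		rec.Status = StatusFailed
		return rec.ID, fmt.Errorf("start evaluator run: %w", err)
	}
	return rec.ID, nil
}

// Report attaches the result that arrives later to the existing record.
func (s *RecordStore) Report(recordID int64, score float64) error {
	rec, ok := s.records[recordID]
	if !ok {
		return errors.New("record not found")
	}
	rec.Score = &score
	rec.Status = StatusSuccess
	return nil
}

func main() {
	store := NewRecordStore()
	id, err := AsyncRun(store, 42, func(int64) error { return nil })
	if err != nil {
		fmt.Println("run failed:", err)
		return
	}
	rec, _ := store.Get(id)
	fmt.Println(rec.ID, rec.Status)
	_ = store.Report(id, 0.8)
	fmt.Println(rec.Status, *rec.Score)
}
```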

refactor(evaluator): remove async debug result endpoint and add api routes

6cec211

Remove unused async debug result endpoint and its related request/response structs. Add missing api routes for evaluator record endpoints.

refactor(evaluator): remove async debug result feature

e69bd7b

remove unused async debug result related code including handler, service interface, entity and mock implementations to clean up the codebase

refactor(evaluator): remove unused GetAsyncRunResult method

c0515ed

The method was not implemented and always returned an error, indicating it was not actually needed for the evaluator services. This cleanup removes dead code from the interface and implementations.

feat(evaluation): add agent evaluator run failed error code

8ae3d69

Add new error code for agent evaluator run failures to handle configuration issues

fix

8a09efd

Change-Id: Iab4a819e956ce2ba7521d381d5adcd671c7f5221

Run evaluator asynchronously

5b103ed

Change-Id: Ie837fdef6255afcd2ea292d372594e2bbc2b190b

fix

8bce90c

Change-Id: Ief52edc654e55c162b117307fae75423144e4ad3

fix

6878d5b

Change-Id: Icf87dbf107860aba8c5f24513835897239c8a890

fix

ec36a71

Change-Id: Ibfbe0d2ebcd63d93a19ee2a90ea9782e087b6e49

remove ReportEvaluatorInvokeProgress from idl considering its specified to current agent and haven't matched with AG-UI message protocol

0134ed8

remove ReportEvaluatorInvokeProgress from idl considering its specified to current agent and haven't matched with AG-UI message protocol

d77fee6

add (api.js_conv="true") to i64 skill_id

290ccad

fox

fb9ec2f

Change-Id: I8fea5b657938b46d264349d057b87d733e59f431

Merge branch 'feat/agent_evaluator' of github.com:coze-dev/coze-loop into feat/agent_evaluator

1115c55

Change-Id: Ibc6898290abb64cf78c0c999b42e6b03843446e5

fix

372fab0

Change-Id: I9f4673c1280102a9b8cefdf33b8ca592ea0290a4

HearyShen added 21 commits March 9, 2026 11:40

fix(evaluation): handle omitted content in evaluator input data

d47d9a6

Add getAllEvalSetFields method to fetch omitted content for evaluation set fields. This ensures complete field data is available for agent evaluator type.

chore: regenerate redis mock to include LRange

828a7ef

fix(evaluation): correct deferred metric emission in evaluator calls

9233f81

Ensure metric emission captures the correct error state by wrapping in anonymous functions. Also add test cases for error scenarios in async evaluator calls.
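
The pitfall this commit message describes is a general Go behavior: the arguments of a deferred call are evaluated when the `defer` statement executes, not when the surrounding function returns. The sketch below shows the difference. `emitMetric`, `callEager`, and `callWrapped` are hypothetical names for illustration and are not taken from the PR's code.

```go
package main

import (
	"errors"
	"fmt"
)

// emitted records what the hypothetical metric sink received.
var emitted []string

// emitMetric stands in for a real metrics client.
func emitMetric(name string, err error) {
	emitted = append(emitted, fmt.Sprintf("%s err=%v", name, err))
}

// callEager defers emitMetric directly. The argument err is evaluated when
// the defer statement runs, so the metric always sees nil.
func callEager(fail bool) (err error) {
	defer emitMetric("eager", err)
	if fail {
		err = errors.New("evaluator failed")
	}
	return err
}

// callWrapped defers an anonymous function. The closure reads err only when
// the function returns, so the metric sees the final error.
func callWrapped(fail bool) (err error) {
	defer func() { emitMetric("wrapped", err) }()
	if fail {
		err = errors.New("evaluator failed")
	}
	return err
}

func main() {
	_ = callEager(true)
	_ = callWrapped(true)
	for _, line := range emitted {
		fmt.Println(line)
	}
	// Prints:
	// eager err=<nil>
	// wrapped err=evaluator failed
}
```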

test(evaluation): add tests for async report, status handling and error cases

96d2e89

style: fix indentation and remove redundant comment in test files

beb5054

Merge branch 'main' into feat/agent_evaluator

e30b828

fix: correct input and output fields in evaluator test

5a12e25

Update test case to properly set InputFields and EvaluateTargetOutputFields with targetFields instead of empty maps to match expected behavior

style: fix indentation in test service initialization

4e1ddae

Merge branch 'main' into feat/agent_evaluator

9c1cbeb

style: fix indentation and alignment in test files

910673e

Align struct fields and test case parameters consistently. Remove trailing whitespace and ensure proper spacing.

test: remove trailing newline in evaluator version test

1288470

feat(evaluator): add IsAsync method to check evaluator type

4c24468

Use the new method in expt_run_item_turn_impl to determine async evaluation
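
As a minimal sketch of what such a helper could look like, the example below puts an `IsAsync` method on a stand-in entity and branches on it. The `EvaluatorType` values, the `Evaluator` struct, and the `dispatch` function are assumptions made for illustration. Only the method name `IsAsync` and the fact that agent evaluators run asynchronously come from this PR.

```go
package main

import "fmt"

// EvaluatorType is a hypothetical enum; the real entity may define other values.
type EvaluatorType int

const (
	EvaluatorTypePrompt EvaluatorType = iota + 1
	EvaluatorTypeCode
	EvaluatorTypeAgent
)

// Evaluator is a pared-down stand-in for the domain entity.
type Evaluator struct {
	Name string
	Type EvaluatorType
}

// IsAsync reports whether the evaluator has to be run asynchronously.
// In this sketch only agent evaluators are; a nil receiver counts as sync.
func (e *Evaluator) IsAsync() bool {
	return e != nil && e.Type == EvaluatorTypeAgent
}

// dispatch picks an execution path from IsAsync instead of comparing the
// type at every call site.
func dispatch(e *Evaluator) string {
	if e.IsAsync() {
		return "async"
	}
	return "sync"
}

func main() {
	evaluators := []*Evaluator{
		{Name: "prompt-judge", Type: EvaluatorTypePrompt},
		{Name: "agent-judge", Type: EvaluatorTypeAgent},
	}
	for _, e := range evaluators {
		fmt.Println(e.Name, dispatch(e))
	}
}
```

Keeping the check on the entity means a caller only asks one question, and a new async evaluator type would need a change in one place.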

feat(evaluation): add error handling and test coverage for evaluator services

944df3e

style: fix formatting in evaluator test function

32c05fd

test(evaluator): add agent evaluator test cases

aa87655

Add test cases for agent evaluator functionality including auth checks, creation, updating drafts, and debugging operations. Tests cover success and failure scenarios with config switch validations.

test(evaluator): add tests for UpdateEvaluatorRecordResult method

e1a1644

Add test cases to verify different scenarios for updating evaluator record results, including nil output data, nil evaluator result, score handling, and DAO error cases

Merge branch 'main' into feat/agent_evaluator

6fd2ad7

Merge branch 'main' into feat/agent_evaluator

cc16866

dsf86 previously approved these changes Mar 12, 2026

HearyShen commented Mar 12, 2026

Comment thread backend/modules/evaluation/application/evaluator_app.go

minor refactor

fbc9641

HearyShen dismissed dsf86’s stale review via fbc9641 March 12, 2026 07:13

HymanShi approved these changes Mar 12, 2026

HearyShen commented Mar 12, 2026

Comment thread backend/modules/evaluation/infra/repo/evaluator/mysql/evaluator.go

dsf86 approved these changes Mar 12, 2026

HearyShen merged commit bb61e2f into main Mar 12, 2026
17 checks passed

HearyShen deleted the feat/agent_evaluator branch March 12, 2026 07:52

HearyShen merged 77 commits into main from feat/agent_evaluator

3 participants
