Standardize evaluator output schema + skipped status by Copilot · Pull Request #5044 · Azure/azureml-assets

Copilot · 2026-05-15T06:06:22Z

Replicates all changes from PR #5043 (Skipped_Status_and_Standardize_Output_v2) onto a clean branch off main. The original PR is a draft; this branch carries the identical diff so it can be reviewed and merged independently.

Changes (57 files, +2111/−954)

Evaluator output schema standardization

- Code evaluators (bleu, f1, gleu, meteor, rouge): added logging, Union typing, and uniform result/reason/label output fields
- Prompty evaluators (coherence, fluency, groundedness, relevance, retrieval, similarity, and agent-eval promptys): updated output schemas in .prompty files to return structured label/score/reason fields consistently
- Complex evaluators (intent_resolution, relevance, response_completeness, task_adherence, task_completion, tool_call_accuracy, tool_call_success, tool_input_accuracy, tool_output_utilization, tool_selection): standardized internal result handling to use the label field; added skipped status propagation throughout
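To make the target shape concrete, here is a minimal sketch of a normalized evaluator result. It assumes a flat dictionary keyed by the field names mentioned in this description (label, score, reason, passed). The helper names `standardize_result` and `overall_label`, the "pass"/"fail"/"skipped" label values, and the threshold parameter are hypothetical illustrations, not code taken from the PR's diff.

```python
from typing import Any, Dict, List, Optional, Union

Number = Union[int, float]


def standardize_result(
    score: Optional[Number],
    threshold: Number,
    reason: str = "",
    skipped: bool = False,
) -> Dict[str, Any]:
    """Build a uniform result dict (hypothetical layout, not the PR's actual code)."""
    if skipped or score is None:
        # A skipped evaluation carries no score; the reason explains why it was skipped.
        return {"label": "skipped", "score": None, "reason": reason or "skipped", "passed": None}
    passed = score >= threshold
    return {
        "label": "pass" if passed else "fail",
        "score": float(score),
        "reason": reason,
        "passed": passed,
    }


def overall_label(results: List[Dict[str, Any]]) -> str:
    """Propagate skipped status: skipped rows are ignored unless every row was skipped."""
    scored = [r for r in results if r["label"] != "skipped"]
    if not scored:
        return "skipped"
    return "pass" if all(r["passed"] for r in scored) else "fail"
```

Under these assumptions, a caller can tell "not evaluated" apart from "evaluated and failed" by checking the label alone, which is the kind of distinction a skipped status is meant to preserve.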

Spec updates

document_retrieval/spec.yaml, rouge_score/spec.yaml: aligned output schema declarations with new field names

Test infrastructure

- base_evaluator_runner, base_prompty_evaluator_runner, base_quality_evaluator_runner, base_code_evaluator_runner: updated to assert on the new output shape (label, reason, passed)
- evaluator_mock_config: updated mock return values to match the new schema
- Behavior and quality test files: removed assertions on deprecated fields; added coverage for skipped status and new quality scenarios (tool_call_accuracy, tool_input_accuracy, tool_output_utilization, tool_selection)
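As an illustration of what asserting on the new output shape could look like, the sketch below checks for the three field names listed above (label, reason, passed). The helper name `assert_output_shape` and the rule that `passed` may be `None` for a skipped result are assumptions for this example; the actual runner classes may check more or differently.

```python
from typing import Any, Mapping

# Field names taken from the description above; the type rules are assumptions.
REQUIRED_FIELDS = ("label", "reason", "passed")


def assert_output_shape(result: Mapping[str, Any]) -> None:
    """Fail with a readable message if a result lacks the standardized fields."""
    missing = [name for name in REQUIRED_FIELDS if name not in result]
    assert not missing, f"missing fields: {missing}"
    assert isinstance(result["label"], str), "label must be a string"
    assert isinstance(result["reason"], str), "reason must be a string"
    # Assumption: a skipped evaluation has no pass/fail verdict, so None is allowed.
    assert result["passed"] is None or isinstance(result["passed"], bool), (
        "passed must be a bool or None"
    )
```

A shared helper of this kind keeps per-evaluator tests short, since each test only has to supply inputs and call the check on the returned dictionary.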

Warning

Firewall rules blocked me from connecting to one or more addresses

I tried to connect to the following addresses, but was blocked by firewall rules:

https://api.github.com/graphql
- Triggering command: /usr/bin/gh gh auth status (http block)
https://api.github.com/repos/Azure/azureml-assets
- Triggering command: /usr/bin/curl curl -v -H Authorization: token ****** REDACTED (http block)
- Triggering command: /usr/bin/curl curl -s -o - -H Authorization: token ****** REDACTED ACCEPT (http block)
- Triggering command: /usr/bin/gh gh api -H Authorization: token ****** repos/Azure/azureml-assets --hostname github.com (http block)
https://api.github.com/repos/Azure/azureml-assets/contents/assets/evaluators/builtin/bleu_score/evaluator/_bleu.py
- Triggering command: /usr/bin/python3 python3 (http block)
- Triggering command: /usr/bin/gh gh api repos/Azure/azureml-assets/contents/assets/evaluators/builtin/bleu_score/evaluator/_bleu.py?ref=dae266aa788814ab488304d77bc8c5aaaaaafce9 --jq .sha (http block)
- Triggering command: /usr/bin/gh gh api repos/Azure/azureml-assets/contents/assets/evaluators/builtin/bleu_score/evaluator/_bleu.py?ref=dae266aa788814ab488304d77bc8c5aaaaaafce9 (http block)
https://api.github.com/repos/Azure/azureml-assets/contents/assets/evaluators/builtin/coherence/evaluator/coherence.prompty
- Triggering command: /usr/bin/python3 python3 (http block)
https://api.github.com/repos/Azure/azureml-assets/contents/assets/evaluators/builtin/document_retrieval/evaluator/_document_retrieval.py
- Triggering command: /usr/bin/python3 python3 (http block)
https://api.github.com/repos/Azure/azureml-assets/contents/assets/evaluators/builtin/document_retrieval/spec.yaml
- Triggering command: /usr/bin/python3 python3 (http block)
https://api.github.com/repos/Azure/azureml-assets/contents/assets/evaluators/builtin/f1_score/evaluator/_f1_score.py
- Triggering command: /usr/bin/python3 python3 (http block)
https://api.github.com/repos/Azure/azureml-assets/contents/assets/evaluators/builtin/fluency/evaluator/fluency.prompty
- Triggering command: /usr/bin/python3 python3 (http block)
https://api.github.com/repos/Azure/azureml-assets/contents/assets/evaluators/builtin/gleu_score/evaluator/_gleu.py
- Triggering command: /usr/bin/python3 python3 (http block)
https://api.github.com/repos/Azure/azureml-assets/contents/assets/evaluators/builtin/groundedness/evaluator/_groundedness.py
- Triggering command: /usr/bin/python3 python3 (http block)
https://api.github.com/repos/Azure/azureml-assets/contents/assets/evaluators/builtin/groundedness/evaluator/groundedness_with_query.prompty
- Triggering command: /usr/bin/python3 python3 (http block)
https://api.github.com/repos/Azure/azureml-assets/contents/assets/evaluators/builtin/groundedness/evaluator/groundedness_without_query.prompty
- Triggering command: /usr/bin/python3 python3 (http block)
https://api.github.com/repos/Azure/azureml-assets/contents/assets/evaluators/builtin/intent_resolution/evaluator/_intent_resolution.py
- Triggering command: /usr/bin/python3 python3 (http block)
https://api.github.com/repos/Azure/azureml-assets/contents/assets/evaluators/builtin/intent_resolution/evaluator/intent_resolution.prompty
- Triggering command: /usr/bin/python3 python3 (http block)
https://api.github.com/repos/Azure/azureml-assets/contents/assets/evaluators/builtin/meteor_score/evaluator/_meteor.py
- Triggering command: /usr/bin/python3 python3 (http block)
https://api.github.com/repos/Azure/azureml-assets/contents/assets/evaluators/builtin/relevance/evaluator/_relevance.py
- Triggering command: /usr/bin/python3 python3 (http block)
https://api.github.com/repos/Azure/azureml-assets/contents/assets/evaluators/builtin/relevance/evaluator/relevance.prompty
- Triggering command: /usr/bin/python3 python3 (http block)
https://api.github.com/repos/Azure/azureml-assets/contents/assets/evaluators/builtin/response_completeness/evaluator/_response_completeness.py
- Triggering command: /usr/bin/python3 python3 (http block)
https://api.github.com/repos/Azure/azureml-assets/contents/assets/evaluators/builtin/retrieval/evaluator/retrieval.prompty
- Triggering command: /usr/bin/python3 python3 (http block)
https://api.github.com/repos/Azure/azureml-assets/contents/assets/evaluators/builtin/rouge_score/evaluator/_rouge.py
- Triggering command: /usr/bin/python3 python3 (http block)
https://api.github.com/repos/Azure/azureml-assets/contents/assets/evaluators/builtin/rouge_score/spec.yaml
- Triggering command: /usr/bin/python3 python3 (http block)
https://api.github.com/repos/Azure/azureml-assets/contents/assets/evaluators/builtin/similarity/evaluator/similarity.prompty
- Triggering command: /usr/bin/python3 python3 (http block)
https://api.github.com/repos/Azure/azureml-assets/contents/assets/evaluators/builtin/task_adherence/evaluator/_task_adherence.py
- Triggering command: /usr/bin/python3 python3 (http block)
https://api.github.com/repos/Azure/azureml-assets/contents/assets/evaluators/builtin/task_adherence/evaluator/task_adherence.prompty
- Triggering command: /usr/bin/python3 python3 (http block)
https://api.github.com/repos/Azure/azureml-assets/contents/assets/evaluators/builtin/task_completion/evaluator/_task_completion.py
- Triggering command: /usr/bin/python3 python3 (http block)
https://api.github.com/repos/Azure/azureml-assets/contents/assets/evaluators/builtin/task_navigation_efficiency/evaluator/_task_navigation_efficiency.py
- Triggering command: /usr/bin/python3 python3 (http block)
https://api.github.com/repos/Azure/azureml-assets/contents/assets/evaluators/builtin/tool_call_accuracy/evaluator/_tool_call_accuracy.py
- Triggering command: /usr/bin/python3 python3 (http block)
https://api.github.com/repos/Azure/azureml-assets/contents/assets/evaluators/builtin/tool_call_accuracy/evaluator/tool_call_accuracy.prompty
- Triggering command: /usr/bin/python3 python3 (http block)
https://api.github.com/repos/Azure/azureml-assets/contents/assets/evaluators/builtin/tool_call_success/evaluator/_tool_call_success.py
- Triggering command: /usr/bin/python3 python3 (http block)
https://api.github.com/repos/Azure/azureml-assets/contents/assets/evaluators/builtin/tool_call_success/evaluator/tool_call_success.prompty
- Triggering command: /usr/bin/python3 python3 (http block)
https://api.github.com/repos/Azure/azureml-assets/contents/assets/evaluators/builtin/tool_input_accuracy/evaluator/_tool_input_accuracy.py
- Triggering command: /usr/bin/python3 python3 (http block)
https://api.github.com/repos/Azure/azureml-assets/contents/assets/evaluators/builtin/tool_input_accuracy/evaluator/tool_input_accuracy.prompty
- Triggering command: /usr/bin/python3 python3 (http block)
https://api.github.com/repos/Azure/azureml-assets/contents/assets/evaluators/builtin/tool_output_utilization/evaluator/_tool_output_utilization.py
- Triggering command: /usr/bin/python3 python3 (http block)
https://api.github.com/repos/Azure/azureml-assets/contents/assets/evaluators/builtin/tool_output_utilization/evaluator/tool_output_utilization.prompty
- Triggering command: /usr/bin/python3 python3 (http block)
https://api.github.com/repos/Azure/azureml-assets/contents/assets/evaluators/builtin/tool_selection/evaluator/_tool_selection.py
- Triggering command: /usr/bin/python3 python3 (http block)
https://api.github.com/repos/Azure/azureml-assets/contents/assets/evaluators/builtin/tool_selection/evaluator/tool_selection.prompty
- Triggering command: /usr/bin/python3 python3 (http block)
https://api.github.com/repos/Azure/azureml-assets/contents/assets/evaluators/tests/common/base_code_evaluator_REDACTED.py
- Triggering command: /usr/bin/python3 python3 (http block)
https://api.github.com/repos/Azure/azureml-assets/contents/assets/evaluators/tests/common/base_evaluator_REDACTED.py
- Triggering command: /usr/bin/python3 python3 (http block)
https://api.github.com/repos/Azure/azureml-assets/contents/assets/evaluators/tests/common/base_prompty_evaluator_REDACTED.py
- Triggering command: /usr/bin/python3 python3 (http block)
https://api.github.com/repos/Azure/azureml-assets/contents/assets/evaluators/tests/common/base_quality_evaluator_REDACTED.py
- Triggering command: /usr/bin/python3 python3 (http block)
https://api.github.com/repos/Azure/azureml-assets/contents/assets/evaluators/tests/common/evaluator_mock_config.py
- Triggering command: /usr/bin/python3 python3 (http block)
https://api.github.com/repos/Azure/azureml-assets/contents/assets/evaluators/tests/test_evaluators_behavior/base_evaluator_behavior_test.py
- Triggering command: /usr/bin/python3 python3 (http block)
https://api.github.com/repos/Azure/azureml-assets/contents/assets/evaluators/tests/test_evaluators_behavior/base_tool_calls_evaluator_behavior_test.py
- Triggering command: /usr/bin/python3 python3 (http block)
https://api.github.com/repos/Azure/azureml-assets/contents/assets/evaluators/tests/test_evaluators_behavior/base_tools_evaluator_behavior_test.py
- Triggering command: /usr/bin/python3 python3 (http block)
https://api.github.com/repos/Azure/azureml-assets/contents/assets/evaluators/tests/test_evaluators_behavior/test_customer_satisfaction_evaluator_behavior.py
- Triggering command: /usr/bin/python3 python3 (http block)
https://api.github.com/repos/Azure/azureml-assets/contents/assets/evaluators/tests/test_evaluators_behavior/test_deflection_rate_evaluator_behavior.py
- Triggering command: /usr/bin/python3 python3 (http block)
https://api.github.com/repos/Azure/azureml-assets/contents/assets/evaluators/tests/test_evaluators_behavior/test_document_retrieval_evaluator_behavior.py
- Triggering command: /usr/bin/python3 python3 (http block)
https://api.github.com/repos/Azure/azureml-assets/contents/assets/evaluators/tests/test_evaluators_behavior/test_relevance_evaluator_behavior.py
- Triggering command: /usr/bin/python3 python3 (http block)
https://api.github.com/repos/Azure/azureml-assets/contents/assets/evaluators/tests/test_evaluators_behavior/test_rouge_score_evaluator_behavior.py
- Triggering command: /usr/bin/python3 python3 (http block)
https://api.github.com/repos/Azure/azureml-assets/contents/assets/evaluators/tests/test_evaluators_behavior/test_similarity_evaluator_behavior.py
- Triggering command: /usr/bin/python3 python3 (http block)
https://api.github.com/repos/Azure/azureml-assets/contents/assets/evaluators/tests/test_evaluators_behavior/test_task_adherence_evaluator_behavior.py
- Triggering command: /usr/bin/python3 python3 (http block)
https://api.github.com/repos/Azure/azureml-assets/contents/assets/evaluators/tests/test_evaluators_behavior/test_task_completion_evaluator_behavior.py
- Triggering command: /usr/bin/python3 python3 (http block)
https://api.github.com/repos/Azure/azureml-assets/contents/assets/evaluators/tests/test_evaluators_behavior/test_task_navigation_efficiency_evaluator_behavior.py
- Triggering command: /usr/bin/python3 python3 (http block)
https://api.github.com/repos/Azure/azureml-assets/contents/assets/evaluators/tests/test_evaluators_behavior/test_tool_call_accuracy_evaluator_behavior.py
- Triggering command: /usr/bin/python3 python3 (http block)
https://api.github.com/repos/Azure/azureml-assets/contents/assets/evaluators/tests/test_evaluators_behavior/test_tool_input_accuracy_evaluator_behavior.py
- Triggering command: /usr/bin/python3 python3 (http block)
https://api.github.com/repos/Azure/azureml-assets/contents/assets/evaluators/tests/test_evaluators_quality/test_tool_call_accuracy_evaluator_quality.py
- Triggering command: /usr/bin/python3 python3 (http block)
https://api.github.com/repos/Azure/azureml-assets/contents/assets/evaluators/tests/test_evaluators_quality/test_tool_input_accuracy_evaluator_quality.py
- Triggering command: /usr/bin/python3 python3 (http block)
https://api.github.com/repos/Azure/azureml-assets/contents/assets/evaluators/tests/test_evaluators_quality/test_tool_output_utilization_evaluator_quality.py
- Triggering command: /usr/bin/python3 python3 (http block)
https://api.github.com/repos/Azure/azureml-assets/contents/assets/evaluators/tests/test_evaluators_quality/test_tool_selection_evaluator_quality.py
- Triggering command: /usr/bin/python3 python3 (http block)
https://api.github.com/repos/Azure/azureml-assets/git/ref/heads/main
- Triggering command: /opt/copilot-runtime/copilot-developer-action-main/github-mcp-server/github-mcp-server /opt/copilot-runtime/copilot-developer-action-main/github-mcp-server/github-mcp-server stdio (http block)
https://api.github.com/repos/Azure/azureml-assets/git/ref/heads/mohessie/standardize_output_schema
- Triggering command: /usr/bin/curl curl -s -o /dev/null -w %{http_code} -H Authorization: token ****** -H Accept: application/vnd.github.v3+json REDACTED (http block)
https://api.github.com/repos/Azure/azureml-assets/git/refs
- Triggering command: /usr/bin/curl curl -s -X POST -H Authorization: token ****** -H Accept: application/vnd.github.v3+json REDACTED -d {"ref": "refs/heads/mohessie/standardize_output_schema", "sha": "1f8e9829cfe04c634c084f691603f2a031682a68"} (http block)
- Triggering command: `/usr/bin/python3 python3 -c` with an inline script, truncated in the log (http block):
  import urllib.request, json, os
  token = os.environ['GITHUB_COPILOT_API_TOKEN']
  commit_sha = '1f8e9829cfe04c634c084f691603f2a031682a68'
  # Try creating the branch
  data = json.dumps({'ref': 'refs/heads/mohessie/standardize_output_schema', 'sha': commit_sh
https://api.github.com/repos/Azure/azureml-assets/git/trees
- Triggering command: /usr/bin/python3 python3 (http block)

If you need me to access, download, or install something from one of these locations, you can either:

- Configure Actions setup steps to set up my environment, which run before the firewall is enabled
- Add the appropriate URLs or hosts to the custom allowlist in this repository's Copilot coding agent settings (admins only)

Original prompt

Task

Replicate the exact code changes from PR #5043 onto a brand-new branch named mohessie/standardize_output_schema, branched directly off main (NOT off the existing PR branch Skipped_Status_and_Standardize_Output_v2). After replicating the changes, open a new pull request from mohessie/standardize_output_schema into main.

Source of truth

Source PR: Skipped status and standardize output v2 #5043
Source branch (head): Skipped_Status_and_Standardize_Output_v2
Source PR title: "Skipped status and standardize output v2"
Diff stats: 57 changed files, +2,111 additions, −954 deletions, 5 commits
Base of source PR: main (commit 3bf16f0b8c63fb9949c77202cb94887e8a273ff9)
Head SHA of source PR: dae266aa788814ab488304d77bc8c5aaaaaafce9

Required steps

1. Create a new branch named mohessie/standardize_output_schema directly off the latest main. Do NOT branch off Skipped_Status_and_Standardize_Output_v2. The new branch must be an independent branch off main.
2. Replicate every file change that PR #5043 introduces relative to main. This includes:
- All added files (copy them verbatim from the head of Skipped_Status_and_Standardize_Output_v2).
- All modified files (copy them verbatim from the head of Skipped_Status_and_Standardize_Output_v2).
- All deleted files (delete them on the new branch as well).
- All renamed files (apply the rename, i.e., delete the old path and create the new path with the new content).
3. You can enumerate the exact set of changed files via GET /repos/Azure/azureml-assets/pulls/5043/files (paginate through all pages, since there are 57 files). For each file, retrieve its final content at SHA dae266aa788814ab488304d77bc8c5aaaaaafce9 and apply the change on the new branch.
4. Do not introduce any other changes. Do not reformat, do not run linters, do not change line endings, do not modify file modes. The resulting tree of mohessie/standardize_output_schema should produce the exact same diff vs main as PR #5043 produces vs main (modulo any new commits that have landed on main since PR #5043 was opened; in case of conflict with newer main, prefer the PR #5043 content for the files it touches).
5. Open a new pull request from mohessie/standardize_output_schema → main with:
- Title: Skipped status and standardize output v2
- Body: A short note explaining that this PR replicates the contents of PR #5043 onto a fresh branch off main, and links to the original PR (#5043).
6. Verification step (critical): After pushing the branch and opening the PR, fetch the file list of both PRs and confirm that:
- The set of changed file paths is identical.
- For every file, the resulting content on mohessie/standardize_output_schema matches the content on Skipped_Status_and_Standardize_Output_v2 (at SHA dae266aa788814ab488304d77bc8c5aaaaaafce9) byte-for-byte.
- The additions/deletions counts in the new PR are consistent with PR #5043 (allowing only for differences caused by newer commits on main touching files NOT in PR #5043's changed set).
  Report any discrepancies in the PR description and fix them before finishing.
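The enumeration and verification steps above can be sketched as follows. This is a minimal illustration, not the procedure the agent actually ran. It assumes each file entry is a dictionary with `filename` and `sha` keys, as in the GitHub REST response for listing pull request files, and that `fetch_page` is a caller-supplied function (hypothetical here) that performs the HTTP request for one page. The helper names are also hypothetical. `git_blob_sha` uses Git's standard blob hashing, so equal blob SHAs imply byte-for-byte equal content.

```python
import hashlib
from typing import Callable, Dict, List


def list_pr_files(
    fetch_page: Callable[[int, int], List[dict]], per_page: int = 100
) -> List[dict]:
    """Collect every entry from a paginated 'list pull request files' endpoint."""
    files: List[dict] = []
    page = 1
    while True:
        batch = fetch_page(page, per_page)
        files.extend(batch)
        if len(batch) < per_page:  # a short (or empty) page is the last one
            return files
        page += 1


def git_blob_sha(content: bytes) -> str:
    """Compute the Git blob SHA-1 of raw file content."""
    header = b"blob %d\0" % len(content)
    return hashlib.sha1(header + content).hexdigest()


def diff_file_sets(source: List[dict], replica: List[dict]) -> Dict[str, List[str]]:
    """Report paths missing from, extra in, or differing in the replica PR."""
    src = {f["filename"]: f["sha"] for f in source}
    rep = {f["filename"]: f["sha"] for f in replica}
    return {
        "missing": sorted(src.keys() - rep.keys()),
        "extra": sorted(rep.keys() - src.keys()),
        "content_mismatch": sorted(
            path for path in src.keys() & rep.keys() if src[path] != rep[path]
        ),
    }
```

In use, `fetch_page` would wrap a request such as GET /repos/Azure/azureml-assets/pulls/5043/files with `page` and `per_page` query parameters, and the verification passes when all three lists in the report are empty.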

Constraints

Do not modify the original branch Skipped_Status_and_Standardize_Output_v2 or the original PR #5043.
Do not merge anything.
Keep the new PR as a regular (non-draft) PR unless conflicts require otherwise.
The branch name must be exactly mohessie/standardize_output_schema.

The following is the prior conversation context from the user's chat exploration (may be truncated):

User: No no I want you to replicate the code, not branch out of the branch.

Assistant: I want to make sure I do this correctly. To "replicate the code" from PR #5043 without branching off the existing Skipped_Status_and_Standardize_Output_v2 branch, I need a few details:

1. Target branch name: what should the new branch be called? (e.g., Skipped_Status_and_Standardize_Output_v3)
2. Base branch: should the new branch be created from main (the same base as PR #5043)?
3. Scope: PR #5043 has 57 changed files (+2,111 / −954). Do you want me to replicate all of them onto the new branch, or only a subset?
4. Open a PR? Should I open a new pull request from the new branch into main after pushing the files?

Just to confirm my understanding: you want me to take the diff/contents from branch Skipped_Status_and_Standardize_Output_v2 and apply those same file changes onto a brand-new branch created directly off main (not off the existing PR branch), so it's an independent branch with the same code. Is that correct?

User: 1. mohessie/standardize_output_schema
2. yes
3. all
4. yes
Yes exactly, and after doing the PR, revisit the diffs and make sure it's the exact same as the PR

Assistant: I've handed the task off to the coding agent — waiting on your response to the confirmation dialog to proceed ...

This pull request was created from Copilot chat.

Replicate all changes from PR #5043 (#5043) to standardize the output schema across 57 evaluator files. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Co-authored-by: m7md7sien <16615690+m7md7sien@users.noreply.github.com>

Agent-Logs-Url: https://github.com/Azure/azureml-assets/sessions/94e674f5-0aa4-4bd4-806f-5f8ddf8127ec Co-authored-by: m7md7sien <16615690+m7md7sien@users.noreply.github.com>

github-actions · 2026-05-15T07:16:17Z

Test Results for assets-test

1 499 tests: 1 499 ✅ passed, 0 💤 skipped, 0 ❌ failed
23 suites, 23 files
Duration: 59s ⏱️

Results for commit 355c0f0.

♻️ This comment has been updated with latest results.

Initial plan

a6edfc3

Copilot AI assigned Copilot and m7md7sien May 15, 2026

Copilot started work on behalf of m7md7sien May 15, 2026 06:06

Copilot AI and others added 2 commits May 15, 2026 06:43

Standardize output schema for evaluators

1f8e982

Push mohessie/standardize_output_schema branch to remote

7bc0d31

Copilot AI changed the title ~~[WIP] Replicate changes from PR #5043 for output schema~~ Replicate PR #5043 onto fresh branch: standardize evaluator output schema + skipped status May 15, 2026

Copilot AI requested a review from m7md7sien May 15, 2026 07:00

Copilot finished work on behalf of m7md7sien May 15, 2026 07:00

m7md7sien approved these changes May 15, 2026

m7md7sien changed the title ~~Replicate PR #5043 onto fresh branch: standardize evaluator output schema + skipped status~~ Standardize evaluator output schema + skipped status May 15, 2026

Finalize standardization migration

355c0f0

m7md7sien marked this pull request as ready for review May 15, 2026 07:14

m7md7sien requested review from a team as code owners May 15, 2026 07:14

m7md7sien temporarily deployed to Testing May 15, 2026 07:15 — with GitHub Actions Inactive

m7md7sien temporarily deployed to Testing May 15, 2026 07:16 — with GitHub Actions Inactive

m7md7sien approved these changes May 15, 2026

m7md7sien temporarily deployed to Testing May 15, 2026 07:19 — with GitHub Actions Inactive

m7md7sien closed this May 17, 2026

Copilot wants to merge 4 commits into main from copilot/mohessiestandardize-output-schema
