fix(eval): migrate authored expected output to vars #1657

Merged
christso merged 1 commit into main from grading-expected-output-vars
Jul 5, 2026

Conversation

christso commented Jul 5, 2026

Collaborator

Summary

Authored Promptfoo-aligned YAML now treats expected_output as removed at the top level, in default_test, and in inline tests[] rows. Reference answers belong in vars.expected_output, where they are inert unless an authored assertion or grader explicitly consumes {{ expected_output }}.
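
To illustrate what "inert unless consumed" means, here is a minimal sketch of placeholder substitution. The `renderTemplate` helper and the sample vars are hypothetical and are not AgentV's actual implementation; the sketch only shows that a reference answer in vars affects grading text when a template names `{{ expected_output }}` and is ignored otherwise.

```typescript
// Hypothetical helper (not AgentV's real renderer): substitute {{ name }}
// placeholders from a vars map, leaving unknown placeholders untouched.
type Vars = Record<string, string>;

function renderTemplate(template: string, vars: Vars): string {
  return template.replace(/\{\{\s*(\w+)\s*\}\}/g, (match: string, name: string) => {
    const value = Object.prototype.hasOwnProperty.call(vars, name) ? vars[name] : undefined;
    return value ?? match;
  });
}

// Sample vars, assumed for illustration only.
const sampleVars: Vars = { question: "What is 2 + 2?", expected_output: "4" };

// This rubric names the variable, so the reference answer is consumed.
const withReference = renderTemplate("Answer must match: {{ expected_output }}", sampleVars);

// This rubric does not name it, so vars.expected_output has no effect.
const withoutReference = renderTemplate("Answer must be polite.", sampleVars);
```

In this sketch the second rubric renders unchanged, which is the sense in which an unreferenced reference answer has no effect on grading.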

The hard-deprecation codemod now migrates legacy authored references into vars.expected_output, preserving existing assertion/criteria strategies and adding a reference-matching llm-rubric only when the legacy case had no explicit grading strategy. Examples and local fixtures have been migrated to the supported shape, and the generated eval schema reflects that authored YAML expected_output is no longer accepted.
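
The migration rule described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in scripts/migrate-hard-deprecations: the `LegacyTest` shape, the `assert` and `criteria` field names, and the rubric wording are all hypothetical.

```typescript
// Hypothetical shapes for illustration; the real AgentV types may differ.
interface Assertion {
  type: string;
  value?: string;
}

interface LegacyTest {
  vars?: Record<string, unknown>;
  expected_output?: unknown;
  assert?: Assertion[];
  criteria?: string;
}

type MigratedTest = Omit<LegacyTest, "expected_output">;

function migrateExpectedOutput(test: LegacyTest): MigratedTest {
  const { expected_output, ...rest } = test;
  // Nothing to migrate when the legacy field is absent.
  if (expected_output === undefined) return rest;

  // Move the reference answer into vars. In this sketch the legacy value
  // overwrites any existing vars.expected_output.
  const migrated: MigratedTest = {
    ...rest,
    vars: { ...rest.vars, expected_output },
  };

  // Keep any explicit grading strategy; add a rubric only when none exists.
  const hasStrategy = (rest.assert?.length ?? 0) > 0 || rest.criteria !== undefined;
  if (!hasStrategy) {
    migrated.assert = [
      { type: "llm-rubric", value: "Response matches the reference answer: {{ expected_output }}" },
    ];
  }
  return migrated;
}

// A legacy case with no grading strategy gains a reference-matching rubric.
const bare = migrateExpectedOutput({ vars: { q: "2 + 2" }, expected_output: "4" });

// A legacy case with its own assertion keeps it unchanged.
const graded = migrateExpectedOutput({
  expected_output: "4",
  assert: [{ type: "contains", value: "4" }],
});
```

Adding the rubric only in the no-strategy case keeps the migration behavior-preserving in spirit: cases that were implicitly graded against the reference stay graded against it, while cases with explicit assertions are not given a second grader.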

The TypeScript SDK materialization bridge still carries internal expected_output payloads for now, so this PR stays within the authored YAML slice and leaves SDK/API compatibility cleanup to av-kfik.28.4.

Related: av-kfik.28.2

Validation

  • bun --filter @agentv/core test
  • bun --filter agentv test
  • bun test packages/core/test/evaluation/loaders/jsonl-parser.test.ts packages/core/test/evaluation/validation/eval-validator.test.ts scripts/migrate-hard-deprecations.test.ts packages/core/test/evaluation/criteria-optional.test.ts packages/core/test/evaluation/suite-level-input.test.ts packages/core/test/evaluation/conversation-mode.test.ts packages/core/test/evaluation/validation/eval-schema-sync.test.ts packages/core/test/evaluation/validation/eval-file-schema.test.ts
  • bun test packages/core/test/evaluation/eval-inline-experiment.test.ts
  • bun test apps/cli/test/commands/prepare/prepare.test.ts apps/cli/test/eval.integration.test.ts
  • bun test apps/cli/test/commands/runs/rerun.test.ts apps/cli/test/commands/grade/grade-prepared.test.ts
  • bun --filter @agentv/core build
  • bun --filter @agentv/sdk build
  • bun run generate:schema
  • bun run validate:examples
  • bun run lint
  • git diff --check

Live provider dogfood was not run because this slice changes authored YAML loading, validation, schema, codemod, and examples; it does not change provider execution, LLM grader execution, or run artifact layout.

Post-Deploy Monitoring & Validation

No additional production monitoring required. This is a local authoring/schema/codemod change with no deployed service path; CI validation and example validation are the release signals.


Compound Engineering
GPT-5

cloudflare-workers-and-pages Bot commented Jul 5, 2026

Deploying agentv with Cloudflare Pages

Latest commit: b318d2f
Status: ✅  Deploy successful!
Preview URL: https://535e62c2.agentv.pages.dev
Branch Preview URL: https://grading-expected-output-vars.agentv.pages.dev

christso force-pushed the grading-expected-output-vars branch from a77661f to b89d5ae, July 5, 2026 03:39
christso force-pushed the grading-expected-output-vars branch from b89d5ae to b318d2f, July 5, 2026 03:48
christso merged commit 6f92049 into main, Jul 5, 2026
8 checks passed
christso deleted the grading-expected-output-vars branch, July 5, 2026 03:59