fix(eval): migrate authored expected output to vars #1657
Merged
Deploying agentv
| Latest commit: | b318d2f |
| Status: | ✅ Deploy successful! |
| Preview URL: | https://535e62c2.agentv.pages.dev |
| Branch Preview URL: | https://grading-expected-output-vars.agentv.pages.dev |
Summary
Authored Promptfoo-aligned YAML now treats `expected_output` as removed at the top level, in `default_test`, and in inline `tests[]` rows. Reference answers belong in `vars.expected_output`, where they are inert unless an authored assertion or grader explicitly consumes `{{ expected_output }}`.

The hard-deprecation codemod now migrates legacy authored references into `vars.expected_output`, preserving existing assertion/criteria strategies and adding a reference-matching `llm-rubric` only when the legacy case had no explicit grading strategy. Examples and local fixtures have been migrated to the supported shape, and the generated eval schema reflects that authored YAML `expected_output` is no longer accepted.

The TypeScript SDK materialization bridge keeps carrying internal `expected_output` payloads for now, so this PR stays within the authored YAML slice and leaves SDK/API compatibility cleanup to av-kfik.28.4.

Related: av-kfik.28.2
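For reviewers who want a concrete picture of the migration rule described in this Summary, here is a minimal sketch of how a single test row might be rewritten. It is not the codemod's actual implementation: the `LegacyTestRow` and `MigratedTestRow` types, the `migrateTestRow` function, the `assert` and `criteria` field names, the rubric wording, and the choice to let an existing `vars.expected_output` win over the legacy value are all assumptions made for illustration.

```typescript
// Hypothetical shapes for illustration only; not AgentV's real types.
type Assertion = { type: string; value?: string };

interface LegacyTestRow {
  vars?: Record<string, unknown>;
  expected_output?: unknown; // legacy authored reference answer
  assert?: Assertion[];      // assumed name for authored assertions
  criteria?: string;         // assumed name for authored criteria
}

interface MigratedTestRow {
  vars?: Record<string, unknown>;
  assert?: Assertion[];
  criteria?: string;
}

// Sketch of the rule: move the reference into vars.expected_output, keep any
// existing grading strategy, and add an llm-rubric only when there was none.
function migrateTestRow(row: LegacyTestRow): MigratedTestRow {
  const { expected_output, ...rest } = row;
  if (expected_output === undefined) {
    return rest; // nothing to migrate
  }

  const migrated: MigratedTestRow = {
    ...rest,
    // Assumption: a value already present in vars takes precedence.
    vars: { expected_output, ...rest.vars },
  };

  const hasStrategy =
    (rest.assert?.length ?? 0) > 0 || rest.criteria !== undefined;

  if (!hasStrategy) {
    // Assumed rubric wording; the real codemod's text may differ.
    migrated.assert = [
      {
        type: "llm-rubric",
        value: "The response matches the reference answer: {{ expected_output }}",
      },
    ];
  }
  return migrated;
}

// Usage: a legacy row without a grading strategy gains an llm-rubric,
// while a row that already has an assertion keeps it unchanged.
const withoutStrategy = migrateTestRow({
  vars: { question: "What is 2 + 2?" },
  expected_output: "4",
});
const withStrategy = migrateTestRow({
  expected_output: "4",
  assert: [{ type: "contains", value: "4" }],
});
console.log(JSON.stringify(withoutStrategy, null, 2));
console.log(JSON.stringify(withStrategy, null, 2));
```

In this sketch the migrated reference stays inert in `vars` unless an assertion templates `{{ expected_output }}`, which mirrors the behavior the Summary describes for authored YAML.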
Validation
- `bun --filter @agentv/core test`
- `bun --filter agentv test`
- `bun test packages/core/test/evaluation/loaders/jsonl-parser.test.ts packages/core/test/evaluation/validation/eval-validator.test.ts scripts/migrate-hard-deprecations.test.ts packages/core/test/evaluation/criteria-optional.test.ts packages/core/test/evaluation/suite-level-input.test.ts packages/core/test/evaluation/conversation-mode.test.ts packages/core/test/evaluation/validation/eval-schema-sync.test.ts packages/core/test/evaluation/validation/eval-file-schema.test.ts`
- `bun test packages/core/test/evaluation/eval-inline-experiment.test.ts`
- `bun test apps/cli/test/commands/prepare/prepare.test.ts apps/cli/test/eval.integration.test.ts`
- `bun test apps/cli/test/commands/runs/rerun.test.ts apps/cli/test/commands/grade/grade-prepared.test.ts`
- `bun --filter @agentv/core build`
- `bun --filter @agentv/sdk build`
- `bun run generate:schema`
- `bun run validate:examples`
- `bun run lint`
- `git diff --check`

Live provider dogfood was not run because this slice changes authored YAML loading, validation, schema, codemod, and examples; it does not change provider execution, LLM grader execution, or run artifact layout.
Post-Deploy Monitoring & Validation
No additional production monitoring required. This is a local authoring/schema/codemod change with no deployed service path; CI validation and example validation are the release signals.