fix(eval): reject authored direct input#1646

Merged
christso merged 3 commits into main from promptfoo-input-hard-deprecation on Jul 4, 2026

@christso christso commented Jul 4, 2026

Collaborator

Summary

Implements Bead av-kfik.27 hard-deprecation for authored direct input in normal Promptfoo-aligned eval YAML.

  • Rejects authored top-level input and inline tests[].input at schema/validator/parser/runtime load boundaries with actionable guidance to use top-level prompts plus default_test.vars / tests[].vars.
  • Keeps raw-case/internal compatibility narrow: external raw-case imports through tests: file://... may still carry internal input; normal authored eval YAML cannot.
  • Updates focused tests, public eval docs, Promptfoo parity docs, eval-writer skill data, generated schema, and migration guidance to make prompts + vars the canonical authoring model.
  • Converts nearby legacy parser/input_files tests away from public authored direct input; input_files + input is now rejected, while canonical prompt content file blocks with vars render successfully.
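
The rejection rule described in the bullets above can be sketched roughly as follows. This is an illustration only, not the validator shipped in this PR: the `AuthoredEval` and `AuthoredTest` shapes, the `findAuthoredInputErrors` helper, the error wording, and the `file://cases.yaml` path are all hypothetical names chosen for the example.

```typescript
// Hypothetical shapes for illustration; not AgentV's real schema types.
type Vars = Record<string, unknown>;

interface AuthoredTest {
  vars?: Vars;
  input?: unknown; // legacy direct input, rejected when authored inline
}

interface AuthoredEval {
  prompts?: string[];
  default_test?: { vars?: Vars };
  tests?: AuthoredTest[] | string; // a string stands for a `file://...` raw-case import
  input?: unknown; // legacy top-level direct input
}

const GUIDANCE =
  "use top-level prompts plus default_test.vars / tests[].vars instead";

// Collect one error per authored direct-input surface.
function findAuthoredInputErrors(doc: AuthoredEval): string[] {
  const errors: string[] = [];
  if (doc.input !== undefined) {
    errors.push(`input: authored direct input is rejected; ${GUIDANCE}`);
  }
  // External raw-case imports (`tests: file://...`) are not inspected here,
  // so only inline test entries are checked.
  if (Array.isArray(doc.tests)) {
    doc.tests.forEach((test, i) => {
      if (test.input !== undefined) {
        errors.push(`tests[${i}].input: authored direct input is rejected; ${GUIDANCE}`);
      }
    });
  }
  return errors;
}

const legacy: AuthoredEval = { tests: [{ input: "Say hello" }] };
const canonical: AuthoredEval = {
  prompts: ["{{task}}"],
  tests: [{ vars: { task: "Say hello" } }],
};
const rawCaseImport: AuthoredEval = {
  prompts: ["{{task}}"],
  tests: "file://cases.yaml", // hypothetical path
};

console.log(findAuthoredInputErrors(legacy));
console.log(findAuthoredInputErrors(canonical));
console.log(findAuthoredInputErrors(rawCaseImport));
```

In this sketch the legacy document yields one error for `tests[0].input`, while the canonical prompts + vars document and the raw-case import yield none.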

Promptfoo Source Evidence

Promptfoo local clone: /home/entity/projects/promptfoo/promptfoo
Evidence commit: 6bfc5a0c7f16f9c4717ac731d276b578e63d0769 (6bfc5a0 chore(deps): update modelaudit schema generator to v0.2.47 (#9635))

Claims verified from source:

  • src/types/index.ts:851-858: Promptfoo test cases represent unique prompt inputs after substituting vars; vars is the test-row data surface.
  • src/types/index.ts:1033-1049: suites own providers, prompts, tests, and defaultTest; prompt entries are top-level suite data.
  • src/evaluator.ts:2165-2173: Promptfoo merges defaultTest.vars, scenario/data vars, and test vars.
  • src/evaluator.ts:2318-2322: Promptfoo propagates default test prompts/providers onto test cases.

Verification

Passed locally:

  • bun test packages/core/test/evaluation/prompt-input-authoring.test.ts packages/core/test/evaluation/input-files-shorthand.test.ts packages/core/test/evaluation/yaml-parser-metadata.test.ts packages/core/test/evaluation/validation/eval-file-schema.test.ts packages/core/test/evaluation/validation/eval-validator.test.ts packages/core/test/evaluation/validation/eval-schema-sync.test.ts
  • bun run lint
  • bun run typecheck
  • bun --filter @agentv/core build
  • git diff --check

bun run validate:examples was also run and currently fails because many existing repo examples still use authored tests[].input: 108 eval YAML files checked, 6 valid, 102 invalid. This is recorded on av-kfik.16 for the examples-authoring worker and on av-kfik.15 with exact codemod conversion requirements. This PR intentionally does not duplicate that broad examples/codemod sweep.

Live Dogfood

Passed with local OpenAI-compatible endpoint:

AGENTV_DOGFOOD_OPENAI_BASE_URL=http://127.0.0.1:10531/v1 \
AGENTV_DOGFOOD_OPENAI_API_KEY=local-proxy \
AGENTV_DOGFOOD_OPENAI_MODEL=gpt-5.4-mini \
AGENTV_NO_UPDATE_CHECK=1 \
bun apps/cli/src/cli.ts eval run prompt-vars.eval.yaml \
  --targets .agentv/targets.yaml \
  --output .agentv/results/prompt-vars-hard-deprecation \
  --grader-target local-grader \
  --threshold 0 \
  --no-cache

Result: PASS (1/1, mean 100%) using canonical .agentv/results/prompt-vars-hard-deprecation output.
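
For readers unfamiliar with the result line, the sketch below shows one plausible way a summary such as "PASS (1/1, mean 100%)" could be derived from per-test scores. The `TestResult` shape, the `summarize` helper, and the rule that a test passes when its score meets the threshold are all assumptions for illustration; the PR does not show how AgentV computes this.

```typescript
// Hypothetical result shape; AgentV's real result records are not shown in this PR.
interface TestResult {
  id: string;
  score: number; // assumed to lie in [0, 1]
}

interface RunSummary {
  passed: number;
  total: number;
  mean: number;
  verdict: "PASS" | "FAIL";
}

// Assumption: a test passes when score >= threshold, and the run passes
// only when there is at least one test and every test passes.
function summarize(results: TestResult[], threshold: number): RunSummary {
  const total = results.length;
  const passed = results.filter((r) => r.score >= threshold).length;
  const mean =
    total === 0 ? 0 : results.reduce((sum, r) => sum + r.score, 0) / total;
  const verdict = total > 0 && passed === total ? "PASS" : "FAIL";
  return { passed, total, mean, verdict };
}

// "example" is a placeholder test id.
const summary = summarize([{ id: "example", score: 1 }], 0);
console.log(
  `${summary.verdict} (${summary.passed}/${summary.total}, mean ${Math.round(summary.mean * 100)}%)`,
);
// prints "PASS (1/1, mean 100%)"
```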

Private evidence: EntityProcess/agentv-private branch evidence/av-kfik-27-input-hard-deprecation, commit 475b3d6.

Coordinator correction: the worker briefly pushed an agentv-private branch to the public EntityProcess/agentv remote while publishing evidence. That public branch was deleted with git push origin :agentv-private; do not use commit 19612e7c as evidence.

Compatibility Decision

Normal public authored eval YAML now rejects direct input everywhere this PR touches. External raw-case files loaded through tests: file://... remain the only tested internal compatibility path for input; they are deliberately kept out of canonical public docs and covered by prompt-input-authoring.test.ts.

Beads

  • av-kfik.27: implementation PR, left open for coordinator review/merge.
  • av-kfik.15: updated with exact remaining codemod conversion requirements for top-level input, inline tests input, message arrays, and input_files + input.
  • av-kfik.16: updated with the exact validate:examples blocker and examples migration handoff.

Current CI Status

As of coordinator review on 2026-07-04, this draft PR is not mergeable: GitHub Actions Test fails because full @agentv/core fixtures still contain legacy authored input surfaces (127 failures), and Validate Evals fails because repo examples still broadly use tests[].input. The follow-up migration work is recorded on av-kfik.15 and av-kfik.16. Keep this PR draft until those branches are sequenced and CI is green.

cloudflare-workers-and-pages Bot commented Jul 4, 2026


Deploying agentv with Cloudflare Pages

Latest commit: b60671f
Status: ✅  Deploy successful!
Preview URL: https://70b42b69.agentv.pages.dev
Branch Preview URL: https://promptfoo-input-hard-depreca.agentv.pages.dev

christso commented Jul 4, 2026

Collaborator Author

Evidence correction verified: public EntityProcess/agentv no longer has an origin/agentv-private branch (git ls-remote --heads origin agentv-private returns empty), and the temporary local agentv-private worktree/branch at commit 19612e7 was removed. Correct evidence is in the separate EntityProcess/agentv-private repo on orphan branch evidence/av-kfik-27-input-hard-deprecation at commit 475b3d6d5f68496bf7c3377daafa22fb0136b96d; rev-list shows no parent and the branch contains source/prompt-vars.eval.yaml, source/targets.yaml, and run-bundle/.

christso commented Jul 4, 2026

Collaborator Author

Codemod migration for Beads av-kfik.15.1 / av-kfik.15 is now pushed directly to this PR branch at commit 1e5d130.

Validation evidence:

  • Local: bun run lint passed
  • Local: bun run typecheck passed
  • Local: bun run validate:examples passed 108/108
  • Local: bun run test passed (agentv 746 pass, dashboard 153 pass in captured tail)
  • README scan passed: no Promptfoo mentions and no public YAML/defineEval/evaluate tests[].input examples outside vars.input/task parameter usage
  • Remote CI on #1646 (fix(eval): reject authored direct input) is green: Build, Typecheck, Lint, Test, Check Links, Validate Marketplace, Validate Evals, and Cloudflare Pages all passed on run https://github.com/EntityProcess/agentv/actions/runs/28707200216

The earlier successor PR #1650 has been merged into this hard-deprecation branch; this PR is now the green-ready branch for av-kfik.27.

christso commented Jul 4, 2026

Collaborator Author

Coordinator verification update after merging #1652:

  • #1652 (fix(eval): reject mixed criteria assertions) was merged into this branch and closed av-kfik.42.
  • CI on promptfoo-input-hard-deprecation is green at b60671ffdbe50ba5cf4823d229041f1684f4de62.
  • Local build in the integrated worktree passed (bun run build; existing dashboard chunk-size warning only).
  • Live dogfood passed with a real provider and real LLM grader using gpt-5.4-mini via the available Azure OpenAI secret.

Dogfood command shape:

bun apps/cli/src/cli.ts eval run examples/features/default-graders/evals/suite.yaml \
  --test-id greeting \
  --target azure \
  --grader-target azure \
  --workers 1 \
  --output .agentv/results/av-kfik-27-input-hard-deprecation-azure-openai-20260704T140222Z \
  --threshold 0.5 \
  --no-results-push

Result: PASS, 1/1, mean score 100%.

Credential note: the standard local OPENAI_API_KEY was a dummy value, so the successful live run used the Bitwarden secret azure-openai-chris-shared, which notes gpt-5.4 and gpt-5.4-mini; the process set AZURE_DEPLOYMENT_NAME=gpt-5.4-mini.

Private evidence branch: EntityProcess/agentv-private:evidence/av-kfik-27-input-hard-deprecation-20260704
Evidence commit: 2ca68df

@christso christso marked this pull request as ready for review July 4, 2026 14:04
@christso christso merged commit 7741f9d into main Jul 4, 2026
8 checks passed
@christso christso deleted the promptfoo-input-hard-deprecation branch July 4, 2026 14:05

christso commented Jul 4, 2026

Collaborator Author

Correction: local OpenAI-compatible endpoint dogfood was rerun and is now the primary evidence.

Local endpoint check:

curl http://127.0.0.1:10531/v1/models
# includes gpt-5.4-mini

Dogfood command shape:

LOCAL_OPENAI_PROXY_BASE_URL=http://127.0.0.1:10531/v1 \
LOCAL_OPENAI_PROXY_API_KEY=local \
LOCAL_OPENAI_PROXY_MODEL=gpt-5.4-mini \
bun apps/cli/src/cli.ts eval run examples/features/default-graders/evals/suite.yaml \
  --test-id greeting \
  --target local-openai \
  --grader-target local-openai-grader \
  --workers 1 \
  --output .agentv/results/av-kfik-27-input-hard-deprecation-local-openai-20260704T184248Z \
  --threshold 0.5 \
  --no-results-push

Result: PASS, 1/1, mean score 100%.

Private evidence branch: EntityProcess/agentv-private:evidence/av-kfik-27-input-hard-deprecation-20260704
Evidence commit: 4f476f0

The previous Azure OpenAI run remains in the evidence branch as secondary fallback evidence; the local OpenAI-compatible run is the correct primary dogfood for this PR.
