fix(sdk): align script grader results by christso · Pull Request #1659 · EntityProcess/agentv

christso · 2026-07-05T03:43:07Z

Summary

SDK and script-grader authors can now return results directly in the finalized vocabulary: `pass`, `score`, `reason`, and optional `checks[]`. The SDK builders, Zod schemas, Vitest/workspace adapters, Python helper example, generated assertion scaffolds, and script-grader docs all present that vocabulary as the public surface.
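To make the vocabulary concrete, here is a minimal TypeScript sketch of a script grader that builds its result from `pass`, `score`, `reason`, and `checks[]`. The interface names, the per-check fields (`name`, `pass`, `reason`), the grading rules, and printing JSON to stdout are all assumptions for illustration. They are not the published @agentv/sdk types or builders.

```typescript
// Hypothetical shapes mirroring the vocabulary named in this PR
// (pass, score, reason, optional checks[]). Field names inside each
// check are assumptions, not the published @agentv/sdk types.
interface GraderCheck {
  name: string;
  pass: boolean;
  reason?: string;
}

interface ScriptGraderResult {
  pass: boolean;
  score: number; // assumed to be in [0, 1]
  reason: string;
  checks?: GraderCheck[];
}

// Hypothetical grader: checks a candidate answer for two properties.
function gradeAnswer(answer: string): ScriptGraderResult {
  const checks: GraderCheck[] = [
    {
      name: "mentions-refund",
      pass: /refund/i.test(answer),
      reason: "Answer should mention the refund policy",
    },
    {
      name: "under-200-chars",
      pass: answer.length <= 200,
      reason: "Answer should stay under 200 characters",
    },
  ];
  const passed = checks.filter((c) => c.pass).length;
  return {
    pass: passed === checks.length,
    score: passed / checks.length,
    reason: `${passed}/${checks.length} checks passed`,
    checks,
  };
}

// Assumed transport: the script prints its result as a single JSON object.
console.log(JSON.stringify(gradeAnswer("We offer a full refund within 30 days.")));
```

Here the aggregate `pass`/`score` are computed by the grader itself; a grader could equally return only `checks[]` and leave aggregation to the caller.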

Core script-grader parsing now accepts the finalized JSON protocol, derives the aggregate `score`/`pass` from `checks` when needed, and carries `reason`/`checks` through the internal evaluator result. It still bridges `checks` into the current internal `assertion_results` shape, so the artifact writer can be replaced separately in av-kfik.28.6.
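A minimal sketch of the parsing behavior described above, assuming the grader writes one plain JSON object to stdout. The derivation rules (score as the fraction of passing checks, pass as "every check passed"), the `parseGraderOutput` and `toAssertionResults` helpers, and the `AssertionResult` fields are hypothetical. The real core parser and the actual `assertion_results` shape may differ.

```typescript
// Minimal sketch, NOT the agentv core parser. Check fields and the
// AssertionResult shape are hypothetical; the fallback rules are assumptions.
interface Check {
  name: string;
  pass: boolean;
  reason?: string;
}

interface RawResult {
  pass?: boolean;
  score?: number;
  reason?: string;
  checks?: Check[];
}

interface EvaluatorResult {
  pass: boolean;
  score: number;
  reason: string;
  checks: Check[];
}

// Hypothetical stand-in for the legacy internal assertion_results entries.
interface AssertionResult {
  text: string;
  passed: boolean;
  evidence?: string;
}

function deriveScore(raw: RawResult, checks: Check[]): number {
  if (raw.score !== undefined) return raw.score;
  // Assumed rule: fraction of checks that passed.
  if (checks.length > 0) return checks.filter((c) => c.pass).length / checks.length;
  if (raw.pass !== undefined) return raw.pass ? 1 : 0;
  throw new Error("grader result needs score, pass, or at least one check");
}

function parseGraderOutput(stdout: string): EvaluatorResult {
  // No schema validation here; a real parser would validate (e.g. with Zod).
  const raw = JSON.parse(stdout) as RawResult;
  const checks = raw.checks ?? [];
  const score = deriveScore(raw, checks);
  // Assumed rule: an explicit pass wins; otherwise every check must pass,
  // or, with no checks, the score must be perfect.
  const pass = raw.pass ?? (checks.length > 0 ? checks.every((c) => c.pass) : score >= 1);
  return { pass, score, reason: raw.reason ?? "", checks };
}

// Bridge checks into the (hypothetical) legacy assertion_results shape.
function toAssertionResults(result: EvaluatorResult): AssertionResult[] {
  return result.checks.map((c) => ({ text: c.name, passed: c.pass, evidence: c.reason }));
}

const parsed = parseGraderOutput(
  '{"reason":"1 of 2","checks":[{"name":"a","pass":true},{"name":"b","pass":false}]}',
);
console.log(parsed.score, parsed.pass, toAssertionResults(parsed));
```

Keeping the bridge as a separate function mirrors the PR's split: the evaluator result carries `checks` natively, and only the legacy artifact path needs the converted entries.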

Publish-surface considerations: this intentionally updates the experimental @agentv/sdk result surface instead of documenting the stale `assertions[]`/`passed` aliases as supported public API. The deprecated `CodeGraderResult` type name remains as an alias for the new result schema, but the wire/public shape is the finalized vocabulary.
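The alias relationship can be sketched in TypeScript as follows. `ScriptGraderResult`, its fields, and the `findStaleKeys` migration aid are hypothetical names chosen for illustration; only `CodeGraderResult` and the stale `passed`/`assertions` keys come from this PR.

```typescript
// Hypothetical sketch of the naming relationship described above; the
// real @agentv/sdk schema names and fields may differ.
interface ScriptGraderResult {
  pass: boolean;
  score: number;
  reason?: string;
  checks?: { name: string; pass: boolean; reason?: string }[];
}

/** @deprecated Old type name, kept only as an alias of ScriptGraderResult. */
type CodeGraderResult = ScriptGraderResult;

// Code written against the old type name still type-checks, but it must use
// the finalized fields: `{ passed: true }` would be a compile error here.
const legacyNamed: CodeGraderResult = { pass: true, score: 1, reason: "ok" };

// Hypothetical migration aid: report stale keys a grader script still emits.
const STALE_KEYS = ["passed", "assertions"] as const;

function findStaleKeys(output: Record<string, unknown>): string[] {
  return STALE_KEYS.filter((key) => key in output);
}

console.log(legacyNamed.pass, findStaleKeys({ passed: true, assertions: [], score: 1 }));
```

Keeping the old name as a pure type alias means existing imports keep compiling, while the object shape itself already follows the finalized vocabulary.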

Related: av-kfik.28.4

Validation

- `bun run build`
- `bun test packages/sdk/test/define-script-grader.test.ts packages/sdk/test/workspace-grader.test.ts packages/sdk/test/vitest-workspace-grader.test.ts`
- `bun test packages/core/test/evaluation/graders/script-grader-plain-text.test.ts packages/core/test/evaluation/script-grader-file-backed.test.ts packages/core/test/evaluation/script-grader-multimodal.test.ts`
- `bun test packages/core/test/evaluation/graders.test.ts packages/core/test/evaluation/execution-metrics.test.ts`
- `uv run pytest` in `examples/features/sdk-python`
- `bun run lint`
- `git diff --check`
Smoke: `bun --env-file=.env apps/cli/src/cli.ts eval run examples/features/script-grader-sdk/evals/suite.yaml --target local_cli` exercised the SDK script-grader path; the script grader's per-grader score was 1.0, with all returned checks passing. The overall eval scored 50% because the separate LLM rubric target did not resolve from the example-local target config.
Attempted a live rerun with a temporary combined targets file and `--grader-target openai`: the LLM grader reached provider execution but failed after retries with `pi-ai call failed: Connection error`. Script-grader API dogfooding is covered; live LLM rubric dogfooding is blocked by provider connectivity in this worktree.

cloudflare-workers-and-pages · 2026-07-05T03:43:44Z

Deploying agentv with Cloudflare Pages

Latest commit:	`e8ddf96`
Status:	✅ Deploy successful!
Preview URL:	https://2886af14.agentv.pages.dev
Branch Preview URL:	https://grading-sdk-script-api.agentv.pages.dev


Commit: fix(sdk): align script grader results (`e8ddf96`)

christso force-pushed the grading-sdk-script-api branch from 1b2cf44 to e8ddf96 on July 5, 2026 04:08

christso marked this pull request as ready for review July 5, 2026 04:10

christso merged commit 5001fa6 into main on July 5, 2026 (8 checks passed)

christso deleted the grading-sdk-script-api branch July 5, 2026 04:11
