
MaxCode: Add VerificationAgent and ValidationAgent for conversion quality scoring #25

Open
gvanica wants to merge 1 commit into main from split/3-verification-agents



@gvanica (Collaborator) commented Apr 22, 2026

Summary

Introduces two new agents that together produce a quality scorecard for PyTorch-to-JAX conversions, plus an ADK tool wrapper and a CLI demo script.

VerificationAgent (agents/migration/verification_agent.py)

Produces a scorecard with two independent metrics:

  • Completeness (AST-based, no LLM): parses the PyTorch source and the JAX output into ASTs, extracts classes, methods, and standalone functions by name, and computes the fraction of source components whose names also appear in the output. It is deterministic, fast, and requires no API calls.
  • Correctness (LLM-based, optional): delegates to ValidationAgent to detect semantic deviations, then converts them into a score by applying penalties weighted by severity (critical/major/minor) and category (numeric, structural, API, etc.). Requires a Gemini API key.
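A minimal sketch of the name-based completeness metric, using only Python's standard `ast` module. It is not the code in `verification_agent.py`: the helper names `extract_components` and `completeness_score`, the `Class.method` naming scheme, and treating an empty source as fully complete are all assumptions for illustration.

```python
import ast


def extract_components(source: str) -> set[str]:
    """Collect top-level class names, Class.method names, and top-level function names."""
    tree = ast.parse(source)
    names: set[str] = set()
    for node in tree.body:  # only module-level statements
        if isinstance(node, ast.ClassDef):
            names.add(node.name)
            for item in node.body:
                if isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef)):
                    names.add(f"{node.name}.{item.name}")
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            names.add(node.name)
    return names


def completeness_score(torch_src: str, jax_src: str) -> float:
    """Fraction of source components whose names also appear in the converted output."""
    src = extract_components(torch_src)
    if not src:
        return 1.0  # assumption: nothing to convert counts as complete
    out = extract_components(jax_src)
    return len(src & out) / len(src)


TORCH_SRC = '''
class MLP(nn.Module):
    def forward(self, x):
        return x

def init_weights(m):
    pass
'''

JAX_SRC = '''
class MLP(nn.Module):
    def __call__(self, x):
        return x

def init_weights(params):
    return params
'''

if __name__ == "__main__":
    # MLP.forward has no same-named counterpart, so 2 of 3 components are found.
    print(completeness_score(TORCH_SRC, JAX_SRC))
```

Because matching is purely by name, this sketch counts a PyTorch `forward` that became a Flax `__call__` as missing; whether the actual agent normalizes such idiomatic renames is not stated in the PR.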

Returns a VerificationResult dataclass with both scores and an overall weighted score (40% completeness + 60% correctness).
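The overall score is a fixed linear blend, which can be sketched as below. `ScorecardSketch` is a hypothetical stand-in for `VerificationResult` (its real fields are not listed in the PR), and returning `None` for the overall score in completeness-only runs is an assumption.

```python
from dataclasses import dataclass
from typing import Optional

# Weights stated in the PR description: 40% completeness, 60% correctness.
COMPLETENESS_WEIGHT = 0.4
CORRECTNESS_WEIGHT = 0.6


@dataclass(frozen=True)
class ScorecardSketch:
    """Hypothetical stand-in for VerificationResult; field names are assumptions."""
    completeness: float           # fraction in [0, 1] from the AST comparison
    correctness: Optional[float]  # fraction in [0, 1], or None if the LLM pass was skipped

    @property
    def overall(self) -> Optional[float]:
        # Assumption: without a correctness score there is no blended score.
        if self.correctness is None:
            return None
        return COMPLETENESS_WEIGHT * self.completeness + CORRECTNESS_WEIGHT * self.correctness


if __name__ == "__main__":
    print(ScorecardSketch(completeness=1.0, correctness=0.5).overall)
    print(ScorecardSketch(completeness=0.8, correctness=None).overall)  # None
```

With these weights, a complete conversion with correctness 0.5 scores about 0.7 overall, while a conversion that is 75% complete but has correctness 1.0 scores 0.9.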

ValidationAgent (agents/migration/validation_agent.py)

An LLM-powered agent that compares the PyTorch source with the JAX output and identifies faithfulness deviations: places where the conversion changes behavior, defaults, structure, or semantics beyond what an idiomatic JAX translation requires. Each deviation is classified by category and severity.
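One way to represent ValidationAgent's classified deviations and turn them into a correctness score with severity- and category-weighted penalties. The `Deviation` record, the category strings, and every penalty value below are hypothetical; the PR names the severity levels and example categories but not the actual weights.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    CRITICAL = "critical"
    MAJOR = "major"
    MINOR = "minor"


@dataclass(frozen=True)
class Deviation:
    category: str      # e.g. "numeric", "structural", "api" (hypothetical labels)
    severity: Severity
    description: str   # human-readable explanation reported by the LLM


# Hypothetical weights: the real values used by VerificationAgent are not stated in the PR.
SEVERITY_PENALTY = {Severity.CRITICAL: 0.25, Severity.MAJOR: 0.10, Severity.MINOR: 0.02}
CATEGORY_MULTIPLIER = {"numeric": 1.5, "structural": 1.0, "api": 1.0}


def correctness_score(deviations: list[Deviation]) -> float:
    """Start from 1.0, subtract a severity x category penalty per deviation, clamp at 0."""
    penalty = sum(
        SEVERITY_PENALTY[d.severity] * CATEGORY_MULTIPLIER.get(d.category, 1.0)
        for d in deviations
    )
    return max(0.0, 1.0 - penalty)


if __name__ == "__main__":
    found = [
        Deviation("numeric", Severity.MAJOR, "LayerNorm epsilon changed from 1e-5 to 1e-6"),
        Deviation("api", Severity.MINOR, "keyword argument renamed"),
    ]
    print(correctness_score(found))
```

Clamping at zero keeps the score in [0, 1] no matter how many deviations the LLM reports.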

Files

| File | Status | Description |
|---|---|---|
| `agents/migration/verification_agent.py` | New | VerificationAgent with AST completeness + LLM correctness (408 lines) |
| `agents/migration/validation_agent.py` | New | ValidationAgent for faithfulness deviation detection (399 lines) |
| `tools/verification_tool.py` | New | ADK FunctionTool wrapper exposing verification to the agent framework (69 lines) |
| `examples/demo/step5_verify.py` | New | CLI script that runs verification and prints a formatted scorecard (231 lines) |

Design decisions

  • Two-metric approach: completeness catches missing components (cheap, deterministic); correctness catches semantic errors (expensive, LLM-based). Completeness can be run on its own, without an API key.
  • Weighted scoring: the overall score weights correctness higher (60%) because a complete but semantically wrong conversion is worse than a slightly incomplete but faithful one.
  • Separation of concerns: ValidationAgent handles the LLM interaction and deviation extraction; VerificationAgent handles scoring and aggregation.

Test plan

  • Run step5_verify.py with only a source and output path (no API key): it should produce only a completeness score.
  • Run step5_verify.py with an API key: it should produce both completeness and correctness scores.
  • Verify that verification_tool.py returns valid JSON when called through ADK
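For the last test-plan item, a small checker can confirm that the tool's output parses as JSON and holds scores in [0, 1]. The key names in `REQUIRED_KEYS` are assumptions, since the PR does not show the JSON schema that `verification_tool.py` returns; allowing `null` mirrors the completeness-only mode.

```python
import json
from typing import Any

# Hypothetical key names; the actual schema returned by verification_tool.py is not shown.
REQUIRED_KEYS = ("completeness", "correctness", "overall")


def check_scorecard_json(text: str) -> list[str]:
    """Return a list of problems found in a scorecard JSON string (empty if it looks valid)."""
    try:
        data: Any = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value is not a JSON object"]
    problems = []
    for key in REQUIRED_KEYS:
        if key not in data:
            problems.append(f"missing key: {key}")
            continue
        value = data[key]
        if value is None:
            continue  # assumption: null is allowed, e.g. correctness in completeness-only runs
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if not is_number or not 0.0 <= value <= 1.0:
            problems.append(f"{key} is not a number in [0, 1]: {value!r}")
    return problems
```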

Split from #17 — PR 3 of 8

Introduces two new agents for measuring PyTorch-to-JAX conversion quality:

- VerificationAgent: AST-based completeness scoring + LLM correctness scoring
- ValidationAgent: LLM-based faithfulness deviation detection

Also adds verification_tool.py (ADK tool wrapper) and step5_verify.py
(CLI demo script).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@google-cla (Bot) commented Apr 22, 2026

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.
