Add MLflow evaluation scorer documentation for ADK evaluators #5123

@debu-sinha

Description

🔴 Required Information

Is your feature request related to a specific problem?

ADK's evaluation criteria (ToolTrajectory, ResponseMatch, Hallucinations, etc.) are powerful, but users who also track experiments with MLflow have no way to run ADK evaluators through mlflow.genai.evaluate(). MLflow already integrates with ADK for tracing, but not for evaluation scoring.

Describe the Solution You'd Like

Add a page to the ADK docs showing how to use ADK evaluators as MLflow scorers. The MLflow side of the integration has been submitted as PR mlflow/mlflow#22299, which wraps ADK's TrajectoryEvaluator and RougeEvaluator as MLflow third-party scorers.

Example usage:

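import mlflow
# Scorer module proposed in mlflow/mlflow#22299
from mlflow.genai.scorers.google_adk import ToolTrajectory, ResponseMatch

# eval_dataset is the evaluation data, assumed to be defined elsewhere
results = mlflow.genai.evaluate(
    data=eval_dataset,
    scorers=[
        ToolTrajectory(match_type="EXACT", threshold=0.5),
        ResponseMatch(threshold=0.6),
    ],
)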
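To make the `match_type="EXACT"` and `threshold` parameters concrete, here is a minimal plain-Python sketch of what an exact tool-trajectory check might compute. It does not use ADK or MLflow: `ToolCall`, `make_call`, `exact_trajectory_score`, and `passes` are hypothetical names for illustration, and the `>=` threshold comparison is an assumption, not a confirmed detail of either library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolCall:
    """Hypothetical stand-in for a recorded tool call (not an ADK type)."""
    name: str
    args: tuple  # sorted (key, value) pairs so calls compare by value


def make_call(name, **args):
    """Build a ToolCall whose arguments compare independently of keyword order."""
    return ToolCall(name=name, args=tuple(sorted(args.items())))


def exact_trajectory_score(actual, expected):
    """Return 1.0 when both call sequences are identical in order and content, else 0.0."""
    return 1.0 if list(actual) == list(expected) else 0.0


def passes(score, threshold):
    """Assumed pass rule: a score at or above the threshold passes."""
    return score >= threshold


expected = [
    make_call("search_flights", origin="SFO", dest="JFK"),
    make_call("book_flight", flight_id="UA100"),
]
actual_ok = list(expected)                                # same calls, same order
actual_bad = [make_call("book_flight", flight_id="UA100")]  # search step missing

score_ok = exact_trajectory_score(actual_ok, expected)
score_bad = exact_trajectory_score(actual_bad, expected)
print(score_ok, passes(score_ok, 0.5))
print(score_bad, passes(score_bad, 0.5))
```

Because an exact match yields only 0.0 or 1.0 in this sketch, any threshold in (0, 1] gives the same pass/fail result for a single case.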

A docs page under docs/evaluate/ or docs/integrations/ showing this integration would help ADK users who track experiments with MLflow.

Impact on your work

This enables ADK users to evaluate agents through MLflow's unified evaluation pipeline, combining ADK's deterministic evaluators with MLflow's experiment tracking, tracing, and comparison tools.

Willingness to contribute

Yes. Happy to submit a docs PR if the team approves the direction.


🟡 Recommended Information

Describe Alternatives You've Considered

Users can manually create ADK Invocation objects and run evaluators outside MLflow, but this breaks the unified mlflow.genai.evaluate() workflow and loses integration with MLflow's experiment tracking.
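For a sense of what the manual route involves, the sketch below computes a response-match style score by hand in plain Python and applies a pass threshold. It is a simplified ROUGE-1 style unigram-overlap F1, not ADK's RougeEvaluator: `tokenize` and `rouge1_f1` are hypothetical helpers, and the tokenization (lowercased alphanumeric runs, no stemming) is an assumption.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens (assumed tokenization)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rouge1_f1(candidate, reference):
    """Unigram-overlap F1 between a candidate response and a reference response."""
    cand = Counter(tokenize(candidate))
    ref = Counter(tokenize(reference))
    # Counter intersection keeps the minimum count of each shared token.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


reference = "The flight to JFK is booked"
candidate = "Your flight to JFK is booked"

score = rouge1_f1(candidate, reference)
threshold = 0.6  # same value as the ResponseMatch example above
print(round(score, 3), score >= threshold)
```

Each score computed this way still has to be logged and compared by hand, which is the bookkeeping a scorer integration would take over.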

Additional Context

Metadata

Labels

documentation: [Component] This issue is related to documentation, it will be transferred to adk-docs
needs review: [Status] The PR/issue is awaiting review from the maintainer
