fix(eval): wire App.plugins / context-cache / resumability through adk eval#5534
Open
saifer82 wants to merge 5 commits intogoogle:mainfrom
Open
fix(eval): wire App.plugins / context-cache / resumability through adk eval#5534saifer82 wants to merge 5 commits intogoogle:mainfrom
saifer82 wants to merge 5 commits intogoogle:mainfrom
Conversation
|
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). View this failed invocation of the CLA check for more information. For the most up to date status, view the checks section at the bottom of the pull request. |
Eval flows currently access `agent_module.agent.root_agent` directly, which drops the wrapping `App` (and therefore its plugins, context-cache config, and resumability config). Add `get_app_or_root_agent` that returns the `(app, root_agent)` pair, mirroring the resolution order `AgentLoader._load_from_module_or_package` already uses on the web / run paths. Keep `get_root_agent` as a back-compat wrapper. This commit is the resolver and unit tests only; subsequent commits plumb the App through `EvaluationGenerator` and `LocalEvalService` so plugins fire during eval runs.
`_generate_inferences_from_root_agent` now accepts an optional `app` parameter. When provided, the eval Runner is built from a copy of the App with internal eval plugins (`_RequestIntercepterPlugin`, `EnsureRetryOptionsPlugin`) merged into `app.plugins`. The user's App is never mutated, and the App's `context_cache_config` / `resumability_config` ride along automatically. When `app` is None, the legacy bare-agent path is preserved. `_process_query` (used by the public `generate_responses` entry point) now resolves `agent.app` first and forwards it to the helper, so projects that wrap their root agent in an `App` get plugin coverage during eval without further changes. The CLI plumbing that hands the App down from `cli_eval` / `LocalEvalService` is in the next commit.
Closes the loop on https://github.com/google/adk-python/issues/<TBD>: when a project wraps its root agent in `App(root_agent=..., plugins=[...])` and runs `adk eval`, the registered plugins (e.g., `BigQueryAgentAnalyticsPlugin`) now fire on every invocation just like they do for `adk web` / `adk run`. Same applies to `App.context_cache_config` and `App.resumability_config`, which now ride along automatically. Changes: * `LocalEvalService.__init__` accepts an optional `app` keyword argument and forwards it to `_generate_inferences_from_root_agent` for each eval case. * `cli_tools_click.cli_eval` resolves the `App` via `get_app_or_root_agent` and passes it to `LocalEvalService`. * `cli_optimize` (GEPA prompt optimization) also routes through `LocalEvalService` but currently constructs it inside `LocalEvalSampler` with no `app` argument; bringing the optimize path under App-plugin coverage is a separate, narrower follow-up and is intentionally not included here.
The eval CLI now resolves agents via `get_app_or_root_agent`. Update the shared `mock_get_root_agent` fixture in test_cli_tools_click.py to patch the new resolver and yield `(None, root_agent)`, matching the non-App path the eval-set-id tests exercise.
4ca7875 to
456cd98
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Link to Issue or Description of Change
cli_evalbypassesApp.plugins, breaking observability plugins (e.g.,BigQueryAgentAnalyticsPlugin) during eval runs #5503The maintainer confirmed reproduction and asked for a PR:
Problem:
cli_evalresolves agents viaagent_module.agent.root_agent, which drops the wrappingAppand therefore itsplugins,context_cache_config, andresumability_config. As a result, when a project wraps its root agent inApp(root_agent=..., plugins=[...]), plugin lifecycle hooks (on_event_callback, etc.) fire duringadk web/adk runbut are silently skipped duringadk eval. Observability plugins likeBigQueryAgentAnalyticsPluginproduce no telemetry rows for eval runs — exactly the workload where per-case latency / token / trajectory data is most useful.Solution:
Resolve the
App(when present) at the eval CLI entrypoint and plumb it throughLocalEvalServicetoEvaluationGenerator._generate_inferences_from_root_agent, where the evalRunneris built. When an App is in play, the Runner is constructed from a copy of the App with the two internal eval plugins (_RequestIntercepterPlugin,EnsureRetryOptionsPlugin) merged intoapp.plugins. The user's App instance is never mutated. When no App is present the legacy bare-agent path is preserved.This also incidentally fixes the parallel gaps with
App.context_cache_configandApp.resumability_config, which were dropped by the same bypass.The four commits are sequenced for review readability:
fix(cli_eval): add get_app_or_root_agent resolver— new helper + back-compat shim forget_root_agent.fix(evaluation): forward App through to the eval Runner—_generate_inferences_from_root_agentacceptsapp=and merges plugins;_process_queryresolves the App for the publicgenerate_responsesentry point.fix(eval): plumb App through LocalEvalService to fix App.plugins bypass—LocalEvalService.__init__acceptsapp=;cli_tools_click.cli_evaluses the new resolver and passesappthrough.test(cli_tools_click): mock get_app_or_root_agent in eval CLI tests— fixture update for the renamed resolver.Testing Plan
Unit Tests:
New tests (10 cases across 3 files):
tests/unittests/cli/utils/test_cli_eval.py— 4 tests coveringget_app_or_root_agent: App present, App absent,appattribute exists but is not an App instance (falls back), andget_root_agentback-compat.tests/unittests/evaluation/test_evaluation_generator.py— 4 tests covering_generate_inferences_from_root_agentwithapp=: Runner built withapp=(merged plugins), legacy fallback whenapp=None, user's App not mutated across repeated runs, androot_agentoverride propagates to merged App copy (sub-agent eval scenario).tests/unittests/evaluation/test_local_eval_service.py— 2 tests assertingLocalEvalServiceforwardsapp(orNone) through to_generate_inferences_from_root_agent.Manual End-to-End (E2E) Tests:
Reproduction setup matches the issue: an agent wrapped in
App(...)withBigQueryAgentAnalyticsPluginregistered, evaluating a single case viaadk eval.INVOCATION_STARTINGrowsLLM_REQUEST/RESPONSErowsTOOL_STARTING/COMPLETEDrowsConcretely, after running
adk eval ./app routing_and_tools.evalset.json:route_sales_total_enagainst this PR:The plugin captures the full lifecycle (root + sub-agent) and the
Batch writer task cancelledlog line confirms its teardown ran inside the eval Runner.Checklist
Additional context
Scope deliberately excluded:
cli_optimize(GEPA prompt optimization) — also routes throughLocalEvalServicebut constructs it insideLocalEvalSamplerwith noappargument. Bringing the optimize path under App-plugin coverage is a small follow-up: threadappintoLocalEvalSampler.__init__and pass it on toLocalEvalService(...). Happy to do it in a separate PR.adk eval generate(generate_eval_cases) — switched to the new resolver for consistency only. It usesScenarioGenerator, not aRunner, so plugins don't apply there.AgentLoader— out of scope.cli_evaldoesn't useAgentLoadertoday; aligning the two loaders would be a larger refactor and not what this issue asks for.Open question for reviewers:
The issue raised the possibility of an opt-in flag (
adk eval --use-app-plugins) in case the bypass was intentional. This PR makes App-plugins-on-eval the default behavior, on the assumption that a plugin contract of "fires on every event" is what users expect. Happy to gate it behind a flag if you'd prefer the conservative default.🤖 Generated with Claude Code