adk-2.0 producer: minimum plan to unblock mid-June customer cutover #293

@caohy1988

Spawned from the customer-needs comment on #190.
Scope: producer side only (the BQ AA Plugin in google/adk-python, src/google/adk/plugins/bigquery_agent_analytics_plugin.py). Consumer SDK views in #190 are not on this critical path — once the producer writes the rows, the customer can query them directly via SQL while the typed views land separately.

Why this issue exists

#190 is the long-form ADK 2.0 tracking issue (28 sub-issues, producer + consumer). The customer pinned a comment listing the specific fields they need before they take ADK 2.0 to production mid-June:

  • event.long_running_tool_ids
  • event.node_info
  • actions.compaction
  • actions.transfer_to_agent
  • actions.end_of_agent / agent_state
  • actions.route / UI / rewind
  • Workflow boundaries

This issue is a minimum, scoped subset of #190 that maps 1:1 to that list, ordered so the customer can ship without waiting for the full plan (workflow-node-boundary derivation, OTel correlation, optional adapters, consumer views, pause-orphan registry semantics — all deferred).

Producer state today

Already emitted (no work): INVOCATION_STARTING/COMPLETED, AGENT_STARTING/RESPONSE/COMPLETED, USER_MESSAGE_RECEIVED, LLM_REQUEST/RESPONSE/ERROR, TOOL_STARTING/COMPLETED/ERROR, STATE_DELTA, HITL_CREDENTIAL_REQUEST/HITL_CREDENTIAL_REQUEST_COMPLETED, HITL_CONFIRMATION_REQUEST/HITL_CONFIRMATION_REQUEST_COMPLETED, HITL_INPUT_REQUEST/HITL_INPUT_REQUEST_COMPLETED.

Missing for the customer ask: every item below.

Minimum plan

Group A — envelope every enriched row carries (blocks everything else)

A1 and A2 stamp on every ADK-enriched row regardless of whether the row originates from an Event. A3 only stamps on rows with a real originating Event (or a future typed telemetry context that carries an equivalent identity). The producer must not synthesize a source_event_id for callback-only rows; consumers can detect Event-originating rows via WHERE JSON_VALUE(attributes, '$.adk.source_event_id') IS NOT NULL. Note: any Event constructed without an id (e.g. _create_agent_state_event in agents/base_agent.py:213-218) is auto-stabilized by Event.model_post_init (events/event.py:271-275), which sets self.id = Event.new_id() before the plugin sees it — no additional producer-side stabilization needed.

  • A1. attributes.adk.schema_version — constant on every ADK-enriched row. Single source of truth for "this row was produced by an ADK 2.0-aware plugin build".
  • A2. attributes.adk.app_name — from InvocationContext.app_name, on every ADK-enriched row. Required for the pause registry composite key and for the general cross-event identity rule.
  • A3. attributes.adk.source_event_id — only on rows that originate from an Event (or a typed telemetry context carrying an equivalent identity, per B0). Leave null on rows produced by callbacks that don't receive an Event from the framework. Reliable join key against ADK OTel associated_event_ids. Never fabricate.
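
The A1 to A3 stamping rule can be sketched as below. This is a minimal illustration, not the plugin's code: `build_adk_envelope` and the `ADK_SCHEMA_VERSION` value are hypothetical names chosen for the example, and the row is modeled as a plain dict rather than the plugin's real row type.

```python
import json
from typing import Any, Dict, Optional

# Hypothetical value: the issue only says schema_version is a constant.
ADK_SCHEMA_VERSION = "2.0"


def build_adk_envelope(
    app_name: str, source_event_id: Optional[str] = None
) -> Dict[str, Any]:
    """Build the attributes.adk envelope for one row (illustrative only).

    A1 and A2 are always stamped. A3 is copied only when the caller passes
    the id of a real originating Event; otherwise it stays None (JSON null).
    The function never generates an id of its own.
    """
    return {
        "adk": {
            "schema_version": ADK_SCHEMA_VERSION,  # A1
            "app_name": app_name,                  # A2
            "source_event_id": source_event_id,    # A3, null when no Event
        }
    }


# Callback-only row: no Event was received, so A3 serializes as JSON null.
callback_row = json.dumps(build_adk_envelope("demo_app"))

# Event-originating row: A3 carries the Event's own id.
event_row = json.dumps(build_adk_envelope("demo_app", source_event_id="evt-123"))
```

Keeping `source_event_id` as an explicit null rather than a sentinel string is what lets the `IS NOT NULL` filter above separate Event-originating rows from callback-only rows.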

Group B — prerequisite plumbing (blocks A3 + C1/C2/C3 for non-on_event_callback paths)

Group C — customer-prioritized event/action capture

Each maps to one bullet in the customer's comment. Acceptance for each = a fixture turn that exercises the path produces the expected BigQuery row(s) on rows the B0 matrix marks as Event-originating.

Group D — cleanup

  • D1. Delete on_state_change_callback (deprecated stub at bigquery_agent_analytics_plugin.py:3131, never called by ADK 2.0). Tiny but removes a misleading surface area for the customer's team when they read the plugin code.

Explicitly deferred (post-cutover)

Each carries a one-line reason so the customer's team understands why their ask is not blocked:

| Deferred | Reason |
| --- | --- |
| WORKFLOW_NODE_STARTING/COMPLETED event types | Design-blocked in #190 / #207 (OTel-span vs event-observation). Workflow boundaries are partially observable today via attributes.adk.node (C1) on Event-originating rows, which is enough for the customer's first production query needs. The dedicated boundary events come after #207's decision lands. |
| OTel attributes.adk.otel_span_id (#205) | Best-effort only; consumer can join via A3 source_event_id ↔ ADK's span-side associated_event_ids in the meantime. |
| Oversized-state GCS offload for AGENT_STATE_CHECKPOINT | Inline payloads cover all but very large state. File as follow-up if the customer's actual checkpoint sizes hit the limit. |
| Pause registry pause_orphan semantics + read-after-write visibility (#206) | Blocks the orphan-flag contract, not the row-pair join keys. The customer can compute long-running tool durations from direct TOOL_PAUSED ↔ TOOL_COMPLETED SQL joins during the mid-June window; orphan correctness lands when #206's strategy is chosen and the registry is added. |
| Optional bqaa_adk.py SDK adapter | Not customer-blocking. |
| Consumer SDK view registration (#211, views.py) for the new event types | Not customer-blocking: they can read base-table JSON until the typed views land. Tracked in #190. |
| Historical pre-2.0 table-row null-safety | Concerns rows already in the table that were written by an older producer. Belongs to consumer view work (#190 / #211), not this producer-only issue. |

Dependency graph

B0 (#194)  ───┬──►  A3, C1, C2, C3, C4–C7
              │     (only on B0-feasible callbacks)
A1, A2       ─┴──►  every ADK-enriched row (no Event dep)

C4  AGENT_TRANSFER          ◄── actions.transfer_to_agent
                                (from_agent=event.author)
C5  EVENT_COMPACTION        ◄── actions.compaction
                                (preserve float precision)
C6  AGENT_STATE_CHECKPOINT  ◄── actions.agent_state | end_of_agent
C7  TOOL_PAUSED + non-HITL TOOL_COMPLETED enrich
                            ◄── event.long_running_tool_ids
                            NOT blocked by #206 (orphan semantics deferred)
C8  attributes.adk.{route, render_ui_widgets, rewind_before_invocation_id}
                            ◄── actions.route / widgets / rewind
                            (covered by #203, same flat-with-prefix shape)

D1 — cleanup, no deps
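
As one reading of the C8 entry in the graph above, the flat-with-prefix shape could look like the sketch below. It is an assumption-laden illustration: `flatten_c8_actions` is a hypothetical helper, the actions object is modeled as a plain mapping rather than ADK's real actions class, and omitting unset fields (instead of writing explicit nulls) is a choice made for the example, not something the issue specifies.

```python
from typing import Any, Dict, Mapping

# The three action fields C8 names; each keeps its own name under attributes.adk.
C8_FIELDS = ("route", "render_ui_widgets", "rewind_before_invocation_id")


def flatten_c8_actions(actions: Mapping[str, Any]) -> Dict[str, Any]:
    """Copy the C8 fields that are set into flat keys (illustrative only).

    The result is meant to be merged into the row's attributes.adk object,
    so actions.route ends up at attributes.adk.route with no extra nesting.
    Fields outside C8 (for example transfer_to_agent) are left untouched.
    """
    return {
        name: actions[name]
        for name in C8_FIELDS
        if actions.get(name) is not None
    }


adk_attributes = {"schema_version": "2.0", "app_name": "demo_app"}
adk_attributes.update(
    flatten_c8_actions({"route": "billing", "transfer_to_agent": "helper"})
)
```

With this shape a consumer reads the value back with a single path such as `$.adk.route`.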

Acceptance criteria

A representative ADK 2.0 invocation that includes: agent transfer + event compaction + agent-state checkpoint (both {agent_state: null, end_of_agent: true} and {agent_state: {...}, end_of_agent: false} shapes) + a long-running pause that spans invocations + an actions.route or rewind_before_invocation_id action → produces BigQuery rows where, from the customer's SQL only:

  1. Every ADK-enriched row exposes attributes.adk.schema_version and attributes.adk.app_name. Event-originating rows additionally expose attributes.adk.source_event_id, attributes.adk.node, attributes.adk.branch, attributes.adk.scope per the B0 coverage matrix; callback-only rows leave these JSON null without synthesis.
  2. AGENT_TRANSFER, EVENT_COMPACTION, AGENT_STATE_CHECKPOINT rows are joinable by (JSON_VALUE(attributes, '$.adk.app_name'), user_id, session_id, invocation_id).
  3. The customer can pair TOOL_PAUSED ↔ long-running TOOL_COMPLETED on (JSON_VALUE(attributes, '$.adk.app_name'), user_id, session_id, JSON_VALUE(attributes, '$.adk.function_call_id')) filtered to JSON_VALUE(attributes, '$.adk.pause_kind') = 'tool' on both rows, and compute long-running tool duration without the SDK's typed view and without the pause registry. Orphan rows are tolerated for the mid-June window; orphan correctness lands with #206 (pause registry read-after-write strategy). HITL completion durations are computed separately from the existing HITL_*_REQUEST ↔ HITL_*_REQUEST_COMPLETED stream.
  4. actions.route / render_ui_widgets / rewind_before_invocation_id are surfaced as attributes.adk.route / attributes.adk.render_ui_widgets / attributes.adk.rewind_before_invocation_id (flat-with-prefix, matching #203) on the originating Event row.
  5. Producer-side tests stay green; new fixtures cover each emit path.
  6. on_state_change_callback is gone from the public surface.
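
The pairing rule in criterion 3 can be sketched over in-memory rows as follows. This is a hedged illustration, not the customer's SQL or the SDK view: the row dict layout (`event_type`, `timestamp`, nested `attributes`), `pair_key`, `long_running_durations`, and `make_row` are all hypothetical names for the example. The key omits invocation_id, as criterion 3 does, since the pause may span invocations.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, Iterable, Tuple

PairKey = Tuple[Any, Any, Any, Any]


def pair_key(row: Dict[str, Any]) -> PairKey:
    """(app_name, user_id, session_id, function_call_id), without invocation_id."""
    adk = row.get("attributes", {}).get("adk", {})
    return (
        adk.get("app_name"),
        row.get("user_id"),
        row.get("session_id"),
        adk.get("function_call_id"),
    )


def long_running_durations(rows: Iterable[Dict[str, Any]]) -> Dict[PairKey, float]:
    """Seconds between each TOOL_PAUSED row and its matching TOOL_COMPLETED row."""
    paused_at: Dict[PairKey, datetime] = {}
    durations: Dict[PairKey, float] = {}
    for row in sorted(rows, key=lambda r: r["timestamp"]):
        adk = row.get("attributes", {}).get("adk", {})
        if adk.get("pause_kind") != "tool":
            continue  # HITL pauses are measured from the HITL_* stream instead
        key = pair_key(row)
        if row["event_type"] == "TOOL_PAUSED":
            paused_at[key] = row["timestamp"]
        elif row["event_type"] == "TOOL_COMPLETED" and key in paused_at:
            durations[key] = (row["timestamp"] - paused_at.pop(key)).total_seconds()
    return durations  # unmatched pauses (orphans) are simply left out


def make_row(event_type: str, ts: datetime, fc_id: str, pause_kind: str = "tool") -> Dict[str, Any]:
    """Build one example row in the hypothetical layout used above."""
    return {
        "event_type": event_type,
        "timestamp": ts,
        "user_id": "u1",
        "session_id": "s1",
        "attributes": {
            "adk": {"app_name": "demo_app", "function_call_id": fc_id, "pause_kind": pause_kind}
        },
    }


start = datetime(2025, 6, 1, 12, 0, 0, tzinfo=timezone.utc)
rows = [
    make_row("TOOL_PAUSED", start, "fc-1"),
    make_row("TOOL_COMPLETED", start + timedelta(seconds=90), "fc-1"),
    make_row("TOOL_PAUSED", start, "fc-2"),  # orphan: no completion yet
    make_row("TOOL_PAUSED", start, "fc-3", pause_kind="hitl_confirmation"),
]
durations = long_running_durations(rows)
```

Orphan pauses drop out of the result rather than raising, which mirrors the "orphan rows are tolerated" stance for the mid-June window.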

Test plan (producer-only)

  • Per-emit-path fixture for C4–C7 + C8.
  • Envelope smoke (A1, A2) asserted on every row produced by an existing fixture.
  • Envelope smoke (A3) asserted only on rows the B0 coverage matrix marks Event-originating; non-Event-originating callback rows asserted to leave A3 / C1 / C2 / C3 JSON null (not sentinel-stringed).
  • Empty-path fixture: an Event with NodeInfo.path == "" must emit attributes.adk.node.path = "" (or a documented null form) and must not synthesize a workflow node or parent_path.
  • AGENT_STATE_CHECKPOINT id-stabilization smoke: an _create_agent_state_event-shaped Event (constructed without id=) must arrive at the plugin with a non-empty id, and the resulting BigQuery row must carry that id in attributes.adk.source_event_id. (This is a regression guard for the Pydantic model_post_init auto-assign path; it does not require new producer logic.)
  • node / branch / scope enrichment fixtures matching the shape-coverage list in #190 v5 (node-run scope, function-call scope, unscoped, model-provided FC IDs).
  • C7 HITL non-routing assertion: a long-running pause whose function_call name is adk_request_confirmation (etc.) produces TOOL_PAUSED with pause_kind = 'hitl_confirmation', and the matching function response produces a HITL_CONFIRMATION_REQUEST_COMPLETED row — not a TOOL_COMPLETED row.
  • C7 pair-key assertion (no orphan logic): a non-HITL long-running pause + completion produces a TOOL_PAUSED and a TOOL_COMPLETED row with matching function_call_id and pause_kind = 'tool' on both. No pause_orphan assertion: that field's correctness defers to #206 (pause registry read-after-write strategy).
  • C5 fractional-timestamp fixture: an actions.compaction with sub-second start_timestamp / end_timestamp must round-trip through the producer's JSON serialization with fractional precision intact.
  • No new consumer SDK tests required; that work is tracked under #190 and #211 (register six new event types across all SDK type surfaces), among others.
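
The C5 fractional-timestamp fixture could take roughly this shape. It is a sketch under assumptions: `serialize_compaction` is a hypothetical stand-in for the producer's serialization step, and the timestamps are assumed to be float epoch seconds. It relies only on the standard `json` module, which writes floats using their shortest round-trippable representation.

```python
import json


def serialize_compaction(start_timestamp: float, end_timestamp: float) -> str:
    """Serialize the two compaction timestamps without touching their precision.

    The values are passed through as floats. Casting to int or formatting
    with a fixed number of decimals here is what would lose the sub-second part.
    """
    return json.dumps(
        {"start_timestamp": start_timestamp, "end_timestamp": end_timestamp}
    )


start_ts = 1718000000.123456
end_ts = 1718000042.987654

payload = serialize_compaction(start_ts, end_ts)
decoded = json.loads(payload)

# Round trip must be exact, not merely close.
round_trip_ok = (
    decoded["start_timestamp"] == start_ts and decoded["end_timestamp"] == end_ts
)
```

A real fixture would assert the same equality on the value read back from the emitted row rather than on a local JSON string.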

Effort + ordering recommendation

Roughly in order of "what unlocks the next thing":

  1. A1, A2 — single-day-each; no deps; lands on every row immediately.
  2. B0 (adk-2.0 producer: thread source Event / node-info into EventData for non-on_event callbacks #194) — biggest single change; gates A3 and most of Group C.
  3. A3, C1, C2, C3 — fast once B0 lands; lots of overlap (same Event-enrichment path).
  4. C4, C5, C6 (inline), C8 — independent emit paths; can parallelize.
  5. C7 — mid-sized once narrowed to pair-key emit only (no registry); ships independently of #206.
  6. D1 — opportunistic cleanup, can ship in any PR.

If staffed seriously, A1–A3 + C1–C5 + C7 (pair-key) should be reachable by mid-June. C6 (inline) and C8 are small additions to the same PRs. Workflow node boundaries and pause-orphan semantics explicitly stay deferred.
