Commit a5fa3da

haiyuan-eng-google authored and GWeale committed
feat: BigQuery Agent Analytics reliability fixes
Three related reliability/observability fixes to the BigQuery Agent Analytics plugin.

1. Dropped-event observability. BigQuery logging is best-effort: events are dropped when the in-memory queue overflows or a write ultimately fails, and only a log line records the loss. Track dropped rows in BatchProcessor by reason (queue_full, arrow_prep_failed, retry_exhausted, non_retryable, unexpected_error), include the running total in each drop log line, and expose the counts via BatchProcessor.get_drop_stats()/dropped_event_count and an aggregating BigQueryAgentAnalyticsPlugin.get_drop_stats() so a host can poll them and export to its own monitoring.

2. Cross-region Storage Write API routing. The AppendRows streaming RPC does not auto-populate the request-routing header, so writes to a dataset outside the US multiregion could fail with a "session not found" / stream-not-found error and silently drop every row. Set x-goog-request-params: write_stream=<stream> on the append_rows call so the request reaches the region that owns the write stream. US-multiregion behavior is unchanged.

3. Stop exporting plugin-owned OTel spans. When Agent Engine telemetry is enabled (GOOGLE_CLOUD_AGENT_ENGINE_ENABLE_TELEMETRY=true) with Cloud Trace export on the global tracer provider, the plugin's ID-carrier spans were exported alongside the framework's real spans, producing a duplicate span for every instrumented operation. The plugin now tracks span_id / trace_id on its own contextvar stack without creating OTel spans; trace_id is inherited from the ambient span, so BigQuery rows still join to Cloud Trace by trace_id and the LLM/tool span_id-sharing contracts are preserved.

All paths covered by unit tests.

Change-Id: Ia7b73d816b14c574ef856a4c88c57243f6f38f7f
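For readers without the diff at hand, the first two fixes can be illustrated with a minimal sketch. This is not the plugin's code: `DropStats`, `record`, `snapshot`, and `routing_metadata` are hypothetical names chosen for illustration, and the fallback to `unexpected_error` for unknown reasons is an assumption. Only the reason strings and the `x-goog-request-params: write_stream=<stream>` header come from the commit message.

```python
import threading
from collections import Counter

# Reason names come from the commit message; everything else is hypothetical.
DROP_REASONS = (
    "queue_full",
    "arrow_prep_failed",
    "retry_exhausted",
    "non_retryable",
    "unexpected_error",
)


class DropStats:
    """Thread-safe per-reason counter for dropped rows (illustrative only)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._counts = Counter()

    def record(self, reason, rows=1):
        """Count dropped rows; return the running total for the drop log line."""
        if reason not in DROP_REASONS:
            # Assumption: unknown reasons are folded into a catch-all bucket.
            reason = "unexpected_error"
        with self._lock:
            self._counts[reason] += rows
            return sum(self._counts.values())

    def snapshot(self):
        """Return a copy of the counts that a host could poll and export."""
        with self._lock:
            return dict(self._counts)


def routing_metadata(write_stream):
    """Build the request-routing header named in the commit message.

    Assumption: the stream name is passed through unescaped, which is fine
    for typical resource paths made of letters, digits, '/', '_' and '-'.
    """
    return (("x-goog-request-params", f"write_stream={write_stream}"),)
```

In this sketch, a host would call `snapshot()` periodically and forward the counts to its own monitoring, while the tuple from `routing_metadata()` would be passed as gRPC metadata on the append call so the request reaches the region that owns the write stream.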
1 parent 77aeadf commit a5fa3da

2 files changed

Lines changed: 574 additions & 159 deletions
