PRD: Unified Analytics Interface for BigQuery Agent Analytics SDK

Status: Draft Issue: #22 Author: SDK Product Team Date: 2026-03-06

1. Validation & Rationale

1.1 Is This a Necessary Update?

Yes. The SDK today is a Python-library-only toolkit with 16+ analytical capabilities (trace retrieval, evaluation, drift detection, insights, context graphs, etc.), but every one of them requires a user to write Python code and import the library. This creates three gaps:

Gap	Who is Affected	Why It Matters
SQL-native analytics	Data analysts, BI engineers, Looker/Data Studio users	Cannot run `SELECT analyze(session_id) FROM traces` — must leave BigQuery to run Python
Agent self-diagnostics	Autonomous AI agents (ADK, LangChain, CrewAI)	Agents cannot inspect their own performance without generating complex SQL; CLIs are the natural LLM tool interface
Automation & CI/CD	Platform engineers, SRE teams	No scriptable CLI for cron-based eval runs, alerting pipelines, or `git bisect`-style regression checks

1.2 Current State

┌──────────────────────────────────────────────────┐
│           BigQuery Agent Analytics SDK           │
│                                                  │
│  Client.get_trace()      Client.evaluate()       │
│  Client.insights()       Client.drift_detection()│
│  Client.doctor()         Client.deep_analysis()  │
│  Client.hitl_metrics()   Client.context_graph()  │
│  ViewManager             BigQueryTraceEvaluator   │
│  TrialRunner             GraderPipeline           │
│  EvalSuite               EvalValidator            │
│  BigQueryMemoryService   BigQueryAIClient         │
│                                                  │
│  ACCESS: Python import ONLY                      │
└──────────────────────────────────────────────────┘

1.3 Proposed State

┌───────────────────────────────────────────────────────────────────────────────┐
│                          Shared Core (Python)                                │
│     Client, evaluators, insights, feedback, trace, context_graph             │
├───────────────┬────────────────────┬────────────────────┬────────────────────┤
│ Python Lib    │ BQ Remote Function │ Continuous Query   │ CLI               │
│ (existing)    │ (Path A — Batch)   │ (Path A' — Stream) │ (Path B — Agent)  │
│               │                    │                    │                   │
│ import Client │ SELECT fn(...)     │ APPENDS() +        │ $ bq-agent-sdk    │
│ Notebooks     │ Scheduled queries  │ AI.GENERATE_TEXT   │   evaluate ...    │
│ Python apps   │ Looker, dashboards │ → BQ / Pub/Sub /   │   insights ...    │
│               │                    │   Bigtable / Spanner│                   │
└───────────────┴────────────────────┴────────────────────┴────────────────────┘

1.4 MVP Scope

v1.0 = CLI with evaluate, get-trace, doctor, and --exit-code.

This is the smallest cut that unblocks all three personas. Remote Function (Path A) and Continuous Query (Path A') ship in v1.1/v1.2. See §8.1 for the full MVP definition, feature-to-version mapping, and rationale.

2. User Personas

Persona A: Priya — Data Analyst (Remote Function Path)

Works in BigQuery console and Looker daily
Comfortable with SQL, not with Python notebooks
Needs to build dashboards showing agent quality metrics
Wants to run evaluation and insights directly inside scheduled SQL queries

Persona B: AgentX — Autonomous AI Agent (CLI Path)

An ADK-based agent deployed in production
Has tool-calling capability (can invoke shell commands)
Needs to check its own latency, error rates, and drift before responding
Must minimize token overhead — CLI commands are cheaper than SQL generation

Persona C: Marcus — Platform Engineer (CLI Path)

Manages agent fleet in CI/CD pipelines
Needs nightly eval runs with pass/fail gates
Wants bq-agent-sdk evaluate ... --exit-code in GitHub Actions
Pipes output to monitoring systems (Datadog, PagerDuty)

3. Path A: BigQuery Remote Function Interface

3.1 Overview

Deploy the SDK's analytical logic as a Google Cloud Function (or Cloud Run service), register it as a BigQuery Remote Function, and let SQL users call SDK features directly from SELECT statements.

3.2 Supported Operations

All operations go through a single multiplexed function: agent_analytics(operation STRING, params JSON) RETURNS JSON

Operation	SDK Method	Params (JSON keys)	Output
`analyze`	`Client.get_session_trace()` + metrics	`session_id`	JSON with span count, error count, latency, tool calls
`evaluate`	`SystemEvaluator`	`session_id`, `metric`, `threshold`	JSON with passed, score, details
`judge`	`LLMAsJudge`	`session_id`, `criterion`	JSON with score, feedback
`insights`	Facet extraction	`session_id`	JSON with intent, outcome, friction
`drift`	Drift detection	`golden_dataset`, `agent_filter`, `start_date`, `end_date`	JSON with coverage, gaps

3.3 Critical User Journeys (CUJ)

CUJ-A1: Priya Builds an Agent Quality Dashboard

Goal: Create a Looker dashboard showing per-session quality scores updated nightly.

Journey:

Step 1: Platform team deploys SDK as Cloud Function
        $ gcloud functions deploy bq-agent-analytics \
            --runtime python312 \
            --entry-point handle_request \
            --source ./deploy/remote_function/

Step 2: Platform team registers a single multiplexed Remote Function
        CREATE FUNCTION `project.analytics.agent_analytics`(
          operation STRING,
          params JSON
        ) RETURNS JSON
        REMOTE WITH CONNECTION `project.us.analytics-conn`
        OPTIONS (
          endpoint = 'https://us-central1-project.cloudfunctions.net/bq-agent-analytics'
        );

Step 3: Priya writes a scheduled query (no Python needed)
        -- Nightly materialization of agent quality scores
        CREATE OR REPLACE TABLE `project.analytics.daily_quality` AS
        WITH recent AS (
          SELECT DISTINCT session_id, MIN(timestamp) AS timestamp,
                 ANY_VALUE(agent) AS agent
          FROM `project.analytics.agent_events`
          WHERE timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)
          GROUP BY session_id
        )
        SELECT
          session_id,
          timestamp,
          agent,
          JSON_VALUE(result, '$.error_count') AS error_count,
          CAST(JSON_VALUE(result, '$.avg_latency_ms') AS FLOAT64) AS avg_latency_ms,
          JSON_VALUE(result, '$.tool_call_count') AS tool_calls
        FROM recent,
        UNNEST([
          `project.analytics.agent_analytics`(
            'analyze',
            JSON_OBJECT('session_id', session_id)
          )
        ]) AS result;

Step 4: Priya connects Looker to `daily_quality` table
        → Dashboard shows latency trends, error rates, tool usage by agent

End-to-End Example — Batch Evaluation via SQL:

-- Evaluate all sessions from last 24h for latency compliance
WITH recent_sessions AS (
  SELECT DISTINCT session_id
  FROM `myproject.analytics.agent_events`
  WHERE timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)
)
SELECT
  s.session_id,
  JSON_VALUE(result, '$.passed') AS passed,
  JSON_VALUE(result, '$.score') AS latency_score,
  JSON_VALUE(result, '$.details') AS details
FROM recent_sessions s,
UNNEST([
  `myproject.analytics.agent_analytics`(
    'evaluate',
    JSON_OBJECT(
      'session_id', s.session_id,
      'metric', 'latency',
      'threshold', 5000
    )
  )
]) AS result;

Result:

session_id	passed	latency_score	details
sess-001	true	0.85	avg_latency_ms=2340, max=4200
sess-002	false	0.32	avg_latency_ms=7800, max=12400
sess-003	true	0.91	avg_latency_ms=1850, max=3100

CUJ-A2: Priya Runs LLM-as-Judge at Scale

Goal: Score all sessions for correctness using AI, directly in SQL.

-- Judge correctness of every session from the "support_bot" agent
SELECT
  session_id,
  CAST(JSON_VALUE(judgment, '$.score') AS FLOAT64) AS correctness_score,
  JSON_VALUE(judgment, '$.passed') AS passed,
  JSON_VALUE(judgment, '$.feedback') AS llm_feedback
FROM (
  SELECT DISTINCT session_id
  FROM `myproject.analytics.agent_events`
  WHERE agent = 'support_bot'
    AND timestamp >= '2026-03-01'
) sessions,
UNNEST([
  `myproject.analytics.agent_analytics`(
    'judge',
    JSON_OBJECT(
      'session_id', sessions.session_id,
      'criterion', 'correctness'
    )
  )
]) AS judgment
WHERE CAST(JSON_VALUE(judgment, '$.score') AS FLOAT64) < 0.7
ORDER BY correctness_score ASC;

Result: Surfaces the lowest-quality sessions for human review — no Python required.

CUJ-A3: Priya Creates a Drift Alert

Goal: Scheduled query that alerts when production questions drift from golden set.

-- Weekly drift check: compare production vs golden questions
SELECT
  JSON_VALUE(drift_result, '$.coverage_percentage') AS coverage_pct,
  JSON_VALUE(drift_result, '$.total_golden') AS golden_count,
  JSON_VALUE(drift_result, '$.total_production') AS prod_count,
  JSON_QUERY(drift_result, '$.uncovered_questions') AS gaps
FROM UNNEST([
  `myproject.analytics.agent_analytics`(
    'drift',
    JSON_OBJECT(
      'golden_dataset', 'myproject.analytics.golden_questions',
      'agent_filter', 'support_bot',
      'start_date', '2026-03-01',
      'end_date', '2026-03-06'
    )
  )
]) AS drift_result;

3.4 Remote Function Technical Design

Reference: BigQuery Remote Functions

Prerequisites

Requirement	Detail
Connection	A `CLOUD_RESOURCE` connection (`bq mk --connection --connection_type=CLOUD_RESOURCE`)
IAM — Creator	`bigquery.routines.create` on dataset + `bigquery.connections.delegate` on connection
IAM — Invoker	`bigquery.routines.get` on dataset + `bigquery.connections.use` on connection
Service account	The connection's auto-created SA needs Cloud Run Invoker role on the Cloud Function / Cloud Run service
Return types	`BOOL`, `BYTES`, `NUMERIC`, `STRING`, `DATE`, `DATETIME`, `TIME`, `TIMESTAMP`, `JSON`. Not supported: `ARRAY`, `STRUCT`, `INTERVAL`, `GEOGRAPHY`

Deployment Architecture

┌─────────────────┐     ┌──────────────────────┐     ┌─────────────┐
│   BigQuery      │────▶│  Cloud Function /     │────▶│  BigQuery   │
│   SELECT fn()   │     │  Cloud Run            │     │  (queries)  │
│                 │◀────│  + SDK Core            │◀────│             │
│   Returns JSON  │     │  handle_request()      │     │             │
└─────────────────┘     └──────────────────────┘     └─────────────┘

Request/Response Contract

BigQuery sends batched rows as HTTP POST requests. Each element in calls represents one row's arguments. BigQuery automatically determines batch size but respects the max_batching_rows limit. Retries happen on HTTP 408, 429, 500, 503, 504. Results are never cached (assumed non-deterministic).

Request (POST from BigQuery):

{
  "requestId": "124ab1c",
  "caller": "//bigquery.googleapis.com/projects/myproject/jobs/bqjob_r1234_00001",
  "sessionUser": "analyst@company.com",
  "userDefinedContext": {
    "project_id": "myproject",
    "dataset_id": "analytics"
  },
  "calls": [
    ["evaluate", "{\"session_id\":\"sess-001\",\"metric\":\"latency\",\"threshold\":5000}"],
    ["evaluate", "{\"session_id\":\"sess-002\",\"metric\":\"latency\",\"threshold\":5000}"]
  ]
}

Response (HTTP 200):

{
  "replies": [
    "{\"passed\": true, \"score\": 0.85, \"details\": \"avg=2340ms\"}",
    "{\"passed\": false, \"score\": 0.32, \"details\": \"avg=7800ms\"}"
  ]
}

Error response (non-retryable 4xx):

{
  "errorMessage": "Unknown operation: foo (max 1KB)"
}

API Versioning & Stability Contract

The remote function response includes a _version field for forward compatibility. Clients should ignore unknown fields.

{
  "replies": [
    "{\"_version\":\"1.0\",\"passed\":true,\"score\":0.85,\"details\":\"avg=2340ms\"}"
  ]
}

Version guarantees:

v1.x: Additive changes only (new fields, new operations). Existing fields and operations are never removed or renamed.
v2.0: Breaking changes require a new Cloud Function endpoint and new CREATE FUNCTION registration.

Error Codes

Every per-row error in replies uses a structured error object instead of a result:

{
  "replies": [
    "{\"passed\":true,\"score\":0.85}",
    "{\"_error\":{\"code\":\"SESSION_NOT_FOUND\",\"message\":\"No events for session sess-999\"}}"
  ]
}

Error Code	HTTP	Retryable	Description
`INVALID_OPERATION`	400	No	Unknown operation string
`INVALID_PARAMS`	400	No	Missing or malformed params JSON
`SESSION_NOT_FOUND`	200*	No	No events found for session_id
`EVALUATION_FAILED`	200*	No	Evaluator raised an exception
`UPSTREAM_TIMEOUT`	200*	Yes	BigQuery query timed out
`INTERNAL_ERROR`	200*	Yes	Unexpected SDK error

*Per-row errors return HTTP 200 with error in replies[i] so that other rows in the batch succeed. Batch-level errors (all rows fail) return HTTP 400 with top-level errorMessage.

Partial Failure Semantics

BigQuery batches multiple rows into a single HTTP request. The function processes each row independently:

If row 2 of 5 fails, rows 1, 3, 4, 5 return valid results. Row 2 returns a _error object in its replies slot.
BigQuery surfaces the _error JSON as the column value — the caller can filter with JSON_VALUE(result, '$._error.code') IS NOT NULL.
Only if all rows fail does the function return HTTP 400.

Idempotency Guidance

BigQuery may retry requests on transient errors (HTTP 408/429/5xx). The function should be safe to call multiple times with the same input:

analyze and evaluate are naturally idempotent (read-only queries).
judge calls AI.GENERATE which may return slightly different scores — this is acceptable (non-deterministic by nature).
The requestId field can be used for deduplication if the function performs any write operations in the future.

CREATE FUNCTION DDL

-- 1. Create CLOUD_RESOURCE connection (one-time)
--    Console: Explorer → +Add → External Connection → "Vertex AI remote models,
--             remote functions, BigLake and Spanner"
--    CLI:
--    bq mk --connection --location=US --project_id=myproject \
--           --connection_type=CLOUD_RESOURCE analytics-conn

-- 2. Grant Cloud Run Invoker to the connection's service account
--    (find the SA in connection details → IAM → add role)

-- 3. Register the remote function
CREATE FUNCTION `myproject.analytics.agent_analytics`(
  operation STRING,
  params JSON
) RETURNS JSON
REMOTE WITH CONNECTION `myproject.us.analytics-conn`
OPTIONS (
  endpoint = 'https://us-central1-myproject.cloudfunctions.net/bq-agent-analytics',
  user_defined_context = [
    ("project_id", "myproject"),
    ("dataset_id", "analytics")
  ],
  max_batching_rows = 50
);

Entry Point (`deploy/remote_function/main.py`)

A single multiplexed Cloud Function handles all operations. BigQuery sends (operation STRING, params JSON) tuples; the function dispatches to the appropriate SDK method and returns JSON results.

import functions_framework
import json
import os
from flask import jsonify
from bigquery_agent_analytics import Client, SystemEvaluator, LLMAsJudge, TraceFilter

# Initialized once per cold start. Config comes from userDefinedContext
# (forwarded by BigQuery) or environment variables as fallback.
_client = None

def _get_client(context: dict) -> Client:
    global _client
    if _client is None:
        _client = Client(
            project_id=context.get("project_id", os.environ["PROJECT_ID"]),
            dataset_id=context.get("dataset_id", os.environ["DATASET_ID"]),
        )
    return _client

@functions_framework.http
def handle_request(request):
    """Entry point called by BigQuery Remote Function framework.

    BigQuery batches rows into `calls`. Each call is [operation, params_json].
    The `replies` array must have the same length as `calls`.
    """
    try:
        body = request.get_json()
        context = body.get("userDefinedContext", {})
        client = _get_client(context)
        replies = []
        for call in body["calls"]:
            operation, params_raw = call[0], call[1]
            params = json.loads(params_raw) if isinstance(params_raw, str) else params_raw
            result = _dispatch(client, operation, params)
            replies.append(json.dumps(result))
        return jsonify({"replies": replies})
    except Exception as e:
        return jsonify({"errorMessage": str(e)[:1024]}), 400

def _dispatch(client, operation, params):
    if operation == "analyze":
        trace = client.get_session_trace(params["session_id"])
        return {
            "span_count": len(trace.spans),
            "error_count": len(trace.error_spans),
            "avg_latency_ms": trace.total_latency_ms,
            "tool_call_count": len(trace.tool_calls),
            "final_response": trace.final_response,
        }
    elif operation == "evaluate":
        evaluator = SystemEvaluator.latency(threshold_ms=params["threshold"])
        report = client.evaluate(evaluator=evaluator,
            filters=TraceFilter(session_ids=[params["session_id"]]))
        return report.details[0] if report.details else {}
    elif operation == "judge":
        judge = getattr(LLMAsJudge, params["criterion"])()
        report = client.evaluate(evaluator=judge,
            filters=TraceFilter(session_ids=[params["session_id"]]))
        return report.details[0] if report.details else {}
    elif operation == "drift":
        report = client.drift_detection(
            golden_dataset=params["golden_dataset"],
            filters=TraceFilter(
                agent_id=params.get("agent_filter"),
                start_time=params.get("start_date"),
                end_time=params.get("end_date"),
            ))
        return report.model_dump()
    else:
        raise ValueError(f"Unknown operation: {operation}")

Deploy:

gcloud functions deploy bq-agent-analytics \
    --gen2 \
    --runtime python312 \
    --region us-central1 \
    --entry-point handle_request \
    --source ./deploy/remote_function/ \
    --trigger-http \
    --no-allow-unauthenticated \
    --set-env-vars PROJECT_ID=myproject,DATASET_ID=analytics

3A. Path A-bis: Continuous Queries for Real-Time Analytics

3A.1 Overview

Reference: BigQuery Continuous Queries, Create Continuous Queries

BigQuery continuous queries run SQL continuously against new data as it arrives, using the APPENDS() table function. They process each row as it is ingested and write results to BigQuery tables, Pub/Sub, Bigtable, or Spanner — enabling real-time, event-driven analytics over agent traces.

Key Capabilities and Constraints

Aspect	Detail
Trigger	Automatically fires on new rows via `APPENDS(TABLE ..., start_timestamp)`
Destinations	`INSERT INTO` (BigQuery table), `EXPORT DATA` (Pub/Sub, Bigtable, Spanner)
AI functions	`AI.GENERATE_TEXT` and `ML.GENERATE_TEXT` — supported for remote models (Gemini, etc.). `AI.GENERATE_TABLE` — not supported in continuous queries. `ML.UNDERSTAND_TEXT`, `ML.TRANSLATE` — supported. See supported AI functions in CQ.
Remote functions	Not supported — continuous queries cannot call user-defined remote functions (`CREATE FUNCTION ... REMOTE`) because remote functions are UDF routines, and CQ disallows all UDFs
SQL restrictions	No `JOIN`, `GROUP BY`, `DISTINCT`, aggregates, window functions, `ORDER BY`, `LIMIT`
Execution	`bq query --continuous=true` or API `"continuous": true` — not a DDL statement
Reservation	Requires Enterprise / Enterprise Plus edition with `CONTINUOUS` job type assignment (max 500 slots)
Max runtime	2 days (user account), 150 days (service account)
Processing model	Stateless per-row; no cross-row state. Idle queries consume ~1 slot

Why This Matters for the SDK

The ADK plugin writes events to the agent_events table via the BigQuery Storage Write API. A continuous query can monitor this table in real-time and apply AI.GENERATE_TEXT — the same LLM evaluation engine the SDK uses — to score, classify, or flag every session as events arrive. No Cloud Function deployment needed; pure SQL.

This creates a third analytics path:

Path A:  Remote Function   → batch SQL analytics via Cloud Function
Path A': Continuous Query   → real-time streaming analytics via AI.GENERATE_TEXT
Path B:  CLI               → agent self-diagnostics and CI/CD

3A.2 Critical User Journeys (CUJ)

CUJ-A4: Priya Builds Real-Time Error Alerting with Continuous Query + AI.GENERATE_TEXT

Goal: Every time an agent session ends with an error, automatically classify the failure root cause using Gemini and push an alert to Pub/Sub (which routes to Slack/PagerDuty). No Cloud Function, no cron — pure streaming SQL.

Architecture:

┌──────────────┐    ┌─────────────────────────┐    ┌────────────────┐
│  ADK Plugin  │───▶│  agent_events table      │───▶│  Continuous    │
│  (writes     │    │  (BigQuery)              │    │  Query + AI.   │
│   events)    │    │                          │    │  GENERATE_TEXT │
└──────────────┘    └─────────────────────────┘    └──────┬─────────┘
                                                          │
                                          ┌───────────────┼───────────────┐
                                          ▼               ▼               ▼
                                   ┌────────────┐ ┌────────────┐ ┌──────────────┐
                                   │ error_      │ │ Pub/Sub    │ │ Bigtable     │
                                   │ analysis    │ │ (→ Slack)  │ │ (low-latency │
                                   │ table       │ │            │ │  dashboard)  │
                                   └────────────┘ └────────────┘ └──────────────┘

Step 1: Create the AI model endpoint for evaluation

-- Create a remote model pointing to Gemini (one-time setup)
CREATE OR REPLACE MODEL `myproject.analytics.gemini_flash`
REMOTE WITH CONNECTION `myproject.us.analytics-conn`
OPTIONS (
  endpoint = 'gemini-2.5-flash'
);

Step 2: Create destination table for analysis results

CREATE TABLE IF NOT EXISTS `myproject.analytics.realtime_error_analysis` (
  session_id STRING,
  agent STRING,
  event_type STRING,
  error_message STRING,
  timestamp TIMESTAMP,
  root_cause STRING,
  severity STRING,
  suggested_fix STRING,
  analyzed_at TIMESTAMP
);

Step 3: Launch continuous query (real-time error analysis)

bq query --project_id=myproject --use_legacy_sql=false \
  --continuous=true \
  --connection_property=service_account=analytics-cq@myproject.iam.gserviceaccount.com \
  '
INSERT INTO `myproject.analytics.realtime_error_analysis`
SELECT
  base.session_id,
  base.agent,
  base.event_type,
  base.error_message,
  base.timestamp,
  JSON_VALUE(analysis.ml_generate_text_llm_result, "$.root_cause") AS root_cause,
  JSON_VALUE(analysis.ml_generate_text_llm_result, "$.severity") AS severity,
  JSON_VALUE(analysis.ml_generate_text_llm_result, "$.suggested_fix") AS suggested_fix,
  CURRENT_TIMESTAMP() AS analyzed_at
FROM
  AI.GENERATE_TEXT(
    MODEL `myproject.analytics.gemini_flash`,
    (
      SELECT
        session_id,
        agent,
        event_type,
        error_message,
        timestamp,
        CONCAT(
          "Analyze this agent error and return JSON with keys: ",
          "root_cause (one of: tool_timeout, invalid_input, api_failure, ",
          "model_error, permission_denied, rate_limit, unknown), ",
          "severity (critical, high, medium, low), ",
          "suggested_fix (one sentence). ",
          "Agent: ", COALESCE(agent, "unknown"),
          " | Event: ", event_type,
          " | Error: ", COALESCE(error_message, "no message")
        ) AS prompt
      FROM
        APPENDS(
          TABLE `myproject.analytics.agent_events`,
          CURRENT_TIMESTAMP() - INTERVAL 10 MINUTE
        )
      WHERE
        ENDS_WITH(event_type, "_ERROR")
        OR error_message IS NOT NULL
        OR status = "ERROR"
    ),
    STRUCT(100 AS max_output_tokens, 0.1 AS temperature)
  ) AS analysis
'

What happens at runtime:

14:00:01  Plugin writes TOOL_ERROR for sess-042
14:00:02  Continuous query detects new error row via APPENDS()
14:00:03  AI.GENERATE_TEXT classifies: root_cause=tool_timeout, severity=high
14:00:03  Result inserted into realtime_error_analysis table
          → Looker dashboard updates in real-time
          → (Optional) Second continuous query exports to Pub/Sub → Slack

Step 4 (Optional): Chain a second continuous query to Pub/Sub for alerts

bq query --project_id=myproject --use_legacy_sql=false \
  --continuous=true \
  --connection_property=service_account=analytics-cq@myproject.iam.gserviceaccount.com \
  '
EXPORT DATA
  OPTIONS (
    format = "CLOUD_PUBSUB",
    uri = "https://pubsub.googleapis.com/projects/myproject/topics/agent-error-alerts"
  )
AS (
  SELECT
    TO_JSON_STRING(
      STRUCT(
        session_id,
        agent,
        root_cause,
        severity,
        error_message,
        suggested_fix
      )
    ) AS message,
    TO_JSON(
      STRUCT(
        severity AS severity,
        agent AS agent
      )
    ) AS _ATTRIBUTES
  FROM
    APPENDS(
      TABLE `myproject.analytics.realtime_error_analysis`,
      CURRENT_TIMESTAMP() - INTERVAL 10 MINUTE
    )
  WHERE severity IN ("critical", "high")
)
'

End-to-end result: Every high-severity agent error is automatically analyzed by Gemini within seconds of occurrence, stored in a queryable table, and routed to Slack/PagerDuty via Pub/Sub — all with zero application code.

CUJ-A5: Priya Streams Session Quality Scores to Bigtable for Low-Latency Dashboards

Goal: Score every completed agent session in real-time and write results to Bigtable for sub-millisecond dashboard reads.

bq query --project_id=myproject --use_legacy_sql=false \
  --continuous=true \
  --connection_property=service_account=analytics-cq@myproject.iam.gserviceaccount.com \
  '
EXPORT DATA
  OPTIONS (
    format = "CLOUD_BIGTABLE",
    overwrite = TRUE,
    uri = "https://bigtable.googleapis.com/projects/myproject/instances/agent-metrics/tables/session-scores"
  )
AS (
  SELECT
    CONCAT(base.session_id, "#", CAST(base.timestamp AS STRING)) AS rowkey,
    STRUCT(
      base.session_id,
      base.agent,
      base.timestamp,
      JSON_VALUE(analysis.ml_generate_text_llm_result, "$.quality_score") AS quality_score,
      JSON_VALUE(analysis.ml_generate_text_llm_result, "$.outcome") AS outcome,
      JSON_VALUE(analysis.ml_generate_text_llm_result, "$.summary") AS summary
    ) AS metrics
  FROM
    AI.GENERATE_TEXT(
      MODEL `myproject.analytics.gemini_flash`,
      (
        SELECT
          session_id,
          agent,
          timestamp,
          CONCAT(
            "Score this completed agent session. Return JSON with: ",
            "quality_score (0.0-1.0), outcome (success/partial/failure), ",
            "summary (one sentence). ",
            "Agent: ", COALESCE(agent, "unknown"),
            " | Response: ", COALESCE(
              JSON_VALUE(content, "$.response"),
              JSON_VALUE(content, "$.text_summary"),
              "no response"
            )
          ) AS prompt
        FROM
          APPENDS(
            TABLE `myproject.analytics.agent_events`,
            CURRENT_TIMESTAMP() - INTERVAL 10 MINUTE
          )
        WHERE event_type = "AGENT_COMPLETED"
      ),
      STRUCT(80 AS max_output_tokens, 0.1 AS temperature)
    ) AS analysis
)
'

3A.3 When to Use Which Path

Scenario	Path	Why
Nightly batch evaluation of all sessions	Remote Function (Path A)	Needs JOINs, GROUP BY, aggregation — not supported in continuous queries
Real-time error classification as events arrive	Continuous Query (Path A')	Stateless per-row processing; AI.GENERATE_TEXT on each error; no deployment needed
Dashboard with sub-second latency	Continuous Query → Bigtable	EXPORT DATA to Bigtable for low-latency reads
Alert on critical errors via Slack/PagerDuty	Continuous Query → Pub/Sub	EXPORT DATA to Pub/Sub with severity-based attributes
Agent self-diagnostic before responding	CLI (Path B)	Agent calls `bq-agent-sdk evaluate` as a tool
CI/CD gate blocking deployment	CLI (Path B)	`--exit-code` in GitHub Actions
Ad-hoc drift analysis comparing golden set	Remote Function (Path A)	Needs cross-table comparison (JOIN with golden set)
Semantic session clustering	Remote Function (Path A)	Needs aggregation and embedding distance (GROUP BY)

4. Path B: CLI Interface (`bq-agent-sdk`)

4.1 Overview

A command-line tool that wraps the SDK's Python API, designed for two primary consumers:

AI agents that invoke CLI commands as tools (low token overhead)
Platform engineers who script evaluation pipelines

4.1.1 Why CLI Instead of MCP for Agent Tool Integration

Recent research and community benchmarks demonstrate that CLI tools significantly outperform MCP (Model Context Protocol) servers when used as AI agent tools. The bq-agent-sdk CLI is designed with these findings in mind.

Token Efficiency: 35x Reduction

Approach	Context Tokens	Notes
MCP server schema (single service)	~55,000	Full tool definitions loaded at session start
MCP server schema (3 services stacked)	~150,000+	Each server adds its full schema
CLI command (`bq-agent-sdk evaluate --help`)	~4,150	Loaded just-in-time, only when needed
CLI manifest (all commands summarized)	~100	One-line descriptions; agent drills into `--help` on demand

A community benchmark measured a Token Efficiency Score of 202 for CLI vs 152 for MCP on identical tasks, with CLI achieving 28% higher task completion rate. MCP schema pre-loading consumed up to 40% of the available context window before any actual work began.

LLMs Are CLI-Native

Large language models are trained on billions of terminal interactions from public code repositories, documentation, and Stack Overflow. Commands like git, curl, jq, kubectl, and gcloud are deeply embedded in their training data. When an agent sees bq-agent-sdk evaluate --evaluator=latency --threshold=5000 --format=json, it can infer behavior from structural similarity to tools it has seen millions of times — no schema pre-loading required.

Unix Composability

CLI tools compose naturally via pipes and shell scripting, enabling workflows that are difficult or impossible with MCP:

# Evaluate → filter failures → alert (three tools, zero MCP schemas)
bq-agent-sdk evaluate --last=1h --format=json \
  | jq '.failed_sessions[]' \
  | xargs -I{} bq-agent-sdk get-trace --session-id={} --format=json \
  | jq '{session: .session_id, errors: .errors}' \
  | curl -X POST "$SLACK_WEBHOOK" -d @-

When MCP Is Still Appropriate

MCP remains valuable for: (a) structured validation where type-safe schemas prevent malformed requests, (b) multi-tenant environments with dynamic tool discovery, (c) services without CLI equivalents (e.g., browser automation, GUI testing), and (d) tool discovery in large catalogs. For the SDK's well-defined analytics operations, CLI is the simpler and more efficient path.

4.1.2 CLI Design Principles for Agent Consumption

The bq-agent-sdk CLI follows five principles that maximize its effectiveness as an AI agent tool:

Structured JSON output by default — --format=json is the default. Every command returns a parseable JSON object. Agents never need to parse free-form text.
Just-in-time --help — Instead of pre-loading a 55K-token schema, agents call bq-agent-sdk --help (~100 tokens) to discover commands, then bq-agent-sdk evaluate --help (~200 tokens) to learn a specific command's options. Total context cost: ~300 tokens vs ~55,000 for MCP.

Lightweight manifest — A one-line-per-command manifest can be embedded in an agent's system prompt at ~100 tokens total:

bq-agent-sdk doctor    — health check
bq-agent-sdk evaluate  — run evaluations (latency, errors, LLM judge)
bq-agent-sdk get-trace — retrieve session trace
bq-agent-sdk drift     — drift detection against golden set
bq-agent-sdk insights  — generate analytics report
bq-agent-sdk views     — manage BigQuery views

Machine-friendly exit codes — --exit-code returns 0/1 for pass/fail, enabling direct use in conditionals (if bq-agent-sdk evaluate ...; then ...).
Environment variable fallbacks — Global options like --project-id and --dataset-id can be set via BQ_AGENT_PROJECT and BQ_AGENT_DATASET, reducing per-invocation token cost for agents that call the CLI repeatedly.

4.2 Command Structure

bq-agent-sdk [GLOBAL OPTIONS] <command> [COMMAND OPTIONS]

Global Options:
  --project-id TEXT       GCP project ID [env: BQ_AGENT_PROJECT]
  --dataset-id TEXT       BigQuery dataset [env: BQ_AGENT_DATASET]
  --table-id TEXT         Events table [default: agent_events]
  --location TEXT         BQ location [default: us-central1]
  --endpoint TEXT         AI.GENERATE endpoint
  --connection-id TEXT    BQ connection ID
  --format TEXT           Output format: json|text|table [default: json]
  --quiet                 Suppress non-essential output

Commands:
  doctor                  Run diagnostic health check
  get-trace               Retrieve and render a trace
  list-traces             List recent traces with filters
  evaluate                Run code-based or LLM evaluation
  insights                Generate insights report
  drift                   Run drift detection against golden set
  distribution            Analyze question distribution
  hitl-metrics            Show HITL interaction metrics
  views                   Manage per-event-type BigQuery views

4.3 Critical User Journeys (CUJ)

CUJ-B1: AgentX Checks Its Own Latency Before Responding

Context: An ADK agent has a before_agent_callback that shells out to the CLI to check recent performance. If latency is high, it adjusts its strategy (e.g., skips expensive tool calls).

Agent's Tool Definition (ADK tool-calling schema):

{
  "name": "check_agent_performance",
  "description": "Check this agent's recent latency and error rate",
  "parameters": {
    "type": "object",
    "properties": {
      "session_count": {"type": "integer", "default": 10},
      "metric": {"type": "string", "enum": ["latency", "error_rate", "all"]}
    }
  }
}

Agent Invocation (what the LLM generates):

bq-agent-sdk evaluate \
  --project-id=myproject \
  --dataset-id=analytics \
  --agent-id=support_bot \
  --last=1h \
  --evaluator=latency \
  --threshold=5000 \
  --format=json

CLI Output (consumed by the agent):

{
  "evaluator": "latency",
  "threshold_ms": 5000,
  "total_sessions": 10,
  "passed": 7,
  "failed": 3,
  "pass_rate": 0.70,
  "aggregate_scores": {
    "avg_latency_ms": 3200,
    "max_latency_ms": 8400,
    "p95_latency_ms": 6100
  },
  "failed_sessions": ["sess-042", "sess-047", "sess-051"]
}

Agent's Decision Logic:

IF pass_rate < 0.8:
    → Switch to lighter model (gemini-flash instead of gemini-pro)
    → Skip optional enrichment tool calls
    → Add disclaimer: "Response may be less detailed due to system load"

End-to-End Flow:

User → "What's the refund policy for order #1234?"
       │
       ▼
AgentX (before responding):
  1. Calls: bq-agent-sdk evaluate --agent-id=support_bot --last=1h --evaluator=latency --threshold=5000
  2. Sees: pass_rate=0.70, avg_latency=3200ms
  3. Decides: latency is borderline, use lighter model
  4. Calls: bq-agent-sdk evaluate --agent-id=support_bot --last=1h --evaluator=error_rate --threshold=0.1
  5. Sees: pass_rate=0.95, error_rate=0.05
  6. Decides: errors are fine, proceed normally
       │
       ▼
AgentX → "The refund policy for order #1234 is..."

CUJ-B2: AgentX Retrieves a Past Session for Context

Context: A user returns and references a previous conversation. The agent retrieves the old session trace to understand context.

Agent Invocation:

bq-agent-sdk get-trace \
  --project-id=myproject \
  --dataset-id=analytics \
  --session-id=sess-previous-abc \
  --format=json \
  --quiet

CLI Output:

{
  "trace_id": "trace-abc-123",
  "session_id": "sess-previous-abc",
  "user_id": "user-42",
  "total_latency_ms": 4500,
  "span_count": 12,
  "tool_calls": [
    {"tool_name": "lookup_order", "args": {"order_id": "1234"}, "status": "OK"},
    {"tool_name": "check_refund_eligibility", "args": {"order_id": "1234"}, "status": "OK"}
  ],
  "final_response": "Your order #1234 is eligible for a full refund...",
  "errors": []
}

Agent uses this context: "I can see from your previous conversation that we confirmed order #1234 is eligible for a refund. Let me process that now."

CUJ-B3: Marcus Runs Nightly Eval in CI/CD

Goal: GitHub Actions workflow that gates deployment on evaluation pass rate.

.github/workflows/nightly-eval.yml:

name: Nightly Agent Evaluation
on:
  schedule:
    - cron: '0 2 * * *'  # 2 AM daily

jobs:
  evaluate:
    runs-on: ubuntu-latest
    steps:
      - uses: google-github-actions/auth@v2
        with:
          credentials_json: ${{ secrets.GCP_SA_KEY }}

      - name: Install SDK
        run: pip install bigquery-agent-analytics-sdk[cli]

      - name: Run latency evaluation
        run: |
          bq-agent-sdk evaluate \
            --project-id=${{ vars.GCP_PROJECT }} \
            --dataset-id=analytics \
            --agent-id=support_bot \
            --last=24h \
            --evaluator=latency \
            --threshold=5000 \
            --format=json \
            --exit-code \
          > eval_latency.json

      - name: Run error rate evaluation
        run: |
          bq-agent-sdk evaluate \
            --project-id=${{ vars.GCP_PROJECT }} \
            --dataset-id=analytics \
            --agent-id=support_bot \
            --last=24h \
            --evaluator=error_rate \
            --threshold=0.05 \
            --format=json \
            --exit-code \
          > eval_errors.json

      - name: Run correctness judge
        run: |
          bq-agent-sdk evaluate \
            --project-id=${{ vars.GCP_PROJECT }} \
            --dataset-id=analytics \
            --agent-id=support_bot \
            --last=24h \
            --evaluator=llm-judge \
            --criterion=correctness \
            --threshold=0.7 \
            --format=json \
            --exit-code \
          > eval_correctness.json

      - name: Run drift detection
        run: |
          bq-agent-sdk drift \
            --project-id=${{ vars.GCP_PROJECT }} \
            --dataset-id=analytics \
            --golden-dataset=golden_questions \
            --agent-id=support_bot \
            --last=24h \
            --min-coverage=0.85 \
            --exit-code \
          > drift_report.json

      - name: Generate insights summary
        if: always()
        run: |
          bq-agent-sdk insights \
            --project-id=${{ vars.GCP_PROJECT }} \
            --dataset-id=analytics \
            --agent-id=support_bot \
            --last=24h \
            --max-sessions=50 \
            --format=text \
          > insights_summary.txt

      - name: Upload reports
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: eval-reports
          path: |
            eval_*.json
            drift_report.json
            insights_summary.txt

Key behavior: --exit-code makes the command return exit code 1 when evaluation fails, causing the CI step to fail and blocking deployment.

CUJ-B4: Marcus Pipes CLI Output to Monitoring

Goal: Feed evaluation results into Slack alerts and Datadog metrics.

#!/bin/bash
# cron-eval.sh — runs every hour

RESULT=$(bq-agent-sdk evaluate \
  --project-id=myproject \
  --dataset-id=analytics \
  --agent-id=support_bot \
  --last=1h \
  --evaluator=latency \
  --threshold=5000 \
  --format=json)

PASS_RATE=$(echo "$RESULT" | jq -r '.pass_rate')
AVG_LATENCY=$(echo "$RESULT" | jq -r '.aggregate_scores.avg_latency_ms')

# Send to Datadog
curl -X POST "https://api.datadoghq.com/api/v1/series" \
  -H "DD-API-KEY: ${DD_API_KEY}" \
  -d "{
    \"series\": [{
      \"metric\": \"agent.latency.pass_rate\",
      \"points\": [[$(date +%s), $PASS_RATE]],
      \"tags\": [\"agent:support_bot\"]
    }, {
      \"metric\": \"agent.latency.avg_ms\",
      \"points\": [[$(date +%s), $AVG_LATENCY]],
      \"tags\": [\"agent:support_bot\"]
    }]
  }"

# Alert Slack if pass rate drops
if (( $(echo "$PASS_RATE < 0.8" | bc -l) )); then
  curl -X POST "$SLACK_WEBHOOK" \
    -d "{\"text\": \"⚠️ Agent latency pass rate dropped to ${PASS_RATE} (threshold: 0.80). Avg: ${AVG_LATENCY}ms\"}"
fi

CUJ-B5: AgentX Performs Self-Correction Loop

Context: An agent notices repeated errors in a session and uses the CLI to diagnose and adapt in real-time.

Conversation Flow:

Turn 1: User asks complex multi-step question
Turn 2: Agent calls tool → TOOL_ERROR
Turn 3: Agent calls tool again → TOOL_ERROR
Turn 4: Agent invokes self-diagnostic:

  $ bq-agent-sdk get-trace \
      --project-id=myproject \
      --dataset-id=analytics \
      --session-id=current-session-456 \
      --format=json \
      --quiet

  Output:
  {
    "errors": [
      {"event_type": "TOOL_ERROR", "tool": "database_query",
       "error_message": "Connection timeout after 30s"},
      {"event_type": "TOOL_ERROR", "tool": "database_query",
       "error_message": "Connection timeout after 30s"}
    ],
    "error_count": 2,
    "tool_calls": [
      {"tool_name": "database_query", "status": "ERROR"},
      {"tool_name": "database_query", "status": "ERROR"}
    ]
  }

Turn 5: Agent recognizes "database_query" tool is timing out
         → Switches to cached data source
         → Tells user: "I'm experiencing delays with the live database.
           Let me check the cached data instead."

Turn 6: Agent calls fallback tool → success → responds with answer

CUJ-B6: Marcus Runs Doctor Check Before Deployment

Goal: Validate SDK configuration and table health before deploying a new agent version.

$ bq-agent-sdk doctor \
    --project-id=myproject \
    --dataset-id=analytics \
    --format=text

╔══════════════════════════════════════════════════╗
║         Agent Analytics Health Check             ║
╠══════════════════════════════════════════════════╣
║ Table: myproject.analytics.agent_events          ║
║ Schema: ✓ OK (16/16 required columns present)   ║
║                                                  ║
║ Event Coverage (last 24h):                       ║
║   USER_MESSAGE_RECEIVED    1,234                 ║
║   LLM_REQUEST              2,456                 ║
║   LLM_RESPONSE             2,450                 ║
║   TOOL_STARTING              890                 ║
║   TOOL_COMPLETED             875                 ║
║   TOOL_ERROR                  15                 ║
║   AGENT_STARTING             620                 ║
║   AGENT_COMPLETED            618                 ║
║   HITL_CONFIRMATION_REQ       42                 ║
║   STATE_DELTA                310                 ║
║                                                  ║
║ AI.GENERATE_TEXT: ✓ (gemini-2.5-flash)            ║
║ Connection:  ✓ us-central1.analytics-conn        ║
║                                                  ║
║ Warnings:                                        ║
║   ⚠ 2 AGENT_STARTING events without matching    ║
║     AGENT_COMPLETED (possible timeout)           ║
║   ⚠ TOOL_ERROR rate: 1.7% (15/890)              ║
╚══════════════════════════════════════════════════╝

CUJ-B7: Marcus Creates BigQuery Views via CLI

# Create all per-event-type views
$ bq-agent-sdk views create-all \
    --project-id=myproject \
    --dataset-id=analytics \
    --prefix=adk_

Created 18 views:
  ✓ adk_llm_requests
  ✓ adk_llm_responses
  ✓ adk_llm_errors
  ✓ adk_tool_starts
  ✓ adk_tool_completions
  ✓ adk_tool_errors
  ✓ adk_user_messages
  ✓ adk_agent_starts
  ✓ adk_agent_completions
  ✓ adk_invocation_starts
  ✓ adk_invocation_completions
  ✓ adk_state_deltas
  ✓ adk_hitl_credential_requests
  ✓ adk_hitl_confirmation_requests
  ✓ adk_hitl_input_requests
  ✓ adk_hitl_credential_completions
  ✓ adk_hitl_confirmation_completions
  ✓ adk_hitl_input_completions

# Create a single view
$ bq-agent-sdk views create LLM_RESPONSE \
    --project-id=myproject \
    --dataset-id=analytics

4.4 CLI Command Reference (Detailed)

`bq-agent-sdk get-trace`

Usage: bq-agent-sdk get-trace [OPTIONS]

  Retrieve and display a single trace or session.

Options:
  --trace-id TEXT       Retrieve by trace ID
  --session-id TEXT     Retrieve by session ID
  --render              Print hierarchical DAG tree [default: false]
  --format TEXT         json | text | tree [default: json]

Examples:

# JSON output for agent consumption
bq-agent-sdk get-trace --session-id=sess-001 --format=json

# Tree rendering for human debugging
bq-agent-sdk get-trace --trace-id=trace-abc --render --format=tree

`bq-agent-sdk list-traces`

Usage: bq-agent-sdk list-traces [OPTIONS]

  List recent traces matching filter criteria.

Options:
  --agent-id TEXT       Filter by agent name
  --user-id TEXT        Filter by user ID
  --session-ids TEXT    Comma-separated session IDs
  --last TEXT           Time window: 1h, 24h, 7d, 30d
  --start-time TEXT     ISO8601 start time
  --end-time TEXT       ISO8601 end time
  --has-error           Only sessions with errors
  --no-error            Only sessions without errors
  --min-latency INT     Minimum latency (ms)
  --max-latency INT     Maximum latency (ms)
  --event-types TEXT    Comma-separated event types
  --limit INT           Max traces [default: 20]
  --format TEXT         json | text | table [default: json]

Example:

# List error sessions from last hour
bq-agent-sdk list-traces \
  --agent-id=support_bot \
  --last=1h \
  --has-error \
  --format=table

SESSION_ID        SPANS  ERRORS  LATENCY_MS  STARTED_AT
sess-042          15     2       8400        2026-03-06T14:23:00Z
sess-047          8      1       6100        2026-03-06T14:45:00Z
sess-051          22     3       12400       2026-03-06T15:02:00Z

`bq-agent-sdk evaluate`

Usage: bq-agent-sdk evaluate [OPTIONS]

  Run code-based or LLM evaluation over traces.

Options:
  --evaluator TEXT      Evaluator type:
                          latency, error_rate, turn_count,
                          token_efficiency, cost,
                          llm-judge
  --threshold FLOAT     Pass/fail threshold
  --criterion TEXT      LLM judge criterion:
                          correctness, hallucination,
                          sentiment, custom
  --custom-prompt TEXT  Custom LLM judge prompt (with --criterion=custom)
  --agent-id TEXT       Filter by agent
  --last TEXT           Time window
  --start-time TEXT     ISO8601 start
  --end-time TEXT       ISO8601 end
  --limit INT           Max sessions [default: 100]
  --exit-code           Return exit code 1 on failure
  --format TEXT         json | text [default: json]

Examples:

# Code-based latency check (agent tool call)
bq-agent-sdk evaluate --evaluator=latency --threshold=5000 --agent-id=bot --last=1h

# LLM correctness judge (CI pipeline)
bq-agent-sdk evaluate --evaluator=llm-judge --criterion=correctness \
  --threshold=0.7 --last=24h --exit-code

# Custom LLM judge with user-defined prompt
bq-agent-sdk evaluate --evaluator=llm-judge --criterion=custom \
  --custom-prompt="Rate how well the agent handled PII. Score 0-1." \
  --threshold=0.9 --last=24h

`bq-agent-sdk insights`

Usage: bq-agent-sdk insights [OPTIONS]

  Generate comprehensive agent insights report.

Options:
  --agent-id TEXT       Filter by agent
  --last TEXT           Time window
  --max-sessions INT    Max sessions to analyze [default: 50]
  --format TEXT         json | text [default: json]

Example:

bq-agent-sdk insights --agent-id=support_bot --last=24h --format=text

══════════════════════════════════════════════════
          Agent Insights — support_bot
══════════════════════════════════════════════════
Sessions analyzed: 48 / 1,234 total

Goal Distribution:
  question_answering    62%  (30 sessions)
  task_automation       25%  (12 sessions)
  data_retrieval        13%  (6 sessions)

Outcome Distribution:
  success               78%
  partial_success       12%
  failure                8%
  abandoned              2%

Top Friction Points:
  1. high_latency         15 sessions (31%)
  2. too_many_tool_calls   8 sessions (17%)
  3. repetitive_responses  4 sessions  (8%)

Executive Summary:
  The support_bot agent shows strong overall performance
  with 78% success rate. Primary area for improvement is
  latency — 31% of sessions experienced high latency,
  particularly in multi-tool workflows. Consider caching
  frequently-accessed data or parallelizing tool calls.
══════════════════════════════════════════════════

`bq-agent-sdk drift`

Usage: bq-agent-sdk drift [OPTIONS]

  Run drift detection against a golden question set.

Options:
  --golden-dataset TEXT     Golden questions table (required)
  --agent-id TEXT           Filter by agent
  --last TEXT               Time window
  --embedding-model TEXT    Model for semantic matching
  --min-coverage FLOAT      Minimum coverage to pass [default: 0.0]
  --exit-code               Return exit code 1 if below min-coverage
  --format TEXT             json | text [default: json]

`bq-agent-sdk distribution`

Usage: bq-agent-sdk distribution [OPTIONS]

  Analyze question distribution patterns.

Options:
  --mode TEXT           Analysis mode:
                          frequently_asked, frequently_unanswered,
                          auto_group_using_semantics, custom
  --categories TEXT     Comma-separated custom categories (with --mode=custom)
  --top-k INT           Top items per category [default: 10]
  --agent-id TEXT       Filter by agent
  --last TEXT           Time window
  --format TEXT         json | text [default: json]

5. Security & IAM

5.1 Least-Privilege IAM Roles

Each path requires a distinct service account with minimum permissions. Never use the default Compute Engine or App Engine service account.

Remote Function Runtime SA (`bq-analytics-fn@PROJECT.iam`)

This SA runs the Cloud Function / Cloud Run service.

# Create SA
gcloud iam service-accounts create bq-analytics-fn \
  --display-name="BQ Agent Analytics Remote Function"

# Grant minimum roles
# 1. Read agent_events table
gcloud projects add-iam-policy-binding PROJECT \
  --member="serviceAccount:bq-analytics-fn@PROJECT.iam.gserviceaccount.com" \
  --role="roles/bigquery.dataViewer" \
  --condition="expression=resource.name.startsWith('projects/PROJECT/datasets/analytics'),title=analytics-only"

# 2. Run BigQuery jobs (for SDK queries)
gcloud projects add-iam-policy-binding PROJECT \
  --member="serviceAccount:bq-analytics-fn@PROJECT.iam.gserviceaccount.com" \
  --role="roles/bigquery.jobUser"

Role	Resource	Why
`roles/bigquery.dataViewer`	`analytics` dataset	Read `agent_events` and golden tables
`roles/bigquery.jobUser`	Project	Execute BQ queries from within Cloud Function

BigQuery Connection SA (auto-created)

When you create a CLOUD_RESOURCE connection, BigQuery auto-creates a SA. Grant it the invoker role on the Cloud Function.

# Find the connection's SA
CONNECTION_SA=$(bq show --connection --format=json PROJECT.us.analytics-conn \
  | jq -r '.cloudResource.serviceAccountId')

# Grant Cloud Run Invoker so BigQuery can call the function
gcloud functions add-invoker-policy-binding bq-agent-analytics \
  --region=us-central1 \
  --member="serviceAccount:${CONNECTION_SA}"

Role	Resource	Why
`roles/run.invoker`	Cloud Function / Cloud Run	Allow BigQuery to invoke the remote function

Continuous Query SA (`analytics-cq@PROJECT.iam`)

gcloud iam service-accounts create analytics-cq \
  --display-name="BQ Continuous Query SA"

# Read source, write destination, use AI models
gcloud projects add-iam-policy-binding PROJECT \
  --member="serviceAccount:analytics-cq@PROJECT.iam.gserviceaccount.com" \
  --role="roles/bigquery.dataEditor" \
  --condition="expression=resource.name.startsWith('projects/PROJECT/datasets/analytics'),title=analytics-dataset"

gcloud projects add-iam-policy-binding PROJECT \
  --member="serviceAccount:analytics-cq@PROJECT.iam.gserviceaccount.com" \
  --role="roles/bigquery.jobUser"

gcloud projects add-iam-policy-binding PROJECT \
  --member="serviceAccount:analytics-cq@PROJECT.iam.gserviceaccount.com" \
  --role="roles/bigquery.connectionUser"

Role	Resource	Why
`roles/bigquery.dataEditor`	`analytics` dataset	Read `agent_events`, write analysis tables
`roles/bigquery.jobUser`	Project	Run continuous query jobs
`roles/bigquery.connectionUser`	Connection	Invoke AI.GENERATE_TEXT via remote model connection

5.2 CLI Authentication

The CLI uses Application Default Credentials (ADC). No service account key files.

# Interactive login (developer workstation)
gcloud auth application-default login

# Service account (CI/CD)
gcloud auth activate-service-account --key-file=sa-key.json
# or: use Workload Identity Federation (recommended for GitHub Actions)

Required user/SA permissions for CLI:

roles/bigquery.dataViewer on the analytics dataset
roles/bigquery.jobUser on the project

6. Cost Model

6.1 Per-1K Sessions Cost Estimate

Costs assume US multi-region, on-demand pricing (March 2026).

Component	Per 1K Sessions	Assumptions
BigQuery scan (CLI / Remote Fn)	~$0.03	~6 MB scanned per session × $6.25/TB
Cloud Function invocation (Remote Fn)	~$0.01	1K invocations × $0.40/million + 256MB × 500ms
AI.GENERATE_TEXT (Continuous Query)	~$1.25	1K calls × ~500 input tokens × $0.075/1M + ~100 output tokens × $0.30/1M (Gemini 2.5 Flash)
BigQuery continuous query slots	~$0.50/hr idle	1 slot minimum × Enterprise edition pricing; billed per reservation
Pub/Sub export (alerting)	~$0.004	1K messages × $40/million
Bigtable export	~$0.01	1K writes × $0.01/100K rows (depends on instance)

6.2 Monthly Cost Scenarios

Scenario	Volume	Estimated Monthly Cost
Small (dev/test)	10K sessions, CLI only	< $1 (BQ scan only)
Medium (production)	100K sessions, CLI + Remote Fn + nightly eval	~$15 (BQ scan + Cloud Function)
Large (streaming)	500K sessions, all paths + Continuous Query	~$700 (dominated by CQ reservation + AI.GENERATE_TEXT)

6.3 Cost Optimization

Partitioning: agent_events table should be partitioned by timestamp (ingestion-time or column). All queries use --last / WHERE timestamp >= which prunes partitions, reducing scan cost by 90%+.
Materialized views: Cache daily_quality table to avoid re-scanning raw events.
AI.GENERATE_TEXT batching: Continuous queries process rows as they arrive; no additional batching optimization needed.
Slot reservations: For continuous queries, a FLEX reservation (per-minute billing) is cheaper than on-demand for sustained workloads.

7. Per-Path SLOs & Operational Runbook

7.1 Service-Level Objectives

Path	Metric	Target	Measurement
CLI	Command latency (p95)	< 10s for `evaluate` (≤ 100 sessions)	CLI timing output (`--verbose`)
CLI	Availability	99.9% (bounded by BigQuery SLA)	BQ job success rate
CLI	Max error rate	< 1% of CLI invocations fail due to SDK bugs (vs infra)	Error log classification
Remote Function	Response latency (p95)	< 5s for `analyze`, < 15s for `judge`	Cloud Monitoring function latency
Remote Function	Availability	99.5% (Cloud Function + BQ connection)	Cloud Monitoring uptime check
Remote Function	Max error rate	< 2% of calls return non-retryable errors	`errorMessage` rate in BQ audit logs
Remote Function	Cold start	< 3s (Cloud Function gen2)	Cloud Monitoring cold start metric
Continuous Query	Processing latency	< 30s from event ingestion to analysis row written	`analyzed_at - timestamp` delta
Continuous Query	Availability	99% (bounded by Enterprise reservation)	`INFORMATION_SCHEMA.JOBS` status

7.2 Retry Behavior

Path	Retry Strategy
CLI	No automatic retry. User re-runs command. `--exit-code` returns 2 for infra errors (vs 1 for eval failure).
Remote Function	BigQuery automatically retries on HTTP 408, 429, 500, 503, 504. The Cloud Function must be idempotent for a given `(requestId, call_index)` pair. Non-retryable errors return HTTP 400.
Continuous Query	BigQuery restarts failed continuous queries automatically. If a row fails AI.GENERATE_TEXT, the row is skipped (no dead-letter). Monitor via `INFORMATION_SCHEMA.JOBS`.

7.3 Operational Runbook: "What Happens When It Fails"

Remote Function Failures

Failure	Symptom	Diagnosis	Resolution
Cold start timeout	First query after idle returns timeout	Cloud Monitoring → Function latency spike	Set `--min-instances=1` in deployment; increase timeout to 120s
Batch too large	HTTP 413 or OOM	Cloud Monitoring → memory usage	Reduce `max_batching_rows` in CREATE FUNCTION DDL (default 50 → 10)
SDK query fails	`errorMessage` in response	Cloud Function logs → BQ error	Check SA permissions (§5.1); verify `agent_events` table exists
Quota exhaustion	HTTP 429 from Cloud Function	Cloud quotas dashboard	Request quota increase; add `max_batching_rows` limit
Connection SA expired	"Permission denied" in BQ	Connection details → SA status	Re-grant `roles/run.invoker` to connection SA

Continuous Query Failures

Failure	Symptom	Diagnosis	Resolution
AI.GENERATE_TEXT quota	Query pauses / slows	`INFORMATION_SCHEMA.JOBS` → error message	Increase Vertex AI quota; reduce `max_output_tokens`
Reservation exhausted	Query queued, not processing	Reservation monitor → slot utilization	Add FLEX slots or reduce concurrent CQ count
Query exceeds 2-day limit	Query stops	`INFORMATION_SCHEMA.JOBS` → end_time	Use service account (150-day limit); set up auto-restart cron
Destination table schema mismatch	Insert fails	CQ error log → schema error	ALTER TABLE to add new columns; restart CQ
Pub/Sub topic deleted	EXPORT DATA fails	CQ error log	Recreate topic; restart CQ

CLI Failures

Failure	Symptom	Diagnosis	Resolution
ADC not configured	"Could not automatically determine credentials"	`gcloud auth list`	Run `gcloud auth application-default login`
Dataset not found	"Not found: Dataset"	Verify `--dataset-id`	Check project/dataset spelling; verify IAM access
Timeout on large evaluation	Command hangs > 60s	BQ job duration in console	Add `--limit=50` to reduce session count; use `--last=1h`

7.4 Alerting Recommendations

# Cloud Monitoring alert policy for Remote Function errors
gcloud monitoring policies create \
  --display-name="BQ Analytics Remote Fn Error Rate" \
  --condition-filter='resource.type="cloud_function" AND metric.type="cloudfunctions.googleapis.com/function/execution_count" AND metric.labels.status!="ok"' \
  --condition-threshold-value=0.02 \
  --condition-threshold-comparison=COMPARISON_GT \
  --notification-channels=CHANNEL_ID

# BigQuery INFORMATION_SCHEMA query for continuous query health
SELECT
  job_id,
  state,
  error_result.reason AS error_reason,
  creation_time,
  TIMESTAMP_DIFF(CURRENT_TIMESTAMP(), creation_time, HOUR) AS running_hours
FROM `region-us`.INFORMATION_SCHEMA.JOBS
WHERE job_type = 'QUERY'
  AND configuration.query.continuous = TRUE
  AND creation_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
ORDER BY creation_time DESC;

8. Implementation Roadmap

8.1 MVP Definition (v1.0)

v1.0 ships CLI + evaluate + get-trace + --exit-code. This is the minimum surface that unblocks all three personas:

Feature	Persona Unblocked	Why MVP
`bq-agent-sdk evaluate`	Marcus (CI/CD), AgentX (self-diagnostic)	Core value: "is my agent healthy?"
`bq-agent-sdk get-trace`	AgentX (context retrieval)	Enables self-correction loop (CUJ-B5)
`--exit-code`	Marcus (CI/CD gate)	Blocks deploy on eval failure
`--format=json`	AgentX (machine consumption)	Structured output for agent tool use
`bq-agent-sdk doctor`	Marcus (pre-deploy check)	Validates setup before first eval

v1.1 (post-MVP):

Feature	Target
Remote Function (`agent_analytics()`)	v1.1 — requires Cloud Function deployment infra
`bq-agent-sdk insights`, `drift`, `distribution`	v1.1 — higher-level analytics
`bq-agent-sdk views`, `hitl-metrics`, `list-traces`	v1.1 — convenience commands
Continuous Query templates	v1.2 — requires Enterprise reservation

8.2 Phases

Phase 1: Core Refactoring (1 week) — v1.0

Extract filter-building helpers (--last, --agent-id, etc.) into shared utility that constructs TraceFilter from CLI args or remote function params
Add uniform response serialization layer for both interface boundaries (CLI and Remote Function): - Dataclass returns (Trace, Span, ContentPart, ObjectRef): Add Trace.to_dict() / Span.to_dict() methods that recursively convert nested dataclasses and datetime fields to JSON-safe dicts (ISO 8601 strings for datetimes) - Pydantic returns (EvaluationReport, InsightsReport, DriftReport, QuestionDistribution, WorldChangeReport): Use .model_dump(mode="json") or .model_dump_json() — not plain .model_dump(), which preserves raw datetime objects that fail json.dumps(). Affected fields include EvaluationReport.created_at, InsightsReport.created_at, SessionMetadata.start_time/end_time, etc. - dict returns (doctor(), hitl_metrics()): Verify datetime values are converted to ISO 8601 strings before output
Add --format output formatting layer (JSON, text table, tree)

Exit criteria:

json.dumps(serialize(result)) succeeds for every Client public method return type — including Trace, EvaluationReport, InsightsReport, DriftReport, and QuestionDistribution (unit test for each)
TraceFilter.from_cli_args() parses --last=1h, --agent-id=X, --session-id=Y (unit test coverage ≥ 90%)
Format layer renders JSON / text / table for a sample trace (snapshot test)

Phase 2: CLI MVP (`bq-agent-sdk`) (2 weeks) — v1.0

Add typer dependency (optional [cli] extra in pyproject.toml)
Implement CLI entry point in pyproject.toml [project.scripts]
Implement v1.0 commands: doctor, get-trace, evaluate
Add --exit-code support for CI/CD integration
Add --last time window parser (1h, 24h, 7d, 30d)
Write CLI integration tests (mock BQ client, ≥ 85% line coverage)
Write quickstart guide with copy-paste examples

Exit criteria:

pip install bigquery-agent-analytics-sdk[cli] → bq-agent-sdk --help works (smoke test in CI)
bq-agent-sdk evaluate --last=1h --evaluator=latency --threshold=5000 --exit-code returns exit code 1 on failure (integration test)
bq-agent-sdk get-trace --session-id=X --format=json returns valid JSON (integration test)
Quickstart guide timed walkthrough completes in < 10 minutes
Sample GitHub Actions workflow passes with mock credentials

Phase 3: Remote Function (2 weeks) — v1.1

Create deploy/remote_function/ directory with: - main.py (functions-framework entry point) - requirements.txt - deploy.sh (gcloud deployment script) - register.sql (CREATE FUNCTION DDL templates)
Implement dispatch for: analyze, evaluate, judge, insights, drift
Use the uniform serialization layer from Phase 1 in _dispatch() to convert all SDK return types to JSON-safe dicts before building the reply array (see Phase 1 serialization policy)
Reuse the existing _run_sync() bridge (client.py:247) for sync wrappers — insights(), drift_detection(), and deep_analysis() already route through it, so the Cloud Function entry point can call these sync methods directly without async/sync boundary concerns
Add Terraform/gcloud deployment automation
Write integration tests with BigQuery Remote Function simulator
Document deployment guide with IAM prerequisites (see §5)

Exit criteria:

deploy.sh deploys to a test project and SELECT agent_analytics('analyze', JSON'{"session_id":"test"}') returns valid JSON (end-to-end test)
Partial failure in a batch (1 of 5 calls errors) returns per-row error, not batch-level 400 (integration test)
All 5 operations return json.dumps()-safe responses (no raw datetime objects, no dataclass instances) — verified by integration test
deploy/remote_function/README.md includes IAM roles, cost estimate, and troubleshooting guide
Remote Function p95 latency < 5s for analyze operation (load test with 50 concurrent calls)

Phase 4: CLI v1.1 Commands + Continuous Query Templates (2 weeks) — v1.1/v1.2

Implement remaining CLI commands: insights, drift, distribution, hitl-metrics, list-traces, views
Document LLM tool-calling schema for agent integration
Create deploy/continuous_queries/ directory with: - realtime_error_analysis.sql — AI.GENERATE_TEXT error classification - session_scoring.sql — per-session quality scoring - pubsub_alerting.sql — critical error → Pub/Sub export - bigtable_dashboard.sql — session metrics → Bigtable - setup_reservation.md — Enterprise reservation guide
Document AI.GENERATE_TEXT prompt templates aligned with SDK evaluation criteria (correctness, hallucination, sentiment)
Add backfill guide (FOR SYSTEM_TIME AS OF → APPENDS handoff)
Document continuous query monitoring via INFORMATION_SCHEMA.JOBS

Exit criteria:

All 9 CLI commands pass integration tests (mock BQ client)
Each continuous query template runs without syntax errors in BigQuery dry-run mode (--dry_run flag)
Continuous query monitoring query returns job status for a running template

Phase 5: Documentation, Polish & Pilot (1 week) — v1.0 GA

Update SDK.md with CLI, Remote Function, and Continuous Query sections
Add examples/cli_agent_tool.py — example ADK agent using CLI as tool
Add examples/ci_eval_pipeline.sh — example CI/CD script
Add examples/remote_function_dashboard.sql — example Looker queries
Add examples/continuous_query_alerting.sql — real-time error alerting
Update README.md with new interfaces
Run design-partner pilot (see §10)

Exit criteria:

All examples run without errors against a test dataset
Pilot partners complete assigned CUJs within time targets (see §10.2)
Zero unresolved P0/P1 bugs from pilot feedback

8.3 Migration / Onboarding Path

Python-only user today → CLI in 15 minutes → Remote Function in 1 day

Step 1: CLI (15 minutes)

# Install with CLI extra
pip install bigquery-agent-analytics-sdk[cli]

# Set environment variables (avoid repeating in every command)
export BQ_AGENT_PROJECT=myproject
export BQ_AGENT_DATASET=analytics

# Health check
bq-agent-sdk doctor

# First evaluation
bq-agent-sdk evaluate --evaluator=latency --threshold=5000 --last=1h

# Retrieve a specific trace
bq-agent-sdk get-trace --session-id=sess-001 --format=json

Step 2: CI/CD gate (30 minutes)

# Add to .github/workflows/agent-eval.yml
- name: Install SDK
  run: pip install bigquery-agent-analytics-sdk[cli]

- name: Gate on latency
  run: bq-agent-sdk evaluate --evaluator=latency --threshold=5000 --last=24h --exit-code

Step 3: Remote Function (1 day)

# Clone and deploy
git clone https://github.com/haiyuan-eng-google/BigQuery-Agent-Analytics-SDK.git
cd BigQuery-Agent-Analytics-SDK/deploy/remote_function
./deploy.sh --project=myproject --region=us-central1

# Register in BigQuery (copy-paste from register.sql)
bq query --use_legacy_sql=false < register.sql

# First SQL evaluation
bq query --use_legacy_sql=false \
  "SELECT \`myproject.analytics.agent_analytics\`('analyze', JSON'{\"session_id\":\"sess-001\"}')"

Step 4: Continuous Query (optional, requires Enterprise edition)

# Deploy real-time error alerting template
bq query --use_legacy_sql=false --continuous=true \
  < deploy/continuous_queries/realtime_error_analysis.sql

9. Success Metrics

9.1 Adoption Metrics

Metric	Target	Measurement
Time-to-first-value (CLI)	First CLI eval run in < 10 minutes from `pip install`	Timed walkthrough in quickstart guide
Time-to-first-value (Remote Fn)	First SQL `agent_analytics()` call in < 30 minutes from repo clone	Timed deploy + query walkthrough
CLI adoption	20% of SDK users use CLI within 3 months	PyPI download stats for `[cli]` extra
Remote Function deployments	10 production deployments within 6 months	Deployment telemetry
Agent tool integration	5 agents use CLI for self-diagnostics	Community feedback / GitHub issues
CI/CD integration	3 orgs use `--exit-code` in pipelines	Community feedback
Token savings for agents	35x fewer tokens vs MCP schema loading (~4K vs ~145K); 60% fewer vs SQL generation	Benchmarked comparison against MCP baseline (see §4.1.1)

9.2 User Outcome Metrics

Outcome	Target	How Measured
Mean time to detect agent regression	< 15 minutes (from event to alert)	Continuous Query → Pub/Sub pipeline latency; CLI cron interval
MTTR reduction for agent incidents	30% reduction vs manual investigation	Before/after comparison during pilot (see §10)
Eval pipeline setup time	< 1 hour for full CI/CD gate (evaluate + drift + exit-code)	Pilot partner timed walkthrough
SQL analyst unblocked	Priya-persona can build dashboard without filing Python ticket	Pilot partner interview
Agent self-correction rate	Agents using CLI self-diagnostics resolve 50% of tool failures without human intervention	Session trace analysis (self-correction loop detected)

10. Design-Partner Validation Plan

10.1 Pilot Partners

Partner Profile	Persona	Focus Area	v1 Feature Set
Analyst team (1 data analyst / BI engineer)	Priya	Remote Function + Looker dashboard	`evaluate`, `analyze` via SQL
Agent team (1 agent developer)	AgentX	CLI as agent tool for self-diagnostics	`evaluate`, `get-trace` via CLI
Platform team (1 SRE / DevOps engineer)	Marcus	CI/CD pipeline with `--exit-code`	`evaluate`, `drift` via CLI + GitHub Actions

10.2 Pilot Success Criteria

Criterion	Measurement	Pass Threshold
Time-to-first-eval (CLI)	Timed from `pip install` to first `evaluate` output	< 10 minutes
Time-to-first-eval (Remote Fn)	Timed from repo clone to first SQL `agent_analytics()` result	< 30 minutes
Task completion without help	Pilot user completes assigned CUJ without asking SDK team questions	2 out of 3 users
Token overhead acceptable	Agent pilot measures context tokens consumed by CLI tool calls	< 5,000 tokens per CLI invocation
CI/CD gate works end-to-end	Platform pilot configures `--exit-code` in real pipeline, blocks on failure	Blocks deploy on eval failure
No P0 bugs	Pilot users report zero data-loss or incorrect evaluation results	0 P0 bugs

10.3 Pilot Protocol

Week 1: Onboard 3 pilot partners; provide quickstart guide + 30-min walkthrough
Weeks 2–3: Partners use their assigned path independently; SDK team collects:
- Setup friction (where did they get stuck?)
- Feature gaps (what did they need that was missing?)
- Bug reports (severity-tagged)
Week 4: Debrief interviews; collect NPS score (0–10) and written feedback
GA Decision Gate: Proceed to GA if:
- All 3 pilots achieve time-to-first-eval targets
- NPS ≥ 7 for 2 out of 3 partners
- Zero unresolved P0/P1 bugs
- Documentation rated "sufficient" or better by all 3

11. Non-Goals (Out of Scope)

Web UI / dashboard — Use Looker/Data Studio with remote functions instead
Custom real-time processing — The SDK does not build a streaming pipeline. Real-time analytics are handled via BigQuery Continuous Query templates (Path A', §3A) which use native AI.GENERATE_TEXT — the SDK provides SQL templates but no custom runtime
Agent framework integration — The CLI is framework-agnostic; specific ADK tool wrappers are a separate effort
Multi-cloud — BigQuery-only for now

12. Open Questions

CLI framework: click vs typer — typer has better auto-generated help and type inference, but click has broader ecosystem support. Recommendation: typer (better developer experience, auto-complete, and its auto-generated --help output is concise enough for LLM just-in-time consumption — see §4.1.2).
Authentication: Should the CLI handle gcloud auth automatically, or require users to have Application Default Credentials configured? Recommendation: Require ADC; add bq-agent-sdk auth check command.
Remote function granularity: ~~One function per operation vs one multiplexed function?~~ Decided: One multiplexed function agent_analytics(operation, params) (simpler deployment, single connection). All CUJs and examples in this document use this approach.
Versioning: Should the CLI version be tied to the SDK version? Recommendation: Yes, single version number for all interfaces.

Uh oh!

FilesExpand file tree

prd_unified_analytics_interface.md

Latest commit

History

prd_unified_analytics_interface.md

File metadata and controls

PRD: Unified Analytics Interface for BigQuery Agent Analytics SDK

1. Validation & Rationale

1.1 Is This a Necessary Update?

1.2 Current State

1.3 Proposed State

1.4 MVP Scope

2. User Personas

Persona A: Priya — Data Analyst (Remote Function Path)

Persona B: AgentX — Autonomous AI Agent (CLI Path)

Persona C: Marcus — Platform Engineer (CLI Path)

3. Path A: BigQuery Remote Function Interface

3.1 Overview

3.2 Supported Operations

3.3 Critical User Journeys (CUJ)

CUJ-A1: Priya Builds an Agent Quality Dashboard

CUJ-A2: Priya Runs LLM-as-Judge at Scale

CUJ-A3: Priya Creates a Drift Alert

3.4 Remote Function Technical Design

Prerequisites

Deployment Architecture

Request/Response Contract

API Versioning & Stability Contract

Error Codes

Partial Failure Semantics

Idempotency Guidance

CREATE FUNCTION DDL

Entry Point (deploy/remote_function/main.py)

3A. Path A-bis: Continuous Queries for Real-Time Analytics

3A.1 Overview

Key Capabilities and Constraints

Why This Matters for the SDK

3A.2 Critical User Journeys (CUJ)

CUJ-A4: Priya Builds Real-Time Error Alerting with Continuous Query + AI.GENERATE_TEXT

CUJ-A5: Priya Streams Session Quality Scores to Bigtable for Low-Latency Dashboards

3A.3 When to Use Which Path

4. Path B: CLI Interface (bq-agent-sdk)

4.1 Overview

4.1.1 Why CLI Instead of MCP for Agent Tool Integration

4.1.2 CLI Design Principles for Agent Consumption

4.2 Command Structure

4.3 Critical User Journeys (CUJ)

CUJ-B1: AgentX Checks Its Own Latency Before Responding

CUJ-B2: AgentX Retrieves a Past Session for Context

CUJ-B3: Marcus Runs Nightly Eval in CI/CD

CUJ-B4: Marcus Pipes CLI Output to Monitoring

CUJ-B5: AgentX Performs Self-Correction Loop

CUJ-B6: Marcus Runs Doctor Check Before Deployment

CUJ-B7: Marcus Creates BigQuery Views via CLI

4.4 CLI Command Reference (Detailed)

bq-agent-sdk get-trace

bq-agent-sdk list-traces

bq-agent-sdk evaluate

bq-agent-sdk insights

bq-agent-sdk drift

bq-agent-sdk distribution

5. Security & IAM

5.1 Least-Privilege IAM Roles

Remote Function Runtime SA (bq-analytics-fn@PROJECT.iam)

BigQuery Connection SA (auto-created)

Continuous Query SA (analytics-cq@PROJECT.iam)

5.2 CLI Authentication

6. Cost Model

6.1 Per-1K Sessions Cost Estimate

6.2 Monthly Cost Scenarios

6.3 Cost Optimization

7. Per-Path SLOs & Operational Runbook

7.1 Service-Level Objectives

7.2 Retry Behavior

7.3 Operational Runbook: "What Happens When It Fails"

Remote Function Failures

Continuous Query Failures

CLI Failures

7.4 Alerting Recommendations

Entry Point (`deploy/remote_function/main.py`)

4. Path B: CLI Interface (`bq-agent-sdk`)

`bq-agent-sdk get-trace`

`bq-agent-sdk list-traces`

`bq-agent-sdk evaluate`

`bq-agent-sdk insights`

`bq-agent-sdk drift`

`bq-agent-sdk distribution`

Remote Function Runtime SA (`bq-analytics-fn@PROJECT.iam`)

Continuous Query SA (`analytics-cq@PROJECT.iam`)

Phase 2: CLI MVP (`bq-agent-sdk`) (2 weeks) — v1.0