gemini-cli-extensions · copybara-service · Jun 18, 2026 · Jun 18, 2026
@@ -1,14 +1,15 @@
 ---
 name: developing-with-bigquery
 description: |
-  A repository of BigQuery-specific logic, knowledge, and specialized standards.
+  Provides BigQuery-specific logic, knowledge, and specialized standards.
   Use this skill whenever you are doing anything with BigQuery, including:
-    1. BigQuery query optimization
+    1. BigQuery query optimization (SQL)
     2. BigFrames Python code
-    3. BigQuery ML/AI functions.
+    3. BigQuery ML/AI functions (SQL & Python)
+    4. Graph Analytics (GQL & Property Graphs)
 license: Apache-2.0
 metadata:
-  version: v1
+  version: v2
   publisher: google
 ---
 
@@ -31,32 +32,43 @@ features:
 ### 1. Query Optimization
 
 Performance and efficiency guidelines for BigQuery SQL. Includes rules for
-column pruning, pushdown, and materialization strategies. - **Guide**:
-[OPTIMIZATION.md](references/OPTIMIZATION.md)
+column pruning, pushdown, and materialization strategies.
+- **Guide**: [OPTIMIZATION.md](references/sql/OPTIMIZATION.md)
 
 ### 2. BigFrames (BigQuery DataFrames)
 
 Guidelines for generating valid BigFrames code for data manipulation, model
-development, and visualization. - **Guide**:
-[BIGFRAMES.md](references/BIGFRAMES.md)
+development, and visualization.
+- **Guide**: [BIGFRAMES.md](references/bigframes/BIGFRAMES.md)
 
 Bigframes should be the default library/tool as it is more efficient than using
 the BigQuery Python client library.
 
 ### 3. BigQuery ML & AI Functions (BQML SQL)
 
 Usage rules and syntax standards for all BigQuery AI/ML functions via SQL
-(Forecasting, Generative AI, Classification, etc.). - **Guide**:
-[BQML.md](references/BQML.md) - **Functions Reference**: -
-[AI.FORECAST](references/ai-forecast.md) -
-[AI.EVALUATE](references/ai-evaluate.md) -
-[AI.GENERATE_TABLE](references/ai-generate-table.md) -
-[AI.GENERATE_EMBEDDING](references/ai-generate-embedding.md) -
-[Remote Models](references/remote-models.md)
-[CONTRIBUTION_ANALYSIS](references/ml-contribution-analysis.md)
-[VECTOR_SEARCH](references/vector-search.md)
-
-### 4. Notebook SQL cells
-
-Refer to `@skill:notebook-guidance` for standards on running BigQuery in
-notebooks.
+(Forecasting, Generative AI, Classification, etc.).
+- **Best Practices**: [ai_function_best_practices.md](references/ai-ml/ai_function_best_practices.md)
+- **Functions Reference**:
+
+  - **AI.CLASSIFY**: [ai_classify.md](references/ai-ml/ai_classify.md) - Classify text.
+  - **AI.DETECT_ANOMALIES**: [ai_detect_anomalies.md](references/ai-ml/ai_detect_anomalies.md) - Detect anomalies.
+  - **AI.EVALUATE**: [ai_evaluate.md](references/ai-ml/ai_evaluate.md) - Evaluate models.
+  - **AI.FORECAST**: [ai_forecast.md](references/ai-ml/ai_forecast.md) - Time-series forecasting.
+  - **AI.GENERATE**: [ai_generate.md](references/ai-ml/ai_generate.md) - Generate text using LLMs.
+  - **AI.GENERATE_EMBEDDING**: [ai_generate_embedding.md](references/ai-ml/ai_generate_embedding.md) - Generate embeddings.
+  - **AI.GENERATE_TABLE**: [ai_generate_table.md](references/ai-ml/ai_generate_table.md) - Table-valued AI generation.
+  - **AI.IF**: [ai_if.md](references/ai-ml/ai_if.md) - Evaluate semantic conditions.
+  - **AI.KEY_DRIVERS**: [ai_key_drivers.md](references/ai-ml/ai_key_drivers.md) - Identify key drivers.
+  - **AI.SCORE**: [ai_score.md](references/ai-ml/ai_score.md) - Score data.
+  - **AI.SEARCH**: [ai_search.md](references/ai-ml/ai_search.md) - Semantic search.
+  - **AI.SIMILARITY**: [ai_similarity.md](references/ai-ml/ai_similarity.md) - Semantic similarity.
+  - **Remote Models**: [remote_models.md](references/ai-ml/remote_models.md) - Working with remote models (Vertex AI).
+  - **CONTRIBUTION_ANALYSIS**: [ml_contribution_analysis.md](references/ai-ml/ml_contribution_analysis.md) - Step-by-step contribution analysis.
+  - **VECTOR_SEARCH**: [vector_search.md](references/ai-ml/vector_search.md) - Vector search best practices.
+
+### 4. Graph Analytics (Property Graphs & GQL)
+
+Guidelines and best practices for querying property graphs in BigQuery.
+- **Property Graph Guidelines**: [graph_queries.md](references/graph/graph_queries.md) - Standard GQL syntax and query patterns.
+- **Semantic Graph Guidelines**: [semantic_queries.md](references/graph/semantic_queries.md) - Semantic graph operations and expand functions.
@@ -0,0 +1,92 @@
+# BigQuery AI.CLASSIFY
+
+`AI.CLASSIFY` categorizes unstructured data into a predefined set of labels.
+
+## Syntax Reference
+
+```sql
+AI.CLASSIFY(
+  [ input => ] 'INPUT',
+  [ categories => ] 'CATEGORIES'
+  [, connection_id => 'CONNECTION_ID' ]
+  [, endpoint => 'ENDPOINT' ]
+  [, output_mode => 'OUTPUT_MODE' ]
+)
+```
+
+### Input Arguments
+
+| Argument            | Requirement  | Type          | Description           |
+| :------------------ | :----------- | :------------ | :-------------------- |
+| **`input`**         | **Required** | String        | The text content to   |
+:                     :              :               : classify.             :
+| **`categories`**    | **Required** | Array<String> | A list of target      |
+:                     :              :               : categories/labels.    :
+:                     :              :               : Can be                :
+:                     :              :               : `ARRAY<STRING>` or    :
+:                     :              :               : `ARRAY<STRUCT<STRING, :
+:                     :              :               : STRING>>` (label,     :
+:                     :              :               : description).         :
+| **`connection_id`** | Optional     | String        | The connection ID to  |
+:                     :              :               : use for the LLM.      :
+| **`endpoint`**      | Optional     | String        | The model name, e.g., |
+:                     :              :               : `'gemini-2.5-flash'`. :
+| **`output_mode`**   | Optional     | String        | `NULL` (default),     |
+:                     :              :               : `'single'`, or        :
+:                     :              :               : `'multi'`. Determines :
+:                     :              :               : the output type.      :
+
+### Output Schema
+
+The output type depends on the `output_mode` argument:
+
+| Output Mode      | output_mode Value | Type            | Description         |
+| :--------------- | :---------------- | :-------------- | :------------------ |
+| **Single Label** | `NULL` (Default)  | `STRING`        | The single category |
+:                  :                   :                 : that best fits the  :
+:                  :                   :                 : input.              :
+| **Single Label   | `'single'`        | `ARRAY<STRING>` | An array containing |
+: (Explicit)**     :                   :                 : exactly one         :
+:                  :                   :                 : category string.    :
+| **Multi Label**  | `'multi'`         | `ARRAY<STRING>` | An array containing |
+:                  :                   :                 : zero or more        :
+:                  :                   :                 : matching            :
+:                  :                   :                 : categories.         :
+
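Because the result type changes with `output_mode`, client code that reads these results may want one consistent shape. The sketch below is a hypothetical Python helper (`normalize_labels` is an illustrative name, not part of any BigQuery library). It assumes the value has already been fetched into Python as `None`, a `str`, or a list of `str`, and it maps every row of the Output Schema table to a list.

```python
from typing import List, Optional, Union


def normalize_labels(value: Optional[Union[str, List[str]]]) -> List[str]:
    """Return an AI.CLASSIFY result as a list, whichever output_mode produced it."""
    if value is None:
        # Assumption: a NULL result (for example, a row that failed) means "no labels".
        return []
    if isinstance(value, str):
        # Default mode: a single STRING label.
        return [value]
    # 'single' or 'multi' mode: already an array of labels (possibly empty).
    return list(value)


print(normalize_labels("Spam"))
print(normalize_labels(["tech", "business"]))
print(normalize_labels([]))
```

Treating every result as a list lets downstream code handle the default, `'single'`, and `'multi'` modes with one code path.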
+## Examples
+
+### Classify text into categories
+
+```sql
+SELECT
+  content,
+  AI.CLASSIFY(
+    content,
+    categories => ['Spam', 'Not Spam', 'Urgent'],
+    connection_id => 'my-project.us.my-connection'
+  ) AS classification
+FROM `dataset.emails`;
+```
+
+### Classify text into multiple topics
+
+```sql
+SELECT
+  title,
+  body,
+  AI.CLASSIFY(
+    body,
+    categories => ['tech', 'sport', 'business', 'politics', 'entertainment', 'other'],
+    output_mode => 'multi') AS categories
+FROM
+  `bigquery-public-data.bbc_news.fulltext`
+LIMIT 100;
+```
+
+### Classify reviews by sentiment
+
+```sql
+SELECT
+  AI.CLASSIFY(
+    ('Classify the review by sentiment: ', review),
+    categories => [
+      ('green', 'The review is positive.'),
+      ('yellow', 'The review is neutral.'),
+      ('red', 'The review is negative.')]) AS ai_review_rating,
+  reviewer_rating AS human_provided_rating,
+  review
+FROM
+  `bigquery-public-data.imdb.reviews`
+WHERE
+  title = 'The English Patient';
+```
@@ -0,0 +1,110 @@
+# BigQuery AI.DETECT_ANOMALIES
+
+`AI.DETECT_ANOMALIES` uses the pre-trained **TimesFM** model to identify
+deviations in time series data without needing to train a custom model.
+
+## Syntax Reference
+
+This function compares a target dataset against a historical dataset to identify
+anomalies.
+
+```sql
+SELECT *
+FROM AI.DETECT_ANOMALIES(
+  { TABLE `project.dataset.history_table` | (SELECT * FROM history_query) },
+  { TABLE `project.dataset.target_table` | (SELECT * FROM target_query) },
+  data_col => 'DATA_COL',
+  timestamp_col => 'TIMESTAMP_COL'
+  [, model => 'MODEL']
+  [, id_cols => ID_COLS]
+  [, anomaly_prob_threshold => ANOMALY_PROB_THRESHOLD]
+)
+
+```
+
+### Input Arguments
+
+Argument                     | Requirement  | Type          | Description
+:--------------------------- | :----------- | :------------ | :----------
+**`historical_data`**        | **Required** | Table/Query   | The source table or subquery containing historical data for training context.
+**`target_data`**            | **Required** | Table/Query   | The source table or subquery containing data to analyze for anomalies.
+**`data_col`**               | **Required** | String        | The numeric column to analyze.
+**`timestamp_col`**          | **Required** | String        | The column containing dates/timestamps.
+**`id_cols`**                | Optional     | Array<String> | Grouping columns for multiple series (e.g., `['store_id']`).
+**`anomaly_prob_threshold`** | Optional     | Float64       | Threshold for anomaly detection (0 to 1). Defaults to 0.95.
+**`model`**                  | Optional     | String        | Model version. Defaults to `'TimesFM 2.0'`.
+
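To make the required/optional split concrete, here is a minimal Python sketch that renders a call from the arguments listed above. The function `detect_anomalies_sql` is hypothetical and exists only for illustration. It performs plain string interpolation with no escaping, so it assumes trusted, already-validated column names and queries.

```python
from typing import Optional, Sequence


def detect_anomalies_sql(
    history_query: str,
    target_query: str,
    data_col: str,
    timestamp_col: str,
    id_cols: Optional[Sequence[str]] = None,
    anomaly_prob_threshold: Optional[float] = None,
    model: Optional[str] = None,
) -> str:
    """Render an AI.DETECT_ANOMALIES call; unset optional arguments are omitted."""
    if anomaly_prob_threshold is not None and not 0 <= anomaly_prob_threshold <= 1:
        raise ValueError("anomaly_prob_threshold must be between 0 and 1")

    # Required arguments: two positional inputs, then two named columns.
    args = [
        f"({history_query})",
        f"({target_query})",
        f"data_col => '{data_col}'",
        f"timestamp_col => '{timestamp_col}'",
    ]
    # Optional arguments, in the order shown in the syntax reference.
    if model is not None:
        args.append(f"model => '{model}'")
    if id_cols:
        quoted = ", ".join(f"'{col}'" for col in id_cols)
        args.append(f"id_cols => [{quoted}]")
    if anomaly_prob_threshold is not None:
        args.append(f"anomaly_prob_threshold => {anomaly_prob_threshold}")

    return "SELECT *\nFROM AI.DETECT_ANOMALIES(\n  " + ",\n  ".join(args) + "\n)"


print(
    detect_anomalies_sql(
        "SELECT * FROM history",
        "SELECT * FROM target",
        data_col="num_trips",
        timestamp_col="date",
        id_cols=["usertype"],
        anomaly_prob_threshold=0.8,
    )
)
```

Leaving `model` and `anomaly_prob_threshold` unset keeps the documented defaults in effect instead of hard-coding them in client code.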
+### Output Schema
+
+| Column                           | Type       | Description                  |
+| :------------------------------- | :--------- | :--------------------------- |
+| **`id_cols`**                    | (As Input) | Original identifiers for the |
+:                                  :            : series.                      :
+| **`time_series_timestamp`**      | TIMESTAMP  | Timestamp for the analyzed   |
+:                                  :            : points.                      :
+| **`time_series_data`**           | FLOAT64    | The original data value.     |
+| **`is_anomaly`**                 | BOOL       | TRUE if the point is         |
+:                                  :            : identified as an anomaly.    :
+| **`lower_bound`**                | FLOAT64    | Lower bound of the expected  |
+:                                  :            : range.                       :
+| **`upper_bound`**                | FLOAT64    | Upper bound of the expected  |
+:                                  :            : range.                       :
+| **`anomaly_probability`**        | FLOAT64    | Probability that the point   |
+:                                  :            : is an anomaly.               :
+| **`ai_detect_anomalies_status`** | STRING     | Error messages or empty      |
+:                                  :            : string on success. A minimum :
+:                                  :            : of 3 data points is          :
+:                                  :            : required.                    :
+
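As a rough illustration of how the output columns might be consumed, the Python sketch below separates flagged anomalies from rows whose status column reports an error. The helper name `split_results` and the sample rows are made up for this example. It assumes each row is available as a dictionary keyed by the column names in the table above.

```python
from typing import Any, Dict, Iterable, List, Tuple

Row = Dict[str, Any]


def split_results(rows: Iterable[Row]) -> Tuple[List[Row], List[Row]]:
    """Separate flagged anomalies from rows whose status reports an error."""
    anomalies: List[Row] = []
    failures: List[Row] = []
    for row in rows:
        if row.get("ai_detect_anomalies_status"):
            # A non-empty status string carries an error message.
            failures.append(row)
        elif row.get("is_anomaly"):
            anomalies.append(row)
    return anomalies, failures


# Made-up rows for illustration only; not real query output.
sample = [
    {"time_series_data": 12000.0, "is_anomaly": True,
     "anomaly_probability": 0.99, "ai_detect_anomalies_status": ""},
    {"time_series_data": 48000.0, "is_anomaly": False,
     "anomaly_probability": 0.10, "ai_detect_anomalies_status": ""},
    {"time_series_data": 5.0, "is_anomaly": None,
     "anomaly_probability": None, "ai_detect_anomalies_status": "example error text"},
]

anomalies, failures = split_results(sample)
print(len(anomalies), len(failures))
```

Checking the status column first avoids treating a failed row as a clean, non-anomalous point.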
+## Examples
+
+### Basic Anomaly Detection
+
+Detect anomalies in daily bike trips for a specific 2-month window based on
+prior history.
+
+```sql
+WITH bike_trips AS (
+  SELECT EXTRACT(DATE FROM starttime) AS date, COUNT(*) AS num_trips
+  FROM `bigquery-public-data.new_york.citibike_trips`
+  GROUP BY date
+)
+SELECT *
+FROM AI.DETECT_ANOMALIES(
+  -- Historical context (Training data equivalent)
+  (SELECT * FROM bike_trips WHERE date <= DATE('2016-06-30')),
+  -- Target range (Data to inspect for anomalies)
+  (SELECT * FROM bike_trips WHERE date BETWEEN '2016-07-01' AND '2016-09-01'),
+  data_col => 'num_trips',
+  timestamp_col => 'date'
+);
+
+```
+
+### Multiple Time Series Detection
+
+Use `id_cols` to detect anomalies separately for each combination of user type
+(e.g., Subscriber vs. Customer) and gender in the same query.
+
+```sql
+WITH bike_trips AS (
+    SELECT
+      EXTRACT(DATE FROM starttime) AS date, usertype, gender,
+      COUNT(*) AS num_trips
+    FROM `bigquery-public-data.new_york.citibike_trips`
+    GROUP BY date, usertype, gender
+  )
+SELECT *
+FROM
+  AI.DETECT_ANOMALIES(
+    -- Historical data from a query
+    (SELECT * FROM bike_trips WHERE date <= DATE('2016-06-30')),
+    -- Target data from a query
+    (SELECT * FROM bike_trips WHERE date BETWEEN '2016-07-01' AND '2016-09-01'),
+    data_col => 'num_trips',
+    timestamp_col => 'date',
+    id_cols => ['usertype', 'gender'],
+    model => 'TimesFM 2.5',
+    anomaly_prob_threshold => 0.8);
+
+```