gemini-cli-extensions · copybara-service · Jun 12, 2026
@@ -41,16 +41,16 @@ Guidelines for generating valid BigFrames code for data manipulation, model
 development, and visualization.
 - **Guide**: [BIGFRAMES.md](references/bigframes/BIGFRAMES.md)
 
-Bigframes should be the default library/tool as it is more efficient than using
-the BigQuery Python client library.
+BigFrames should be the default library/tool for data manipulation and analysis in Python. However, if the user explicitly requests "BigQuery ML", "BQML", "BigQuery SQL", or "SQL", you MUST use native BQML SQL (via `%%bqsql` magics) instead of BigFrames.
 
 ### 3. BigQuery ML & AI Functions (BQML SQL)
 
+**CRITICAL** Best Practices: You MUST read and follow the global constraints and mandatory function routing rules in
+[ai_function_best_practices.md](references/ai-ml/ai_function_best_practices.md) before writing any BQML AI/ML SQL query.
+
 Usage rules and syntax standards for all BigQuery AI/ML functions via SQL
 (Forecasting, Generative AI, Classification, etc.).
-- **Best Practices**: [ai_function_best_practices.md](references/ai-ml/ai_function_best_practices.md)
 - **Functions Reference**:
-
   - **AI.CLASSIFY**: [ai_classify.md](references/ai-ml/ai_classify.md) - Classify text.
   - **AI.DETECT_ANOMALIES**: [ai_detect_anomalies.md](references/ai-ml/ai_detect_anomalies.md) - Detect anomalies.
   - **AI.EVALUATE**: [ai_evaluate.md](references/ai-ml/ai_evaluate.md) - Evaluate models.
@@ -59,12 +59,12 @@ Usage rules and syntax standards for all BigQuery AI/ML functions via SQL
   - **AI.GENERATE_EMBEDDING**: [ai_generate_embedding.md](references/ai-ml/ai_generate_embedding.md) - Generate embeddings.
   - **AI.GENERATE_TABLE**: [ai_generate_table.md](references/ai-ml/ai_generate_table.md) - Table-valued AI generation.
   - **AI.IF**: [ai_if.md](references/ai-ml/ai_if.md) - Evaluate semantic conditions.
-  - **AI.KEY_DRIVERS**: [ai_key_drivers.md](references/ai-ml/ai_key_drivers.md) - Identify key drivers.
   - **AI.SCORE**: [ai_score.md](references/ai-ml/ai_score.md) - Score data.
   - **AI.SEARCH**: [ai_search.md](references/ai-ml/ai_search.md) - Semantic search.
   - **AI.SIMILARITY**: [ai_similarity.md](references/ai-ml/ai_similarity.md) - Semantic similarity.
   - **Remote Models**: [remote_models.md](references/ai-ml/remote_models.md) - Working with remote models (Vertex AI).
-  - **CONTRIBUTION_ANALYSIS**: [ml_contribution_analysis.md](references/ai-ml/ml_contribution_analysis.md) - Step-by-step contribution analysis.
+  - **CONTRIBUTION_ANALYSIS**: [ml_contribution_analysis.md](references/ai-ml/ml_contribution_analysis.md) - Finds contributing factors and key drivers of change. Requires creating a MODEL entity.
+  - **AI.KEY_DRIVERS**: [ai_key_drivers.md](references/ai-ml/ai_key_drivers.md) - Identifies key drivers; this is a TVF.
   - **VECTOR_SEARCH**: [vector_search.md](references/ai-ml/vector_search.md) - Vector search best practices.
 
 ### 4. Graph Analytics (Property Graphs & GQL)

@@ -4,30 +4,58 @@ Rules and syntax standards for BigQuery AI and Machine Learning functions.
 
 ## 1. Global Constraints
 
-* **Connection ID**: Use `'DEFAULT'` for the `connection` argument in remote `CREATE MODEL` statements.
-* **Dataset Creation**: Use `CREATE SCHEMA IF NOT EXISTS <project>.<dataset>;`.
-
-## 2. Mandatory Function Routing
-
-Function/Use Case         | Required Reference File
-------------------------- | ----------------------------------------------
-**AI.FORECAST**           | [ai_forecast.md](ai_forecast.md)
-**AI.EVALUATE**           | [ai_evaluate.md](ai_evaluate.md)
-**AI.GENERATE_TABLE**     | [ai_generate_table.md](ai_generate_table.md)
-**AI.GENERATE_EMBEDDING** | [ai_generate_embedding.md](ai_generate_embedding.md)
-**Remote Models**         | [remote_models.md](remote_models.md)
-**CONTRIBUTION_ANALYSIS** | [ml_contribution_analysis.md](ml_contribution_analysis.md)
-**VECTOR_SEARCH**         | [vector_search.md](vector_search.md)
+*   **Connection ID**: Use `'DEFAULT'` for the `connection` argument in remote
+    `CREATE MODEL` statements.
+*   **Dataset Creation**: Use `CREATE SCHEMA IF NOT EXISTS
+    <project>.<dataset>;`.
+*   **SQL Only**: You MUST use native BigQuery SQL (via `%%bqsql` magics) for
+    all BQML operations (model training, evaluation, prediction). Do NOT use
+    BigFrames (`bigframes.ml`) or the BigQuery Python client.
 
 ## 3. Mandatory Syntax Checks
 
-* **Table-Valued Functions (TVFs)**: `AI.GENERATE_TABLE`, `AI.FORECAST`, `AI.EVALUATE`, and `AI.GENERATE_EMBEDDING` MUST be placed in the `FROM` clause.
-* **Named Arguments**: `AI.FORECAST` and `AI.EVALUATE` require the `=>` operator for optional arguments.
-* **The "Prompt" Alias**: For `AI.GENERATE_TABLE`, the input subquery must contain a column aliased as `prompt`.
-* **Schema Quotes**: Ensure the `output_schema` string is enclosed in quotes.
+*   **Table-Valued Functions (TVFs)**: Table-Valued Functions (including,
+    but not limited to, `AI.GENERATE_TABLE`, `AI.FORECAST`, `AI.EVALUATE`,
+    and `AI.GENERATE_EMBEDDING`) MUST be placed in the `FROM` clause.
+*   **Named Arguments**: `AI.FORECAST` and `AI.EVALUATE` require the `=>`
+    operator for optional arguments.
+*   **The "Prompt" Alias**: For `AI.GENERATE_TABLE`, the input subquery must
+    contain a column aliased as `prompt`.
+*   **Schema Quotes**: Ensure the `output_schema` string is enclosed in quotes.
 
 ## 4. Model Selection
 
-* **Time-series**: `AI.FORECAST` uses **TimesFM** endpoints.
-* **Generative**: `AI.GENERATE_TABLE` uses **Gemini** endpoints.
-* **Freshness**: Prefer current models (e.g., `gemini-2.5-flash`) over deprecated ones.
+*   **Time-series**: `AI.FORECAST` uses **TimesFM** endpoints.
+*   **Generative**: `AI.GENERATE_TABLE` uses **Gemini** endpoints.
+*   **Freshness**: Prefer current models (e.g., `gemini-2.5-flash`) over
+    deprecated ones.
+
+## 5. Data Exploration
+
+*   **Mandatory Exploration**: Before training any model or running AI
+    functions, you MUST perform data exploration using:
+    1.  `ML.DESCRIBE_DATA` to understand the statistics of the dataset.
+    2.  A simple `SELECT` query with a `LIMIT` clause (e.g., `LIMIT 5` or
+        `LIMIT 10`) to sample a few rows.
+
+## 6. Model Training and Hyperparameters
+
+*   **Default Parameters**: Always rely on BQML's default parameters and
+    hyperparameters unless the prompt explicitly requests specific tuning. Do
+    not unnecessarily specify hyperparameters. If one is necessary, justify the
+    reasoning.
+*   **Data Splitting**: Most BQML models handle data splitting automatically
+    (default is `AUTO_SPLIT`). Do not perform manual training/validation/testing
+    splits (either via SQL subqueries or Python) unless explicitly instructed.
+    *   **TimesFM Exception**: If performing time-series forecasting with
+        TimesFM (`AI.FORECAST`), you MUST split your dataset chronologically
+        into exactly two parts:
+        *   **Historical Data (History)**: Used as `history_data` in
+            `AI.EVALUATE` and `input_data` in `AI.FORECAST`.
+        *   **Evaluation Data (Actuals)**: Used as `actual_data` in
+            `AI.EVALUATE` to compare against the forecast.
+
+## 7. Model Evaluation
+
+*   **Use BQML Functions**: Always use native BQML evaluation functions (e.g.,
+    `ML.EVALUATE`, `ML.ARIMA_EVALUATE`, `AI.EVALUATE`) to compute metrics.
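
The TimesFM exception in section 6 asks for a chronological split into exactly two parts. As a rough sketch only, the helper below renders the two BigQuery `SELECT` statements for such a split, so that every row falls into either the history or the actuals. The function name, table name, column name, and cutoff are all hypothetical and do not come from the reference files; in practice the resulting SQL would be run through `%%bqsql` and passed to `AI.FORECAST` / `AI.EVALUATE` as described above.

```python
from datetime import datetime


def chronological_split_sql(table: str, timestamp_col: str, cutoff: datetime) -> tuple[str, str]:
    """Return (history_sql, actuals_sql) that split `table` at `cutoff`.

    Rows strictly before the cutoff form the history; rows at or after it
    form the actuals, so each row lands in exactly one part.
    """
    # Render the cutoff as a BigQuery TIMESTAMP literal.
    literal = f"TIMESTAMP '{cutoff:%Y-%m-%d %H:%M:%S}'"
    base = f"SELECT * FROM `{table}` WHERE {timestamp_col}"
    history_sql = f"{base} < {literal}"
    actuals_sql = f"{base} >= {literal}"
    return history_sql, actuals_sql


# Hypothetical table, column, and cutoff, used for illustration only.
history_sql, actuals_sql = chronological_split_sql(
    "my_project.my_dataset.sales",
    "ts",
    datetime(2024, 1, 1),
)
print(history_sql)
print(actuals_sql)
```

Using `<` for the history and `>=` for the actuals keeps the two parts disjoint and exhaustive, which is the property the "exactly two parts" rule appears to be after.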