gemini-cli-extensions · copybara-service · Jun 26, 2026
@@ -1,5 +1,5 @@
 ---
-name: developing-with-bigquery
+name: bigquery
 description: |
   Provides BigQuery-specific logic, knowledge, and specialized standards.
   Use this skill whenever you are doing anything with BigQuery, including:
@@ -9,17 +9,14 @@ description: |
     4. Graph Analytics (GQL & Property Graphs)
 license: Apache-2.0
 metadata:
-  version: v2
+  version: v4
   publisher: google
+
 ---
 
 This skill provides comprehensive guidance for BigQuery services, optimizations,
 and data handling. It acts as a routing table for specialized BigQuery topics.
 
-> [!IMPORTANT]
->
-> For general standards on running BigQuery in notebooks (SQL cells, `export`
-> keyword), see `@skill:notebook-guidance`.
 
 > [!IMPORTANT]
 >
@@ -33,12 +30,14 @@ features:
 
 Performance and efficiency guidelines for BigQuery SQL. Includes rules for
 column pruning, pushdown, and materialization strategies.
+
 - **Guide**: [OPTIMIZATION.md](references/sql/OPTIMIZATION.md)
 
 ### 2. BigFrames (BigQuery DataFrames)
 
 Guidelines for generating valid BigFrames code for data manipulation, model
 development, and visualization.
+
 - **Guide**: [BIGFRAMES.md](references/bigframes/BIGFRAMES.md)
 
 BigFrames should be the default library/tool as it is more efficient than using
@@ -48,6 +47,7 @@ the BigQuery Python client library.
 
 Usage rules and syntax standards for all BigQuery AI/ML functions via SQL
 (Forecasting, Generative AI, Classification, etc.).
+
 - **Best Practices**: [ai_function_best_practices.md](references/ai-ml/ai_function_best_practices.md)
 - **Functions Reference**:
 
@@ -70,5 +70,10 @@ Usage rules and syntax standards for all BigQuery AI/ML functions via SQL
 ### 4. Graph Analytics (Property Graphs & GQL)
 
 Guidelines and best practices for querying property graphs in BigQuery.
+
 - **Property Graph Guidelines**: [graph_queries.md](references/graph/graph_queries.md) - Standard GQL syntax and query patterns.
 - **Semantic Graph Guidelines**: [semantic_queries.md](references/graph/semantic_queries.md) - Semantic graph operations and expand functions.
+- **Graph Schema DDL Advisor**: [graph_schema_ddl_advisor.md](references/graph/graph-schema/graph_schema_ddl_advisor.md) - Assists in defining, correcting, and optimizing BigQuery Property Graph and Semantic Graph schemas.
@@ -0,0 +1,160 @@
+# BigQuery Graph Schema Best Practices
+
+This document outlines best practices for designing and defining Property Graph
+and Semantic Graph schemas in BigQuery. Following these guidelines improves
+graph query performance, ensures referential integrity, and avoids common
+pitfalls in flattened views (`GRAPH_EXPAND`).
+
+--------------------------------------------------------------------------------
+
+## 1. Scope Property Definitions (Critical for Performance)
+
+Properties are key-value pairs attached to nodes or edges. By default, or when
+`PROPERTIES ALL COLUMNS` is specified, every column of the source table is
+exposed as a property.
+
+*   **The Pitfall**: Exposing unnecessary properties forces BigQuery to perform
+    redundant column scans in graph queries, severely degrading performance.
+*   **Best Practice**: **Only include properties that are actually needed for
+    querying.** Use the explicit `PROPERTIES (col1, col2, ...)` syntax to
+    restrict the property list.
+*   **Example**:
+
+```sql
+-- POOR: Exposes all columns, including large text or metadata
+NODE TABLES ( my_dataset.users AS User PROPERTIES ALL COLUMNS )
+
+-- GOOD: Exposes only the attributes that queries actually use
+NODE TABLES ( my_dataset.users AS User PROPERTIES (user_id, name, age) )
+```
+
+--------------------------------------------------------------------------------
+
+## 2. Define Key Constraints (PK / FK)
+
+BigQuery does not enforce Primary Key (PK) or Foreign Key (FK) constraints
+(they are declared `NOT ENFORCED`), but the query optimizer can use them to
+improve execution plans.
+
+*   **Optimization**: If PK/FK constraints are defined on the underlying tables,
+    the query engine can leverage them to eliminate unnecessary table scans and
+    prune join paths.
+*   **Referential Integrity**: Ensure your application guarantees the uniqueness
+    of primary keys and referential integrity of foreign keys. If they are
+    violated, graph query results may be incorrect.
+*   **Best Practice**: Always define a PK on node tables and FKs on edge tables
+    in their source DDL, and use the same key columns in the `KEY`,
+    `SOURCE KEY`, and `DESTINATION KEY` clauses of `CREATE PROPERTY GRAPH`.
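+
+As a minimal sketch of this practice, the source tables can declare their keys
+with `NOT ENFORCED` constraints. The `follows` table and its columns are
+hypothetical names introduced only for illustration; `my_dataset.users` reuses
+the example table from section 1.
+
+```sql
+-- Node source table: declare the primary key (must be NOT ENFORCED).
+CREATE TABLE my_dataset.users (
+  user_id INT64 NOT NULL,
+  name STRING,
+  age INT64,
+  PRIMARY KEY (user_id) NOT ENFORCED
+);
+
+-- Edge source table (hypothetical): both endpoints reference users.user_id.
+CREATE TABLE my_dataset.follows (
+  follower_id INT64 NOT NULL,
+  followee_id INT64 NOT NULL,
+  PRIMARY KEY (follower_id, followee_id) NOT ENFORCED,
+  FOREIGN KEY (follower_id) REFERENCES my_dataset.users (user_id) NOT ENFORCED,
+  FOREIGN KEY (followee_id) REFERENCES my_dataset.users (user_id) NOT ENFORCED
+);
+```
+
+Because the constraints are not enforced, the loading pipeline remains
+responsible for keeping `user_id` unique and every `follower_id` and
+`followee_id` resolvable.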
+
+--------------------------------------------------------------------------------
+
+## 3. Avoid Column Name Collisions in Flattened Schema (`GRAPH_EXPAND`)
+
+The `GRAPH_EXPAND` TVF flattens the graph by prefixing each property with the
+Node/Edge alias (e.g., `NodeAlias_propertyName`).
+
+*   **The Danger**: If the combination of alias and property name results in
+    identical column names, the query will fail with a generic internal Dremel
+    error: `Error encountered during execution. Retrying may solve the problem.`
+*   **Scenario**:
+    *   Node `N` with property `a_b` -> Generated column: `N_a_b`
+    *   Node `N_a` with property `b` -> Generated column: `N_a_b` (Collision!)
+*   **Best Practice**: Design your node/edge aliases and property names
+    carefully to avoid prefix-induced collisions. Renaming properties or using
+    distinct aliases in the DDL resolves this.
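+
+The scenario above can be reproduced with a DDL fragment like the following
+sketch. The table names `my_dataset.t1` and `my_dataset.t2` and the alias
+`NodeA` are hypothetical.
+
+```sql
+-- POOR: N + a_b and N_a + b both flatten to the column N_a_b
+NODE TABLES (
+  my_dataset.t1 AS N   KEY(id) PROPERTIES (id, a_b),
+  my_dataset.t2 AS N_a KEY(id) PROPERTIES (id, b)
+)
+
+-- GOOD: a distinct alias yields N_a_b and NodeA_b
+NODE TABLES (
+  my_dataset.t1 AS N     KEY(id) PROPERTIES (id, a_b),
+  my_dataset.t2 AS NodeA KEY(id) PROPERTIES (id, b)
+)
+```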
+
+--------------------------------------------------------------------------------
+
+## 4. Always Use Safe Aliases (`AS alias`)
+
+If you omit the `AS alias` clause, BigQuery defaults to using the full table
+path as the alias (e.g., `project.dataset.table`).
+
+*   **The Pitfall**: The generated column names in the flattened view will
+    contain dots and, when the project ID has them, hyphens (e.g.,
+    `my-project.my_dataset.user_profiles_property`). This violates standard SQL
+    output schema rules, and queries like `SELECT *` will fail with
+    `Invalid field name`.
+*   **Best Practice**: **Always specify a simple, alphanumeric alias** using
+    standard SQL naming conventions (no dots, hyphens, or special characters).
+*   **Example**:
+
+```sql
+-- POOR (Omitted alias):
+NODE TABLES ( `my-project.my_dataset.user_profiles` KEY(id) ... )
+
+-- GOOD (Safe alias):
+NODE TABLES ( `my-project.my_dataset.user_profiles` AS User KEY(id) ... )
+```
+
+*   **TODO**: This explicit safe alias requirement can be omitted
+    once the BigQuery engine natively resolves default column names containing
+    dots/hyphens.
+
+--------------------------------------------------------------------------------
+
+## 5. Reusing the Same Physical Table as Node and Edge Tables
+
+In hierarchical schemas (such as employee-manager org charts or product category
+trees), the same physical table often represents both the entity (Node) and the
+parent-child relationship (Edge).
+
+When modeling this in DDL, you must decide how the reused table is exposed in
+`GRAPH_EXPAND`:
+
+1.  **Explicit Edges (With Properties)**:
+    *   **Approach**: Declare the table as an edge table and list specific
+        property columns in the `PROPERTIES(...)` clause.
+    *   **Result**: These property columns will be exposed in the flattened
+        output view as `EdgeAlias_propertyName`. Use this when the
+        self-referential relationship itself carries important metadata (e.g.,
+        `assignment_date`, `relation_type`).
+2.  **Logical/Structural Edges (No Properties)**:
+    *   **Approach**: Declare the table as an edge table but specify `NO
+        PROPERTIES`.
+    *   **Result**: The edge remains "invisible" in the output columns of the
+        flattened view, while still correctly representing the hierarchical
+        structure for navigation and query path resolution in the backend. Use
+        this to avoid cluttering the output view when only the connectivity
+        matters.
+
+*   **Example**: Self-referential organizational chart:
+
+```sql
+CREATE OR REPLACE PROPERTY GRAPH `my-project.my_dataset.org_chart`
+  NODE TABLES (
+    `my-project.my_dataset.employees` AS Employee
+      KEY(emp_id)
+      LABEL Employee
+      PROPERTIES(emp_id, name, department)
+  )
+  EDGE TABLES (
+    -- Reusing 'employees' table purely to represent the 'reports_to' edge
+    `my-project.my_dataset.employees` AS ReportsTo
+      KEY(emp_id)
+      SOURCE KEY(emp_id) REFERENCES Employee(emp_id)
+      DESTINATION KEY(manager_id) REFERENCES Employee(emp_id)
+      LABEL ReportsTo
+      NO PROPERTIES -- Logical edge (structural only)
+  );
+```
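+
+For comparison, a sketch of the explicit-edge variant (option 1) of the same
+`EDGE TABLES` clause is shown below. It assumes the `employees` table also has
+an `assignment_date` column, which is a hypothetical addition for illustration.
+
+```sql
+  EDGE TABLES (
+    `my-project.my_dataset.employees` AS ReportsTo
+      KEY(emp_id)
+      SOURCE KEY(emp_id) REFERENCES Employee(emp_id)
+      DESTINATION KEY(manager_id) REFERENCES Employee(emp_id)
+      LABEL ReportsTo
+      PROPERTIES(assignment_date) -- Exposed as ReportsTo_assignment_date
+  )
+```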
+
+--------------------------------------------------------------------------------
+
+## 6. Handling Special Characters in Aliases
+
+If you must use special characters (like hyphens or spaces) in your aliases,
+quote them with backticks consistently, both in the DDL and in every query.
+
+*   **DDL Quoting**: Quoting is required in the DDL:
+
+```sql
+NODE TABLES ( my_table AS `My-Node` ... )
+```
+
+*   **Querying Quoting**: You **MUST** use backticks when referencing these
+    columns in queries:
+
+```sql
+SELECT `My-Node_property` FROM GRAPH_EXPAND(...)
+```
+
+*   **Pitfall**: Omitting backticks (e.g., `SELECT My-Node_property`) causes the
+    query engine to interpret the hyphen as a subtraction operator (`My` minus
+    `Node_property`), so the query fails, typically with an unrecognized-name
+    or syntax error.