Commit 61f6984

Data Cloud Agents Team authored and copybara-github committed
Consolidate BigQuery developer skills
PiperOrigin-RevId: 929956038
1 parent 0deb62a commit 61f6984

22 files changed

Lines changed: 1327 additions & 92 deletions
Lines changed: 34 additions & 22 deletions
```diff
@@ -1,14 +1,15 @@
 ---
 name: developing-with-bigquery
 description: |
-  A repository of BigQuery-specific logic, knowledge, and specialized standards.
+  Provides BigQuery-specific logic, knowledge, and specialized standards.
   Use this skill whenever you are doing anything with BigQuery, including:
-  1. BigQuery query optimization
+  1. BigQuery query optimization (SQL)
   2. BigFrames Python code
-  3. BigQuery ML/AI functions.
+  3. BigQuery ML/AI functions (SQL & Python)
+  4. Graph Analytics (GQL & Property Graphs)
 license: Apache-2.0
 metadata:
-  version: v1
+  version: v2
   publisher: google
 ---
 
@@ -31,32 +32,43 @@ features:
 ### 1. Query Optimization
 
 Performance and efficiency guidelines for BigQuery SQL. Includes rules for
-column pruning, pushdown, and materialization strategies. - **Guide**:
-[OPTIMIZATION.md](references/OPTIMIZATION.md)
+column pruning, pushdown, and materialization strategies.
+- **Guide**: [OPTIMIZATION.md](references/sql/OPTIMIZATION.md)
 
 ### 2. BigFrames (BigQuery DataFrames)
 
 Guidelines for generating valid BigFrames code for data manipulation, model
-development, and visualization. - **Guide**:
-[BIGFRAMES.md](references/BIGFRAMES.md)
+development, and visualization.
+- **Guide**: [BIGFRAMES.md](references/bigframes/BIGFRAMES.md)
 
 Bigframes should be the default library/tool as it is more efficient than using
 the BigQuery Python client library.
 
 ### 3. BigQuery ML & AI Functions (BQML SQL)
 
 Usage rules and syntax standards for all BigQuery AI/ML functions via SQL
-(Forecasting, Generative AI, Classification, etc.). - **Guide**:
-[BQML.md](references/BQML.md) - **Functions Reference**: -
-[AI.FORECAST](references/ai-forecast.md) -
-[AI.EVALUATE](references/ai-evaluate.md) -
-[AI.GENERATE_TABLE](references/ai-generate-table.md) -
-[AI.GENERATE_EMBEDDING](references/ai-generate-embedding.md) -
-[Remote Models](references/remote-models.md)
-[CONTRIBUTION_ANALYSIS](references/ml-contribution-analysis.md)
-[VECTOR_SEARCH](references/vector-search.md)
-
-### 4. Notebook SQL cells
-
-Refer to `@skill:notebook-guidance` for standards on running BigQuery in
-notebooks.
+(Forecasting, Generative AI, Classification, etc.).
+- **Best Practices**: [ai_function_best_practices.md](references/ai-ml/ai_function_best_practices.md)
+- **Functions Reference**:
+
+- **AI.CLASSIFY**: [ai_classify.md](references/ai-ml/ai_classify.md) - Classify text.
+- **AI.DETECT_ANOMALIES**: [ai_detect_anomalies.md](references/ai-ml/ai_detect_anomalies.md) - Detect anomalies.
+- **AI.EVALUATE**: [ai_evaluate.md](references/ai-ml/ai_evaluate.md) - Evaluate models.
+- **AI.FORECAST**: [ai_forecast.md](references/ai-ml/ai_forecast.md) - Time-series forecasting.
+- **AI.GENERATE**: [ai_generate.md](references/ai-ml/ai_generate.md) - Generate text using LLMs.
+- **AI.GENERATE_EMBEDDING**: [ai_generate_embedding.md](references/ai-ml/ai_generate_embedding.md) - Generate embeddings.
+- **AI.GENERATE_TABLE**: [ai_generate_table.md](references/ai-ml/ai_generate_table.md) - Table-valued AI generation.
+- **AI.IF**: [ai_if.md](references/ai-ml/ai_if.md) - Evaluate semantic conditions.
+- **AI.KEY_DRIVERS**: [ai_key_drivers.md](references/ai-ml/ai_key_drivers.md) - Identify key drivers.
+- **AI.SCORE**: [ai_score.md](references/ai-ml/ai_score.md) - Score data.
+- **AI.SEARCH**: [ai_search.md](references/ai-ml/ai_search.md) - Semantic search.
+- **AI.SIMILARITY**: [ai_similarity.md](references/ai-ml/ai_similarity.md) - Semantic similarity.
+- **Remote Models**: [remote_models.md](references/ai-ml/remote_models.md) - Working with remote models (Vertex AI).
+- **CONTRIBUTION_ANALYSIS**: [ml_contribution_analysis.md](references/ai-ml/ml_contribution_analysis.md) - Step-by-step contribution analysis.
+- **VECTOR_SEARCH**: [vector_search.md](references/ai-ml/vector_search.md) - Vector search best practices.
+
+### 4. Graph Analytics (Property Graphs & GQL)
+
+Guidelines and best practices for querying property graphs in BigQuery.
+- **Property Graph Guidelines**: [graph_queries.md](references/graph/graph_queries.md) - Standard GQL syntax and query patterns.
+- **Semantic Graph Guidelines**: [semantic_queries.md](references/graph/semantic_queries.md) - Semantic graph operations and expand functions.
```
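
This commit moves every reference file into a subdirectory (`sql/`, `bigframes/`, `ai-ml/`, `graph/`), so a stale relative link is the most likely regression. The sketch below checks for that. It is a minimal check and not part of this commit. `missing_links` is a hypothetical helper that scans a skill file for Markdown links of the form `[text](target)`. It skips URLs and in-page anchors and reports targets that do not exist relative to the file.

```python
# Hypothetical helper, not part of this commit: list relative Markdown links
# in a skill file whose targets do not exist on disk.
import re
from pathlib import Path
from typing import List

# Captures the target of [text](target), e.g. references/sql/OPTIMIZATION.md
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")


def missing_links(skill_md: Path) -> List[str]:
    """Return relative link targets in skill_md that do not resolve to a file."""
    base = skill_md.parent
    missing: List[str] = []
    for target in LINK_RE.findall(skill_md.read_text(encoding="utf-8")):
        if "://" in target or target.startswith("#"):
            continue  # skip absolute URLs and in-page anchors
        path = target.split("#", 1)[0]  # drop any #fragment
        if not (base / path).is_file():
            missing.append(target)
    return missing
```

For example, `missing_links(Path("skills/developing-with-bigquery/SKILL.md"))` should return an empty list once every moved reference exists. The `SKILL.md` file name is an assumption, since the page above does not show the changed file's path.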

skills/developing-with-bigquery/references/ai-forecast.md

Lines changed: 0 additions & 62 deletions
This file was deleted.
Lines changed: 92 additions & 0 deletions
@@ -0,0 +1,92 @@
# BigQuery AI.Classify

`AI.CLASSIFY` categorizes unstructured data into a predefined set of labels.

## Syntax Reference

```sql
AI.CLASSIFY(
  [ input => ] 'INPUT',
  [ categories => ] 'CATEGORIES'
  [, connection_id => 'CONNECTION_ID' ]
  [, endpoint => 'ENDPOINT' ]
  [, output_mode => 'OUTPUT_MODE' ]
)
```

### Input Arguments

| Argument            | Requirement  | Type            | Description |
| :------------------ | :----------- | :-------------- | :---------- |
| **`input`**         | **Required** | String          | The text content to classify. |
| **`categories`**    | **Required** | `Array<String>` | A list of target categories/labels. Can be `ARRAY<STRING>` or `ARRAY<STRUCT<STRING, STRING>>` (label, description). |
| **`connection_id`** | Optional     | String          | The connection ID to use for the LLM. |
| **`endpoint`**      | Optional     | String          | The model name, e.g., `'gemini-2.5-flash'`. |
| **`output_mode`**   | Optional     | String          | `'single'` or `'multi'`. Determines the output type; if omitted, a single label is returned as a `STRING` (see Output Schema). |

### Output Schema

The output type depends on the `output_mode` argument:

| Output Mode                 | `output_mode` Value | Type            | Description |
| :-------------------------- | :------------------ | :-------------- | :---------- |
| **Single Label**            | `NULL` (Default)    | `STRING`        | The single category that best fits the input. |
| **Single Label (Explicit)** | `'single'`          | `ARRAY<STRING>` | An array containing exactly one category string. |
| **Multi Label**             | `'multi'`           | `ARRAY<STRING>` | An array containing zero or more matching categories. |

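The output type changes with `output_mode`, so code that consumes the results in Python may need to treat a bare `STRING` and an `ARRAY<STRING>` alike. Below is a minimal sketch of that normalization. It assumes the client surfaces the default mode as a `str` (or `None` for `NULL`) and the `'single'`/`'multi'` modes as a list of `str`. The helper names `as_label_list` and `label_counts` are hypothetical.

```python
# Sketch: normalize AI.CLASSIFY results read into Python so that all output
# modes can be handled alike. Assumes the default mode arrives as str (None for
# NULL) and 'single'/'multi' arrive as a list of str.
from collections import Counter
from typing import Iterable, List, Optional, Sequence, Union

ClassifyValue = Optional[Union[str, Sequence[str]]]


def as_label_list(value: ClassifyValue) -> List[str]:
    """Map a STRING, ARRAY<STRING>, or NULL result onto a list of labels."""
    if value is None:
        return []  # NULL: treat as no label
    if isinstance(value, str):
        return [value]  # default mode: a single STRING
    return list(value)  # 'single' (one element) or 'multi' (zero or more)


def label_counts(values: Iterable[ClassifyValue]) -> Counter:
    """Count how often each label appears across result rows."""
    counts: Counter = Counter()
    for value in values:
        counts.update(as_label_list(value))
    return counts
```

With the multi-label example, something like `label_counts(row["categories"] for row in rows)` would tally topics. Here `rows` stands for whatever result rows your client returns.
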
## Examples

### Classify text into categories

```sql
SELECT
  content,
  AI.CLASSIFY(
    content,
    categories => ['Spam', 'Not Spam', 'Urgent'],
    connection_id => 'my-project.us.my-connection'
  ) AS classification
FROM `dataset.emails`;
```

### Classify text into multiple topics

```sql
SELECT
  title,
  body,
  AI.CLASSIFY(
    body,
    categories => ['tech', 'sport', 'business', 'politics', 'entertainment', 'other'],
    output_mode => 'multi') AS categories
FROM
  `bigquery-public-data.bbc_news.fulltext`
LIMIT 100;
```

### Classify reviews by sentiment

```sql
SELECT
  AI.CLASSIFY(
    ('Classify the review by sentiment: ', review),
    categories => [
      ('green', 'The review is positive.'),
      ('yellow', 'The review is neutral.'),
      ('red', 'The review is negative.')]) AS ai_review_rating,
  reviewer_rating AS human_provided_rating,
  review
FROM `bigquery-public-data.imdb.reviews`
WHERE title = 'The English Patient';
```

Lines changed: 110 additions & 0 deletions
@@ -0,0 +1,110 @@
# BigQuery AI.Detect_Anomalies

`AI.DETECT_ANOMALIES` uses the pre-trained **TimesFM** model to identify
deviations in time series data without needing to train a custom model.

## Syntax Reference

This function compares a target dataset against a historical dataset to identify
anomalies.

```sql
SELECT *
FROM AI.DETECT_ANOMALIES(
  { TABLE `project.dataset.history_table` | (SELECT * FROM history_query) },
  { TABLE `project.dataset.target_table` | (SELECT * FROM target_query) },
  data_col => 'DATA_COL',
  timestamp_col => 'TIMESTAMP_COL'
  [, model => 'MODEL']
  [, id_cols => ID_COLS]
  [, anomaly_prob_threshold => ANOMALY_PROB_THRESHOLD]
)
```

### Input Arguments

| Argument                     | Requirement  | Type            | Description |
| :--------------------------- | :----------- | :-------------- | :---------- |
| **`historical_data`**        | **Required** | Table/Query     | The source table or subquery containing historical data for training context. |
| **`target_data`**            | **Required** | Table/Query     | The source table or subquery containing data to analyze for anomalies. |
| **`data_col`**               | **Required** | String          | The numeric column to analyze. |
| **`timestamp_col`**          | **Required** | String          | The column containing dates/timestamps. |
| **`id_cols`**                | Optional     | `Array<String>` | Grouping columns for multiple series (e.g., `['store_id']`). |
| **`anomaly_prob_threshold`** | Optional     | Float64         | Threshold for anomaly detection (0 to 1). Defaults to 0.95. |
| **`model`**                  | Optional     | String          | Model version. Defaults to `'TimesFM 2.0'`. |

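When the call is generated from application code, the named arguments above can be assembled programmatically. Below is a minimal sketch. `build_detect_anomalies_sql` and `sql_string` are hypothetical helpers that only produce query text; sending it to BigQuery is left to the caller. The sketch assumes both data inputs are subqueries, so the `TABLE ...` form is not handled. It also does not add a `WITH` clause for any CTE those subqueries reference.

```python
# Hypothetical helper: build AI.DETECT_ANOMALIES query text following the
# syntax reference above. It does not execute anything.
from typing import Optional, Sequence


def sql_string(value: str) -> str:
    """Quote a Python str as a GoogleSQL single-quoted string literal."""
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def build_detect_anomalies_sql(
    history_sql: str,
    target_sql: str,
    data_col: str,
    timestamp_col: str,
    id_cols: Optional[Sequence[str]] = None,
    model: Optional[str] = None,
    anomaly_prob_threshold: Optional[float] = None,
) -> str:
    args = [
        f"({history_sql})",  # historical data, first positional argument
        f"({target_sql})",  # target data, second positional argument
        f"data_col => {sql_string(data_col)}",
        f"timestamp_col => {sql_string(timestamp_col)}",
    ]
    if model is not None:
        args.append(f"model => {sql_string(model)}")
    if id_cols:
        cols = ", ".join(sql_string(c) for c in id_cols)
        args.append(f"id_cols => [{cols}]")
    if anomaly_prob_threshold is not None:
        # The table above documents the threshold as a value from 0 to 1.
        if not 0 <= anomaly_prob_threshold <= 1:
            raise ValueError("anomaly_prob_threshold must be between 0 and 1")
        args.append(f"anomaly_prob_threshold => {anomaly_prob_threshold}")
    return "SELECT *\nFROM AI.DETECT_ANOMALIES(\n  " + ",\n  ".join(args) + "\n)"
```

Every column name goes through `sql_string`, so the generated literals stay well-formed even when a name contains a quote.
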
### Output Schema

| Column                           | Type       | Description |
| :------------------------------- | :--------- | :---------- |
| **`id_cols`**                    | (As Input) | Original identifiers for the series. |
| **`time_series_timestamp`**      | TIMESTAMP  | Timestamp for the analyzed points. |
| **`time_series_data`**           | FLOAT64    | The original data value. |
| **`is_anomaly`**                 | BOOL       | TRUE if the point is identified as an anomaly. |
| **`lower_bound`**                | FLOAT64    | Lower bound of the expected range. |
| **`upper_bound`**                | FLOAT64    | Upper bound of the expected range. |
| **`anomaly_probability`**        | FLOAT64    | Probability that the point is an anomaly. |
| **`ai_detect_anomalies_status`** | STRING     | Error messages or empty string on success. A minimum of 3 data points is required. |

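The status column reports per-row errors instead of failing the query, so downstream code should check it before trusting `is_anomaly`. Below is a minimal post-processing sketch. It assumes each result row has been converted to a dict keyed by the output column names above. `id_cols` must be passed in because those column names depend on the query. `summarize_anomalies` is a hypothetical helper.

```python
# Sketch: post-process AI.DETECT_ANOMALIES output rows (as dicts keyed by the
# output column names) into anomalous timestamps per series plus any errors.
from collections import defaultdict
from typing import Any, Dict, List, Mapping, Sequence, Tuple


def summarize_anomalies(
    rows: Sequence[Mapping[str, Any]], id_cols: Sequence[str] = ()
) -> Tuple[Dict[tuple, List[Any]], List[str]]:
    """Group anomalous timestamps by series key and collect status messages."""
    anomalies: Dict[tuple, List[Any]] = defaultdict(list)
    errors: List[str] = []
    for row in rows:
        status = row.get("ai_detect_anomalies_status") or ""
        if status:
            # Non-empty status means an error, e.g. too few data points.
            errors.append(status)
            continue
        if row.get("is_anomaly"):
            key = tuple(row[c] for c in id_cols)  # () when there is one series
            anomalies[key].append(row["time_series_timestamp"])
    return dict(anomalies), errors
```

For the multiple-series example, `summarize_anomalies(rows, ["usertype", "gender"])` would key anomalies by `(usertype, gender)` pairs.
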
## Examples

### Basic Anomaly Detection

Detect anomalies in daily bike trips for a specific 2-month window based on
prior history.

```sql
WITH bike_trips AS (
  SELECT EXTRACT(DATE FROM starttime) AS date, COUNT(*) AS num_trips
  FROM `bigquery-public-data.new_york.citibike_trips`
  GROUP BY date
)
SELECT *
FROM AI.DETECT_ANOMALIES(
  -- Historical context (Training data equivalent)
  (SELECT * FROM bike_trips WHERE date <= DATE('2016-06-30')),
  -- Target range (Data to inspect for anomalies)
  (SELECT * FROM bike_trips WHERE date BETWEEN '2016-07-01' AND '2016-09-01'),
  data_col => 'num_trips',
  timestamp_col => 'date'
);
```

### Multiple Series Detection

Use `id_cols` to detect anomalies separately for each series in the same query.
Here each combination of user type (e.g., Subscriber vs. Customer) and gender
is a separate series.

```sql
WITH bike_trips AS (
  SELECT
    EXTRACT(DATE FROM starttime) AS date, usertype, gender,
    COUNT(*) AS num_trips
  FROM `bigquery-public-data.new_york.citibike_trips`
  GROUP BY date, usertype, gender
)
SELECT *
FROM
  AI.DETECT_ANOMALIES(
    # Historical data from a query
    (SELECT * FROM bike_trips WHERE date <= DATE('2016-06-30')),
    # Target data from a query
    (SELECT * FROM bike_trips WHERE date BETWEEN '2016-07-01' AND '2016-09-01'),
    data_col => 'num_trips',
    timestamp_col => 'date',
    id_cols => ['usertype', 'gender'],
    model => "TimesFM 2.5",
    anomaly_prob_threshold => 0.8);
```

skills/developing-with-bigquery/references/ai-evaluate.md renamed to skills/developing-with-bigquery/references/ai-ml/ai_evaluate.md

File renamed without changes.