Commit 1ff456e

feat: Sync data cloud skills (#31)
1 parent f0fecd6 commit 1ff456e

15 files changed

Lines changed: 423 additions & 267 deletions

skills/bigquery-data-transfer-service/SKILL.md

Lines changed: 66 additions & 30 deletions
````diff
@@ -31,7 +31,24 @@ metadata related to ingestion when needed.
 
 ## Workflow
 
-### Step 0: Check for Existing Transfers
+### Step 0: Discover Environment Parameters
+
+Before generating configurations, discover the actual values for the target
+project and region.
+
+> [!TIP]
+> If `deployment.yaml` already exists in the repository root, prioritize
+> extracting `project` and `region` from the target environment configuration
+> (e.g., `dev`).
+
+1. **Project**: `gcloud config get project`
+2. **Region**: `gcloud config get compute/region`
+
+> [!TIP]
+> Use these commands to replace placeholders like `<PROJECT_ID>` with actual
+> values. Always remove associated comments that start with TODO once replaced.
+
+### Step 1: Check for Existing Transfers
 
 Before assuming a new transfer is needed, check for existing ones in the target
 region.
````
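The placeholder rule in Step 0 can be sketched as a small text transform. This is a minimal illustration under stated assumptions, not part of the skill: `fill_placeholders` is a hypothetical helper, placeholders are assumed to use the `<UPPER_SNAKE_CASE>` form shown above, and TODO comments are assumed to sit on their own line starting with `# TODO`. In practice the values would come from `gcloud config get project` and `gcloud config get compute/region`; here they are hard-coded.

```python
import re
from typing import Dict

# Matches placeholders written as <UPPER_SNAKE_CASE>, e.g. <PROJECT_ID>.
PLACEHOLDER_RE = re.compile(r"<([A-Z][A-Z0-9_]*)>")


def fill_placeholders(text: str, values: Dict[str, str]) -> str:
    """Substitute known placeholders and drop whole-line '# TODO' comments."""
    kept_lines = []
    for line in text.splitlines():
        # Assumption: TODO comments sit on their own line and start with "# TODO".
        if line.lstrip().startswith("# TODO"):
            continue
        # Unknown placeholders are left intact so they stay visible for review.
        kept_lines.append(
            PLACEHOLDER_RE.sub(lambda m: values.get(m.group(1), m.group(0)), line)
        )
    return "\n".join(kept_lines)


# Hypothetical template and values, for illustration only.
template = "# TODO: set the project\nproject: <PROJECT_ID>\nregion: <REGION>\nowner: <OWNER>"
values = {"PROJECT_ID": "my-project", "REGION": "us-central1"}
print(fill_placeholders(template, values))
```

Leaving unknown placeholders untouched, rather than blanking them, keeps any value that still needs the user's input easy to spot.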
````diff
@@ -40,18 +57,19 @@ region.
 
    ```bash
    bq ls --transfer_config \
-     --transfer_location=[LOCATION] \
-     --project_id=[PROJECT_ID]
+     --transfer_location=<REGION> \
+     --project_id=<PROJECT_ID>
    ```
 
-2. **Evaluate Results**:
+2. **Analyze Existing Transfers**:
 
    - **Single Transfer Found**:
 
      - Check if the transfer has at least one successful run: `bq ls
-       --transfer_run --transfer_config=[RESOURCE_NAME]`
-     - If found: Use existing or manage via deployment framework.
-     - If not found: Guess tables from config.
+       --transfer_run --transfer_config=<RESOURCE_NAME>`
+     - If found: Use existing transfer config.
+     - If not found: Confirm with user if it's ok to trigger
+       the transfer run.
 
    - **Multiple Transfers Found**:
 
@@ -61,66 +79,84 @@ region.
    - **Disabled Transfers Found**:
 
      - Ask user if they want to enable it or create a new one.
-     - Enable: `bq update --disabled=false
-       --transfer_config=[RESOURCE_NAME]`
+     - To Enable: Instruct the user to update the transfer configuration
+       within their `deployment.yaml` file by setting the `disabled`
+       field to `false` for the specific transfer resource.
 
    - **No Transfers Found**: Proceed to create new if needed.
````
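The triage in "Analyze Existing Transfers" can be expressed as a small classifier. This is a hedged sketch: it assumes the listing was requested as JSON (for example by adding `--format=prettyjson` to the `bq ls` command) and parsed into a list of dicts, and it relies only on the `name` and `disabled` fields of a transfer config. `classify_transfers` and its case labels are hypothetical names, not part of the skill.

```python
from typing import Any, Dict, List, Tuple


def classify_transfers(configs: List[Dict[str, Any]]) -> Tuple[str, List[str]]:
    """Classify parsed transfer configs into the cases listed in Step 1.

    Returns (case, resource_names), where case is one of the hypothetical
    labels "none", "single", "multiple", or "disabled".
    """
    if not configs:
        return "none", []
    # Proto3 JSON usually omits false booleans, so a missing key means enabled.
    enabled = [c["name"] for c in configs if not c.get("disabled", False)]
    disabled = [c["name"] for c in configs if c.get("disabled", False)]
    if not enabled:
        return "disabled", disabled
    if len(enabled) == 1:
        return "single", enabled
    return "multiple", enabled


# Example input shaped like parsed `bq ls --format=prettyjson --transfer_config` output.
configs = [
    {"name": "projects/123/locations/us/transferConfigs/abc"},
    {"name": "projects/123/locations/us/transferConfigs/def", "disabled": True},
]
print(classify_transfers(configs))
```

One design choice worth flagging: when enabled and disabled configs coexist, this sketch considers only the enabled ones, so the "disabled" case fires only when every config is disabled.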
````diff
 
-### Step 1: Discover & Validate Parameters (New Transfers)
+### Step 2: Discover & Validate Parameters (New Transfers)
 
 If creating a new transfer, discover the required parameters using the REST API
 and validate them with the user.
 
-> [!TIP] If `DATA_SOURCE_ID` is unknown, run `bq show --transfer_data_sources`
-> `--location=[LOCATION] --project_id=[PROJECT_ID]` to list available source IDs
-> (e.g., `google_cloud_storage`, `salesforce`).
+> [!TIP] If `<DATA_SOURCE_ID>` is unknown, run the discovery script
+> without the `<DATA_SOURCE_ID>` argument to list available source IDs
+> (e.g., `google_cloud_storage`).
+> It uses the derived project and location from Step 0.
+> ```bash
+> python3 scripts/bigquery_dts.py --project_id=<PROJECT_ID>
+> ```
 
 1. **Run Discovery Script**: Use the `bigquery_dts.py` script to inspect Data
    Source parameters via the REST API.
 
    ```bash
-   # Use the path to the script in your workspace
-   python3 scripts/bigquery_dts.py --project_id [PROJECT_ID] [DATA_SOURCE_ID] [LOCATION]
+   # Passes the derived project and region to the script.
+   python3 scripts/bigquery_dts.py --project_id=<PROJECT_ID> <DATA_SOURCE_ID> <REGION>
    ```
 
    > [!IMPORTANT] Run this command every time a new transfer is being planned.
 
````
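The discovery script prints the raw JSON of the Data Transfer Service `dataSources` resource. A short sketch of how that output could be reduced to what the questionnaire needs: the source IDs from a list call, and the `paramId` of every parameter flagged `required`. The helper names and the sample payload are hypothetical and abridged; the sketch reads only a single page of the list response and ignores `nextPageToken`.

```python
import json
from typing import Any, Dict, List


def list_source_ids(list_response: Dict[str, Any]) -> List[str]:
    """Collect dataSourceId values from one page of a dataSources list response."""
    return [ds["dataSourceId"] for ds in list_response.get("dataSources", [])]


def required_params(data_source: Dict[str, Any]) -> List[str]:
    """Return the paramId of every parameter marked required on a data source."""
    return [
        p["paramId"]
        for p in data_source.get("parameters", [])
        if p.get("required", False)
    ]


# Hypothetical, abridged response; real data sources expose more fields.
sample = json.loads(
    """
    {
      "dataSourceId": "example_source",
      "parameters": [
        {"paramId": "path_template", "required": true},
        {"paramId": "table_name", "required": true},
        {"paramId": "skip_rows"}
      ]
    }
    """
)
print(required_params(sample))
```

Listing every required `paramId` explicitly, rather than summarizing, matches the questionnaire's rule that parameters must not be generalized.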
````diff
-2. **Mandatory User Questionnaire (CRITICAL)**:
-
-   - Identify mandatory parameters.
-   - Present them to the user BEFORE generating config files.
-   - Ask for verification of assets/tables.
-
-3. **Wait for User Response**: Do NOT proceed until parameters are confirmed.
-
-### Step 2: Extract Transfer Config Data
+2. > [!CAUTION] **Mandatory User Questionnaire (CRITICAL)**:
+
+   - **Explicitly identify ALL specific parameters** returned by the
+     discovery script. **You MUST NOT generalize or vaguely summarize them.**
+   - **OAuth Authorization (Google Data Sources)**: For Google ecosystem data
+     sources (Google Ads, YouTube, etc.), if the user is not using a service
+     account to configure the DTS transfer config (meaning the user is using
+     End User Credentials or EUC to configure the transfer config), then
+     generate an OAuth URI. Ask the user to visit this URL to authorize.
+     Once the user provides the versionInfo code, use the code as
+     `definition.versionInfo` in `deployment.yaml` and then you can proceed.
+   - If any parameters are related to authentication,
+     explicitly ask the user to provide the Secret Manager Resource ID
+     (e.g., `projects/my-project/secrets/my-secret`) for these parameters.
+   - Present every required parameter to the user BEFORE generating
+     config files.
+   - Ask for verification of assets/tables to be ingested.
+
+3. **Wait for User Response**: You **MUST NOT** proceed until parameters are
+   confirmed.
+
````
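Before a user-supplied Secret Manager Resource ID goes into a config, a quick format check can catch obvious typos. This is an approximate check, not an official validator: it assumes the `projects/<project>/secrets/<secret>` shape from the example above, additionally accepts an optional `/versions/<number|latest>` suffix, and assumes secret IDs use letters, digits, hyphens, and underscores (up to 255 characters). `parse_secret_resource_id` is a hypothetical name.

```python
import re
from typing import Dict, Optional

# Approximate pattern (assumption): projects/<project>/secrets/<secret-id>
# with an optional /versions/<number|latest> suffix.
SECRET_RESOURCE_RE = re.compile(
    r"projects/(?P<project>[^/]+)"
    r"/secrets/(?P<secret>[A-Za-z0-9_-]{1,255})"
    r"(?:/versions/(?P<version>[0-9]+|latest))?"
)


def parse_secret_resource_id(resource_id: str) -> Optional[Dict[str, Optional[str]]]:
    """Split a Secret Manager resource ID into parts, or return None if malformed."""
    match = SECRET_RESOURCE_RE.fullmatch(resource_id.strip())
    return match.groupdict() if match else None


print(parse_secret_resource_id("projects/my-project/secrets/my-secret"))
```

A `None` result is a cue to ask the user again rather than to guess a corrected ID.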
````diff
+### Step 3: Extract Transfer Config Data
 
 Retrieve the configuration details for the selected transfer.
 
 ```bash
-bq show --format=prettyjson --transfer_config [RESOURCE_NAME]
+bq show --format=prettyjson --transfer_config <RESOURCE_NAME>
 ```
 
-### Step 3: Trigger and Verify Transfer
+### Step 4: Trigger and Verify Transfer
 
 After the transfer is deployed via the resource provisioning framework, you MUST
 ensure there is at least a single successful run before proceeding with the rest
 of the tasks.
 
-1. **Trigger a Manual Run**: If no successful runs are found, or the transfer
-   was just created, trigger a manual run for the current time.
+1. **Trigger a Manual Run**: If no successful runs or ongoing runs are found,
+   or the transfer was just created, trigger a manual run for the current time.
 
    ```bash
    bq mk --transfer_run \
-     --transfer_config=[RESOURCE_NAME] \
+     --transfer_config=<RESOURCE_NAME> \
      --run_time=$(date -u +"%Y-%m-%dT%H:%M:%SZ")
    ```
````
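The trigger rule in Step 4 and the `--run_time` timestamp can be sketched in Python. Assumptions: run records come from parsed `bq ls --format=prettyjson --transfer_run` output and carry a `state` field using the Data Transfer Service values (`PENDING`, `RUNNING`, `SUCCEEDED`, and so on); `should_trigger_manual_run` and `run_time_now` are hypothetical helpers.

```python
from datetime import datetime, timezone
from typing import Any, Dict, List, Optional

# Transfer run states that mean a run is still in progress.
ONGOING_STATES = {"PENDING", "RUNNING"}


def should_trigger_manual_run(runs: List[Dict[str, Any]], just_created: bool) -> bool:
    """Trigger if just created, or if no run has succeeded and none is in flight."""
    if just_created:
        return True
    states = {run.get("state") for run in runs}
    return "SUCCEEDED" not in states and not (states & ONGOING_STATES)


def run_time_now(now: Optional[datetime] = None) -> str:
    """Format a UTC timestamp like `date -u +"%Y-%m-%dT%H:%M:%SZ"`."""
    current = now if now is not None else datetime.now(timezone.utc)
    return current.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


print(should_trigger_manual_run([{"state": "FAILED"}], just_created=False))
print(run_time_now())
```

Checking for ongoing runs before triggering avoids queuing a duplicate run while an earlier one is still `PENDING` or `RUNNING`.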
````diff
 
 2. **Poll for Completion (5-Minute Rule)**: Attempt to check the status of the
    run every 30-60 seconds for up to **5 minutes**.
 
    ```bash
-   bq ls --transfer_run --transfer_config=[RESOURCE_NAME]
+   bq ls --format=prettyjson --transfer_run --transfer_config=<RESOURCE_NAME>
    ```
 
    - **Success**: If the run completes successfully, proceed with the rest of
````
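The 5-minute polling rule can be sketched as a bounded loop. Here `fetch_state` is a hypothetical callable that you would back with the `bq ls --format=prettyjson --transfer_run` command above, returning the latest run's `state` (or `None` if no run is visible yet). The terminal states are assumed to be `SUCCEEDED`, `FAILED`, and `CANCELLED`. `sleep` and `clock` are injectable so the demo below runs instantly with a fake clock.

```python
import time
from typing import Callable, Optional

# Assumed terminal transfer run states.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELLED"}


def poll_run_state(
    fetch_state: Callable[[], Optional[str]],
    timeout_s: float = 300.0,  # the 5-minute rule
    interval_s: float = 30.0,  # lower bound of the 30-60 second window
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll until a terminal state is seen; return that state or "TIMEOUT"."""
    deadline = clock() + timeout_s
    while True:
        state = fetch_state()
        if state in TERMINAL_STATES:
            return state
        remaining = deadline - clock()
        if remaining <= 0:
            return "TIMEOUT"
        # Never sleep past the deadline.
        sleep(min(interval_s, remaining))


class FakeClock:
    """Test double: advances virtual time instead of sleeping."""

    def __init__(self) -> None:
        self.now = 0.0

    def time(self) -> float:
        return self.now

    def sleep(self, seconds: float) -> None:
        self.now += seconds


clock = FakeClock()
states = iter(["PENDING", "RUNNING", "SUCCEEDED"])
print(poll_run_state(lambda: next(states), sleep=clock.sleep, clock=clock.time))
```

Returning `"TIMEOUT"` as a distinct value, rather than raising, lets the caller decide how to report a run that is still in progress after 5 minutes.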

skills/bigquery-data-transfer-service/scripts/bigquery_dts.py

Lines changed: 90 additions & 61 deletions
```diff
@@ -19,71 +19,65 @@
 import os
 import subprocess
 import sys
+from typing import Optional
 import urllib.error
+import urllib.parse
 import urllib.request
 
 
-def get_project_id():
-    """Retrieves the active Google Cloud project ID from the environment or CLI."""
-    project_id = os.environ.get("PROJECT_ID")
-    if project_id:
-        return project_id
-    try:
-        result = subprocess.run(
-            ["gcloud", "config", "get-value", "project"],
-            capture_output=True,
-            text=True,
-            check=True,
-        )
-        return result.stdout.strip()
-    except subprocess.CalledProcessError:
-        print("Error: Could not determine PROJECT_ID.", file=sys.stderr)
-        sys.exit(1)
-    except FileNotFoundError:
+def _run_gcloud(*args: str) -> Optional[str]:
+    """Executes a gcloud command and returns the stripped stdout, or None."""
+    for cmd in ["gcloud", "gcloud.cmd"]:
         try:
             result = subprocess.run(
-                ["gcloud.cmd", "config", "get-value", "project"],
+                [cmd, *args],
                 capture_output=True,
                 text=True,
                 check=True,
             )
             return result.stdout.strip()
-        except Exception:  # pylint: disable=broad-exception-caught
-            print("Error: gcloud command not found.", file=sys.stderr)
-            sys.exit(1)
+        except (subprocess.CalledProcessError, FileNotFoundError):
+            continue
+    return None
+
+
+def get_project_id() -> str:
+    """Retrieves the active Google Cloud project ID."""
+    project_id = os.environ.get("PROJECT_ID")
+    if project_id:
+        return project_id
 
+    val = _run_gcloud("config", "get-value", "project")
+    if val and "(unset)" not in val:
+        return val
 
-def get_token():
+    print("Error: Could not determine PROJECT_ID.", file=sys.stderr)
+    sys.exit(1)
+
+
+def get_token() -> str:
     """Retrieves the access token using the gcloud CLI."""
-    try:
-        result = subprocess.run(
-            ["gcloud", "auth", "print-access-token"],
-            capture_output=True,
-            text=True,
-            check=True,
-        )
-        return result.stdout.strip()
-    except FileNotFoundError:
-        try:
-            result = subprocess.run(
-                ["gcloud.cmd", "auth", "print-access-token"],
-                capture_output=True,
-                text=True,
-                check=True,
-            )
-            return result.stdout.strip()
-        except Exception:  # pylint: disable=broad-exception-caught
-            print("Error: gcloud command not found.", file=sys.stderr)
-            sys.exit(1)
-    except subprocess.CalledProcessError:
-        print(
-            "Error: Could not obtain access token. Are you logged in?",
-            file=sys.stderr,
-        )
-        sys.exit(1)
+    token = _run_gcloud("auth", "print-access-token")
+    if token:
+        return token
 
+    print(
+        "Error: Could not obtain access token. Are you logged in?",
+        file=sys.stderr,
+    )
+    sys.exit(1)
 
-def main():
+
+def get_region() -> str:
+    """Retrieves the default compute region from gcloud config."""
+    val = _run_gcloud("config", "get-value", "compute/region")
+    if val and "(unset)" not in val:
+        return val
+
+    return "us"
+
+
+def main() -> None:
     """Main entry point for the script."""
     parser = argparse.ArgumentParser(
         description=(
```
```diff
@@ -92,9 +86,11 @@ def main():
         )
     )
     parser.add_argument("--project_id", help="The GCP project ID to use")
-    parser.add_argument("data_source_id", help="The DATA_SOURCE_ID to inspect")
     parser.add_argument(
-        "region", nargs="?", default="us", help="The GCP region (default: us)"
+        "data_source_id", nargs="?", help="The DATA_SOURCE_ID to inspect"
+    )
+    parser.add_argument(
+        "region", nargs="?", help="The GCP region (default: derived or us)"
     )
     args = parser.parse_args()
 
```
```diff
@@ -106,16 +102,27 @@ def main():
         )
         sys.exit(1)
 
-    print(
-        f"Retrieving Data Source parameters for: {args.data_source_id} "
-        f"in {args.region}..."
-    )
+    region = args.region or get_region() or "us"
 
-    base_url = (
-        "https://bigquerydatatransfer.googleapis.com/v1/"
-        f"projects/{project_id}/locations/{args.region}"
-    )
-    url = f"{base_url}/dataSources/{args.data_source_id}"
+    if args.data_source_id:
+        print(
+            f"Retrieving Data Source parameters for: {args.data_source_id} "
+            f"in {region}..."
+        )
+        url = (
+            "https://bigquerydatatransfer.googleapis.com/v1/"
+            f"projects/{project_id}/locations/{region}/dataSources/"
+            f"{args.data_source_id}"
+        )
+    else:
+        print(
+            f"Listing available Data Sources in {region} for project "
+            f"{project_id}..."
+        )
+        url = (
+            "https://bigquerydatatransfer.googleapis.com/v1/"
+            f"projects/{project_id}/locations/{region}/dataSources"
+        )
 
     token = get_token()
 
```
```diff
@@ -124,9 +131,31 @@ def main():
     req.add_header("Content-Type", "application/json")
 
     try:
-        with urllib.request.urlopen(req) as response:
+        with urllib.request.urlopen(req, timeout=30) as response:
             data = json.loads(response.read().decode("utf-8"))
             print(json.dumps(data, indent=4))
+
+            # Generate OAuth authorization URI for Google data sources
+            client_id = data.get("clientId")
+            scopes = data.get("scopes")
+            if client_id and scopes:
+                print("\n" + "=" * 40)
+                print("MANDATORY OAUTH AUTHORIZATION STEP")
+                print("=" * 40)
+                print(
+                    "This Data Source requires user authorization. "
+                    "Please follow the URL below to authorize:"
+                )
+                params = {
+                    "redirect_uri": "urn:ietf:wg:oauth:2.0:oob",
+                    "response_type": "version_info",
+                    "client_id": client_id,
+                    "scope": " ".join(scopes),
+                }
+                query_string = urllib.parse.urlencode(params)
+                auth_url = f"https://bigquery.cloud.google.com/datatransfer/oauthz/auth?{query_string}"
+                print(f"\n{auth_url}\n")
+                print("=" * 40 + "\n")
     except urllib.error.HTTPError as e:
         print(f"HTTP Error: {e.code} {e.reason}", file=sys.stderr)
         print(e.read().decode("utf-8"), file=sys.stderr)
```
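The OAuth URI assembly added to `main()` could also be isolated into a pure function, which makes it testable without calling the API. This is a sketch, not the script's actual structure: `build_oauth_url` is a hypothetical name, and the endpoint, `redirect_uri`, and `response_type` values are copied from the change above rather than independently verified. The client ID and scopes in the demo are made up.

```python
import urllib.parse
from typing import Any, Dict, Optional

# Endpoint copied from the bigquery_dts.py change above.
AUTH_ENDPOINT = "https://bigquery.cloud.google.com/datatransfer/oauthz/auth"


def build_oauth_url(data_source: Dict[str, Any]) -> Optional[str]:
    """Return the authorization URL for a data source, or None if it has no OAuth client."""
    client_id = data_source.get("clientId")
    scopes = data_source.get("scopes")
    if not (client_id and scopes):
        return None
    params = {
        "redirect_uri": "urn:ietf:wg:oauth:2.0:oob",
        "response_type": "version_info",
        "client_id": client_id,
        # Scopes are space-separated; urlencode turns the spaces into '+'.
        "scope": " ".join(scopes),
    }
    return f"{AUTH_ENDPOINT}?{urllib.parse.urlencode(params)}"


# Hypothetical client ID and scopes, for illustration only.
print(build_oauth_url({"clientId": "example-client", "scopes": ["scope-a", "scope-b"]}))
```

Keeping the URL construction free of I/O means the printing and the API call stay in `main()`, while the query-string logic can be checked on its own.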

skills/building-data-apps/SKILL.md

Lines changed: 4 additions & 2 deletions
```diff
@@ -6,7 +6,7 @@ description: |
   Analytics chat integration for data analytics.
 
   Relevant when any of the following conditions are true:
-  1. The user explicitly requests to build a data dashboard, data application, or visualization UI, and the UI pulls data from a GCP database (e.g., BigQuery, Spanner).
+  1. The user explicitly requests to build a data dashboard, data application, or visualization UI, and the UI pulls data from a GCP database (defaulting to BigQuery unless an alternative is specified).
   2. You need to generate a frontend web application to interact with, query, and visualize data from GCP data sources.
   3. The user wants to build a "chat with your data" experience or integrate the Gemini Data Analytics chat API into a web interface.
 
@@ -16,7 +16,7 @@ description: |
   3. The web application is not data-centric or does not involve visualizing/querying data from GCP sources.
 license: Apache-2.0
 metadata:
-  version: v1
+  version: v2
   publisher: google
 ---
 
@@ -26,6 +26,8 @@ metadata:
 
 - **Framework:** React in Vite
 - **Styling:** Tailwind CSS (Dark Mode by default)
+- **Database:** BigQuery (Default database unless the user specifies an
+  alternative)
 - **Icons:** `lucide-react`
 - **Date Formatting:** `date-fns`
 - **Data Fetching:** Axios (REST API calls)
```

skills/developing-with-bigquery/resources/BIGFRAMES.md

Lines changed: 1 addition & 0 deletions
```diff
@@ -27,3 +27,4 @@ Guidelines for generating valid code with the BigFrames (BigQuery DataFrame) lib
     - Sort data chronologically and split around a timepoint before training.
     - Prediction horizon must be less than or equal to training horizon.
 * **PCA**: BigFrames' PCA class lacks a simple `transform()` method. Use `predict()` instead.
+* **Model Persistence**: To persist a model, use `model.to_gbq()`. To load a persisted model, use `bpd.read_gbq_model()`.
```
