Commit 2c6debc

feat: update skills from cloudtop
1 parent 04c4354 commit 2c6debc

14 files changed

Lines changed: 751 additions & 169 deletions

skills/bigquery-data-transfer-service/SKILL.md

Lines changed: 25 additions & 17 deletions
Original file line number | Diff line number | Diff line change
@@ -37,6 +37,7 @@ Before generating configurations, discover the actual values for the target
3737
project and region.
3838

3939
> [!TIP]
40+
>
4041
> If `deployment.yaml` already exists in the repository root, prioritize
4142
> extracting `project` and `region` from the target environment configuration
4243
> (e.g., `dev`).
@@ -45,6 +46,7 @@ project and region.
4546
2. **Region**: `gcloud config get compute/region`
4647

4748
> [!TIP]
49+
>
4850
> Use these commands to replace placeholders like `<PROJECT_ID>` with actual
4951
> values. Always remove associated comments that start with TODO once replaced.
5052
@@ -68,8 +70,8 @@ region.
6870
- Check if the transfer has at least one successful run: `bq ls
6971
--transfer_run --transfer_config=<RESOURCE_NAME>`
7072
- If found: Use existing transfer config.
71-
- If not found: Confirm with user if it's ok to trigger
72-
the transfer run.
73+
- If not found: Confirm with user if it's ok to trigger the transfer
74+
run.
7375
7476
- **Multiple Transfers Found**:
7577
@@ -80,8 +82,8 @@ region.
8082
8183
- Ask user if they want to enable it or create a new one.
8284
- To Enable: Instruct the user to update the transfer configuration
83-
within their `deployment.yaml` file by setting the `disabled`
84-
field to `false` for the specific transfer resource.
85+
within their `deployment.yaml` file by setting the `disabled` field
86+
to `false` for the specific transfer resource.
8587
8688
- **No Transfers Found**: Proceed to create new if needed.
8789
@@ -90,10 +92,12 @@ region.
9092
If creating a new transfer, discover the required parameters using the REST API
9193
and validate them with the user.
9294
93-
> [!TIP] If `<DATA_SOURCE_ID>` is unknown, run the discovery script
94-
> without `<DATA_SOURCE_ID>` argument to list available source IDs
95-
> (e.g., `google_cloud_storage`).
96-
> It uses the derived project and location from Step 0.
95+
> [!TIP]
96+
>
97+
> If `<DATA_SOURCE_ID>` is unknown, run the discovery script without
98+
> `<DATA_SOURCE_ID>` argument to list available source IDs (e.g.,
99+
> `google_cloud_storage`). It uses the derived project and location from Step 0.
100+
>
97101
> ```bash
98102
> python3 scripts/bigquery_dts.py --project_id=<PROJECT_ID>
99103
> ```
@@ -106,24 +110,28 @@ and validate them with the user.
106110
python3 scripts/bigquery_dts.py --project_id=<PROJECT_ID> <DATA_SOURCE_ID> <REGION>
107111
```
108112
109-
> [!IMPORTANT] Run this command every time a new transfer is being planned.
113+
> [!IMPORTANT]
114+
>
115+
> Run this command every time a new transfer is being planned.
110116
111-
2. > [!CAUTION] **Mandatory User Questionnaire (CRITICAL)**:
117+
2. > [!CAUTION]
118+
>
119+
> **Mandatory User Questionnaire (CRITICAL)**:
112120
113121
- **Explicitly identify ALL specific parameters** returned by the
114122
discovery script. **You MUST NOT generalize or vaguely summarize them.**
115123
- **OAuth Authorization (Google Data Sources)**: For Google ecosystem data
116124
sources (Google Ads, YouTube, etc.), if the user is not using a service
117125
account to configure the DTS transfer config (meaning the user is using
118126
End User Credentials or EUC to configure the transfer config), then
119-
generate an OAuth URI. Ask the user to visit this URL to authorize.
120-
Once the user provides the versionInfo code, use the code as
127+
generate an OAuth URI. Ask the user to visit this URL to authorize. Once
128+
the user provides the versionInfo code, use the code as
121129
`definition.versionInfo` in `deployment.yaml` and then you can proceed.
122-
- If any parameters are related to authentication,
123-
explicitly ask the user to provide the Secret Manager Resource ID
124-
(e.g., projects/my-project/secrets/my-secret) for these parameters
125-
- Present every required parameter to the user BEFORE generating
126-
config files.
130+
- If any parameters are related to authentication, explicitly ask the user
131+
to provide the Secret Manager Resource ID (e.g.,
132+
`projects/my-project/secrets/my-secret`) for these parameters.
133+
- Present every required parameter to the user BEFORE generating config
134+
files.
127135
- Ask for verification of assets/tables to be ingested.
128136
129137
3. **Wait for User Response**: You **MUST NOT** proceed until parameters are

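The two discovery invocations shown in this file's diff differ only in their trailing positional arguments. A minimal sketch of assembling them, assuming the script accepts exactly the arguments shown above; the helper name `discovery_command` is hypothetical and not part of the repository:

```python
def discovery_command(project_id, data_source_id=None, region=None):
    """Build the argv list for the discovery script (hypothetical helper).

    With only a project ID, the script is expected to list available
    data source IDs; with a data source ID and region, it is expected
    to return that source's parameters.
    """
    cmd = ["python3", "scripts/bigquery_dts.py", f"--project_id={project_id}"]
    if data_source_id is not None:
        # The second form in the skill passes both positionals together.
        if region is None:
            raise ValueError("region is required when data_source_id is given")
        cmd += [data_source_id, region]
    return cmd
```

The returned list can be passed to `subprocess.run` without shell quoting concerns.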
skills/dataform-bigquery/SKILL.md

Lines changed: 3 additions & 5 deletions
@@ -7,7 +7,7 @@ description: Expertise in generating clean, correct, and efficient Dataform pipe
77
up a new Dataform project or configuring workflow_settings.yaml.
88
license: Apache-2.0
99
metadata:
10-
version: v1
10+
version: v2
1111
publisher: google
1212
---
1313

@@ -270,10 +270,8 @@ Usage in models:
270270
SELECT * FROM ${ref("my_iceberg_table")}
271271
```
272272

273-
> [!WARNING]
274-
>
275-
> You cannot create a BigQuery view directly from a source BigLake table (using
276-
> 4-part naming). It needs to be a native BigQuery table.
273+
You cannot create a BigQuery view directly from a source BigLake table (using
274+
4-part naming). This feature is only for native BigQuery tables.
277275

278276
## Unit Testing
279277

skills/dbt-bigquery/SKILL.md

Lines changed: 22 additions & 17 deletions
@@ -7,7 +7,7 @@ description: Expert guidance for creating, modifying, and optimizing dbt pipelin
77
**setting up a new dbt project** or configuring existing one
88
license: Apache-2.0
99
metadata:
10-
version: v1
10+
version: v2
1111
publisher: google
1212
---
1313

@@ -45,8 +45,7 @@ Follow these steps when fulfilling dbt-related requests:
4545
### 1. Understand the Current State
4646

4747
- Locate the dbt project root by searching for a `dbt_project.yml` file.
48-
- **If `dbt_project.yml` is NOT found**: Assume the repository is
49-
uninitialized and guide the user through `dbt init`.
48+
- **If `dbt_project.yml` is NOT found**: Assume the repository/project is uninitialized.
5049
- Compile the dbt pipeline (`dbt compile`) to map the existing DAG.
5150
- Use the compiled graph as the **source of truth** for existing assets.
5251

@@ -116,8 +115,14 @@ Follow these steps when fulfilling dbt-related requests:
116115
dbt-bigquery`).
117116
- Instruct and help the user to add the venv/bin path to their PATH so
118117
the agent can use the dbt CLI in future steps.
119-
- **Repo Initialization**: If the repository or dbt project does not exist,
120-
instruct on how to initialize it.
118+
- **Repo Initialization**: If the repository or dbt project does not exist:
119+
- Generate all dbt artifacts under a dedicated subdirectory
120+
(e.g., `dbt/`) rather than the root.
121+
- **Silent & Scaffolded Initialization**: Initialize silently.
122+
Run `dbt init --skip-profile-setup` and manually create/edit the
123+
scaffolding: `dbt_project.yml`, `profiles.yml`,
124+
and other directories for `models/` and `tests/` as needed
125+
(i.e., if `dbt init` fails).
121126
- **Output Validation**: After generating code, ALWAYS attempt to validate and
122127
compile the project using `dbt compile` or similar commands to ensure
123128
integrity.
@@ -169,14 +174,16 @@ acceptable."
169174

170175
### Project & Profiles Config
171176

172-
- When initializing a new dbt project ensure `dbt_project.yml` is created with
173-
correct settings.
177+
- Always generate the dbt project and files within a dedicated folder (e.g.,
178+
`dbt/`) rather than the root folder to avoid orchestrator errors.
179+
- When initializing a new dbt project ensure `dbt_project.yml` is created
180+
with correct settings.
174181
- **Profiles Config**: ALWAYS ensure that a `profiles.yml` file is generated
175-
inside the dbt project folder alongside `dbt_project.yml` (or explicitly
176-
point `DBT_PROFILES_DIR` to it). Uncreated profiles are a leading cause of
177-
DAG pipeline failures (e.g., "Could not find profile named 'X'"). The
178-
`profiles.yml` must match the profile requested in `dbt_project.yml` and map
179-
correct BigQuery settings (project, dataset, location).
182+
inside the dedicated dbt project folder alongside `dbt_project.yml` (or
183+
explicitly point `DBT_PROFILES_DIR` to it). Uncreated profiles are a leading
184+
cause of DAG pipeline failures (e.g., "Could not find profile named 'X'").
185+
The `profiles.yml` must match the profile requested in `dbt_project.yml` and
186+
map correct BigQuery settings (project, dataset, location).
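The profile-matching requirement above can be checked before running `dbt compile`. The following is a rough sketch only: it uses plain text matching rather than a YAML parser, assumes both files sit in the same dedicated folder, and the function name `find_profile_mismatch` is hypothetical.

```python
import re
from pathlib import Path


def find_profile_mismatch(project_dir):
    """Return a problem description, or None if the basic layout looks right."""
    root = Path(project_dir)
    project_file = root / "dbt_project.yml"
    profiles_file = root / "profiles.yml"
    if not project_file.is_file():
        return "dbt_project.yml not found"
    if not profiles_file.is_file():
        return "profiles.yml not found"
    # Rough text match for a top-level `profile:` key (not a YAML parser).
    match = re.search(
        r"^profile:\s*['\"]?([\w-]+)['\"]?\s*$",
        project_file.read_text(),
        re.MULTILINE,
    )
    if match is None:
        return "no top-level profile key in dbt_project.yml"
    name = match.group(1)
    # The same name must appear as a top-level key in profiles.yml.
    pattern = rf"^{re.escape(name)}:\s*$"
    if not re.search(pattern, profiles_file.read_text(), re.MULTILINE):
        return f"profile '{name}' missing from profiles.yml"
    return None
```

A `None` result only means the names line up; it does not validate the BigQuery settings inside the profile.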
180187

181188
### Model Configuration
182189

@@ -210,11 +217,9 @@ The `dbt-bigquery` adapter does not natively support 4-part
210217
If you don't use environment prefixes for schemas, you can concatenate the
211218
`catalog` and `namespace` (dataset) into the `schema` field.
212219

213-
> [!WARNING]
214-
>
215-
> This approach breaks standard dbt environment management (e.g.,
216-
> `generate_schema_name`) if it attempts to prefix the combined string (e.g.,
217-
> `dev_my_catalog.my_namespace` is invalid in BigQuery).
220+
This approach is incompatible with standard dbt environment management (e.g.,
221+
`generate_schema_name`) if it attempts to prefix the combined string (e.g.,
222+
`dev_my_catalog.my_namespace` is invalid in BigQuery).
218223

219224
```yaml
220225
version: 2

skills/discovering-gcp-data-assets/SKILL.md

Lines changed: 66 additions & 24 deletions
@@ -19,32 +19,37 @@ description: |
1919
- Assets are outside Google Cloud
2020
license: Apache-2.0
2121
metadata:
22-
version: v1
22+
version: v4
2323
publisher: google
2424
---
2525

2626
# Instructions
2727

28-
## Step 1: Handle Public Datasets or Proceed to Search
28+
## Step 1: Prioritize Assets from the Conversation
2929

30-
Dataplex Entries Lookup provides the richest metadata for data assets. You MUST
30+
If the asset was created or mentioned earlier in the same conversation, then
31+
proceed with that asset instead of searching. Skip Steps 2, 3, and 4.
32+
33+
## Step 2: Handle Public Datasets or Proceed to Search
34+
35+
Dataplex Lookup Context provides the richest metadata for data assets. You MUST
3136
prioritize using it for all Google Cloud assets, even if you already know their
3237
IDs.
3338

3439
- **Public Datasets (Direct Inspection)**: If the requested asset belongs to
35-
the `bigquery-public-data` project, Dataplex Entries Lookup will fail. You
36-
MUST skip Steps 2 and 3 and inspect the table directly using the `bq` CLI or
40+
the `bigquery-public-data` project, Dataplex Lookup Context will fail. You
41+
MUST skip Steps 3 and 4 and inspect the table directly using the `bq` CLI or
3742
BigQuery MCP tools instead.
38-
- **All Other Assets (Proceed to Step 2)**: For all other BigQuery, Cloud
43+
- **All Other Assets (Proceed to Step 3)**: For all other BigQuery, Cloud
3944
Storage, Spanner, BigLake Iceberg or general GCP data assets (whether their
40-
IDs are known or missing), you MUST proceed to **Step 2** to search the
45+
IDs are known or missing), you MUST proceed to **Step 3** to search the
4146
Dataplex catalog and obtain their full Entry Name.
4247

43-
## Step 2: Execute Discovery Search
48+
## Step 3: Execute Discovery Search
4449

4550
You MUST use the Dataplex search command to discover assets and retrieve their
4651
full `projects/...` entry names. This step is required even if you already know
47-
the asset's short ID (e.g., `my_dataset.my_table`), because Step 3 strictly
52+
the asset's short ID (e.g., `my_dataset.my_table`), because Step 4 strictly
4853
requires the full entry name.
4954

5055
> [!IMPORTANT] The `--project` parameter MUST ALWAYS be provided. This
@@ -86,8 +91,9 @@ Use this for exact keyword matches or technical strings (e.g., `name:order_v2`).
8691
- Use `:` for token/substring matches (e.g., `name:sales`).
8792
- Use `=` for exact matches. REQUIRED for `system`, `type`, and
8893
`location`.
89-
- **Singular Keywords**: ALWAYS convert plurals to singular (e.g., "product"
90-
NOT "products").
94+
- **Singular Keywords**: When performing keyword search, ALWAYS convert
95+
plurals to singular (e.g., "product" NOT "products"). Semantic search
96+
handles singular/plural variations and synonyms automatically.
9197
- **Scope Restriction**: You SHOULD restrict the search scope using a `parent`
9298
filter if the project or dataset is known (e.g.,
9399
`parent:projects/<PROJECT_ID>`).
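The operator rules above (`:` for token matches, `=` required for `system`, `type`, and `location`, plus an optional `parent` scope) can be sketched as a small query builder. This is an illustration, not the skill's own tooling: the function name is hypothetical, and joining terms with spaces assumes the search syntax treats whitespace as an implicit AND.

```python
# Fields that the skill says REQUIRE `=` (exact match).
EXACT_ONLY_FIELDS = {"system", "type", "location"}


def build_keyword_query(name_token, exact_filters=None, parent=None):
    """Assemble a keyword search query string from its parts."""
    terms = [f"name:{name_token}"]  # `:` is a token/substring match
    for field, value in (exact_filters or {}).items():
        if field not in EXACT_ONLY_FIELDS:
            raise ValueError(f"unsupported exact-match field: {field}")
        terms.append(f"{field}={value}")
    if parent is not None:
        # Scope restriction, e.g. parent:projects/<PROJECT_ID>
        terms.append(f"parent:{parent}")
    # Assumption: space-separated terms combine with an implicit AND.
    return " ".join(terms)
```

The caller is still responsible for converting plural keywords to singular before passing `name_token`.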
@@ -126,26 +132,50 @@ gcloud dataplex entries search "<KEYWORD_SEARCH_QUERY>" \
126132
--limit=50
127133
```
128134

129-
*Criteria*: Once candidate assets are returned, proceed to Step 3 using the
135+
> [!IMPORTANT] Handling Search Results and Avoiding Loops:
136+
>
137+
> 1. **No Results:** If the search returns no entries:
138+
> * **Variation Rule:** You may try AT MOST 3 variations of the search
139+
> query (e.g., switching AND/OR clauses, adding/removing `parent:`,
140+
> removing `projectid:` or `location:`, trying `fully_qualified_name=`).
141+
> * **Stop Rule:** If after 3 attempts no results are found, STOP and
142+
> inform the user. Ask for clarification, specifically the Dataplex
143+
> **full entry name** if known, or identifiers such as **project ID**,
144+
> **dataset ID**, or **instance ID** to help narrow the search. Example:
145+
> "I couldn't find any tables matching that description after several
146+
> attempts. If you know the Dataplex full entry name (`projects/...`),
147+
> please provide it. Otherwise, please provide any identifiers you know,
148+
> such as project, dataset, or instance name, to help locate the asset."
149+
> 2. **Multiple Results:**
150+
> * If more than 10 results are returned, state that many matches were
151+
> found. Show the names of the first 5 entries and ask for
152+
> clarification.
153+
> * If 2-10 results are returned and you cannot definitively choose, list
154+
> them and ask the user.
155+
> 3. **Single Result:** Proceed to Step 4 with the full entry name.
156+
> 4. **Avoid Infinite Loops:** You MUST NOT re-run identical or near-identical
157+
> queries. If Dataplex fails to return the expected asset, prioritize asking
158+
> the user for the exact resource ID or using Fallback 2 (Native Tools).
159+
160+
*Criteria*: Once candidate assets are returned, proceed to Step 4 using the
130161
**full entry names** from the search results.
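The result-handling rules added in this hunk amount to a small decision table. A minimal sketch, assuming `attempts_made` counts every search run so far (the text does not say whether the first query counts toward the limit of 3) and using hypothetical action labels:

```python
def next_action(entry_names, attempts_made, max_attempts=3):
    """Map a search result list to the next step (hypothetical labels).

    Returns (action, entries_to_show).
    """
    count = len(entry_names)
    if count == 0:
        # Retry with a varied query until the attempt budget is spent.
        if attempts_made < max_attempts:
            return ("retry_with_variation", [])
        return ("stop_and_ask_user", [])
    if count == 1:
        # A single match goes straight to the context lookup.
        return ("lookup_context", entry_names)
    if count > 10:
        # Too many matches: show only the first 5 and ask to narrow down.
        return ("ask_user_to_narrow", entry_names[:5])
    # 2-10 matches: list them all when no definitive choice is possible.
    return ("ask_user_to_choose", entry_names)
```

Keeping the attempt counter outside the function makes it straightforward to enforce the rule against re-running identical queries in the calling loop.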
162+
## Step 4: Lookup Context
131163

132-
## Step 3: Lookup Entry
133-
134-
You MUST use the **Entries Lookup** command to fetch schema and deep metadata
135-
for the relevant results obtained from Step 2.
164+
You MUST use the **Lookup Context** command to fetch schema and deep metadata
165+
for the relevant results obtained from Step 3.
136166

137-
> [!IMPORTANT] The argument MUST be the **name** (starting with `projects/`)
138-
> returned by the search result. Passing short table IDs, GCS URIs, or fully
139-
> qualified `bigquery:` prefixes is PROHIBITED and will fail.
167+
> [!IMPORTANT] The `--resources` parameter MUST be the **full name** (starting
168+
> with `projects/`) returned by the search result. Passing short table IDs, GCS
169+
> URIs, or fully qualified `bigquery:` prefixes is PROHIBITED and will fail.
140170
141171
### Command Execution
142172

143-
Use the `lookup_entry` MCP tool
173+
Use the `lookup_context` MCP tool
144174

145175
OR
146176

147177
```bash
148-
gcloud dataplex entries lookup "<FULL_ENTRY_NAME>"
178+
gcloud dataplex context lookup --resources="<FULL_ENTRY_NAME>"
149179
```
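Because the lookup rejects anything other than a full entry name, a guard before invoking the command can catch the prohibited inputs early. The sketch below only checks the `projects/` prefix named in the note above; the helper name is hypothetical, and the command line is copied from the updated skill text rather than verified against the gcloud CLI.

```python
def lookup_context_command(entry_name):
    """Build the context lookup argv, rejecting short IDs (hypothetical helper)."""
    if not entry_name.startswith("projects/"):
        # Short table IDs, GCS URIs, and `bigquery:` prefixes are prohibited.
        raise ValueError(f"not a full entry name: {entry_name!r}")
    return [
        "gcloud", "dataplex", "context", "lookup",
        f"--resources={entry_name}",
    ]
```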
150180

151181
*Completion Criteria*: The command returns the detailed schema and business
@@ -155,7 +185,7 @@ context.
155185

156186
## Troubleshooting
157187

158-
### Lookup Fails or "Resource not found"
188+
### Context Lookup Fails or "Resource not found"
159189

160190
- **Cause**: Short table names were used improperly.
161191
- **Fix**: Ensure you use the correct entry name format from the search
@@ -167,12 +197,24 @@ context.
167197
- **Fix**: Switch to singular keywords. For semantic search, try more
168198
descriptive natural language.
169199

170-
### Lookup Fails with "NOT_FOUND" (despite correct format)
200+
### Context Lookup Fails with "NOT_FOUND" (despite correct format)
171201

172202
- **Cause**: The table belongs to a project (e.g., `bigquery-public-data`)
173203
that has not fully synchronized its metadata with the Dataplex Universal
174-
Catalog. While the entry appears in search, `entries lookup` is unavailable.
204+
Catalog. While the entry appears in search, `context lookup` is unavailable.
175205
- **Fix**: Fall back to direct inspection using native tools (e.g., `bq` CLI).
206+
- **Stop Rule:** If the native tool (e.g., `bq show`) also returns "Not
207+
Found", STOP. Do not restart the Dataplex discovery loop. Specifically ask
208+
the user to verify the **project ID** and **table ID**.
209+
210+
### Breaking the Research Loop
211+
212+
If you find yourself repeatedly searching for the same asset:
213+
214+
1. **STOP.**
215+
2. State what you have tried (e.g., "I tried searching Dataplex with X and Y,
216+
and checked `bq show`").
217+
3. Ask the user for the exact project, dataset, and table ID.
176218

177219
### Search Fails with "--project: Must be specified."
178220
