 ---
 name: data-docs
 description: >-
-  Fetch up-to-date, version-aware documentation for data engineering tools.
-  Use this skill when writing code that uses dbt, Airflow, Spark, Snowflake,
-  BigQuery, Databricks, Kafka, SQLAlchemy, Polars, or Great Expectations.
-  Activates for API lookups, configuration questions, code generation, or
-  debugging involving these data tools.
+  Fetch up-to-date, version-aware documentation for data engineering tools
+  and database platforms. Use this skill when writing code or SQL that uses
+  dbt, Airflow, Spark, Snowflake, BigQuery, Databricks, DuckDB, PostgreSQL,
+  ClickHouse, Kafka, SQLAlchemy, Polars, or Great Expectations. Activates
+  for API lookups, SQL syntax, configuration questions, code generation, or
+  debugging involving these data tools and platforms.
 ---
 
 # Data Engineering Documentation Lookup
@@ -23,15 +24,30 @@ Activate this skill when the user:
 - Works with PySpark transformations or Spark SQL
 - Uses Snowflake SQL, Snowpark, or the Snowflake Python connector
 - Uses BigQuery SQL or the Python client library
-- Works with Databricks SDK or notebook code
+- Works with Databricks SQL or the Python SDK
+- Writes DuckDB SQL or uses the DuckDB Python API
+- Writes PostgreSQL SQL, functions, or extensions
+- Works with ClickHouse SQL, engines, or functions
 - Writes Kafka producer/consumer code
 - Uses SQLAlchemy ORM or Core queries
 - Works with Polars DataFrame operations
 - Sets up Great Expectations data validation
-- Asks "how do I" questions about any data engineering library
-- Needs API references, method signatures, or configuration options
+- Asks "how do I" questions about any data engineering library or platform
+- Needs SQL syntax, API references, method signatures, or configuration options
 
-## How to Fetch Documentation
+## Documentation Sources
+
+This skill uses **two methods** depending on the type of documentation:
+
+1. **Context7 CLI** (`ctx7`) — For Python libraries and SDKs (dbt-core, Airflow,
+   PySpark, Snowpark, etc.). These have indexed documentation in Context7.
+2. **Web Fetch** (`webfetch`) — For database platform SQL documentation (Snowflake SQL,
+   BigQuery SQL, Databricks SQL, DuckDB, PostgreSQL, ClickHouse). These platforms
+   maintain official docs sites that can be fetched directly.
+
+Check `references/library-ids.md` for the full mapping of which method to use.
+
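The routing rule above can be sketched in a few lines. This is only an illustration: the membership sets and the fallback label are stand-ins, and `references/library-ids.md` remains the authoritative mapping.

```python
# Sketch of the two-method routing described above. The tool sets here are
# illustrative; references/library-ids.md holds the real mapping.
CONTEXT7_LIBS = {"dbt", "airflow", "pyspark", "snowpark", "sqlalchemy",
                 "polars", "great-expectations", "kafka-python"}
PLATFORM_DOCS = {"snowflake", "bigquery", "databricks", "duckdb",
                 "postgresql", "clickhouse"}

def pick_method(tool: str) -> str:
    """Return which documentation method to use for a given tool name."""
    t = tool.lower()
    if t in CONTEXT7_LIBS:
        return "ctx7"        # Method 1: Context7 CLI
    if t in PLATFORM_DOCS:
        return "webfetch"    # Method 2: fetch the official docs site
    return "training-data"   # unknown tool: fall back, and say so
```

Anything that is neither an indexed library nor a known platform falls through to training data, matching the fallback rule in the Guidelines section.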
+## Method 1: Context7 CLI (for Python libraries/SDKs)
 
 ### Step 1: Identify the Library
 
@@ -69,30 +85,58 @@ Write **specific, detailed queries** for better results:
 - Good: `"How to create incremental models with merge strategy in dbt"`
 - Bad: `"incremental"`
 
-### Step 4: Use the Documentation
+## Method 2: Web Fetch (for database platform SQL docs)
+
+For Snowflake, BigQuery, Databricks, DuckDB, PostgreSQL, and ClickHouse
+platform documentation (SQL syntax, functions, DDL, configuration), use
+the `webfetch` tool to fetch specific documentation pages.
+
+### Step 1: Find the Right URL
+
+Check `references/library-ids.md` for the **Platform Documentation URLs**
+section. Each platform has a base URL and common page paths listed.
+
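As a sketch of this step, resolving a platform and topic to a concrete page URL might look like the following. The docs roots are real sites, but the flat path convention is an assumption for illustration; the actual URLs live in `references/library-ids.md`.

```python
# Hypothetical URL builder for Step 1. The roots are real documentation
# sites, but the page-path convention is an assumption; consult
# references/library-ids.md for the real per-platform paths.
DOC_ROOTS = {
    "snowflake": "https://docs.snowflake.com/en/sql-reference/sql/",
    "postgresql": "https://www.postgresql.org/docs/current/",
    "duckdb": "https://duckdb.org/docs/sql/",
}

def doc_url(platform: str, page: str) -> str:
    """Join a platform's docs root with a specific page path."""
    return DOC_ROOTS[platform.lower()] + page
```

For example, `doc_url("snowflake", "merge")` produces the MERGE statement page fetched in Step 2.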
+### Step 2: Fetch the Documentation
+
+Use the `webfetch` tool with the specific documentation URL and a prompt
+describing what information to extract:
+
+```
+webfetch(url="https://docs.snowflake.com/en/sql-reference/sql/merge",
+         prompt="Extract the full MERGE syntax, parameters, and examples")
+```
+
+### Step 3: Use the Documentation
 
 - Answer using the fetched documentation, not training data
 - Include relevant code examples from the docs
-- Cite the library version when relevant
+- Cite the documentation URL for reference
 - If docs mention deprecations or breaking changes, highlight them
 
 ## Guidelines
 
-- Maximum 3 CLI calls per user question to avoid rate limits
-- Works without authentication; set `CONTEXT7_API_KEY` env var for higher rate limits
-- If a CLI call fails (network error, rate limit), fall back to training data
+- Maximum 3 CLI/webfetch calls per user question to avoid rate limits
+- Context7 works without authentication; set `CONTEXT7_API_KEY` for higher limits
+- If a call fails (network error, rate limit), fall back to training data
   and note that the docs could not be fetched
 - For dbt: always check `dbt_project.yml` for version and `packages.yml` for packages
 - For Python tools: check `requirements.txt` or `pyproject.toml` for pinned versions
 - When multiple libraries are relevant (e.g., dbt-core + dbt-snowflake), fetch docs
   for the most specific one first
+- For SQL platform docs, prefer the most specific page URL (e.g., the MERGE
+  statement page, not the general SQL reference index)
 
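The version-pinning checks above amount to a quick grep. This sketch writes a sample `requirements.txt` so it runs standalone; in a real session you would read the project's own file.

```shell
# Self-contained sketch: a sample requirements.txt stands in for the
# project's real one (versions here are made up for illustration).
cat > /tmp/requirements.txt <<'EOF'
dbt-core==1.7.4
dbt-snowflake==1.7.1
pyspark==3.5.1
EOF

# Pull out the pinned versions to mention when fetching version-aware docs.
grep -E '^(dbt-core|dbt-snowflake|pyspark)==' /tmp/requirements.txt
```

The same pattern applies to `pyproject.toml`, with the grep adjusted to its dependency syntax.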
 ## Usage
 
 - `/data-docs How do I create an incremental model in dbt?`
 - `/data-docs What Airflow operators are available for BigQuery?`
 - `/data-docs How to use window functions in PySpark?`
 - `/data-docs Snowpark DataFrame API for joins`
-
-Use the bash tool to run `ctx7` CLI commands. Reference `library-ids.md` for
-pre-mapped library IDs to skip the resolution step.
+- `/data-docs Snowflake MERGE statement syntax`
+- `/data-docs DuckDB window functions`
+- `/data-docs PostgreSQL JSONB operators`
+- `/data-docs ClickHouse MergeTree engine settings`
+
+Use the bash tool to run `ctx7` CLI commands for libraries, and the `webfetch`
+tool for platform SQL documentation. Reference `library-ids.md` for the full
+mapping of tools, IDs, and URLs.