
Commit 1174942

feat: add data-docs skill for version-aware documentation lookup
Add a built-in skill that uses Context7 CLI to fetch up-to-date, version-specific documentation for data engineering tools. Covers dbt, Airflow, Spark, Snowflake, BigQuery, Databricks, Kafka, SQLAlchemy, Polars, and Great Expectations. Ships out of the box with pre-mapped library IDs — no user configuration needed. https://claude.ai/code/session_01NZPdvEHNXDcmhgJt9RLMu1
1 parent 94f4536 commit 1174942

2 files changed: 187 additions & 0 deletions


File 1 of 2: 98 additions & 0 deletions
@@ -0,0 +1,98 @@
---
name: data-docs
description: >-
  Fetch up-to-date, version-aware documentation for data engineering tools.
  Use this skill when writing code that uses dbt, Airflow, Spark, Snowflake,
  BigQuery, Databricks, Kafka, SQLAlchemy, Polars, or Great Expectations.
  Activates for API lookups, configuration questions, code generation, or
  debugging involving these data tools.
---

# Data Engineering Documentation Lookup

When writing code or answering questions about data engineering tools,
use this skill to fetch current, version-specific documentation instead
of relying on training data.

## When to Use

Activate this skill when the user:

- Writes or modifies dbt models, macros, or configurations
- Develops Airflow DAGs, operators, or hooks
- Works with PySpark transformations or Spark SQL
- Uses Snowflake SQL, Snowpark, or the Snowflake Python connector
- Uses BigQuery SQL or the Python client library
- Works with the Databricks SDK or notebook code
- Writes Kafka producer/consumer code
- Uses SQLAlchemy ORM or Core queries
- Works with Polars DataFrame operations
- Sets up Great Expectations data validation
- Asks "how do I" questions about any data engineering library
- Needs API references, method signatures, or configuration options

## How to Fetch Documentation

### Step 1: Identify the Library

Check the `references/library-ids.md` file for pre-mapped Context7 library IDs.
If you find a match, skip to Step 3.

If the library isn't in the reference file, resolve it:

```bash
npx -y ctx7@latest library <library-name> "<user's question>"
```

Pick the result with the closest name match and highest score.
Note the Library ID (format: `/org/project` or `/org/project/version`).
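
For instance, a hedged resolution call for a library that isn't pre-mapped
(the library name and question here are illustrative, not values from the
reference file):

```bash
# Hypothetical lookup for a library missing from references/library-ids.md;
# "duckdb" and the question text are illustrative.
npx -y ctx7@latest library duckdb "how to read parquet files"
```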

### Step 2: Check for Project Version

Look for version info in the user's project:

- `requirements.txt` or `pyproject.toml` — Python package versions
- `dbt_project.yml` — dbt version (`require-dbt-version`)
- `packages.yml` — dbt package versions
- `setup.py` or `setup.cfg` — Python package versions

If a specific version is found, prefer version-specific library IDs
(format: `/org/project/vX.Y.Z`) when available from the resolution step.
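
A minimal sketch for locating version pins, assuming the conventional file
names above sit at the project root:

```bash
# Scan common dependency files for the tools this skill covers;
# adjust paths if the project keeps them elsewhere.
grep -iE 'dbt|airflow|pyspark|sqlalchemy|polars' requirements.txt pyproject.toml 2>/dev/null
grep 'require-dbt-version' dbt_project.yml 2>/dev/null
```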

### Step 3: Query Documentation

```bash
npx -y ctx7@latest docs <libraryId> "<specific question>"
```

Write **specific, detailed queries** for better results:

- Good: `"How to create incremental models with merge strategy in dbt"`
- Bad: `"incremental"`
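
Combining Steps 2 and 3, a sketch of a version-pinned query; the `v2.9.3` tag
is hypothetical, so substitute whatever version the resolution step actually
returns:

```bash
# Hypothetical version-pinned query (the version tag is illustrative).
npx -y ctx7@latest docs /apache/airflow/v2.9.3 "BigQueryInsertJobOperator parameters"
```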

### Step 4: Use the Documentation

- Answer using the fetched documentation, not training data
- Include relevant code examples from the docs
- Cite the library version when relevant
- If docs mention deprecations or breaking changes, highlight them

## Guidelines

- Maximum 3 CLI calls per user question to avoid rate limits
- Works without authentication; set the `CONTEXT7_API_KEY` env var for higher
  rate limits (see the sketch after this list)
- If a CLI call fails (network error, rate limit), fall back to training data
  and note that the docs could not be fetched
- For dbt: always check `dbt_project.yml` for version and `packages.yml` for packages
- For Python tools: check `requirements.txt` or `pyproject.toml` for pinned versions
- When multiple libraries are relevant (e.g., dbt-core + dbt-snowflake), fetch docs
  for the most specific one first
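
For the authentication guideline above, a minimal sketch; the key value is a
placeholder to be supplied by the user:

```bash
# Optional: set an API key for higher rate limits (placeholder value).
export CONTEXT7_API_KEY="<your-api-key>"
npx -y ctx7@latest docs /dbt-labs/dbt-core "incremental model on_schema_change options"
```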

## Usage

- `/data-docs How do I create an incremental model in dbt?`
- `/data-docs What Airflow operators are available for BigQuery?`
- `/data-docs How to use window functions in PySpark?`
- `/data-docs Snowpark DataFrame API for joins`

Use the bash tool to run `ctx7` CLI commands. Reference `library-ids.md` for
pre-mapped library IDs to skip the resolution step.
File 2 of 2: 89 additions & 0 deletions
@@ -0,0 +1,89 @@
# Data Engineering Library IDs for Context7

Use these Context7 library IDs directly with `npx -y ctx7@latest docs <libraryId> "<query>"`
to skip the library resolution step.

If a library isn't listed here, resolve it first with:
`npx -y ctx7@latest library <name> "<query>"`

## Transformation & Modeling

| Tool | Library ID | Python Package |
|------|-----------|----------------|
| dbt Core | `/dbt-labs/dbt-core` | dbt-core |
| SQLAlchemy | `/sqlalchemy/sqlalchemy` | SQLAlchemy |
| Polars | `/pola-rs/polars` | polars |
| Pandas | `/pandas-dev/pandas` | pandas |

## Orchestration

| Tool | Library ID | Python Package |
|------|-----------|----------------|
| Apache Airflow | `/apache/airflow` | apache-airflow |

## Processing

| Tool | Library ID | Python Package |
|------|-----------|----------------|
| Apache Spark / PySpark | `/apache/spark` | pyspark |

## Cloud Data Warehouses

| Tool | Library ID | Python Package |
|------|-----------|----------------|
| Snowflake Connector | `/snowflakedb/snowflake-connector-python` | snowflake-connector-python |
| Snowpark Python | `/snowflakedb/snowpark-python` | snowflake-snowpark-python |
| BigQuery Python Client | `/googleapis/python-bigquery` | google-cloud-bigquery |
| Databricks SDK | `/databricks/databricks-sdk-py` | databricks-sdk |

## Streaming

| Tool | Library ID | Python Package |
|------|-----------|----------------|
| Confluent Kafka | `/confluentinc/confluent-kafka-python` | confluent-kafka |

## Data Quality

| Tool | Library ID | Python Package |
|------|-----------|----------------|
| Great Expectations | `/great-expectations/great_expectations` | great-expectations |

## dbt Packages

| Package | Library ID |
|---------|-----------|
| dbt-utils | `/dbt-labs/dbt-utils` |
| dbt-expectations | `/calogica/dbt-expectations` |
| dbt-date | `/calogica/dbt-date` |
| dbt-codegen | `/dbt-labs/dbt-codegen` |
| elementary | `/elementary-data/elementary` |

## dbt Adapters

| Adapter | Library ID |
|---------|-----------|
| dbt-snowflake | `/dbt-labs/dbt-snowflake` |
| dbt-bigquery | `/dbt-labs/dbt-bigquery` |
| dbt-databricks | `/databricks/dbt-databricks` |
| dbt-postgres | `/dbt-labs/dbt-postgres` |
| dbt-redshift | `/dbt-labs/dbt-redshift` |
| dbt-spark | `/dbt-labs/dbt-spark` |
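
Per the skill's guideline to fetch the most specific library first, a sketch
for a dbt-on-Snowflake question (both queries are illustrative):

```bash
# Adapter-specific behavior first, then core semantics if still needed.
npx -y ctx7@latest docs /dbt-labs/dbt-snowflake "incremental merge strategy on Snowflake"
npx -y ctx7@latest docs /dbt-labs/dbt-core "incremental model merge strategy configuration"
```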

## Example Usage

```bash
# Fetch dbt incremental model docs
npx -y ctx7@latest docs /dbt-labs/dbt-core "how to create incremental models with merge strategy"

# Fetch Airflow operator reference
npx -y ctx7@latest docs /apache/airflow "BigQueryInsertJobOperator parameters"

# Fetch Snowpark DataFrame API
npx -y ctx7@latest docs /snowflakedb/snowpark-python "DataFrame join operations"

# Fetch PySpark window functions
npx -y ctx7@latest docs /apache/spark "window functions in PySpark"

# Fetch Polars lazy evaluation
npx -y ctx7@latest docs /pola-rs/polars "lazy evaluation and collect"
```
