Commit 2035bab

experimental: import ai-dev-kit skills as best-effort skills
Adds an experimental/ directory containing the 26 agent skills from databricks-solutions/ai-dev-kit. These are imported as a snapshot on a best-effort basis: they are not officially supported skills and follow a looser contract than skills/ (no agents/openai.yaml, no shared-asset sync, no SKILL_METADATA gate).

The manifest now exposes them under a new top-level experimental_skills map so consumers can distinguish them from stable skills and skip them by default. scripts/skills.py handles the new directory; the existing generate / validate flow is unchanged for stable skills.

Co-authored-by: Isaac
1 parent d21d74e commit 2035bab

153 files changed

Lines changed: 49408 additions & 25 deletions

README.md

Lines changed: 17 additions & 0 deletions
@@ -24,6 +24,23 @@ Run this command in chat:
 
 - **databricks-apps** - Build full-stack TypeScript apps on Databricks using AppKit
 
+See [`skills/`](./skills/) for the full list of supported skills.
+
+## Experimental Skills
+
+The [`experimental/`](./experimental/) directory contains additional skills
+imported from [databricks-solutions/ai-dev-kit](https://github.com/databricks-solutions/ai-dev-kit)
+on a **best-effort basis**.
+
+- Experimental skills are **not officially supported**; they may be used, but
+  do not follow the same review / quality bar as the stable skills under
+  [`skills/`](./skills/).
+- They are **not installed by default** by `databricks experimental aitools
+  skills install`. Pass `--experimental` to install all of them, or install a
+  specific one by name.
+- See [`experimental/README.md`](./experimental/README.md) for the full list
+  and caveats.
+
 ## Structure
 
 Each skill follows the [Agent Skills Specification](https://agentskills.io/specification):

experimental/README.md

Lines changed: 82 additions & 0 deletions
@@ -0,0 +1,82 @@
> ⚠️ **Experimental: best-effort, not officially supported**
>
> The skills in this directory are imported from
> [databricks-solutions/ai-dev-kit](https://github.com/databricks-solutions/ai-dev-kit)
> on a best-effort basis. They may be useful, but they are **not officially
> supported** as part of `databricks-agent-skills`:
>
> - They do not follow the same review / quality bar as the skills in
>   [`../skills/`](../skills/).
> - They may be out of date relative to upstream `ai-dev-kit`.
> - They may overlap or conflict with the stable skills (e.g.
>   `databricks-jobs`, `databricks-model-serving` exist in both directories).
> - They are not installed by `databricks experimental aitools skills install`
>   by default; you have to opt in (see the root README).
>
> File issues against this directory in this repo; do not file issues against
> `ai-dev-kit` for skills installed via `databricks-agent-skills`.

---

# Databricks Skills for Claude Code

Skills that teach Claude Code how to work effectively with Databricks - providing patterns, best practices, and code examples that work with Databricks MCP tools.

## Installation

These experimental skills are **not** installed by default. To install them via the Databricks CLI:

```bash
# Install all experimental skills at once
databricks experimental aitools skills install --experimental

# Install a single experimental skill by name
databricks experimental aitools skills install databricks-iceberg
```

See the root [README](../README.md) for details on the stable install path.

## Available Skills

### 🤖 AI & Agents
- **databricks-ai-functions** - Built-in AI Functions (ai_classify, ai_extract, ai_summarize, ai_query, ai_forecast, ai_parse_document, and more) with SQL and PySpark patterns, function selection guidance, document processing pipelines, and custom RAG (parse → chunk → index → query)
- **databricks-agent-bricks** - Knowledge Assistants, Genie Spaces, Supervisor Agents
- **databricks-genie** - Genie Spaces: create, curate, and query via Conversation API
- **databricks-model-serving** - Deploy MLflow models and AI agents to endpoints *(also available as stable skill)*
- **databricks-mlflow-evaluation** - End-to-end agent evaluation workflow
- **databricks-unstructured-pdf-generation** - Generate synthetic PDFs for RAG
- **databricks-vector-search** - Vector similarity search for RAG and semantic search

### 📊 Analytics & Dashboards
- **databricks-aibi-dashboards** - Databricks AI/BI dashboards (with SQL validation workflow)
- **databricks-metric-views** - Metric Views for governed metrics
- **databricks-unity-catalog** - System tables for lineage, audit, billing

### 🔧 Data Engineering
- **databricks-dbsql** - Databricks SQL warehouse patterns
- **databricks-iceberg** - Apache Iceberg tables (Managed/Foreign), UniForm, Iceberg REST Catalog, Iceberg Clients Interoperability
- **databricks-spark-declarative-pipelines** - SDP (formerly DLT) in SQL/Python
- **databricks-spark-structured-streaming** - Spark Structured Streaming patterns
- **databricks-jobs** - Multi-task workflows, triggers, schedules *(also available as stable skill)*
- **databricks-synthetic-data-gen** - Realistic test data with Faker
- **databricks-zerobus-ingest** - Zerobus ingest patterns
- **spark-python-data-source** - Python data sources for Spark

### 🚀 Development & Deployment
- **databricks-bundles** - DABs for multi-environment deployments
- **databricks-apps-python** - Python web apps (Dash, Streamlit, Flask) with foundation model integration
- **databricks-python-sdk** - Python SDK, Connect, CLI, REST API
- **databricks-config** - Profile authentication setup
- **databricks-execution-compute** - Execute on Databricks compute
- **databricks-lakebase-autoscale** - Autoscaling for Lakebase
- **databricks-lakebase-provisioned** - Managed PostgreSQL for OLTP workloads

### 📚 Reference
- **databricks-docs** - Documentation index via llms.txt

## Provenance

These skills are imported as a snapshot from
[`databricks-solutions/ai-dev-kit/databricks-skills/`](https://github.com/databricks-solutions/ai-dev-kit/tree/main/databricks-skills).
Upstream changes are not automatically synced; see the
[contributing notes](../CONTRIBUTING.md) for the current sync process.
Lines changed: 183 additions & 0 deletions
@@ -0,0 +1,183 @@
# Knowledge Assistants (KA)

Knowledge Assistants are document-based Q&A systems that use RAG (Retrieval-Augmented Generation) to answer questions from indexed documents.

## What is a Knowledge Assistant?

A KA connects to documents stored in a Unity Catalog Volume and allows users to ask natural language questions. The system:

1. **Indexes** all documents in the volume (PDFs, text files, etc.)
2. **Retrieves** relevant chunks when a question is asked
3. **Generates** an answer using the retrieved context

## When to Use

Use a Knowledge Assistant when:
- You have a collection of documents (policies, manuals, guides, reports)
- Users need to find specific information without reading entire documents
- You want to provide a conversational interface to documentation

## Prerequisites

Before creating a KA, you need documents in a Unity Catalog Volume:

**Option 1: Use existing documents**
- Upload PDFs/text files to a Volume manually or via SDK

**Option 2: Generate synthetic documents**
- Use the `databricks-unstructured-pdf-generation` skill to create realistic PDF documents
- Each PDF gets a companion JSON file with question/guideline pairs for evaluation

## Creating a Knowledge Assistant

Use the `manage_ka` tool with `action="create_or_update"`:

- `name`: "HR Policy Assistant"
- `volume_path`: "/Volumes/my_catalog/my_schema/raw_data/hr_docs"
- `description`: "Answers questions about HR policies and procedures"
- `instructions`: "Be helpful and always cite the specific policy document when answering. If you're unsure, say so."

The tool will:
1. Create the KA with the specified volume as a knowledge source
2. Scan the volume for JSON files with example questions (from PDF generation)
3. Queue examples to be added once the endpoint is ready
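As a rough illustration, the arguments listed above could be collected and sanity-checked before they are handed to `manage_ka`. This is only a sketch: `build_ka_request` is a hypothetical helper, not part of the tool, and it assumes that `description` and `instructions` are optional and that volume paths have the shape `/Volumes/<catalog>/<schema>/<volume>/...`.

```python
from typing import Dict


def build_ka_request(name: str, volume_path: str,
                     description: str = "", instructions: str = "") -> Dict[str, str]:
    """Collect create_or_update arguments in one dict (hypothetical helper)."""
    parts = [p for p in volume_path.split("/") if p]
    # Assumed path shape: /Volumes/<catalog>/<schema>/<volume>[/<subdir>...]
    if not volume_path.startswith("/Volumes/") or len(parts) < 4:
        raise ValueError(f"not a Unity Catalog Volume path: {volume_path!r}")

    request = {"action": "create_or_update", "name": name, "volume_path": volume_path}
    # Optional fields are included only when a value was provided.
    if description:
        request["description"] = description
    if instructions:
        request["instructions"] = instructions
    return request


request = build_ka_request(
    name="HR Policy Assistant",
    volume_path="/Volumes/my_catalog/my_schema/raw_data/hr_docs",
    description="Answers questions about HR policies and procedures",
)
print(request)
```

Validating the path locally catches an obviously wrong location (for example a workspace or DBFS path) before a provisioning attempt is made.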

## Provisioning Timeline

After creation, the KA endpoint needs to provision:

| Status | Meaning | Duration |
|--------|---------|----------|
| `PROVISIONING` | Creating the endpoint | 2-5 minutes |
| `ONLINE` | Ready to use | - |
| `OFFLINE` | Not currently running | - |

Use `manage_ka` with `action="get"` to check the status:

- `tile_id`: "<the tile_id from create>"
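The status check above can be wrapped in a simple polling loop. The sketch below is an assumption-laden illustration: `wait_until_online` and its `get_status` callable are hypothetical, and `get_status` stands in for whatever code calls `manage_ka` with `action="get"` and extracts the status string. The 10-minute default timeout is an arbitrary choice, not a documented limit.

```python
import time
from typing import Callable


def wait_until_online(get_status: Callable[[], str],
                      timeout_s: float = 600.0,
                      poll_interval_s: float = 15.0,
                      sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll until the status leaves PROVISIONING; return the final status."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status()
        if status != "PROVISIONING":
            return status  # ONLINE or OFFLINE, per the table above
        if time.monotonic() >= deadline:
            raise TimeoutError(f"still PROVISIONING after {timeout_s} seconds")
        sleep(poll_interval_s)


# Simulated status source so the sketch runs without a workspace.
simulated = iter(["PROVISIONING", "PROVISIONING", "ONLINE"])
final_status = wait_until_online(lambda: next(simulated), sleep=lambda _s: None)
print(final_status)
```

Passing `sleep` as a parameter keeps the loop testable; in real use the default `time.sleep` applies.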

## Adding Example Questions

Example questions help with:
- **Evaluation**: Test if the KA answers correctly
- **User onboarding**: Show users what to ask

### Automatic (from PDF generation)

If you used `generate_pdf_documents`, each PDF has a companion JSON with:
```json
{
  "question": "What is the company's remote work policy?",
  "guideline": "Should mention the 3-day minimum in-office requirement"
}
```

These are automatically added when `add_examples_from_volume=true` (default).
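To preview which examples would be picked up, the companion files can be read directly. This is a minimal sketch, assuming one `question`/`guideline` object per `.json` file and a directory readable as a local path; `load_examples` is a hypothetical helper and does not reflect how the tool scans the volume internally.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, List


def load_examples(directory: str) -> List[Dict[str, str]]:
    """Return question/guideline pairs from the *.json files in a directory."""
    examples = []
    for path in sorted(Path(directory).glob("*.json")):
        try:
            data = json.loads(path.read_text(encoding="utf-8"))
        except (OSError, ValueError):
            continue  # skip unreadable or malformed files
        if (isinstance(data, dict)
                and isinstance(data.get("question"), str)
                and isinstance(data.get("guideline"), str)):
            examples.append({"question": data["question"],
                             "guideline": data["guideline"]})
    return examples


# Demo against a temporary directory instead of a real volume.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "remote_work.json").write_text(json.dumps({
        "question": "What is the company's remote work policy?",
        "guideline": "Should mention the 3-day minimum in-office requirement",
    }), encoding="utf-8")
    Path(tmp, "broken.json").write_text("{not json", encoding="utf-8")
    examples = load_examples(tmp)

print(examples)
```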

### Manual

Examples can also be specified in the `manage_ka` create_or_update call if needed.

## Best Practices

### Document Organization

- **One volume path per topic**: e.g., `/Volumes/catalog/schema/raw_data/hr_docs`, `/Volumes/catalog/schema/raw_data/tech_docs`
- **Clear naming**: Name files descriptively so chunks are identifiable

### Instructions

Good instructions improve answer quality:

```
Be helpful and professional. When answering:
1. Always cite the specific document and section
2. If multiple documents are relevant, mention all of them
3. If the information isn't in the documents, clearly say so
4. Use bullet points for multi-part answers
```

### Updating Content

To update the indexed documents:
1. Add/remove/modify files in the volume
2. Call `manage_ka` with `action="create_or_update"`, passing the same `name` and `tile_id`
3. The KA will re-index the updated content

## Example Workflow

1. **Generate PDF documents** using `databricks-unstructured-pdf-generation` skill:
   - Creates PDFs in `/Volumes/catalog/schema/raw_data/pdf_documents`
   - Creates JSON files with question/guideline pairs

2. **Create the Knowledge Assistant**:
   - `name`: "My Document Assistant"
   - `volume_path`: "/Volumes/catalog/schema/raw_data/pdf_documents"

3. **Wait for ONLINE status** (2-5 minutes)

4. **Examples are automatically added** from the JSON files

5. **Test the KA** in the Databricks UI

## Using KA in Supervisor Agents

Knowledge Assistants can be used as agents in a Supervisor Agent (formerly Multi-Agent Supervisor, MAS). Each KA has an associated model serving endpoint.

### Finding the Endpoint Name

Use `manage_ka` with `action="get"` to retrieve the KA details. The response includes:
- `tile_id`: The unique identifier for the KA
- `name`: The KA name (sanitized)
- `endpoint_status`: Current status (ONLINE, PROVISIONING, etc.)

The endpoint name follows this pattern: `ka-{tile_id}-endpoint`
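The naming pattern can be expressed as a pair of small helpers. Both functions are hypothetical conveniences built only from the pattern stated above, and the tile ID `01abc` is a made-up placeholder.

```python
import re
from typing import Optional

# Mirrors the documented pattern: ka-{tile_id}-endpoint
_ENDPOINT_RE = re.compile(r"^ka-(?P<tile_id>.+)-endpoint$")


def ka_endpoint_name(tile_id: str) -> str:
    """Build the serving endpoint name for a KA tile ID."""
    if not tile_id:
        raise ValueError("tile_id must be a non-empty string")
    return f"ka-{tile_id}-endpoint"


def tile_id_from_endpoint(endpoint_name: str) -> Optional[str]:
    """Recover the tile ID from an endpoint name, or None if it does not match."""
    match = _ENDPOINT_RE.match(endpoint_name)
    return match.group("tile_id") if match else None


endpoint = ka_endpoint_name("01abc")  # placeholder tile ID
print(endpoint, tile_id_from_endpoint(endpoint))
```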

### Finding a KA by Name

If you know the KA name but not the tile_id, use `manage_ka` with `action="find_by_name"`:

```python
manage_ka(action="find_by_name", name="HR_Policy_Assistant")
# Returns: {"found": True, "tile_id": "01abc...", "name": "HR_Policy_Assistant", "endpoint_name": "ka-01abc...-endpoint"}
```

### Example: Adding KA to Supervisor Agent

```python
# First, find the KA
manage_ka(action="find_by_name", name="HR_Policy_Assistant")

# Then use the tile_id in a Supervisor Agent
manage_mas(
    action="create_or_update",
    name="Support_MAS",
    agents=[
        {
            "name": "hr_agent",
            "ka_tile_id": "<tile_id from find_by_name>",
            "description": "Answers HR policy questions from the employee handbook"
        }
    ]
)
```

## Troubleshooting

### Endpoint stays in PROVISIONING

- Check workspace capacity and quotas
- Verify the volume path is accessible
- Wait up to 10 minutes before investigating further

### Documents not indexed

- Ensure files are in a supported format (PDF, TXT, MD)
- Check file permissions in the volume
- Verify the volume path is correct
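The format check in the list above can be sketched as a quick filter over file names. The extension set is taken from the formats named here and may not be exhaustive; `unsupported_files` is a hypothetical helper and the file names are made up.

```python
from pathlib import PurePosixPath
from typing import Iterable, List

# Formats named in the troubleshooting list; assumed, not authoritative.
SUPPORTED_SUFFIXES = {".pdf", ".txt", ".md"}


def unsupported_files(paths: Iterable[str]) -> List[str]:
    """Return the paths whose extension is not in the supported set."""
    return [p for p in paths
            if PurePosixPath(p).suffix.lower() not in SUPPORTED_SUFFIXES]


skipped = unsupported_files(["handbook.PDF", "notes.md", "scan.png", "README"])
print(skipped)
```

The comparison is case-insensitive, so `handbook.PDF` passes, while files without an extension are reported.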

### Poor answer quality

- Add more specific instructions
- Ensure documents are well-structured
- Consider breaking large documents into smaller files
