
Commit 6fc33f7

experimental: re-sync from a-d-k PR #533 (appkit-on-experimental)
Replaces the previous import (a-d-k commit 2228c3e on add_appkit) with the head of a-d-k PR #533 (commit 9c7a5b3 on appkit-on-experimental), which targets a-d-k's experimental branch.

Changes:
- Refresh 23 experimental skill directories from the new source.
- Drop databricks-lakebase-provisioned (removed on a-d-k experimental).
- databricks-apps-python: rename + SKILL.md now leads with AppKit (TypeScript + React SDK) and demotes Python frameworks to alternatives; 6-mcp-approach.md replaced with 6-cli-approach.md.
- databricks-lakebase-autoscale/references/connection-patterns.md: change placeholder `user:password` to `<user>:<password>` so the secret scanner doesn't flag the doc-only example. Cosmetic only.
- Continue to exclude databricks-model-serving and databricks-spark-declarative-pipelines (PR #73 TODOs #1b and #5).
- Regenerate manifest.json and agents/openai.yaml stubs via scripts/skills.py generate.
- Update experimental/README.md provenance section with the new SHA, branch, and divergence notes.

Co-authored-by: Isaac
1 parent 2079f66 commit 6fc33f7

62 files changed

Lines changed: 4337 additions & 4364 deletions


experimental/README.md

Lines changed: 29 additions & 12 deletions
@@ -68,12 +68,11 @@ See the root [README](../README.md) for details on the stable install path.
6868

6969
### 🚀 Development & Deployment
7070
- **databricks-bundles** - DABs for multi-environment deployments
71-
- **databricks-apps-python** - Python web apps (Dash, Streamlit, Flask) with foundation model integration
71+
- **databricks-apps-python** - Databricks apps. Prefers AppKit (TypeScript + React SDK) for new apps; falls back to Python frameworks (Dash, Streamlit, Gradio, Flask, FastAPI, Reflex) when Python is required
7272
- **databricks-python-sdk** - Python SDK, Connect, CLI, REST API
7373
- **databricks-config** - Profile authentication setup
7474
- **databricks-execution-compute** - Execute on Databricks compute
7575
- **databricks-lakebase-autoscale** - Autoscaling for Lakebase
76-
- **databricks-lakebase-provisioned** - Managed PostgreSQL for OLTP workloads
7776

7877
### 📚 Reference
7978
- **databricks-docs** - Documentation index via llms.txt
@@ -83,16 +82,34 @@ See the root [README](../README.md) for details on the stable install path.
8382
These skills are imported as a snapshot from
8483
[`databricks-solutions/ai-dev-kit/databricks-skills/`](https://github.com/databricks-solutions/ai-dev-kit/tree/main/databricks-skills).
8584

86-
**Source SHA**: [`2228c3e`](https://github.com/databricks-solutions/ai-dev-kit/commit/2228c3e880fbadd871882a5f99628300dcb9f2f1)
87-
on the `add_appkit` branch (5 commits ahead of `origin/main` at the time
88-
of import). Divergence from public main is small but meaningful: the
89-
`databricks-app-python` → `databricks-apps-python` rename had not yet been
90-
merged upstream, and importing from the renamed version is what prevents a
91-
3rd skill name collision with d-a-s's own `databricks-apps`. A few other
92-
local commits touch `databricks-bundles/SKILL.md` (2 lines),
93-
`databricks-lakebase-provisioned/SKILL.md` (2 lines), and
94-
`databricks-apps-python/SKILL.md` (64 lines). The full set of local
95-
deltas is tracked by the import commit on this branch.
85+
**Source SHA**: [`9c7a5b3`](https://github.com/databricks-solutions/ai-dev-kit/commit/9c7a5b3a3bf187c2b19d0b777768ecb52dd2de22)
86+
on the `appkit-on-experimental` branch of `jamesbroadhead/ai-dev-kit`,
87+
the head of [a-d-k PR #533](https://github.com/databricks-solutions/ai-dev-kit/pull/533),
88+
which targets a-d-k's `experimental` branch. One commit ahead of
89+
`origin/experimental` at import time. Divergence from `experimental`
90+
is the PR #533 change set:
91+
92+
- `databricks-app-python` → `databricks-apps-python` rename (folder,
93+
baselines, manifests, install scripts, cross-skill mentions). The
94+
rename prevents a 3rd skill-name collision with d-a-s's own
95+
`databricks-apps` — alongside the two we already handle for
96+
`databricks-jobs` and `databricks-model-serving`.
97+
- `databricks-apps-python/SKILL.md` leads with AppKit (TypeScript +
98+
React SDK) as the recommended approach for new apps; Python
99+
frameworks (Dash, Streamlit, Gradio, Flask, FastAPI, Reflex) are
100+
demoted to an explicit alternative.
101+
- `install.sh` / `install.ps1` upstream changes wiring a-d-k to
102+
install d-a-s skills via a single GitHub tree call (out of scope
103+
for this snapshot, not imported here).
104+
105+
**Note**: the `experimental` branch of a-d-k previously removed
106+
`databricks-lakebase-provisioned`, which is why it is not present in
107+
this import. `databricks-model-serving` and
108+
`databricks-spark-declarative-pipelines` are intentionally excluded
109+
from this snapshot — see TODOs #1b and #5 on the import PR.
110+
111+
The full set of paths brought in is tracked by the import commit on
112+
this branch.
96113

97114
**Transition phase (until `ai-dev-kit` skills are locked):**
98115
- Source of truth is **upstream `ai-dev-kit`**. New work and bug fixes go there.
Lines changed: 49 additions & 158 deletions
@@ -1,183 +1,74 @@
1-
# Knowledge Assistants (KA)
1+
# Knowledge Assistants - Details
22

3-
Knowledge Assistants are document-based Q&A systems that use RAG (Retrieval-Augmented Generation) to answer questions from indexed documents.
3+
For commands, see [SKILL.md](SKILL.md).
44

5-
## What is a Knowledge Assistant?
5+
## Source Types
66

7-
A KA connects to documents stored in a Unity Catalog Volume and allows users to ask natural language questions. The system:
7+
Both shapes go inside the `--json` body alongside `display_name` and `description` — see SKILL.md for the full invocation.
88

9-
1. **Indexes** all documents in the volume (PDFs, text files, etc.)
10-
2. **Retrieves** relevant chunks when a question is asked
11-
3. **Generates** an answer using the retrieved context
9+
### Files (Volume)
1210

13-
## When to Use
14-
15-
Use a Knowledge Assistant when:
16-
- You have a collection of documents (policies, manuals, guides, reports)
17-
- Users need to find specific information without reading entire documents
18-
- You want to provide a conversational interface to documentation
19-
20-
## Prerequisites
21-
22-
Before creating a KA, you need documents in a Unity Catalog Volume:
23-
24-
**Option 1: Use existing documents**
25-
- Upload PDFs/text files to a Volume manually or via SDK
26-
27-
**Option 2: Generate synthetic documents**
28-
- Use the `databricks-unstructured-pdf-generation` skill to create realistic PDF documents
29-
- Each PDF gets a companion JSON file with question/guideline pairs for evaluation
30-
31-
## Creating a Knowledge Assistant
32-
33-
Use the `manage_ka` tool with `action="create_or_update"`:
34-
35-
- `name`: "HR Policy Assistant"
36-
- `volume_path`: "/Volumes/my_catalog/my_schema/raw_data/hr_docs"
37-
- `description`: "Answers questions about HR policies and procedures"
38-
- `instructions`: "Be helpful and always cite the specific policy document when answering. If you're unsure, say so."
39-
40-
The tool will:
41-
1. Create the KA with the specified volume as a knowledge source
42-
2. Scan the volume for JSON files with example questions (from PDF generation)
43-
3. Queue examples to be added once the endpoint is ready
44-
45-
## Provisioning Timeline
46-
47-
After creation, the KA endpoint needs to provision:
48-
49-
| Status | Meaning | Duration |
50-
|--------|---------|----------|
51-
| `PROVISIONING` | Creating the endpoint | 2-5 minutes |
52-
| `ONLINE` | Ready to use | - |
53-
| `OFFLINE` | Not currently running | - |
54-
55-
Use `manage_ka` with `action="get"` to check the status:
56-
57-
- `tile_id`: "<the tile_id from create>"
58-
59-
## Adding Example Questions
60-
61-
Example questions help with:
62-
- **Evaluation**: Test if the KA answers correctly
63-
- **User onboarding**: Show users what to ask
64-
65-
### Automatic (from PDF generation)
66-
67-
If you used `generate_pdf_documents`, each PDF has a companion JSON with:
6811
```json
6912
{
70-
"question": "What is the company's remote work policy?",
71-
"guideline": "Should mention the 3-day minimum in-office requirement"
13+
"display_name": "...",
14+
"description": "...",
15+
"source_type": "files",
16+
"files": {"path": "/Volumes/catalog/schema/volume/folder/"}
7217
}
7318
```
7419

75-
These are automatically added when `add_examples_from_volume=true` (default).
76-
77-
### Manual
20+
Supported formats: PDF, TXT, MD, DOCX.
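
A minimal sketch of assembling the Files body in Python before handing it to `--json`. The field names come from the JSON above; the helper names `build_files_source` and `is_supported`, and the choice to append a missing trailing slash, are assumptions for illustration and not part of the skill.

```python
import json

# Formats listed above; compared case-insensitively.
SUPPORTED_SUFFIXES = (".pdf", ".txt", ".md", ".docx")


def build_files_source(display_name: str, description: str, volume_path: str) -> dict:
    """Build a Volume-backed source body (hypothetical helper)."""
    if not volume_path.startswith("/Volumes/"):
        raise ValueError("expected a Unity Catalog Volume path under /Volumes/")
    # Assumption: keep the trailing slash, matching the example above.
    if not volume_path.endswith("/"):
        volume_path += "/"
    return {
        "display_name": display_name,
        "description": description,
        "source_type": "files",
        "files": {"path": volume_path},
    }


def is_supported(filename: str) -> bool:
    """Return True when the extension is one of the listed formats."""
    return filename.lower().endswith(SUPPORTED_SUFFIXES)


body = build_files_source(
    "HR docs", "HR policy documents", "/Volumes/catalog/schema/volume/folder"
)
print(json.dumps(body, indent=2))
```

Checking extensions locally before uploading is only a convenience; the service remains the authority on what it indexes.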
7821

79-
Examples can also be specified in the `manage_ka` create_or_update call if needed.
22+
### Vector Search Index
8023

81-
## Best Practices
24+
Use an existing index instead of auto-indexing:
8225

83-
### Document Organization
84-
85-
- **One volume per topic**: e.g., `/Volumes/catalog/schema/raw_data/hr_docs`, `/Volumes/catalog/schema/raw_data/tech_docs`
86-
- **Clear naming**: Name files descriptively so chunks are identifiable
87-
88-
### Instructions
89-
90-
Good instructions improve answer quality:
91-
92-
```
93-
Be helpful and professional. When answering:
94-
1. Always cite the specific document and section
95-
2. If multiple documents are relevant, mention all of them
96-
3. If the information isn't in the documents, clearly say so
97-
4. Use bullet points for multi-part answers
26+
```json
27+
{
28+
"display_name": "...",
29+
"description": "...",
30+
"source_type": "index",
31+
"index": {
32+
"index_name": "catalog.schema.my_index",
33+
"text_col": "content",
34+
"doc_uri_col": "source_url"
35+
}
36+
}
9837
```
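
The index body can be built the same way. This is a sketch under stated assumptions: `build_index_source` is a hypothetical helper, and the three-part `catalog.schema.index` check reflects the shape of the example name above rather than a documented validation rule.

```python
import json


def build_index_source(display_name: str, description: str, index_name: str,
                       text_col: str, doc_uri_col: str) -> dict:
    """Build a Vector Search index source body (hypothetical helper)."""
    parts = index_name.split(".")
    # Assumption: index names are fully qualified, as in the example above.
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected catalog.schema.index, got {index_name!r}")
    return {
        "display_name": display_name,
        "description": description,
        "source_type": "index",
        "index": {
            "index_name": index_name,
            "text_col": text_col,
            "doc_uri_col": doc_uri_col,
        },
    }


body = build_index_source(
    "Docs index", "Existing Vector Search index",
    "catalog.schema.my_index", "content", "source_url",
)
print(json.dumps(body, indent=2))
```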
9938

100-
### Updating Content
101-
102-
To update the indexed documents:
103-
1. Add/remove/modify files in the volume
104-
2. Call `manage_ka` with `action="create_or_update"`, the same name and `tile_id`
105-
3. The KA will re-index the updated content
106-
107-
## Example Workflow
108-
109-
1. **Generate PDF documents** using `databricks-unstructured-pdf-generation` skill:
110-
- Creates PDFs in `/Volumes/catalog/schema/raw_data/pdf_documents`
111-
- Creates JSON files with question/guideline pairs
112-
113-
2. **Create the Knowledge Assistant**:
114-
- `name`: "My Document Assistant"
115-
- `volume_path`: "/Volumes/catalog/schema/raw_data/pdf_documents"
39+
## Updating Content
11640

117-
3. **Wait for ONLINE status** (2-5 minutes)
41+
1. Add/modify/remove files in the Volume
42+
2. Re-sync: `databricks knowledge-assistants sync-knowledge-sources "knowledge-assistants/{ka_id}"`
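
If the re-sync is scripted, one option is to wrap the CLI call from step 2 in Python. The command and argument shape are taken from the step above; `sync_command`, `resync`, and the ID `abc123` are hypothetical names for illustration, and `resync` assumes the `databricks` CLI is installed and authenticated.

```python
import shlex
import subprocess


def sync_command(ka_id: str) -> list[str]:
    """Return the argv for the re-sync command shown in step 2."""
    return [
        "databricks",
        "knowledge-assistants",
        "sync-knowledge-sources",
        f"knowledge-assistants/{ka_id}",
    ]


def resync(ka_id: str) -> None:
    """Run the re-sync; raises CalledProcessError on a non-zero exit."""
    subprocess.run(sync_command(ka_id), check=True)


# Print the command instead of running it; "abc123" is a placeholder ID.
print(shlex.join(sync_command("abc123")))
```

Passing the arguments as a list avoids shell quoting issues with the resource name.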
11843

119-
4. **Examples are automatically added** from the JSON files
120-
121-
5. **Test the KA** in the Databricks UI
122-
123-
## Using KA in Supervisor Agents
124-
125-
Knowledge Assistants can be used as agents in a Supervisor Agent (formerly Multi-Agent Supervisor, MAS). Each KA has an associated model serving endpoint.
126-
127-
### Finding the Endpoint Name
44+
## Troubleshooting
12845

129-
Use `manage_ka` with `action="get"` to retrieve the KA details. The response includes:
130-
- `tile_id`: The unique identifier for the KA
131-
- `name`: The KA name (sanitized)
132-
- `endpoint_status`: Current status (ONLINE, PROVISIONING, etc.)
46+
**KA stays in CREATING:**
47+
- Wait up to 10 minutes
48+
- Check workspace quotas
49+
- Verify volume path exists
13350

134-
The endpoint name follows this pattern: `ka-{tile_id}-endpoint`
51+
**Documents not indexed:**
52+
- Check file format (PDF, TXT, MD, DOCX)
53+
- Verify volume path (trailing slash matters)
54+
- Check file permissions
13555

136-
### Finding a KA by Name
56+
**Poor answer quality:**
57+
- Ensure documents are well-structured
58+
- Break large documents into smaller files
59+
- Add clear headings and sections
13760

138-
If you know the KA name but not the tile_id, use `manage_ka` with `action="find_by_name"`:
61+
## Evaluation Questions
13962

140-
```python
141-
manage_ka(action="find_by_name", name="HR_Policy_Assistant")
142-
# Returns: {"found": True, "tile_id": "01abc...", "name": "HR_Policy_Assistant", "endpoint_name": "ka-01abc...-endpoint"}
143-
```
63+
When testing a KA, check if the volume or project contains a `pdf_eval_questions.json` file with test questions:
14464

145-
### Example: Adding KA to Supervisor Agent
146-
147-
```python
148-
# First, find the KA
149-
manage_ka(action="find_by_name", name="HR_Policy_Assistant")
150-
151-
# Then use the tile_id in a Supervisor Agent
152-
manage_mas(
153-
action="create_or_update",
154-
name="Support_MAS",
155-
agents=[
156-
{
157-
"name": "hr_agent",
158-
"ka_tile_id": "<tile_id from find_by_name>",
159-
"description": "Answers HR policy questions from the employee handbook"
160-
}
161-
]
162-
)
65+
```json
66+
{
67+
"api_errors_guide.pdf": {
68+
"question": "What is the solution for error ERR-4521?",
69+
"expected_fact": "Call /api/v2/auth/refresh with refresh_token before the 3600s TTL expires"
70+
}
71+
}
16372
```
16473

165-
## Troubleshooting
166-
167-
### Endpoint stays in PROVISIONING
168-
169-
- Check workspace capacity and quotas
170-
- Verify the volume path is accessible
171-
- Wait up to 10 minutes before investigating further
172-
173-
### Documents not indexed
174-
175-
- Ensure files are in a supported format (PDF, TXT, MD)
176-
- Check file permissions in the volume
177-
- Verify the volume path is correct
178-
179-
### Poor answer quality
180-
181-
- Add more specific instructions
182-
- Ensure documents are well-structured
183-
- Consider breaking large documents into smaller files
74+
Use these questions to validate retrieval accuracy. See [databricks-unstructured-pdf-generation](../databricks-unstructured-pdf-generation/SKILL.md) for generating test PDFs with eval questions.
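
A rough sketch of looping over such a file. The JSON shape follows the example above; `run_eval` and the `ask` callable are hypothetical, since how the KA is queried is not covered here, and the substring comparison is a deliberately naive stand-in for a real grading step.

```python
import json
from typing import Callable

# Same shape as the pdf_eval_questions.json example above.
EVAL_JSON = """
{
  "api_errors_guide.pdf": {
    "question": "What is the solution for error ERR-4521?",
    "expected_fact": "Call /api/v2/auth/refresh with refresh_token before the 3600s TTL expires"
  }
}
"""


def run_eval(questions: dict, ask: Callable[[str], str]) -> dict:
    """Map each source file to True if the answer contains the expected fact."""
    results = {}
    for filename, item in questions.items():
        answer = ask(item["question"])
        # Naive check: case-insensitive substring match.
        results[filename] = item["expected_fact"].lower() in answer.lower()
    return results


def fake_ask(question: str) -> str:
    """Stub standing in for a real KA query (assumption)."""
    return ("To fix it, call /api/v2/auth/refresh with refresh_token "
            "before the 3600s TTL expires.")


results = run_eval(json.loads(EVAL_JSON), fake_ask)
print(results)
```

In practice the expected fact will rarely appear verbatim in an answer, so a human review or an LLM-based judge is likely a better fit than exact matching.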
