|
1 | | -# Knowledge Assistants (KA) |
| 1 | +# Knowledge Assistants - Details |
2 | 2 |
|
3 | | -Knowledge Assistants are document-based Q&A systems that use RAG (Retrieval-Augmented Generation) to answer questions from indexed documents. |
| 3 | +For commands, see [SKILL.md](SKILL.md). |
4 | 4 |
|
5 | | -## What is a Knowledge Assistant? |
| 5 | +## Source Types |
6 | 6 |
|
7 | | -A KA connects to documents stored in a Unity Catalog Volume and allows users to ask natural language questions. The system: |
| 7 | +Both shapes go inside the `--json` body alongside `display_name` and `description` — see SKILL.md for the full invocation. |
8 | 8 |
|
9 | | -1. **Indexes** all documents in the volume (PDFs, text files, etc.) |
10 | | -2. **Retrieves** relevant chunks when a question is asked |
11 | | -3. **Generates** an answer using the retrieved context |
| 9 | +### Files (Volume) |
12 | 10 |
|
13 | | -## When to Use |
14 | | - |
15 | | -Use a Knowledge Assistant when: |
16 | | -- You have a collection of documents (policies, manuals, guides, reports) |
17 | | -- Users need to find specific information without reading entire documents |
18 | | -- You want to provide a conversational interface to documentation |
19 | | - |
20 | | -## Prerequisites |
21 | | - |
22 | | -Before creating a KA, you need documents in a Unity Catalog Volume: |
23 | | - |
24 | | -**Option 1: Use existing documents** |
25 | | -- Upload PDFs/text files to a Volume manually or via SDK |
26 | | - |
27 | | -**Option 2: Generate synthetic documents** |
28 | | -- Use the `databricks-unstructured-pdf-generation` skill to create realistic PDF documents |
29 | | -- Each PDF gets a companion JSON file with question/guideline pairs for evaluation |
30 | | - |
31 | | -## Creating a Knowledge Assistant |
32 | | - |
33 | | -Use the `manage_ka` tool with `action="create_or_update"`: |
34 | | - |
35 | | -- `name`: "HR Policy Assistant" |
36 | | -- `volume_path`: "/Volumes/my_catalog/my_schema/raw_data/hr_docs" |
37 | | -- `description`: "Answers questions about HR policies and procedures" |
38 | | -- `instructions`: "Be helpful and always cite the specific policy document when answering. If you're unsure, say so." |
39 | | - |
40 | | -The tool will: |
41 | | -1. Create the KA with the specified volume as a knowledge source |
42 | | -2. Scan the volume for JSON files with example questions (from PDF generation) |
43 | | -3. Queue examples to be added once the endpoint is ready |
44 | | - |
45 | | -## Provisioning Timeline |
46 | | - |
47 | | -After creation, the KA endpoint needs to provision: |
48 | | - |
49 | | -| Status | Meaning | Duration | |
50 | | -|--------|---------|----------| |
51 | | -| `PROVISIONING` | Creating the endpoint | 2-5 minutes | |
52 | | -| `ONLINE` | Ready to use | - | |
53 | | -| `OFFLINE` | Not currently running | - | |
54 | | - |
55 | | -Use `manage_ka` with `action="get"` to check the status: |
56 | | - |
57 | | -- `tile_id`: "<the tile_id from create>" |
58 | | - |
59 | | -## Adding Example Questions |
60 | | - |
61 | | -Example questions help with: |
62 | | -- **Evaluation**: Test if the KA answers correctly |
63 | | -- **User onboarding**: Show users what to ask |
64 | | - |
65 | | -### Automatic (from PDF generation) |
66 | | - |
67 | | -If you used `generate_pdf_documents`, each PDF has a companion JSON with: |
68 | 11 | ```json |
69 | 12 | { |
70 | | - "question": "What is the company's remote work policy?", |
71 | | - "guideline": "Should mention the 3-day minimum in-office requirement" |
| 13 | + "display_name": "...", |
| 14 | + "description": "...", |
| 15 | + "source_type": "files", |
| 16 | + "files": {"path": "/Volumes/catalog/schema/volume/folder/"} |
72 | 17 | } |
73 | 18 | ``` |
74 | 19 |
|
75 | | -These are automatically added when `add_examples_from_volume=true` (default). |
76 | | - |
77 | | -### Manual |
| 20 | +Supported formats: PDF, TXT, MD, DOCX. |
78 | 21 |
|
79 | | -Examples can also be specified in the `manage_ka` create_or_update call if needed. |
| 22 | +### Vector Search Index |
80 | 23 |
|
81 | | -## Best Practices |
| 24 | +Use an existing index instead of auto-indexing: |
82 | 25 |
|
83 | | -### Document Organization |
84 | | - |
85 | | -- **One volume per topic**: e.g., `/Volumes/catalog/schema/raw_data/hr_docs`, `/Volumes/catalog/schema/raw_data/tech_docs` |
86 | | -- **Clear naming**: Name files descriptively so chunks are identifiable |
87 | | - |
88 | | -### Instructions |
89 | | - |
90 | | -Good instructions improve answer quality: |
91 | | - |
92 | | -``` |
93 | | -Be helpful and professional. When answering: |
94 | | -1. Always cite the specific document and section |
95 | | -2. If multiple documents are relevant, mention all of them |
96 | | -3. If the information isn't in the documents, clearly say so |
97 | | -4. Use bullet points for multi-part answers |
| 26 | +```json |
| 27 | +{ |
| 28 | + "display_name": "...", |
| 29 | + "description": "...", |
| 30 | + "source_type": "index", |
| 31 | + "index": { |
| 32 | + "index_name": "catalog.schema.my_index", |
| 33 | + "text_col": "content", |
| 34 | + "doc_uri_col": "source_url" |
| 35 | + } |
| 36 | +} |
98 | 37 | ``` |
99 | 38 |
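The two source shapes above can be assembled programmatically before being passed as the `--json` body. The following is a minimal sketch, not part of the skill itself: the helper names `files_source` and `index_source` are hypothetical, and the field names are copied from the JSON examples in this section.

```python
import json


def files_source(display_name, description, volume_path):
    """Build a body for a Volume-backed ("files") knowledge source."""
    return {
        "display_name": display_name,
        "description": description,
        "source_type": "files",
        "files": {"path": volume_path},
    }


def index_source(display_name, description, index_name, text_col, doc_uri_col):
    """Build a body for an existing Vector Search index ("index") source."""
    return {
        "display_name": display_name,
        "description": description,
        "source_type": "index",
        "index": {
            "index_name": index_name,
            "text_col": text_col,
            "doc_uri_col": doc_uri_col,
        },
    }


if __name__ == "__main__":
    # Illustrative values only; substitute your own catalog, schema and volume.
    body = files_source(
        "HR docs", "HR policy documents", "/Volumes/catalog/schema/volume/folder/"
    )
    print(json.dumps(body, indent=2))
```

The serialized string is what would go in the `--json` argument; see SKILL.md for the command itself.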
|
100 | | -### Updating Content |
101 | | - |
102 | | -To update the indexed documents: |
103 | | -1. Add/remove/modify files in the volume |
104 | | -2. Call `manage_ka` with `action="create_or_update"`, the same name and `tile_id` |
105 | | -3. The KA will re-index the updated content |
106 | | - |
107 | | -## Example Workflow |
108 | | - |
109 | | -1. **Generate PDF documents** using `databricks-unstructured-pdf-generation` skill: |
110 | | - - Creates PDFs in `/Volumes/catalog/schema/raw_data/pdf_documents` |
111 | | - - Creates JSON files with question/guideline pairs |
112 | | - |
113 | | -2. **Create the Knowledge Assistant**: |
114 | | - - `name`: "My Document Assistant" |
115 | | - - `volume_path`: "/Volumes/catalog/schema/raw_data/pdf_documents" |
| 39 | +## Updating Content |
116 | 40 |
|
117 | | -3. **Wait for ONLINE status** (2-5 minutes) |
| 41 | +1. Add/modify/remove files in the Volume |
| 42 | +2. Re-sync: `databricks knowledge-assistants sync-knowledge-sources "knowledge-assistants/{ka_id}"` |
118 | 43 |
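If the re-sync step is scripted, the command from step 2 can be built as an argument list. This is a sketch under assumptions: the function names `build_sync_command` and `run_sync` are hypothetical, and `run_sync` assumes the `databricks` CLI is installed and already authenticated.

```python
import subprocess


def build_sync_command(ka_id):
    """Return the re-sync command from step 2 as an argv list."""
    return [
        "databricks",
        "knowledge-assistants",
        "sync-knowledge-sources",
        f"knowledge-assistants/{ka_id}",
    ]


def run_sync(ka_id):
    """Run the re-sync; raises CalledProcessError if the CLI exits non-zero."""
    # Assumption: the CLI is on PATH and configured for the target workspace.
    return subprocess.run(build_sync_command(ka_id), check=True)
```

Passing a list rather than a shell string avoids quoting issues around the resource name.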
|
119 | | -4. **Examples are automatically added** from the JSON files |
120 | | - |
121 | | -5. **Test the KA** in the Databricks UI |
122 | | - |
123 | | -## Using KA in Supervisor Agents |
124 | | - |
125 | | -Knowledge Assistants can be used as agents in a Supervisor Agent (formerly Multi-Agent Supervisor, MAS). Each KA has an associated model serving endpoint. |
126 | | - |
127 | | -### Finding the Endpoint Name |
| 44 | +## Troubleshooting |
128 | 45 |
|
129 | | -Use `manage_ka` with `action="get"` to retrieve the KA details. The response includes: |
130 | | -- `tile_id`: The unique identifier for the KA |
131 | | -- `name`: The KA name (sanitized) |
132 | | -- `endpoint_status`: Current status (ONLINE, PROVISIONING, etc.) |
| 46 | +**KA stays in CREATING:** |
| 47 | +- Wait up to 10 minutes |
| 48 | +- Check workspace quotas |
| 49 | +- Verify volume path exists |
133 | 50 |
|
134 | | -The endpoint name follows this pattern: `ka-{tile_id}-endpoint` |
| 51 | +**Documents not indexed:** |
| 52 | +- Check file format (PDF, TXT, MD, DOCX) |
| 53 | +- Verify volume path (trailing slash matters) |
| 54 | +- Check file permissions |
135 | 55 |
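The file-format check above can be run over a list of file names before uploading. A minimal sketch, assuming names are available as plain strings; the helper `unsupported_files` is hypothetical, and the extension set mirrors the supported formats listed in this document.

```python
from pathlib import PurePosixPath

# Formats listed as supported in this document.
SUPPORTED_SUFFIXES = {".pdf", ".txt", ".md", ".docx"}


def unsupported_files(names):
    """Return the names whose extension is not in the supported set."""
    return [
        name
        for name in names
        if PurePosixPath(name).suffix.lower() not in SUPPORTED_SUFFIXES
    ]
```

An empty result means every file has a supported extension; it says nothing about permissions or whether the volume path is correct.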
|
136 | | -### Finding a KA by Name |
| 56 | +**Poor answer quality:** |
| 57 | +- Ensure documents are well-structured |
| 58 | +- Break large documents into smaller files |
| 59 | +- Add clear headings and sections |
137 | 60 |
|
138 | | -If you know the KA name but not the tile_id, use `manage_ka` with `action="find_by_name"`: |
| 61 | +## Evaluation Questions |
139 | 62 |
|
140 | | -```python |
141 | | -manage_ka(action="find_by_name", name="HR_Policy_Assistant") |
142 | | -# Returns: {"found": True, "tile_id": "01abc...", "name": "HR_Policy_Assistant", "endpoint_name": "ka-01abc...-endpoint"} |
143 | | -``` |
| 63 | +When testing a KA, check if the volume or project contains a `pdf_eval_questions.json` file with test questions: |
144 | 64 |
|
145 | | -### Example: Adding KA to Supervisor Agent |
146 | | - |
147 | | -```python |
148 | | -# First, find the KA |
149 | | -manage_ka(action="find_by_name", name="HR_Policy_Assistant") |
150 | | - |
151 | | -# Then use the tile_id in a Supervisor Agent |
152 | | -manage_mas( |
153 | | - action="create_or_update", |
154 | | - name="Support_MAS", |
155 | | - agents=[ |
156 | | - { |
157 | | - "name": "hr_agent", |
158 | | - "ka_tile_id": "<tile_id from find_by_name>", |
159 | | - "description": "Answers HR policy questions from the employee handbook" |
160 | | - } |
161 | | - ] |
162 | | -) |
| 65 | +```json |
| 66 | +{ |
| 67 | + "api_errors_guide.pdf": { |
| 68 | + "question": "What is the solution for error ERR-4521?", |
| 69 | + "expected_fact": "Call /api/v2/auth/refresh with refresh_token before the 3600s TTL expires" |
| 70 | + } |
| 71 | +} |
163 | 72 | ``` |
164 | 73 |
|
165 | | -## Troubleshooting |
166 | | - |
167 | | -### Endpoint stays in PROVISIONING |
168 | | - |
169 | | -- Check workspace capacity and quotas |
170 | | -- Verify the volume path is accessible |
171 | | -- Wait up to 10 minutes before investigating further |
172 | | - |
173 | | -### Documents not indexed |
174 | | - |
175 | | -- Ensure files are in a supported format (PDF, TXT, MD) |
176 | | -- Check file permissions in the volume |
177 | | -- Verify the volume path is correct |
178 | | - |
179 | | -### Poor answer quality |
180 | | - |
181 | | -- Add more specific instructions |
182 | | -- Ensure documents are well-structured |
183 | | -- Consider breaking large documents into smaller files |
| 74 | +Use these questions to validate retrieval accuracy. See [databricks-unstructured-pdf-generation](../databricks-unstructured-pdf-generation/SKILL.md) for generating test PDFs with eval questions. |
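One way to use the file is to load it and score each KA answer against its `expected_fact`. The sketch below is an illustration only: `load_eval_questions` and `fact_coverage` are hypothetical helpers, the token-overlap score is a crude stand-in for a real evaluation, and obtaining the KA's answer is left to the caller.

```python
import json
import re


def load_eval_questions(text):
    """Parse the eval file into (pdf_name, question, expected_fact) tuples."""
    data = json.loads(text)
    return [
        (pdf, entry["question"], entry["expected_fact"])
        for pdf, entry in data.items()
    ]


def fact_coverage(answer, expected_fact):
    """Fraction of expected-fact tokens that appear in the answer (0.0 to 1.0).

    Crude heuristic: case-insensitive substring match per token.
    """
    tokens = set(re.findall(r"[a-z0-9_/\-]+", expected_fact.lower()))
    if not tokens:
        return 0.0
    answer_lower = answer.lower()
    found = sum(1 for token in tokens if token in answer_lower)
    return found / len(tokens)
```

A low score flags an answer for manual review; a high score does not by itself prove the answer is correct.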