Commit 6bf6776

Fix/vector search skill tool names (#364)
* fix: correct vector-search SKILL.md MCP tool names (#326)

  The MCP Tools section referenced 14 tool names, but the actual MCP server only implements 8. Two tools had wrong names and six don't exist (their functionality is bundled into other tools). Any session using the skill would call non-existent tools and fail silently.

  Changes:
  - Rewrote MCP Tools section to document the actual 8 tools
  - Added explicit code examples showing correct parameter names (e.g. `get_vs_index(index_name=...)` not `get_vs_index(name=...)`)
  - Added 2 new ground truth test cases to catch regressions

  Skill eval: 0.646 → 0.727 (+0.081), all 8 test cases corr=yes

  Co-authored-by: Isaac

* Document query_vector MCP truncation issue

* Lint fix
1 parent f81aa75 commit 6bf6776

3 files changed: +128 -13 lines changed


.test/skills/databricks-vector-search/ground_truth.yaml

Lines changed: 68 additions & 0 deletions
@@ -297,6 +297,74 @@ test_cases:
       difficulty: easy
       source: auto_generated
 
+  - id: vs_mcp_create_endpoint_008
+    inputs:
+      prompt: "Use MCP tools to create a vector search endpoint called my-rag-endpoint"
+    outputs:
+      response: |
+        ```python
+        result = create_or_update_vs_endpoint(
+            name="my-rag-endpoint",
+            endpoint_type="STANDARD"
+        )
+        ```
+
+        The `create_or_update_vs_endpoint` tool is idempotent — it returns the existing endpoint if one with that name already exists.
+      execution_success: true
+    expectations:
+      expected_facts:
+        - "create_or_update_vs_endpoint is the correct MCP tool name"
+        - "The tool is idempotent"
+      expected_patterns:
+        - pattern: "create_or_update_vs_endpoint"
+          min_count: 1
+          description: "Uses correct MCP tool name (not create_vs_endpoint)"
+      guidelines:
+        - "Must use create_or_update_vs_endpoint, NOT create_vs_endpoint"
+        - "Must mention that the tool is idempotent"
+    metadata:
+      category: happy_path
+      difficulty: easy
+      source: manual
+
+  - id: vs_mcp_manage_data_009
+    inputs:
+      prompt: "How do I upsert documents into a Direct Access vector search index using MCP tools?"
+    outputs:
+      response: |
+        ```python
+        result = manage_vs_data(
+            index_name="catalog.schema.my_index",
+            operation="upsert",
+            inputs_json=[
+                {"id": "doc1", "content": "Sample document", "embedding": [0.1, 0.2, ...]},
+                {"id": "doc2", "content": "Another document", "embedding": [0.3, 0.4, ...]}
+            ]
+        )
+        ```
+
+        Use `manage_vs_data` with `operation="upsert"` to insert or update vectors. Other supported operations: `"delete"`, `"scan"`, `"sync"`.
+      execution_success: true
+    expectations:
+      expected_facts:
+        - "manage_vs_data is the correct MCP tool for data operations"
+        - "operation parameter accepts upsert, delete, scan, sync"
+        - "inputs_json contains the vector data to upsert"
+      expected_patterns:
+        - pattern: "manage_vs_data"
+          min_count: 1
+          description: "Uses manage_vs_data (not upsert_vs_data)"
+        - pattern: "upsert"
+          min_count: 1
+          description: "Specifies upsert operation"
+      guidelines:
+        - "Must use manage_vs_data with operation='upsert', NOT upsert_vs_data"
+        - "Must mention other available operations (delete, scan, sync)"
+    metadata:
+      category: happy_path
+      difficulty: medium
+      source: manual
+
   - id: vs_embedding_models_007
     inputs:
       prompt: "What embedding models are available for vector search indexes?"
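
Each `expected_patterns` entry in the new test cases pairs a `pattern` with a `min_count`. The eval harness that consumes them is not part of this commit, so the following is only a minimal sketch of how such a check might work, assuming each pattern is treated as a regular expression and occurrences are counted with `re.findall`. The function name `check_expected_patterns` is hypothetical.

```python
import re


def check_expected_patterns(response, expected_patterns):
    """Return the descriptions of patterns that occur fewer than min_count times.

    Hypothetical helper: the real harness may match patterns differently.
    """
    failures = []
    for spec in expected_patterns:
        count = len(re.findall(spec["pattern"], response))
        if count < spec.get("min_count", 1):
            failures.append(spec.get("description", spec["pattern"]))
    return failures


# Patterns copied from test case vs_mcp_manage_data_009.
patterns = [
    {"pattern": "manage_vs_data", "min_count": 1,
     "description": "Uses manage_vs_data (not upsert_vs_data)"},
    {"pattern": "upsert", "min_count": 1,
     "description": "Specifies upsert operation"},
]
response = 'result = manage_vs_data(index_name="catalog.schema.my_index", operation="upsert")'

print(check_expected_patterns(response, patterns))  # prints []
```

A response that still called the old `upsert_vs_data` name would satisfy the second pattern but fail the first, which is the regression these test cases are meant to catch.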

databricks-skills/databricks-vector-search/SKILL.md

Lines changed: 59 additions & 13 deletions
@@ -292,6 +292,7 @@ databricks vector-search indexes delete-index \
 | **Embedding dimension mismatch** | Ensure query and index dimensions match |
 | **Index not updating** | Check pipeline_type; use sync_index() for TRIGGERED |
 | **Out of capacity** | Upgrade to Storage-Optimized (1B+ vectors) |
+| **`query_vector` truncated by MCP tool** | MCP tool calls serialize arrays as JSON and can truncate large vectors (e.g. 1024-dim). Use `query_text` instead (for managed embedding indexes), or use the Databricks SDK/CLI to pass raw vectors |
 
 ## Embedding Models
 
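
The troubleshooting row added in this hunk recommends `query_text` when a large `query_vector` may be truncated in transit. A truncated vector can be detected wherever it is received by comparing its length with the index dimension. The sketch below assumes that dimension is known in advance; `choose_query_args` and its parameters are hypothetical and are not part of the MCP server or the Databricks SDK.

```python
def choose_query_args(query_text=None, query_vector=None, expected_dim=1024):
    """Build query keyword arguments, using the vector only if it arrived intact.

    Hypothetical helper: falls back to query_text when the vector length
    does not match the expected embedding dimension.
    """
    if query_vector is not None and len(query_vector) == expected_dim:
        return {"query_vector": list(query_vector)}
    if query_text:
        return {"query_text": query_text}
    raise ValueError(
        "query_vector does not match the expected dimension and no query_text was given"
    )


# A 512-element vector against a 1024-dim index falls back to text.
print(choose_query_args(query_text="machine learning best practices",
                        query_vector=[0.1] * 512))
# prints {'query_text': 'machine learning best practices'}
```

The fallback only makes sense for indexes with managed embeddings, since those are the ones that accept `query_text`.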
@@ -320,29 +321,74 @@ The following MCP tools are available for managing Vector Search infrastructure.
 
 | Tool | Description |
 |------|-------------|
-| `create_vs_endpoint` | Create endpoint (STANDARD or STORAGE_OPTIMIZED). Async — check status with `get_vs_endpoint` |
-| `get_vs_endpoint` | Get endpoint details and status by name |
-| `list_vs_endpoints` | List all Vector Search endpoints in the workspace |
-| `delete_vs_endpoint` | Delete an endpoint (indexes must be deleted first) |
+| `create_or_update_vs_endpoint` | Create or update an endpoint (STANDARD or STORAGE_OPTIMIZED). Idempotent — returns existing if found |
+| `get_vs_endpoint` | Get endpoint details by name. Omit `name` to list all endpoints in the workspace |
+| `delete_vs_endpoint` | Delete an endpoint (all indexes must be deleted first) |
+
+```python
+# Create or update an endpoint
+result = create_or_update_vs_endpoint(name="my-vs-endpoint", endpoint_type="STANDARD")
+# Returns {"name": "my-vs-endpoint", "endpoint_type": "STANDARD", "created": True}
+
+# List all endpoints
+endpoints = get_vs_endpoint()  # omit name to list all
+```
 
 ### Index Management
 
 | Tool | Description |
 |------|-------------|
-| `create_vs_index` | Create a Delta Sync or Direct Access index on an endpoint |
-| `get_vs_index` | Get index details, status, and configuration |
-| `list_vs_indexes` | List all indexes on an endpoint |
-| `delete_vs_index` | Delete an index |
-| `sync_vs_index` | Trigger sync for TRIGGERED pipeline indexes |
+| `create_or_update_vs_index` | Create or update an index. Idempotent — auto-triggers initial sync for DELTA_SYNC indexes |
+| `get_vs_index` | Get index details by `index_name`. Pass `endpoint_name` (no `index_name`) to list all indexes on an endpoint |
+| `delete_vs_index` | Delete an index by fully-qualified name (catalog.schema.index_name) |
+
+```python
+# Create a Delta Sync index with managed embeddings
+result = create_or_update_vs_index(
+    name="catalog.schema.my_index",
+    endpoint_name="my-vs-endpoint",
+    primary_key="id",
+    index_type="DELTA_SYNC",
+    delta_sync_index_spec={
+        "source_table": "catalog.schema.docs",
+        "embedding_source_columns": [{"name": "content", "embedding_model_endpoint_name": "databricks-gte-large-en"}],
+        "pipeline_type": "TRIGGERED"
+    }
+)
+
+# Get a specific index by name — parameter is index_name, not name
+index = get_vs_index(index_name="catalog.schema.my_index")
+
+# List all indexes on an endpoint
+indexes = get_vs_index(endpoint_name="my-vs-endpoint")
+```
 
 ### Query and Data
 
 | Tool | Description |
 |------|-------------|
-| `query_vs_index` | Query index with `query_text`, `query_vector`, or hybrid (`query_type="HYBRID"`) |
-| `upsert_vs_data` | Upsert vectors into a Direct Access index |
-| `delete_vs_data` | Delete vectors from a Direct Access index by primary key |
-| `scan_vs_index` | Retrieve all vectors from an index (for debugging/export) |
+| `query_vs_index` | Query index with `query_text`, `query_vector`, or hybrid (`query_type="HYBRID"`). Prefer `query_text` over `query_vector` — MCP tool calls can truncate large embedding arrays (1024-dim) |
+| `manage_vs_data` | CRUD operations on Direct Access indexes. `operation`: `"upsert"`, `"delete"`, `"scan"`, `"sync"` |
+
+```python
+# Query an index
+results = query_vs_index(
+    index_name="catalog.schema.my_index",
+    columns=["id", "content"],
+    query_text="machine learning best practices",
+    num_results=5
+)
+
+# Upsert data into a Direct Access index
+manage_vs_data(
+    index_name="catalog.schema.my_index",
+    operation="upsert",
+    inputs_json=[{"id": "doc1", "content": "...", "embedding": [0.1, 0.2, ...]}]
+)
+
+# Trigger manual sync for a TRIGGERED pipeline index
+manage_vs_data(index_name="catalog.schema.my_index", operation="sync")
+```
 
 ## Notes
 
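
Read side by side, the removed and added table rows in this hunk imply a mapping from each retired tool name to its replacement. The sketch below records that mapping as data. It is inferred from the diff rather than taken from the MCP server source, and the names `LEGACY_TOOL_MAP`, `CURRENT_TOOLS` and `translate_tool_call` are hypothetical.

```python
# The eight tool names documented after this commit.
CURRENT_TOOLS = {
    "create_or_update_vs_endpoint", "get_vs_endpoint", "delete_vs_endpoint",
    "create_or_update_vs_index", "get_vs_index", "delete_vs_index",
    "query_vs_index", "manage_vs_data",
}

# Retired name -> (current tool, keyword arguments the new call needs).
# Inferred from the before/after tables in the diff; an assumption, not server code.
LEGACY_TOOL_MAP = {
    "create_vs_endpoint": ("create_or_update_vs_endpoint", {}),
    "list_vs_endpoints": ("get_vs_endpoint", {}),   # call without name
    "create_vs_index": ("create_or_update_vs_index", {}),
    "list_vs_indexes": ("get_vs_index", {}),        # pass endpoint_name only
    "sync_vs_index": ("manage_vs_data", {"operation": "sync"}),
    "upsert_vs_data": ("manage_vs_data", {"operation": "upsert"}),
    "delete_vs_data": ("manage_vs_data", {"operation": "delete"}),
    "scan_vs_index": ("manage_vs_data", {"operation": "scan"}),
}


def translate_tool_call(tool_name, **kwargs):
    """Map a possibly retired tool name onto a current tool name and kwargs."""
    if tool_name in CURRENT_TOOLS:
        return tool_name, kwargs
    if tool_name in LEGACY_TOOL_MAP:
        new_name, extra = LEGACY_TOOL_MAP[tool_name]
        return new_name, {**kwargs, **extra}
    raise KeyError(f"Unknown Vector Search MCP tool: {tool_name}")


print(translate_tool_call("upsert_vs_data", index_name="catalog.schema.my_index"))
# prints ('manage_vs_data', {'index_name': 'catalog.schema.my_index', 'operation': 'upsert'})
```

The sketch translates tool names only. Parameter renames such as `name` to `index_name` on `get_vs_index` would need separate handling.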
databricks-tools-core/databricks_tools_core/sql/sql_utils/executor.py

Lines changed: 1 addition & 0 deletions
@@ -87,6 +87,7 @@ def execute(
             exec_params["row_limit"] = row_limit
         if query_tags:
             from databricks.sdk.service.sql import QueryTag
+
             exec_params["query_tags"] = [
                 QueryTag(key=k.strip(), value=v.strip())
                 for pair in query_tags.split(",")
