Commit 66669e6

docs(skill): align hotdata skill with CLI behavior
Document workspace-wide indexes list, optional filters, and dataset scope. Add datasets update, agent skills (install/status), shell completions, and global command inventory. Cross-update WORKFLOWS, MODEL_BUILD, and DATA_MODEL.template references. Bump skill metadata to 0.2.1.
1 parent 2ee2666 commit 66669e6

4 files changed

Lines changed: 73 additions & 16 deletions

File tree

skills/hotdata/SKILL.md

Lines changed: 52 additions & 13 deletions
Original file line number | Diff line number | Diff line change
@@ -1,7 +1,7 @@
11
---
22
name: hotdata
3-
description: Use this skill when the user wants to run hotdata CLI commands, query the Hotdata API, list workspaces, list connections, create connections, list tables, manage datasets, execute SQL queries, inspect query run history, search tables, manage indexes, manage sandboxes, manage workspace context and stored docs such as context:DATAMODEL via the context API (`hotdata context`), or interact with the hotdata service. Activate when the user says "run hotdata", "query hotdata", "list workspaces", "list connections", "create a connection", "list tables", "list datasets", "create a dataset", "upload a dataset", "execute a query", "search a table", "list indexes", "create an index", "list query runs", "list past queries", "query history", "list sandboxes", "create a sandbox", "run a sandbox", "workspace context", "pull context", "push context", "data model", "context:DATAMODEL", or asks you to use the hotdata CLI.
4-
version: 0.2.0
3+
description: Use this skill when the user wants to run hotdata CLI commands, query the Hotdata API, list workspaces, list connections, create connections, list tables, manage datasets, execute SQL queries, inspect query run history, search tables, manage indexes, manage sandboxes, manage workspace context and stored docs such as context:DATAMODEL via the context API (`hotdata context`), install or update the bundled agent skills (`hotdata skills`), generate shell completions (`hotdata completions`), or interact with the hotdata service. Activate when the user says "run hotdata", "query hotdata", "list workspaces", "list connections", "create a connection", "list tables", "list datasets", "create a dataset", "upload a dataset", "execute a query", "search a table", "list indexes", "create an index", "list query runs", "list past queries", "query history", "list sandboxes", "create a sandbox", "run a sandbox", "workspace context", "pull context", "push context", "data model", "context:DATAMODEL", or asks you to use the hotdata CLI.
4+
version: 0.2.1
55
---
66

77
# Hotdata CLI Skill
@@ -80,6 +80,10 @@ Full step-by-step procedures: [references/WORKFLOWS.md](references/WORKFLOWS.md)
8080

8181
## Available Commands
8282

83+
Top-level subcommands (each detailed below): **`auth`**, **`datasets`**, **`query`**, **`workspaces`**, **`connections`**, **`tables`**, **`skills`**, **`results`**, **`jobs`**, **`indexes`**, **`embedding-providers`**, **`search`**, **`queries`**, **`sandbox`**, **`context`**, **`completions`**.
84+
85+
Global CLI options: **`--api-key`**, **`-v` / `--version`**, **`-h` / `--help`**. Hidden developer flag: **`--debug`** (verbose HTTP logs).
86+
8387
### List Workspaces
8488
```
8589
hotdata workspaces list [--output table|json|yaml]
@@ -127,7 +131,7 @@ hotdata connections create \
127131
--name "my-connection" \
128132
--type <source_type> \
129133
--config '<json object>' \
130-
[--workspace-id <workspace_id>]
134+
[--workspace-id <workspace_id>] [--output table|json|yaml]
131135
```
132136

133137
The `--config` JSON object must contain all **required** fields from `config` plus the **auth fields** merged in at the top level. Auth fields are not nested — they sit alongside config fields in the same object.
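As a minimal sketch of that flat shape, the helper below prints a single JSON object in which connection settings and credentials are siblings. The `build_config` helper and the field names (`host`, `port`, `database`, `user`, `password`) are hypothetical placeholders, not the schema of any real source type; the actual required fields depend on the source type you choose.

```shell
# Hypothetical helper: emit one flat JSON object suitable for --config.
# Field names are illustrative only; they are not taken from the hotdata docs.
build_config() {
  # $1=host $2=port $3=database (config fields); $4=user $5=password (auth fields)
  # Auth fields sit next to config fields; there is no nested "auth" object.
  printf '{"host":"%s","port":%s,"database":"%s","user":"%s","password":"%s"}\n' \
    "$1" "$2" "$3" "$4" "$5"
}
```

The result could then be passed as `--config "$(build_config ...)"`. Values that contain quotes or backslashes would need JSON escaping first, which this sketch does not attempt.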
@@ -196,6 +200,12 @@ hotdata datasets <dataset_id> [--workspace-id <workspace_id>] [--output table|js
196200
- Use this to inspect schema before querying.
197201
- For the **qualified SQL name**, prefer **`FULL NAME` from `datasets list`** or the **`full_name` printed by `datasets create`**—especially for sandbox datasets, where the schema is **`datasets.<sandbox_id>`**, not `datasets.main`.
198202

203+
#### Update a dataset
204+
```
205+
hotdata datasets update <dataset_id> [--label <label>] [--table-name <name>] [--workspace-id <workspace_id>] [--output table|json|yaml]
206+
```
207+
- The CLI requires **at least one** of **`--label`** or **`--table-name`**.
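A script that calls `datasets update` can check the same rule before invoking the CLI. The wrapper below is a sketch under that assumption: `update_dataset` is a hypothetical helper name, and it only uses the `--label` and `--table-name` flags shown above.

```shell
# Hypothetical wrapper: refuse to call the CLI unless a label or table name is given.
update_dataset() {
  dataset_id=$1
  label=$2
  table_name=$3
  if [ -z "$label" ] && [ -z "$table_name" ]; then
    echo "update_dataset: pass a label, a table name, or both" >&2
    return 2
  fi
  # Build the argument list, adding only the flags that were actually supplied.
  set -- datasets update "$dataset_id"
  if [ -n "$label" ]; then set -- "$@" --label "$label"; fi
  if [ -n "$table_name" ]; then set -- "$@" --table-name "$table_name"; fi
  hotdata "$@"
}
```

For example, `update_dataset <dataset_id> "Quarterly sales" ""` would change only the label, while passing two empty strings returns status 2 without contacting the service.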
208+
199209
#### Create a dataset
200210
```
201211
hotdata datasets create --label "My Dataset" --file data.csv [--table-name my_dataset] [--workspace-id <workspace_id>]
@@ -302,10 +312,12 @@ To create a dataset from a **saved query** still registered for the workspace, u
302312

303313
```
304314
# BM25 full-text search (requires BM25 index on the column)
305-
hotdata search "<query>" --type bm25 --table <connection.schema.table> --column <column> [--select <columns>] [--limit <n>] [--output table|json|csv]
315+
hotdata search "<query>" --type bm25 --table <connection.schema.table> --column <column> \
316+
[--select <columns>] [--limit <n>] [--workspace-id <workspace_id>] [--output table|json|csv]
306317
307318
# Vector similarity search via server-side auto-embed (requires a vector index on the column)
308-
hotdata search "<query>" --type vector --table <table> --column <source_text_column> [--limit <n>]
319+
hotdata search "<query>" --type vector --table <connection.schema.table> --column <source_text_column> \
320+
[--select <columns>] [--limit <n>] [--workspace-id <workspace_id>] [--output table|json|csv]
309321
```
310322
- **`--type vector`**: pass the query as **plain text** and name the **source text column** (e.g. `title`). The server embeds the query at search time, using the same provider that auto-embedded the column when the index was built, so the distance metric, model, and dimensions match automatically. No client-side embedding and no `OPENAI_API_KEY` are required. Generated SQL: `vector_distance(col, 'text')`.
311323
- **`--type bm25`** generates `bm25_search(table, col, 'text')` server-side; requires a BM25 index on the column.
@@ -318,23 +330,30 @@ hotdata search "<query>" --type vector --table <table> --column <source_text_col
318330

319331
### Indexes
320332

321-
Indexes attach to either a connection-table (`--connection-id` + `--schema` + `--table`) or a dataset (`--dataset-id`) — the two scopes are mutually exclusive. `--type` is required (no default).
333+
Indexes attach to either a connection-table (`--connection-id` + `--schema` + `--table`) or a dataset (`--dataset-id`) — the two scopes are mutually exclusive for **create** / **delete**. **`indexes list`** supports three ways to scope (below).
334+
335+
For **create**, `--type` is required (no default).
322336

323337
```
324-
# Connection-table scope
325-
hotdata indexes list --connection-id <connection_id> --schema <schema> --table <table> [--workspace-id <workspace_id>] [--output table|json|yaml]
338+
# List — default: all indexes on connection tables in the workspace (from information_schema; parallel fetch).
339+
# Narrow the scan with any subset of: --connection-id (-c), --schema, --table. With all three set, uses one table API call.
340+
# Dataset indexes are not included; use --dataset-id per dataset.
341+
hotdata indexes list [--connection-id <connection_id>] [--schema <schema>] [--table <table>] [--workspace-id <workspace_id>] [--output table|json|yaml]
342+
hotdata indexes list --dataset-id <dataset_id> [--workspace-id <workspace_id>] [--output table|json|yaml]
343+
344+
# Connection-table scope — create / delete
326345
hotdata indexes create --connection-id <connection_id> --schema <schema> --table <table> \
327346
--name <name> --columns <cols> --type sorted|bm25|vector \
328347
[--metric l2|cosine|dot] [--async] \
329348
[--embedding-provider-id <id>] [--dimensions <n>] [--output-column <name>] [--description <text>]
330349
hotdata indexes delete --connection-id <connection_id> --schema <schema> --table <table> --name <name>
331350
332-
# Dataset scope (positional dataset_id replaced by --dataset-id flag)
333-
hotdata indexes list --dataset-id <dataset_id> [--workspace-id <workspace_id>] [--output table|json|yaml]
351+
# Dataset scope — create / delete (same --dataset-id flag)
334352
hotdata indexes create --dataset-id <dataset_id> --name <name> --columns <cols> --type sorted|bm25|vector ...
335353
hotdata indexes delete --dataset-id <dataset_id> --name <name>
336354
```
337-
- `--type` accepts `sorted` (B-tree-like; range/exact lookups), `bm25` (full-text), or `vector` (similarity). It is **required**.
355+
- **`indexes list`:** With no `--dataset-id`, lists indexes on **connection** tables (workspace scan or filtered scan). **Dataset** indexes are listed only via `--dataset-id` (one dataset per invocation).
356+
- `--type` accepts `sorted` (B-tree-like; range/exact lookups), `bm25` (full-text), or `vector` (similarity). It is **required** for **create**.
338357
- `--type vector` requires exactly one column.
339358
- `--async` submits index creation as a background job; poll with `hotdata jobs <job_id>`.
340359
- **Auto-embedding:** with `--type vector` on a **text** column, the server generates embeddings automatically. Pass `--embedding-provider-id` to pick a specific provider; if omitted, the first system provider is used. The generated column defaults to `{column}_embedding` (override with `--output-column`).
@@ -345,7 +364,7 @@ hotdata embedding-providers list [--workspace-id <workspace_id>] [--output table
345364
hotdata embedding-providers get <id> [--workspace-id <workspace_id>] [--output table|json|yaml]
346365
hotdata embedding-providers create --name <name> --provider-type service|local \
347366
[--config '<json>'] [--provider-api-key <key> | --secret-name <name>] [--workspace-id <workspace_id>]
348-
hotdata embedding-providers update <id> [--name <name>] [--config '<json>'] [--provider-api-key <key> | --secret-name <name>]
367+
hotdata embedding-providers update <id> [--name <name>] [--config '<json>'] [--provider-api-key <key> | --secret-name <name>] [--workspace-id <workspace_id>] [--output table|json|yaml]
349368
hotdata embedding-providers delete <id> [--workspace-id <workspace_id>]
350369
```
351370
- System providers (e.g. `sys_emb_openai`) come pre-configured. `list` shows IDs to pass to `--embedding-provider-id`.
@@ -361,6 +380,26 @@ hotdata jobs <job_id> [--workspace-id <workspace_id>] [--output table|json|yaml]
361380
- `--status`: `pending`, `running`, `succeeded`, `partially_succeeded`, `failed`.
362381
- Use `hotdata jobs <job_id>` to inspect a specific job's status, error, and result.
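Polling a background job (for example one started with `--async`) might look like the sketch below. It assumes that `--output json` exposes a `status` field holding one of the values listed above; the JSON shape is not documented here, so the `sed` pattern may need adjusting. `job_status` and `wait_for_job` are hypothetical helper names.

```shell
# Assumption: the JSON output contains "status": "<value>" somewhere in the document.
job_status() {
  hotdata jobs "$1" --output json |
    sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([a-z_]*\)".*/\1/p' |
    head -n 1
}

# Poll until the job leaves pending/running. Returns 0 on success,
# 1 on failed or partially_succeeded, 2 on an unrecognized status.
wait_for_job() {
  job_id=$1
  interval=${2:-5}   # seconds between polls
  while :; do
    status=$(job_status "$job_id")
    case "$status" in
      succeeded) return 0 ;;
      partially_succeeded|failed)
        echo "job $job_id ended with status: $status" >&2
        return 1 ;;
      pending|running) sleep "$interval" ;;
      *)
        echo "job $job_id: unexpected status '$status'" >&2
        return 2 ;;
    esac
  done
}
```

Treating `partially_succeeded` as a failure is a design choice for this sketch; a caller that can tolerate partial results may prefer to handle it separately.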
363382

383+
### Agent skills (`skills`)
384+
385+
Bundled Markdown skills (**`hotdata`**, **`hotdata-geospatial`**) ship with the CLI release tarball.
386+
387+
```
388+
hotdata skills install [--project]
389+
hotdata skills status
390+
```
391+
392+
- **`install`** — Downloads and installs skills to **`~/.hotdata/skills/<skill>`**, then symlinks into **`~/.agents/skills`** and into **`~/.claude/skills`** / **`~/.pi/skills`** when those directories exist. **`--project`** instead copies into **`./.agents/skills/<skill>`** in the current directory (and links `./.claude` / `./.pi` when present). The CLI may auto-refresh skills after an upgrade when appropriate.
393+
- **`status`** — Reports installed vs current CLI version and where skills are linked.
394+
395+
### Shell completions
396+
397+
```
398+
hotdata completions <bash|zsh|fish>
399+
```
400+
401+
Writes the completion script for the chosen shell to stdout; redirect it into your shell’s completion path as usual.
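One way to do that redirection is sketched below. The target paths are common per-user conventions rather than anything the CLI mandates: the bash path assumes the `bash-completion` package is in use, and the zsh directory is an arbitrary choice that must be on `$fpath`. `completion_path` and `install_completions` are hypothetical helper names.

```shell
# Map a shell name to a conventional per-user completion file (assumed locations).
completion_path() {
  case "$1" in
    bash) echo "${XDG_DATA_HOME:-$HOME/.local/share}/bash-completion/completions/hotdata" ;;
    zsh)  echo "$HOME/.zsh/completions/_hotdata" ;;   # this directory must be on $fpath
    fish) echo "${XDG_CONFIG_HOME:-$HOME/.config}/fish/completions/hotdata.fish" ;;
    *)    echo "unsupported shell: $1" >&2; return 1 ;;
  esac
}

# Generate the script with the CLI and write it to the chosen path.
install_completions() {
  target=$(completion_path "$1") || return 1
  mkdir -p "$(dirname "$target")"
  hotdata completions "$1" > "$target"
}
```

After `install_completions fish` (or `bash`, `zsh`), start a new shell session so the completion file is picked up.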
402+
364403
### Auth
365404
```
366405
hotdata auth login # Browser-based login (same as: hotdata auth)
@@ -437,7 +476,7 @@ Use a sandbox to explore tables and capture **analysis-oriented** notes in sandb
437476
5. Continue exploring and update the markdown as your **analysis picture** takes shape. Sandbox markdown is the living artifact for **that sandbox** only.
438477
6. When that picture should become **context:DATAMODEL** (outlive the sandbox or be shared with everyone), promote it: save consolidated Markdown as `./DATAMODEL.md` in the project directory and run `hotdata context push DATAMODEL` (if **context:DATAMODEL** already exists on the server, merge with `hotdata context show DATAMODEL` first—confirm `DATAMODEL` appears in `hotdata context list` before `show`).
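The promotion step can be sketched as a small guard around `context push`. This assumes that `hotdata context list` prints the name `DATAMODEL` as a whole word when the document exists; the `promote_datamodel` helper and the `DATAMODEL.server.md` file name are hypothetical.

```shell
# Hypothetical helper: push ./DATAMODEL.md, but stop for a manual merge
# when a server copy already exists. Pass "merged" once the merge is done.
promote_datamodel() {
  merged=${1:-no}
  if [ ! -f ./DATAMODEL.md ]; then
    echo "./DATAMODEL.md not found in $(pwd)" >&2
    return 2
  fi
  if [ "$merged" != "merged" ] && hotdata context list | grep -qw DATAMODEL; then
    # Save the server copy next to the local draft instead of overwriting it.
    hotdata context show DATAMODEL > ./DATAMODEL.server.md
    echo "merge ./DATAMODEL.server.md into ./DATAMODEL.md, then run: promote_datamodel merged" >&2
    return 1
  fi
  hotdata context push DATAMODEL
}
```

The first call either pushes directly or leaves a server copy to merge by hand; the second call with `merged` skips the check and pushes.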
439478

440-
Other commands (not covered in detail above): `hotdata connections new` (interactive connection wizard), `hotdata skills install|status`, `hotdata completions <bash|zsh|fish>`.
479+
**Also available:** `hotdata connections new`, an interactive connection wizard. It does not replace the programmatic **`connections create`** flow above.
441480

442481
## Workflow: Running a Query
443482

skills/hotdata/references/DATA_MODEL.template.md

Lines changed: 1 addition & 1 deletion
@@ -58,7 +58,7 @@ Document safe join paths and caveats (fan-out, timing, different refresh cadence
5858
|-------|--------|--------------------------|--------------|-------|
5959
| | | | | |
6060

61-
_Use `hotdata indexes list -c <connection_id> --schema <schema> --table <table>` per table as needed._
61+
_Use `hotdata indexes list` for connection tables across the workspace (add `-c` / `--schema` / `--table` to narrow), or per table with all three set; use `hotdata indexes list --dataset-id <id>` for uploaded datasets._
6262

6363
## Datasets (uploaded)
6464

skills/hotdata/references/MODEL_BUILD.md

Lines changed: 10 additions & 1 deletion
@@ -79,12 +79,21 @@ For **small** schemas (e.g. ≤5 tables in a domain), a short **ASCII diagram**
7979

8080
## 5. Search and index awareness
8181

82-
For tables you care about:
82+
Inventory indexes on connection tables (whole workspace or filtered):
83+
84+
```bash
85+
hotdata indexes list [-w <workspace_id>]
86+
hotdata indexes list -c <connection_id> [--schema <schema>] [--table <table>] [-w <workspace_id>]
87+
```
88+
89+
Per table when you only need one:
8390

8491
```bash
8592
hotdata indexes list -c <connection_id> --schema <schema> --table <table> [-w <workspace_id>]
8693
```
8794

95+
For dataset-backed indexes: `hotdata indexes list --dataset-id <dataset_id>` (not merged into the workspace-wide connection-table list).
96+
8897
Note:
8998

9099
- **Vector**-friendly columns (embeddings) vs **BM25**-friendly text (`title`, `body`, `description`, …).

skills/hotdata/references/WORKFLOWS.md

Lines changed: 10 additions & 1 deletion
@@ -161,12 +161,21 @@ High-cardinality **text** columns (`title`, `body`, `description`, …) may warr
161161

162162
### 2. Compare to existing indexes
163163

164-
For each `connection.schema.table` you care about:
164+
Start broad, then narrow:
165+
166+
```bash
167+
# All indexes on connection tables in the workspace (optional: -c / --schema / --table to filter)
168+
hotdata indexes list [--workspace-id <workspace_id>]
169+
```
170+
171+
For a single table, or to avoid scanning the whole workspace:
165172

166173
```bash
167174
hotdata indexes list --connection-id <connection_id> --schema <schema> --table <table> [--workspace-id <workspace_id>]
168175
```
169176

177+
Indexes on **uploaded datasets** are not included in that workspace scan — use `hotdata indexes list --dataset-id <dataset_id>` per dataset.
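Because dataset indexes need one invocation per dataset, a full inventory combines the workspace scan with a loop. The sketch below takes dataset IDs as arguments, since how you collect them is up to you; `list_all_indexes` is a hypothetical helper name.

```shell
# Hypothetical helper: workspace-wide connection-table indexes first,
# then one call per dataset ID passed as an argument.
list_all_indexes() {
  hotdata indexes list --output json
  for dataset_id in "$@"; do
    hotdata indexes list --dataset-id "$dataset_id" --output json
  done
}
```

Each call prints its own JSON document, so a consumer should parse the outputs one at a time rather than as a single array.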
178+
170179
Skip creating a duplicate: same table + overlapping columns + same purpose (e.g. another bm25 on the same column).
171180

172181
### 3. Create indexes when justified
