
Commit 80ab6d6

docs: update README and skills to reflect new CLI syntax
- databases load: explicit --catalog/--schema/--table flags (no more dot-notation)
- databases list: note * marker on active database
- databases set/unset: documented
- indexes create: --catalog option for managed databases (in addition to --connection-id)
- search: --type and --column are now optional (inferred from indexes)
- workflows: updated examples throughout
1 parent 9454c48 commit 80ab6d6

5 files changed

Lines changed: 85 additions & 55 deletions


README.md

Lines changed: 28 additions & 17 deletions
Original file line numberDiff line numberDiff line change
@@ -135,28 +135,35 @@ Managed databases are Hotdata-owned catalogs you create and populate yourself (n
135135
```sh
136136
hotdata databases list [-w <id>] [-o table|json|yaml]
137137
hotdata databases create [--name <display_name>] [--catalog <alias>] [--table <table> ...] [--schema public] [--expires-at <duration|timestamp>] [-o table|json|yaml]
138+
hotdata databases set <id>
139+
hotdata databases unset
138140
hotdata databases <name_or_id> [-o table|json|yaml]
139141
hotdata databases delete <name_or_id>
140142
hotdata databases run [--database <id>] [--name <label>] [--schema public] [--table <table> ...] [--expires-at <duration|timestamp>] <cmd> [args...]
141143
hotdata databases <id> run <cmd> [args...]
142144

143-
hotdata databases tables list <database> [--schema <name>] [-o table|json|yaml]
144-
hotdata databases tables load <database> <table> --file ./data.parquet [--schema public]
145-
hotdata databases tables load <database> <table> --upload-id <id> [--schema public]
146-
hotdata databases tables delete <database> <table> [--schema public]
145+
# Preferred: load by catalog alias (auto-declares table if needed)
146+
hotdata databases load --catalog <alias> --table <table> [--schema public] (--file <path> | --url <url> | --upload-id <id>)
147+
148+
# Also available: explicit database flag
149+
hotdata databases tables list [--database <id_or_name>] [--schema <name>] [-o table|json|yaml]
150+
hotdata databases tables load <table> [--database <id_or_name>] [--schema public] (--file <path> | --url <url> | --upload-id <id>)
151+
hotdata databases tables delete <table> [--database <id_or_name>] [--schema public]
147152
```
148153

149-
- `create` registers a managed connection with no external credentials. `--name` is a human-readable display name; `--catalog` sets the SQL alias used in queries (`SELECT … FROM <catalog>.schema.table`) and must be `[a-z_][a-z0-9_]*`. Use `--table` to declare tables up front (required before `tables load` on the current API).
154+
- `create` registers a managed connection with no external credentials. `--name` is a human-readable display name; `--catalog` sets the SQL alias used in queries (`SELECT … FROM <catalog>.schema.table`) and must be `[a-z_][a-z0-9_]*`.
155+
- `set` / `unset` — save or clear the active database. All `databases tables` and `context` commands default to it. The active database is marked with `*` in `databases list`.
156+
- `load` (top-level shorthand) — loads a parquet file into `--catalog.--schema.--table`. If the table was not declared at create time, the CLI automatically deletes and recreates the database with the table declared, then retries the load.
150157
- `tables load` uploads a **parquet** file (or uses a staged `upload_id` from `POST /v1/files`) and publishes it as the table generation (`replace` mode).
151-
- `run` mints a database-scoped JWT and execs `<cmd>` with `HOTDATA_DATABASE_TOKEN`, `HOTDATA_DATABASE_REFRESH_TOKEN`, `HOTDATA_DATABASE`, `HOTDATA_WORKSPACE`, and `HOTDATA_API_URL` injected into its environment. Pass a database id (group-positional `<id>` like `sandbox run`, or `--database <id>`) to scope an existing database; omit both to auto-create a scratch one using `--name` / `--schema` / `--table` / `--expires-at`. Useful for launching an agent or child process whose API access is restricted to a single database.
158+
- `run` mints a database-scoped JWT and execs `<cmd>` with `HOTDATA_DATABASE_TOKEN`, `HOTDATA_DATABASE_REFRESH_TOKEN`, `HOTDATA_DATABASE`, `HOTDATA_WORKSPACE`, and `HOTDATA_API_URL` injected into its environment.
152159
- For CSV/JSON uploads without a managed database, use `hotdata datasets create` instead (`datasets.main.*`).
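
The alias rule quoted for `--catalog` (`[a-z_][a-z0-9_]*`) can be checked locally before calling the CLI. A minimal sketch in POSIX sh: `is_valid_catalog` is a hypothetical helper, not part of `hotdata`, and it assumes the pattern must match the whole alias.

```sh
# Hypothetical helper (not part of the hotdata CLI).
# Exit status 0 when the alias matches [a-z_][a-z0-9_]* in full, 1 otherwise.
# The body runs in a subshell so LC_ALL=C (plain ASCII ranges) does not leak
# into the caller; `exit` therefore only leaves the function.
is_valid_catalog() (
  LC_ALL=C
  case "$1" in
    ''|[!a-z_]*) exit 1 ;;      # empty, or invalid first character
    *[!a-z0-9_]*) exit 1 ;;     # some character outside the allowed set
    *) exit 0 ;;
  esac
)

is_valid_catalog "sales_2024" && echo "ok: sales_2024"
is_valid_catalog "2024sales" || echo "rejected: 2024sales"
```

A check like this only catches malformed aliases early; the server may apply further rules (such as uniqueness) that are not described here.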
153160

154161
Example:
155162

156163
```sh
157-
hotdata databases create --name "Sales reporting" --catalog sales --table orders
158-
hotdata databases tables load sales orders --file ./orders.parquet
159-
hotdata query "SELECT count(*) FROM sales.public.orders"
164+
hotdata databases create --catalog airbnb
165+
hotdata databases load --catalog airbnb --table listings --url https://example.com/listings.parquet
166+
hotdata query "SELECT count(*) FROM airbnb.public.listings"
160167
```
161168

162169
## Tables
@@ -233,14 +240,14 @@ hotdata queries <query_run_id> [-o table|json|yaml]
233240

234241
## Search
235242

236-
`--type` is **required** — no default. Pass either `vector` (similarity search via the index's embedding provider) or `bm25` (full-text search). Both run entirely server-side.
243+
Both run entirely server-side. `--type` and `--column` are **optional** when the table has exactly one search index — they are inferred automatically. Pass them explicitly when multiple indexes exist.
237244

238245
```sh
239246
# BM25 full-text search (requires a BM25 index on the column)
240-
hotdata search "<query>" --type bm25 --table <connection.schema.table> --column <column> [--select <columns>] [--limit <n>] [-o table|json|csv]
247+
hotdata search "<query>" --table <connection.schema.table> [--type bm25] [--column <column>] [--select <columns>] [--limit <n>] [-o table|json|csv]
241248

242249
# Vector search (requires a vector index with auto-embedding on the column)
243-
hotdata search "<query>" --type vector --table <table> --column <source_text_column> [--limit <n>]
250+
hotdata search "<query>" --table <table> [--type vector] [--column <source_text_column>] [--limit <n>]
244251
```
245252

246253
- **`--type vector`** — pass your query as **plain text**, name the **source text column** (e.g. `title`). The server embeds the query at the same time, using the same provider that auto-embedded the column when the index was built — so distance metric, model, and dimensions all match automatically. No `OPENAI_API_KEY`, no client-side embedding, no need to know about the auto-generated `_embedding` column. Generated SQL: `vector_distance(col, 'query')` server-side.
@@ -255,17 +262,21 @@ hotdata search "<query>" --type vector --table <table> --column <source_text_col
255262
Indexes attach to either a connection-table (`--connection-id` + `--schema` + `--table`) or a dataset (`--dataset-id`). The two scopes are mutually exclusive.
256263

257264
```sh
258-
# Connection-table scope
265+
# Managed database scope (catalog alias resolves via active database)
266+
hotdata indexes create --catalog <alias> --schema <schema> --table <table> \
267+
--column <cols> --type bm25|vector|sorted \
268+
[--name <name>] [--metric l2|cosine|dot] [--async] \
269+
[--embedding-provider-id <id>] [--dimensions <n>] [--output-column <name>] [--description <text>]
270+
271+
# Connection-table scope (for non-managed connections)
259272
hotdata indexes list --connection-id <id> --schema <schema> --table <table> [-o table|json|yaml]
260273
hotdata indexes create --connection-id <id> --schema <schema> --table <table> \
261-
--name <name> --columns <cols> --type sorted|bm25|vector \
262-
[--metric l2|cosine|dot] [--async] \
263-
[--embedding-provider-id <id>] [--dimensions <n>] [--output-column <name>] [--description <text>]
274+
--column <cols> --type sorted|bm25|vector [--name <name>] ...
264275
hotdata indexes delete --connection-id <id> --schema <schema> --table <table> --name <name>
265276

266277
# Dataset scope
267278
hotdata indexes list --dataset-id <id> [-o table|json|yaml]
268-
hotdata indexes create --dataset-id <id> --name <name> --columns <cols> --type sorted|bm25|vector ...
279+
hotdata indexes create --dataset-id <id> --column <cols> --type sorted|bm25|vector [--name <name>] ...
269280
hotdata indexes delete --dataset-id <id> --name <name>
270281
```
271282

skills/hotdata-search/SKILL.md

Lines changed: 15 additions & 10 deletions
@@ -16,15 +16,15 @@ Retrieval workloads in Hotdata: **BM25 full-text**, **vector similarity**, and t
1616

1717
## Search CLI
1818

19-
`--type` is **required**: `bm25` or `vector`. Both run server-side.
19+
Both run server-side. `--type` and `--column` are **optional** when the table has exactly one search index — they are inferred automatically. Specify them when multiple indexes exist.
2020

2121
```bash
2222
# BM25 (requires a BM25 index on the column)
23-
hotdata search "<query>" --type bm25 --table <connection.schema.table> --column <column> \
23+
hotdata search "<query>" --table <connection.schema.table> [--type bm25] [--column <column>] \
2424
[--select <columns>] [--limit <n>] [--workspace-id <workspace_id>] [--output table|json|csv]
2525

2626
# Vector (requires a vector index; server auto-embeds the query text)
27-
hotdata search "<query>" --type vector --table <connection.schema.table> --column <source_text_column> \
27+
hotdata search "<query>" --table <connection.schema.table> [--type vector] [--column <source_text_column>] \
2828
[--select <columns>] [--limit <n>] [--workspace-id <workspace_id>] [--output table|json|csv]
2929
```
3030

@@ -33,6 +33,7 @@ hotdata search "<query>" --type vector --table <connection.schema.table> --colum
3333
| **`bm25`** | Server generates `bm25_search(table, col, 'text')`. Results sort by score (descending). |
3434
| **`vector`** | Pass plain-text query; name the **source text column** (e.g. `title`). Server embeds using the same provider/metric/dimensions as the index. SQL uses `vector_distance(col, 'text')`. Results sort by distance (ascending). |
3535

36+
- **Inference:** when `--type` or `--column` is omitted, the CLI fetches the table's indexes and selects the only BM25/vector index. If multiple exist, you must specify both flags.
3637
- **No vector index, or custom embedding model?** Use raw SQL via `hotdata query` (e.g. `cosine_distance(col, [<vec>])`). The removed `--model` / stdin-vector paths hardcoded `l2_distance` and are not supported.
3738
- **Before search:** create the right index (`indexes create --type bm25` or `--type vector`). See [references/INDEXES.md](references/INDEXES.md).
3839
- Default `--limit` is 10.
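
The inference rule described in the bullets above can be pictured as a small decision function. This is only an illustrative sketch, not the CLI's actual code: `infer_search_flags` and the `<type>:<column>` argument format are hypothetical, and treating `sorted` indexes as non-search indexes is an assumption.

```sh
# Hypothetical sketch of the inference rule (not the hotdata implementation).
# Each argument describes one index on the table as "<type>:<column>".
infer_search_flags() {
  match="" count=0
  for idx in "$@"; do
    case "$idx" in
      bm25:*|vector:*) match=$idx; count=$((count + 1)) ;;  # search indexes only
    esac
  done
  if [ "$count" -eq 1 ]; then
    # Exactly one search index: both flags can be inferred from it.
    echo "--type ${match%%:*} --column ${match#*:}"
  else
    echo "cannot infer: pass --type and --column explicitly" >&2
    return 1
  fi
}

infer_search_flags sorted:id bm25:body
```

With zero or several search indexes the function fails, which corresponds to the case where both flags must be given on the command line.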
@@ -48,15 +49,19 @@ Indexes attach to a **connection table** (`--connection-id` + `--schema` + `--ta
4849
hotdata indexes list [--connection-id <id>] [--schema <schema>] [--table <table>] [--workspace-id <ws>] [--output table|json|yaml]
4950
hotdata indexes list --dataset-id <dataset_id> [--workspace-id <ws>] [--output table|json|yaml]
5051

51-
# Connection table
52-
hotdata indexes create --connection-id <id> --schema <schema> --table <table> \
53-
--name <name> --columns <cols> --type bm25|vector \
54-
[--metric l2|cosine|dot] [--async] \
52+
# Managed database (catalog alias — uses the active database when the catalog matches)
53+
hotdata indexes create --catalog <alias> --schema <schema> --table <table> \
54+
--column <col> --type bm25|vector \
55+
[--name <name>] [--metric l2|cosine|dot] [--async] \
5556
[--embedding-provider-id <id>] [--dimensions <n>] [--output-column <name>] [--description <text>]
57+
58+
# Connection table (raw connection ID)
59+
hotdata indexes create --connection-id <id> --schema <schema> --table <table> \
60+
--column <col> --type bm25|vector [--name <name>] ...
5661
hotdata indexes delete --connection-id <id> --schema <schema> --table <table> --name <name>
5762

5863
# Dataset
59-
hotdata indexes create --dataset-id <dataset_id> --name <name> --columns <cols> --type bm25|vector ...
64+
hotdata indexes create --dataset-id <dataset_id> --column <col> --type bm25|vector [--name <name>] ...
6065
hotdata indexes delete --dataset-id <dataset_id> --name <name>
6166
```
6267

@@ -89,6 +94,6 @@ hotdata embedding-providers delete <id> [--workspace-id <workspace_id>]
8994

9095
1. `hotdata tables list --connection-id <id>` — confirm column types.
9196
2. `hotdata indexes list` — avoid duplicate indexes.
92-
3. `hotdata indexes create ... --type bm25|vector` (add `--async` if large).
93-
4. `hotdata search "..." --type bm25|vector --table ... --column ...`
97+
3. `hotdata indexes create --catalog <alias> --table <table> --column <col> --type bm25|vector` (add `--async` if large).
98+
4. `hotdata search "..." --table <catalog.table>` (`--type` and `--column` are inferred when there is one search index).
9499
5. Record what exists in **context:DATAMODEL** (core skill) when the workspace should remember index choices.

skills/hotdata-search/references/INDEXES.md

Lines changed: 14 additions & 2 deletions
@@ -30,12 +30,24 @@ Skip duplicates (same table, column, and purpose).
3030

3131
## 3. Create indexes
3232

33+
For managed databases (catalog alias — auto-selects the active database connection):
34+
35+
```bash
36+
hotdata indexes create --catalog <alias> --schema <schema> --table <table> \
37+
--column body --type bm25
38+
39+
hotdata indexes create --catalog <alias> --schema <schema> --table <table> \
40+
--column embedding --type vector --metric cosine
41+
```
42+
43+
For regular connections (explicit connection ID):
44+
3345
```bash
3446
hotdata indexes create --connection-id <id> --schema <schema> --table <table> \
35-
--name idx_posts_body_bm25 --columns body --type bm25
47+
--name idx_posts_body_bm25 --column body --type bm25
3648

3749
hotdata indexes create --connection-id <id> --schema <schema> --table <table> \
38-
--name idx_chunks_embedding --columns embedding --type vector --metric cosine
50+
--name idx_chunks_embedding --column embedding --type vector --metric cosine
3951
```
4052

4153
Large builds: `--async`, then `hotdata jobs list` / `hotdata jobs <job_id>`.

skills/hotdata/SKILL.md

Lines changed: 17 additions & 14 deletions
@@ -189,25 +189,28 @@ hotdata connections create \
189189
hotdata databases list [--workspace-id <workspace_id>] [--output table|json|yaml]
190190
hotdata databases create [--name <display_name>] [--catalog <alias>] [--table <table> ...] [--schema public] [--expires-at <duration|timestamp>] [--workspace-id <workspace_id>] [--output table|json|yaml]
191191
hotdata databases set <id_or_name>
192+
hotdata databases unset
192193
hotdata databases <id_or_name> [--workspace-id <workspace_id>] [--output table|json|yaml]
193194
hotdata databases delete <id_or_name> [--workspace-id <workspace_id>]
194195
hotdata databases run [--database <id>] [--name <label>] [--schema public] [--table <table> ...] [--expires-at <duration|timestamp>] [--workspace-id <workspace_id>] <cmd> [args...]
195196
hotdata databases <id> run <cmd> [args...]
196197
197-
# Dot-notation shorthand for load: database.table or database.schema.table
198-
hotdata databases load <database.table> [--file ./data.parquet] [--url <url>] [--upload-id <id>] [--workspace-id <workspace_id>]
198+
# Preferred: load by catalog alias (auto-declares table if needed)
199+
hotdata databases load --catalog <alias> --table <table> [--schema public] (--file <path> | --url <url> | --upload-id <id>) [--workspace-id <workspace_id>]
199200
201+
# Also available via tables subcommand
200202
hotdata databases tables list [--database <id_or_name>] [--schema <name>] [--workspace-id <workspace_id>] [--output table|json|yaml]
201-
hotdata databases tables load <table> [--database <id_or_name>] [--schema public] [--file ./data.parquet] [--url <url>] [--upload-id <id>] [--workspace-id <workspace_id>]
203+
hotdata databases tables load <table> [--database <id_or_name>] [--schema public] (--file <path> | --url <url> | --upload-id <id>) [--workspace-id <workspace_id>]
202204
hotdata databases tables delete <table> [--database <id_or_name>] [--schema public] [--workspace-id <workspace_id>]
203205
```
204206

205-
- `list` — all managed databases in the workspace.
207+
- `list` — all managed databases in the workspace. Active database is marked with `*`.
206208
- `create` — creates a new managed database. `--name` is an optional human-readable display name. `--catalog` sets the SQL alias used in queries (`SELECT … FROM <catalog>.schema.table`); must be `[a-z_][a-z0-9_]*`. `--expires-at` accepts relative durations (`24h`, `7d`, `90m`) or an RFC 3339 timestamp; omitting means no expiry. Repeat `--table` to declare tables up front.
207209
- `set` — saves `<id_or_name>` as the active database. Subsequent `databases tables` and `context` commands use it automatically.
210+
- `unset` — clears the active database from config.
208211
- `<id_or_name>` — inspect one database (id, catalog, name, expires_at).
209212
- `delete` — removes the managed database; clears the active-database config if it matched.
210-
- `load` shorthand with dot notation (`database.table` or `database.schema.table`). Schema defaults to `public`.
213+
- `load` (top-level shorthand) — loads parquet into `--catalog.--schema.--table`. Accepts `--file`, `--url`, or `--upload-id`. If the table was not declared at create time, the CLI automatically deletes and recreates the database with the table declared, then retries the load.
211214
- `tables list` — lists tables with `TABLE` (`<catalog>.<schema>.<table>`), `SYNCED`, `LAST_SYNC`. Uses active database when `--database` is omitted.
212215
- `tables load` — uploads a local parquet file (`--file`), a remote parquet URL (`--url`), or a pre-staged upload (`--upload-id`) and publishes with **replace** mode.
213216
- `tables delete` — drops a table from the managed database.
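
The usage lines above write the data source as `(--file <path> | --url <url> | --upload-id <id>)`, which reads as "exactly one of these". A script that assembles the command could verify that before invoking the CLI. A minimal sketch: `has_one_source` is a hypothetical helper, and it assumes flags are passed as separate words (`--file ./x.parquet`, not `--file=./x.parquet`).

```sh
# Hypothetical pre-flight check (not part of the hotdata CLI).
# Succeeds only when exactly one source flag appears in the arguments.
has_one_source() {
  n=0
  for arg in "$@"; do
    case "$arg" in
      --file|--url|--upload-id) n=$((n + 1)) ;;
    esac
  done
  [ "$n" -eq 1 ]
}

has_one_source --catalog sales --table orders --file ./orders.parquet && echo "ok"
```

How the CLI itself reacts to zero or multiple source flags is not stated here, so the check is a convenience for scripts rather than a description of its behavior.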
@@ -216,10 +219,9 @@ hotdata databases tables delete <table> [--database <id_or_name>] [--schema publ
216219
Example:
217220

218221
```
219-
hotdata databases create --name "Sales reporting" --catalog sales --table orders
220-
hotdata databases set <returned-id>
221-
hotdata databases tables load orders --file ./orders.parquet
222-
hotdata query "SELECT count(*) FROM sales.public.orders"
222+
hotdata databases create --catalog airbnb
223+
hotdata databases load --catalog airbnb --table listings --url https://example.com/listings.parquet
224+
hotdata query "SELECT count(*) FROM airbnb.public.listings"
223225
```
224226

225227
### List Tables and Columns
@@ -457,17 +459,18 @@ Use a sandbox to explore tables and capture **analysis-oriented** notes in sandb
457459

458460
## Workflow: Creating a managed database (parquet)
459461

460-
1. Create the database and declare tables up front:
462+
1. Create the database with a catalog alias:
461463
```
462-
hotdata databases create --name mydb --table events --table users
464+
hotdata databases create --catalog mydb
463465
```
464-
2. Load parquet into each table:
466+
2. Load parquet per table (tables are auto-declared if needed):
465467
```
466-
hotdata databases tables load mydb events --file ./events.parquet
468+
hotdata databases load --catalog mydb --table events --file ./events.parquet
469+
hotdata databases load --catalog mydb --table events --url https://example.com/events.parquet
467470
```
468471
3. Confirm tables and query:
469472
```
470-
hotdata databases tables list mydb
473+
hotdata databases tables list
471474
hotdata query "SELECT * FROM mydb.public.events LIMIT 10"
472475
```
473476

skills/hotdata/references/WORKFLOWS.md

Lines changed: 11 additions & 12 deletions
@@ -68,12 +68,12 @@ End-to-end checklists. Use the linked sections for command detail and guardrails
6868
1. [ ] `hotdata tables list --connection-id <id>` — pick text column (BM25) or embedding/text column (vector)
6969
2. [ ] `hotdata indexes list` — avoid duplicate bm25/vector indexes on the same column
7070
3. [ ] Create index:
71-
- [ ] **Keyword:** `hotdata indexes create … --type bm25 --columns <text_col>`
72-
- [ ] **Semantic:** `hotdata indexes create … --type vector --columns <col> [--metric cosine|l2|dot]`
71+
- [ ] **Managed DB:** `hotdata indexes create --catalog <alias> --table <tbl> --column <text_col> --type bm25|vector`
72+
- [ ] **Connection:** `hotdata indexes create --connection-id <id> --schema <s> --table <t> --column <col> --type bm25|vector [--metric cosine|l2|dot]`
7373
- [ ] Large build: add `--async`, then `hotdata jobs <job_id>`
74-
4. [ ] Search:
75-
- [ ] `hotdata search "…" --type bm25 --table <connection.schema.table> --column <col>`
76-
- [ ] `hotdata search "…" --type vector --table … --column <source_text_col>`
74+
4. [ ] Search (--type and --column inferred when one search index exists):
75+
- [ ] `hotdata search "…" --table <catalog.schema.table>` (auto-infer)
76+
- [ ] `hotdata search "…" --table … --type bm25 --column <col>` (explicit)
7777
5. [ ] (Optional) Note indexes in **context:DATAMODEL → Search & index summary**
7878

7979
**Detail:** [hotdata-search INDEXES.md](../../hotdata-search/references/INDEXES.md)
@@ -116,24 +116,23 @@ Both land queryable tables in the workspace; the path depends on **format** and
116116

117117
### Workflow: managed database (parquet)
118118

119-
1. Create the database and **declare tables** up front:
119+
1. Create the database with a catalog alias:
120120

121121
```bash
122-
hotdata databases create --name sales --table orders --table customers
122+
hotdata databases create --catalog sales
123123
```
124124

125-
2. Load parquet per table:
125+
2. Load parquet per table (tables are auto-declared if needed):
126126

127127
```bash
128-
hotdata databases tables load sales orders --file ./orders.parquet
128+
hotdata databases load --catalog sales --table orders --file ./orders.parquet
129+
hotdata databases load --catalog sales --table customers --url https://example.com/customers.parquet
129130
```
130131

131-
If load fails with *not declared*, add `--table` at create time. There is no `--url` on load — download parquet locally first.
132-
133132
3. Confirm and query:
134133

135134
```bash
136-
hotdata databases tables list sales
135+
hotdata databases tables list
137136
hotdata query "SELECT count(*) FROM sales.public.orders"
138137
```
139138
