docs: update README and skills to reflect new CLI syntax

eddietejeda · eddietejeda · commit 80ab6d629b32 · 2026-06-03T23:41:43.000-07:00
- databases load: explicit --catalog/--schema/--table flags (no more dot-notation)
- databases list: note * marker on active database
- databases set/unset: documented
- indexes create: --catalog option for managed databases (in addition to --connection-id)
- search: --type and --column are now optional (inferred from indexes)
- workflows: updated examples throughout
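
The docs touched here state that a `--catalog` alias must match `[a-z_][a-z0-9_]*`. A minimal sketch of a pre-flight check for that rule follows; the helper name `is_valid_catalog` is hypothetical and not part of the `hotdata` CLI, and the CLI's own validation may differ in detail.

```sh
# Hypothetical helper (not part of hotdata): succeeds when the argument
# matches the documented catalog alias rule [a-z_][a-z0-9_]*.
# The body runs in a subshell so the locale override does not leak out.
is_valid_catalog() (
  LC_ALL=C  # keep [a-z] limited to ASCII lowercase letters
  case "$1" in
    '' | [!a-z_]* | *[!a-z0-9_]*) exit 1 ;;  # empty, bad first char, or bad char anywhere
    *) exit 0 ;;
  esac
)

alias_name="airbnb"
if is_valid_catalog "$alias_name"; then
  echo "ok: $alias_name"
else
  echo "invalid catalog alias: $alias_name" >&2
fi
```

A script could run this check before `hotdata databases create --catalog "$alias_name"` so that a malformed alias is rejected locally.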
diff --git a/README.md b/README.md
@@ -135,28 +135,35 @@ Managed databases are Hotdata-owned catalogs you create and populate yourself (n
 ```sh
 hotdata databases list [-w <id>] [-o table|json|yaml]
 hotdata databases create [--name <display_name>] [--catalog <alias>] [--table <table> ...] [--schema public] [--expires-at <duration|timestamp>] [-o table|json|yaml]
+hotdata databases set <id>
+hotdata databases unset
 hotdata databases <name_or_id> [-o table|json|yaml]
 hotdata databases delete <name_or_id>
 hotdata databases run [--database <id>] [--name <label>] [--schema public] [--table <table> ...] [--expires-at <duration|timestamp>] <cmd> [args...]
 hotdata databases <id> run <cmd> [args...]
 
-hotdata databases tables list <database> [--schema <name>] [-o table|json|yaml]
-hotdata databases tables load <database> <table> --file ./data.parquet [--schema public]
-hotdata databases tables load <database> <table> --upload-id <id> [--schema public]
-hotdata databases tables delete <database> <table> [--schema public]
+# Preferred: load by catalog alias (auto-declares table if needed)
+hotdata databases load --catalog <alias> --table <table> [--schema public] (--file <path> | --url <url> | --upload-id <id>)
+
+# Also available: explicit database flag
+hotdata databases tables list [--database <id_or_name>] [--schema <name>] [-o table|json|yaml]
+hotdata databases tables load <table> [--database <id_or_name>] [--schema public] (--file <path> | --url <url> | --upload-id <id>)
+hotdata databases tables delete <table> [--database <id_or_name>] [--schema public]
 ```
 
-- `create` registers a managed connection with no external credentials. `--name` is a human-readable display name; `--catalog` sets the SQL alias used in queries (`SELECT … FROM <catalog>.schema.table`) and must be `[a-z_][a-z0-9_]*`. Use `--table` to declare tables up front (required before `tables load` on the current API).
+- `create` registers a managed connection with no external credentials. `--name` is a human-readable display name; `--catalog` sets the SQL alias used in queries (`SELECT … FROM <catalog>.schema.table`) and must be `[a-z_][a-z0-9_]*`.
+- `set` / `unset` — save or clear the active database. All `databases tables` and `context` commands default to it. The active database is marked with `*` in `databases list`.
+- `load` (top-level shorthand) — loads a parquet file into `<catalog>.<schema>.<table>`, as given by `--catalog`, `--schema`, and `--table`. If the table was not declared at create time, the CLI automatically deletes and recreates the database with the table declared, then retries the load.
 - `tables load` uploads a **parquet** file (or uses a staged `upload_id` from `POST /v1/files`) and publishes it as the table generation (`replace` mode).
-- `run` mints a database-scoped JWT and execs `<cmd>` with `HOTDATA_DATABASE_TOKEN`, `HOTDATA_DATABASE_REFRESH_TOKEN`, `HOTDATA_DATABASE`, `HOTDATA_WORKSPACE`, and `HOTDATA_API_URL` injected into its environment. Pass a database id (group-positional `<id>` like `sandbox run`, or `--database <id>`) to scope an existing database; omit both to auto-create a scratch one using `--name` / `--schema` / `--table` / `--expires-at`. Useful for launching an agent or child process whose API access is restricted to a single database.
+- `run` mints a database-scoped JWT and execs `<cmd>` with `HOTDATA_DATABASE_TOKEN`, `HOTDATA_DATABASE_REFRESH_TOKEN`, `HOTDATA_DATABASE`, `HOTDATA_WORKSPACE`, and `HOTDATA_API_URL` injected into its environment.
 - For CSV/JSON uploads without a managed database, use `hotdata datasets create` instead (`datasets.main.*`).
 
 Example:
 
 ```sh
-hotdata databases create --name "Sales reporting" --catalog sales --table orders
-hotdata databases tables load sales orders --file ./orders.parquet
-hotdata query "SELECT count(*) FROM sales.public.orders"
+hotdata databases create --catalog airbnb
+hotdata databases load --catalog airbnb --table listings --url https://example.com/listings.parquet
+hotdata query "SELECT count(*) FROM airbnb.public.listings"
 ```
 
 ## Tables
@@ -233,14 +240,14 @@ hotdata queries <query_run_id> [-o table|json|yaml]
 
 ## Search
 
-`--type` is **required** — no default. Pass either `vector` (similarity search via the index's embedding provider) or `bm25` (full-text search). Both run entirely server-side.
+Both search types (`bm25` full-text and `vector` similarity) run entirely server-side. `--type` and `--column` are **optional** when the table has exactly one search index; they are then inferred automatically. Pass them explicitly when multiple indexes exist.
 
 ```sh
 # BM25 full-text search (requires a BM25 index on the column)
-hotdata search "<query>" --type bm25 --table <connection.schema.table> --column <column> [--select <columns>] [--limit <n>] [-o table|json|csv]
+hotdata search "<query>" --table <connection.schema.table> [--type bm25] [--column <column>] [--select <columns>] [--limit <n>] [-o table|json|csv]
 
 # Vector search (requires a vector index with auto-embedding on the column)
-hotdata search "<query>" --type vector --table <table> --column <source_text_column> [--limit <n>]
+hotdata search "<query>" --table <table> [--type vector] [--column <source_text_column>] [--limit <n>]
 ```
 
 - **`--type vector`** — pass your query as **plain text**, name the **source text column** (e.g. `title`). The server embeds the query at the same time, using the same provider that auto-embedded the column when the index was built — so distance metric, model, and dimensions all match automatically. No `OPENAI_API_KEY`, no client-side embedding, no need to know about the auto-generated `_embedding` column. Generated SQL: `vector_distance(col, 'query')` server-side.
@@ -255,17 +262,21 @@ hotdata search "<query>" --type vector --table <table> --column <source_text_col
 Indexes attach to either a connection-table (`--connection-id` + `--schema` + `--table`) or a dataset (`--dataset-id`). The two scopes are mutually exclusive.
 
 ```sh
-# Connection-table scope
+# Managed database scope (catalog alias resolves via active database)
+hotdata indexes create --catalog <alias> --schema <schema> --table <table> \
+  --column <cols> --type bm25|vector|sorted \
+  [--name <name>] [--metric l2|cosine|dot] [--async] \
+  [--embedding-provider-id <id>] [--dimensions <n>] [--output-column <name>] [--description <text>]
+
+# Connection-table scope (for non-managed connections)
 hotdata indexes list   --connection-id <id> --schema <schema> --table <table> [-o table|json|yaml]
 hotdata indexes create --connection-id <id> --schema <schema> --table <table> \
-  --name <name> --columns <cols> --type sorted|bm25|vector \
-  [--metric l2|cosine|dot] [--async] \
-  [--embedding-provider-id <id>] [--dimensions <n>] [--output-column <name>] [--description <text>]
+  --column <cols> --type sorted|bm25|vector [--name <name>] ...
 hotdata indexes delete --connection-id <id> --schema <schema> --table <table> --name <name>
 
 # Dataset scope
 hotdata indexes list   --dataset-id <id> [-o table|json|yaml]
-hotdata indexes create --dataset-id <id> --name <name> --columns <cols> --type sorted|bm25|vector ...
+hotdata indexes create --dataset-id <id> --column <cols> --type sorted|bm25|vector [--name <name>] ...
 hotdata indexes delete --dataset-id <id> --name <name>
 ```
 
diff --git a/skills/hotdata-search/SKILL.md b/skills/hotdata-search/SKILL.md
@@ -16,15 +16,15 @@ Retrieval workloads in Hotdata: **BM25 full-text**, **vector similarity**, and t
 
 ## Search CLI
 
-`--type` is **required**: `bm25` or `vector`. Both run server-side.
+Both search types (`bm25` and `vector`) run server-side. `--type` and `--column` are **optional** when the table has exactly one search index; they are then inferred automatically. Specify them when multiple indexes exist.
 
 ```bash
 # BM25 (requires a BM25 index on the column)
-hotdata search "<query>" --type bm25 --table <connection.schema.table> --column <column> \
+hotdata search "<query>" --table <connection.schema.table> [--type bm25] [--column <column>] \
   [--select <columns>] [--limit <n>] [--workspace-id <workspace_id>] [--output table|json|csv]
 
 # Vector (requires a vector index; server auto-embeds the query text)
-hotdata search "<query>" --type vector --table <connection.schema.table> --column <source_text_column> \
+hotdata search "<query>" --table <connection.schema.table> [--type vector] [--column <source_text_column>] \
   [--select <columns>] [--limit <n>] [--workspace-id <workspace_id>] [--output table|json|csv]
 ```
 
@@ -33,6 +33,7 @@ hotdata search "<query>" --type vector --table <connection.schema.table> --colum
 | **`bm25`** | Server generates `bm25_search(table, col, 'text')`. Results sort by score (descending). |
 | **`vector`** | Pass plain-text query; name the **source text column** (e.g. `title`). Server embeds using the same provider/metric/dimensions as the index. SQL uses `vector_distance(col, 'text')`. Results sort by distance (ascending). |
 
+- **Inference:** when `--type` or `--column` is omitted, the CLI fetches the table's indexes and selects the sole BM25/vector index. If multiple exist, you must specify both flags.
 - **No vector index, or custom embedding model?** Use raw SQL via `hotdata query` (e.g. `cosine_distance(col, [<vec>])`). The removed `--model` / stdin-vector paths hardcoded `l2_distance` and are not supported.
 - **Before search:** create the right index (`indexes create --type bm25` or `--type vector`). See [references/INDEXES.md](references/INDEXES.md).
 - Default `--limit` is 10.
@@ -48,15 +49,19 @@ Indexes attach to a **connection table** (`--connection-id` + `--schema` + `--ta
 hotdata indexes list [--connection-id <id>] [--schema <schema>] [--table <table>] [--workspace-id <ws>] [--output table|json|yaml]
 hotdata indexes list --dataset-id <dataset_id> [--workspace-id <ws>] [--output table|json|yaml]
 
-# Connection table
-hotdata indexes create --connection-id <id> --schema <schema> --table <table> \
-  --name <name> --columns <cols> --type bm25|vector \
-  [--metric l2|cosine|dot] [--async] \
+# Managed database (catalog alias — uses the active database when the catalog matches)
+hotdata indexes create --catalog <alias> --schema <schema> --table <table> \
+  --column <col> --type bm25|vector \
+  [--name <name>] [--metric l2|cosine|dot] [--async] \
   [--embedding-provider-id <id>] [--dimensions <n>] [--output-column <name>] [--description <text>]
+
+# Connection table (raw connection ID)
+hotdata indexes create --connection-id <id> --schema <schema> --table <table> \
+  --column <col> --type bm25|vector [--name <name>] ...
 hotdata indexes delete --connection-id <id> --schema <schema> --table <table> --name <name>
 
 # Dataset
-hotdata indexes create --dataset-id <dataset_id> --name <name> --columns <cols> --type bm25|vector ...
+hotdata indexes create --dataset-id <dataset_id> --column <col> --type bm25|vector [--name <name>] ...
 hotdata indexes delete --dataset-id <dataset_id> --name <name>
 ```
 
@@ -89,6 +94,6 @@ hotdata embedding-providers delete <id> [--workspace-id <workspace_id>]
 
 1. `hotdata tables list --connection-id <id>` — confirm column types.
 2. `hotdata indexes list` — avoid duplicate indexes.
-3. `hotdata indexes create ... --type bm25|vector` (add `--async` if large).
-4. `hotdata search "..." --type bm25|vector --table ... --column ...`
+3. `hotdata indexes create --catalog <alias> --table <table> --column <col> --type bm25|vector` (add `--async` if large).
+4. `hotdata search "..." --table <catalog.schema.table>` — `--type` and `--column` are inferred when there is exactly one search index.
 5. Record what exists in **context:DATAMODEL** (core skill) when the workspace should remember index choices.
diff --git a/skills/hotdata-search/references/INDEXES.md b/skills/hotdata-search/references/INDEXES.md
@@ -30,12 +30,24 @@ Skip duplicates (same table, column, and purpose).
 
 ## 3. Create indexes
 
+For managed databases (catalog alias — auto-selects the active database connection):
+
+```bash
+hotdata indexes create --catalog <alias> --schema <schema> --table <table> \
+  --column body --type bm25
+
+hotdata indexes create --catalog <alias> --schema <schema> --table <table> \
+  --column embedding --type vector --metric cosine
+```
+
+For regular connections (explicit connection ID):
+
 ```bash
 hotdata indexes create --connection-id <id> --schema <schema> --table <table> \
-  --name idx_posts_body_bm25 --columns body --type bm25
+  --name idx_posts_body_bm25 --column body --type bm25
 
 hotdata indexes create --connection-id <id> --schema <schema> --table <table> \
-  --name idx_chunks_embedding --columns embedding --type vector --metric cosine
+  --name idx_chunks_embedding --column embedding --type vector --metric cosine
 ```
 
 Large builds: `--async`, then `hotdata jobs list` / `hotdata jobs <job_id>`.
diff --git a/skills/hotdata/SKILL.md b/skills/hotdata/SKILL.md
@@ -189,25 +189,28 @@ hotdata connections create \
 hotdata databases list [--workspace-id <workspace_id>] [--output table|json|yaml]
 hotdata databases create [--name <display_name>] [--catalog <alias>] [--table <table> ...] [--schema public] [--expires-at <duration|timestamp>] [--workspace-id <workspace_id>] [--output table|json|yaml]
 hotdata databases set <id_or_name>
+hotdata databases unset
 hotdata databases <id_or_name> [--workspace-id <workspace_id>] [--output table|json|yaml]
 hotdata databases delete <id_or_name> [--workspace-id <workspace_id>]
 hotdata databases run [--database <id>] [--name <label>] [--schema public] [--table <table> ...] [--expires-at <duration|timestamp>] [--workspace-id <workspace_id>] <cmd> [args...]
 hotdata databases <id> run <cmd> [args...]
 
-# Dot-notation shorthand for load: database.table or database.schema.table
-hotdata databases load <database.table> [--file ./data.parquet] [--url <url>] [--upload-id <id>] [--workspace-id <workspace_id>]
+# Preferred: load by catalog alias (auto-declares table if needed)
+hotdata databases load --catalog <alias> --table <table> [--schema public] (--file <path> | --url <url> | --upload-id <id>) [--workspace-id <workspace_id>]
 
+# Also available via tables subcommand
 hotdata databases tables list [--database <id_or_name>] [--schema <name>] [--workspace-id <workspace_id>] [--output table|json|yaml]
-hotdata databases tables load <table> [--database <id_or_name>] [--schema public] [--file ./data.parquet] [--url <url>] [--upload-id <id>] [--workspace-id <workspace_id>]
+hotdata databases tables load <table> [--database <id_or_name>] [--schema public] (--file <path> | --url <url> | --upload-id <id>) [--workspace-id <workspace_id>]
 hotdata databases tables delete <table> [--database <id_or_name>] [--schema public] [--workspace-id <workspace_id>]
 ```
 
-- `list` — all managed databases in the workspace.
+- `list` — all managed databases in the workspace. The active database is marked with `*`.
 - `create` — creates a new managed database. `--name` is an optional human-readable display name. `--catalog` sets the SQL alias used in queries (`SELECT … FROM <catalog>.schema.table`); must be `[a-z_][a-z0-9_]*`. `--expires-at` accepts relative durations (`24h`, `7d`, `90m`) or an RFC 3339 timestamp; omitting means no expiry. Repeat `--table` to declare tables up front.
 - `set` — saves `<id_or_name>` as the active database. Subsequent `databases tables` and `context` commands use it automatically.
+- `unset` — clears the active database from config.
 - `<id_or_name>` — inspect one database (id, catalog, name, expires_at).
 - `delete` — removes the managed database; clears the active-database config if it matched.
-- `load` — shorthand with dot notation (`database.table` or `database.schema.table`). Schema defaults to `public`.
+- `load` (top-level shorthand) — loads parquet into `<catalog>.<schema>.<table>`, as given by `--catalog`, `--schema`, and `--table`. Accepts `--file`, `--url`, or `--upload-id`. If the table was not declared at create time, the CLI automatically deletes and recreates the database with the table declared, then retries the load.
 - `tables list` — lists tables with `TABLE` (`<catalog>.<schema>.<table>`), `SYNCED`, `LAST_SYNC`. Uses active database when `--database` is omitted.
 - `tables load` — uploads a local parquet file (`--file`), a remote parquet URL (`--url`), or a pre-staged upload (`--upload-id`) and publishes with **replace** mode.
 - `tables delete` — drops a table from the managed database.
@@ -216,10 +219,9 @@ hotdata databases tables delete <table> [--database <id_or_name>] [--schema publ
 Example:
 
 ```
-hotdata databases create --name "Sales reporting" --catalog sales --table orders
-hotdata databases set <returned-id>
-hotdata databases tables load orders --file ./orders.parquet
-hotdata query "SELECT count(*) FROM sales.public.orders"
+hotdata databases create --catalog airbnb
+hotdata databases load --catalog airbnb --table listings --url https://example.com/listings.parquet
+hotdata query "SELECT count(*) FROM airbnb.public.listings"
 ```
 
 ### List Tables and Columns
@@ -457,17 +459,18 @@ Use a sandbox to explore tables and capture **analysis-oriented** notes in sandb
 
 ## Workflow: Creating a managed database (parquet)
 
-1. Create the database and declare tables up front:
+1. Create the database with a catalog alias:
    ```
-   hotdata databases create --name mydb --table events --table users
+   hotdata databases create --catalog mydb
    ```
-2. Load parquet into each table:
+2. Load parquet per table (tables are auto-declared if needed):
    ```
-   hotdata databases tables load mydb events --file ./events.parquet
+   hotdata databases load --catalog mydb --table events --file ./events.parquet
+   hotdata databases load --catalog mydb --table users --url https://example.com/users.parquet
    ```
 3. Confirm tables and query:
    ```
-   hotdata databases tables list mydb
+   hotdata databases tables list
    hotdata query "SELECT * FROM mydb.public.events LIMIT 10"
    ```
 
diff --git a/skills/hotdata/references/WORKFLOWS.md b/skills/hotdata/references/WORKFLOWS.md
@@ -68,12 +68,12 @@ End-to-end checklists. Use the linked sections for command detail and guardrails
 1. [ ] `hotdata tables list --connection-id <id>` — pick text column (BM25) or embedding/text column (vector)
 2. [ ] `hotdata indexes list` — avoid duplicate bm25/vector indexes on the same column
 3. [ ] Create index:
-   - [ ] **Keyword:** `hotdata indexes create … --type bm25 --columns <text_col>`
-   - [ ] **Semantic:** `hotdata indexes create … --type vector --columns <col> [--metric cosine|l2|dot]`
+   - [ ] **Managed DB:** `hotdata indexes create --catalog <alias> --table <tbl> --column <text_col> --type bm25|vector`
+   - [ ] **Connection:** `hotdata indexes create --connection-id <id> --schema <s> --table <t> --column <col> --type bm25|vector [--metric cosine|l2|dot]`
    - [ ] Large build: add `--async`, then `hotdata jobs <job_id>`
-4. [ ] Search:
-   - [ ] `hotdata search "…" --type bm25 --table <connection.schema.table> --column <col>`
-   - [ ] `hotdata search "…" --type vector --table … --column <source_text_col>`
+4. [ ] Search (`--type` and `--column` are inferred when exactly one search index exists):
+   - [ ] `hotdata search "…" --table <catalog.schema.table>` (auto-infer)
+   - [ ] `hotdata search "…" --table … --type bm25 --column <col>` (explicit)
 5. [ ] (Optional) Note indexes in **context:DATAMODEL → Search & index summary**
 
 **Detail:** [hotdata-search INDEXES.md](../../hotdata-search/references/INDEXES.md)
@@ -116,24 +116,23 @@ Both land queryable tables in the workspace; the path depends on **format** and
 
 ### Workflow: managed database (parquet)
 
-1. Create the database and **declare tables** up front:
+1. Create the database with a catalog alias:
 
    ```bash
-   hotdata databases create --name sales --table orders --table customers
+   hotdata databases create --catalog sales
    ```
 
-2. Load parquet per table:
+2. Load parquet per table (tables are auto-declared if needed):
 
    ```bash
-   hotdata databases tables load sales orders --file ./orders.parquet
+   hotdata databases load --catalog sales --table orders --file ./orders.parquet
+   hotdata databases load --catalog sales --table customers --url https://example.com/customers.parquet
    ```
 
-   If load fails with *not declared*, add `--table` at create time. There is no `--url` on load — download parquet locally first.
-
 3. Confirm and query:
 
    ```bash
-   hotdata databases tables list sales
+   hotdata databases tables list
    hotdata query "SELECT count(*) FROM sales.public.orders"
    ```