hotdata-dev
diff --git a/README.md b/README.md
Lines changed: 28 additions & 17 deletions
diff --git a/skills/hotdata-analytics/SKILL.md b/skills/hotdata-analytics/SKILL.md
Lines changed: 3 additions & 4 deletions
diff --git a/skills/hotdata-analytics/references/WORKFLOWS.md b/skills/hotdata-analytics/references/WORKFLOWS.md
Lines changed: 2 additions & 2 deletions
diff --git a/skills/hotdata-search/SKILL.md b/skills/hotdata-search/SKILL.md
Lines changed: 15 additions & 10 deletions
diff --git a/skills/hotdata-search/references/INDEXES.md b/skills/hotdata-search/references/INDEXES.md
Lines changed: 14 additions & 2 deletions
diff --git a/skills/hotdata/SKILL.md b/skills/hotdata/SKILL.md
Lines changed: 17 additions & 14 deletions
@@ -135,28 +135,35 @@ Managed databases are Hotdata-owned catalogs you create and populate yourself (n
 ```sh
 hotdata databases list [-w <id>] [-o table|json|yaml]
 hotdata databases create [--name <display_name>] [--catalog <alias>] [--table <table> ...] [--schema public] [--expires-at <duration|timestamp>] [-o table|json|yaml]
+hotdata databases set <id>
+hotdata databases unset
 hotdata databases <name_or_id> [-o table|json|yaml]
 hotdata databases delete <name_or_id>
 hotdata databases run [--database <id>] [--name <label>] [--schema public] [--table <table> ...] [--expires-at <duration|timestamp>] <cmd> [args...]
 hotdata databases <id> run <cmd> [args...]
 
-hotdata databases tables list <database> [--schema <name>] [-o table|json|yaml]
-hotdata databases tables load <database> <table> --file ./data.parquet [--schema public]
-hotdata databases tables load <database> <table> --upload-id <id> [--schema public]
-hotdata databases tables delete <database> <table> [--schema public]
+# Preferred: load by catalog alias (auto-declares table if needed)
+hotdata databases load --catalog <alias> --table <table> [--schema public] (--file <path> | --url <url> | --upload-id <id>)
+
+# Also available: explicit database flag
+hotdata databases tables list [--database <id_or_name>] [--schema <name>] [-o table|json|yaml]
+hotdata databases tables load <table> [--database <id_or_name>] [--schema public] (--file <path> | --url <url> | --upload-id <id>)
+hotdata databases tables delete <table> [--database <id_or_name>] [--schema public]
 ```
 
-- `create` registers a managed connection with no external credentials. `--name` is a human-readable display name; `--catalog` sets the SQL alias used in queries (`SELECT … FROM <catalog>.schema.table`) and must be `[a-z_][a-z0-9_]*`. Use `--table` to declare tables up front (required before `tables load` on the current API).
+- `create` registers a managed connection with no external credentials. `--name` is a human-readable display name; `--catalog` sets the SQL alias used in queries (`SELECT … FROM <catalog>.schema.table`) and must be `[a-z_][a-z0-9_]*`.
+- `set` / `unset` — save or clear the active database. All `databases tables` and `context` commands default to it. The active database is marked with `*` in `databases list`.
+- `load` (top-level shorthand) — loads a parquet file into `--catalog.--schema.--table`. If the table was not declared at create time, the CLI automatically deletes and recreates the database with the table declared, then retries the load.
 - `tables load` uploads a **parquet** file (or uses a staged `upload_id` from `POST /v1/files`) and publishes it as the table generation (`replace` mode).
-- `run` mints a database-scoped JWT and execs `<cmd>` with `HOTDATA_DATABASE_TOKEN`, `HOTDATA_DATABASE_REFRESH_TOKEN`, `HOTDATA_DATABASE`, `HOTDATA_WORKSPACE`, and `HOTDATA_API_URL` injected into its environment. Pass a database id (group-positional `<id>` like `sandbox run`, or `--database <id>`) to scope an existing database; omit both to auto-create a scratch one using `--name` / `--schema` / `--table` / `--expires-at`. Useful for launching an agent or child process whose API access is restricted to a single database.
+- `run` mints a database-scoped JWT and execs `<cmd>` with `HOTDATA_DATABASE_TOKEN`, `HOTDATA_DATABASE_REFRESH_TOKEN`, `HOTDATA_DATABASE`, `HOTDATA_WORKSPACE`, and `HOTDATA_API_URL` injected into its environment.
 - For CSV/JSON uploads without a managed database, use `hotdata datasets create` instead (`datasets.main.*`).
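
The `run` bullet above lists the variables injected into the child process. A minimal sketch of a guard that a child script could run before doing any work; `check_hotdata_env` is a hypothetical helper name, not part of the hotdata CLI, and the variable names are taken from the bullet as written.

```sh
# Hypothetical helper (not part of the hotdata CLI): confirms that the
# environment variables documented for `databases run` are present.
# Returns 0 when all are set and non-empty, 1 otherwise.
check_hotdata_env() {
  missing=0
  for var in HOTDATA_DATABASE_TOKEN HOTDATA_DATABASE_REFRESH_TOKEN \
             HOTDATA_DATABASE HOTDATA_WORKSPACE HOTDATA_API_URL; do
    # Indirect lookup of the variable named in $var (names are fixed above).
    eval "value=\${$var:-}"
    if [ -z "$value" ]; then
      echo "missing: $var" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

A script launched through `hotdata databases run ... <cmd>` could call `check_hotdata_env || exit 1` as its first step, so a missing token surfaces immediately rather than as a later API failure.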
 
 Example:
 
 ```sh
-hotdata databases create --name "Sales reporting" --catalog sales --table orders
-hotdata databases tables load sales orders --file ./orders.parquet
-hotdata query "SELECT count(*) FROM sales.public.orders"
+hotdata databases create --catalog airbnb
+hotdata databases load --catalog airbnb --table listings --url https://example.com/listings.parquet
+hotdata query "SELECT count(*) FROM airbnb.public.listings"
 ```
 
 ## Tables
@@ -233,14 +240,14 @@ hotdata queries <query_run_id> [-o table|json|yaml]
 
 ## Search
 
-`--type` is **required** — no default. Pass either `vector` (similarity search via the index's embedding provider) or `bm25` (full-text search). Both run entirely server-side.
+Both run entirely server-side. `--type` and `--column` are **optional** when the table has exactly one search index — they are inferred automatically. Pass them explicitly when multiple indexes exist.
 
 ```sh
 # BM25 full-text search (requires a BM25 index on the column)
-hotdata search "<query>" --type bm25 --table <connection.schema.table> --column <column> [--select <columns>] [--limit <n>] [-o table|json|csv]
+hotdata search "<query>" --table <connection.schema.table> [--type bm25] [--column <column>] [--select <columns>] [--limit <n>] [-o table|json|csv]
 
 # Vector search (requires a vector index with auto-embedding on the column)
-hotdata search "<query>" --type vector --table <table> --column <source_text_column> [--limit <n>]
+hotdata search "<query>" --table <table> [--type vector] [--column <source_text_column>] [--limit <n>]
 ```
 
 - **`--type vector`** — pass your query as **plain text**, name the **source text column** (e.g. `title`). The server embeds the query at the same time, using the same provider that auto-embedded the column when the index was built — so distance metric, model, and dimensions all match automatically. No `OPENAI_API_KEY`, no client-side embedding, no need to know about the auto-generated `_embedding` column. Generated SQL: `vector_distance(col, 'query')` server-side.
@@ -255,17 +262,21 @@ hotdata search "<query>" --type vector --table <table> --column <source_text_col
 Indexes attach to either a connection-table (`--connection-id` + `--schema` + `--table`) or a dataset (`--dataset-id`). The two scopes are mutually exclusive.
 
 ```sh
-# Connection-table scope
+# Managed database scope (catalog alias resolves via active database)
+hotdata indexes create --catalog <alias> --schema <schema> --table <table> \
+  --column <cols> --type bm25|vector|sorted \
+  [--name <name>] [--metric l2|cosine|dot] [--async] \
+  [--embedding-provider-id <id>] [--dimensions <n>] [--output-column <name>] [--description <text>]
+
+# Connection-table scope (for non-managed connections)
 hotdata indexes list   --connection-id <id> --schema <schema> --table <table> [-o table|json|yaml]
 hotdata indexes create --connection-id <id> --schema <schema> --table <table> \
-  --name <name> --columns <cols> --type sorted|bm25|vector \
-  [--metric l2|cosine|dot] [--async] \
-  [--embedding-provider-id <id>] [--dimensions <n>] [--output-column <name>] [--description <text>]
+  --column <cols> --type sorted|bm25|vector [--name <name>] ...
 hotdata indexes delete --connection-id <id> --schema <schema> --table <table> --name <name>
 
 # Dataset scope
 hotdata indexes list   --dataset-id <id> [-o table|json|yaml]
-hotdata indexes create --dataset-id <id> --name <name> --columns <cols> --type sorted|bm25|vector ...
+hotdata indexes create --dataset-id <id> --column <cols> --type sorted|bm25|vector [--name <name>] ...
 hotdata indexes delete --dataset-id <id> --name <name>
 ```
 
 
@@ -89,9 +89,8 @@ hotdata results <result_id> [--workspace-id <workspace_id>] [--output table|json
    Or managed parquet:
 
    ```bash
-   hotdata databases create --name analytics --table slice
-   hotdata databases set <returned-id>
-   hotdata databases tables load slice --file ./slice.parquet
+   hotdata databases create --catalog analytics
+   hotdata databases load --catalog analytics --table slice --file ./slice.parquet
    ```
 
 3. **Chain query** — use printed **`full_name`** or `datasets list` **FULL NAME** column:
@@ -113,7 +112,7 @@ For equality, range, and sort-heavy OLAP — not full-text or vector (see **`hot
 
 ```bash
 hotdata indexes create --connection-id <id> --schema <schema> --table <table> \
-  --name idx_orders_created --columns created_at --type sorted [--async]
+  --name idx_orders_created --column created_at --type sorted [--async]
 ```
 
 List and delete use the same `hotdata indexes` commands as in the search skill; only **`--type sorted`** is the analytics focus here.
 
@@ -76,8 +76,8 @@ hotdata datasets create --label "from saved" --query-id <query_id> [--table-name
 **Managed database** (parquet → `<database>.<schema>.<table>`):
 
 ```bash
-hotdata databases create --name chain_db --table revenue_slice
-hotdata databases tables load chain_db revenue_slice --file ./revenue_slice.parquet
+hotdata databases create --catalog chain_db
+hotdata databases load --catalog chain_db --table revenue_slice --file ./revenue_slice.parquet
 ```
 
 Note the printed **`full_name`** (e.g. `datasets.main.chain_revenue_slice` or `chain_db.public.revenue_slice`). For datasets, **`FULL NAME`** from `datasets list` is authoritative.
 
@@ -16,15 +16,15 @@ Retrieval workloads in Hotdata: **BM25 full-text**, **vector similarity**, and t
 
 ## Search CLI
 
-`--type` is **required**: `bm25` or `vector`. Both run server-side.
+Both run server-side. `--type` and `--column` are **optional** when the table has exactly one search index — they are inferred automatically. Specify them when multiple indexes exist.
 
 ```bash
 # BM25 (requires a BM25 index on the column)
-hotdata search "<query>" --type bm25 --table <connection.schema.table> --column <column> \
+hotdata search "<query>" --table <connection.schema.table> [--type bm25] [--column <column>] \
   [--select <columns>] [--limit <n>] [--workspace-id <workspace_id>] [--output table|json|csv]
 
 # Vector (requires a vector index; server auto-embeds the query text)
-hotdata search "<query>" --type vector --table <connection.schema.table> --column <source_text_column> \
+hotdata search "<query>" --table <connection.schema.table> [--type vector] [--column <source_text_column>] \
   [--select <columns>] [--limit <n>] [--workspace-id <workspace_id>] [--output table|json|csv]
 ```
 
@@ -33,6 +33,7 @@ hotdata search "<query>" --type vector --table <connection.schema.table> --colum
 | **`bm25`** | Server generates `bm25_search(table, col, 'text')`. Results sort by score (descending). |
 | **`vector`** | Pass plain-text query; name the **source text column** (e.g. `title`). Server embeds using the same provider/metric/dimensions as the index. SQL uses `vector_distance(col, 'text')`. Results sort by distance (ascending). |
 
+- **Inference:** when `--type` or `--column` are omitted, the CLI fetches the table's indexes and selects the only BM25/vector index. If multiple exist, you must specify both flags.
 - **No vector index, or custom embedding model?** Use raw SQL via `hotdata query` (e.g. `cosine_distance(col, [<vec>])`). The removed `--model` / stdin-vector paths hardcoded `l2_distance` and are not supported.
 - **Before search:** create the right index (`indexes create --type bm25` or `--type vector`). See [references/INDEXES.md](references/INDEXES.md).
 - Default `--limit` is 10.
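
The inference rule in the bullets above can be illustrated with a small filter. This is a sketch of the rule as described, not the CLI's actual implementation; the function name `infer_search_index` and the `<type> <column>` input format are assumptions made for the example.

```sh
# Illustrative only: mirrors the documented inference rule, not the CLI's code.
# Input (stdin): one index per line as "<type> <column>" (a made-up format).
# Prints "<type> <column>" when exactly one bm25/vector index exists;
# exits non-zero when there are none or several, where explicit flags are needed.
infer_search_index() {
  awk '
    $1 == "bm25" || $1 == "vector" { n++; pick = $1 " " $2 }
    END {
      if (n == 1) { print pick; exit 0 }
      exit 1
    }
  '
}
```

For instance, `printf 'sorted created_at\nbm25 body\n' | infer_search_index` prints `bm25 body`, because the `sorted` index is not a search index. With both a `bm25` and a `vector` index in the input, the function fails, which corresponds to the case where `--type` and `--column` must be passed.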
@@ -48,15 +49,19 @@ Indexes attach to a **connection table** (`--connection-id` + `--schema` + `--ta
 hotdata indexes list [--connection-id <id>] [--schema <schema>] [--table <table>] [--workspace-id <ws>] [--output table|json|yaml]
 hotdata indexes list --dataset-id <dataset_id> [--workspace-id <ws>] [--output table|json|yaml]
 
-# Connection table
-hotdata indexes create --connection-id <id> --schema <schema> --table <table> \
-  --name <name> --columns <cols> --type bm25|vector \
-  [--metric l2|cosine|dot] [--async] \
+# Managed database (catalog alias — uses the active database when the catalog matches)
+hotdata indexes create --catalog <alias> --schema <schema> --table <table> \
+  --column <col> --type bm25|vector \
+  [--name <name>] [--metric l2|cosine|dot] [--async] \
   [--embedding-provider-id <id>] [--dimensions <n>] [--output-column <name>] [--description <text>]
+
+# Connection table (raw connection ID)
+hotdata indexes create --connection-id <id> --schema <schema> --table <table> \
+  --column <col> --type bm25|vector [--name <name>] ...
 hotdata indexes delete --connection-id <id> --schema <schema> --table <table> --name <name>
 
 # Dataset
-hotdata indexes create --dataset-id <dataset_id> --name <name> --columns <cols> --type bm25|vector ...
+hotdata indexes create --dataset-id <dataset_id> --column <col> --type bm25|vector [--name <name>] ...
 hotdata indexes delete --dataset-id <dataset_id> --name <name>
 ```
 
@@ -89,6 +94,6 @@ hotdata embedding-providers delete <id> [--workspace-id <workspace_id>]
 
 1. `hotdata tables list --connection-id <id>` — confirm column types.
 2. `hotdata indexes list` — avoid duplicate indexes.
-3. `hotdata indexes create ... --type bm25|vector` (add `--async` if large).
-4. `hotdata search "..." --type bm25|vector --table ... --column ...`
+3. `hotdata indexes create --catalog <alias> --table <table> --column <col> --type bm25|vector` (add `--async` if large).
+4. `hotdata search "..." --table <catalog.table>` — `--type` and `--column` are inferred when there is one search index.
 5. Record what exists in **context:DATAMODEL** (core skill) when the workspace should remember index choices.
@@ -30,12 +30,24 @@ Skip duplicates (same table, column, and purpose).
 
 ## 3. Create indexes
 
+For managed databases (catalog alias — auto-selects the active database connection):
+
+```bash
+hotdata indexes create --catalog <alias> --schema <schema> --table <table> \
+  --column body --type bm25
+
+hotdata indexes create --catalog <alias> --schema <schema> --table <table> \
+  --column embedding --type vector --metric cosine
+```
+
+For regular connections (explicit connection ID):
+
 ```bash
 hotdata indexes create --connection-id <id> --schema <schema> --table <table> \
-  --name idx_posts_body_bm25 --columns body --type bm25
+  --name idx_posts_body_bm25 --column body --type bm25
 
 hotdata indexes create --connection-id <id> --schema <schema> --table <table> \
-  --name idx_chunks_embedding --columns embedding --type vector --metric cosine
+  --name idx_chunks_embedding --column embedding --type vector --metric cosine
 ```
 
 Large builds: `--async`, then `hotdata jobs list` / `hotdata jobs <job_id>`.
 
@@ -189,25 +189,28 @@ hotdata connections create \
 hotdata databases list [--workspace-id <workspace_id>] [--output table|json|yaml]
 hotdata databases create [--name <display_name>] [--catalog <alias>] [--table <table> ...] [--schema public] [--expires-at <duration|timestamp>] [--workspace-id <workspace_id>] [--output table|json|yaml]
 hotdata databases set <id_or_name>
+hotdata databases unset
 hotdata databases <id_or_name> [--workspace-id <workspace_id>] [--output table|json|yaml]
 hotdata databases delete <id_or_name> [--workspace-id <workspace_id>]
 hotdata databases run [--database <id>] [--name <label>] [--schema public] [--table <table> ...] [--expires-at <duration|timestamp>] [--workspace-id <workspace_id>] <cmd> [args...]
 hotdata databases <id> run <cmd> [args...]
 
-# Dot-notation shorthand for load: database.table or database.schema.table
-hotdata databases load <database.table> [--file ./data.parquet] [--url <url>] [--upload-id <id>] [--workspace-id <workspace_id>]
+# Preferred: load by catalog alias (auto-declares table if needed)
+hotdata databases load --catalog <alias> --table <table> [--schema public] (--file <path> | --url <url> | --upload-id <id>) [--workspace-id <workspace_id>]
 
+# Also available via tables subcommand
 hotdata databases tables list [--database <id_or_name>] [--schema <name>] [--workspace-id <workspace_id>] [--output table|json|yaml]
-hotdata databases tables load <table> [--database <id_or_name>] [--schema public] [--file ./data.parquet] [--url <url>] [--upload-id <id>] [--workspace-id <workspace_id>]
+hotdata databases tables load <table> [--database <id_or_name>] [--schema public] (--file <path> | --url <url> | --upload-id <id>) [--workspace-id <workspace_id>]
 hotdata databases tables delete <table> [--database <id_or_name>] [--schema public] [--workspace-id <workspace_id>]
 ```
 
-- `list` — all managed databases in the workspace.
+- `list` — all managed databases in the workspace. Active database is marked with `*`.
 - `create` — creates a new managed database. `--name` is an optional human-readable display name. `--catalog` sets the SQL alias used in queries (`SELECT … FROM <catalog>.schema.table`); must be `[a-z_][a-z0-9_]*`. `--expires-at` accepts relative durations (`24h`, `7d`, `90m`) or an RFC 3339 timestamp; omitting means no expiry. Repeat `--table` to declare tables up front.
 - `set` — saves `<id_or_name>` as the active database. Subsequent `databases tables` and `context` commands use it automatically.
+- `unset` — clears the active database from config.
 - `<id_or_name>` — inspect one database (id, catalog, name, expires_at).
 - `delete` — removes the managed database; clears the active-database config if it matched.
-- `load` — shorthand with dot notation (`database.table` or `database.schema.table`). Schema defaults to `public`.
+- `load` (top-level shorthand) — loads parquet into `--catalog.--schema.--table`. Accepts `--file`, `--url`, or `--upload-id`. If the table was not declared at create time, the CLI automatically deletes and recreates the database with the table declared, then retries the load.
 - `tables list` — lists tables with `TABLE` (`<catalog>.<schema>.<table>`), `SYNCED`, `LAST_SYNC`. Uses active database when `--database` is omitted.
 - `tables load` — uploads a local parquet file (`--file`), a remote parquet URL (`--url`), or a pre-staged upload (`--upload-id`) and publishes with **replace** mode.
 - `tables delete` — drops a table from the managed database.
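
The `create` bullet above requires `--catalog` to match `[a-z_][a-z0-9_]*`. A minimal sketch of a local pre-check, assuming you want to reject a bad alias before calling the CLI; `is_valid_catalog` is a hypothetical helper, not a hotdata command.

```sh
# Hypothetical helper (not part of the hotdata CLI): checks an alias against
# the documented pattern [a-z_][a-z0-9_]*. Runs in a subshell so that the
# LC_ALL=C setting (byte-wise a-z ranges) does not leak into the caller.
is_valid_catalog() (
  LC_ALL=C
  case "$1" in
    # Empty, bad first character, or any character outside the allowed set.
    '' | [!a-z_]* | *[!a-z0-9_]*) exit 1 ;;
    *) exit 0 ;;
  esac
)
```

Usage could look like `is_valid_catalog "$alias" && hotdata databases create --catalog "$alias"`, so aliases such as `9bad` or `my-db` are caught locally.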
@@ -216,10 +219,9 @@ hotdata databases tables delete <table> [--database <id_or_name>] [--schema publ
 Example:
 
 ```
-hotdata databases create --name "Sales reporting" --catalog sales --table orders
-hotdata databases set <returned-id>
-hotdata databases tables load orders --file ./orders.parquet
-hotdata query "SELECT count(*) FROM sales.public.orders"
+hotdata databases create --catalog airbnb
+hotdata databases load --catalog airbnb --table listings --url https://example.com/listings.parquet
+hotdata query "SELECT count(*) FROM airbnb.public.listings"
 ```
 
 ### List Tables and Columns
@@ -457,17 +459,18 @@ Use a sandbox to explore tables and capture **analysis-oriented** notes in sandb
 
 ## Workflow: Creating a managed database (parquet)
 
-1. Create the database and declare tables up front:
+1. Create the database with a catalog alias:
    ```
-   hotdata databases create --name mydb --table events --table users
+   hotdata databases create --catalog mydb
    ```
-2. Load parquet into each table:
+2. Load parquet per table (tables are auto-declared if needed):
    ```
-   hotdata databases tables load mydb events --file ./events.parquet
+   hotdata databases load --catalog mydb --table events --file ./events.parquet
+   hotdata databases load --catalog mydb --table events --url https://example.com/events.parquet
    ```
 3. Confirm tables and query:
    ```
-   hotdata databases tables list mydb
+   hotdata databases tables list
    hotdata query "SELECT * FROM mydb.public.events LIMIT 10"
    ```