diff --git a/README.md b/README.md index 3200480..55f7c92 100644 --- a/README.md +++ b/README.md @@ -135,28 +135,35 @@ Managed databases are Hotdata-owned catalogs you create and populate yourself (n ```sh hotdata databases list [-w ] [-o table|json|yaml] hotdata databases create [--name ] [--catalog ] [--table ...] [--schema public] [--expires-at ] [-o table|json|yaml] +hotdata databases set +hotdata databases unset hotdata databases [-o table|json|yaml] hotdata databases delete hotdata databases run [--database ] [--name
...] [--expires-at ] [args...] hotdata databases run [args...] -hotdata databases tables list [--schema ] [-o table|json|yaml] -hotdata databases tables load
--file ./data.parquet [--schema public] -hotdata databases tables load
--upload-id [--schema public] -hotdata databases tables delete
[--schema public] +# Preferred: load by catalog alias (auto-declares table if needed) +hotdata databases load --catalog --table
[--schema public] (--file | --url | --upload-id ) + +# Also available: explicit database flag +hotdata databases tables list [--database ] [--schema ] [-o table|json|yaml] +hotdata databases tables load
[--database ] [--schema public] (--file | --url | --upload-id ) +hotdata databases tables delete
[--database ] [--schema public] ``` -- `create` registers a managed connection with no external credentials. `--name` is a human-readable display name; `--catalog` sets the SQL alias used in queries (`SELECT … FROM .schema.table`) and must be `[a-z_][a-z0-9_]*`. Use `--table` to declare tables up front (required before `tables load` on the current API). +- `create` registers a managed connection with no external credentials. `--name` is a human-readable display name; `--catalog` sets the SQL alias used in queries (`SELECT … FROM .schema.table`) and must be `[a-z_][a-z0-9_]*`. +- `set` / `unset` — save or clear the active database. All `databases tables` and `context` commands default to it. The active database is marked with `*` in `databases list`. +- `load` (top-level shorthand) — loads a parquet file into `--catalog.--schema.--table`. If the table was not declared at create time, the CLI automatically deletes and recreates the database with the table declared, then retries the load. - `tables load` uploads a **parquet** file (or uses a staged `upload_id` from `POST /v1/files`) and publishes it as the table generation (`replace` mode). -- `run` mints a database-scoped JWT and execs `` with `HOTDATA_DATABASE_TOKEN`, `HOTDATA_DATABASE_REFRESH_TOKEN`, `HOTDATA_DATABASE`, `HOTDATA_WORKSPACE`, and `HOTDATA_API_URL` injected into its environment. Pass a database id (group-positional `` like `sandbox run`, or `--database `) to scope an existing database; omit both to auto-create a scratch one using `--name` / `--schema` / `--table` / `--expires-at`. Useful for launching an agent or child process whose API access is restricted to a single database. +- `run` mints a database-scoped JWT and execs `` with `HOTDATA_DATABASE_TOKEN`, `HOTDATA_DATABASE_REFRESH_TOKEN`, `HOTDATA_DATABASE`, `HOTDATA_WORKSPACE`, and `HOTDATA_API_URL` injected into its environment. - For CSV/JSON uploads without a managed database, use `hotdata datasets create` instead (`datasets.main.*`). 
Example: ```sh -hotdata databases create --name "Sales reporting" --catalog sales --table orders -hotdata databases tables load sales orders --file ./orders.parquet -hotdata query "SELECT count(*) FROM sales.public.orders" +hotdata databases create --catalog airbnb +hotdata databases load --catalog airbnb --table listings --url https://example.com/listings.parquet +hotdata query "SELECT count(*) FROM airbnb.public.listings" ``` ## Tables @@ -233,14 +240,14 @@ hotdata queries [-o table|json|yaml] ## Search -`--type` is **required** — no default. Pass either `vector` (similarity search via the index's embedding provider) or `bm25` (full-text search). Both run entirely server-side. +Both run entirely server-side. `--type` and `--column` are **optional** when the table has exactly one search index — they are inferred automatically. Pass them explicitly when multiple indexes exist. ```sh # BM25 full-text search (requires a BM25 index on the column) -hotdata search "" --type bm25 --table --column [--select ] [--limit ] [-o table|json|csv] +hotdata search "" --table [--type bm25] [--column ] [--select ] [--limit ] [-o table|json|csv] # Vector search (requires a vector index with auto-embedding on the column) -hotdata search "" --type vector --table
--column [--limit ] +hotdata search "" --table
[--type vector] [--column ] [--limit ] ``` - **`--type vector`** — pass your query as **plain text**, name the **source text column** (e.g. `title`). The server embeds the query at the same time, using the same provider that auto-embedded the column when the index was built — so distance metric, model, and dimensions all match automatically. No `OPENAI_API_KEY`, no client-side embedding, no need to know about the auto-generated `_embedding` column. Generated SQL: `vector_distance(col, 'query')` server-side. @@ -255,17 +262,21 @@ hotdata search "" --type vector --table
--column --schema --table
\ + --column --type bm25|vector|sorted \ + [--name ] [--metric l2|cosine|dot] [--async] \ + [--embedding-provider-id ] [--dimensions ] [--output-column ] [--description ] + +# Connection-table scope (for non-managed connections) hotdata indexes list --connection-id --schema --table
[-o table|json|yaml] hotdata indexes create --connection-id --schema --table
\ - --name --columns --type sorted|bm25|vector \ - [--metric l2|cosine|dot] [--async] \ - [--embedding-provider-id ] [--dimensions ] [--output-column ] [--description ] + --column --type sorted|bm25|vector [--name ] ... hotdata indexes delete --connection-id --schema --table
--name # Dataset scope hotdata indexes list --dataset-id [-o table|json|yaml] -hotdata indexes create --dataset-id --name --columns --type sorted|bm25|vector ... +hotdata indexes create --dataset-id --column --type sorted|bm25|vector [--name ] ... hotdata indexes delete --dataset-id --name ``` diff --git a/skills/hotdata-analytics/SKILL.md b/skills/hotdata-analytics/SKILL.md index b112c0f..93db0c6 100644 --- a/skills/hotdata-analytics/SKILL.md +++ b/skills/hotdata-analytics/SKILL.md @@ -89,9 +89,8 @@ hotdata results [--workspace-id ] [--output table|json Or managed parquet: ```bash - hotdata databases create --name analytics --table slice - hotdata databases set - hotdata databases tables load slice --file ./slice.parquet + hotdata databases create --catalog analytics + hotdata databases load --catalog analytics --table slice --file ./slice.parquet ``` 3. **Chain query** — use printed **`full_name`** or `datasets list` **FULL NAME** column: @@ -113,7 +112,7 @@ For equality, range, and sort-heavy OLAP — not full-text or vector (see **`hot ```bash hotdata indexes create --connection-id --schema --table
\ - --name idx_orders_created --columns created_at --type sorted [--async] + --name idx_orders_created --column created_at --type sorted [--async] ``` List and delete use the same `hotdata indexes` commands as in the search skill; only **`--type sorted`** is the analytics focus here. diff --git a/skills/hotdata-analytics/references/WORKFLOWS.md b/skills/hotdata-analytics/references/WORKFLOWS.md index 0a11385..a62aaf6 100644 --- a/skills/hotdata-analytics/references/WORKFLOWS.md +++ b/skills/hotdata-analytics/references/WORKFLOWS.md @@ -76,8 +76,8 @@ hotdata datasets create --label "from saved" --query-id [--table-name **Managed database** (parquet → `..
`): ```bash -hotdata databases create --name chain_db --table revenue_slice -hotdata databases tables load chain_db revenue_slice --file ./revenue_slice.parquet +hotdata databases create --catalog chain_db +hotdata databases load --catalog chain_db --table revenue_slice --file ./revenue_slice.parquet ``` Note the printed **`full_name`** (e.g. `datasets.main.chain_revenue_slice` or `chain_db.public.revenue_slice`). For datasets, **`FULL NAME`** from `datasets list` is authoritative. diff --git a/skills/hotdata-search/SKILL.md b/skills/hotdata-search/SKILL.md index b6d6e1f..2fd61c8 100644 --- a/skills/hotdata-search/SKILL.md +++ b/skills/hotdata-search/SKILL.md @@ -16,15 +16,15 @@ Retrieval workloads in Hotdata: **BM25 full-text**, **vector similarity**, and t ## Search CLI -`--type` is **required**: `bm25` or `vector`. Both run server-side. +Both run server-side. `--type` and `--column` are **optional** when the table has exactly one search index — they are inferred automatically. Specify them when multiple indexes exist. ```bash # BM25 (requires a BM25 index on the column) -hotdata search "" --type bm25 --table --column \ +hotdata search "" --table [--type bm25] [--column ] \ [--select ] [--limit ] [--workspace-id ] [--output table|json|csv] # Vector (requires a vector index; server auto-embeds the query text) -hotdata search "" --type vector --table --column \ +hotdata search "" --table [--type vector] [--column ] \ [--select ] [--limit ] [--workspace-id ] [--output table|json|csv] ``` @@ -33,6 +33,7 @@ hotdata search "" --type vector --table --colum | **`bm25`** | Server generates `bm25_search(table, col, 'text')`. Results sort by score (descending). | | **`vector`** | Pass plain-text query; name the **source text column** (e.g. `title`). Server embeds using the same provider/metric/dimensions as the index. SQL uses `vector_distance(col, 'text')`. Results sort by distance (ascending). 
| +- **Inference:** when `--type` or `--column` are omitted, the CLI fetches the table's indexes and selects the only BM25/vector index. If multiple exist, you must specify both flags. - **No vector index, or custom embedding model?** Use raw SQL via `hotdata query` (e.g. `cosine_distance(col, [])`). The removed `--model` / stdin-vector paths hardcoded `l2_distance` and are not supported. - **Before search:** create the right index (`indexes create --type bm25` or `--type vector`). See [references/INDEXES.md](references/INDEXES.md). - Default `--limit` is 10. @@ -48,15 +49,19 @@ Indexes attach to a **connection table** (`--connection-id` + `--schema` + `--ta hotdata indexes list [--connection-id ] [--schema ] [--table
] [--workspace-id ] [--output table|json|yaml] hotdata indexes list --dataset-id [--workspace-id ] [--output table|json|yaml] -# Connection table -hotdata indexes create --connection-id --schema --table
\ - --name --columns --type bm25|vector \ - [--metric l2|cosine|dot] [--async] \ +# Managed database (catalog alias — uses the active database when the catalog matches) +hotdata indexes create --catalog --schema --table
\ + --column --type bm25|vector \ + [--name ] [--metric l2|cosine|dot] [--async] \ [--embedding-provider-id ] [--dimensions ] [--output-column ] [--description ] + +# Connection table (raw connection ID) +hotdata indexes create --connection-id --schema --table
\ + --column --type bm25|vector [--name ] ... hotdata indexes delete --connection-id --schema --table
--name # Dataset -hotdata indexes create --dataset-id --name --columns --type bm25|vector ... +hotdata indexes create --dataset-id --column --type bm25|vector [--name ] ... hotdata indexes delete --dataset-id --name ``` @@ -89,6 +94,6 @@ hotdata embedding-providers delete [--workspace-id ] 1. `hotdata tables list --connection-id ` — confirm column types. 2. `hotdata indexes list` — avoid duplicate indexes. -3. `hotdata indexes create ... --type bm25|vector` (add `--async` if large). -4. `hotdata search "..." --type bm25|vector --table ... --column ...` +3. `hotdata indexes create --catalog --table
--column --type bm25|vector` (add `--async` if large). +4. `hotdata search "..." --table ` — `--type` and `--column` are inferred when there is one search index. 5. Record what exists in **context:DATAMODEL** (core skill) when the workspace should remember index choices. diff --git a/skills/hotdata-search/references/INDEXES.md b/skills/hotdata-search/references/INDEXES.md index 98fd783..fff424b 100644 --- a/skills/hotdata-search/references/INDEXES.md +++ b/skills/hotdata-search/references/INDEXES.md @@ -30,12 +30,24 @@ Skip duplicates (same table, column, and purpose). ## 3. Create indexes +For managed databases (catalog alias — auto-selects the active database connection): + +```bash +hotdata indexes create --catalog --schema --table
\ + --column body --type bm25 + +hotdata indexes create --catalog --schema --table
\ + --column embedding --type vector --metric cosine +``` + +For regular connections (explicit connection ID): + ```bash hotdata indexes create --connection-id --schema --table
\ - --name idx_posts_body_bm25 --columns body --type bm25 + --name idx_posts_body_bm25 --column body --type bm25 hotdata indexes create --connection-id --schema --table
\ - --name idx_chunks_embedding --columns embedding --type vector --metric cosine + --name idx_chunks_embedding --column embedding --type vector --metric cosine ``` Large builds: `--async`, then `hotdata jobs list` / `hotdata jobs `. diff --git a/skills/hotdata/SKILL.md b/skills/hotdata/SKILL.md index e2d04a1..14e6de5 100644 --- a/skills/hotdata/SKILL.md +++ b/skills/hotdata/SKILL.md @@ -189,25 +189,28 @@ hotdata connections create \ hotdata databases list [--workspace-id ] [--output table|json|yaml] hotdata databases create [--name ] [--catalog ] [--table
...] [--schema public] [--expires-at ] [--workspace-id ] [--output table|json|yaml] hotdata databases set +hotdata databases unset hotdata databases [--workspace-id ] [--output table|json|yaml] hotdata databases delete [--workspace-id ] hotdata databases run [--database ] [--name
...] [--expires-at ] [--workspace-id ] [args...] hotdata databases run [args...] -# Dot-notation shorthand for load: database.table or database.schema.table -hotdata databases load [--file ./data.parquet] [--url ] [--upload-id ] [--workspace-id ] +# Preferred: load by catalog alias (auto-declares table if needed) +hotdata databases load --catalog --table
[--schema public] (--file | --url | --upload-id ) [--workspace-id ] +# Also available via tables subcommand hotdata databases tables list [--database ] [--schema ] [--workspace-id ] [--output table|json|yaml] -hotdata databases tables load
[--database ] [--schema public] [--file ./data.parquet] [--url ] [--upload-id ] [--workspace-id ] +hotdata databases tables load
[--database ] [--schema public] (--file | --url | --upload-id ) [--workspace-id ] hotdata databases tables delete
[--database ] [--schema public] [--workspace-id ] ``` -- `list` — all managed databases in the workspace. +- `list` — all managed databases in the workspace. Active database is marked with `*`. - `create` — creates a new managed database. `--name` is an optional human-readable display name. `--catalog` sets the SQL alias used in queries (`SELECT … FROM .schema.table`); must be `[a-z_][a-z0-9_]*`. `--expires-at` accepts relative durations (`24h`, `7d`, `90m`) or an RFC 3339 timestamp; omitting means no expiry. Repeat `--table` to declare tables up front. - `set` — saves `` as the active database. Subsequent `databases tables` and `context` commands use it automatically. +- `unset` — clears the active database from config. - `` — inspect one database (id, catalog, name, expires_at). - `delete` — removes the managed database; clears the active-database config if it matched. -- `load` — shorthand with dot notation (`database.table` or `database.schema.table`). Schema defaults to `public`. +- `load` (top-level shorthand) — loads parquet into `--catalog.--schema.--table`. Accepts `--file`, `--url`, or `--upload-id`. If the table was not declared at create time, the CLI automatically deletes and recreates the database with the table declared, then retries the load. - `tables list` — lists tables with `TABLE` (`..
`), `SYNCED`, `LAST_SYNC`. Uses active database when `--database` is omitted. - `tables load` — uploads a local parquet file (`--file`), a remote parquet URL (`--url`), or a pre-staged upload (`--upload-id`) and publishes with **replace** mode. - `tables delete` — drops a table from the managed database. @@ -216,10 +219,9 @@ hotdata databases tables delete
[--database ] [--schema publ Example: ``` -hotdata databases create --name "Sales reporting" --catalog sales --table orders -hotdata databases set -hotdata databases tables load orders --file ./orders.parquet -hotdata query "SELECT count(*) FROM sales.public.orders" +hotdata databases create --catalog airbnb +hotdata databases load --catalog airbnb --table listings --url https://example.com/listings.parquet +hotdata query "SELECT count(*) FROM airbnb.public.listings" ``` ### List Tables and Columns @@ -457,17 +459,18 @@ Use a sandbox to explore tables and capture **analysis-oriented** notes in sandb ## Workflow: Creating a managed database (parquet) -1. Create the database and declare tables up front: +1. Create the database with a catalog alias: ``` - hotdata databases create --name mydb --table events --table users + hotdata databases create --catalog mydb ``` -2. Load parquet into each table: +2. Load parquet per table (tables are auto-declared if needed): ``` - hotdata databases tables load mydb events --file ./events.parquet + hotdata databases load --catalog mydb --table events --file ./events.parquet + hotdata databases load --catalog mydb --table events --url https://example.com/events.parquet ``` 3. Confirm tables and query: ``` - hotdata databases tables list mydb + hotdata databases tables list hotdata query "SELECT * FROM mydb.public.events LIMIT 10" ``` diff --git a/skills/hotdata/references/WORKFLOWS.md b/skills/hotdata/references/WORKFLOWS.md index fbcf6f1..30ee980 100644 --- a/skills/hotdata/references/WORKFLOWS.md +++ b/skills/hotdata/references/WORKFLOWS.md @@ -68,12 +68,12 @@ End-to-end checklists. Use the linked sections for command detail and guardrails 1. [ ] `hotdata tables list --connection-id ` — pick text column (BM25) or embedding/text column (vector) 2. [ ] `hotdata indexes list` — avoid duplicate bm25/vector indexes on the same column 3. 
[ ] Create index: - - [ ] **Keyword:** `hotdata indexes create … --type bm25 --columns ` - - [ ] **Semantic:** `hotdata indexes create … --type vector --columns [--metric cosine|l2|dot]` + - [ ] **Managed DB:** `hotdata indexes create --catalog --table --column --type bm25|vector` + - [ ] **Connection:** `hotdata indexes create --connection-id --schema --table --column --type bm25|vector [--metric cosine|l2|dot]` - [ ] Large build: add `--async`, then `hotdata jobs ` -4. [ ] Search: - - [ ] `hotdata search "…" --type bm25 --table --column ` - - [ ] `hotdata search "…" --type vector --table … --column ` +4. [ ] Search (--type and --column inferred when one search index exists): + - [ ] `hotdata search "…" --table ` (auto-infer) + - [ ] `hotdata search "…" --table … --type bm25 --column ` (explicit) 5. [ ] (Optional) Note indexes in **context:DATAMODEL → Search & index summary** **Detail:** [hotdata-search INDEXES.md](../../hotdata-search/references/INDEXES.md) @@ -116,24 +116,23 @@ Both land queryable tables in the workspace; the path depends on **format** and ### Workflow: managed database (parquet) -1. Create the database and **declare tables** up front: +1. Create the database with a catalog alias: ```bash - hotdata databases create --name sales --table orders --table customers + hotdata databases create --catalog sales ``` -2. Load parquet per table: +2. Load parquet per table (tables are auto-declared if needed): ```bash - hotdata databases tables load sales orders --file ./orders.parquet + hotdata databases load --catalog sales --table orders --file ./orders.parquet + hotdata databases load --catalog sales --table customers --url https://example.com/customers.parquet ``` - If load fails with *not declared*, add `--table` at create time. There is no `--url` on load — download parquet locally first. - 3. 
Confirm and query: ```bash - hotdata databases tables list sales + hotdata databases tables list hotdata query "SELECT count(*) FROM sales.public.orders" ``` diff --git a/src/api.rs b/src/api.rs index 1e17b95..4141766 100644 --- a/src/api.rs +++ b/src/api.rs @@ -181,6 +181,10 @@ impl ApiClient { self } + pub fn workspace_id(&self) -> Option<&str> { + self.workspace_id.as_deref() + } + /// Test-only client (no config load). Used with a local mock HTTP server. /// The refresher returns `None`, so 401s are not retried — matching the /// behavior of tests that don't exercise the refresh path. diff --git a/src/command.rs b/src/command.rs index 3b7a1b1..5feb3c0 100644 --- a/src/command.rs +++ b/src/command.rs @@ -329,29 +329,26 @@ pub enum IndexesCommands { }, /// Create an index on a table or dataset. - /// - /// For connection-scoped indexes, pass the table and columns using bracket notation: - /// `connection.table[col1,col2]` or `connection.schema.table[col1,col2]` - /// (schema defaults to `public` when omitted) - /// - /// For dataset-scoped indexes, use `--dataset-id` with `--columns`. Create { - /// Table and columns to index: `connection.table[col1,col2]` - /// or `connection.schema.table[col1,col2]`. Schema defaults to `public`. - /// - /// Quote the argument to prevent shell glob expansion: - /// `hotdata indexes create 'airbnb.listings[description]' --type bm25` - #[arg(conflicts_with = "dataset_id")] - target: Option, + /// SQL catalog alias of the target database (e.g. `--catalog airbnb`) + #[arg(long, conflicts_with = "dataset_id")] + catalog: Option, - /// Dataset ID (alternative scope to the positional target) - #[arg(long, conflicts_with = "target")] - dataset_id: Option, + /// Schema name (default: public) + #[arg(long, default_value = "public")] + schema: String, + + /// Table name to index + #[arg(long, conflicts_with = "dataset_id")] + table: Option, - /// Columns to index (comma-separated). 
Required with --dataset-id; - /// for connection scope use bracket notation in the target instead. + /// Column(s) to index (comma-separated) #[arg(long)] - columns: Option, + column: Option, + + /// Dataset ID (alternative scope to --catalog/--table) + #[arg(long, conflicts_with_all = ["catalog", "table"])] + dataset_id: Option, /// Index name (derived from table, columns, and type if omitted) #[arg(long)] @@ -600,17 +597,28 @@ pub enum DatabasesCommands { id: String, }, + /// Clear the current database + Unset, + /// Delete a managed database and its tables Delete { /// Database name or connection ID name_or_id: String, }, - /// Load a parquet file into a table using dot notation: `database.table` or `database.schema.table` + /// Load a parquet file into a managed database table Load { - /// Table to load into: `database.table` or `database.schema.table`. - /// Schema defaults to `public` when omitted. - target: String, + /// SQL catalog alias of the target database (e.g. `--catalog airbnb`) + #[arg(long)] + catalog: String, + + /// Schema to load into (default: public) + #[arg(long, default_value = "public")] + schema: String, + + /// Table name to load into + #[arg(long)] + table: String, /// Path to a local parquet file to upload and load #[arg(long, conflicts_with_all = ["upload_id", "url"])] diff --git a/src/connections.rs b/src/connections.rs index 135663f..13af103 100644 --- a/src/connections.rs +++ b/src/connections.rs @@ -172,21 +172,39 @@ pub fn resolve_connection_id(api: &ApiClient, name_or_id: &str) -> String { } } + // Before listing connections, check if the active database's catalog or name + // matches — prefer it over any stale connection entry with the same name. 
+ if let Some(ws) = api.workspace_id() { + if let Some(active_id) = crate::config::load_current_database("default", ws) { + if let Some(active_db) = api.get_none_if_not_found::(&format!("/databases/{active_id}")) { + if active_db.default_catalog.as_deref() == Some(name_or_id) + || active_db.name.as_deref() == Some(name_or_id) + { + return active_db.default_connection_id; + } + } + } + } + let body: ListResponse = api.get("/connections"); - match body + if let Some(conn) = body .connections .iter() .find(|c| c.id == name_or_id || c.name == name_or_id) { - Some(conn) => conn.id.clone(), - None => { - eprintln!( - "{}", - format!("error: no connection named or with id '{name_or_id}'").red() - ); - std::process::exit(1); - } + return conn.id.clone(); } + + // Fall back to managed databases: treat name_or_id as a catalog alias. + if let Ok(db) = crate::databases::try_resolve_database(api, name_or_id) { + return db.default_connection_id; + } + + eprintln!( + "{}", + format!("error: no connection named or with id '{name_or_id}'").red() + ); + std::process::exit(1); } pub fn get(workspace_id: &str, connection_id: &str, format: &str) { diff --git a/src/databases.rs b/src/databases.rs index 15feedb..18e8c39 100644 --- a/src/databases.rs +++ b/src/databases.rs @@ -11,6 +11,8 @@ struct DatabaseSummary { id: String, #[serde(default)] name: Option, + #[serde(default)] + default_catalog: Option, } #[derive(Deserialize)] @@ -28,6 +30,8 @@ pub struct Database { pub default_catalog: Option, pub default_connection_id: String, #[serde(default)] + pub expires_at: Option, + #[serde(default)] attachments: Vec, } @@ -109,8 +113,26 @@ pub fn try_resolve_database(api: &ApiClient, id_or_name: &str) -> Result = body + .databases + .iter() + .filter(|d| d.default_catalog.as_deref() == Some(id_or_name)) + .collect(); + + if !catalog_matches.is_empty() { + return match catalog_matches.len() { + 1 => Ok(fetch_database(api, &catalog_matches[0].id)), + _ => Err(format!( + "multiple databases have 
catalog '{}' — use the database id instead", + id_or_name + )), + }; + } + + let name_matches: Vec<&DatabaseSummary> = body .databases .iter() @@ -119,7 +141,7 @@ pub fn try_resolve_database(api: &ApiClient, id_or_name: &str) -> Result Err(format!( - "no database with id or name '{id_or_name}'" + "no database with id, catalog, or name '{id_or_name}'" )), 1 => Ok(fetch_database(api, &name_matches[0].id)), _ => Err(format!( @@ -398,17 +420,20 @@ pub fn list(workspace_id: &str, format: &str) { "Create one with: hotdata databases create --catalog ".dark_grey() ); } else { + let current = crate::config::load_current_database("default", workspace_id); let rows: Vec> = body .databases .iter() .map(|d| { + let marker = if current.as_deref() == Some(d.id.as_str()) { "*" } else { "" }; vec![ + marker.to_string(), d.id.clone(), d.name.as_deref().unwrap_or("-").to_string(), ] }) .collect(); - crate::table::print(&["ID", "NAME"], &rows); + crate::table::print(&["", "ID", "NAME"], &rows); } } _ => unreachable!(), @@ -630,12 +655,13 @@ pub fn create( format!( concat!( "Load a table:\n", - " hotdata databases load --file {}.\n", + " hotdata databases load --catalog {0} --table
--file \n", + " hotdata databases load --catalog {0} --table
--url \n", "\nQuery with:\n", - " hotdata query --database {} \"SELECT * FROM {}.public.
LIMIT 10\"\n", + " hotdata query \"SELECT * FROM {0}.public.
LIMIT 10\"\n", "\n Tip: column names are case-sensitive — wrap uppercase names in double quotes", ), - result.id, result.id, catalog + catalog ) .dark_grey() ); @@ -644,6 +670,15 @@ pub fn create( } } +pub fn unset(workspace_id: &str) { + use crossterm::style::Stylize; + if let Err(e) = crate::config::clear_current_database("default", workspace_id) { + eprintln!("{}", format!("error clearing current database: {e}").red()); + std::process::exit(1); + } + println!("{}", "Current database cleared.".green()); +} + pub fn set(workspace_id: &str, id: &str) { use crossterm::style::Stylize; let api = ApiClient::new(Some(workspace_id)); @@ -747,7 +782,26 @@ pub fn tables_load( let database = resolve_current_database(database, workspace_id); let api = ApiClient::new(Some(workspace_id)); - let db = resolve_database(&api, &database); + // Prefer the active database when its catalog or name matches the lookup key, + // avoiding ambiguity when multiple databases share the same catalog name. + let active_id = crate::config::load_current_database("default", workspace_id); + let lookup_key = match active_id.as_deref() { + Some(id) => { + if let Some(active) = api.get_none_if_not_found::(&format!("/databases/{id}")) { + if active.default_catalog.as_deref() == Some(database.as_str()) + || active.name.as_deref() == Some(database.as_str()) + { + id.to_string() + } else { + database.clone() + } + } else { + database.clone() + } + } + None => database.clone(), + }; + let db = resolve_database(&api, &lookup_key); let schema = schema_name(schema); // clap enforces mutual exclusion; only one of these is ever Some. 
@@ -769,19 +823,98 @@ pub fn tables_load( let (status, resp_body) = api.post_raw(&path, &body); spinner.finish_and_clear(); - if !status.is_success() { - let msg = crate::util::api_error(resp_body); - if msg.contains("not declared") { - eprintln!("{}", msg.red()); + let (status, resp_body) = if !status.is_success() + && crate::util::api_error(resp_body.clone()).contains("not declared") + { + // The table wasn't declared at create time. Collect existing tables so + // they are re-declared in the replacement database, then delete and + // recreate with all tables (including the new one) declared. + let existing = collect_tables(&api, &db.default_connection_id, None); + let mut all_tables: Vec = existing + .iter() + .map(|t| format!("{}.{}", t.schema, t.table)) + .collect(); + let new_table_key = format!("{schema}.{table}"); + if !all_tables.contains(&new_table_key) { + all_tables.push(new_table_key); + } + + // Warn if any existing table has synced data — delete+recreate will lose it. + let synced: Vec = existing + .iter() + .filter(|t| t.synced) + .map(|t| format!("{}.{}", t.schema, t.table)) + .collect(); + if !synced.is_empty() { + use crossterm::style::Stylize; + let catalog = db.default_catalog.as_deref().or(db.name.as_deref()).unwrap_or(&db.id); eprintln!( "{}", - "Declare the table when creating the database, e.g.:\n \ - hotdata databases create --table
" - .dark_grey() + format!( + "warning: declaring '{}' requires recreating the database '{catalog}'. \ + The following tables have loaded data that will be lost:\n {}", + table, + synced.join(", ") + ) + .yellow() ); - } else { - eprintln!("{}", msg.red()); + if crate::util::is_interactive() { + use std::io::Write; + eprint!("Proceed and lose this data? [y/N] "); + std::io::stderr().flush().unwrap(); + let mut input = String::new(); + std::io::stdin().read_line(&mut input).unwrap(); + if !input.trim().eq_ignore_ascii_case("y") { + eprintln!("{}", "Aborted.".red()); + std::process::exit(1); + } + } else { + eprintln!( + "{}", + "error: cannot auto-declare table in non-interactive mode — existing data would be lost. \ + Declare all tables up front with 'databases create --table '." + .red() + ); + std::process::exit(1); + } } + + let (del_status, del_body) = api.delete_raw(&format!("/databases/{}", db.id)); + if !del_status.is_success() { + eprintln!("{}", crate::util::api_error(del_body).red()); + std::process::exit(1); + } + let create_body = create_database_request( + db.name.as_deref(), + db.default_catalog.as_deref(), + schema, + &all_tables, + db.expires_at.as_deref(), + ); + let (create_status, create_body_resp) = api.post_raw("/databases", &create_body); + if !create_status.is_success() { + eprintln!("{}", crate::util::api_error(create_body_resp).red()); + std::process::exit(1); + } + let new_db: CreateDatabaseResponse = match serde_json::from_str(&create_body_resp) { + Ok(v) => v, + Err(e) => { + eprintln!("error parsing create response: {e}"); + std::process::exit(1); + } + }; + let _ = crate::config::save_current_database("default", workspace_id, &new_db.id); + let new_path = managed_table_load_path(&new_db.default_connection_id, schema, table); + let spinner = crate::util::spinner("Loading table..."); + let result = api.post_raw(&new_path, &body); + spinner.finish_and_clear(); + result + } else { + (status, resp_body) + }; + + if !status.is_success() { + 
eprintln!("{}", crate::util::api_error(resp_body).red()); std::process::exit(1); } @@ -978,7 +1111,7 @@ mod tests { let api = ApiClient::test_new(&server.url(), "k", None); let err = try_resolve_database(&api, "missing").unwrap_err(); - assert!(err.contains("no database with id or name")); + assert!(err.contains("no database with id")); } #[test] diff --git a/src/indexes.rs b/src/indexes.rs index 2465b2b..c670ab0 100644 --- a/src/indexes.rs +++ b/src/indexes.rs @@ -219,18 +219,8 @@ pub fn infer_for_search( let api = ApiClient::new(Some(workspace_id)); - // Resolve connection name → ID - let conn_map = connection_lookup(&api); - let connection_id = match conn_map.get(connection_name) { - Some(id) => id.clone(), - None => { - eprintln!( - "{}", - format!("Connection '{}' not found.", connection_name).red() - ); - std::process::exit(1); - } - }; + // Resolve connection name → ID (falls back to managed database catalog lookup) + let connection_id = crate::connections::resolve_connection_id(&api, connection_name); // Fetch indexes for this table let indexes = list_one_table(&api, &connection_id, schema, table); diff --git a/src/main.rs b/src/main.rs index 0f57569..7e791de 100644 --- a/src/main.rs +++ b/src/main.rs @@ -449,26 +449,28 @@ fn main() { Some(DatabasesCommands::Set { id }) => { databases::set(&workspace_id, &id) } + Some(DatabasesCommands::Unset) => { + databases::unset(&workspace_id) + } Some(DatabasesCommands::Delete { name_or_id }) => { databases::delete(&workspace_id, &name_or_id) } Some(DatabasesCommands::Load { - target, + catalog, + schema, + table, file, url, upload_id, - }) => { - let (database, schema, table) = parse_db_target(&target); - databases::tables_load( - &workspace_id, - Some(database.as_str()), - &table, - Some(schema.as_str()), - file.as_deref(), - url.as_deref(), - upload_id.as_deref(), - ) - } + }) => databases::tables_load( + &workspace_id, + Some(catalog.as_str()), + &table, + Some(schema.as_str()), + file.as_deref(), + 
url.as_deref(),
+                    upload_id.as_deref(),
+                ),
                 Some(DatabasesCommands::Tables { database, command }) => match command {
                     Some(DatabaseTablesCommands::List {
                         database: db_flag,
@@ -662,9 +664,11 @@ fn main() {
                 &output,
             ),
             IndexesCommands::Create {
-                target,
+                catalog,
+                schema,
+                table,
+                column,
                 dataset_id,
-                columns,
                 name,
                 r#type,
                 metric,
@@ -676,46 +680,42 @@ fn main() {
             } => {
                 let api = api::ApiClient::new(Some(&workspace_id));
                 let (scope, resolved_columns, auto_name) =
-                    match (target.as_deref(), dataset_id.as_deref()) {
-                        (Some(tgt), None) => {
-                            let (conn_name, schema, table, cols) =
-                                parse_index_target(tgt);
-                            let conn_id =
-                                connections::resolve_connection_id(&api, &conn_name);
+                    match (catalog.as_deref().or(table.as_deref()), dataset_id.as_deref()) {
+                        (Some(_), None) => {
+                            let catalog_or_conn = catalog.as_deref().unwrap_or_else(|| {
+                                eprintln!("error: --catalog is required");
+                                std::process::exit(1);
+                            });
+                            let tbl = table.as_deref().unwrap_or_else(|| {
+                                eprintln!("error: --table is required");
+                                std::process::exit(1);
+                            });
+                            let cols = column.as_deref().unwrap_or_else(|| {
+                                eprintln!("error: --column is required");
+                                std::process::exit(1);
+                            });
+                            let conn_id = connections::resolve_connection_id(&api, catalog_or_conn);
                             let auto = format!(
-                                "{table}_{cols}_{type}",
-                                cols = cols.join("_"),
+                                "{tbl}_{cols}_{type}",
+                                cols = cols.replace(',', "_"),
                                 type = r#type
                             );
-                            (
-                                (conn_id, schema, table),
-                                cols.join(","),
-                                auto,
-                            )
+                            ((conn_id, schema, tbl.to_string()), cols.to_string(), auto)
                         }
                         (None, Some(did)) => {
-                            let cols =
-                                columns.as_deref().unwrap_or_else(|| {
-                                    eprintln!(
-                                        "error: --columns is required with --dataset-id"
-                                    );
-                                    std::process::exit(1);
-                                });
+                            let cols = column.as_deref().unwrap_or_else(|| {
+                                eprintln!("error: --column is required with --dataset-id");
+                                std::process::exit(1);
+                            });
                             let auto = format!(
                                 "dataset_{cols}_{type}",
                                 cols = cols.replace(',', "_"),
                                 type = r#type
                             );
-                            (
-                                (did.to_string(), String::new(), String::new()),
-
cols.to_string(),
-                                auto,
-                            )
+                            ((did.to_string(), String::new(), String::new()), cols.to_string(), auto)
                         }
                         _ => {
-                            eprintln!(
-                                "error: provide either (e.g. airbnb.listings[col1,col2]) or --dataset-id with --columns"
-                            );
+                            eprintln!("error: provide --catalog and --table, or --dataset-id with --column");
                             std::process::exit(1);
                         }
                     };
@@ -879,6 +879,9 @@ fn main() {
             let sql = match resolved_type.as_str() {
                 "bm25" => {
                     let bm25_columns = match select.as_deref() {
+                        Some(cols) if cols.split(',').any(|c| c.trim() == "score") => {
+                            cols.to_string()
+                        }
                         Some(cols) => format!("{}, score", cols),
                         None => "*".to_string(),
                     };
@@ -1059,121 +1062,10 @@ fn main() {
     update::maybe_print_update_notice(update_handle);
 }
 
-/// Parse a database target like `airbnb.listings` or `airbnb.public.listings`
-/// into `(database, schema, table)`. Schema defaults to `public`.
-fn parse_db_target(target: &str) -> (String, String, String) {
-    let parts: Vec<&str> = target.splitn(4, '.').collect();
-    match parts.as_slice() {
-        [db, tbl] => (db.to_string(), "public".to_string(), tbl.to_string()),
-        [db, schema, tbl] => (db.to_string(), schema.to_string(), tbl.to_string()),
-        _ => {
-            eprintln!(
-                "error: target must be 'database.table' or 'database.schema.table'"
-            );
-            std::process::exit(1);
-        }
-    }
-}
-
-/// Parse an index target like `airbnb.listings[col1,col2]` or
-/// `airbnb.public.listings[col1,col2]` into `(conn_name, schema, table, columns)`.
-/// Schema defaults to `public` when only two dot-parts are given.
-fn parse_index_target(target: &str) -> (String, String, String, Vec) {
-    let Some(bracket_pos) = target.find('[') else {
-        eprintln!(
-            "error: target must include columns in brackets, e.g. airbnb.listings[col1,col2]"
-        );
-        std::process::exit(1);
-    };
-    if !target.ends_with(']') {
-        eprintln!(
-            "error: target bracket is not closed — use e.g. 
'airbnb.listings[col1,col2]'"
-        );
-        std::process::exit(1);
-    }
-    let table_part = &target[..bracket_pos];
-    let cols_raw = &target[bracket_pos + 1..target.len() - 1];
-
-    let parts: Vec<&str> = table_part.splitn(4, '.').collect();
-    let (conn, schema, table) = match parts.as_slice() {
-        [c, t] => (c.to_string(), "public".to_string(), t.to_string()),
-        [c, s, t] => (c.to_string(), s.to_string(), t.to_string()),
-        _ => {
-            eprintln!(
-                "error: target must be 'connection.table[cols]' or 'connection.schema.table[cols]'"
-            );
-            std::process::exit(1);
-        }
-    };
-
-    let columns: Vec = cols_raw
-        .split(',')
-        .map(|s| s.trim().to_string())
-        .filter(|s| !s.is_empty())
-        .collect();
-
-    if columns.is_empty() {
-        eprintln!("error: no columns specified in brackets");
-        std::process::exit(1);
-    }
-
-    (conn, schema, table, columns)
-}
 
 #[cfg(test)]
 mod tests {
     use super::*;
-
-    // --- parse_db_target ---
-
-    #[test]
-    fn db_target_two_parts_defaults_schema_to_public() {
-        let (db, schema, table) = parse_db_target("airbnb.listings");
-        assert_eq!(db, "airbnb");
-        assert_eq!(schema, "public");
-        assert_eq!(table, "listings");
-    }
-
-    #[test]
-    fn db_target_three_parts_uses_explicit_schema() {
-        let (db, schema, table) = parse_db_target("airbnb.staging.listings");
-        assert_eq!(db, "airbnb");
-        assert_eq!(schema, "staging");
-        assert_eq!(table, "listings");
-    }
-
-    // --- parse_index_target ---
-
-    #[test]
-    fn index_target_two_parts_defaults_schema_to_public() {
-        let (conn, schema, table, cols) = parse_index_target("airbnb.listings[description]");
-        assert_eq!(conn, "airbnb");
-        assert_eq!(schema, "public");
-        assert_eq!(table, "listings");
-        assert_eq!(cols, vec!["description"]);
-    }
-
-    #[test]
-    fn index_target_three_parts_uses_explicit_schema() {
-        let (conn, schema, table, cols) =
-            parse_index_target("airbnb.public.listings[name,description]");
-        assert_eq!(conn, "airbnb");
-        assert_eq!(schema, "public");
-        assert_eq!(table, "listings");
-        assert_eq!(cols, vec!["name", 
"description"]); - } - - #[test] - fn index_target_multiple_columns() { - let (_, _, _, cols) = parse_index_target("db.tbl[a,b,c]"); - assert_eq!(cols, vec!["a", "b", "c"]); - } - - #[test] - fn index_target_trims_column_whitespace() { - let (_, _, _, cols) = parse_index_target("db.tbl[a, b]"); - assert_eq!(cols, vec!["a", "b"]); - } } pub fn get_styles() -> clap::builder::Styles {