docs(skill): workflows, references, and SKILL alignment

Eddie A Tejeda · Eddie A Tejeda · commit f04caa500f1d · 2026-04-03T12:46:21.000-07:00
- Add references/WORKFLOWS.md (Model, Library, History, Chain, Indexes)
- Add references/MODEL_BUILD.md for deep data-model builds; vendor-neutral enrichment
- Add references/DATA_MODEL.template.md for project-owned model docs
- Extend SKILL.md: multi-step workflow index, links to references; keep refresh
  section from main; queries update flags; query results list+get; auth logout
  and other commands note
diff --git a/skills/hotdata-cli/SKILL.md b/skills/hotdata-cli/SKILL.md
@@ -29,6 +29,20 @@ API URL defaults to `https://api.hotdata.dev/v1` or overridden via `HOTDATA_API_
 
 All commands that accept `--workspace-id` are optional. If omitted, the active workspace is used. Use `hotdata workspaces set` to switch the active workspace interactively, or pass a workspace ID directly: `hotdata workspaces set <workspace_id>`. The active workspace is shown with a `*` marker in `hotdata workspaces list`. **Omit `--workspace-id` unless you need to target a specific workspace.**
 
+## Multi-step workflows (Model, Library, History, Chain, Indexes)
+
+These are **patterns** built from the commands below—not separate CLI subcommands:
+
+- **Model** — Markdown semantic map of your workspace (entities, keys, joins). Refresh using `connections`, `connections refresh`, `tables list`, and `datasets list`. For a **deep** modeling pass (connector enrichment, indexes, per-table detail), see [references/MODEL_BUILD.md](references/MODEL_BUILD.md).
+- **Library** — Curated **`hotdata queries`** entries for repeatable SQL (`queries create`, `queries run`, …).
+- **History** — Find prior **`hotdata results`** and saved queries (`results list`, `results <id>`, `queries list`).
+- **Chain** — Follow-ups via **`datasets create`** then `query` against `datasets.main.<table>`.
+- **Indexes** — Review SQL and schema, compare to existing indexes, create **sorted**, **bm25**, or **vector** indexes when it clearly helps; see [references/WORKFLOWS.md](references/WORKFLOWS.md#indexes).
+
+Full step-by-step procedures: [references/WORKFLOWS.md](references/WORKFLOWS.md).
+
+**Project-owned files:** Put `DATA_MODEL.md` or `data_model.md` (e.g. under `docs/`) in the **directory where you run `hotdata`**—your repo or project—not under `~/.claude/skills/` or other agent skill paths. Copy the template from [references/DATA_MODEL.template.md](references/DATA_MODEL.template.md) to start; use [references/MODEL_BUILD.md](references/MODEL_BUILD.md) when you need the full procedure.
+
 ## Available Commands
 
 ### List Workspaces
@@ -259,8 +273,11 @@ hotdata jobs <job_id> [--workspace-id <workspace_id>] [--format table|json|yaml]
 ```
 hotdata auth                # Browser-based login
 hotdata auth status         # Check current auth status
+hotdata auth logout         # Remove saved auth for the default profile
 ```
 
+Other commands (not covered in detail above): `hotdata connections new` (interactive connection wizard), `hotdata skills install|status`, `hotdata completions <bash|zsh|fish>`.
+
 ## Workflow: Running a Query
 
 1. List connections:
diff --git a/skills/hotdata-cli/references/DATA_MODEL.template.md b/skills/hotdata-cli/references/DATA_MODEL.template.md
@@ -0,0 +1,89 @@
+# Data model — `<project name>`
+
+> Copy this file to your **project** directory (e.g. `./DATA_MODEL.md`, `./data_model.md`, or `./docs/DATA_MODEL.md`).  
+> Do not commit workspace-specific content into agent skill folders.  
+> For a **full** build (per-table detail, connector enrichment, index summary), follow [MODEL_BUILD.md](MODEL_BUILD.md) from the installed skill’s `references/` (or this repo’s `skills/hotdata-cli/references/`). Relative links to `MODEL_BUILD.md` below work only while this file lives next to those references; once the file is copied into your project, open `MODEL_BUILD.md` from one of those locations if the link no longer resolves.
+
+**Workspace (Hotdata):** `<workspace name or id>`  
+**Last catalog refresh:** `<YYYY-MM-DD>`
+
+## Overview
+
+What data exists, which business domains it covers, and who owns this document.  
+_(Large workspaces: add a **table of contents** here—per connection, table counts.)_
+
+## Purpose
+
+Short description of what this workspace is for and how the model should be used for queries.
+
+## Connections & sources
+
+| Connection ID | Name | Type | Role / domain |
+|---------------|------|------|---------------|
+| | | | |
+
+### Per-table detail (optional — use for deep models)
+
+_Use for important tables only, or expand all via [MODEL_BUILD.md](MODEL_BUILD.md). **Duplicate** this whole block (from the heading through the horizontal rule) for each table._
+
+#### `<connection>.<schema>.<table>`
+
+**Grain:** one row = one `…`  
+**Description:**  
+
+| Column | Type | Nullable | PK/FK | Notes |
+|--------|------|----------|-------|-------|
+
+**Relationships:** (PK, FKs, parent–child)  
+**Queryability:** (filters, joins, caveats)
+
+---
+
+## Entities and grain (summary view)
+
+For each business entity:
+
+- **Entity:**  
+- **Grain:** one row per …  
+- **Primary tables:** `connection.schema.table`  
+- **Key columns:**  
+
+## Cross-connection joins
+
+Document safe join paths and caveats (fan-out, timing, different refresh cadence, type mismatches).
+
+## Search & index summary (optional)
+
+| Table | Column | Kind (vector / text / …) | Index status | Notes |
+|-------|--------|--------------------------|--------------|-------|
+| | | | | |
+
+_Use `hotdata indexes list -c <connection_id> --schema <schema> --table <table>` per table as needed._
+
+## Datasets (uploaded)
+
+Catalog from `hotdata datasets list` / `hotdata datasets <id>`:
+
+| Label | Table name (`datasets.main.…`) | Grain | Notes |
+|-------|-------------------------------|-------|-------|
+| | | | |
+
+## Derived tables (Chain)
+
+Stable `datasets.main.*` tables built for **Chain** workflows (not necessarily uploaded file datasets):
+
+| Table name | Built from | Purpose | Owner / TTL |
+|------------|------------|---------|-------------|
+| | | | |
+
+## Saved query index (Library)
+
+Link business questions to saved queries (ids/names from `hotdata queries list`):
+
+| Question / report | Saved query name | ID (optional) |
+|-------------------|------------------|---------------|
+| | | |
+
+## Notes
+
+Assumptions, known gaps, and refresh checklist.
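Review note on the template above: the copy step it describes can be sketched as a small shell helper. This is only an illustration under stated assumptions: the function name `init_data_model`, the default destination `./DATA_MODEL.md`, and the choice to stamp today's date into the `<YYYY-MM-DD>` placeholder are hypothetical and not part of the skill or the `hotdata` CLI. The template path is passed in explicitly because the installed skill location varies.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of hotdata): copy the template into the
# project and stamp today's date into the "Last catalog refresh" placeholder.
# Usage: init_data_model <template_path> [destination]
init_data_model() {
  local template="$1"
  local dest="${2:-./DATA_MODEL.md}"

  if [ ! -f "$template" ]; then
    echo "template not found: $template" >&2
    return 1
  fi
  # Never clobber a model document the project already owns.
  if [ -e "$dest" ]; then
    echo "refusing to overwrite existing $dest" >&2
    return 1
  fi

  mkdir -p "$(dirname "$dest")"
  # Only the date placeholder is filled in; every other field stays blank.
  sed "s/<YYYY-MM-DD>/$(date +%F)/" "$template" > "$dest"
}
```

For example, `init_data_model skills/hotdata-cli/references/DATA_MODEL.template.md docs/DATA_MODEL.md` run from the project root would create `docs/DATA_MODEL.md` and leave an existing file untouched.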
diff --git a/skills/hotdata-cli/references/MODEL_BUILD.md b/skills/hotdata-cli/references/MODEL_BUILD.md
@@ -0,0 +1,125 @@
+# Building a workspace data model (advanced)
+
+Optional **deep pass** for a single authoritative markdown model. For a short checklist only, use the **Model** section in [WORKFLOWS.md](WORKFLOWS.md) and [DATA_MODEL.template.md](DATA_MODEL.template.md).
+
+**Output:** Save as `DATA_MODEL.md`, `data_model.md`, or `docs/DATA_MODEL.md` in the **project directory** where you run `hotdata` (not inside agent skill folders).
+
+---
+
+## 1. Discover connections
+
+```bash
+hotdata connections list
+```
+
+For each connection, record `id`, `name`, and `source_type`.
+
+---
+
+## 2. Enumerate tables, columns, and datasets
+
+If the catalog may be **stale** (recent DDL, new tables missing), run **`hotdata connections refresh <connection_id>`** for affected connections **before** relying on `tables list`.
+
+**Per connection:**
+
+```bash
+hotdata tables list --connection-id <connection_id>
+```
+
+**Uploaded datasets:**
+
+```bash
+hotdata datasets list
+hotdata datasets <dataset_id>
+```
+
+Capture schema for each dataset (columns, types) from the detail view.
+
+You can also refresh after enumeration if you discover drift:
+
+```bash
+hotdata connections refresh <connection_id>
+```
+
+---
+
+## 3. Enrich beyond column names (optional but valuable)
+
+Use **connector and tooling docs** when `source_type` (or table shapes) match:
+
+- **Vendor / ELT docs** — Your loader or integration vendor’s published schemas for canonical tables, PKs/FKs, and field semantics (link what you use so a human can verify).
+- **dlt** — [verified sources](https://dlthub.com/docs/dlt-ecosystem/verified-sources) for normalized layouts.
+- **dlt-loaded data** — If you see `_dlt_id`, `_dlt_load_id`, `_dlt_parent_id`: treat as pipeline metadata; `_dlt_parent_id` often links flattened child rows to parents when no explicit FK exists. Exclude these from **grain** statements unless the question is specifically about loads.
+- **Vectors** — Columns typed as lists of floats (e.g. embedding columns) are candidates for vector search; note them.
+- **Well-known SaaS shapes** — Apply general patterns (e.g. Stripe charges/customers, HubSpot contacts/deals) only when naming and structure fit; **link** the doc you used so a human can verify.
+
+Do **not** invent facts: if context is missing, say so and suggest a small sample query:
+
+```bash
+hotdata query "SELECT * FROM <connection>.<schema>.<table> LIMIT 5"
+```
+
+---
+
+## 4. Infer relationships
+
+For each table, capture where reasonable:
+
+1. **Grain** — One row = one `…` (required per table; if unknown, say unknown).
+2. **Primary keys** — `id`, `<entity>_id`, or composite patterns from names + types.
+3. **Foreign keys** — `_id` / `_fk` / name matches to other tables; confirm with connector docs when possible.
+4. **Parent–child** — Flattened API/JSON tables (often nested names) and dlt parent keys.
+5. **Cross-connection** — Same logical entity in two connections (keys, type mismatches, caveats).
+
+For **small** schemas (e.g. ≤5 tables in a domain), a short **ASCII diagram** helps. For larger ones, group by domain in prose (e.g. billing, identity, product).
+
+---
+
+## 5. Search and index awareness
+
+For tables you care about:
+
+```bash
+hotdata indexes list -c <connection_id> --schema <schema> --table <table> [-w <workspace_id>]
+```
+
+Note:
+
+- **Vector**-friendly columns (embeddings) vs **BM25**-friendly text (`title`, `body`, `description`, …).
+- **Time** columns — event grain vs slowly changing dimensions.
+- **Facts vs dimensions** — for analytics-oriented workspaces.
+
+When suggesting a new index, use the connection, schema, table, and column names exactly as they appear in `tables list`, and follow the `indexes create` examples in the main skill.
+
+---
+
+## 6. Document structure
+
+Start from [DATA_MODEL.template.md](DATA_MODEL.template.md) and extend as needed:
+
+- **Overview** — Domains and what the workspace is for.
+- **Per connection** — Optional subsection per source; for **deep** models, **repeat** one block per `connection.schema.table` (grain, column table with name/type/nullable/PK-FK/notes, relationships, queryability, caveats)—the template’s single `####` heading is a pattern to copy for each table.
+- **Datasets** — Same treatment as connection tables where relevant.
+- **Cross-connection joins** — Keys, semantics, type caveats.
+- **Search / index summary** — Table, column, index status, intended use.
+
+If the workspace has **many** tables (e.g. 50+), add a **table of contents** after the overview (connection → table counts).
+
+---
+
+## Error handling
+
+- If a CLI command fails, record the error in the doc and **continue** when possible.
+- Unreachable connections or empty table lists: note in the connections table (e.g. unreachable / no tables).
+- Do not abort the whole model for one bad connection.
+
+---
+
+## Rules (keep quality high)
+
+- Every table gets an explicit **grain** (or “unknown”).
+- Prefer **documented** connector semantics over guesswork; **link** external docs when you use them.
+- Flag **test/dev** tables (`test`, `tmp`, `dev`, `staging` in names) as non-production when applicable.
+- Note columns that hold **numbers stored as Utf8** strings, and the casts queries need, where relevant.
+- Do not leave column **Notes** empty when domain knowledge or docs apply; “—” is weak unless the column is opaque/internal.
+- Align table names with **`hotdata tables list`** output (`connection.schema.table`).
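Review note on MODEL_BUILD.md above: the "Error handling" and "Rules" sections could be approximated with two small shell helpers. This is a minimal sketch, not the skill's method. The function names, the `HOTDATA_BIN` override, the output layout, and the segment-based name matching are all assumptions introduced for illustration. The only CLI call used is `hotdata tables list --connection-id <connection_id>`, taken from step 2.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; HOTDATA_BIN is an assumed override so the CLI can be
# swapped out (for example in a dry run). It defaults to the real `hotdata`.
HOTDATA_BIN="${HOTDATA_BIN:-hotdata}"

# Usage: list_tables_tolerant <connection_id>...
# Prints one section per connection. A failing connection is recorded in the
# output and the loop continues, so one bad connection does not abort the run.
list_tables_tolerant() {
  local id output
  for id in "$@"; do
    echo "## connection ${id}"
    if output=$("$HOTDATA_BIN" tables list --connection-id "$id" 2>&1); then
      printf '%s\n' "$output"
    else
      printf 'ERROR (unreachable / no tables): %s\n' "$output"
    fi
  done
}

# Usage: <table names on stdin> | flag_nonprod_tables
# Prints names with a whole segment equal to test, tmp, dev, or staging
# (segments split on "." or "_"), so "device_events" is not flagged.
flag_nonprod_tables() {
  grep -Ei '(^|[._])(test|tmp|dev|staging)([._]|$)' || true
}
```

Matching whole segments is a deliberately conservative reading of "in names"; a looser substring match would catch more candidates at the cost of false positives, and either way the flagged tables still need a human check before being marked non-production.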
diff --git a/skills/hotdata-cli/references/WORKFLOWS.md b/skills/hotdata-cli/references/WORKFLOWS.md