
Commit f04caa5

Author: Eddie A Tejeda
docs(skill): workflows, references, and SKILL alignment
- Add references/WORKFLOWS.md (Model, Library, History, Chain, Indexes)
- Add references/MODEL_BUILD.md for deep data-model builds; vendor-neutral enrichment
- Add references/DATA_MODEL.template.md for project-owned model docs
- Extend SKILL.md: multi-step workflow index, links to references; keep refresh section from main; queries update flags; query results list+get; auth logout and other commands note
1 parent 49dbeea commit f04caa5

4 files changed

Lines changed: 443 additions & 0 deletions


skills/hotdata-cli/SKILL.md

Lines changed: 17 additions & 0 deletions
@@ -29,6 +29,20 @@ API URL defaults to `https://api.hotdata.dev/v1` or overridden via `HOTDATA_API_
The `--workspace-id` flag is optional on every command that accepts it. If omitted, the active workspace is used. Use `hotdata workspaces set` to switch the active workspace interactively, or pass a workspace ID directly: `hotdata workspaces set <workspace_id>`. The active workspace is shown with a `*` marker in `hotdata workspaces list`. **Omit `--workspace-id` unless you need to target a specific workspace.**

## Multi-step workflows (Model, Library, History, Chain, Indexes)

These are **patterns** built from the commands below—not separate CLI subcommands:

- **Model** — Markdown semantic map of your workspace (entities, keys, joins). Refresh using `connections`, `connections refresh`, `tables list`, and `datasets list`. For a **deep** modeling pass (connector enrichment, indexes, per-table detail), see [references/MODEL_BUILD.md](references/MODEL_BUILD.md).
- **Library** — Curated **`hotdata queries`** entries for repeatable SQL (`queries create`, `queries run`, …).
- **History** — Find prior **`hotdata results`** and saved queries (`results list`, `results <id>`, `queries list`).
- **Chain** — Follow-ups via **`datasets create`** then `query` against `datasets.main.<table>`.
- **Indexes** — Review SQL and schema, compare to existing indexes, create **sorted**, **bm25**, or **vector** indexes when it clearly helps; see [references/WORKFLOWS.md](references/WORKFLOWS.md#indexes).

Full step-by-step procedures: [references/WORKFLOWS.md](references/WORKFLOWS.md).

**Project-owned files:** Put `DATA_MODEL.md` or `data_model.md` (e.g. under `docs/`) in the **directory where you run `hotdata`**—your repo or project—not under `~/.claude/skills/` or other agent skill paths. Copy the template from [references/DATA_MODEL.template.md](references/DATA_MODEL.template.md) to start; use [references/MODEL_BUILD.md](references/MODEL_BUILD.md) when you need the full procedure.
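
As a minimal sketch of the file placement described above, a script could look for the project-owned model file like this. The helper name `find_data_model` and the search order are hypothetical and are not part of the `hotdata` CLI; the candidate paths are the ones named in this section.

```shell
# Hypothetical helper (not a hotdata command): print the first data-model
# file found under a project directory, checking the documented locations.
find_data_model() {
  project_dir="${1:-.}"
  for candidate in DATA_MODEL.md data_model.md docs/DATA_MODEL.md docs/data_model.md; do
    if [ -f "$project_dir/$candidate" ]; then
      printf '%s\n' "$project_dir/$candidate"
      return 0
    fi
  done
  # Nothing found: the caller can copy the template to start a new model.
  return 1
}
```

Calling `find_data_model .` from the project root prints the path if one exists and returns a non-zero status otherwise.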

## Available Commands

### List Workspaces
@@ -259,8 +273,11 @@ hotdata jobs <job_id> [--workspace-id <workspace_id>] [--format table|json|yaml]
```
hotdata auth          # Browser-based login
hotdata auth status   # Check current auth status
hotdata auth logout   # Remove saved auth for the default profile
```

Other commands (not covered in detail above): `hotdata connections new` (interactive connection wizard), `hotdata skills install|status`, `hotdata completions <bash|zsh|fish>`.

## Workflow: Running a Query

1. List connections:
skills/hotdata-cli/references/DATA_MODEL.template.md

Lines changed: 89 additions & 0 deletions
@@ -0,0 +1,89 @@
# Data model — `<project name>`

> Copy this file to your **project** directory (e.g. `./DATA_MODEL.md`, `./data_model.md`, or `./docs/DATA_MODEL.md`).
> Do not commit workspace-specific content into agent skill folders.
> For a **full** build (per-table detail, connector enrichment, index summary), follow [MODEL_BUILD.md](MODEL_BUILD.md) from the installed skill’s `references/` (or this repo’s `skills/hotdata-cli/references/`). Relative links to `MODEL_BUILD.md` below work only while this file lives next to those references; in your project, open that path separately if the link 404s.

**Workspace (Hotdata):** `<workspace name or id>`
**Last catalog refresh:** `<YYYY-MM-DD>`

## Overview

What data exists, which business domains it covers, and who owns this document.
_(Large workspaces: add a **table of contents** here—per connection, table counts.)_

## Purpose

Short description of what this workspace is for and how the model should be used for queries.

## Connections & sources

| Connection ID | Name | Type | Role / domain |
|---------------|------|------|---------------|
| | | | |

### Per-table detail (optional — use for deep models)

_Use for important tables only, or expand all via [MODEL_BUILD.md](MODEL_BUILD.md). **Duplicate** this whole block (from the heading through the horizontal rule) for each table._

#### `<connection>.<schema>.<table>`

**Grain:** one row = one ``
**Description:**

| Column | Type | Nullable | PK/FK | Notes |
|--------|------|----------|-------|-------|

**Relationships:** (PK, FKs, parent–child)
**Queryability:** (filters, joins, caveats)

---

## Entities and grain (summary view)

For each business entity:

- **Entity:**
- **Grain:** one row per …
- **Primary tables:** `connection.schema.table`
- **Key columns:**

## Cross-connection joins

Document safe join paths and caveats (fan-out, timing, different refresh cadence, type mismatches).

## Search & index summary (optional)

| Table | Column | Kind (vector / text / …) | Index status | Notes |
|-------|--------|--------------------------|--------------|-------|
| | | | | |

_Use `hotdata indexes list -c <connection_id> --schema <schema> --table <table>` per table as needed._
62+
63+
## Datasets (uploaded)
64+
65+
Catalog from `hotdata datasets list` / `hotdata datasets <id>`:
66+
67+
| Label | Table name (`datasets.main.…`) | Grain | Notes |
68+
|-------|-------------------------------|-------|-------|
69+
| | | | |
70+
71+
## Derived tables (Chain)
72+
73+
Stable `datasets.main.*` tables built for **Chain** workflows (not necessarily uploaded file datasets):
74+
75+
| Table name | Built from | Purpose | Owner / TTL |
76+
|------------|------------|---------|-------------|
77+
| | | | |
78+
79+
## Saved query index (Library)
80+
81+
Link business questions to saved queries (ids/names from `hotdata queries list`):
82+
83+
| Question / report | Saved query name | ID (optional) |
84+
|-------------------|------------------|---------------|
85+
| | | |
86+
87+
## Notes
88+
89+
Assumptions, known gaps, and refresh checklist.
skills/hotdata-cli/references/MODEL_BUILD.md

Lines changed: 125 additions & 0 deletions
@@ -0,0 +1,125 @@
# Building a workspace data model (advanced)

Optional **deep pass** for a single authoritative markdown model. For a short checklist only, use the **Model** section in [WORKFLOWS.md](WORKFLOWS.md) and [DATA_MODEL.template.md](DATA_MODEL.template.md).

**Output:** Save as `DATA_MODEL.md`, `data_model.md`, or `docs/DATA_MODEL.md` in the **project directory** where you run `hotdata` (not inside agent skill folders).

---

## 1. Discover connections

```bash
hotdata connections list
```

For each connection, record `id`, `name`, and `source_type`.

---

## 2. Enumerate tables, columns, and datasets

If the catalog may be **stale** (recent DDL, new tables missing), run **`hotdata connections refresh <connection_id>`** for affected connections **before** relying on `tables list`.

**Per connection:**

```bash
hotdata tables list --connection-id <connection_id>
```

**Uploaded datasets:**

```bash
hotdata datasets list
hotdata datasets <dataset_id>
```

Capture schema for each dataset (columns, types) from the detail view.

You can also refresh after enumeration if you discover drift:

```bash
hotdata connections refresh <connection_id>
```
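
The refresh-then-list order from this section can be wrapped in a small shell function. This is only a sketch built from the two commands shown here: the function name `list_tables_fresh` is hypothetical, and it assumes `hotdata` exits with a non-zero status when a command fails, which this document does not confirm.

```shell
# Hypothetical wrapper: refresh one connection's catalog, then list its tables.
# Assumption: `hotdata` returns a non-zero exit status on failure.
list_tables_fresh() {
  connection_id="$1"
  # Stop here if the refresh fails, so a stale listing is not mistaken for fresh.
  hotdata connections refresh "$connection_id" || return 1
  hotdata tables list --connection-id "$connection_id"
}
```

Usage would look like `list_tables_fresh <connection_id>` for each connection whose catalog may be stale.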

---

## 3. Enrich beyond column names (optional but valuable)

Use **connector and tooling docs** when `source_type` (or table shapes) match:

- **Vendor / ELT docs** — Your loader or integration vendor’s published schemas for canonical tables, PKs/FKs, and field semantics (link what you use so a human can verify).
- **dlt** — [verified sources](https://dlthub.com/docs/dlt-ecosystem/verified-sources) for normalized layouts.
- **dlt-loaded data** — If you see `_dlt_id`, `_dlt_load_id`, `_dlt_parent_id`: treat as pipeline metadata; `_dlt_parent_id` often links flattened child rows to parents when no explicit FK exists. Exclude these from **grain** statements unless the question is specifically about loads.
- **Vectors** — Columns typed as lists of floats (e.g. embedding columns) are candidates for vector search; note them.
- **Well-known SaaS shapes** — Apply general patterns (e.g. Stripe charges/customers, HubSpot contacts/deals) only when naming and structure fit; **link** the doc you used so a human can verify.

Do **not** invent facts: if context is missing, say so and suggest a small sample query:

```bash
hotdata query "SELECT * FROM <connection>.<schema>.<table> LIMIT 5"
```
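
To illustrate the dlt guidance in this section, the sketch below drops the three dlt metadata columns from a list of column names before you reason about grain. It assumes you already have the column names one per line (how you obtain them is up to you); the function name `drop_dlt_metadata` is hypothetical.

```shell
# Hypothetical filter: read column names on stdin (one per line) and drop the
# dlt pipeline-metadata columns named in this section.
drop_dlt_metadata() {
  # -x matches whole lines only, so a column such as "my_dlt_id_copy" is kept.
  grep -v -x -E '_dlt_(id|load_id|parent_id)'
}
```

For example, `printf '%s\n' id _dlt_id amount | drop_dlt_metadata` leaves only `id` and `amount`.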

---

## 4. Infer relationships

For each table, capture where reasonable:

1. **Grain** — One row = one `` (required per table; if unknown, say unknown).
2. **Primary keys** — `id`, `<entity>_id`, or composite patterns from names + types.
3. **Foreign keys** — `_id` / `_fk` suffixes, or names that match other tables; confirm with connector docs when possible.
4. **Parent–child** — Flattened API/JSON tables (often nested names) and dlt parent keys.
5. **Cross-connection** — Same logical entity in two connections (keys, type mismatches, caveats).

For **small** schemas (e.g. ≤5 tables in a domain), a short **ASCII diagram** helps. For larger ones, group by domain in prose (e.g. billing, identity, product).
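
The naming heuristics in the list above can be sketched as a shell function. This is a rough first pass under stated assumptions, not a rule the CLI applies: the function name `classify_key_column` is hypothetical, it treats `<table>_id` (with an optional trailing `s` removed from the table name) as a primary-key candidate, and every result still needs confirmation against connector docs or sample rows.

```shell
# Hypothetical heuristic: guess a column's key role from its name alone.
# Usage: classify_key_column <table> <column>
classify_key_column() {
  table="$1"
  column="$2"
  case "$column" in
    # dlt bookkeeping columns are pipeline metadata, not business keys.
    _dlt_*) echo "pipeline-metadata" ;;
    # "id", "<table>_id", or the singular form, e.g. customers -> customer_id.
    id|"${table}_id"|"${table%s}_id") echo "pk-candidate" ;;
    # Any other *_id or *_fk column probably points at another table.
    *_id|*_fk) echo "fk-candidate" ;;
    *) echo "none" ;;
  esac
}
```

For instance, `classify_key_column charges customer_id` reports `fk-candidate`, while `classify_key_column customers customer_id` reports `pk-candidate`.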

---

## 5. Search and index awareness

For tables you care about:

```bash
hotdata indexes list -c <connection_id> --schema <schema> --table <table> [-w <workspace_id>]
```

Note:

- **Vector**-friendly columns (embeddings) vs **BM25**-friendly text (`title`, `body`, `description`, …).
- **Time** columns — event grain vs slowly changing dimensions.
- **Facts vs dimensions** — for analytics-oriented workspaces.

When suggesting a new index, use the same connection/schema/table/column names as in `tables list` and the main skill’s `indexes create` examples.
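
When several tables matter, the `indexes list` command above can be repeated in a loop. The sketch below uses only the flags shown in this section; the function name `list_indexes_for` and the `<schema>.<table>` argument format are assumptions made for illustration.

```shell
# Hypothetical loop: run `hotdata indexes list` for each <schema>.<table>
# argument under one connection.
# Usage: list_indexes_for <connection_id> <schema>.<table> [<schema>.<table> ...]
list_indexes_for() {
  connection_id="$1"
  shift
  for qualified in "$@"; do
    schema="${qualified%%.*}"   # text before the first dot
    table="${qualified#*.}"     # text after the first dot
    hotdata indexes list -c "$connection_id" --schema "$schema" --table "$table"
  done
}
```

A call such as `list_indexes_for <connection_id> public.orders public.customers` would issue one `indexes list` per table.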

---

## 6. Document structure

Start from [DATA_MODEL.template.md](DATA_MODEL.template.md) and extend as needed:

- **Overview** — Domains and what the workspace is for.
- **Per connection** — Optional subsection per source; for **deep** models, **repeat** one block per `connection.schema.table` (grain, column table with name/type/nullable/PK-FK/notes, relationships, queryability, caveats)—the template’s single `####` heading is a pattern to copy for each table.
- **Datasets** — Same treatment as connection tables where relevant.
- **Cross-connection joins** — Keys, semantics, type caveats.
- **Search / index summary** — Table, column, index status, intended use.

If the workspace has **many** tables (e.g. 50+), add a **table of contents** after the overview (connection → table counts).
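
One way to draft that table of contents is to count tables per connection. The sketch below assumes you have the fully qualified `connection.schema.table` names one per line, for example collected by hand from `tables list`; the exact output format of `tables list` is not specified here, and the function name `count_tables_per_connection` is hypothetical.

```shell
# Hypothetical helper: read connection.schema.table names on stdin and print
# one Markdown bullet per connection with its table count.
count_tables_per_connection() {
  # Split on literal dots; skip lines that are not three-part names.
  awk -F'[.]' 'NF >= 3 { count[$1]++ }
    END { for (c in count) printf "- %s: %d\n", c, count[c] }' | sort
}
```

The sorted bullets can be pasted under the overview and expanded into links as the document grows.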

---

## Error handling

- If a CLI command fails, record the error in the doc and **continue** when possible.
- Unreachable connections or empty table lists: note in the connections table (e.g. unreachable / no tables).
- Do not abort the whole model for one bad connection.
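
The continue-on-failure behaviour described in these bullets might look like the following in a script. It is a sketch that assumes `hotdata` signals failure with a non-zero exit status; the function name `enumerate_all` and the summary line format are hypothetical.

```shell
# Hypothetical loop: list tables for every connection ID given, note failures,
# and keep going instead of aborting the whole run.
enumerate_all() {
  failed=""
  for connection_id in "$@"; do
    if ! hotdata tables list --connection-id "$connection_id"; then
      echo "warning: tables list failed for $connection_id; continuing" >&2
      failed="$failed $connection_id"
    fi
  done
  # Summarize failures so they can be recorded in the connections table.
  if [ -n "$failed" ]; then
    printf 'failed:%s\n' "$failed"
  fi
}
```

The final `failed:` line lists the connection IDs to mark as unreachable in the model document.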

---

## Rules (keep quality high)

- Every table gets an explicit **grain** (or “unknown”).
- Prefer **documented** connector semantics over guesswork; **link** external docs when you use them.
- Flag **test/dev** tables (`test`, `tmp`, `dev`, `staging` in names) as non-production when applicable.
- Note **Utf8-stored numbers** and cast requirements where relevant.
- Do not leave column **Notes** empty when domain knowledge or docs apply; “—” is weak unless the column is opaque/internal.
- Align table names with **`hotdata tables list`** output (`connection.schema.table`).
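
The test/dev flagging rule can be approximated with a substring check. This is a deliberately naive sketch: the function name `is_nonprod_table` and its output labels are hypothetical, and plain substring matching produces false positives (for example `devices` contains `dev`), so treat a flag as a prompt to review, not a verdict.

```shell
# Hypothetical check: flag table names containing test, tmp, dev, or staging.
is_nonprod_table() {
  # Lowercase first so STAGING_users and staging_users are treated alike.
  name=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$name" in
    *test*|*tmp*|*dev*|*staging*) echo "non-production" ;;
    *) echo "unflagged" ;;
  esac
}
```

Running it over each `connection.schema.table` name gives a quick list of tables to mark as non-production in the Notes column.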
