|
| 1 | +--- |
| 2 | +name: hotdata-analytics |
| 3 | +description: Use this skill when the user wants OLAP-style SQL analytics in Hotdata — aggregations, GROUP BY, JOINs, reporting, exploratory queries, query run history, stored results, or materialized follow-up tables (Chain via datasets or managed databases). Activate for "analyze", "aggregate", "rollup", "pivot", "report", "metrics", "GROUP BY", "query history", "past queries", "query runs", "stored results", "materialize", "chain", "intermediate table", or sorted indexes for filters/range scans. Do not load for BM25/vector search or geospatial SQL — use hotdata-search or hotdata-geospatial. Requires the core hotdata skill for connections, tables, datasets, and auth. |
| 4 | +version: 0.2.3 |
| 5 | +--- |
| 6 | + |
| 7 | +# Hotdata Analytics Skill |
| 8 | + |
| 9 | +**OLAP-style analytics** in Hotdata: PostgreSQL-dialect SQL, query execution, run history, stored results, **Chain** materializations, and **sorted** indexes for filters and joins. |
| 10 | + |
| 11 | +**Prerequisites:** Authentication, workspace selection, and catalog discovery via the **`hotdata`** skill (`connections`, `tables`, `datasets`, `databases`). |
| 12 | + |
| 13 | +**Related skills:** **`hotdata-search`** (BM25, vector, retrieval indexes), **`hotdata-geospatial`** (spatial SQL). |
| 14 | + |
| 15 | +--- |
| 16 | + |
| 17 | +## Execute SQL |
| 18 | + |
| 19 | +```bash |
| 20 | +hotdata query "<sql>" [--workspace-id <workspace_id>] [--connection <connection_id>] [--output table|json|csv] |
| 21 | +hotdata query status <query_run_id> [--output table|json|csv] |
| 22 | +``` |
| 23 | + |
| 24 | +- **PostgreSQL dialect.** Quote mixed-case identifiers: `"CustomerName"`. |
| 25 | +- Use **`hotdata tables list`** for schema discovery — not `information_schema` via `query`. |
| 26 | +- Fully qualified names: `<connection>.<schema>.<table>`, `datasets.<schema>.<table>`, `<database>.<schema>.<table>`. |
| 27 | +- Long-running queries may return `query_run_id` → poll with **`query status`** (exit `2` = still running). Do not re-run identical heavy SQL while polling. |
| 28 | +- For **workspace-wide** joins and naming, load **context:DATAMODEL** when listed (`hotdata context list` → `show DATAMODEL`) — see **`hotdata`** skill. |
| 29 | + |
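The polling rule above (exit `2` means still running) could be wrapped in a small helper. This is a minimal sketch, not part of the Hotdata CLI: the function name `poll_query_run`, the poll interval, and the attempt cap are hypothetical, and it assumes any exit code other than `2` means the run has finished (successfully or not).

```bash
# Hypothetical helper: poll a query run until it is no longer running.
# Assumes `hotdata query status` exits with 2 while the run is in progress.
poll_query_run() {
  run_id="$1"
  interval="${2:-5}"       # seconds between polls (illustrative default)
  max_attempts="${3:-60}"  # stop polling after this many attempts (illustrative default)
  attempt=0
  while [ "$attempt" -lt "$max_attempts" ]; do
    rc=0
    hotdata query status "$run_id" --output json || rc=$?
    if [ "$rc" -ne 2 ]; then
      # Finished or failed: hand the CLI's exit code back to the caller.
      return "$rc"
    fi
    attempt=$((attempt + 1))
    sleep "$interval"
  done
  echo "query run $run_id still running after $max_attempts polls" >&2
  return 2
}
```

Calling `poll_query_run <query_run_id> 10 30` would check every 10 seconds for up to 30 attempts without re-submitting the SQL.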
| 30 | +### OLAP patterns |
| 31 | + |
| 32 | +Typical analytics SQL (all via `hotdata query`): |
| 33 | + |
| 34 | +- **Aggregations:** `COUNT`, `SUM`, `AVG`, `MIN`, `MAX` with `GROUP BY` |
| 35 | +- **Joins:** `INNER` / `LEFT JOIN` across `<connection>.<schema>.<table>` names |
| 36 | +- **Filtering:** `WHERE` on partition-friendly columns (consider **sorted** indexes below) |
| 37 | +- **Ordering:** `ORDER BY` on metrics or dimensions |
| 38 | +- **Bounded exploration:** always `LIMIT` while iterating; widen once validated |
| 39 | + |
| 40 | +Column names from CSV uploads may be case-sensitive — use double quotes when not all-lowercase. |
| 41 | + |
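The patterns above can be combined in one bounded rollup. The sketch below only builds the SQL text; the connection, schema, table, and column names (`sales.public.orders`, `sales.public.customers`, `"CustomerName"`, `amount`, `created_at`) and the function name `build_rollup_sql` are hypothetical placeholders, not objects that exist in any workspace.

```bash
# Hypothetical helper: print a bounded aggregate-and-join query.
# All table and column names are placeholders for illustration.
build_rollup_sql() {
  limit="${1:-100}"  # keep exploration bounded; widen once validated
  cat <<SQL
SELECT c."CustomerName" AS customer,
       COUNT(*)         AS order_count,
       SUM(o.amount)    AS total_amount
FROM sales.public.orders AS o
INNER JOIN sales.public.customers AS c ON c.id = o.customer_id
WHERE o.created_at >= DATE '2024-01-01'
GROUP BY c."CustomerName"
ORDER BY total_amount DESC
LIMIT ${limit}
SQL
}
```

The output could then be passed to the CLI, for example `hotdata query "$(build_rollup_sql 50)" --output table`. The mixed-case column stays double-quoted, as the dialect notes require.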
| 42 | +--- |
| 43 | + |
| 44 | +## Query run history |
| 45 | + |
| 46 | +Uses the **active workspace only** (no `--workspace-id`; set with `hotdata workspaces set`). |
| 47 | + |
| 48 | +```bash |
| 49 | +hotdata queries list [--limit <int>] [--cursor <token>] [--status <csv>] [--output table|json|yaml] |
| 50 | +hotdata queries <query_run_id> [--output table|json|yaml] |
| 51 | +``` |
| 52 | + |
| 53 | +- `list` — status, duration, row count, SQL preview (default limit 20). Filter: `--status running,failed`. |
| 54 | +- `<query_run_id>` — full metadata, formatted SQL, `result_id` when present. |
| 55 | +- Use history to find recurring `WHERE` / `JOIN` / `GROUP BY` patterns before adding indexes (sorted indexes below; retrieval indexes in **`hotdata-search`**) or chains. |
| 56 | + |
| 57 | +--- |
| 58 | + |
| 59 | +## Stored results |
| 60 | + |
| 61 | +```bash |
| 62 | +hotdata results list [--workspace-id <workspace_id>] [--limit <int>] [--offset <int>] [--output table|json|yaml] |
| 63 | +hotdata results <result_id> [--workspace-id <workspace_id>] [--output table|json|csv] |
| 64 | +``` |
| 65 | + |
| 66 | +- Prefer **`results <id>`** over re-running identical heavy queries. |
| 67 | +- Query footers may include `[result-id: rslt...]`; also available from `queries <query_run_id>`. |
| 68 | + |
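To reuse a stored result, the id first has to be read from the query footer. A minimal sketch, assuming the footer appears in the captured output exactly in the `[result-id: ...]` form described above; the function name `extract_result_id` is hypothetical.

```bash
# Hypothetical helper: read query output on stdin and print the result id
# found in a "[result-id: ...]" footer, or nothing if no footer is present.
extract_result_id() {
  sed -n 's/.*\[result-id: *\([^]]*\)[]].*/\1/p' | tail -n 1
}
```

Under those assumptions, `hotdata query "<sql>" | extract_result_id` would yield an id that can be passed to `hotdata results <result_id>` instead of re-running the query.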
| 69 | +--- |
| 70 | + |
| 71 | +## Chain (materialized follow-ups) |
| 72 | + |
| 73 | +**Pattern:** run SQL → materialize a smaller table → query the materialized name. |
| 74 | + |
| 75 | +1. **Base query** |
| 76 | + |
| 77 | + ```bash |
| 78 | + hotdata query "SELECT ..." |
| 79 | + hotdata query status <query_run_id> # if async |
| 80 | + ``` |
| 81 | + |
| 82 | +2. **Materialize** (pick one) |
| 83 | + |
| 84 | + ```bash |
| 85 | + hotdata datasets create --label "chain slice" --sql "SELECT ..." [--table-name chain_slice] |
| 86 | + hotdata datasets create --label "from saved" --query-id <query_id> [--table-name ...] |
| 87 | + ``` |
| 88 | + |
| 89 | +   Or load a Parquet file into a managed database: |
| 90 | + |
| 91 | + ```bash |
| 92 | + hotdata databases create --name analytics --table slice |
| 93 | + hotdata databases tables load analytics slice --file ./slice.parquet |
| 94 | + ``` |
| 95 | + |
| 96 | +3. **Chain query:** use the printed **`full_name`** or the **FULL NAME** column from `datasets list`: |
| 97 | + |
| 98 | + ```bash |
| 99 | + hotdata query "SELECT * FROM datasets.main.chain_slice WHERE ..." |
| 100 | + hotdata query "SELECT * FROM analytics.public.slice WHERE ..." |
| 101 | + ``` |
| 102 | + |
| 103 | +Document stable chains in **context:DATAMODEL → Derived tables (Chain)**. |
| 104 | + |
| 105 | +Full procedure: [references/WORKFLOWS.md](references/WORKFLOWS.md). |
| 106 | + |
| 107 | +--- |
| 108 | + |
| 109 | +## Sorted indexes (filters and range scans) |
| 110 | + |
| 111 | +For equality, range, and sort-heavy OLAP — not full-text or vector (see **`hotdata-search`**): |
| 112 | + |
| 113 | +```bash |
| 114 | +hotdata indexes create --connection-id <id> --schema <schema> --table <table> \ |
| 115 | + --name idx_orders_created --columns created_at --type sorted [--async] |
| 116 | +``` |
| 117 | + |
| 118 | +List and delete use the same `hotdata indexes` commands as in the search skill; only **`--type sorted`** is the analytics focus here. |
| 119 | + |
| 120 | +--- |
| 121 | + |
| 122 | +## Sandboxes and chains |
| 123 | + |
| 124 | +Sandbox datasets use **`datasets.<sandbox_id>.<table>`**, not `datasets.main`. Run queries with an active sandbox config or via `hotdata sandbox <id> run hotdata query "..."`. See the **Sandboxes** section of the **`hotdata`** skill. |