Goal: Find full-text and vector access patterns that lack indexes, then create bm25 or vector indexes when the benefit is clear.
- Query-run history: look for recurring predicates or search-style SQL (bm25_search, vector_distance, or planned hotdata search):
hotdata queries list
hotdata queries <query_run_id>
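One way to turn query-run history into evidence is to tally how often the search-style functions named above appear in the SQL text. The Python sketch below assumes you have already collected the SQL strings yourself (for example by copying them out of hotdata queries <query_run_id>). The sql_texts input, the helper names, and the min_runs threshold are illustrative and not part of the hotdata CLI.

```python
import re
from collections import Counter

# Search-style function names mentioned in this section.
SEARCH_FUNCS = ("bm25_search", "vector_distance")
_PATTERN = re.compile(r"\b(" + "|".join(SEARCH_FUNCS) + r")\s*\(", re.IGNORECASE)


def tally_search_calls(sql_texts):
    """Count how many queries use each search-style function (once per query)."""
    counts = Counter()
    for sql in sql_texts:
        # Use a set so a query that calls the same function twice counts once.
        used = {m.group(1).lower() for m in _PATTERN.finditer(sql)}
        counts.update(used)
    return counts


def recurring(counts, min_runs=3):
    """Keep only the functions seen in at least min_runs queries (assumed threshold)."""
    return {name: n for name, n in counts.items() if n >= min_runs}
```

A function that shows up in only one or two runs is weak evidence. The threshold is a judgment call and should reflect how expensive the index build is.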
- Columns: confirm types with:
hotdata tables list --connection-id <connection_id>
High-cardinality text (title, body, …) → bm25. Embedding / float list columns → vector (+ --metric).
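The type-to-index rule above can be sketched as a small Python helper. The type-name fragments it matches (text, varchar, list, float, and so on) are assumptions about what hotdata tables list might report, not documented output. Type alone also says nothing about cardinality, so a "bm25" result is only a candidate to check against the workload.

```python
def suggest_index(column_type):
    """Map a column type string to 'bm25', 'vector', or None (no suggestion)."""
    t = column_type.strip().lower()
    # Assumed spellings of list-like and floating-point types.
    is_list = any(tok in t for tok in ("list", "array", "[]", "vector"))
    is_float = any(tok in t for tok in ("float", "double", "real"))
    # Embedding-style columns: a dedicated vector type or a list of floats.
    if "vector" in t or (is_list and is_float):
        return "vector"
    # Other list types (e.g. a list of strings) get no suggestion here.
    if is_list:
        return None
    # Plain text columns are bm25 candidates.
    if any(tok in t for tok in ("text", "varchar", "string", "utf8")):
        return "bm25"
    return None
```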
- Existing indexes: list what is already in place before creating anything:
hotdata indexes list [--connection-id <id>] [--schema <schema>] [--table <table>]
hotdata indexes list --dataset-id <dataset_id>
Skip duplicates (same table, column, and purpose).
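The duplicate check can be sketched in Python as a set lookup on (schema, table, column, type). The dictionary shape used here is hypothetical: it assumes you have transcribed the hotdata indexes list output into records yourself, and it treats the index type as the "purpose".

```python
def index_key(ix):
    """Identity of an index for duplicate checks: location, column, and type."""
    return (ix["schema"], ix["table"], ix["column"], ix["type"])


def drop_duplicates(candidates, existing):
    """Return the candidates not already covered by an existing index."""
    seen = {index_key(ix) for ix in existing}
    fresh = []
    for cand in candidates:
        key = index_key(cand)
        if key not in seen:
            fresh.append(cand)
            seen.add(key)  # also removes repeats within the candidate list
    return fresh
```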
For managed databases (catalog alias — auto-selects the active database connection):
hotdata indexes create --catalog <alias> --schema <schema> --table <table> \
--column body --type bm25
hotdata indexes create --catalog <alias> --schema <schema> --table <table> \
--column embedding --type vector --metric cosine
For regular connections (explicit connection ID):
hotdata indexes create --connection-id <id> --schema <schema> --table <table> \
--name idx_posts_body_bm25 --column body --type bm25
hotdata indexes create --connection-id <id> --schema <schema> --table <table> \
--name idx_chunks_embedding --column embedding --type vector --metric cosine
For large builds, add --async, then follow progress with hotdata jobs list or hotdata jobs <job_id>.
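When several indexes are created from a script, it can help to assemble the command line in one place so the two addressing modes are never mixed. The Python sketch below uses only the flags shown above. The function name and its validation rules are assumptions: in particular, it treats --metric as required for vector indexes and --async as a bare flag. It returns a string for review and does not run anything.

```python
import shlex


def build_create_command(*, schema, table, column, index_type,
                         catalog=None, connection_id=None,
                         name=None, metric=None, run_async=False):
    """Build a `hotdata indexes create` command line as a shell-safe string."""
    # Exactly one addressing mode: catalog alias or explicit connection ID.
    if (catalog is None) == (connection_id is None):
        raise ValueError("pass exactly one of catalog or connection_id")
    if index_type not in ("bm25", "vector"):
        raise ValueError("index_type must be 'bm25' or 'vector'")
    if index_type == "vector" and metric is None:
        raise ValueError("vector indexes need a metric, e.g. cosine")

    args = ["hotdata", "indexes", "create"]
    if catalog is not None:
        args += ["--catalog", catalog]
    else:
        args += ["--connection-id", connection_id]
    args += ["--schema", schema, "--table", table]
    if name is not None:
        args += ["--name", name]
    args += ["--column", column, "--type", index_type]
    if index_type == "vector":
        args += ["--metric", metric]
    if run_async:
        args.append("--async")
    # shlex.join quotes any argument that needs it.
    return shlex.join(args)
```

Printing the result and pasting it into a terminal keeps a human in the loop, which suits the approval step for production indexes.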
Re-run hotdata search or representative SQL to verify the new index. Then update context: DATAMODEL → Search & index summary, via hotdata context push DATAMODEL (core skill).
- Prefer evidence (repeated search workloads) over speculative indexes.
- Get approval before production indexes create when cost/impact is uncertain.
- Align connection/schema/table with hotdata tables list output.