Index workflow (BM25 and vector)

Goal: Find full-text and vector access patterns that lack indexes, then create bm25 or vector indexes when the benefit is clear.

1. Gather workload and schema

Query-run history — recurring predicates or search-style SQL (bm25_search, vector_distance, or planned hotdata search):
```
hotdata queries list
hotdata queries <query_run_id>
```

Columns — confirm types:

hotdata tables list --connection-id <connection_id>

High-cardinality text (title, body, …) → bm25. Embedding / float list columns → vector (+ --metric).

2. Compare to existing indexes

hotdata indexes list [--connection-id <id>] [--schema <schema>] [--table <table>]
hotdata indexes list --dataset-id <dataset_id>

Skip duplicates (same table, column, and purpose).

3. Create indexes

For managed databases (catalog alias — auto-selects the active database connection):

hotdata indexes create --catalog <alias> --schema <schema> --table <table> \
  --column body --type bm25

hotdata indexes create --catalog <alias> --schema <schema> --table <table> \
  --column embedding --type vector --metric cosine

For regular connections (explicit connection ID):

hotdata indexes create --connection-id <id> --schema <schema> --table <table> \
  --name idx_posts_body_bm25 --column body --type bm25

hotdata indexes create --connection-id <id> --schema <schema> --table <table> \
  --name idx_chunks_embedding --column embedding --type vector --metric cosine

Large builds: --async, then hotdata jobs list / hotdata jobs <job_id>.

4. Verify

Re-run hotdata search or representative SQL. Update context:DATAMODEL → Search & index summary via hotdata context push DATAMODEL (core skill).

Guardrails

Prefer evidence (repeated search workloads) over speculative indexes.
Get approval before production indexes create when cost/impact is uncertain.
Align connection/schema/table with hotdata tables list output.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Index workflow (BM25 and vector)

1. Gather workload and schema

2. Compare to existing indexes

3. Create indexes

4. Verify

Guardrails

Uh oh!

FilesExpand file tree

INDEXES.md

Latest commit

History

INDEXES.md

File metadata and controls

Index workflow (BM25 and vector)

1. Gather workload and schema

2. Compare to existing indexes

3. Create indexes

4. Verify

Guardrails