
Add ignored_index_prefixes to TableDefinition for externally-managed index drift exclusion #638

Merged
Jeadie merged 6 commits into datafusion-contrib:spiceai-52 from Jeadie:spiceai-hnsw-index-drift
Apr 29, 2026

Collaborator

Jeadie commented Apr 29, 2026

Problem

Applications that create indexes on DuckDB internal tables outside the write pipeline (i.e. not registered in TableDefinition) hit spurious refresh failures. On each overwrite refresh the writer compares the indexes on the previous internal table with the new empty one. Any index not defined in TableDefinition is flagged as "unexpected" and the refresh is aborted:

Unexpected index(es) detected in table '__data_foo_123': __spice_vss_foo_embedding.
Indexes do not match between the new table and the existing table.
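
To make the failure mode concrete, here is a minimal sketch of a name-based drift check. It is not the crate's actual code: the function `unexpected_indexes` and the index name `i_foo_id` are hypothetical, and the real check may compare more than names. It only illustrates why an index that is absent from the expected set gets reported.

```rust
use std::collections::BTreeSet;

/// Hypothetical stand-in for the drift check: returns the names of indexes
/// found on the existing table that are not in the expected set.
fn unexpected_indexes(expected: &[&str], actual: &[&str]) -> Vec<String> {
    let expected: BTreeSet<&str> = expected.iter().copied().collect();
    actual
        .iter()
        // Keep only names the expected set does not contain.
        .filter(|name| !expected.contains(*name))
        .map(|name| name.to_string())
        .collect()
}
```

With `i_foo_id` as the only expected index, an existing table that also carries `__spice_vss_foo_embedding` yields that name as unexpected, which corresponds to the error above.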

Solution

Add ignored_index_prefixes: Mutex<Vec<String>> to TableDefinition. Index names that start with any registered prefix are excluded from both sides of verify_indexes_match, so they are invisible to the drift check. The Mutex lets callers register prefixes after construction. This matters when the decision to create such indexes is made in a setup step separate from table creation (e.g. vector engine registration).

table_definition.add_ignored_index_prefix("__spice_vss_");
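
A rough sketch of how such a prefix filter could work is shown below. Only the field name `ignored_index_prefixes`, its type, and the method name `add_ignored_index_prefix` come from this PR. The struct `TableDefinitionSketch`, the helper `is_ignored`, the function `indexes_match`, and the use of `starts_with` for matching are assumptions made for illustration.

```rust
use std::collections::HashSet;
use std::sync::Mutex;

/// Hypothetical, pared-down stand-in for `TableDefinition`.
#[derive(Default)]
struct TableDefinitionSketch {
    ignored_index_prefixes: Mutex<Vec<String>>,
}

impl TableDefinitionSketch {
    /// Takes `&self`: the Mutex provides the mutability needed to push.
    fn add_ignored_index_prefix(&self, prefix: &str) {
        self.ignored_index_prefixes
            .lock()
            .expect("prefix lock poisoned")
            .push(prefix.to_string());
    }

    /// Assumption: a name is ignored when it starts with any registered prefix.
    fn is_ignored(&self, index_name: &str) -> bool {
        self.ignored_index_prefixes
            .lock()
            .expect("prefix lock poisoned")
            .iter()
            .any(|prefix| index_name.starts_with(prefix.as_str()))
    }
}

/// Assumed shape of the check: drop ignored names from both sides, then compare.
fn indexes_match(def: &TableDefinitionSketch, existing: &[&str], new: &[&str]) -> bool {
    let keep = |names: &[&str]| -> HashSet<String> {
        names
            .iter()
            .filter(|name| !def.is_ignored(**name))
            .map(|name| name.to_string())
            .collect()
    };
    keep(existing) == keep(new)
}
```

Under these assumptions the registered prefix must match the start of the index name exactly, so `__spice_vss_` covers `__spice_vss_foo_embedding` while a prefix with a single leading underscore would not.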

What it is not

This does not change index creation — callers remain fully responsible for creating and managing those indexes. This only prevents the drift check from failing on indexes it doesn't own.

Jeadie added 4 commits April 29, 2026 12:29
Indexes named `__spice_vss_*` are created externally by the Spice runtime
after each full-refresh write completes. The datafusion-table-providers
overwrite flow compares indexes on the previous internal table against the
new one; these externally-managed indexes are not registered in the
`TableDefinition` configuration, causing spurious "Indexes do not match"
errors on every subsequent refresh.

Filtering them out of the actual-indexes set before the comparison lets
the drift check ignore them, consistent with how they are managed entirely
outside the table provider.
…check

Replace the hardcoded `__spice_vss_*` filter with a configurable
`ignored_index_prefixes` field on `TableDefinition`. Callers register
the prefixes of externally-managed indexes; `verify_indexes_match` then
excludes those indexes from the drift comparison so they don't cause
spurious refresh failures.
Use Mutex<Vec<String>> so callers can register externally-managed index
prefixes after the TableDefinition is created (e.g. when the vector
engine is configured in a separate registration step).
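
The late-registration point can be sketched as follows, assuming the definition is shared behind an `Arc` (an assumption, not something this PR states). `SharedTableDefinition`, `register_vector_engine`, and the `ignored_index_prefixes()` accessor are hypothetical names used only for this illustration.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for `TableDefinition`, reduced to the one field discussed here.
#[derive(Default)]
struct SharedTableDefinition {
    ignored_index_prefixes: Mutex<Vec<String>>,
}

impl SharedTableDefinition {
    /// `&self` is enough because the Mutex guards the Vec.
    fn add_ignored_index_prefix(&self, prefix: impl Into<String>) {
        self.ignored_index_prefixes
            .lock()
            .expect("prefix lock poisoned")
            .push(prefix.into());
    }

    /// Returns a snapshot of the registered prefixes.
    fn ignored_index_prefixes(&self) -> Vec<String> {
        self.ignored_index_prefixes
            .lock()
            .expect("prefix lock poisoned")
            .clone()
    }
}

/// Hypothetical later setup step that only holds a shared handle.
fn register_vector_engine(def: &Arc<SharedTableDefinition>) {
    def.add_ignored_index_prefix("__spice_vss_");
}
```

With a plain `Vec<String>` field, the registration step would need `&mut` access, which a shared `Arc` handle cannot hand out while other clones exist. The Mutex sidesteps that, and any other holder of the same `Arc` sees the prefix once it is registered.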
Jeadie self-assigned this Apr 29, 2026
Jeadie enabled auto-merge April 29, 2026 05:39
Jeadie disabled auto-merge April 29, 2026 05:39
Jeadie enabled auto-merge April 29, 2026 05:39
Jeadie merged commit df7dbc6 into datafusion-contrib:spiceai-52 Apr 29, 2026
12 checks passed
2 participants