Add ignored_index_prefixes to TableDefinition for externally-managed index drift exclusion #638
Merged
Jeadie merged 6 commits into datafusion-contrib:spiceai-52 on Apr 29, 2026
Indexes named `__spice_vss_*` are created externally by the Spice runtime after each full-refresh write completes. The datafusion-table-providers overwrite flow compares indexes on the previous internal table against the new one; these externally-managed indexes are not registered in the `TableDefinition` configuration, causing spurious "Indexes do not match" errors on every subsequent refresh. Filtering them out of the actual-indexes set before the comparison lets the drift check ignore them, consistent with how they are managed entirely outside the table provider.
…check Replace the hardcoded `__spice_vss_*` filter with a configurable `ignored_index_prefixes` field on `TableDefinition`. Callers register the prefixes of externally-managed indexes; `verify_indexes_match` then excludes those indexes from the drift comparison so they don't cause spurious refresh failures.
Use Mutex<Vec<String>> so callers can register externally-managed index prefixes after the TableDefinition is created (e.g. when the vector engine is configured in a separate registration step).
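The registration pattern this commit describes can be sketched as below. This is a minimal illustration, not the crate's code: the struct is a stand-in that models only the new field, `is_ignored_index` and `register_vector_engine` are hypothetical names, and the `&self` receiver and de-duplication in `add_ignored_index_prefix` are assumptions.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for the crate's `TableDefinition`; only the field
/// relevant to this change is modelled.
#[derive(Debug, Default)]
pub struct TableDefinition {
    ignored_index_prefixes: Mutex<Vec<String>>,
}

impl TableDefinition {
    /// Registers a prefix through `&self`, so it also works on a shared `Arc`
    /// after the definition has been handed out. (Skipping duplicates is an
    /// assumption of this sketch.)
    pub fn add_ignored_index_prefix(&self, prefix: impl Into<String>) {
        let prefix = prefix.into();
        let mut prefixes = self
            .ignored_index_prefixes
            .lock()
            .unwrap_or_else(|poisoned| poisoned.into_inner());
        if !prefixes.contains(&prefix) {
            prefixes.push(prefix);
        }
    }

    /// Hypothetical helper: true when `index_name` starts with any
    /// registered prefix.
    pub fn is_ignored_index(&self, index_name: &str) -> bool {
        self.ignored_index_prefixes
            .lock()
            .unwrap_or_else(|poisoned| poisoned.into_inner())
            .iter()
            .any(|prefix| index_name.starts_with(prefix.as_str()))
    }
}

/// Hypothetical later setup step (e.g. vector engine registration) that only
/// holds a shared handle to an already-constructed definition.
pub fn register_vector_engine(table: &Arc<TableDefinition>) {
    table.add_ignored_index_prefix("__spice_vss_");
}
```

Interior mutability is what makes the late registration possible: a plain `Vec<String>` field would require `&mut TableDefinition`, which is not available once the definition is shared.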
peasee approved these changes on Apr 29, 2026
Problem
Applications that create indexes on DuckDB internal tables outside the write pipeline (i.e. indexes not registered in `TableDefinition`) hit spurious refresh failures. On each overwrite refresh, the writer compares the indexes on the previous internal table with those on the new, empty one. Any index not defined in `TableDefinition` is flagged as "unexpected" and the refresh is aborted:
Unexpected index(es) detected in table '__data_foo_123': __spice_vss_foo_embedding.
Indexes do not match between the new table and the existing table.
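The failure mode above amounts to a set difference between the indexes found on the table and the indexes the definition expects. The sketch below reproduces the first error line under that assumption; `unexpected_indexes` and `check_without_ignore_list` are hypothetical names, not functions from the crate.

```rust
use std::collections::BTreeSet;

/// Hypothetical helper: index names present on the table but absent from the
/// expected set derived from the table definition.
pub fn unexpected_indexes(
    expected: &BTreeSet<String>,
    actual: &BTreeSet<String>,
) -> Vec<String> {
    actual.difference(expected).cloned().collect()
}

/// Sketch of a drift check with no ignore list: any index that is not in
/// `expected` aborts the refresh with an error.
pub fn check_without_ignore_list(
    table_name: &str,
    expected: &BTreeSet<String>,
    actual: &BTreeSet<String>,
) -> Result<(), String> {
    let unexpected = unexpected_indexes(expected, actual);
    if unexpected.is_empty() {
        Ok(())
    } else {
        Err(format!(
            "Unexpected index(es) detected in table '{}': {}.",
            table_name,
            unexpected.join(", ")
        ))
    }
}
```

With `idx_foo_id` expected and both `idx_foo_id` and `__spice_vss_foo_embedding` present, the check fails even though the extra index was created deliberately by another component.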
Solution
Add `ignored_index_prefixes: Mutex<Vec<String>>` to `TableDefinition`. Index names matching any registered prefix are excluded from both sides of `verify_indexes_match`, so they are invisible to the drift check. The `Mutex` allows callers to register prefixes after construction, which matters when the decision to create such indexes is made in a separate setup step (e.g. vector engine registration) from table creation.
table_definition.add_ignored_index_prefix("__spice_vss_");
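A minimal sketch of the "excluded from both sides" behaviour follows. It assumes prefix matching is a plain `starts_with` and that the comparison is set equality on index names; the real `verify_indexes_match` signature is not shown in this PR description, so `verify_indexes_match_sketch` and `without_ignored` are illustrative names only.

```rust
use std::collections::BTreeSet;

/// Drops every index name that starts with one of the ignored prefixes.
fn without_ignored<'a>(names: &[&'a str], ignored_prefixes: &[&str]) -> BTreeSet<&'a str> {
    names
        .iter()
        .copied()
        .filter(|name| !ignored_prefixes.iter().any(|prefix| name.starts_with(*prefix)))
        .collect()
}

/// Compares the index names of the existing and new tables after removing
/// externally-managed indexes from BOTH sides.
pub fn verify_indexes_match_sketch(
    existing: &[&str],
    new: &[&str],
    ignored_prefixes: &[&str],
) -> Result<(), String> {
    let existing = without_ignored(existing, ignored_prefixes);
    let new = without_ignored(new, ignored_prefixes);
    if existing == new {
        Ok(())
    } else {
        Err("Indexes do not match between the new table and the existing table.".to_string())
    }
}
```

Filtering both sides keeps the check symmetric: an ignored index cannot cause a mismatch whether it exists only on the old table, only on the new one, or on both, while any other difference is still reported.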
What it is not
This does not change index creation — callers remain fully responsible for creating and managing those indexes. This only prevents the drift check from failing on indexes it doesn't own.