fix(oracle): hybrid metadata filters + OVERWRITE upsert (delta for #3426) #2
Closed
fede-kamel wants to merge 1 commit into
…psert

Two bugs confirmed against live Oracle 26ai. Both are invisible in CI because the gvenzl test image ships without Oracle Text / DBMS_SEARCH, so the keyword index (and its trigger) is never created.

- Hybrid retrieval with any Haystack metadata filter raised ORA-20000/ORA-00904. DBMS_HYBRID_VECTOR.SEARCH filter_by resolves paths to base-table columns, not JSON metadata fields, so meta.* filters could never match. Metadata filters are now applied as a SQL predicate while fetching the ranked hits; a native filter_by over declared filterable columns can still be supplied via params. This also fixes positional score misalignment when a ranked hit is filtered out.
- write_documents(policy=OVERWRITE) raised ORA-06531 in the DBMS_SEARCH keyword-index trigger. A MERGE combining WHEN MATCHED UPDATE with WHEN NOT MATCHED INSERT trips the trigger even when every row is an insert; replaced with delete-then-insert (rows de-duplicated by id, last wins).

Removes the now-unused to_hybrid_filter helper. Adds live integration tests for both paths and wallet support in conftest so the suite can run against an ADB.
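The positional score misalignment mentioned above can be illustrated with a small sketch. This is not the code from this PR: the function names, the row shape, and the sample data are hypothetical, and the "fetched rows" dictionary merely stands in for the rows that survive the SQL metadata predicate.

```python
from typing import Dict, List, Tuple

Hit = Tuple[str, float]  # (document id, hybrid score), in rank order


def pair_by_position(ranked_hits: List[Hit], fetched_rows: Dict[str, dict]) -> List[dict]:
    """Buggy pattern: zip surviving rows with the score list by position."""
    rows = [fetched_rows[doc_id] for doc_id, _ in ranked_hits if doc_id in fetched_rows]
    scores = [score for _, score in ranked_hits]
    # Once a hit is filtered out, every later row picks up a neighbour's score.
    return [{**row, "score": score} for row, score in zip(rows, scores)]


def pair_by_id(ranked_hits: List[Hit], fetched_rows: Dict[str, dict]) -> List[dict]:
    """Safer pattern: look each score up by document id, skipping filtered hits."""
    paired = []
    for doc_id, score in ranked_hits:
        row = fetched_rows.get(doc_id)
        if row is None:  # removed by the metadata predicate
            continue
        paired.append({**row, "score": score})
    return paired


# Hypothetical data: hit "b" does not satisfy the metadata filter.
ranked_hits = [("a", 0.9), ("b", 0.7), ("c", 0.5)]
fetched_rows = {"a": {"id": "a", "lang": "en"}, "c": {"id": "c", "lang": "en"}}

misaligned = pair_by_position(ranked_hits, fetched_rows)
aligned = pair_by_id(ranked_hits, fetched_rows)
```

In this sketch the positional variant hands document "c" the score that belonged to the filtered-out "b", while the id-keyed variant keeps each score attached to its own document.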
Delta fix for deepset-ai#3426, branched directly off oracle-upstream-compat-features, so this PR contains only the changes on top of your work.

Both bugs were reproduced and the fixes verified against a live Oracle 26ai database (Oracle Text / DBMS_SEARCH, DBMS_VECTOR_CHAIN, DBMS_HYBRID_VECTOR, in-DB ONNX embedding model). They don't surface in CI because the gvenzl/oracle-free image ships without Oracle Text, so the keyword index and its trigger never exist there.

🔴 Hybrid retrieval + any metadata filter → ORA-20000/ORA-00904

to_hybrid_filter() produced filter_by={"path": "meta.lang", ...}, but the filter_by.path of DBMS_HYBRID_VECTOR.SEARCH resolves to a column on the base table, not to a JSON path into metadata.

Fix: Haystack metadata filters are applied as a SQL predicate while fetching the ranked hits (post-filter); a native filter_by over columns declared with FILTER BY at index creation is still available via params. This also fixes the positional score misalignment that occurred when a ranked hit was filtered out.

🔴 write_documents(policy=OVERWRITE) → ORA-06531

The OVERWRITE MERGE (WHEN MATCHED UPDATE + WHEN NOT MATCHED INSERT) fails inside the DBMS_SEARCH keyword-index trigger, even on an empty table where every row is an insert (NONE and SKIP work). Fix: delete-then-insert (rows de-duplicated by id, last wins).

Also

- Removes the now-unused to_hybrid_filter helper.
- Adds live integration tests: test_hybrid_retriever_with_metadata_filter_live, test_write_documents_overwrite_policy_live.
- Adds wallet support in conftest.py so the suite can run against an Oracle ADB.

Validation

Unit (51) ✅ · ruff ✅ · mypy ✅ · live targeted tests ✅ (no regression on the existing hybrid test) · base + features suite (112 selected) ✅.
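For reviewers who want to see the shape of the delete-then-insert replacement for OVERWRITE, here is a minimal sketch. It is not the code from this PR: it uses Python's built-in sqlite3 module as a stand-in for Oracle, and the documents(id, content) table and both helper names are hypothetical. It only illustrates the two steps described above: de-duplicate the batch by id with the last occurrence winning, then delete and re-insert inside one transaction.

```python
import sqlite3
from typing import Dict, List


def dedupe_last_wins(docs: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep one row per id; a later duplicate replaces an earlier one."""
    by_id = {doc["id"]: doc for doc in docs}
    return list(by_id.values())


def overwrite_documents(conn: sqlite3.Connection, docs: List[Dict[str, str]]) -> int:
    """Delete-then-insert in place of a MERGE-style upsert (illustrative only)."""
    rows = dedupe_last_wins(docs)
    # The connection context manager commits on success and rolls back on error,
    # so the delete and the insert are applied together or not at all.
    with conn:
        conn.executemany(
            "DELETE FROM documents WHERE id = ?",
            [(row["id"],) for row in rows],
        )
        conn.executemany(
            "INSERT INTO documents (id, content) VALUES (?, ?)",
            [(row["id"], row["content"]) for row in rows],
        )
    return len(rows)


def demo() -> list:
    """Write a batch containing a duplicate id, then overwrite one existing row."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE documents (id TEXT PRIMARY KEY, content TEXT)")
    overwrite_documents(conn, [
        {"id": "1", "content": "old"},
        {"id": "2", "content": "x"},
        {"id": "1", "content": "new"},  # duplicate id: this one should win
    ])
    overwrite_documents(conn, [{"id": "2", "content": "y"}])
    return conn.execute("SELECT id, content FROM documents ORDER BY id").fetchall()
```

De-duplicating before the insert matters here because, without it, two rows with the same id in one batch would violate the primary key once the delete has already run.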