fix(oracle): hybrid metadata filters + OVERWRITE upsert (delta for #3426) #2

Closed
fede-kamel wants to merge 1 commit into fileames:oracle-upstream-compat-features from fede-kamel:fix/oracle-hybrid-filter-and-overwrite

Conversation

@fede-kamel

Delta fix for deepset-ai#3426, branched directly off oracle-upstream-compat-features, so this PR contains only the changes on top of your work.

Both bugs were reproduced and the fixes verified against a live Oracle 26ai database (Oracle Text / DBMS_SEARCH, DBMS_VECTOR_CHAIN, DBMS_HYBRID_VECTOR, in-DB ONNX embedding model). They don't surface in CI because the gvenzl/oracle-free image ships without Oracle Text, so the keyword index and its trigger never exist there.

🔴 Hybrid retrieval + any metadata filter → ORA-20000 / ORA-00904

to_hybrid_filter() produced filter_by={"path": "meta.lang", ...}, but DBMS_HYBRID_VECTOR.SEARCH's filter_by.path resolves to a column on the base table, not a JSON path into metadata:

ORA-00904: "THEBASE"."META"."LANG": invalid identifier   -- path "meta.lang"
ORA-00904: "THEBASE"."LANG": invalid identifier          -- even with the meta. prefix stripped

Fix: Haystack metadata filters are applied as a SQL predicate while fetching the ranked hits (post-filter); a native filter_by over columns declared with FILTER BY at index creation is still available via params. Also fixes positional score-misalignment when a ranked hit is filtered out.
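The score-misalignment part of this fix can be illustrated with a minimal sketch. It is not the code in this PR: `attach_scores`, the `key` field, and the tuple shapes are hypothetical names chosen for the example. The idea is to look scores up by row key instead of pairing rows and scores by position, so a hit dropped by the metadata predicate cannot shift the remaining scores.

```python
from typing import Any


def attach_scores(
    ranked_hits: list[tuple[str, float]],
    surviving_rows: list[dict[str, Any]],
) -> list[tuple[dict[str, Any], float]]:
    """Pair each surviving row with its own score, in ranked order.

    ranked_hits: (row key, score) pairs in the order the hybrid search ranked them.
    surviving_rows: rows fetched with the metadata predicate applied; this may be
        a subset of the ranked hits and may come back in any order.
    """
    rows_by_key = {row["key"]: row for row in surviving_rows}
    # Walk the ranking, not the fetched rows, and skip hits the predicate removed.
    return [
        (rows_by_key[key], score)
        for key, score in ranked_hits
        if key in rows_by_key
    ]


ranked = [("a", 0.9), ("b", 0.7), ("c", 0.4)]
# Hit "b" was removed by the metadata predicate; rows arrive unordered.
rows = [{"key": "c", "lang": "en"}, {"key": "a", "lang": "en"}]
scored = attach_scores(ranked, rows)
# "c" keeps 0.4; a positional zip over the ranking would have handed it 0.7.
```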

🔴 write_documents(policy=OVERWRITE) → ORA-06531

The OVERWRITE MERGE (WHEN MATCHED UPDATE + WHEN NOT MATCHED INSERT) fails inside the DBMS_SEARCH keyword-index trigger, even on an empty table where every row is an insert (the NONE and SKIP policies work). Fix: delete-then-insert (rows de-duplicated by id, last wins).
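The delete-then-insert shape can be sketched as follows. This is an illustration under stated assumptions, not the PR's implementation: it uses the standard-library `sqlite3` module as a stand-in for an Oracle connection so it runs anywhere, and the `docs` table, its columns, and the helper names are hypothetical.

```python
import sqlite3


def dedupe_last_wins(docs: list[dict]) -> list[dict]:
    """Keep one document per id; a later duplicate replaces an earlier one."""
    by_id: dict[str, dict] = {}
    for doc in docs:
        by_id[doc["id"]] = doc
    return list(by_id.values())


def overwrite_documents(conn: sqlite3.Connection, docs: list[dict]) -> int:
    """Emulate an upsert without MERGE: delete matching ids, then insert."""
    rows = dedupe_last_wins(docs)
    with conn:  # one transaction: commit on success, roll back on error
        conn.executemany(
            "DELETE FROM docs WHERE id = ?", [(d["id"],) for d in rows]
        )
        conn.executemany(
            "INSERT INTO docs (id, content) VALUES (?, ?)",
            [(d["id"], d["content"]) for d in rows],
        )
    return len(rows)


conn = sqlite3.connect(":memory:")  # stand-in database, not Oracle
conn.execute("CREATE TABLE docs (id TEXT PRIMARY KEY, content TEXT)")
conn.execute("INSERT INTO docs VALUES ('1', 'old')")
written = overwrite_documents(
    conn,
    [
        {"id": "1", "content": "new"},
        {"id": "2", "content": "first"},
        {"id": "2", "content": "second"},  # duplicate id: this one wins
    ],
)
stored = conn.execute("SELECT id, content FROM docs ORDER BY id").fetchall()
```

De-duplicating before the insert matters here because, unlike MERGE, a plain INSERT of two rows with the same id would violate the primary key.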

Also

  • Removes the now-unused to_hybrid_filter helper.
  • Adds live integration tests: test_hybrid_retriever_with_metadata_filter_live, test_write_documents_overwrite_policy_live.
  • Adds wallet support in conftest.py so the suite can run against an Oracle ADB.
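For readers unfamiliar with wallet-based connections, here is a minimal sketch of how a test fixture might assemble connection arguments. It is not the conftest.py from this PR: the function name and every environment variable name are hypothetical. The keys `config_dir`, `wallet_location`, and `wallet_password` are keyword arguments accepted by `oracledb.connect()` in python-oracledb; the sketch only builds the dictionary and does not open a connection.

```python
import os
from collections.abc import Mapping


def oracle_connect_kwargs(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Build keyword arguments for oracledb.connect() from environment variables.

    All variable names below are assumptions made for this example.
    """
    kwargs = {
        "user": env["ORACLE_USER"],
        "password": env["ORACLE_PASSWORD"],
        "dsn": env["ORACLE_DSN"],
    }
    wallet_dir = env.get("ORACLE_WALLET_DIR")
    if wallet_dir:
        # An ADB wallet directory holds both tnsnames.ora and the wallet file.
        kwargs["config_dir"] = wallet_dir
        kwargs["wallet_location"] = wallet_dir
        wallet_password = env.get("ORACLE_WALLET_PASSWORD")
        if wallet_password:
            kwargs["wallet_password"] = wallet_password
    return kwargs


plain = oracle_connect_kwargs(
    {"ORACLE_USER": "u", "ORACLE_PASSWORD": "p", "ORACLE_DSN": "localhost/FREEPDB1"}
)
with_wallet = oracle_connect_kwargs(
    {
        "ORACLE_USER": "u",
        "ORACLE_PASSWORD": "p",
        "ORACLE_DSN": "mydb_low",
        "ORACLE_WALLET_DIR": "/opt/wallet",
        "ORACLE_WALLET_PASSWORD": "w",
    }
)
```

A fixture could then call `oracledb.connect(**oracle_connect_kwargs())`, so the same suite runs against a local container or an ADB depending only on the environment.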

Validation

Unit (51) ✅ · ruff ✅ · mypy ✅ · live targeted tests ✅ (no regression on the existing hybrid test) · base + features suite (112 selected) ✅.

…psert

Two bugs confirmed against live Oracle 26ai. Both are invisible in CI because the
gvenzl test image ships without Oracle Text / DBMS_SEARCH, so the keyword index
(and its trigger) is never created.

- Hybrid retrieval with any Haystack metadata filter raised ORA-20000/ORA-00904.
  DBMS_HYBRID_VECTOR.SEARCH filter_by resolves paths to base-table columns, not
  JSON metadata fields, so meta.* filters could never match. Metadata filters are
  now applied as a SQL predicate while fetching the ranked hits; a native
  filter_by over declared filterable columns can still be supplied via params.
  This also fixes positional score misalignment when a ranked hit is filtered out.

- write_documents(policy=OVERWRITE) raised ORA-06531 in the DBMS_SEARCH
  keyword-index trigger. A MERGE combining WHEN MATCHED UPDATE with WHEN NOT
  MATCHED INSERT trips the trigger even when every row is an insert; replaced with
  delete-then-insert (rows de-duplicated by id, last wins).

Removes the now-unused to_hybrid_filter helper. Adds live integration tests for
both paths and wallet support in conftest so the suite can run against an ADB.