feat: add Elasticsearch sparse retrieval support #3104
Open
GunaPalanivel wants to merge 25 commits into deepset-ai:main from
…lization
- Update test_bm25_retriever.py and test_embedding_retriever.py to include sparse_vector_field in serialized document_store init parameters.

- Add test_write_documents_with_sparse_vectors and test_write_documents_with_sparse_embedding_warning to test_document_store.py
- Add test_write_documents_async_with_sparse_vectors to test_document_store_async.py
- Update existing warning test in test_document_store_async.py
- Add test_init_with_sparse_vector_field and update serialization tests.

- Add SPECIAL_FIELDS validation for sparse_vector_field in __init__
- Add sparse_vector_field to __init__ docstring
- Inject sparse_vector mapping into custom_mapping when both provided
- Extract _handle_sparse_embedding helper to deduplicate write methods
- Convert _deserialize_document to reconstruct SparseEmbedding on read

- Add SPECIAL_FIELDS validation test
- Add custom_mapping injection test
- Add legacy from_dict backward compat test
- Fix async test to use async_client for index deletion
- Add retrieval reconstruction assertions to sync and async sparse tests
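The commits above mention storing sparse embeddings on write and reconstructing SparseEmbedding on read. A minimal sketch of that round trip follows. It assumes the embedding is stored as a token-to-weight mapping, which is the shape an Elasticsearch sparse_vector field accepts. The SparseEmbedding class here is a local stand-in for Haystack's dataclass, and sparse_to_es / sparse_from_es are hypothetical helper names, not the PR's actual _handle_sparse_embedding or _deserialize_document code.

```python
from dataclasses import dataclass


@dataclass
class SparseEmbedding:
    """Local stand-in for Haystack's SparseEmbedding (assumption for this sketch)."""

    indices: list[int]
    values: list[float]


def sparse_to_es(embedding: SparseEmbedding) -> dict[str, float]:
    """Hypothetical write-side helper: token ids become string keys, weights become values."""
    return {str(i): v for i, v in zip(embedding.indices, embedding.values)}


def sparse_from_es(stored: dict[str, float]) -> SparseEmbedding:
    """Hypothetical read-side helper: rebuild the embedding with indices in ascending order."""
    pairs = sorted((int(token), weight) for token, weight in stored.items())
    return SparseEmbedding(
        indices=[i for i, _ in pairs],
        values=[w for _, w in pairs],
    )


if __name__ == "__main__":
    original = SparseEmbedding(indices=[3, 17, 42], values=[0.5, 1.25, 0.1])
    stored = sparse_to_es(original)
    print(stored)
    print(sparse_from_es(stored) == original)
```

Sorting on read is a design choice of this sketch: it makes the reconstructed embedding deterministic regardless of the key order Elasticsearch returns.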
What
Adds sparse retrieval support to the Elasticsearch integration and exposes it through ElasticsearchSparseEmbeddingRetriever.

Why
ElasticsearchSparseEmbeddingRetriever (precomputed sparse) #2941
Sparse embeddings could be stored on the parent branch, but there was no retrieval path for a precomputed sparse query embedding.

How
The Elasticsearch document store now provides sparse retrieval methods that build the sparse_vector query. The retriever follows the same component pattern as the existing BM25 and dense retrievers and calls into the document store. The test suite now skips sparse-query integration cases on Elasticsearch versions that do not support sparse_vector queries.

Testing
- hatch run fmt-check
- hatch run test:types
- hatch run test:unit
- hatch run test:integration

Trade-offs
Kept the scope limited to retrieval only. No hybrid retrieval or inference-in-Elasticsearch changes are included here.
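
For readers unfamiliar with the query shape mentioned under How, here is a rough sketch of how a request body for a sparse_vector query with a precomputed query embedding could be assembled. The function name build_sparse_vector_query, the default field name, and the SparseEmbedding stand-in are assumptions for illustration only. The PR's actual document store methods are not shown on this page and may differ.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class SparseEmbedding:
    """Local stand-in for Haystack's SparseEmbedding (assumption for this sketch)."""

    indices: list[int]
    values: list[float]


def build_sparse_vector_query(
    query_embedding: SparseEmbedding,
    field: str = "sparse_vec",  # hypothetical field name
    top_k: int = 10,
    filters: Optional[list[dict[str, Any]]] = None,
) -> dict[str, Any]:
    """Build a search body that scores documents with a precomputed sparse query vector."""
    # The sparse_vector query takes token -> weight pairs, with tokens as strings.
    query_vector = {
        str(i): v for i, v in zip(query_embedding.indices, query_embedding.values)
    }
    sparse_clause = {"sparse_vector": {"field": field, "query_vector": query_vector}}

    if filters:
        # Wrap in a bool query so filters restrict results without affecting scores.
        query: dict[str, Any] = {"bool": {"must": [sparse_clause], "filter": filters}}
    else:
        query = sparse_clause

    return {"size": top_k, "query": query}


if __name__ == "__main__":
    embedding = SparseEmbedding(indices=[3, 17], values=[0.5, 1.25])
    print(build_sparse_vector_query(embedding, top_k=5))
```

The resulting dictionary could be passed as the body of a search request. Keeping filters in the bool query's filter clause is one common way to combine them with a scoring clause, not necessarily the approach taken in this PR.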