feat: add Elasticsearch sparse retrieval support #3104

Open
GunaPalanivel wants to merge 25 commits into deepset-ai:main from GunaPalanivel:feat/2941-elasticsearch-sparse-embedding-retriever

@GunaPalanivel GunaPalanivel commented Apr 3, 2026

What

Adds sparse retrieval support to the Elasticsearch integration and exposes it through ElasticsearchSparseEmbeddingRetriever.

Why

Sparse embeddings could be stored on the parent branch, but there was no retrieval path for a precomputed sparse query embedding.

How

The Elasticsearch document store now provides sparse retrieval methods that build the sparse_vector query. The retriever follows the same component pattern as the existing BM25 and dense retrievers and calls into the document store. The test suite now skips sparse-query integration cases on Elasticsearch versions that do not support sparse_vector queries.
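For illustration, the query construction and the version gate described above might look roughly like the sketch below. This is not the code from this PR: `build_sparse_vector_query` and `supports_sparse_vector_query` are hypothetical helper names, the sketch assumes sparse embedding indices are used as string token keys in `query_vector`, and the 8.15 minimum version is an assumption that should be checked against the Elasticsearch release notes.

```python
from typing import Any, Dict, List, Optional, Tuple


def build_sparse_vector_query(
    field: str,
    indices: List[int],
    values: List[float],
    top_k: int = 10,
    filters: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Build a search body around a `sparse_vector` query (hypothetical helper).

    `filters` is assumed to already be in Elasticsearch query DSL form.
    """
    if len(indices) != len(values):
        raise ValueError("indices and values must have the same length")

    # Assumption: token keys are the stringified sparse embedding indices.
    query_vector = {str(i): float(v) for i, v in zip(indices, values)}
    sparse_clause = {"sparse_vector": {"field": field, "query_vector": query_vector}}

    if filters:
        # Score with the sparse clause, restrict candidates with the filter.
        query: Dict[str, Any] = {"bool": {"must": [sparse_clause], "filter": filters}}
    else:
        query = sparse_clause

    return {"size": top_k, "query": query}


def supports_sparse_vector_query(
    version: str, minimum: Tuple[int, int] = (8, 15)
) -> bool:
    """Return True if `version` (e.g. "8.15.0") is at least `minimum`.

    Assumption: the default minimum is illustrative, not taken from this PR.
    """
    major, minor = (int(part) for part in version.split(".")[:2])
    return (major, minor) >= minimum
```

A test could call `supports_sparse_vector_query` with the server's reported version number and skip the sparse-query integration cases when it returns `False`.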

Testing

  • hatch run fmt-check
  • hatch run test:types
  • hatch run test:unit
  • hatch run test:integration

Trade-offs

Kept the scope limited to retrieval only. No hybrid retrieval or inference-in-Elasticsearch changes are included here.

GunaPalanivel and others added 23 commits March 19, 2026 13:05
…lization

- Update test_bm25_retriever.py and test_embedding_retriever.py to include sparse_vector_field in serialized document_store init parameters.
- Add test_write_documents_with_sparse_vectors and test_write_documents_with_sparse_embedding_warning to test_document_store.py
- Add test_write_documents_async_with_sparse_vectors to test_document_store_async.py
- Update existing warning test in test_document_store_async.py
- Add test_init_with_sparse_vector_field and update serialization tests.
- Add SPECIAL_FIELDS validation for sparse_vector_field in __init__

- Add sparse_vector_field to __init__ docstring

- Inject sparse_vector mapping into custom_mapping when both provided

- Extract _handle_sparse_embedding helper to deduplicate write methods

- Convert _deserialize_document to reconstruct SparseEmbedding on read
- Add SPECIAL_FIELDS validation test

- Add custom_mapping injection test

- Add legacy from_dict backward compat test

- Fix async test to use async_client for index deletion

- Add retrieval reconstruction assertions to sync and async sparse tests
@GunaPalanivel GunaPalanivel requested a review from a team as a code owner April 3, 2026 18:00
@GunaPalanivel GunaPalanivel requested review from bogdankostic and removed request for a team April 3, 2026 18:00
@github-actions github-actions bot added the integration:elasticsearch and type:documentation labels Apr 3, 2026
@bogdankostic bogdankostic requested review from davidsbatista and removed request for bogdankostic April 7, 2026 08:12

Development

Successfully merging this pull request may close these issues.

ElasticsearchSparseEmbeddingRetriever (precomputed sparse)