Elasticsearch: support dense, sparse, hybrid with inference in Elasticsearch

## Summary and motivation

Elasticsearch offers multiple retrieval features, including:
- approximate dense vector retrieval with embedding inference in Python or in Elasticsearch
- exact dense vector retrieval with embedding inference in Python
- sparse vector retrieval with embedding inference in Elasticsearch
- hybrid retrieval (dense+BM25) with embedding inference in Elasticsearch
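
To make the differences between these options concrete, here is a rough sketch of the search request bodies each one could translate to. This is not Haystack or LangChain code. The helper names, field names, and model ids are placeholders chosen for illustration. The `knn`, `query_vector_builder`, `script_score`, `text_expansion`, and `rank`/`rrf` keys are Elasticsearch 8.x search options, but their availability varies by version (for example, newer releases favor the `sparse_vector` query and retrievers), so please treat the bodies as assumptions to verify against the target version.

```python
from typing import Any, Dict, List, Optional


def approx_dense_body(field: str, query_vector: Optional[List[float]] = None,
                      query_text: Optional[str] = None, model_id: Optional[str] = None,
                      k: int = 10, num_candidates: int = 100) -> Dict[str, Any]:
    """Approximate kNN. Pass a vector (inference in Python) or text plus model id (inference in Elasticsearch)."""
    knn: Dict[str, Any] = {"field": field, "k": k, "num_candidates": num_candidates}
    if query_vector is not None:
        knn["query_vector"] = query_vector
    elif query_text is not None and model_id is not None:
        # Elasticsearch embeds the query text with the deployed model.
        knn["query_vector_builder"] = {
            "text_embedding": {"model_id": model_id, "model_text": query_text}
        }
    else:
        raise ValueError("Provide query_vector, or both query_text and model_id")
    return {"knn": knn}


def exact_dense_body(field: str, query_vector: List[float]) -> Dict[str, Any]:
    """Exact (brute-force) scoring over all documents; the vector is computed in Python."""
    return {
        "query": {
            "script_score": {
                "query": {"match_all": {}},
                "script": {
                    # +1.0 keeps scores non-negative, as Elasticsearch requires.
                    "source": f"cosineSimilarity(params.query_vector, '{field}') + 1.0",
                    "params": {"query_vector": query_vector},
                },
            }
        }
    }


def sparse_body(field: str, query_text: str, model_id: str) -> Dict[str, Any]:
    """Sparse retrieval; Elasticsearch expands the query text into weighted tokens."""
    return {"query": {"text_expansion": {field: {"model_id": model_id, "model_text": query_text}}}}


def hybrid_body(text_field: str, vector_field: str, query_text: str, model_id: str,
                k: int = 10, num_candidates: int = 100) -> Dict[str, Any]:
    """BM25 match plus approximate kNN, combined with reciprocal rank fusion."""
    body = approx_dense_body(vector_field, query_text=query_text, model_id=model_id,
                             k=k, num_candidates=num_candidates)
    body["query"] = {"match": {text_field: query_text}}
    body["rank"] = {"rrf": {}}
    return body
```

Only the exact dense option and the vector variant of the approximate option need an embedding computed in Python. The other bodies carry the raw query string, which is the case the first question below is about.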

Other libraries such as LangChain already have all these options integrated. It would be great to also have them available in Haystack. Elastic is currently working on a Python package that will make the integration of these features easier. Here we want to discuss how to best make them available.

### Questions
- Does Haystack want to enable inference in Elasticsearch? The current design assumes that the mapping from input string to embedding vector is done in Python before calling a retriever. With inference in Elasticsearch, this would change. For example, users could configure a dense vector model in Elasticsearch and then pass input strings directly from Haystack.
- The options mentioned above require different ways of indexing the data. How can this requirement best be incorporated? The current document store abstraction implicitly assumes that there is only one way of indexing.
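
To illustrate the second question, the sketch below shows how the index mapping could differ per option. The strategy names, the function `index_mapping`, and the field names `content`, `embedding`, and `tokens` are hypothetical. The `dense_vector` and `rank_features` field types exist in Elasticsearch, but which sparse field type fits (`rank_features` or the newer `sparse_vector`) depends on the Elasticsearch version, so this is an assumption rather than a recommendation.

```python
from typing import Any, Dict


def index_mapping(strategy: str, dims: int = 384) -> Dict[str, Any]:
    """Return an illustrative index mapping for a given (hypothetical) strategy name."""
    properties: Dict[str, Any] = {"content": {"type": "text"}}  # BM25-searchable text
    if strategy in ("approx_dense", "hybrid"):
        # Indexed vectors enable approximate kNN search.
        properties["embedding"] = {
            "type": "dense_vector", "dims": dims, "index": True, "similarity": "cosine",
        }
    elif strategy == "exact_dense":
        # Vectors are stored but not indexed; scoring happens via script_score.
        properties["embedding"] = {"type": "dense_vector", "dims": dims, "index": False}
    elif strategy == "sparse":
        # Weighted tokens produced by a sparse model at ingest time.
        properties["tokens"] = {"type": "rank_features"}
    else:
        raise ValueError(f"Unknown strategy: {strategy}")
    return {"mappings": {"properties": properties}}
```

With inference in Elasticsearch, indexing would additionally need an ingest pipeline that runs the model on each document. That part is left out here.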

## Detailed design

**Concrete proposal:**

1. `ElasticsearchDocumentStore` takes an argument `retrieval_strategy`, similar to how it is done in [LangChain](https://github.com/langchain-ai/langchain-elastic/blob/97c0016220a44bd5e370e11c939559d97c927615/libs/elasticsearch/langchain_elasticsearch/vectorstores.py#L599). Calls to `write_documents` make use of the retrieval strategy to know how to index the data.
2. We add a number of different retrievers (`ElasticsearchDenseVectorRetriever`, `ElasticsearchSparseVectorRetriever`, `ElasticsearchHybridRetriever`, ...) that get initialized with an `ElasticsearchDocumentStore`. The retrieval strategy has to match the expectation of the individual retrievers. We check that the expectation is met upon initialization. For retrieving documents, the retrievers call a search method on the document store as this is the established pattern.
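
The two points above could look roughly like the following. This is a self-contained sketch that does not import Haystack or talk to Elasticsearch. The `Sketch*` classes, the `RetrievalStrategy` enum, its member names, and the `search` signature are all hypothetical stand-ins for whatever the final API would be.

```python
from enum import Enum
from typing import Any, Dict


class RetrievalStrategy(Enum):
    """Hypothetical strategy names; the real set would be defined by the integration."""
    APPROX_DENSE = "approx_dense"
    SPARSE = "sparse"
    HYBRID = "hybrid"


class SketchDocumentStore:
    """Stand-in for ElasticsearchDocumentStore that carries a retrieval strategy."""

    def __init__(self, index: str, retrieval_strategy: RetrievalStrategy) -> None:
        self.index = index
        self.retrieval_strategy = retrieval_strategy

    def search(self, query: str, top_k: int = 10) -> Dict[str, Any]:
        # Placeholder: a real store would build and send a strategy-specific request.
        return {"index": self.index, "strategy": self.retrieval_strategy.value,
                "query": query, "top_k": top_k}


class SketchRetriever:
    """Base retriever that verifies the store's strategy upon initialization."""

    expected_strategy: RetrievalStrategy

    def __init__(self, document_store: SketchDocumentStore) -> None:
        if document_store.retrieval_strategy is not self.expected_strategy:
            raise ValueError(
                f"{type(self).__name__} expects {self.expected_strategy.value!r}, "
                f"got {document_store.retrieval_strategy.value!r}"
            )
        self.document_store = document_store

    def run(self, query: str, top_k: int = 10) -> Dict[str, Any]:
        # Retrieval is delegated to the document store, following the established pattern.
        return self.document_store.search(query, top_k=top_k)


class SketchSparseVectorRetriever(SketchRetriever):
    expected_strategy = RetrievalStrategy.SPARSE


class SketchHybridRetriever(SketchRetriever):
    expected_strategy = RetrievalStrategy.HYBRID
```

Checking the strategy in the constructor means a mismatch between how the data was indexed and how it is queried surfaces when the pipeline is built rather than at query time.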


## Checklist

If the request is accepted, ensure the following checklist is complete before closing this issue.
```[tasklist]
### Tasks
- [ ] The code is documented with docstrings and was merged in the `main` branch
- [ ] Docs are published at https://docs.haystack.deepset.ai/
- [ ] There is a GitHub workflow running the tests for the integration nightly and at every PR
- [ ] A label named like `integration:<your integration name>` has been added to this repo
- [ ] The [labeler.yml](https://github.com/deepset-ai/haystack-core-integrations/blob/main/.github/labeler.yml) file has been updated
- [ ] The package has been released on PyPI
- [ ] An integration tile has been added to https://github.com/deepset-ai/haystack-integrations
- [ ] The integration has been listed in the [Inventory section](https://github.com/deepset-ai/haystack-core-integrations#inventory) of this repo README
- [ ] There is an example available to demonstrate the feature
- [ ] The feature was announced through social media
```


