* Update milvus-document-store.md
a quick sync with https://github.com/milvus-io/milvus-haystack/blob/main/README.md
* refine the milvus document store example
Signed-off-by: ChengZi <chen.zhang@zilliz.com>
- [2025.4.17] [Full-text Search with Milvus and Haystack](https://milvus.io/docs/full_text_search_with_milvus_and_haystack.md) - Learn how to implement full-text and hybrid search in your application using Haystack and Milvus.
## Installation
```shell
pip install --upgrade pymilvus milvus-haystack
```
*If you are using Google Colab, you may need to restart the runtime to enable the newly installed dependencies.*
## Usage
Use the `MilvusDocumentStore` in a Haystack pipeline as a quick start.
```python
from haystack import Document

from milvus_haystack import MilvusDocumentStore

document_store = MilvusDocumentStore(
    connection_args={"uri": "./milvus.db"},  # Milvus Lite
)
```
In `connection_args`, setting the URI to a local file path, e.g. `./milvus.db`, is the most convenient option, as it automatically uses [Milvus Lite](https://milvus.io/docs/milvus_lite.md) to store all data in that file.
### Different ways to connect to Milvus
- For [Milvus Lite](https://milvus.io/docs/milvus_lite.md), the most convenient option, just set the `uri` to a local file path.
```python
from milvus_haystack import MilvusDocumentStore

document_store = MilvusDocumentStore(
    connection_args={"uri": "./milvus.db"},
    drop_old=True,
)
```
- For a Milvus server on [docker or kubernetes](https://milvus.io/docs/quickstart.md), which is recommended when you are dealing with large-scale data, start the Milvus service and then set the `uri` to the address of that service.
- For [Zilliz Cloud](https://zilliz.com/cloud), the fully managed cloud service for Milvus, set the `uri` and `token` to the [Public Endpoint and API key](https://docs.zilliz.com/docs/on-zilliz-cloud-console#free-cluster-details) of your Zilliz Cloud cluster.
```python
from haystack.utils import Secret

from milvus_haystack import MilvusDocumentStore

document_store = MilvusDocumentStore(
    connection_args={
        "uri": "https://in03-ba4234asae.api.gcp-us-west1.zillizcloud.com",  # Your Public Endpoint
        # API key: we recommend using the Secret class to load the token from an env variable for security.
        "token": Secret.from_env_var("ZILLIZ_CLOUD_API_KEY"),
        "secure": True,
    },
    drop_old=True,
)
```
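
The three options differ mainly in the `uri` you pass (plus a token for Zilliz Cloud), so one way to keep a single code path is to choose `connection_args` from the environment. This is a minimal sketch under stated assumptions: the helper `make_connection_args` and the `MILVUS_URI` variable name are hypothetical and not part of milvus-haystack, and treating every `https://` URI as a Zilliz Cloud endpoint is a simplification.

```python
import os
from typing import Any, Dict, Mapping, Optional


def make_connection_args(env: Optional[Mapping[str, str]] = None) -> Dict[str, Any]:
    """Hypothetical helper: choose MilvusDocumentStore connection_args from env vars.

    MILVUS_URI is an assumed variable name; milvus-haystack does not read it itself.
    """
    env = os.environ if env is None else env
    # Default to a local Milvus Lite file when no server URI is configured.
    uri = env.get("MILVUS_URI", "./milvus.db")
    args: Dict[str, Any] = {"uri": uri}
    if uri.startswith("https://"):
        # Simplification: assume HTTPS endpoints are Zilliz Cloud and need TLS.
        # The "token" entry (e.g. Secret.from_env_var(...)) still has to be added.
        args["secure"] = True
    return args
```

You would then pass the result as `MilvusDocumentStore(connection_args=make_connection_args(), drop_old=True)`, adding the `token` entry for Zilliz Cloud as in the example above.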
If you have a large amount of data, such as more than a million documents, we recommend setting up a more performant Milvus server on [docker or kubernetes](https://milvus.io/docs/quickstart.md). With this setup, use the server URI, e.g. `http://localhost:19530`, as your `uri`.
## In-depth usage
Prepare an OpenAI API key and set it as an environment variable:

```shell
export OPENAI_API_KEY=<your_api_key>
```
The following sections show how to:

- Create the indexing pipeline
- Create the retrieval pipeline
- Create the RAG pipeline
### Create the indexing pipeline and index some documents
```python
result = retrieval_pipeline.run({"sparse_text_embedder": {"text": query}})

print(result["sparse_retriever"]["documents"][0])

# Document(id=..., content: 'full text search is supported by Milvus.', sparse_embedding: vector with 48 non-zero elements)
```
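
The printed `sparse_embedding` keeps only its non-zero elements. To illustrate how such vectors can be compared, here is a stdlib-only sketch that represents a sparse vector as an `{index: weight}` dict and ranks documents by inner product. The dict layout and the use of inner product as the similarity are assumptions for illustration: Haystack's `SparseEmbedding` stores parallel `indices` and `values` lists, and the metric Milvus applies depends on how the sparse field is indexed.

```python
from typing import Dict, List

# Sparse vector as {dimension index: weight}; zero entries are simply absent.
SparseVector = Dict[int, float]


def from_indices_values(indices: List[int], values: List[float]) -> SparseVector:
    """Convert parallel index/value lists (the SparseEmbedding layout) into a dict."""
    return dict(zip(indices, values))


def sparse_dot(a: SparseVector, b: SparseVector) -> float:
    """Inner product of two sparse vectors, iterating over the shorter one."""
    if len(a) > len(b):
        a, b = b, a
    return sum(weight * b[idx] for idx, weight in a.items() if idx in b)


def rank_by_inner_product(query: SparseVector, docs: Dict[str, SparseVector]) -> List[str]:
    """Return document ids ordered from highest to lowest inner-product score."""
    return sorted(docs, key=lambda doc_id: sparse_dot(query, docs[doc_id]), reverse=True)
```

Only dimensions present in both vectors contribute, which is why sparse retrieval stays cheap even when the vocabulary-sized dimension is huge.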
### Sparse retrieval with Milvus built-in BM25 function

Milvus provides a built-in BM25 function that generates sparse vectors directly from text fields. Compared with using Haystack's sparse embedders, this simplifies pipeline construction. The main differences are:

1. You need to specify a `BM25BuiltInFunction` in the document store, together with parameters that specify the text field and the sparse vector field.
2. No explicit sparse embedder is needed, since Milvus computes the sparse embedding on the server side.
3. The pipeline is simpler, with fewer components and connections.
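
Conceptually, documents retrieved this way are ranked with Okapi BM25, which combines term frequency, inverse document frequency, and document length. The stdlib sketch below illustrates that scoring idea only and is not Milvus's implementation: the lowercase whitespace tokenizer, the Lucene-style IDF term, and the `k1 = 1.2`, `b = 0.75` values are assumptions (common textbook defaults), whereas Milvus tokenizes text with its own analyzer.

```python
import math
from collections import Counter
from typing import List


def tokenize(text: str) -> List[str]:
    # Assumption: lowercase + whitespace split; Milvus applies its own analyzer instead.
    return text.lower().split()


def bm25_scores(query: str, corpus: List[str], k1: float = 1.2, b: float = 0.75) -> List[float]:
    """Okapi BM25 score of `query` against each document in a non-empty `corpus`."""
    docs = [tokenize(doc) for doc in corpus]
    n_docs = len(docs)
    avg_len = sum(len(doc) for doc in docs) / n_docs
    # Document frequency: how many documents contain each term at least once.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    query_terms = tokenize(query)
    scores = []
    for doc in docs:
        term_freq = Counter(doc)
        score = 0.0
        for term in query_terms:
            tf = term_freq[term]
            if tf == 0:
                continue
            # Lucene-style IDF, which stays positive even for very common terms.
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            # Term-frequency saturation (k1) and document-length normalization (b).
            norm = tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(doc) / avg_len))
            score += idf * norm
        scores.append(score)
    return scores
```

With the built-in function, this kind of computation happens on the Milvus server, which is why no sparse embedder component appears in the pipeline.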

Here is an example:
```python
from milvus_haystack.function import BM25BuiltInFunction
# Document(id=..., content: 'full text search is supported by Milvus.', embedding: vector of size 1536, sparse_embedding: vector with 48 non-zero elements)
```
### Hybrid retrieval with Milvus built-in BM25 function

Milvus provides a built-in BM25 function that generates sparse vectors directly from text fields. Compared with using Haystack's sparse embedders, this simplifies pipeline construction, making keyword-based retrieval a useful complement to semantic search. The main differences are:

1. You need to specify a `BM25BuiltInFunction` in the document store, together with parameters that specify the text field and the sparse vector field.
2. No explicit sparse embedder is needed, since Milvus computes the sparse embedding on the server side.
3. The pipeline is simpler, with fewer components and connections, which is especially beneficial in hybrid retrieval setups.
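
In a hybrid setup, the dense (semantic) results and the BM25 (sparse) results have to be merged into one ranking. A common way to do this is Reciprocal Rank Fusion (RRF), which looks only at each document's rank in each list, so the two very different score scales never need to be compared directly. The stdlib sketch below illustrates RRF with the conventional constant `k = 60`; it is an assumption that this matches the reranking the Milvus hybrid retriever performs in your configuration, so check the milvus-haystack documentation for the actual default.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Merge ranked lists of document ids using Reciprocal Rank Fusion (RRF).

    Each list contributes 1 / (k + rank) for every document it contains, with ranks
    starting at 1; k = 60 is the constant commonly used in the RRF literature.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document id for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))
```

For example, fusing a dense ranking `["a", "b", "c"]` with a sparse ranking `["b", "c", "a"]` puts `"b"` first, because it ranks highly in both lists.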

Here is an example:
```python
from milvus_haystack.function import BM25BuiltInFunction