Commit e97cc05

feat: Add elasticsearch toolkit for conversational search (#233)
* feat: add elasticsearch setup and install
* chore: update links
* chore: add sample file
* feat: add elasticsearch starter kit
* fix: fix the links
* chore: add new links
* chore: update elasticsearch starter-kit
* chore: add alternative to docker
* chore: update Elasticsearch starter kit
* chore: add comments for kibana setup
* chore: update Elasticsearch toolkit
* chore: simplify kibana command
* chore: update readme to address review comments
* fix: fix the format
* chore: add more descriptions in a few places
* fix: fix a type and a format issue
1 parent 8fb1cf6 commit e97cc05

18 files changed

Lines changed: 2756 additions & 18 deletions

integrations/extensions/README.md

Lines changed: 1 addition & 0 deletions
@@ -30,6 +30,7 @@ Then we have our **additional starter kits**. These focus on showcasing the brea
3030
- [HubSpot](./starter-kits/hubspot/)
3131
- [IBM App Connect](./starter-kits/appconnect/)
3232
- [IBM Watson Discovery](./starter-kits/watson-discovery/)
33+
- [Elasticsearch](./starter-kits/elasticsearch)
3334
- [LLM: IBM watsonx](./starter-kits/language-model-watsonx/)
3435
- [LLM: IBM watsonx tech preview](./starter-kits/language-model-watsonx-tech-preview/)
3536
- [LLM: OpenAI](./starter-kits/language-model-openai/)
Lines changed: 249 additions & 0 deletions
@@ -0,0 +1,249 @@
1+
# How to set up Elasticsearch from IBM Cloud and integrate it with Watson Assistant
2+
This document describes how to set up Elasticsearch on IBM Cloud and create a Watson Assistant search extension that uses an Elasticsearch index.
3+
4+
## Table of contents:
5+
* [Step 1: Provision an Elasticsearch instance on IBM Cloud](#step-1-provision-an-elasticsearch-instance-on-ibm-cloud)
6+
* [Step 2: Set up Kibana to connect to Elasticsearch](#step-2-set-up-kibana-to-connect-to-elasticsearch)
7+
* [Step 3: Create an Elasticsearch index (keyword-search)](#step-3-create-an-elasticsearch-index-keyword-search)
8+
* [Step 4: Set up Watson Assistant search extension using Elasticsearch index](#step-4-set-up-watson-assistant-search-extension-using-elasticsearch-index)
9+
* [Step 5: Enable semantic search with ELSER](#step-5-enable-semantic-search-with-elser)
10+
11+
12+
## Step 1: Provision an Elasticsearch instance on IBM Cloud
13+
* Create an [IBM Cloud account](https://cloud.ibm.com/registration) if you don't have one.
14+
* Provision a Databases for Elasticsearch instance from the [IBM Cloud catalog](https://cloud.ibm.com/catalog/databases-for-elasticsearch).
15+
**A platinum plan with at least 4GB RAM is required in order to use the advanced ML features,
16+
such as [Elastic Learned Sparse EncodeR (ELSER)](https://www.elastic.co/guide/en/machine-learning/8.10/ml-nlp-elser.html)**
17+
* Create service credentials from the left-side menu and note the `hostname`, `port`, `username` and `password`.
18+
These credentials are used to connect Kibana and Watson Assistant to Elasticsearch in the following steps. You can also use the admin user ID and password.
19+
Please refer to [this doc](https://cloud.ibm.com/docs/databases-for-elasticsearch?topic=databases-for-elasticsearch-user-management&interface=ui#user-management-elasticsearch-ibm-superuser) to learn more about different user roles.
20+
21+
22+
## Step 2: Set up Kibana to connect to Elasticsearch
23+
* Install Docker so that you can pull the Kibana container image later
24+
* You can install [Docker Desktop](https://www.docker.com/products/docker-desktop/)
25+
* If you don't want to use Docker Desktop,
26+
* For macOS users, you can install [Colima](https://github.com/abiosoft/colima#installation) as an alternative.
27+
[Homebrew](https://brew.sh/) is required for running the following commands:
28+
```shell
brew install docker
brew install colima

colima start
```
34+
* Other options: [Podman Desktop](https://podman-desktop.io/), [Rancher Desktop](https://docs.rancherdesktop.io/getting-started/installation/)
35+
* Create a Kibana config folder, for example:
36+
`mkdir -p ~/.kibana/config`
37+
* Download the certificate from the Elasticsearch instance overview page, and move the downloaded file to the kibana config folder
38+
* Under the kibana config folder, create a YAML file called `kibana.yml`. Inside the file, you need the following Kibana configuration settings:
39+
```YAML
elasticsearch.ssl.certificateAuthorities: "/usr/share/kibana/config/<your-certificate-file-name>"
elasticsearch.username: "<username>"
elasticsearch.password: "<password>"
elasticsearch.hosts: ["https://<hostname:port>"]
server.name: "kibana"
server.host: "0.0.0.0"
```
47+
Notes:
48+
- Find the `hostname`, `port`, `username` and `password` in the service credentials created in Step 1
49+
- `elasticsearch.ssl.certificateAuthorities` is the path where Kibana looks for the certificate inside the Docker container.
50+
`/usr/share/kibana/config/` is Kibana's default config directory in the container.
51+
52+
* Verify the Elasticsearch instance endpoint and find its version
53+
* Run
54+
```bash
curl -u <username>:<password> --cacert <path-to-cert> https://<hostname:port>
```
57+
* Find the version number in the output. Use a matching version as `<kibana_version>` when you start the Kibana container.
58+
59+
* Download and start the Kibana container
60+
```bash
docker run -it --name kibana --rm \
  -v <path_to_your_kibana_config_folder>:/usr/share/kibana/config \
  -p 5601:5601 docker.elastic.co/kibana/kibana:<kibana_version>
```
65+
Once Kibana has connected to your Databases for Elasticsearch deployment and is running successfully, you will see output like the following in your terminal:
66+
```
[2024-01-02T16:43:29.378+00:00][INFO ][http.server.Kibana] http server running at http://0.0.0.0:5601
[2024-01-02T16:46:13.777+00:00][INFO ][status] Kibana is now available
```
70+
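The configuration file and the version check from this step can also be scripted. The following is a minimal sketch and not part of the starter kit: the function names `write_kibana_config` and `extract_es_version` are hypothetical, and the sketch assumes the credentials contain no double quotes (which would break the YAML quoting). It writes the same `kibana.yml` settings listed above and reads `version.number` from the JSON returned by the Elasticsearch root endpoint.

```bash
# Hypothetical helper: write kibana.yml into a Kibana config folder.
# Arguments: <config_dir> <cert_file_name> <username> <password> <hostname:port>
write_kibana_config() {
  config_dir=$1
  mkdir -p "$config_dir"
  cat > "$config_dir/kibana.yml" <<EOF
elasticsearch.ssl.certificateAuthorities: "/usr/share/kibana/config/$2"
elasticsearch.username: "$3"
elasticsearch.password: "$4"
elasticsearch.hosts: ["https://$5"]
server.name: "kibana"
server.host: "0.0.0.0"
EOF
}

# Hypothetical helper: read the root-endpoint JSON on stdin and print the
# first "number" value, which is the Elasticsearch version.
extract_es_version() {
  sed -n 's/.*"number"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Example usage with placeholders (not run here, needs a live instance):
#   write_kibana_config ~/.kibana/config <your-certificate-file-name> <username> <password> <hostname:port>
#   curl -s -u <username>:<password> --cacert <path-to-cert> https://<hostname:port> | extract_es_version
```

The printed version can then be used as `<kibana_version>` in the `docker run` command.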
71+
## Step 3: Create an Elasticsearch index (keyword-search)
72+
This step is to create an Elasticsearch index with default settings for quick testing and verification.
73+
With default settings, an Elasticsearch index does keyword search.
74+
75+
* Open http://localhost:5601 in your browser and log in to Kibana using the `username` and `password` from the service credentials of the Elasticsearch instance
76+
* Navigate to the indices page http://localhost:5601/app/enterprise_search/content/search_indices
77+
* Click on `Create a new index`, choose `Use the API`, and follow the steps there to create a new Elasticsearch index with default settings
78+
* Go to the overview page for your newly created index, follow the steps there to verify your Elasticsearch index.
79+
Notes:
80+
* Generate an API key; you will use it for authentication and authorization against this specific Elasticsearch index
81+
* Use your `hostname` and `port` from the service credentials of the Elasticsearch instance to build `ES_URL`
82+
```bash
export ES_URL=https://<hostname:port>
```
85+
* Append `--cacert <path-to-your-cert>` to the cURL commands to verify the SSL connection, or append `--insecure` to skip certificate verification
86+
* If you are able to run the `Build your first search query` command at the last step, your Elasticsearch index has been set up successfully!
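The notes above can be folded into a small wrapper so that every request carries the API key and the certificate. This is a minimal sketch under stated assumptions: the function name `es_curl` and the variables `ES_API_KEY` (the encoded key that Kibana shows when you generate it), `ES_CACERT`, and `ES_DRY_RUN` are names invented for this example. The `Authorization: ApiKey ...` header is the standard Elasticsearch API key scheme.

```bash
# Hypothetical wrapper around curl for the Step 3 requests.
# Usage: es_curl <METHOD> <path> [extra curl arguments...]
# Assumes ES_URL and ES_API_KEY are set; ES_CACERT is optional.
es_curl() {
  method=$1
  path=$2
  shift 2
  # Add the certificate only when a path was provided.
  if [ -n "${ES_CACERT:-}" ]; then
    set -- --cacert "$ES_CACERT" "$@"
  fi
  set -- -X "$method" "${ES_URL}${path}" \
    -H "Authorization: ApiKey ${ES_API_KEY}" \
    -H "Content-Type: application/json" "$@"
  if [ -n "${ES_DRY_RUN:-}" ]; then
    # Print the command instead of sending it (shell quoting is not shown).
    printf 'curl'
    printf ' %s' "$@"
    printf '\n'
  else
    curl "$@"
  fi
}

# Example usage with a placeholder index name (not run here):
#   export ES_URL=https://<hostname:port> ES_API_KEY=<your-api-key> ES_CACERT=<path-to-your-cert>
#   es_curl GET "/<your-index>/_search?pretty"
```

Setting `ES_DRY_RUN=1` is a convenient way to inspect the final command before sending anything to the cluster.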
87+
88+
## Step 4: Set up Watson Assistant search extension using Elasticsearch index
89+
* Provision a Watson Assistant instance from the [IBM Cloud catalog](https://cloud.ibm.com/catalog/services/watsonx-assistant)
90+
* Create a new Assistant in the new experience
91+
* Add a Search extension to your Assistant
92+
Please follow the [Elasticsearch search integration set up](https://cloud.ibm.com/docs/watson-assistant?topic=watson-assistant-search-elasticsearch-add) documentation for more details.
93+
94+
* Verify the Search extension
95+
If you used the index created in Step 3 to set up the Search integration, you can verify it with the following examples:
96+
* Verify the basic search
97+
In your preview chat or draft webchat, type in `Who wrote 1984?`. If you see the Elasticsearch search results, your search extension has been set up successfully.
98+
<img src="assets/wa_elasticsearch_result.png" width="280" height="343" />
99+
* Verify Conversational Search (beta)
100+
Go to your Search extension, turn on the Conversational Search toggle, and save. Then go to your preview chat or draft webchat and type in `Who wrote 1984?`.
101+
If you see an answer instead of a list of search results, your conversational search is working properly.
102+
<img src="assets/wa_conversational_search_result.png" width="281" height="261" />
103+
104+
105+
## Step 5: Enable semantic search with ELSER
106+
This step enables semantic search using ELSER. The Elasticsearch documentation provides the following tutorials:
107+
ELSER v1: https://www.elastic.co/guide/en/elasticsearch/reference/8.10/semantic-search-elser.html
108+
ELSER v2: https://www.elastic.co/guide/en/elasticsearch/reference/8.11/semantic-search-elser.html
109+
110+
The following steps are based on the ELSER v1 model:
111+
### Create environment variables for ES credentials
112+
```bash
export ES_URL=https://<hostname:port>
export ES_USER=<username>
export ES_PASSWORD=<password>
export ES_CACERT=<path-to-your-cert>
```
118+
You can find the credentials from the service credentials of your Elasticsearch instance.
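Because every later command in this step depends on these four variables, it can help to fail early when one is missing. The following is a small sketch; the function name `check_es_env` is hypothetical and only the variable names come from the block above.

```bash
# Hypothetical helper: report every Step 5 variable that is unset or empty.
# Returns 0 when all four are set, 1 otherwise.
check_es_env() {
  missing=0
  for name in ES_URL ES_USER ES_PASSWORD ES_CACERT; do
    # Indirect lookup of the variable named in $name (fixed list, so eval is safe here).
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing environment variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example usage:
#   check_es_env || echo "export the missing variables before continuing"
```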
119+
&nbsp;
120+
### Enable ELSER model (v1)
121+
The ELSER model is not enabled by default, but you can enable it in Kibana. Please follow the [download-deploy-elser instructions](https://www.elastic.co/guide/en/machine-learning/8.10/ml-nlp-elser.html#download-deploy-elser) to do it.
122+
123+
### Load data into Elasticsearch
124+
In Kibana, you can upload a data file to the Elasticsearch cluster using the Data Visualizer in the Machine Learning UI at http://localhost:5601/app/ml/filedatavisualizer.
125+
126+
As an example, you can download [wa-docs-100](./assets/wa_docs_100.tsv) TSV data and upload it to Elasticsearch.
127+
This dataset contains documents processed from the watsonx Assistant product documents. There are three columns in this TSV file,
128+
`title`, `section_title` and `text`. The columns are extracted from the original documents. Specifically,
129+
each `text` value is a small chunk of text split from the original document.
130+
131+
In Kibana,
132+
* Select your downloaded file to upload
133+
<img src="assets/upload_file_though_data_visualizer.png" width="463" height="248" />
134+
* Click `Override settings` and then check the `Has header row` checkbox, because the example dataset has a header row
135+
<img src="assets/override_settings_for_uploaded_file.png" width="553" height="446" />
136+
* Import the data to a new Elasticsearch index and name it `wa-docs`
137+
<img src="assets/import_data_to_new_index.png" width="509" height="356" />
138+
Once finished, you have created an index for the data you just uploaded.
139+
### Create an index with mappings for ELSER output
140+
```bash
curl -X PUT "${ES_URL}/search-wa-docs?pretty" -u "${ES_USER}:${ES_PASSWORD}" \
  -H "Content-Type: application/json" --cacert "${ES_CACERT}" -d'
{
  "mappings": {
    "_source": {
      "excludes": [
        "ml.tokens"
      ]
    },
    "properties": {
      "ml.tokens": {
        "type": "rank_features"
      },
      "text": {
        "type": "text"
      }
    }
  }
}'
```
161+
Notes:
162+
* `search-wa-docs` will be your index name
163+
* `ml.tokens` is the field that stores the ELSER output when data is ingested; the `rank_features` type is required for the ELSER output field
164+
* `text` is the input field for the inference processor. In the example dataset, the input field is named `text`, and the ELSER model processes its contents.
165+
* Learn more about [elser-mappings](https://www.elastic.co/guide/en/elasticsearch/reference/8.10/semantic-search-elser.html#elser-mappings) from the tutorial
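After creating the index, you may want to confirm that the output field really has the `rank_features` type, since a similarly named `rank_feature` type also exists. The following is a minimal sketch: the function name `mapping_has_rank_features` is hypothetical, and it simply searches the JSON returned by the get-mapping API (`GET /<index>/_mapping`) for the expected type.

```bash
# Hypothetical helper: read a mapping response on stdin and succeed only if
# some field is mapped with "type": "rank_features" (exact match, so
# "rank_feature" does not pass).
mapping_has_rank_features() {
  grep -Eq '"type"[[:space:]]*:[[:space:]]*"rank_features"'
}

# Example usage (not run here, needs a live cluster):
#   curl -s "${ES_URL}/search-wa-docs/_mapping?pretty" -u "${ES_USER}:${ES_PASSWORD}" \
#     --cacert "${ES_CACERT}" | mapping_has_rank_features && echo "mapping looks right"
```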
166+
167+
### Create an ingest pipeline with an inference processor
168+
Create an ingest pipeline with an inference processor to use ELSER to infer against the data that will be ingested in the pipeline.
169+
```bash
curl -X PUT "${ES_URL}/_ingest/pipeline/elser-v1-test?pretty" -u "${ES_USER}:${ES_PASSWORD}" \
  -H "Content-Type: application/json" --cacert "${ES_CACERT}" -d'
{
  "processors": [
    {
      "inference": {
        "model_id": ".elser_model_1",
        "target_field": "ml",
        "field_map": {
          "text": "text_field"
        },
        "inference_config": {
          "text_expansion": {
            "results_field": "tokens"
          }
        }
      }
    }
  ]
}'
```
191+
Learn more about [inference-ingest-pipeline](https://www.elastic.co/guide/en/elasticsearch/reference/8.10/semantic-search-elser.html#inference-ingest-pipeline) from the tutorial
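Before reindexing the whole dataset, the pipeline can be tried on a single document with the Elasticsearch simulate pipeline API (`POST /_ingest/pipeline/<pipeline>/_simulate`). The following is a sketch under assumptions: the function name `simulate_body` and the sample sentence are made up for illustration, and the request only works once the ELSER model is deployed.

```bash
# Hypothetical helper: print a _simulate request body with one sample
# document. The `text` field is what the inference processor reads.
simulate_body() {
  cat <<'EOF'
{
  "docs": [
    { "_source": { "text": "How do I add a custom extension to my assistant?" } }
  ]
}
EOF
}

# Example usage (not run here, needs a live cluster with ELSER deployed):
#   simulate_body | curl -X POST "${ES_URL}/_ingest/pipeline/elser-v1-test/_simulate?pretty" \
#     -u "${ES_USER}:${ES_PASSWORD}" -H "Content-Type: application/json" \
#     --cacert "${ES_CACERT}" -d @-
```

If the pipeline is working, the simulated document in the response should carry the ELSER output under `ml.tokens`.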
192+
### Ingest the data through the inference ingest pipeline
193+
Create the tokens from the text by reindexing the data through the inference pipeline that uses ELSER as the inference model.
194+
```bash
curl -X POST "${ES_URL}/_reindex?wait_for_completion=false&pretty" -u "${ES_USER}:${ES_PASSWORD}" \
  -H "Content-Type: application/json" --cacert "${ES_CACERT}" -d'
{
  "source": {
    "index": "wa-docs"
  },
  "dest": {
    "index": "search-wa-docs",
    "pipeline": "elser-v1-test"
  }
}'
```
207+
* `wa-docs` is the index you created when uploading the example file to the Elasticsearch cluster. It contains the text data
208+
* `search-wa-docs` is the search index that has the ELSER output field
209+
* `elser-v1-test` is the ingest pipeline with an inference processor using ELSER model
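Because the reindex request uses `wait_for_completion=false`, it returns immediately with a task ID rather than waiting for the ingest to finish. The following is a minimal sketch for following up on that task: the function name `extract_task_id` is hypothetical, while `GET /_tasks/<task_id>` is the Elasticsearch task management API.

```bash
# Hypothetical helper: read the reindex response on stdin and print the
# value of its "task" field (for example "<node_id>:<task_number>").
extract_task_id() {
  sed -n 's/.*"task"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Example usage (not run here, needs a live cluster):
#   task_id=$(curl -s -X POST "${ES_URL}/_reindex?wait_for_completion=false" ... | extract_task_id)
#   curl -s "${ES_URL}/_tasks/${task_id}?pretty" -u "${ES_USER}:${ES_PASSWORD}" --cacert "${ES_CACERT}"
```

The task response includes a `completed` field; once it is `true`, the data in `search-wa-docs` is ready to query.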
210+
### Semantic search by using the text_expansion query
211+
To perform semantic search, use the `text_expansion` query, and provide the query text and the ELSER model ID.
212+
The example below uses the query text "how to set up custom extension?". The `ml.tokens` field contains
213+
the generated ELSER output:
214+
```bash
curl -X GET "${ES_URL}/search-wa-docs/_search?pretty" -u "${ES_USER}:${ES_PASSWORD}" \
  -H "Content-Type: application/json" --cacert "${ES_CACERT}" -d'
{
  "query": {
    "text_expansion": {
      "ml.tokens": {
        "model_id": ".elser_model_1",
        "model_text": "how to set up custom extension?"
      }
    }
  }
}'
```
228+
Notes:
229+
* You can also use `API_KEY` for authorization. You can generate an `API_KEY` for your search index on the index overview page in Kibana.
230+
* Learn more about [text-expansion-query](https://www.elastic.co/guide/en/elasticsearch/reference/8.10/semantic-search-elser.html#text-expansion-query) from the tutorial.
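To try other questions without editing the JSON by hand, the query body can be generated from the query text. This is a sketch, not part of the starter kit: the function names `json_escape` and `text_expansion_body` are hypothetical, and the escaping only handles backslashes and double quotes, so it assumes the query text has no control characters or newlines.

```bash
# Hypothetical helper: escape backslashes and double quotes for use inside
# a JSON string. Other control characters are NOT handled.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Hypothetical helper: print a text_expansion query body for the ml.tokens
# field and the ELSER v1 model, using the given query text.
text_expansion_body() {
  printf '{"query":{"text_expansion":{"ml.tokens":{"model_id":".elser_model_1","model_text":"%s"}}}}\n' \
    "$(json_escape "$1")"
}

# Example usage (not run here, needs a live cluster):
#   curl -X GET "${ES_URL}/search-wa-docs/_search?pretty" -u "${ES_USER}:${ES_PASSWORD}" \
#     -H "Content-Type: application/json" --cacert "${ES_CACERT}" \
#     -d "$(text_expansion_body "how to set up custom extension?")"
```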
231+
### Enable semantic search for your Search extension on Watson Assistant
232+
To enable semantic search for your Search extension on Watson Assistant, you just need to specify the following query body in the Search extension settings:
233+
```json
{
  "query": {
    "text_expansion": {
      "ml.tokens": {
        "model_id": ".elser_model_1",
        "model_text": "$QUERY"
      }
    }
  }
}
```
245+
<img src="assets/query_body_for_elasticsearch.png" width="547" height="638" />
246+
247+
Notes:
248+
* `$QUERY` is the query variable that contains the user search query by default.
249+
* The query body is likely to change for ELSER v2.