# How to set up Elasticsearch from IBM Cloud and integrate it with Watson Assistant
This document describes how to set up Elasticsearch on IBM Cloud and create a Watson Assistant search extension that uses an Elasticsearch index.

## Table of contents
* [Step 1: Provision an Elasticsearch instance on IBM Cloud](#step-1-provision-an-elasticsearch-instance-on-ibm-cloud)
* [Step 2: Set up Kibana to connect to Elasticsearch](#step-2-set-up-kibana-to-connect-to-elasticsearch)
* [Step 3: Create an Elasticsearch index (keyword-search)](#step-3-create-an-elasticsearch-index-keyword-search)
* [Step 4: Set up Watson Assistant search extension using Elasticsearch index](#step-4-set-up-watson-assistant-search-extension-using-elasticsearch-index)
* [Step 5: Enable semantic search with ELSER](#step-5-enable-semantic-search-with-elser)

## Step 1: Provision an Elasticsearch instance on IBM Cloud
* Create an [IBM Cloud account](https://cloud.ibm.com/registration) if you don't have one.
* Provision a Databases for Elasticsearch instance from the [IBM Cloud catalog](https://cloud.ibm.com/catalog/databases-for-elasticsearch).
  **A Platinum plan with at least 4 GB of RAM is required to use the advanced machine learning (ML) features,
  such as [Elastic Learned Sparse EncodeR (ELSER)](https://www.elastic.co/guide/en/machine-learning/8.10/ml-nlp-elser.html).**
* Create service credentials from the left-side menu and note the `hostname`, `port`, `username`, and `password`.
  These credentials are used to connect Kibana and Watson Assistant in the next steps. You can also use the admin user ID and password.
  See [this doc](https://cloud.ibm.com/docs/databases-for-elasticsearch?topic=databases-for-elasticsearch-user-management&interface=ui#user-management-elasticsearch-ibm-superuser) to learn more about the different user roles.

## Step 2: Set up Kibana to connect to Elasticsearch
* Install Docker so that you can pull the Kibana container image later.
  * You can install [Docker Desktop](https://www.docker.com/products/docker-desktop/).
  * If you don't want to use Docker Desktop:
    * On macOS, you can install [Colima](https://github.com/abiosoft/colima#installation) as an alternative.
      [Homebrew](https://brew.sh/) is required to run the following commands:
      ```shell
      brew install docker
      brew install colima

      colima start
      ```
    * Other options: [Podman Desktop](https://podman-desktop.io/), [Rancher Desktop](https://docs.rancherdesktop.io/getting-started/installation/).
* Create a Kibana config folder, for example:
  `mkdir -p ~/.kibana/config`
* Download the certificate from the Elasticsearch instance overview page and move the downloaded file to the Kibana config folder.
* In the Kibana config folder, create a YAML file called `kibana.yml` with the following Kibana configuration settings:
  ```YAML
  elasticsearch.ssl.certificateAuthorities: "/usr/share/kibana/config/<your-certificate-file-name>"
  elasticsearch.username: "<username>"
  elasticsearch.password: "<password>"
  elasticsearch.hosts: ["https://<hostname:port>"]
  server.name: "kibana"
  server.host: "0.0.0.0"
  ```
  Notes:
  - Take the `hostname`, `port`, `username`, and `password` from the service credentials created in Step 1.
  - `elasticsearch.ssl.certificateAuthorities` is the path where Kibana looks for the certificate inside the Docker container.
    `/usr/share/kibana/config/` is Kibana's default config directory in the container; the `docker run` command below mounts your config folder there.

* Verify the Elasticsearch instance endpoint and find its version:
  * Run:
    ```bash
    curl -u <username>:<password> --cacert <path-to-cert> https://<hostname:port>
    ```
  * Find the version number (`version.number`) in the output. Kibana should run the same version as Elasticsearch, so use this number as the Kibana image tag in the next step.

* Download and start the Kibana container, replacing `<kibana_version>` with the Elasticsearch version found above:
  ```bash
  docker run -it --name kibana --rm \
    -v <path_to_your_kibana_config_folder>:/usr/share/kibana/config \
    -p 5601:5601 docker.elastic.co/kibana/kibana:<kibana_version>
  ```
  Once Kibana has connected to your Databases for Elasticsearch deployment and is running, you will see output similar to the following in your terminal:
  ```
  [2024-01-02T16:43:29.378+00:00][INFO ][http.server.Kibana] http server running at http://0.0.0.0:5601
  [2024-01-02T16:46:13.777+00:00][INFO ][status] Kibana is now available
  ```

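The version lookup, the `kibana.yml` file, and the `docker run` command above can be combined into a small script. This is a minimal sketch, not an official tool: it assumes `ES_URL` (`https://<hostname:port>`), `ES_USER`, `ES_PASSWORD`, and `ES_CACERT` are set as in Step 5, that the certificate is already in the config folder, and that `KIBANA_CONFIG_DIR` is a hypothetical variable holding the absolute path of that folder. The function names are illustrative too.

```bash
# Minimal sketch (hypothetical helper names). Assumes ES_URL, ES_USER, ES_PASSWORD and
# ES_CACERT are set, and KIBANA_CONFIG_DIR is the absolute path of the Kibana config
# folder that already contains the certificate.

# Read the JSON returned by the Elasticsearch root endpoint on stdin and
# print the value of "number" (that is, version.number).
es_version() {
  grep -o '"number"[[:space:]]*:[[:space:]]*"[^"]*"' | head -n 1 | sed 's/.*"\([^"]*\)"$/\1/'
}

# Write the kibana.yml shown above into the folder given as $1.
write_kibana_config() {
  cert_name=$(basename "$ES_CACERT")
  mkdir -p "$1"
  cat > "$1/kibana.yml" <<EOF
elasticsearch.ssl.certificateAuthorities: "/usr/share/kibana/config/${cert_name}"
elasticsearch.username: "${ES_USER}"
elasticsearch.password: "${ES_PASSWORD}"
elasticsearch.hosts: ["${ES_URL}"]
server.name: "kibana"
server.host: "0.0.0.0"
EOF
}

# Look up the cluster version, write kibana.yml, and start the matching Kibana image.
start_kibana() {
  if [ -z "${KIBANA_CONFIG_DIR:-}" ]; then
    echo "KIBANA_CONFIG_DIR is not set" >&2
    return 1
  fi
  version=$(curl -s -u "${ES_USER}:${ES_PASSWORD}" --cacert "${ES_CACERT}" "${ES_URL}" | es_version)
  if [ -z "$version" ]; then
    echo "Could not read the Elasticsearch version from ${ES_URL}" >&2
    return 1
  fi
  write_kibana_config "$KIBANA_CONFIG_DIR"
  docker run -it --name kibana --rm \
    -v "${KIBANA_CONFIG_DIR}:/usr/share/kibana/config" \
    -p 5601:5601 "docker.elastic.co/kibana/kibana:${version}"
}
```

After setting `KIBANA_CONFIG_DIR`, running `start_kibana` writes `kibana.yml` and starts the container in the foreground, the same way the manual steps do.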
## Step 3: Create an Elasticsearch index (keyword-search)
This step creates an Elasticsearch index with default settings for quick testing and verification.
With default settings, an Elasticsearch index performs keyword search.

* Open http://localhost:5601 in a browser and log in to Kibana with the `username` and `password` from the service credentials of the Elasticsearch instance.
* Navigate to the indices page at http://localhost:5601/app/enterprise_search/content/search_indices.
* Click `Create a new index`, choose `Use the API`, and follow the steps there to create a new Elasticsearch index with default settings.
* Go to the overview page of the new index and follow the steps there to verify it.
  Notes:
  * Generate an API key; it is used to authenticate and authorize requests to this specific Elasticsearch index.
  * Use the `hostname` and `port` from the service credentials of the Elasticsearch instance to build `ES_URL`:
    ```bash
    export ES_URL=https://<hostname:port>
    ```
  * Append `--cacert <path-to-your-cert>` to the cURL commands to verify the TLS connection, or append `--insecure` to skip certificate verification.
  * If the `Build your first search query` command in the last step runs successfully, your Elasticsearch index is set up correctly.

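Passing the API key and the certificate option on every command gets repetitive, so a small wrapper can add them. This is a minimal sketch, not part of the Kibana instructions: `es_curl` is a hypothetical helper, `API_KEY` is assumed to hold the encoded API key generated on the index overview page, and `ES_CACERT` (the certificate path) is optional. When `ES_CACERT` is empty, the wrapper falls back to `--insecure`, which is only suitable for testing.

```bash
# Minimal sketch: es_curl is a hypothetical helper.
# Usage: es_curl <METHOD> <PATH> [extra curl options...]
# Assumes ES_URL and API_KEY are set; ES_CACERT is optional.
es_curl() {
  method=$1
  path=$2
  shift 2
  # Verify the server certificate when ES_CACERT is set; otherwise skip verification.
  if [ -n "${ES_CACERT:-}" ]; then
    set -- --cacert "$ES_CACERT" "$@"
  else
    set -- --insecure "$@"
  fi
  curl -s -X "$method" "${ES_URL}${path}" \
    -H "Authorization: ApiKey ${API_KEY}" \
    -H "Content-Type: application/json" "$@"
}
```

For example, `es_curl GET "/<your-index>/_search?pretty" -d '{"query": {"match_all": {}}}'` runs a match-all search against your index.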
## Step 4: Set up Watson Assistant search extension using Elasticsearch index
* Provision a Watson Assistant instance from the [IBM Cloud catalog](https://cloud.ibm.com/catalog/services/watsonx-assistant).
* Create a new assistant in the new experience.
* Add a Search extension to your assistant.
  See the [Elasticsearch search integration set up](https://cloud.ibm.com/docs/watson-assistant?topic=watson-assistant-search-elasticsearch-add) documentation for details.

* Verify the Search extension.
  If you set up the Search integration with the index created in Step 3, you can verify it with the following examples:
  * Verify the basic search:
    In your preview chat or draft webchat, type `Who wrote 1984?`. If you see Elasticsearch search results, your search extension is set up successfully.
    <img src="assets/wa_elasticsearch_result.png" width="280" height="343" />
  * Verify Conversational Search (beta):
    Go to your Search extension, turn on the Conversational search toggle, and save. Then, in your preview chat or draft webchat, type `Who wrote 1984?`.
    If you see an answer instead of a list of search results, conversational search is working properly.
    <img src="assets/wa_conversational_search_result.png" width="281" height="261" />

## Step 5: Enable semantic search with ELSER
This step enables semantic search with ELSER. The Elasticsearch documentation provides a tutorial for each ELSER version:
* ELSER v1: https://www.elastic.co/guide/en/elasticsearch/reference/8.10/semantic-search-elser.html
* ELSER v2: https://www.elastic.co/guide/en/elasticsearch/reference/8.11/semantic-search-elser.html

The following steps are based on the ELSER v1 model.

### Create environment variables for ES credentials
```bash
export ES_URL=https://<hostname:port>
export ES_USER=<username>
export ES_PASSWORD=<password>
export ES_CACERT=<path-to-your-cert>
```
Take the values from the service credentials of your Elasticsearch instance; `ES_CACERT` is the path to the certificate you downloaded in Step 2.

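Before running the commands in this step, you can check that the variables are set and that the cluster accepts the credentials. This is a minimal sketch; `check_es_env` is a hypothetical helper that prints the HTTP status code of a request to the cluster root endpoint.

```bash
# Minimal sketch: check_es_env is a hypothetical helper.
check_es_env() {
  missing=""
  [ -n "${ES_URL:-}" ] || missing="$missing ES_URL"
  [ -n "${ES_USER:-}" ] || missing="$missing ES_USER"
  [ -n "${ES_PASSWORD:-}" ] || missing="$missing ES_PASSWORD"
  [ -n "${ES_CACERT:-}" ] || missing="$missing ES_CACERT"
  if [ -n "$missing" ]; then
    echo "Missing environment variables:$missing" >&2
    return 1
  fi
  if [ ! -r "$ES_CACERT" ]; then
    echo "Certificate file is not readable: $ES_CACERT" >&2
    return 1
  fi
  # -o /dev/null discards the body; -w prints only the HTTP status code.
  curl -s -o /dev/null -w '%{http_code}\n' \
    -u "${ES_USER}:${ES_PASSWORD}" --cacert "${ES_CACERT}" "${ES_URL}"
}
```

A `200` means the credentials and certificate worked, a `401` points to wrong credentials, and `000` means curl got no HTTP response at all (for example, a connection or TLS failure).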
### Enable ELSER model (v1)
The ELSER model is not enabled by default, but you can download and deploy it in Kibana. Follow the [download-deploy-elser instructions](https://www.elastic.co/guide/en/machine-learning/8.10/ml-nlp-elser.html#download-deploy-elser) to do so.

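If you prefer the API to the Kibana UI, the same steps can be done with the Elasticsearch trained models APIs. The sketch below wraps those calls, assuming Elasticsearch 8.10 and ELSER v1; the function names are hypothetical. The `field_names` value, `text_field`, is the model input field that the ingest pipeline later maps the `text` field to.

```bash
# Minimal sketch for ELSER v1 on Elasticsearch 8.10; function names are hypothetical.
# Assumes ES_URL, ES_USER, ES_PASSWORD and ES_CACERT are set as above.

# Create the model configuration; Elasticsearch then downloads the model in the background.
elser_v1_download() {
  curl -s -X PUT "${ES_URL}/_ml/trained_models/.elser_model_1?pretty" \
    -u "${ES_USER}:${ES_PASSWORD}" --cacert "${ES_CACERT}" \
    -H "Content-Type: application/json" \
    -d '{"input": {"field_names": ["text_field"]}}'
}

# Show the model definition status; "fully_defined": true means the download finished.
elser_v1_download_status() {
  curl -s -X GET "${ES_URL}/_ml/trained_models/.elser_model_1?include=definition_status&pretty" \
    -u "${ES_USER}:${ES_PASSWORD}" --cacert "${ES_CACERT}"
}

# Start a deployment of the downloaded model.
elser_v1_start() {
  curl -s -X POST "${ES_URL}/_ml/trained_models/.elser_model_1/deployment/_start?pretty" \
    -u "${ES_USER}:${ES_PASSWORD}" --cacert "${ES_CACERT}"
}
```

Run `elser_v1_download`, wait until `elser_v1_download_status` reports the model as fully defined, then run `elser_v1_start`.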
### Load data into Elasticsearch
In Kibana, you can upload a data file to the Elasticsearch cluster with the Data Visualizer in the Machine Learning UI at http://localhost:5601/app/ml/filedatavisualizer.

For example, download the [wa-docs-100](./assets/wa_docs_100.tsv) TSV file and upload it to Elasticsearch.
This dataset contains documents processed from the watsonx Assistant product documentation. The TSV file has three columns extracted from the original documents:
`title`, `section_title`, and `text`. Each `text` value is a small chunk of text split from the original document.

In Kibana:
* Select the downloaded file to upload.
  <img src="assets/upload_file_though_data_visualizer.png" width="463" height="248" />
* Click `Override settings`, then select the `Has header row` checkbox, because the example dataset has a header row.
  <img src="assets/override_settings_for_uploaded_file.png" width="553" height="446" />
* Import the data to a new Elasticsearch index named `wa-docs`.
  <img src="assets/import_data_to_new_index.png" width="509" height="356" />

When the import finishes, the `wa-docs` index contains the data you uploaded.

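To confirm the upload, you can compare the document count of `wa-docs` with the number of data rows in the file. This is a minimal sketch with hypothetical helper names; it assumes the TSV has a single header row and one record per line.

```bash
# Minimal sketch; helper names are hypothetical.
# Count the data rows of a TSV file, skipping the header row (assumes one record per line).
tsv_data_rows() {
  awk 'NR > 1 { n++ } END { print n + 0 }' "$1"
}

# Ask Elasticsearch how many documents the wa-docs index holds.
count_uploaded_docs() {
  curl -s -X GET "${ES_URL}/wa-docs/_count?pretty" \
    -u "${ES_USER}:${ES_PASSWORD}" --cacert "${ES_CACERT}"
}
```

Run `tsv_data_rows wa_docs_100.tsv` and compare the result with the `count` field that `count_uploaded_docs` prints.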
### Create an index with mappings for ELSER output
```bash
curl -X PUT "${ES_URL}/search-wa-docs?pretty" -u "${ES_USER}:${ES_PASSWORD}" \
  -H "Content-Type: application/json" --cacert "${ES_CACERT}" -d'
{
  "mappings": {
    "_source": {
      "excludes": [
        "ml.tokens"
      ]
    },
    "properties": {
      "ml.tokens": {
        "type": "rank_features"
      },
      "text": {
        "type": "text"
      }
    }
  }
}'
```
Notes:
* `search-wa-docs` is the name of the new index.
* `ml.tokens` is the field that stores the ELSER output when data is ingested; the ELSER output field must use the `rank_features` field type.
* `_source.excludes` keeps `ml.tokens` out of the stored `_source`; the tokens are still indexed, so they remain searchable.
* `text` is the input field for the inference processor. In the example dataset, the input field is named `text`, and it holds the text that the ELSER model processes.
* Learn more about [elser-mappings](https://www.elastic.co/guide/en/elasticsearch/reference/8.10/semantic-search-elser.html#elser-mappings) in the tutorial.

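Before reindexing, you can confirm that `ml.tokens` was mapped as `rank_features` with the get field mapping API. This is a minimal sketch; `check_tokens_mapping` is a hypothetical helper, and it only looks for the `rank_features` string in the response instead of parsing the JSON.

```bash
# Minimal sketch: check_tokens_mapping is a hypothetical helper.
# Fetch the mapping of ml.tokens in search-wa-docs and look for the rank_features type.
check_tokens_mapping() {
  if curl -s -X GET "${ES_URL}/search-wa-docs/_mapping/field/ml.tokens" \
      -u "${ES_USER}:${ES_PASSWORD}" --cacert "${ES_CACERT}" | grep -q '"rank_features"'; then
    echo "ml.tokens is mapped as rank_features"
  else
    echo "ml.tokens is missing or has a different type" >&2
    return 1
  fi
}
```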
### Create an ingest pipeline with an inference processor
Create an ingest pipeline with an inference processor that uses ELSER to run inference on the data ingested through the pipeline.
```bash
curl -X PUT "${ES_URL}/_ingest/pipeline/elser-v1-test?pretty" -u "${ES_USER}:${ES_PASSWORD}" \
  -H "Content-Type: application/json" --cacert "${ES_CACERT}" -d'
{
  "processors": [
    {
      "inference": {
        "model_id": ".elser_model_1",
        "target_field": "ml",
        "field_map": {
          "text": "text_field"
        },
        "inference_config": {
          "text_expansion": {
            "results_field": "tokens"
          }
        }
      }
    }
  ]
}'
```
Notes:
* `field_map` maps the `text` field of each incoming document to `text_field`, the input field name that the ELSER v1 model expects.
* The processor writes the ELSER output to `ml.tokens` (`target_field` plus `results_field`), which matches the mapping created above.
* Learn more about [inference-ingest-pipeline](https://www.elastic.co/guide/en/elasticsearch/reference/8.10/semantic-search-elser.html#inference-ingest-pipeline) in the tutorial.

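You can try the pipeline on a sample document with the simulate pipeline API before reindexing any data. This is a minimal sketch; `simulate_elser_pipeline` is a hypothetical helper, and the text is inserted into the JSON body without escaping, so avoid double quotes and backslashes in it. If ELSER is deployed, the simulated document should contain an `ml.tokens` object with weighted tokens.

```bash
# Minimal sketch: simulate_elser_pipeline is a hypothetical helper.
# Runs one document through elser-v1-test without indexing it.
# $1 is the sample text; it is not JSON-escaped.
simulate_elser_pipeline() {
  body=$(printf '{"docs": [{"_source": {"text": "%s"}}]}' "$1")
  curl -s -X POST "${ES_URL}/_ingest/pipeline/elser-v1-test/_simulate?pretty" \
    -u "${ES_USER}:${ES_PASSWORD}" --cacert "${ES_CACERT}" \
    -H "Content-Type: application/json" -d "$body"
}
```

For example: `simulate_elser_pipeline "How do I add a custom extension?"`.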
### Ingest the data through the inference ingest pipeline
Create the tokens from the text by reindexing the data through the inference pipeline that uses ELSER as the inference model.
```bash
curl -X POST "${ES_URL}/_reindex?wait_for_completion=false&pretty" -u "${ES_USER}:${ES_PASSWORD}" \
  -H "Content-Type: application/json" --cacert "${ES_CACERT}" -d'
{
  "source": {
    "index": "wa-docs"
  },
  "dest": {
    "index": "search-wa-docs",
    "pipeline": "elser-v1-test"
  }
}'
```
Notes:
* `wa-docs` is the index created when you uploaded the example file to the Elasticsearch cluster. It contains the text data.
* `search-wa-docs` is the search index that has the ELSER output field.
* `elser-v1-test` is the ingest pipeline with an inference processor that uses the ELSER model.
* `wait_for_completion=false` runs the reindex as a background task, so the response contains a task ID instead of the reindex result.

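Because the reindex request returns a task ID right away while ELSER inference keeps running in the background, it helps to poll the tasks API until the task finishes. This is a minimal sketch with hypothetical helper names; `POLL_SECONDS` and `MAX_POLLS` are illustrative settings, and the helpers match on strings instead of parsing the JSON.

```bash
# Minimal sketch; helper names and the POLL_SECONDS/MAX_POLLS settings are hypothetical.

# Read the reindex response on stdin and print its "task" value (node_id:task_number).
task_id_from_response() {
  grep -o '"task"[[:space:]]*:[[:space:]]*"[^"]*"' | head -n 1 | sed 's/.*"\([^"]*\)"$/\1/'
}

# Poll GET _tasks/<task_id> until the response contains "completed": true.
wait_for_task() {
  attempts=${MAX_POLLS:-60}
  while [ "$attempts" -gt 0 ]; do
    if curl -s -X GET "${ES_URL}/_tasks/$1" -u "${ES_USER}:${ES_PASSWORD}" \
        --cacert "${ES_CACERT}" | grep -q '"completed"[[:space:]]*:[[:space:]]*true'; then
      echo "Task $1 completed"
      return 0
    fi
    attempts=$((attempts - 1))
    sleep "${POLL_SECONDS:-10}"
  done
  echo "Task $1 did not complete after ${MAX_POLLS:-60} checks" >&2
  return 1
}
```

For example, pipe the reindex command above into `task_id_from_response`, store the result in a variable, and pass it to `wait_for_task`. The `response` field of the completed task reports how many documents were created and any failures.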
### Semantic search by using the text_expansion query
To perform semantic search, use the `text_expansion` query and provide the query text and the ELSER model ID.
The example below runs the query text "how to set up custom extension?" against the `ml.tokens` field, which contains the generated ELSER output:
```bash
curl -X GET "${ES_URL}/search-wa-docs/_search?pretty" -u "${ES_USER}:${ES_PASSWORD}" \
  -H "Content-Type: application/json" --cacert "${ES_CACERT}" -d'
{
  "query":{
    "text_expansion":{
      "ml.tokens":{
        "model_id":".elser_model_1",
        "model_text":"how to set up custom extension?"
      }
    }
  }
}'
```
Notes:
* You can also authenticate with an API key instead of a username and password. Generate an API key for your search index on the index overview page in Kibana.
* Learn more about [text-expansion-query](https://www.elastic.co/guide/en/elasticsearch/reference/8.10/semantic-search-elser.html#text-expansion-query) in the tutorial.

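For API-key authentication, the same `text_expansion` query can be sent with an `Authorization: ApiKey` header instead of `-u`. This is a minimal sketch; `elser_search` is a hypothetical helper, `API_KEY` is assumed to hold the encoded key generated in Kibana, and the query text is inserted without JSON escaping, so avoid double quotes and backslashes in it.

```bash
# Minimal sketch: elser_search is a hypothetical helper.
# $1 is the query text (not JSON-escaped). Assumes ES_URL, API_KEY and ES_CACERT are set.
elser_search() {
  body=$(printf '{"query": {"text_expansion": {"ml.tokens": {"model_id": ".elser_model_1", "model_text": "%s"}}}}' "$1")
  curl -s -X GET "${ES_URL}/search-wa-docs/_search?pretty" \
    -H "Authorization: ApiKey ${API_KEY}" \
    -H "Content-Type: application/json" --cacert "${ES_CACERT}" -d "$body"
}
```

For example: `elser_search "how to set up custom extension?"`.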
### Enable semantic search for your Search extension on Watson Assistant
To enable semantic search for your Search extension on Watson Assistant, specify the following query body in the Search extension settings:
```json
{
  "query":{
    "text_expansion":{
      "ml.tokens":{
        "model_id":".elser_model_1",
        "model_text":"$QUERY"
      }
    }
  }
}
```
<img src="assets/query_body_for_elasticsearch.png" width="547" height="638" />

Notes:
* Make sure the Search extension uses the `search-wa-docs` index, because that index contains the `ml.tokens` field.
* `$QUERY` is the query variable; by default it contains the user's search query.
* The query body is likely to change for ELSER v2.