|
1 | 1 | --- |
2 | | -title: ElasticSearch |
3 | | -sidebarTitle: ElasticSearch |
| 2 | +title: Elasticsearch |
| 3 | +sidebarTitle: Elasticsearch |
4 | 4 | --- |
5 | 5 |
|
6 | | -This documentation describes the integration of MindsDB with [ElasticSearch](https://www.elastic.co/), a distributed, multitenant-capable full-text search engine with an HTTP web interface and schema-free JSON documents.. |
7 | | -The integration allows MindsDB to access data from ElasticSearch and enhance ElasticSearch with AI capabilities. |
| 6 | +This documentation describes the integration of MindsDB with [Elasticsearch](https://www.elastic.co/elasticsearch/), a distributed search and analytics engine. |
| 7 | +The integration allows MindsDB to access data stored in Elasticsearch indices and enhance Elasticsearch with AI capabilities. |
| 8 | + |
| 9 | +## Architecture |
| 10 | + |
| 11 | +The handler sends queries to the Elasticsearch SQL API first and falls back to the Search API when needed: |
| 12 | + |
| 13 | +* **Primary path**: Queries run through the Elasticsearch SQL API. |
| 14 | +* **Fallback path**: If the SQL API rejects a query because the index contains array fields, the handler reruns it through the Search API and converts each array value to a JSON string. |
| 15 | +* **Security**: Connections support SSL/TLS with certificate validation. |
| 16 | +* **Pagination**: Large result sets are fetched in pages to keep memory usage low. |
| 17 | + |
| 18 | +This keeps the SQL API as the default path for performance while still returning results for indices whose fields hold arrays. |
8 | 19 |
|
9 | 20 | ## Prerequisites |
10 | 21 |
|
11 | 22 | Before proceeding, ensure the following prerequisites are met: |
12 | 23 |
|
13 | 24 | 1. Install MindsDB locally via [Docker](https://docs.mindsdb.com/setup/self-hosted/docker) or [Docker Desktop](https://docs.mindsdb.com/setup/self-hosted/docker-desktop). |
14 | | -2. To connect ElasticSearch to MindsDB, install the required dependencies following [this instruction](/setup/self-hosted/docker#install-dependencies). |
15 | | -3. Install or ensure access to ElasticSearch. |
| 25 | +2. To connect Elasticsearch to MindsDB, install the required dependencies following [these instructions](https://docs.mindsdb.com/setup/self-hosted/docker#install-dependencies). |
| 26 | +3. **If installing from source**: Python 3.11 or 3.12 is recommended. Install MindsDB with the Elasticsearch extras by running `pip install -e '.[elasticsearch]'` from the root of the MindsDB repository. |
16 | 27 |
|
17 | 28 | ## Connection |
18 | 29 |
|
19 | | -Establish a connection to ElasticSearch from MindsDB by executing the following SQL command and providing its [handler name](https://github.com/mindsdb/mindsdb/tree/main/mindsdb/integrations/handlers/elasticsearch_handler) as an engine. |
| 30 | +Establish a connection to your Elasticsearch cluster from MindsDB by executing the following SQL command: |
20 | 31 |
|
21 | 32 | ```sql |
22 | | -CREATE DATABASE elasticsearch_datasource |
| 33 | +CREATE DATABASE elasticsearch_conn |
23 | 34 | WITH ENGINE = 'elasticsearch', |
24 | | -PARAMETERS={ |
25 | | - 'cloud_id': 'xyz', -- optional, if hosts are provided |
26 | | - 'hosts': 'https://xyz.xyz.gcp.cloud.es.io:123', -- optional, if cloud_id is provided |
27 | | - 'api_key': 'xyz', -- optional, if user and password are provided |
28 | | - 'user': 'elastic', -- optional, if api_key is provided |
29 | | - 'password': 'xyz' -- optional, if api_key is provided |
| 35 | +PARAMETERS = { |
| 36 | + "hosts": "localhost:9200", |
| 37 | + "user": "elastic", |
| 38 | + "password": "changeme" |
30 | 39 | }; |
31 | 40 | ``` |
32 | 41 |
|
33 | | -The connection parameters include the following: |
34 | | - |
35 | | -* `cloud_id`: The Cloud ID provided with the ElasticSearch deployment. Required only when `hosts` is not provided. |
36 | | -* `hosts`: The ElasticSearch endpoint provided with the ElasticSearch deployment. Required only when `cloud_id` is not provided. |
37 | | -* `api_key`: The API key that you generated for the ElasticSearch deployment. Required only when `user` and `password` are not provided. |
38 | | -* `user` and `password`: The user and password used to authenticate. Required only when `api_key` is not provided. |
| 42 | +Required connection parameters include the following: |
39 | 43 |
|
40 | | -<Tip> |
41 | | -If you want to connect to the local instance of ElasticSearch, use the below statement: |
42 | | - |
43 | | -```sql |
44 | | -CREATE DATABASE elasticsearch_datasource |
45 | | -WITH ENGINE = 'elasticsearch', |
46 | | -PARAMETERS = { |
47 | | - "hosts": "127.0.0.1:9200", |
48 | | - "user": "user", |
49 | | - "password": "password" |
50 | | -}; |
51 | | -``` |
| 44 | +* `hosts`: The Elasticsearch host(s) in the format `host:port`. For multiple hosts, separate them with commas, for example `host1:port1,host2:port2`. Required unless `cloud_id` is provided. |
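|  | + |
|  | +For example, a connection to a cluster reachable through two nodes might look like the following sketch. The host names `es-node-1` and `es-node-2`, the connection name, and the credentials are placeholders: |
|  | + |
|  | +```sql |
|  | +-- Two hosts, comma-separated in a single string (placeholder values) |
|  | +CREATE DATABASE elasticsearch_cluster_conn |
|  | +WITH ENGINE = 'elasticsearch', |
|  | +PARAMETERS = { |
|  | +    "hosts": "es-node-1:9200,es-node-2:9200", |
|  | +    "user": "elastic", |
|  | +    "password": "changeme" |
|  | +}; |
|  | +``` |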
52 | 45 |
|
53 | | -Required connection parameters include the following (at least one of these parameters should be provided): |
| 46 | +Optional connection parameters include the following: |
54 | 47 |
|
55 | | -* `hosts`: The IP address and port where ElasticSearch is deployed. |
56 | | -* `user`: The user used to autheticate access. |
57 | | -* `password`: The password used to autheticate access. |
58 | | -</Tip> |
| 48 | +* `user`: The username for Elasticsearch authentication. |
| 49 | +* `password`: The password for Elasticsearch authentication. |
| 50 | +* `api_key`: An API key for authentication, used as an alternative to `user` and `password`. |
| 51 | +* `cloud_id`: The Cloud ID of an Elastic Cloud deployment, used instead of `hosts` for hosted Elasticsearch. |
| 52 | +* `ca_certs`: Path to the CA certificate file used to verify the server's SSL certificate. |
| 53 | +* `client_cert`: Path to the client certificate file for SSL client authentication. |
| 54 | +* `client_key`: Path to the client private key file for SSL client authentication. |
| 55 | +* `verify_certs`: Whether to verify the server's SSL certificate (default: `true`). |
| 56 | +* `timeout`: Request timeout, in seconds. |
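|  | + |
|  | +The following sketch shows two alternative connections that combine the optional parameters above. The first connects to Elastic Cloud using `cloud_id` and `api_key`. The second connects to a self-managed cluster over HTTPS with certificate verification. All values are placeholders, including the connection names and the certificate path. The sketch assumes that the `https://` scheme is accepted in `hosts` and that the certificate path is readable by the MindsDB process: |
|  | + |
|  | +```sql |
|  | +-- Elastic Cloud deployment authenticated with an API key (placeholder values) |
|  | +CREATE DATABASE elasticsearch_cloud_conn |
|  | +WITH ENGINE = 'elasticsearch', |
|  | +PARAMETERS = { |
|  | +    "cloud_id": "<your-cloud-id>", |
|  | +    "api_key": "<your-api-key>" |
|  | +}; |
|  | + |
|  | +-- Self-managed cluster over HTTPS, verifying the server certificate against a CA file |
|  | +-- (placeholder values; the https:// scheme in hosts is an assumption) |
|  | +CREATE DATABASE elasticsearch_tls_conn |
|  | +WITH ENGINE = 'elasticsearch', |
|  | +PARAMETERS = { |
|  | +    "hosts": "https://es.example.com:9200", |
|  | +    "user": "elastic", |
|  | +    "password": "changeme", |
|  | +    "ca_certs": "/path/to/ca.crt", |
|  | +    "verify_certs": true, |
|  | +    "timeout": 30 |
|  | +}; |
|  | +``` |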
59 | 57 |
|
60 | 58 | ## Usage |
61 | 59 |
|
| 60 | +The following usage examples use the `elasticsearch_conn` connection created with the `CREATE DATABASE` statement above. |
| 61 | + |
62 | 62 | Retrieve data from a specified index by providing the integration name and index name: |
63 | 63 |
|
64 | 64 | ```sql |
65 | 65 | SELECT * |
66 | | -FROM elasticsearch_datasource.my_index |
| 66 | +FROM elasticsearch_conn.products |
67 | 67 | LIMIT 10; |
68 | 68 | ``` |
69 | 69 |
|
70 | | -<Note> |
71 | | -The above examples utilize `elasticsearch_datasource` as the datasource name, which is defined in the `CREATE DATABASE` command. |
72 | | -</Note> |
| 70 | +Query with filtering and aggregation: |
| 71 | + |
| 72 | +```sql |
| 73 | +SELECT category, COUNT(*) AS product_count, AVG(price) AS avg_price |
| 74 | +FROM elasticsearch_conn.products |
| 75 | +WHERE price > 100 |
| 76 | +GROUP BY category |
| 77 | +ORDER BY product_count DESC; |
| 78 | +``` |
| 79 | + |
| 80 | +Query fields that hold arrays, such as `tags` and `categories` in this example. Their values are returned as JSON strings: |
| 81 | + |
| 82 | +```sql |
| 83 | +SELECT product_name, tags, categories |
| 84 | +FROM elasticsearch_conn.products |
| 85 | +WHERE product_id = '12345'; |
| 86 | +``` |
73 | 87 |
|
74 | 88 | <Tip> |
75 | | -At the moment, the Elasticsearch SQL API has certain limitations that have an impact on the queries that can be issued via MindsDB. The most notable of these limitations are listed below: |
76 | | -1. Only `SELECT` queries are supported at the moment. |
77 | | -2. Array fields are not supported. |
78 | | -3. Nested fields cannot be queried directly. However, they can be accessed using the `.` operator. |
| 89 | +**Array Field Support** |
79 | 90 |
|
80 | | -For a detailed guide on the limitations of the Elasticsearch SQL API, refer to the [official documentation](https://www.elastic.co/guide/en/elasticsearch/reference/current/sql-limitations.html). |
| 91 | +The Elasticsearch handler automatically detects array fields and converts their values to JSON strings. This prevents "Arrays are not supported" errors from the SQL API while keeping the array contents intact. |
81 | 92 | </Tip> |
82 | 93 |
|
83 | | -## Troubleshooting Guide |
| 94 | +## Troubleshooting |
84 | 95 |
|
85 | 96 | <Warning> |
86 | 97 | `Database Connection Error` |
87 | 98 |
|
88 | | -* **Symptoms**: Failure to connect MindsDB with the Elasticsearch server. |
| 99 | +* **Symptoms**: Failure to connect MindsDB with the Elasticsearch cluster. |
89 | 100 | * **Checklist**: |
90 | | - 1. Make sure the Elasticsearch server is active. |
91 | | - 2. Confirm that server, cloud ID and credentials are correct. |
| 101 | + 1. Make sure the Elasticsearch cluster is active and accessible. |
| 102 | + 2. Confirm that the host, port, and credentials (user and password, or API key) are correct. Try connecting to Elasticsearch directly, without MindsDB, to isolate the problem. |
92 | 103 | 3. Ensure a stable network between MindsDB and Elasticsearch. |
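| 104 | + 4. Check whether the cluster requires authentication or SSL/TLS and, if so, that the matching parameters (for example, `api_key`, `ca_certs`, or `verify_certs`) are set. |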
93 | 105 | </Warning> |
94 | 106 |
|
95 | 107 | <Warning> |
96 | | -`Transport Error` or `Request Error` |
| 108 | +`Arrays Not Supported Error` |
97 | 109 |
|
98 | | -* **Symptoms**: Errors related to the issuing of unsupported queries to Elasticsearch. |
99 | | -* **Checklist**: |
100 | | - 1. Ensure the query is a `SELECT` query. |
101 | | - 2. Avoid querying array fields. |
102 | | - 3. Access nested fields using the `.` operator. |
103 | | - 4. Refer to the [official documentation](https://www.elastic.co/guide/en/elasticsearch/reference/current/sql-limitations.html) for more information if needed. |
| 110 | +* **Symptoms**: SQL queries failing with "Arrays are not supported" message. |
| 111 | +* **Solution**: No action is needed. The integration handles this automatically by converting array fields to JSON strings. |
| 112 | +* **Note**: If this error still appears, the handler retries the query through the Search API automatically, so results should still be returned. |
104 | 113 | </Warning> |
105 | 114 |
|
106 | 115 | <Warning> |
107 | | -`SQL statement cannot be parsed by mindsdb_sql` |
108 | | - |
109 | | -* **Symptoms**: SQL queries failing or not recognizing index names containing special characters. |
110 | | -* **Checklist**: |
111 | | - 1. Ensure table names with special characters are enclosed in backticks. |
112 | | - 2. Examples: |
113 | | - * Incorrect: SELECT * FROM integration.travel-data |
114 | | - * Incorrect: SELECT * FROM integration.'travel-data' |
115 | | - * Correct: SELECT * FROM integration.\`travel-data\` |
| 116 | +`SHOW TABLES returns empty or fails` |
| 117 | + |
| 118 | +* **Symptoms**: `SHOW TABLES FROM elasticsearch_conn` returns no results or fails. |
| 119 | +* **Solution**: Query `information_schema.tables` instead: |
| 120 | + ```sql |
| 121 | + SELECT table_name FROM information_schema.tables |
| 122 | + WHERE table_schema = 'elasticsearch_conn'; |
| 123 | + ``` |
116 | 124 | </Warning> |
117 | 125 |
|
118 | | -This [troubleshooting guide](https://www.elastic.co/guide/en/elasticsearch/reference/current/troubleshooting.html) provided by Elasticsearch might also be helpful. |
| 126 | +## Limitations |
| 127 | + |
| 128 | +* **JOINs**: Not supported due to Elasticsearch architecture limitations. |
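| 129 | +* **Complex Subqueries**: Support is limited to what the Elasticsearch SQL API can execute. |
| 130 | +* **Real-time Data**: Elasticsearch search is near real-time. Newly indexed documents become visible to queries only after the index is refreshed, so the most recent writes may be missing from results. |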