Commit 869db5f

Feature/elasticsearch handler update: fix array support and standardize format (#11552)
2 parents b0f2729 + c153fcf commit 869db5f

5 files changed

Lines changed: 932 additions & 182 deletions

Lines changed: 79 additions & 67 deletions
Original file line number | Diff line number | Diff line change
@@ -1,118 +1,130 @@
11
---
2-
title: ElasticSearch
3-
sidebarTitle: ElasticSearch
2+
title: Elasticsearch
3+
sidebarTitle: Elasticsearch
44
---
55

6-
This documentation describes the integration of MindsDB with [ElasticSearch](https://www.elastic.co/), a distributed, multitenant-capable full-text search engine with an HTTP web interface and schema-free JSON documents..
7-
The integration allows MindsDB to access data from ElasticSearch and enhance ElasticSearch with AI capabilities.
6+
This documentation describes the integration of MindsDB with [Elasticsearch](https://www.elastic.co/elasticsearch/), a distributed search and analytics engine.
7+
The integration allows MindsDB to access data stored in Elasticsearch indices and enhance Elasticsearch with AI capabilities.
8+
9+
## Architecture
10+
11+
This handler uses a **SQL-first architecture** with automatic fallback:
12+
13+
1. **Primary**: Elasticsearch SQL API for maximum performance and compatibility
14+
2. **Fallback**: Search API for array-containing indexes with automatic array-to-JSON conversion
15+
3. **Security**: SSL/TLS support with certificate validation
16+
4. **Efficiency**: Memory-efficient pagination for large datasets
17+
18+
The handler automatically detects when SQL queries encounter array fields and seamlessly falls back to the Search API, converting arrays to JSON strings for SQL compatibility. This approach provides the best performance while handling all Elasticsearch data types.
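
The array-to-JSON conversion described above can be sketched roughly as follows. This is a minimal illustration and not the handler's actual code: the function names `arrays_to_json` and `hits_to_rows` are hypothetical, and the only assumption about Elasticsearch is that each Search API hit carries its document under `_source`.

```python
import json
from typing import Any, Dict, List


def arrays_to_json(row: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `row` with list values serialized as JSON strings.

    Hypothetical helper: illustrates the idea, not the handler's real code.
    """
    converted = {}
    for key, value in row.items():
        if isinstance(value, list):
            # A list does not fit into a single SQL column value, so the
            # full structure is kept as a JSON string instead.
            converted[key] = json.dumps(value)
        else:
            converted[key] = value
    return converted


def hits_to_rows(hits: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Turn Search API hits (documents under `_source`) into flat rows."""
    return [arrays_to_json(hit.get("_source", {})) for hit in hits]


if __name__ == "__main__":
    sample_hits = [
        {"_source": {"product_name": "Laptop", "tags": ["new", "sale"], "price": 999.0}}
    ]
    print(hits_to_rows(sample_hits))
```

Because the array is stored as valid JSON, a client can recover the original list with `json.loads` on the returned column value.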
819

920
## Prerequisites
1021

1122
Before proceeding, ensure the following prerequisites are met:
1223

1324
1. Install MindsDB locally via [Docker](https://docs.mindsdb.com/setup/self-hosted/docker) or [Docker Desktop](https://docs.mindsdb.com/setup/self-hosted/docker-desktop).
14-
2. To connect ElasticSearch to MindsDB, install the required dependencies following [this instruction](/setup/self-hosted/docker#install-dependencies).
15-
3. Install or ensure access to ElasticSearch.
25+
2. To connect Elasticsearch to MindsDB, install the required dependencies following [this instruction](https://docs.mindsdb.com/setup/self-hosted/docker#install-dependencies).
26+
3. **If installing from source**: Python 3.11 or 3.12 is recommended. Install with: `pip install -e '.[elasticsearch]'`
1627

1728
## Connection
1829

19-
Establish a connection to ElasticSearch from MindsDB by executing the following SQL command and providing its [handler name](https://github.com/mindsdb/mindsdb/tree/main/mindsdb/integrations/handlers/elasticsearch_handler) as an engine.
30+
Establish a connection to your Elasticsearch cluster from MindsDB by executing the following SQL command:
2031

2132
```sql
22-
CREATE DATABASE elasticsearch_datasource
33+
CREATE DATABASE elasticsearch_conn
2334
WITH ENGINE = 'elasticsearch',
24-
PARAMETERS={
25-
'cloud_id': 'xyz', -- optional, if hosts are provided
26-
'hosts': 'https://xyz.xyz.gcp.cloud.es.io:123', -- optional, if cloud_id is provided
27-
'api_key': 'xyz', -- optional, if user and password are provided
28-
'user': 'elastic', -- optional, if api_key is provided
29-
'password': 'xyz' -- optional, if api_key is provided
35+
PARAMETERS = {
36+
"hosts": "localhost:9200",
37+
"user": "elastic",
38+
"password": "changeme"
3039
};
3140
```
3241

33-
The connection parameters include the following:
34-
35-
* `cloud_id`: The Cloud ID provided with the ElasticSearch deployment. Required only when `hosts` is not provided.
36-
* `hosts`: The ElasticSearch endpoint provided with the ElasticSearch deployment. Required only when `cloud_id` is not provided.
37-
* `api_key`: The API key that you generated for the ElasticSearch deployment. Required only when `user` and `password` are not provided.
38-
* `user` and `password`: The user and password used to authenticate. Required only when `api_key` is not provided.
42+
Required connection parameters include the following:
3943

40-
<Tip>
41-
If you want to connect to the local instance of ElasticSearch, use the below statement:
42-
43-
```sql
44-
CREATE DATABASE elasticsearch_datasource
45-
WITH ENGINE = 'elasticsearch',
46-
PARAMETERS = {
47-
"hosts": "127.0.0.1:9200",
48-
"user": "user",
49-
"password": "password"
50-
};
51-
```
44+
* `hosts`: The Elasticsearch host(s) in format "host:port". For multiple hosts, use comma separation like "host1:port1,host2:port2".
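
As a rough illustration of the comma-separated `hosts` format, the sketch below splits such a string into individual entries. The function name `parse_hosts` is hypothetical, and treating a missing port as an error is an assumption made for this example; the handler itself may be more lenient.

```python
from typing import List


def parse_hosts(hosts: str) -> List[str]:
    """Split a comma-separated "host:port" string into a list of entries.

    Hypothetical helper: requiring a numeric port is an assumption of this
    sketch, not documented handler behavior.
    """
    parsed = []
    for item in hosts.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate trailing commas and stray whitespace
        # Split on the last colon so that a scheme such as "https://" is kept
        # as part of the host portion.
        host, sep, port = item.rpartition(":")
        if not sep or not host or not port.isdigit():
            raise ValueError(f"expected 'host:port', got {item!r}")
        parsed.append(item)
    return parsed


if __name__ == "__main__":
    print(parse_hosts("host1:9200, host2:9200"))
```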
5245

53-
Required connection parameters include the following (at least one of these parameters should be provided):
46+
Optional connection parameters include the following:
5447

55-
* `hosts`: The IP address and port where ElasticSearch is deployed.
56-
* `user`: The user used to autheticate access.
57-
* `password`: The password used to autheticate access.
58-
</Tip>
48+
* `user`: The username for Elasticsearch authentication.
49+
* `password`: The password for Elasticsearch authentication.
50+
* `api_key`: API key for authentication (alternative to user/password).
51+
* `cloud_id`: Elastic Cloud deployment ID for hosted Elasticsearch.
52+
* `ca_certs`: Path to CA certificate file for SSL verification.
53+
* `client_cert`: Path to client certificate file for SSL authentication.
54+
* `client_key`: Path to client private key file for SSL authentication.
55+
* `verify_certs`: Boolean to enable/disable SSL certificate verification (default: true).
56+
* `timeout`: Request timeout in seconds.
5957

6058
## Usage
6159

60+
The following usage examples utilize the connection to Elasticsearch made via the `CREATE DATABASE` statement and named `elasticsearch_conn`.
61+
6262
Retrieve data from a specified index by providing the integration name and index name:
6363

6464
```sql
6565
SELECT *
66-
FROM elasticsearch_datasource.my_index
66+
FROM elasticsearch_conn.products
6767
LIMIT 10;
6868
```
6969

70-
<Note>
71-
The above examples utilize `elasticsearch_datasource` as the datasource name, which is defined in the `CREATE DATABASE` command.
72-
</Note>
70+
Query with filtering and aggregation:
71+
72+
```sql
73+
SELECT category, COUNT(*) as product_count, AVG(price) as avg_price
74+
FROM elasticsearch_conn.products
75+
WHERE price > 100
76+
GROUP BY category
77+
ORDER BY product_count DESC;
78+
```
79+
80+
Run queries with array fields (automatically converted to JSON strings):
81+
82+
```sql
83+
SELECT product_name, tags, categories
84+
FROM elasticsearch_conn.products
85+
WHERE product_id = '12345';
86+
```
7387

7488
<Tip>
75-
At the moment, the Elasticsearch SQL API has certain limitations that have an impact on the queries that can be issued via MindsDB. The most notable of these limitations are listed below:
76-
1. Only `SELECT` queries are supported at the moment.
77-
2. Array fields are not supported.
78-
3. Nested fields cannot be queried directly. However, they can be accessed using the `.` operator.
89+
**Array Field Support**
7990

80-
For a detailed guide on the limitations of the Elasticsearch SQL API, refer to the [official documentation](https://www.elastic.co/guide/en/elasticsearch/reference/current/sql-limitations.html).
91+
The Elasticsearch handler automatically detects and converts array fields to JSON strings for SQL compatibility. This prevents "Arrays not supported" errors while preserving the original data structure.
8192
</Tip>
8293

83-
## Troubleshooting Guide
94+
## Troubleshooting
8495

8596
<Warning>
8697
`Database Connection Error`
8798

88-
* **Symptoms**: Failure to connect MindsDB with the Elasticsearch server.
99+
* **Symptoms**: Failure to connect MindsDB with the Elasticsearch cluster.
89100
* **Checklist**:
90-
1. Make sure the Elasticsearch server is active.
91-
2. Confirm that server, cloud ID and credentials are correct.
101+
1. Make sure the Elasticsearch cluster is active and accessible.
102+
2. Confirm that host, port, user, and password are correct. Try a direct Elasticsearch connection.
92103
3. Ensure a stable network between MindsDB and Elasticsearch.
104+
4. Check if authentication is required and credentials are valid.
93105
</Warning>
94106

95107
<Warning>
96-
`Transport Error` or `Request Error`
108+
`Arrays Not Supported Error`
97109

98-
* **Symptoms**: Errors related to the issuing of unsupported queries to Elasticsearch.
99-
* **Checklist**:
100-
1. Ensure the query is a `SELECT` query.
101-
2. Avoid querying array fields.
102-
3. Access nested fields using the `.` operator.
103-
4. Refer to the [official documentation](https://www.elastic.co/guide/en/elasticsearch/reference/current/sql-limitations.html) for more information if needed.
110+
* **Symptoms**: SQL queries failing with "Arrays are not supported" message.
111+
* **Solution**: This is automatically handled by the integration. Array fields are converted to JSON strings for SQL compatibility.
112+
* **Note**: If you still encounter this error, the handler will automatically fall back to the Search API.
104113
</Warning>
105114

106115
<Warning>
107-
`SQL statement cannot be parsed by mindsdb_sql`
108-
109-
* **Symptoms**: SQL queries failing or not recognizing index names containing special characters.
110-
* **Checklist**:
111-
1. Ensure table names with special characters are enclosed in backticks.
112-
2. Examples:
113-
* Incorrect: SELECT * FROM integration.travel-data
114-
* Incorrect: SELECT * FROM integration.'travel-data'
115-
* Correct: SELECT * FROM integration.\`travel-data\`
116+
`SHOW TABLES returns empty or fails`
117+
118+
* **Symptoms**: `SHOW TABLES FROM elasticsearch_conn` returns no results or fails.
119+
* **Solution**: Use the information_schema alternative:
120+
```sql
121+
SELECT table_name FROM information_schema.tables
122+
WHERE table_schema = 'elasticsearch_conn';
123+
```
116124
</Warning>
117125

118-
This [troubleshooting guide](https://www.elastic.co/guide/en/elasticsearch/reference/current/troubleshooting.html) provided by Elasticsearch might also be helpful.
126+
## Limitations
127+
128+
* **JOINs**: Not supported due to Elasticsearch architecture limitations.
129+
* **Complex Subqueries**: Limited by Elasticsearch's SQL capabilities.
130+
* **Real-time Data**: Elasticsearch has near-real-time search characteristics due to refresh intervals.

mindsdb/integrations/handlers/elasticsearch_handler/__about__.py

Lines changed: 3 additions & 3 deletions
@@ -1,8 +1,8 @@
11
__title__ = "MindsDB Elasticsearch handler"
22
__package_name__ = "mindsdb_elasticsearch_handler"
3-
__version__ = "0.0.1"
4-
__description__ = "MindsDB handler for Elasticsearch"
5-
__author__ = "Minura Punchihewa"
3+
__version__ = "0.1.0"
4+
__description__ = "MindsDB handler for Elasticsearch with SQL-first query execution"
5+
__author__ = "MindsDB Inc"
66
__github__ = "https://github.com/mindsdb/mindsdb"
77
__pypi__ = "https://pypi.org/project/mindsdb/"
88
__license__ = "MIT"

mindsdb/integrations/handlers/elasticsearch_handler/connection_args.py

Lines changed: 20 additions & 0 deletions
@@ -29,6 +29,26 @@
2929
"description": "The API key for authentication with the Elasticsearch server.",
3030
"secret": True,
3131
},
32+
ca_certs={
33+
"type": ARG_TYPE.STR,
34+
"description": "Path to CA certificate file for SSL verification.",
35+
},
36+
client_cert={
37+
"type": ARG_TYPE.STR,
38+
"description": "Path to client certificate file for SSL authentication.",
39+
},
40+
client_key={
41+
"type": ARG_TYPE.STR,
42+
"description": "Path to client private key file for SSL authentication.",
43+
},
44+
verify_certs={
45+
"type": ARG_TYPE.BOOL,
46+
"description": "Whether to verify SSL certificates. Default: true",
47+
},
48+
timeout={
49+
"type": ARG_TYPE.INT,
50+
"description": "Request timeout in seconds. Default: 30",
51+
},
3252
)
3353

3454
connection_args_example = OrderedDict(
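
To show how the options added in this hunk might be consumed, here is a small sketch that applies the defaults stated in the descriptions (`verify_certs`: true, `timeout`: 30) and checks value types. The function `normalize_connection_args` and both lookup tables are hypothetical; the commit's actual validation logic is not visible in this diff.

```python
from typing import Any, Dict

# Expected Python types for the options added in this hunk
# (mirrors ARG_TYPE.STR / ARG_TYPE.BOOL / ARG_TYPE.INT above).
OPTION_TYPES = {
    "ca_certs": str,
    "client_cert": str,
    "client_key": str,
    "verify_certs": bool,
    "timeout": int,
}

# Defaults as stated in the argument descriptions.
DEFAULTS = {"verify_certs": True, "timeout": 30}


def normalize_connection_args(args: Dict[str, Any]) -> Dict[str, Any]:
    """Fill in documented defaults and type-check the new options.

    Hypothetical helper for illustration only.
    """
    normalized = {**DEFAULTS, **args}
    for name, expected in OPTION_TYPES.items():
        if name not in normalized:
            continue
        value = normalized[name]
        # bool is a subclass of int in Python, so reject True/False
        # where an integer such as `timeout` is expected.
        if expected is int and isinstance(value, bool):
            raise TypeError(f"{name} must be int, got bool")
        if not isinstance(value, expected):
            raise TypeError(
                f"{name} must be {expected.__name__}, got {type(value).__name__}"
            )
    return normalized


if __name__ == "__main__":
    print(normalize_connection_args({"hosts": "localhost:9200", "timeout": 60}))
```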
