Commit fa2f93b

[New Page]: How to write a query runner (#642)
1 parent e0bc022 commit fa2f93b

2 files changed

Lines changed: 289 additions & 1 deletion

src/pages/kb/open-source/dev-guide.md

Lines changed: 1 addition & 1 deletion
@@ -31,7 +31,7 @@ Windows users: while it should be possible to run Redash on a Windows machine, w
 ## Additional Resources
 
 - [How to create a new visualization](https://discuss.redash.io/t/how-to-create-new-visualization-types-in-redash/86)
-- [How to create a new query runner](https://discuss.redash.io/t/creating-a-new-query-runner-data-source-in-redash/347)
+- [How to create a new query runner]({% link _kb/open-source/dev-guide/write-a-query-runner.md %})
 
 ## Getting Help
 
Lines changed: 288 additions & 0 deletions
@@ -0,0 +1,288 @@
---
category: dev-guide
parent_category: open-source
title: Writing a New Query Runner
slug: write-a-query-runner
toc: true
---

## Intro

Redash already connects to [many]({% link _kb/data-sources/querying/supported-data-sources.md %}) databases and REST APIs. To add support for a new data source type, you need to implement a Query Runner for it. A Query Runner is a Python class. This page walks through writing a new Query Runner, using the Firebolt Query Runner as an example.

Start by creating a new `firebolt.py` file in the `/redash/query_runner` directory and define a class that inherits from `BaseQueryRunner`:

```python
from redash.query_runner import BaseQueryRunner, register

class Firebolt(BaseQueryRunner):
    def run_query(self, query, user):
        pass
```

The only method that you must implement is `run_query`, which accepts a `query` parameter (a string) and the `user` who invoked the query. The user is irrelevant for most query runners and can be ignored.

## Configuration

A Query Runner usually needs some configuration before it can be used. To define it, implement the `configuration_schema` class method. The fields belong under the `properties` key:

```python
@classmethod
def configuration_schema(cls):
    return {
        "type": "object",
        "properties": {
            "api_endpoint": {"type": "string", "default": DEFAULT_API_URL},
            "engine_name": {"type": "string"},
            "DB": {"type": "string"},
            "user": {"type": "string"},
            "password": {"type": "string"}
        },
        "order": ["user", "password", "api_endpoint", "engine_name", "DB"],
        "required": ["user", "password", "engine_name", "DB"],
        "secret": ["password"],
    }
```

This method returns a JSON schema object.

Each property must specify a `type`. The supported types for the properties are `string`, `number` and `boolean`. For file-like fields, see the File uploads heading below.

Optionally, you may also specify a `default` value and a `title` that will be displayed in the UI. If you do not specify a `title`, the property name is used. Properties without a default will be blank.

Also note the `required` field, which lists the required properties (all of them except `api_endpoint` in this case), and the `secret` field, which lists the secret fields (these won’t be sent back to the UI).

Values for these settings are accessible as a dictionary on the `self.configuration` field of the Query Runner object.
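As a rough illustration of how the `required` and `default` keys can be read, the sketch below checks a plain dictionary against a schema of the shape shown above. The helper names `missing_required` and `with_defaults` and the placeholder endpoint are hypothetical and not part of Redash. In a real Query Runner the values would come from `self.configuration`.

```python
# Hypothetical helpers, not part of Redash: they only illustrate how the
# "required" and "default" keys of a configuration schema can be read.

SCHEMA = {
    "type": "object",
    "properties": {
        "api_endpoint": {"type": "string", "default": "api.example.test"},
        "engine_name": {"type": "string"},
        "DB": {"type": "string"},
        "user": {"type": "string"},
        "password": {"type": "string"},
    },
    "required": ["user", "password", "engine_name", "DB"],
    "secret": ["password"],
}


def missing_required(schema, config):
    """Return the required property names that are absent or empty."""
    return [name for name in schema.get("required", []) if not config.get(name)]


def with_defaults(schema, config):
    """Return a copy of config with schema defaults filled in for unset keys."""
    merged = dict(config)
    for name, spec in schema["properties"].items():
        if "default" in spec and not merged.get(name):
            merged[name] = spec["default"]
    return merged
```

The second helper mirrors the `self.configuration.get("api_endpoint") or DEFAULT_API_URL` fallback used later on this page.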

### File uploads

When a user creates an instance of your data source, Redash stores the configuration in its metadata database. Some data sources require users to upload a file (for example an SSL certificate or key file). To handle this, define a property of type `string` whose name ends in `File`. For example:

```python
"properties": {
    "someFile": {"type": "string"},
}
```

The Redash front-end renders any property of type `string` whose name ends with `File` as a file-upload picker component. When saved, the contents of the file will be encrypted and saved to the metadata database as bytes. In your Query Runner code, you can write the value of `self.configuration['someFile']` to a temporary file created with Python's built-in `tempfile` module. From there you can handle these bytes as you would any file stored on disk. You can see an example of this in the PostgreSQL Query Runner code.
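A minimal sketch of that `tempfile` step follows. The helper name `write_config_file` is hypothetical, and the handling of the stored value is an assumption: the sketch accepts either raw bytes or base64-encoded text, so check what your Redash version actually hands you before relying on it.

```python
import base64
import os
import tempfile


def write_config_file(configuration, key):
    """Write the uploaded file stored under `key` to disk and return its path.

    Assumption: the stored value is either raw bytes or base64-encoded text.
    """
    value = configuration[key]
    data = base64.b64decode(value) if isinstance(value, str) else value
    # delete=False keeps the file after the handle closes so that a database
    # driver can open it by path; the caller is responsible for removing it.
    with tempfile.NamedTemporaryFile(mode="wb", suffix=".pem", delete=False) as handle:
        handle.write(data)
        return handle.name


# Usage with a stand-in configuration dictionary.
encoded = base64.b64encode(b"demo certificate").decode("ascii")
path = write_config_file({"someFile": encoded}, "someFile")
with open(path, "rb") as handle:
    contents = handle.read()
os.remove(path)  # clean up the temporary file when done
```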

## Executing the query

Now that we've defined the configuration, we can implement the `run_query` method:

```python
def run_query(self, query, user):
    connection = connect(
        api_endpoint=(self.configuration.get("api_endpoint") or DEFAULT_API_URL),
        engine_name=(self.configuration.get("engine_name") or None),
        username=(self.configuration.get("user") or None),
        password=(self.configuration.get("password") or None),
        database=(self.configuration.get("DB") or None),
    )

    cursor = connection.cursor()

    try:
        cursor.execute(query)
        columns = self.fetch_columns(
            [(i[0], TYPES_MAP.get(i[1], None)) for i in cursor.description]
        )
        rows = [
            dict(zip((column["name"] for column in columns), row)) for row in cursor
        ]

        data = {"columns": columns, "rows": rows}
        error = None
        json_data = json_dumps(data)
    finally:
        connection.close()

    return json_data, error
```

This is the minimum required code. Here's what it does:

1. Connect to the configured Firebolt endpoint, falling back to `DEFAULT_API_URL`, which is imported from the official Firebolt Python API client.
2. Run the query.
3. Transform the results into the format Redash [expects]({% link _kb/data-sources/querying/json-api %}#Required-Data-Structure).
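The three steps above can be tried without a Firebolt account. The sketch below swaps in Python's built-in `sqlite3` driver, which exposes the same cursor interface (`execute`, `description`, iteration over rows). The function name `run_query_sketch` is hypothetical, the column type is left as `None` because SQLite reports no type code, and catching the driver error to return it as a string is one possible choice rather than a requirement.

```python
import json
import sqlite3


def run_query_sketch(query):
    """Run `query` against an in-memory SQLite database.

    Returns a (json_data, error) pair like the `run_query` method above.
    """
    connection = sqlite3.connect(":memory:")
    try:
        cursor = connection.cursor()
        cursor.execute(query)
        # Each entry of cursor.description is a tuple whose first item is the
        # column name. It is None for statements that return no rows.
        columns = [{"name": d[0], "type": None} for d in (cursor.description or [])]
        names = [column["name"] for column in columns]
        rows = [dict(zip(names, row)) for row in cursor]
        return json.dumps({"columns": columns, "rows": rows}), None
    except sqlite3.Error as exc:
        # Report driver errors through the error string instead of raising.
        return None, str(exc)
    finally:
        connection.close()


json_data, error = run_query_sketch("SELECT 1 AS a, 'x' AS b")
```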

## Mapping Column Types to Redash Types

Note these lines:

```python
columns = self.fetch_columns(
    [(i[0], TYPES_MAP.get(i[1], None)) for i in cursor.description]
)
```

`BaseQueryRunner` includes a helper method (`fetch_columns`) which de-duplicates column names and assigns a type (if known) to each column. If no type is assigned, the default is string. The `TYPES_MAP` dictionary is a custom one we define at the top of the file. It will be different from one Query Runner to the next.

The return value of the `run_query` method is a tuple of the JSON-encoded results and an error string. Use the error string when you want to return a custom error message. Otherwise, you can let exceptions propagate (this is useful when first developing your Query Runner).
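To make the de-duplication and type mapping concrete, here is a small stand-in written for this page. It is not Redash's `fetch_columns` implementation: the function name `dedupe_columns`, the numeric suffix scheme, and the literal type strings in `EXAMPLE_TYPES_MAP` are assumptions chosen for illustration.

```python
# Illustrative mapping from driver type codes to type names (assumed values).
EXAMPLE_TYPES_MAP = {1: "string", 2: "integer", 3: "boolean"}


def dedupe_columns(described, types_map):
    """Return column dicts with unique names and mapped types.

    `described` is a list of (name, type_code) pairs, as built from
    cursor.description in the snippet above.
    """
    seen = set()
    columns = []
    for name, type_code in described:
        unique = name
        suffix = 1
        # Append a growing numeric suffix until the name is unused.
        while unique in seen:
            unique = "{}{}".format(name, suffix)
            suffix += 1
        seen.add(unique)
        # Unknown type codes map to None, leaving the type unassigned.
        columns.append({"name": unique, "type": types_map.get(type_code)})
    return columns


columns = dedupe_columns([("id", 2), ("id", 2), ("name", 1), ("raw", 99)], EXAMPLE_TYPES_MAP)
```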

## Fetching Database Schema

Up to this point, we've shown the minimum required to run a query. If you also want Redash to show the database schema and enable autocomplete, you need to implement the `get_schema` method:

```python
def get_schema(self, get_stats=False):
    query = """
    SELECT TABLE_SCHEMA,
           TABLE_NAME,
           COLUMN_NAME
    FROM INFORMATION_SCHEMA.COLUMNS
    WHERE TABLE_SCHEMA <> 'INFORMATION_SCHEMA'
    """

    results, error = self.run_query(query, None)

    if error is not None:
        raise Exception("Failed getting schema.")

    schema = {}
    results = json_loads(results)

    for row in results["rows"]:
        table_name = "{}.{}".format(row["table_schema"], row["table_name"])

        if table_name not in schema:
            schema[table_name] = {"name": table_name, "columns": []}

        schema[table_name]["columns"].append(row["column_name"])

    return list(schema.values())
```

The implementation of `get_schema` is specific to the data source you’re adding support for, but the return value needs to be an array of dictionaries, where each dictionary has a `name` key (the table name) and a `columns` key (an array of column names as strings).

### Including Column Types in the Schema Browser

If you want the Redash schema browser to also show column types, you can adjust your `get_schema` method so that the `columns` key contains an array of dictionaries with the keys `name` and `type`.

Here is an example without column types:

```json
[
  {
    "name": "Table1",
    "columns": ["field1", "field2", "field3"]
  }
]
```

Here is an example that includes column types:

```json
[
  {
    "name": "Table1",
    "columns": [
      {
        "name": "field1",
        "type": "VARCHAR"
      },
      {
        "name": "field2",
        "type": "BIGINT"
      },
      {
        "name": "field3",
        "type": "DATE"
      }
    ]
  }
]
```

Note that the column type string is meant only to assist query authors. If it is present in the output of `get_schema`, Redash trusts it and does not compare it to the type information returned by `run_query`. It is possible, therefore, that the type shown in the schema browser differs from the column type in the database. We recommend testing manually against a known schema to ensure that the correct types appear in the schema browser.
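One way to produce the typed structure is to group the rows of a schema query by table, as in the sketch below. The function name `build_typed_schema` is hypothetical, and the `data_type` row key assumes your schema query also selects a column-type field under that name.

```python
def build_typed_schema(rows):
    """Group schema rows by table, keeping each column's name and type."""
    schema = {}
    for row in rows:
        table_name = "{}.{}".format(row["table_schema"], row["table_name"])
        # Create the table entry the first time the table is seen.
        table = schema.setdefault(table_name, {"name": table_name, "columns": []})
        table["columns"].append({"name": row["column_name"], "type": row["data_type"]})
    return list(schema.values())


example_rows = [
    {"table_schema": "public", "table_name": "Table1", "column_name": "field1", "data_type": "VARCHAR"},
    {"table_schema": "public", "table_name": "Table1", "column_name": "field2", "data_type": "BIGINT"},
]
typed_schema = build_typed_schema(example_rows)
```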
198+
199+
## Adding Test Connection Support
200+
201+
You can also implement the Test Connection button support. The Test Connection button appears on the data source setup and configuration screen. You can either supply a `noop_query` property on your Query Runner or implement the `test_connection` method yourself. In this example we opted for the first:
202+
203+
```python
204+
class Firebolt(BaseQueryRunner):
205+
noop_query = "SELECT 1"
206+
```
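If you choose the second option, a custom `test_connection` could look roughly like the sketch below. `SketchRunner` is a stand-in class so the example runs without Redash installed. It assumes that raising an exception is how a failed test is reported, which you should confirm against `BaseQueryRunner` in your Redash version.

```python
class SketchRunner:
    """Stand-in for a Query Runner; a real one would subclass BaseQueryRunner."""

    def run_query(self, query, user):
        # Pretend the data source answered; a real runner would query it here.
        return '{"columns": [], "rows": []}', None

    def test_connection(self):
        # Run a cheap probe query and raise if it reports an error.
        data, error = self.run_query("SELECT 1", None)
        if error is not None:
            raise Exception(error)


class FailingRunner(SketchRunner):
    """Variant whose probe query always fails, to show the error path."""

    def run_query(self, query, user):
        return None, "connection refused"
```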
207+
208+
## Supporting Auto Limit for SQL Databases
209+
210+
The Redash front-end includes a tick box to automatically limit query results. This helps avoid overloading the Redash web app with large result sets. For most SQL style databases, you can automatically add auto limit support by inheriting `BaseSQLQueryRunner` instead of `BaseQueryRunner`.
211+
212+
```python
213+
from redash.query_runner import BaseSQLQueryRunner, register
214+
215+
class Firebolt(BaseSQLQueryRunner):
216+
def run_query(self, query, user):
217+
pass
218+
```
219+
220+
The `BaseSQLQueryRunner` uses `sqplarse` to intelligently append `LIMIT 1000` to a query prior to execution, as long as the tick box in the query editor is selected. For databases that use a different syntax (notably Microsoft SQL Server or any NoSQL database), you can continue to inherit `BaseQueryRunner` and implement the following:
221+
222+
```python
223+
@property
224+
def supports_auto_limit(self):
225+
return True
226+
227+
def apply_auto_limit(self, query_text: str, should_apply_auto_limit: bool):
228+
...
229+
```
230+
231+
For the `BaseQueryRunner`, the `supports_auto_limit` property is false by default and `apply_auto_limit` returns the query text unmodified.
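As an illustration of a custom `apply_auto_limit`, the sketch below adds a `TOP` clause to a single `SELECT` statement. It is written as a standalone function so it can be run directly; in a Query Runner it would be a method taking `self`. The `limit` parameter is an addition for this example, and the regular-expression approach is deliberately naive: it does not handle comments, CTEs, or multiple statements.

```python
import re

# Matches a leading SELECT (optionally SELECT DISTINCT), case-insensitively.
_SELECT = re.compile(r"^\s*select(\s+distinct)?\b", re.IGNORECASE)
# Detects a TOP clause that immediately follows the SELECT keyword(s).
_HAS_TOP = re.compile(r"^\s*select(\s+distinct)?\s+top\b", re.IGNORECASE)


def apply_auto_limit(query_text: str, should_apply_auto_limit: bool, limit: int = 1000) -> str:
    """Naively add `TOP <limit>` to a query that starts with SELECT."""
    if not should_apply_auto_limit:
        return query_text
    # Leave non-SELECT statements and already-limited queries untouched.
    if not _SELECT.match(query_text) or _HAS_TOP.match(query_text):
        return query_text
    return _SELECT.sub(
        lambda match: "{} TOP {}".format(match.group(0), limit), query_text, count=1
    )
```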
232+
233+
## Checking for Required Dependencies
234+
235+
If the Query Runner needs some external Python packages, we wrap those imports with a try/except block, to prevent crashing deployments where this package is not available:
236+
237+
```python
238+
try:
239+
from firebolt.db import connect
240+
from firebolt.client import DEFAULT_API_URL
241+
enabled = True
242+
except ImportError:
243+
enabled = False
244+
```
245+
246+
The enabled variable is later used in the Query Runner’s enabled class method:
247+
248+
```python
249+
@classmethod
250+
def enabled(cls):
251+
return enabled
252+
```
253+
254+
If it returns False the Query Runner won’t be enabled.
255+
256+
## Finishing up
257+
258+
At the top of your file, import the `register` function and call it at the bottom of `firebolt.py`
259+
260+
```python
261+
# top of file
262+
263+
try:
264+
from firebolt.db import connect
265+
from firebolt.client import DEFAULT_API_URL
266+
enabled = True
267+
except ImportError:
268+
enabled = False
269+
270+
from redash.query_runner import BaseQueryRunner, register
271+
from redash.query_runner import TYPE_STRING, TYPE_INTEGER, TYPE_BOOLEAN
272+
from redash.utils import json_dumps, json_loads
273+
274+
TYPES_MAP = {1: TYPE_STRING, 2: TYPE_INTEGER, 3: TYPE_BOOLEAN}
275+
276+
# ... implementation
277+
278+
# bottom of file
279+
register(Firebolt)
280+
```
281+
282+
Usually the connector will need to have some additional Python packages, we add those to the `requirements_all_ds.txt` file. If the required Python packages don’t have any special dependencies (like some system packages), we usually add the query runner to the `default_query_runners` in `redash/settings/__init__.py`.
283+
284+
You can see the full pull request for the Firebolt query runner [here](https://github.com/getredash/redash/pull/5689).
285+
286+
## Summary
287+
288+
A Redash Query runner is a Python class that, at minimum, implements a `run_query` method that returns results in the format Redash expects. Configurable data source settings are defined by the `configuration_schema` class method which returns a JSON schema. You may optionally implement a connection test, schema fetching, and automatic limits. You can enable your data source by adding it to the `default_query_runners` list in settings, or by setting the `ADDITIONAL_QUERY_RUNNERS` environment variable.
