Engine Adapter Type: trino
NOTE: Trino may not be used for the SQLMesh state connection.
pip install "sqlmesh[trino]"
If you are using Oauth for Authentication, it is recommended to install keyring cache:
pip install "trino[external-authentication-token-cache]"
The trino engine adapter has been tested against the Hive Connector, Iceberg Connector, and Delta Lake Connector.
Please let us know on Slack if you are wanting to use another connector or have tried another connector.
Recommended hive catalog properties configuration (<catalog_name>.properties):
hive.metastore-cache-ttl=0s
hive.metastore-refresh-interval=5s
hive.metastore-timeout=10s
hive.allow-drop-table=true
hive.allow-add-column=true
hive.allow-drop-column=true
hive.allow-rename-column=true
hive.allow-rename-table=trueIf you're using a hive metastore for the Iceberg catalog, the properties are mostly the same as the Hive connector.
iceberg.catalog.type=hive_metastore
# metastore properties as per the Hive Connector Configuration aboveNote: The Trino Iceberg Connector must be configured with an iceberg.catalog.type that supports views. At the time of this writing, this is hive_metastore, glue, and rest.
The jdbc and nessie iceberg catalog types do not support views and are thus incompatible with SQLMesh.
!!! info "Nessie"
Nessie is supported when used as an Iceberg REST Catalog (iceberg.catalog.type=rest).
For more information on how to configure the Trino Iceberg connector for this, see the Nessie documentation.
The Trino adapter Delta Lake connector has only been tested with the Hive metastore catalog type.
The properties file must include the Hive metastore URI and catalog name in addition to any other general properties.
hive.metastore.uri=thrift://example.net:9083
delta.hive-catalog-name=datalake_delta # example catalog name, can be any valid stringAWS Glue provides an implementation of the Hive metastore catalog.
Your Trino project's physical data objects are stored in a specific location, such as an AWS S3 bucket. Hive provides a default location, which you can override in its configuration file.
Set the default location for your project's tables in the Hive catalog configuration's hive.metastore.glue.default-warehouse-dir parameter.
For example:
hive.metastore=glue
hive.metastore.glue.default-warehouse-dir=s3://my-bucket/
| Option | Description | Type | Required |
|---|---|---|---|
type |
Engine type name - must be trino |
string | Y |
user |
The username (of the account) to log in to your cluster. When connecting to Starburst Galaxy clusters, you must include the role of the user as a suffix to the username. | string | Y |
host |
The hostname of your cluster. Don't include the http:// or https:// prefix. |
string | Y |
catalog |
The name of a catalog in your cluster. | string | Y |
http_scheme |
The HTTP scheme to use when connecting to your cluster. By default, it's https and can only be http for no-auth or basic auth. |
string | N |
port |
The port to connect to your cluster. By default, it's 443 for https scheme and 80 for http |
int | N |
roles |
Mapping of catalog name to a role | dict | N |
source |
Value to send as Trino's source field for query attribution / auditing. Default: sqlmesh. |
string | N |
http_headers |
Additional HTTP headers to send with each request. | dict | N |
session_properties |
Trino session properties. Run SHOW SESSION to see all options. |
dict | N |
retries |
Number of retries to attempt when a request fails. Default: 3 |
int | N |
timezone |
Timezone to use for the connection. Default: client-side local timezone | string | N |
schema_location_mapping |
A mapping of regex patterns to S3 locations to use for the LOCATION property when creating schemas. See Table and Schema locations for more details. |
dict | N |
catalog_type_overrides |
A mapping of catalog names to their connector type. This is used to enable/disable connector specific behavior. See Catalog Type Overrides for more details. | dict | N |
When using connectors that are decoupled from their storage (such as the Iceberg, Hive or Delta connectors), when creating new tables Trino needs to know the location in the physical storage it should write the table data to.
This location gets stored against the table in the metastore so that any engine trying to read the data knows where to look.
Trino allows you to optionally configure a default-warehouse-dir property at the Metastore level. When creating objects, Trino will infer schema locations to be <default warehouse dir>/<schema name> and table locations to be <default warehouse dir>/<schema name>/<table name>.
However, if you dont set this property, Trino can still infer table locations if a schema location is explicitly set.
For example, if you specify the LOCATION property when creating a schema like so:
CREATE SCHEMA staging_data
WITH (LOCATION = 's3://warehouse/production/staging_data')Then any tables created under that schema will have their location inferred as <schema location>/<table name>.
If you specify neither a default-warehouse-dir in the metastore config nor a schema location when creating the schema, you must specify an explicit table location when creating the table or Trino will produce an error.
Creating a table in a specific location is very similar to creating a schema in a specific location:
CREATE TABLE staging_data.customers (customer_id INT)
WITH (LOCATION = 's3://warehouse/production/staging_data/customers')Within SQLMesh, you can configure the value to use for the LOCATION property when SQLMesh creates tables and schemas. This overrides what Trino would have inferred based on the cluster configuration.
To configure the LOCATION property that SQLMesh will specify when issuing CREATE SCHEMA statements, you can use the schema_location_mapping connection property. This applies to all schemas that SQLMesh creates, including its internal ones.
The simplest example is to emulate a default-warehouse-dir:
gateways:
trino:
connection:
type: trino
...
schema_location_mapping:
'.*': 's3://warehouse/production/@{schema_name}'This will cause all schemas to get created with their location set to s3://warehouse/production/<schema name>. The table locations will be inferred by Trino as s3://warehouse/production/<schema name>/<table name> so all objects will effectively be created under s3://warehouse/production/.
It's worth mentioning that if your models are using fully qualified three part names, eg <catalog>.<schema>.<name> then string being matched against the schema_location_mapping regex will be <catalog>.<schema> and not just the <schema> itself. This allows you to set different locations for the same schema name if that schema name is used across multiple catalogs.
If your models are using two part names, eg <schema>.<table> then only the <schema> part will be matched against the regex.
Here's an example:
gateways:
trino:
connection:
type: trino
...
schema_location_mapping:
'^utils$': 's3://utils-bucket/@{schema_name}'
'^landing\..*$': 's3://raw-data/@{catalog_name}/@{schema_name}'
'^staging.*$': 's3://bucket/@{schema_name}_dev'
'^sqlmesh.*$': 's3://sqlmesh-internal/dev/@{schema_name}'This would perform the following mappings:
- a schema called
saleswould not be mapped to a location at all because it doesnt match any of the patterns. It would be created without aLOCATIONproperty - a schema called
utilswould be mapped to the locations3://utils-bucket/utilsbecause it directly matches the^utils$pattern - a schema called
transactionsin a catalog calledlandingwould be mapped to the locations3://raw-data/landing/transactionsbecause the stringlanding.transactionsmatches the^landing\..*$pattern - schemas called
staging_customersandstaging_accountswould be mapped to the locationss3://bucket/staging_customers_devands3://bucket/staging_accounts_devrespectively because they match the^staging.*$pattern - a schema called
accountsin a catalog calledstagingwould be mapped to the locations3://bucket/accounts_devbecause the stringstaging.accountsmatches the^staging.*$pattern - schemas called
sqlmesh__staging_customersandsqlmesh__staging_utilswould be mapped to the locationss3://sqlmesh-internal/dev/sqlmesh__staging_customersands3://sqlmesh-internal/dev/sqlmesh__staging_utilsrespectively because they match the^sqlmesh.*$pattern
!!! info "Placeholders"
You may use the @{catalog_name} and @{schema_name} placeholders in the mapping value.
If there is a match on one of the patterns then the catalog / schema that SQLMesh is about to use in the `CREATE SCHEMA` statement will be substituted into these placeholders.
Note the use of curly brace syntax `@{}` when referencing these placeholders - learn more [here](../../concepts/macros/sqlmesh_macros.md#embedding-variables-in-strings).
Often, you don't need to configure an explicit table location because if you have configured explicit schema locations, table locations are automatically inferred by Trino to be a subdirectory under the schema location.
However, if you need to, you can configure an explicit table location by adding a location property to the model physical_properties.
Note that you need to use the @resolve_template macro to generate a unique table location for each model version. Otherwise, all model versions will be written to the same location and clobber each other.
MODEL (
name staging.customers,
kind FULL,
physical_properties (
location = @resolve_template('s3://warehouse/@{catalog_name}/@{schema_name}/@{table_name}')
)
);
SELECT ...This will cause SQLMesh to set the specified LOCATION when issuing a CREATE TABLE statement.
SQLMesh attempts to determine the connector type of a catalog by querying the system.metadata.catalogs table and checking the connector_name column.
It checks if the connector name is hive for Hive connector behavior or contains iceberg or delta_lake for Iceberg or Delta Lake connector behavior respectively.
However, the connector name may not always be a reliable way to determine the connector type, for example when using a custom connector or a fork of an existing connector.
To handle such cases, you can use the catalog_type_overrides connection property to explicitly specify the connector type for specific catalogs.
For example, to specify that the datalake catalog is using the Iceberg connector and the analytics catalog is using the Hive connector, you can configure the connection as follows:
gateways:
trino:
connection:
type: trino
...
catalog_type_overrides:
datalake: iceberg
analytics: hive=== "No Auth"
| Option | Description | Type | Required |
|------------|------------------------------------------|:------:|:--------:|
| method | no-auth (Default) | string | N |
```yaml linenums="1"
gateway_name:
connection:
type: trino
user: [user]
host: [host]
catalog: [catalog]
# Most likely you will want http for scheme when not using auth
http_scheme: http
```
=== "Basic Auth"
| Option | Description | Type | Required |
| ---------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | :----: | :------: |
| `method` | `basic` | string | Y |
| `password` | The password to use when authenticating. | string | Y |
| `verify` | Boolean flag for whether SSL verification should occur. Default: [trinodb Python client](https://github.com/trinodb/trino-python-client) default (`true` as of this writing) | bool | N |
```yaml linenums="1"
gateway_name:
connection:
type: trino
method: basic
user: [user]
password: [password]
host: [host]
catalog: [catalog]
```
* [Trino Documentation on Basic Authentication](https://trino.io/docs/current/security/password-file.html)
* [Python Client Basic Authentication](https://github.com/trinodb/trino-python-client#basic-authentication)
=== "LDAP"
| Option | Description | Type | Required |
|----------------------|-------------------------------------------------------------------------|:------:|:--------:|
| `method` | `ldap` | string | Y |
| `password` | The password to use when authenticating. | string | Y |
| `impersonation_user` | Override the provided username. This lets you impersonate another user. | string | N |
```yaml linenums="1"
gateway_name:
connection:
type: trino
method: ldap
user: [user]
password: [password]
host: [host]
catalog: [catalog]
```
* [Trino Documentation on LDAP Authentication](https://trino.io/docs/current/security/ldap.html)
* [Python Client LDAP Authentication](https://github.com/trinodb/trino-python-client#basic-authentication)
=== "Kerberos"
| Option | Description | Type | Required |
|----------------------------------|-----------------------------------------------------------------------------------|:------:|:--------:|
| `method` | `kerberos` | string | Y |
| `keytab` | Path to keytab. Ex: `/tmp/trino.keytab` | string | Y |
| `krb5_config` | Path to config. Ex: `/tmp/krb5.conf` | string | Y |
| `principal` | Principal. Ex: `user@company.com` | string | Y |
| `service_name` | Service name (default is `trino`) | string | N |
| `hostname_override` | Kerberos hostname for a host whose DNS name doesn't match | string | N |
| `mutual_authentication` | Boolean flag for mutual authentication. Default: `false` | bool | N |
| `force_preemptive` | Boolean flag to preemptively initiate the Kerberos GSS exchange. Default: `false` | bool | N |
| `sanitize_mutual_error_response` | Boolean flag to strip content and headers from error responses. Default: `true` | bool | N |
| `delegate` | Boolean flag for credential delegation (`GSS_C_DELEG_FLAG`). Default: `false` | bool | N |
```yaml linenums="1"
gateway_name:
connection:
type: trino
method: kerberos
user: user
keytab: /tmp/trino.keytab
krb5_config: /tmp/krb5.conf
principal: trino@company.com
host: trino.company.com
catalog: datalake
```
* [Trino Documentation on Kerberos Authentication](https://trino.io/docs/current/security/kerberos.html)
* [Python Client Kerberos Authentication](https://github.com/trinodb/trino-python-client#kerberos-authentication)
=== "JWT"
| Option | Description | Type | Required |
|-------------|-----------------|:------:|:--------:|
| `method` | `jwt` | string | Y |
| `jwt_token` | The JWT string. | string | Y |
```yaml linenums="1"
gateway_name:
connection:
type: trino
method: jwt
user: [user]
password: [password]
host: [host]
catalog: [catalog]
```
* [Trino Documentation on JWT Authentication](https://trino.io/docs/current/security/jwt.html)
* [Python Client JWT Authentication](https://github.com/trinodb/trino-python-client#jwt-authentication)
=== "Certificate"
| Option | Description | Type | Required |
|----------------------|---------------------------------------------------|:------:|:--------:|
| `method` | `certificate` | string | Y |
| `cert` | The full path to a certificate file | string | Y |
| `client_certificate` | Path to client certificate. Ex: `/tmp/client.crt` | string | Y |
| `client_private_key` | Path to client private key. Ex: `/tmp/client.key` | string | Y |
```yaml linenums="1"
gateway_name:
connection:
type: trino
method: certificate
user: [user]
password: [password]
host: [host]
catalog: [catalog]
cert: [path/to/cert_file]
client_certificate: [path/to/client/cert]
client_private_key: [path/to/client/key]
```
=== "Oauth"
| Option | Description | Type | Required |
|----------------------|---------------------------------------------------|:------:|:--------:|
| `method` | `oauth` | string | Y |
```yaml linenums="1"
gateway_name:
connection:
type: trino
method: oauth
host: trino.company.com
catalog: datalake
```
* [Trino Documentation on Oauth Authentication](https://trino.io/docs/current/security/oauth2.html)
* [Python Client Oauth Authentication](https://github.com/trinodb/trino-python-client#oauth2-authentication)