pip install "sqlmesh[athena]"
SQLMesh leverages the PyAthena DBAPI driver to connect to Athena. Therefore, the connection options relate to the PyAthena connection options. Note that PyAthena uses boto3 under the hood so you can also use boto3 environment variables for configuration.
| Option | Description | Type | Required |
|---|---|---|---|
type |
Engine type name - must be athena |
string | Y |
aws_access_key_id |
The access key for your AWS user | string | N |
aws_secret_access_key |
The secret key for your AWS user | string | N |
role_arn |
The ARN of a role to assume once authenticated | string | N |
role_session_name |
The session name to use when assuming role_arn |
string | N |
region_name |
The AWS region to use | string | N |
work_group |
The Athena workgroup to send queries to | string | N |
s3_staging_dir |
The S3 location for Athena to write query results. Only required if not using work_group OR the configured work_group doesnt have a results location set |
string | N |
schema_name |
The default schema to place objects in if a schema isnt specified. Defaults to default |
string | N |
catalog_name |
The default catalog to place schemas in. Defaults to AwsDataCatalog |
string | N |
These options are specific to SQLMesh itself and are not passed to PyAthena
| Option | Description | Type | Required |
|---|---|---|---|
s3_warehouse_location |
Set the base path in S3 where SQLMesh will instruct Athena to place table data. Only required if you arent specifying the location in the model itself. See S3 Locations below. | string | N |
The Athena adapter utilises the following model top-level properties:
| Name | Description | Type | Required |
|---|---|---|---|
table_format |
Sets the table_type Athena uses when creating the table. Valid values are hive or iceberg. |
string | N |
storage_format |
Configures the file format to be used by the table_format. For Hive tables, this sets the STORED AS option. For Iceberg tables, this sets format property. |
string | N |
The Athena adapter recognises the following model physical_properties:
| Name | Description | Type | Default |
|---|---|---|---|
s3_base_location |
s3:// base URI of where the snapshot tables for this model should be written. Overrides s3_warehouse_location if one is configured. |
string |
When creating tables, Athena needs to know where in S3 the table data is located. You cannot issue a CREATE TABLE statement without specifying a LOCATION for the table data.
In addition, unlike other engines such as Trino, Athena will not infer a table location if you set a schema location via CREATE SCHEMA <schema> LOCATION 's3://schema/location'.
Therefore, in order for SQLMesh to issue correct CREATE TABLE statements to Athena, you need to configure where the tables should be stored. There are two options for this:
- Project-wide: set
s3_warehouse_locationin the connection config. SQLMesh will set the tableLOCATIONto be<s3_warehouse_location>/<schema_name>/<snapshot_table_name>when it creates a snapshot of your model. - Per-model: set
s3_base_locationin the modelphysical_properties. SQLMesh will set the tableLOCATIONto be<s3_base_location>/<snapshot_table_name>every time it creates a snapshot of your model. This takes precedence over anys3_warehouse_locationset in the connection config.
Athena was initially designed to read data stored in S3 and to do so without changing that data. This means that it does not have good support for mutating tables. In particular, it will not delete data from Hive tables.
Consequently, forward only changes that mutate the schemas of existing tables have a high chance of failure because Athena supports very limited schema modifications on Hive tables.
However, Athena does support Apache Iceberg tables which allow a full range of operations. These can be used for more complex model types such as INCREMENTAL_BY_UNIQUE_KEY and SCD_TYPE_2.
To use an Iceberg table for a model, set table_format iceberg in the model properties.
In general, Iceberg tables offer the most flexibility and you'll run into the least SQLMesh limitations when using them. However, we create Hive tables by default because Athena creates Hive tables by default, so Iceberg tables are opt-in rather than opt-out.