diff --git a/docs/snippets/cloud/integrations/databricks.mdx b/docs/snippets/cloud/integrations/databricks.mdx
index 25b49138f..9514f122f 100644
--- a/docs/snippets/cloud/integrations/databricks.mdx
+++ b/docs/snippets/cloud/integrations/databricks.mdx
@@ -38,93 +38,6 @@ Then, select your authentication method:
 long-lived personal access tokens.
 
-### Storage Access
-
-Elementary requires access to the table history in order to enable automated monitors such as volume and freshness monitors.
-You can configure this in one of the following ways:
-
-#### Option 1: Fetch history using `DESCRIBE HISTORY`
-
-Elementary can fetch the table history by running `DESCRIBE HISTORY` queries on your Databricks warehouse.
-In the Elementary UI, choose **None** under **Storage access method**.
-
-This requires `SELECT` access on the relevant tables, as described in the permissions and security section above.
-
-#### Option 2: Credentials vending
-
-Elementary can access the storage using temporary credentials issued by Databricks through [credential vending](https://docs.databricks.com/aws/en/external-access/credential-vending).
-In the Elementary UI, choose **Credentials vending** under **Storage access method**.
-
-This requires granting `EXTERNAL USE SCHEMA` on the relevant schemas.
-
-When using this option, Elementary does not read the table data itself. It only reads the Delta transaction log, which contains metadata about the transactions.
-
-#### Option 3: Direct storage access
-
-Elementary can access the storage directly using credentials that you configure.
-In the Elementary UI, choose **Direct storage access** under **Storage access method**.
-
-When using this option, Elementary does not read the table data itself. It only reads the Delta transaction log, which contains metadata about the transactions.
-
-For S3-backed Databricks storage, you can configure access in one of the following ways:
-
-__AWS Role authentication__
-
-Databricks direct storage access using AWS role ARN
-
-This is the recommended approach, as it provides better security and follows AWS best practices.
-After choosing **Direct storage access**, select **AWS role ARN** under **Select S3 authentication method**.
-
-1. Create an IAM role that Elementary can assume.
-2. Select "Another AWS account" as the trusted entity.
-3. Enter Elementary's AWS account ID: `743289191656`.
-4. Optionally enable an external ID.
-5. Attach a policy that grants read access to the Delta log files.
-
-Use a policy similar to the following:
-
-```json
-{
-  "Version": "2012-10-17",
-  "Statement": [
-    {
-      "Sid": "VisualEditor0",
-      "Effect": "Allow",
-      "Action": [
-        "s3:GetObject",
-        "s3:ListBucket"
-      ],
-      "Resource": [
-        "arn:aws:s3:::databricks-metastore-bucket",
-        "arn:aws:s3:::databricks-metastore-bucket/*_delta_log*"
-      ]
-    }
-  ]
-}
-```
-
-This policy is scoped to the bucket itself and objects matching `*_delta_log*`, so it does not grant access to other objects in the bucket.
-
-Provide the role ARN in the Elementary UI, and the external ID as well if you configured one.
-
-__AWS access keys__
-
-Databricks direct storage access using AWS access keys
-
-If needed, you can instead provide direct AWS credentials.
-After choosing **Direct storage access**, select **Secret access key** under **Select S3 authentication method**.
-
-1. Create an IAM user that Elementary will use for storage access.
-2. Enable programmatic access.
-3. Attach the same read-only S3 policy shown above.
-4. Provide the AWS access key ID and secret access key in the Elementary UI.
-
 
 #### Access token (legacy)
@@ ... @@
 TO `<principal>`;
 GRANT USE SCHEMA, SELECT ON SCHEMA TO `<principal>`;
--- Grant access to information schema tables
+-- Grant access to system tables
 GRANT USE CATALOG ON CATALOG system TO `<principal>`;
+GRANT USE SCHEMA ON SCHEMA system.information_schema TO `<principal>`;
+GRANT USE SCHEMA ON SCHEMA system.query TO `<principal>`;
+GRANT USE SCHEMA ON SCHEMA system.access TO `<principal>`;
 GRANT SELECT ON TABLE system.information_schema.tables TO `<principal>`;
 GRANT SELECT ON TABLE system.information_schema.columns TO `<principal>`;
+GRANT SELECT ON TABLE system.query.history TO `<principal>`;
+GRANT SELECT ON TABLE system.access.table_lineage TO `<principal>`;
+
+-- Grant access to billing metadata
+GRANT USE SCHEMA ON SCHEMA system.billing TO `<principal>`;
+GRANT SELECT ON TABLE system.billing.usage TO `<principal>`;
+GRANT SELECT ON TABLE system.billing.list_prices TO `<principal>`;
+```
+
+### Storage Access
+
+Elementary requires access to the table history in order to enable automated monitors such as volume and freshness monitors.
+You can configure this in one of the following ways:
+
+#### Option 1: Fetch history using `DESCRIBE HISTORY`
+
+Elementary can fetch the table history by running `DESCRIBE HISTORY` queries on your Databricks warehouse.
+In the Elementary UI, choose **None** under **Storage access method**.
--- Grant select on tables for history & statistics access
--- (Optional, required for automated volume & freshness tests - see explanation above. You can also limit to specific schemas used by dbt instead of granting on the full catalog)
+This requires granting `SELECT` access on your tables. This is a Databricks limitation: Elementary **never** reads any data from your tables, only metadata. However, Databricks
+does not currently offer a table-level, metadata-only permission, so `SELECT` is required.
+
+To grant the access, use the following SQL statements:
+
+```sql
+GRANT USE CATALOG, USE SCHEMA, SELECT ON CATALOG <catalog> TO `<principal>`;
+```
+
+#### Option 2: Credentials vending
+
+Elementary can access the storage using temporary credentials issued by Databricks through [credential vending](https://docs.databricks.com/aws/en/external-access/credential-vending).
+In the Elementary UI, choose **Credentials vending** under **Storage access method**.
+
+This requires granting `EXTERNAL USE SCHEMA` on the relevant schemas.
+
+When using this option, Elementary does not read the table data itself. It only reads the Delta transaction log, which contains metadata about the transactions.
+
+#### Option 3: Direct storage access
+
+Elementary can access the storage directly using credentials that you configure.
+In the Elementary UI, choose **Direct storage access** under **Storage access method**.
+
+When using this option, Elementary does not read the table data itself. It only reads the Delta transaction log, which contains metadata about the transactions.
+
+For S3-backed Databricks storage, you can configure access in one of the following ways:
+
+__AWS Role authentication__
+
+Databricks direct storage access using AWS role ARN
+
+This is the recommended approach, as it provides better security and follows AWS best practices.
+After choosing **Direct storage access**, select **AWS role ARN** under **Select S3 authentication method**.
+
+1. Create an IAM role that Elementary can assume.
+2. Select "Another AWS account" as the trusted entity.
+3. Enter Elementary's AWS account ID: `743289191656`.
+4. Optionally enable an external ID.
+5. Attach a policy that grants read access to the Delta log files.
+
+Use a policy similar to the following:
+
+```json
+{
+  "Version": "2012-10-17",
+  "Statement": [
+    {
+      "Sid": "VisualEditor0",
+      "Effect": "Allow",
+      "Action": [
+        "s3:GetObject",
+        "s3:ListBucket"
+      ],
+      "Resource": [
+        "arn:aws:s3:::databricks-metastore-bucket",
+        "arn:aws:s3:::databricks-metastore-bucket/*_delta_log*"
+      ]
+    }
+  ]
+}
+```
+
+This policy is scoped to the bucket itself and objects matching `*_delta_log*`, so it does not grant access to other objects in the bucket.
+
+Provide the role ARN in the Elementary UI, and the external ID as well if you configured one.
+
+__AWS access keys__
+
+Databricks direct storage access using AWS access keys
+
+If needed, you can instead provide direct AWS credentials.
+After choosing **Direct storage access**, select **Secret access key** under **Select S3 authentication method**.
+
+1. Create an IAM user that Elementary will use for storage access.
+2. Enable programmatic access.
+3. Attach the same read-only S3 policy shown above.
+4. Provide the AWS access key ID and secret access key in the Elementary UI.
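
Reviewer note: the IAM-role steps added in this diff (trusted entity, Elementary's account ID `743289191656`, optional external ID) imply a specific trust policy on the role. A minimal sketch of that trust policy, for readers who script their IAM setup, might look like the following. This is an illustration, not text from the docs: `build_trust_policy` and the external ID value are hypothetical names chosen here, and only the account ID comes from the diff.

```python
import json

def build_trust_policy(elementary_account_id="743289191656", external_id=None):
    """Sketch of the role trust policy implied by steps 1-4 above.

    The account ID default is Elementary's, per the docs; the external ID
    is a placeholder you would generate and share out-of-band.
    """
    statement = {
        "Effect": "Allow",
        "Principal": {"AWS": f"arn:aws:iam::{elementary_account_id}:root"},
        "Action": "sts:AssumeRole",
    }
    if external_id is not None:
        # Step 4: an optional external ID becomes an assume-role condition.
        statement["Condition"] = {"StringEquals": {"sts:ExternalId": external_id}}
    return {"Version": "2012-10-17", "Statement": [statement]}

# Example: role with an external ID configured (placeholder value).
print(json.dumps(build_trust_policy(external_id="my-external-id"), indent=2))
```

If the docs gain a "create the role via CLI/Terraform" path later, this dict is the document you would attach as the role's `AssumeRolePolicyDocument`.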
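
Reviewer note: the claim that the S3 policy "does not grant access to other objects in the bucket" rests on the `*_delta_log*` wildcard in the `Resource` ARN. As a rough sanity check, IAM's `*` wildcard behaves like a simple glob that also matches `/`, which Python's `fnmatch` approximates; the bucket layout and key names below are illustrative, not from the docs.

```python
from fnmatch import fnmatch

# The object-level pattern from the policy's second Resource ARN.
PATTERN = "*_delta_log*"

keys = [
    "tables/orders/_delta_log/00000000000000000001.json",  # Delta transaction log
    "tables/orders/part-0000.snappy.parquet",              # table data file
]

for key in keys:
    # fnmatch's "*" matches across "/" too, so log dirs at any depth match.
    verdict = "allowed" if fnmatch(key, PATTERN) else "denied"
    print(f"{key}: {verdict}")
```

This matches the prose: transaction-log objects under any `_delta_log/` prefix are readable, while data files are not, which is what lets Elementary read commit metadata without reading table contents.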