Commit 822f307
Merge pull request #2194 from elementary-data/core-670-update-permissions-on-dbx-docs
Update DBX permissions: Include cost; Reorganized Storage access
2 parents 5f37d40 + 2f50257 commit 822f307

2 files changed

Lines changed: 109 additions & 96 deletions

File tree

docs/snippets/cloud/integrations/databricks.mdx

Lines changed: 0 additions & 87 deletions
@@ -38,93 +38,6 @@ Then, select your authentication method:
 long-lived personal access tokens.
 </Info>

-### Storage Access
-
-Elementary requires access to the table history in order to enable automated monitors such as volume and freshness monitors.
-You can configure this in one of the following ways:
-
-#### Option 1: Fetch history using `DESCRIBE HISTORY`
-
-Elementary can fetch the table history by running `DESCRIBE HISTORY` queries on your Databricks warehouse.
-In the Elementary UI, choose **None** under **Storage access method**.
-
-This requires `SELECT` access on the relevant tables, as described in the permissions and security section above.
-
-#### Option 2: Credentials vending
-
-Elementary can access the storage using temporary credentials issued by Databricks through [credential vending](https://docs.databricks.com/aws/en/external-access/credential-vending).
-In the Elementary UI, choose **Credentials vending** under **Storage access method**.
-
-This requires granting `EXTERNAL USE SCHEMA` on the relevant schemas.
-
-When using this option, Elementary does not read the table data itself. It only reads the Delta transaction log, which contains metadata about the transactions.
-
-#### Option 3: Direct storage access
-
-Elementary can access the storage directly using credentials that you configure.
-In the Elementary UI, choose **Direct storage access** under **Storage access method**.
-
-When using this option, Elementary does not read the table data itself. It only reads the Delta transaction log, which contains metadata about the transactions.
-
-For S3-backed Databricks storage, you can configure access in one of the following ways:
-
-__AWS Role authentication__
-
-<img
-  src="/pics/cloud/integrations/databricks/storage-direct-access-role.png"
-  alt="Databricks direct storage access using AWS role ARN"
-/>
-
-This is the recommended approach, as it provides better security and follows AWS best practices.
-After choosing **Direct storage access**, select **AWS role ARN** under **Select S3 authentication method**.
-
-1. Create an IAM role that Elementary can assume.
-2. Select "Another AWS account" as the trusted entity.
-3. Enter Elementary's AWS account ID: `743289191656`.
-4. Optionally enable an external ID.
-5. Attach a policy that grants read access to the Delta log files.
-
-Use a policy similar to the following:
-
-```json
-{
-  "Version": "2012-10-17",
-  "Statement": [
-    {
-      "Sid": "VisualEditor0",
-      "Effect": "Allow",
-      "Action": [
-        "s3:GetObject",
-        "s3:ListBucket"
-      ],
-      "Resource": [
-        "arn:aws:s3:::databricks-metastore-bucket",
-        "arn:aws:s3:::databricks-metastore-bucket/*_delta_log*"
-      ]
-    }
-  ]
-}
-```
-
-This policy is scoped to the bucket itself and objects matching `*_delta_log*`, so it does not grant access to other objects in the bucket.
-
-Provide the role ARN in the Elementary UI, and the external ID as well if you configured one.
-
-__AWS access keys__
-
-<img
-  src="/pics/cloud/integrations/databricks/storage-direct-access-keys.png"
-  alt="Databricks direct storage access using AWS access keys"
-/>
-
-If needed, you can instead provide direct AWS credentials.
-After choosing **Direct storage access**, select **Secret access key** under **Select S3 authentication method**.
-
-1. Create an IAM user that Elementary will use for storage access.
-2. Enable programmatic access.
-3. Attach the same read-only S3 policy shown above.
-4. Provide the AWS access key ID and secret access key in the Elementary UI.
-
 #### Access token (legacy)

 <img

docs/snippets/dwh/databricks/databricks_permissions_and_security.mdx

Lines changed: 109 additions & 9 deletions
@@ -7,13 +7,12 @@ Elementary cloud requires the following permissions:
 - **Elementary schema read-only access** - This is required by Elementary to read dbt metadata & test results collected by the Elementary dbt package as a part of your pipeline runs.
   This permission does not give access to your data.

-- **Information schema metadata access** - Elementary needs access to the `system.information_schema.tables` and `system.information_schema.columns` system tables, to get metadata
-  about existing tables and columns in your data warehouse. This is used to power features such as column-level lineage and automated volume & freshness monitors.
+- **System metadata access** - Elementary needs access to the `system.information_schema.tables`, `system.information_schema.columns`, `system.query.history` and `system.access.table_lineage` system tables.
+  This access is used to get metadata about existing tables and columns, and to power features such as column-level lineage and automated volume & freshness monitors.

-- **Read access needed for some metadata operations (optional)** - In order to enable Elementary's automated volume & freshness monitors, Elementary needs access to query history, as well
-  as Databricks APIs to obtain table statistics.
-  These operations require granting SELECT access on your tables. This is a Databricks limitation - Elementary **never** reads any data from your tables, only metadata. However, there is
-  currently no table-level metadata-only permission available in Databricks, so SELECT is required.
+- **Billing metadata access** - Elementary needs access to the `system.billing.usage` and `system.billing.list_prices` tables. This allows Elementary to monitor the warehouse cost and alert on it.
+
+- **Storage read-only access** - See details below.


 #### Grants SQL template
@@ -25,13 +24,114 @@ Please use the following SQL statements to grant the permissions specified above
 GRANT USE CATALOG ON CATALOG <catalog> TO `<service_principal_app_id>`;
 GRANT USE SCHEMA, SELECT ON SCHEMA <elementary_schema> TO `<service_principal_app_id>`;

--- Grant access to information schema tables
+-- Grant access to system tables
 GRANT USE CATALOG ON CATALOG system TO `<service_principal_app_id>`;
+
 GRANT USE SCHEMA ON SCHEMA system.information_schema TO `<service_principal_app_id>`;
+GRANT USE SCHEMA ON SCHEMA system.query TO `<service_principal_app_id>`;
+GRANT USE SCHEMA ON SCHEMA system.access TO `<service_principal_app_id>`;
 GRANT SELECT ON TABLE system.information_schema.tables TO `<service_principal_app_id>`;
 GRANT SELECT ON TABLE system.information_schema.columns TO `<service_principal_app_id>`;
+GRANT SELECT ON TABLE system.query.history TO `<service_principal_app_id>`;
+GRANT SELECT ON TABLE system.access.table_lineage TO `<service_principal_app_id>`;
+
+-- Grant access to billing metadata
+GRANT USE SCHEMA ON SCHEMA system.billing TO `<service_principal_app_id>`;
+GRANT SELECT ON TABLE system.billing.usage TO `<service_principal_app_id>`;
+GRANT SELECT ON TABLE system.billing.list_prices TO `<service_principal_app_id>`;
+```
+
+### Storage Access
+
+Elementary requires access to the table history in order to enable automated monitors such as volume and freshness monitors.
+You can configure this in one of the following ways:
+
+#### Option 1: Fetch history using `DESCRIBE HISTORY`
+
+Elementary can fetch the table history by running `DESCRIBE HISTORY` queries on your Databricks warehouse.
+In the Elementary UI, choose **None** under **Storage access method**.

--- Grant select on tables for history & statistics access
--- (Optional, required for automated volume & freshness tests - see explanation above. You can also limit to specific schemas used by dbt instead of granting on the full catalog)
+This requires granting SELECT access on your tables. This is a Databricks limitation - Elementary **never** reads any data from your tables, only metadata. However, there is
+currently no table-level metadata-only permission available in Databricks, so SELECT is required.
+
+To grant the access, use the following SQL statements:
+
+```sql
 GRANT USE CATALOG, USE SCHEMA, SELECT ON CATALOG <catalog> TO `<service_principal_app_id>`;
 ```
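For reference, the table-history lookup this option relies on has the following shape (the catalog, schema, and table names are placeholders):

```sql
DESCRIBE HISTORY <catalog>.<schema>.<table>;
```

Each row of the result describes one transaction on the table, which is the metadata the volume and freshness monitors need; no table data is returned.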
+
+
+#### Option 2: Credentials vending
+
+Elementary can access the storage using temporary credentials issued by Databricks through [credential vending](https://docs.databricks.com/aws/en/external-access/credential-vending).
+In the Elementary UI, choose **Credentials vending** under **Storage access method**.
+
+This requires granting `EXTERNAL USE SCHEMA` on the relevant schemas.
+
+When using this option, Elementary does not read the table data itself. It only reads the Delta transaction log, which contains metadata about the transactions.
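For example, a grant of this shape (catalog and schema names are placeholders) enables credential vending for one schema:

```sql
GRANT EXTERNAL USE SCHEMA ON SCHEMA <catalog>.<schema> TO `<service_principal_app_id>`;
```

Repeat the grant for each schema whose tables Elementary should monitor.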
+
+#### Option 3: Direct storage access
+
+Elementary can access the storage directly using credentials that you configure.
+In the Elementary UI, choose **Direct storage access** under **Storage access method**.
+
+When using this option, Elementary does not read the table data itself. It only reads the Delta transaction log, which contains metadata about the transactions.
+
+For S3-backed Databricks storage, you can configure access in one of the following ways:
+
+__AWS Role authentication__
+
+<img
+  src="/pics/cloud/integrations/databricks/storage-direct-access-role.png"
+  alt="Databricks direct storage access using AWS role ARN"
+/>
+
+This is the recommended approach, as it provides better security and follows AWS best practices.
+After choosing **Direct storage access**, select **AWS role ARN** under **Select S3 authentication method**.
+
+1. Create an IAM role that Elementary can assume.
+2. Select "Another AWS account" as the trusted entity.
+3. Enter Elementary's AWS account ID: `743289191656`.
+4. Optionally enable an external ID.
+5. Attach a policy that grants read access to the Delta log files.
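Steps 2-4 correspond to a trust policy on the role roughly like the following sketch (the external ID value is a placeholder you choose; drop the `Condition` block if you did not enable one):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "AWS": "arn:aws:iam::743289191656:root" },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": { "sts:ExternalId": "<your-external-id>" }
      }
    }
  ]
}
```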
+
+For step 5, attach a permissions policy similar to the following:
+
+```json
+{
+  "Version": "2012-10-17",
+  "Statement": [
+    {
+      "Sid": "VisualEditor0",
+      "Effect": "Allow",
+      "Action": [
+        "s3:GetObject",
+        "s3:ListBucket"
+      ],
+      "Resource": [
+        "arn:aws:s3:::databricks-metastore-bucket",
+        "arn:aws:s3:::databricks-metastore-bucket/*_delta_log*"
+      ]
+    }
+  ]
+}
+```
+
+This policy is scoped to the bucket itself and objects matching `*_delta_log*`, so it does not grant access to other objects in the bucket.
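To illustrate the scoping, a quick sketch of which object keys the `*_delta_log*` pattern covers (the keys are hypothetical; Python's `fnmatch` approximates IAM's `*` wildcard, which also matches `/`):

```python
from fnmatch import fnmatch

# The wildcard from the policy's Resource entry.
PATTERN = "*_delta_log*"

# Hypothetical object keys in the metastore bucket.
keys = [
    "tables/orders/_delta_log/00000000000000000042.json",  # Delta transaction log -> matched
    "tables/orders/part-00000-1234.snappy.parquet",        # table data file -> not matched
]

for key in keys:
    print(key, "->", fnmatch(key, PATTERN))
```

Only the transaction-log objects fall under the pattern, so the grant stays metadata-only even though the data files live in the same bucket.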
+
+Provide the role ARN in the Elementary UI, and the external ID as well if you configured one.
+
+__AWS access keys__
+
+<img
+  src="/pics/cloud/integrations/databricks/storage-direct-access-keys.png"
+  alt="Databricks direct storage access using AWS access keys"
+/>
+
+If needed, you can instead provide direct AWS credentials.
+After choosing **Direct storage access**, select **Secret access key** under **Select S3 authentication method**.
+
+1. Create an IAM user that Elementary will use for storage access.
+2. Enable programmatic access.
+3. Attach the same read-only S3 policy shown above.
+4. Provide the AWS access key ID and secret access key in the Elementary UI.
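If you prefer the AWS CLI over the console, the steps above can be sketched as follows (the user name, policy name, and `delta-log-read-policy.json` file are hypothetical; the policy document is the read-only S3 policy shown above):

```shell
# 1. Create the IAM user for Elementary's storage access.
aws iam create-user --user-name elementary-storage-reader

# 2-3. Attach the read-only Delta log policy as an inline policy.
aws iam put-user-policy \
  --user-name elementary-storage-reader \
  --policy-name elementary-delta-log-read \
  --policy-document file://delta-log-read-policy.json

# 4. Create an access key pair to paste into the Elementary UI.
aws iam create-access-key --user-name elementary-storage-reader
```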
