Commit 337c57f

Update Glue doc with Cloud-specific guidance (#1275)
1 parent 1be2c0c commit 337c57f

1 file changed

Lines changed: 55 additions & 18 deletions

modules/manage/pages/iceberg/iceberg-topics-aws-glue.adoc

@@ -2,27 +2,29 @@
 :description: Add Redpanda topics as Iceberg tables that you can query from AWS Glue Data Catalog.
 :page-categories: Iceberg, Tiered Storage, Management, High Availability, Data Replication, Integration
 :page-beta: true
-ifdef::env-cloud[]
-:rpk-install-doc: manage:rpk/rpk-install.adoc
-endif::[]
-ifndef::env-cloud[]
-:rpk-install-doc: get-started:rpk-install.adoc
-endif::[]
-
 
 [NOTE]
 ====
 include::shared:partial$enterprise-license.adoc[]
 ====
 
 // tag::single-source[]
+ifdef::env-cloud[]
+:rp_version: 25.2
+:rpk_install_doc: manage:rpk/rpk-install.adoc
+endif::[]
+
+ifndef::env-cloud[]
+:rp_version: 25.1.7
+:rpk_install_doc: get-started:rpk-install.adoc
+endif::[]
 
 This guide walks you through querying Redpanda topics as Iceberg tables stored in AWS S3, using a catalog integration with https://docs.aws.amazon.com/glue/latest/dg/components-overview.html#data-catalog-intro[AWS Glue^]. For general information about Iceberg catalog integrations in Redpanda, see xref:manage:iceberg/use-iceberg-catalogs.adoc[].
 
 == Prerequisites
 
-* Redpanda version 25.1.7 or later.
-* xref:{rpk-install-doc}[`rpk`] installed or updated to the latest version.
+* Redpanda version {rp_version} or later.
+* xref:{rpk_install_doc}[`rpk`] installed or updated to the latest version.
 ifdef::env-cloud[]
 ** You can also use the Redpanda Cloud API to xref:manage:cluster-maintenance/config-cluster.adoc#set-cluster-configuration-properties[reference secrets in your cluster configuration].
 endif::[]
@@ -44,15 +46,21 @@ If you want to use partitioning, you must specify a custom partition specificati
 
 == Authorize access to AWS Glue
 
+ifndef::env-cloud[]
 You must allow Redpanda access to AWS Glue services in your AWS account. You can use the same access credentials that you configured for S3 (IAM role, access keys, and KMS key), as long as you have also added read and write access to AWS Glue Data Catalog.
 
 For example, you could create a separate IAM policy that manages access to AWS Glue, and attach it to the IAM role that Redpanda also uses to access S3. It is recommended to add all AWS Glue API actions in the policy (`"glue:*"`) on the following resources:
+endif::[]
+
+ifdef::env-cloud[]
+You must allow Redpanda access to AWS Glue services in your AWS account. It is recommended to create a new IAM policy or role that manages access to AWS Glue, allowing all AWS Glue API actions (`"glue:*"`) on the following resources:
+endif::[]
 
 - Root catalog (`catalog`)
 - All databases (`database/*`)
 - All tables (`table/\*/*`)
 
-Your policy should include a statement similar to the following:
+Your IAM policy should include a statement similar to the following:
 
 [,json]
 ----
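Review note on the hunk above: the JSON body of the policy statement falls outside the diff context, so it is not reproduced here. As a rough illustration only, the following Python sketch builds a statement that grants `glue:*` on the three resource types the text lists (root catalog, all databases, all tables). The function names, the region, and the account ID are hypothetical placeholders; the ARN layout follows the general AWS Glue resource ARN format and is an assumption, not a copy of the policy in the Redpanda doc.

```python
import json


def glue_policy_statement(region: str, account_id: str) -> dict:
    """Build one IAM statement allowing all Glue actions on catalog, databases, and tables.

    Hypothetical helper for illustration; not part of Redpanda or rpk.
    """
    prefix = f"arn:aws:glue:{region}:{account_id}"
    return {
        "Effect": "Allow",
        "Action": ["glue:*"],
        "Resource": [
            f"{prefix}:catalog",      # root catalog
            f"{prefix}:database/*",   # all databases
            f"{prefix}:table/*/*",    # all tables in all databases
        ],
    }


def glue_policy_document(region: str, account_id: str) -> str:
    """Wrap the statement in a complete IAM policy document and return it as JSON text."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [glue_policy_statement(region, account_id)],
    }
    return json.dumps(policy, indent=2)
```

Calling `glue_policy_document("us-east-1", "123456789012")` returns JSON text that could be compared against the statement in the published doc before attaching it to a role.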
@@ -78,14 +86,24 @@ For more information on configuring IAM permissions, see the https://docs.aws.am
 
 == Configure authentication and credentials
 
-You can configure credentials for the AWS Glue Data Catalog integration in either of the following ways:
+ifndef::env-cloud[]
+You must configure credentials for the AWS Glue Data Catalog integration in either of the following ways:
 
 * Allow Redpanda to use the same `cloud_storage_*` credential properties configured for S3. If you do not configure the overrides listed below, Redpanda uses the same credentials for both S3 and AWS Glue. This is the recommended approach.
 * If you want to configure authentication to AWS Glue separately from authentication to S3, there are equivalent credential configuration properties named `iceberg_rest_catalog_aws_*` that override the object storage credentials. These properties only apply to REST catalog authentication, and never to S3 authentication:
 ** config_ref:iceberg_rest_catalog_aws_access_key,true,properties/cluster-properties[`iceberg_rest_catalog_aws_access_key`] overrides config_ref:cloud_storage_access_key,true,properties/cluster-properties[`cloud_storage_access_key`]
 ** config_ref:iceberg_rest_catalog_aws_secret_key,true,properties/cluster-properties[`iceberg_rest_catalog_aws_secret_key`] overrides config_ref:cloud_storage_secret_key,true,properties/cluster-properties[`cloud_storage_secret_key`]
 ** config_ref:iceberg_rest_catalog_aws_region,true,properties/cluster-properties[`iceberg_rest_catalog_aws_region`] overrides config_ref:cloud_storage_region,true,properties/cluster-properties[`cloud_storage_region`]
 ** config_ref:iceberg_rest_catalog_aws_credentials_source,true,properties/cluster-properties[`iceberg_rest_catalog_aws_credentials_source`] overrides config_ref:cloud_storage_credentials_source,true,properties/cluster-properties[`cloud_storage_credentials_source`]
+endif::[]
+
+ifdef::env-cloud[]
+You must configure credentials for the AWS Glue Data Catalog integration using the following properties:
+
+* config_ref:iceberg_rest_catalog_aws_access_key,true,properties/cluster-properties[`iceberg_rest_catalog_aws_access_key`]
+* config_ref:iceberg_rest_catalog_aws_secret_key,true,properties/cluster-properties[`iceberg_rest_catalog_aws_secret_key`], added as a secret value (see the <<update-cluster-configuration,next section>> for details)
+* config_ref:iceberg_rest_catalog_aws_region,true,properties/cluster-properties[`iceberg_rest_catalog_aws_region`]
+endif::[]
 
 == Update cluster configuration
 
@@ -100,16 +118,31 @@ Run `rpk cluster config edit` to update these properties:
 ----
 iceberg_enabled: true
 iceberg_catalog_type: rest
-iceberg_rest_catalog_endpoint: https://glue.<aws-region>.amazonaws.com/iceberg
+iceberg_rest_catalog_endpoint: https://glue.<glue-region>.amazonaws.com/iceberg
 iceberg_rest_catalog_authentication_mode: aws_sigv4
 iceberg_rest_catalog_base_location: s3://<bucket-name>/<warehouse-path>
+# Use the iceberg_rest_catalog_aws_* properties if you want to
+# use separate AWS credentials for the catalog, or delete to reuse S3
+# (cloud_storage_*) credentials.
+# For access using access keys only, use iceberg_rest_catalog_aws_access_key
+# and iceberg_rest_catalog_aws_secret_key. For access with an IAM role, use
+# iceberg_rest_catalog_aws_credentials_source only.
+# iceberg_rest_catalog_aws_region:
+# iceberg_rest_catalog_aws_access_key:
+# iceberg_rest_catalog_aws_secret_key:
+# iceberg_rest_catalog_aws_credentials_source:
 ----
++
+Use your own values for the following placeholders:
++
+--
+- `<glue-region>`: The AWS region where your Data Catalog is located. The region in the AWS Glue endpoint must match the region specified in either your config_ref:cloud_storage_region,true,properties/cluster-properties[`cloud_storage_region`] or config_ref:iceberg_rest_catalog_aws_region,true,properties/cluster-properties[`iceberg_rest_catalog_aws_region`] property.
+- `<bucket-name>` and `<warehouse-path>`: AWS Glue requires you to specify the base location where Redpanda stores Iceberg data and metadata files. You must use an S3 URI; for example, `s3://<bucket-name>/iceberg`. As a security best practice, Redpanda Data recommends specifying a subfolder (using prefixes) rather than the root of the bucket.
+--
 endif::[]
 ifdef::env-cloud[]
 Use `rpk` like in the following example, or use the Cloud API to xref:manage:cluster-maintenance/config-cluster.adoc#set-cluster-configuration-properties[update these cluster properties]. The update might take several minutes to complete.
 +
-To reference a secret in a cluster property, you must first xref:manage:iceberg/use-iceberg-catalogs.adoc#store-a-secret-for-rest-catalog-authentication[store the secret value].
-+
 [,bash]
 ----
 rpk cloud login
@@ -119,19 +152,23 @@ rpk profile create --from-cloud <cluster-id>
 rpk cluster config set \
 iceberg_enabled=true \
 iceberg_catalog_type=rest \
-iceberg_rest_catalog_endpoint=https://glue.<aws-region>.amazonaws.com/iceberg \
+iceberg_rest_catalog_endpoint=https://glue.<glue-region>.amazonaws.com/iceberg \
 iceberg_rest_catalog_authentication_mode=aws_sigv4 \
 iceberg_rest_catalog_base_location=s3://<bucket-name>/<warehouse-path> \
-
+iceberg_rest_catalog_aws_region=<glue-region> \
+iceberg_rest_catalog_aws_access_key=<glue-access-key> \
+iceberg_rest_catalog_aws_secret_key='${secrets.<glue-secret-key-name>}'
 ----
-endif::[]
 +
 Use your own values for the following placeholders:
 +
 --
-- `<aws-region>`: The AWS region where your Data Catalog is located. The region in the AWS Glue endpoint must match the region specified in either your config_ref:cloud_storage_region,true,properties/cluster-properties[`cloud_storage_region`] or config_ref:iceberg_rest_catalog_aws_region,true,properties/cluster-properties[`iceberg_rest_catalog_aws_region`] property.
+- `<glue-region>`: The AWS region where your Data Catalog is located. The region in the AWS Glue endpoint must match the region specified in your config_ref:iceberg_rest_catalog_aws_region,true,properties/cluster-properties[`iceberg_rest_catalog_aws_region`] property.
 - `<bucket-name>` and `<warehouse-path>`: AWS Glue requires you to specify the base location where Redpanda stores Iceberg data and metadata files. You must use an S3 URI; for example, `s3://<bucket-name>/iceberg`. As a security best practice, Redpanda Data recommends specifying a subfolder (using prefixes) rather than the root of the bucket.
+- `<glue-access-key>`: The AWS access key ID for your Glue service account.
+- `<glue-secret-key-name>`: The name of the secret that stores the AWS secret access key for your Glue service account. To reference a secret in a cluster property, for example `iceberg_rest_catalog_aws_secret_key`, you must first xref:manage:iceberg/use-iceberg-catalogs.adoc#store-a-secret-for-rest-catalog-authentication[store the secret value].
 --
+endif::[]
 +
 [,bash,role=no-copy]
 ----
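
Review note on the region requirement: the placeholder text says the region in the AWS Glue endpoint must match the region set in `iceberg_rest_catalog_aws_region`. A minimal Python sketch of that check is shown here, assuming the endpoint follows the `https://glue.<glue-region>.amazonaws.com/iceberg` template from the diff. The function names and the example region are hypothetical and not part of Redpanda or `rpk`, and the pattern does not attempt to cover other AWS partitions or custom endpoints.

```python
import re

# Endpoint shape assumed from the template in the diff:
# https://glue.<glue-region>.amazonaws.com/iceberg
_GLUE_ENDPOINT = re.compile(r"^https://glue\.([a-z0-9-]+)\.amazonaws\.com/iceberg/?$")


def glue_endpoint_region(endpoint: str) -> str:
    """Return the region embedded in a Glue Iceberg REST endpoint URL."""
    match = _GLUE_ENDPOINT.match(endpoint)
    if match is None:
        raise ValueError(f"not a recognized Glue Iceberg endpoint: {endpoint!r}")
    return match.group(1)


def regions_match(endpoint: str, configured_region: str) -> bool:
    """Return True when the endpoint's region equals the configured region."""
    return glue_endpoint_region(endpoint) == configured_region
```

A check like `regions_match(endpoint, region)` could run before `rpk cluster config set` so that a mismatched region is caught locally rather than surfacing later as a request-signing failure.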
