File: modules/manage/pages/iceberg/iceberg-topics-aws-glue.adoc
Lines changed: 55 additions & 18 deletions
@@ -2,27 +2,29 @@
 :description: Add Redpanda topics as Iceberg tables that you can query from AWS Glue Data Catalog.
 :page-categories: Iceberg, Tiered Storage, Management, High Availability, Data Replication, Integration
 :page-beta: true
-ifdef::env-cloud[]
-:rpk-install-doc: manage:rpk/rpk-install.adoc
-endif::[]
-ifndef::env-cloud[]
-:rpk-install-doc: get-started:rpk-install.adoc
-endif::[]
-
 
 [NOTE]
 ====
 include::shared:partial$enterprise-license.adoc[]
 ====
 
 // tag::single-source[]
+ifdef::env-cloud[]
+:rp_version: 25.2
+:rpk_install_doc: manage:rpk/rpk-install.adoc
+endif::[]
+
+ifndef::env-cloud[]
+:rp_version: 25.1.7
+:rpk_install_doc: get-started:rpk-install.adoc
+endif::[]
 
 This guide walks you through querying Redpanda topics as Iceberg tables stored in AWS S3, using a catalog integration with https://docs.aws.amazon.com/glue/latest/dg/components-overview.html#data-catalog-intro[AWS Glue^]. For general information about Iceberg catalog integrations in Redpanda, see xref:manage:iceberg/use-iceberg-catalogs.adoc[].
 
 == Prerequisites
 
-* Redpanda version 25.1.7 or later.
-* xref:{rpk-install-doc}[`rpk`] installed or updated to the latest version.
+* Redpanda version {rp_version} or later.
+* xref:{rpk_install_doc}[`rpk`] installed or updated to the latest version.
 ifdef::env-cloud[]
 ** You can also use the Redpanda Cloud API to xref:manage:cluster-maintenance/config-cluster.adoc#set-cluster-configuration-properties[reference secrets in your cluster configuration].
 endif::[]
@@ -44,15 +46,21 @@ If you want to use partitioning, you must specify a custom partition specificati
 
 == Authorize access to AWS Glue
 
+ifndef::env-cloud[]
 You must allow Redpanda access to AWS Glue services in your AWS account. You can use the same access credentials that you configured for S3 (IAM role, access keys, and KMS key), as long as you have also added read and write access to AWS Glue Data Catalog.
 
 For example, you could create a separate IAM policy that manages access to AWS Glue, and attach it to the IAM role that Redpanda also uses to access S3. It is recommended to add all AWS Glue API actions in the policy (`"glue:*"`) on the following resources:
+endif::[]
+
+ifdef::env-cloud[]
+You must allow Redpanda access to AWS Glue services in your AWS account. It is recommended to create a new IAM policy or role that manages access to AWS Glue, allowing all AWS Glue API actions (`"glue:*"`) on the following resources:
+endif::[]
 
 - Root catalog (`catalog`)
 - All databases (`database/*`)
 - All tables (`table/\*/*`)
 
-Your policy should include a statement similar to the following:
+Your IAM policy should include a statement similar to the following:
 
 [,json]
 ----
@@ -78,14 +86,24 @@ For more information on configuring IAM permissions, see the https://docs.aws.am
 
 == Configure authentication and credentials
 
-You can configure credentials for the AWS Glue Data Catalog integration in either of the following ways:
+ifndef::env-cloud[]
+You must configure credentials for the AWS Glue Data Catalog integration in either of the following ways:
 
 * Allow Redpanda to use the same `cloud_storage_*` credential properties configured for S3. If you do not configure the overrides listed below, Redpanda uses the same credentials for both S3 and AWS Glue. This is the recommended approach.
 * If you want to configure authentication to AWS Glue separately from authentication to S3, there are equivalent credential configuration properties named `iceberg_rest_catalog_aws_*` that override the object storage credentials. These properties only apply to REST catalog authentication, and never to S3 authentication:
* config_ref:iceberg_rest_catalog_aws_secret_key,true,properties/cluster-properties[`iceberg_rest_catalog_aws_secret_key`], added as a secret value (see the <<update-cluster-configuration,next section>> for details)
Use your own values for the following placeholders:
++
+--
+- `<glue-region>`: The AWS region where your Data Catalog is located. The region in the AWS Glue endpoint must match the region specified in either your config_ref:cloud_storage_region,true,properties/cluster-properties[`cloud_storage_region`] or config_ref:iceberg_rest_catalog_aws_region,true,properties/cluster-properties[`iceberg_rest_catalog_aws_region`] property.
+- `<bucket-name>` and `<warehouse-path>`: AWS Glue requires you to specify the base location where Redpanda stores Iceberg data and metadata files. You must use an S3 URI; for example, `s3://<bucket-name>/iceberg`. As a security best practice, Redpanda Data recommends specifying a subfolder (using prefixes) rather than the root of the bucket.
+--
 endif::[]
 ifdef::env-cloud[]
 Use `rpk` like in the following example, or use the Cloud API to xref:manage:cluster-maintenance/config-cluster.adoc#set-cluster-configuration-properties[update these cluster properties]. The update might take several minutes to complete.
 +
-To reference a secret in a cluster property, you must first xref:manage:iceberg/use-iceberg-catalogs.adoc#store-a-secret-for-rest-catalog-authentication[store the secret value].
Use your own values for the following placeholders:
 +
 --
-- `<aws-region>`: The AWS region where your Data Catalog is located. The region in the AWS Glue endpoint must match the region specified in either your config_ref:cloud_storage_region,true,properties/cluster-properties[`cloud_storage_region`] or config_ref:iceberg_rest_catalog_aws_region,true,properties/cluster-properties[`iceberg_rest_catalog_aws_region`] property.
+- `<glue-region>`: The AWS region where your Data Catalog is located. The region in the AWS Glue endpoint must match the region specified in your config_ref:iceberg_rest_catalog_aws_region,true,properties/cluster-properties[`iceberg_rest_catalog_aws_region`] property.
 - `<bucket-name>` and `<warehouse-path>`: AWS Glue requires you to specify the base location where Redpanda stores Iceberg data and metadata files. You must use an S3 URI; for example, `s3://<bucket-name>/iceberg`. As a security best practice, Redpanda Data recommends specifying a subfolder (using prefixes) rather than the root of the bucket.
+- `<glue-access-key>`: The AWS access key ID for your Glue service account.
+- `<glue-secret-key-name>`: The name of the secret that stores the AWS secret access key for your Glue service account. To reference a secret in a cluster property, for example `iceberg_rest_catalog_aws_secret_key`, you must first xref:manage:iceberg/use-iceberg-catalogs.adoc#store-a-secret-for-rest-catalog-authentication[store the secret value].
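
The placeholder descriptions above state two constraints that are easy to get wrong: the region in the AWS Glue endpoint must match the configured region, and the warehouse location must be an S3 URI that points at a prefix rather than the bucket root. The following is a minimal pre-flight sketch of those two checks, not part of Redpanda or `rpk`. The function names are hypothetical, and the endpoint shape `https://glue.<glue-region>.amazonaws.com/...` is an assumption that does not appear in this diff (other AWS partitions use different domain suffixes and would be rejected here).

```python
from urllib.parse import urlparse


def glue_endpoint_region(endpoint: str) -> str:
    """Return the region embedded in a Glue endpoint URL.

    Assumes the host has the form glue.<region>.amazonaws.com
    (an assumption for this sketch, not taken from the diff).
    """
    host = urlparse(endpoint).hostname or ""
    parts = host.split(".")
    if len(parts) != 4 or parts[0] != "glue" or parts[2:] != ["amazonaws", "com"]:
        raise ValueError(f"not a recognized AWS Glue endpoint: {endpoint!r}")
    return parts[1]


def check_glue_settings(endpoint: str, configured_region: str, warehouse: str) -> list:
    """Collect human-readable problems; an empty list means both checks passed."""
    problems = []

    # Check 1: the endpoint region must match the configured region property.
    region = glue_endpoint_region(endpoint)
    if region != configured_region:
        problems.append(
            f"endpoint region {region!r} does not match configured region {configured_region!r}"
        )

    # Check 2: the warehouse must be an S3 URI, preferably with a prefix.
    parsed = urlparse(warehouse)
    if parsed.scheme != "s3" or not parsed.netloc:
        problems.append(f"warehouse {warehouse!r} is not an S3 URI")
    elif not parsed.path.strip("/"):
        problems.append(f"warehouse {warehouse!r} points at the bucket root; use a prefix")

    return problems


if __name__ == "__main__":
    # Hypothetical values standing in for <glue-region>, <bucket-name>, and <warehouse-path>.
    for problem in check_glue_settings(
        "https://glue.us-east-1.amazonaws.com/iceberg",
        "us-west-2",
        "s3://my-bucket",
    ):
        print(problem)
```

Running a check like this before updating the cluster properties can catch a mismatched region or a bucket-root warehouse early, because the property update itself might take several minutes to complete.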