Query Iceberg topics using Databricks and Unity Catalog #1050
= Query Iceberg Topics using Databricks and Unity Catalog
:description: Add Redpanda topics as Iceberg tables that you can query in Databricks managed by Unity Catalog.
:page-categories: Iceberg, Tiered Storage, Management, High Availability, Data Replication, Integration
:page-beta: true

[NOTE]
====
include::shared:partial$enterprise-license.adoc[]
====

This guide walks you through querying Redpanda topics as Iceberg tables in Databricks, with AWS S3 as object storage and a catalog integration using https://docs.databricks.com/aws/en/data-governance/unity-catalog[Unity Catalog^]. For general information about Iceberg catalog integrations in Redpanda, see xref:manage:iceberg/use-iceberg-catalogs.adoc[].

== Prerequisites

* xref:manage:tiered-storage.adoc#configure-object-storage[Object storage configured] for your cluster and xref:manage:tiered-storage.adoc#enable-tiered-storage[Tiered Storage enabled] for the topics for which you want to generate Iceberg tables.
+
You need the AWS S3 bucket URI, so you can configure it as an external location in Unity Catalog.
* A Databricks workspace in the same region as your S3 bucket. See the https://docs.databricks.com/aws/en/resources/supported-regions#supported-regions-list[list of supported AWS regions^].
* Unity Catalog enabled in your Databricks workspace. See the https://docs.databricks.com/aws/en/data-governance/unity-catalog/get-started[official Databricks documentation^] to set up Unity Catalog for your workspace.
* Predictive optimization enabled.
* External data access enabled in your metastore.
* Workspace admin privileges to complete the steps to create a Unity Catalog storage credential and external location that connects your Tiered Storage bucket to Databricks.
== Create a Unity Catalog storage credential

A storage credential is a Databricks object that controls access to external object storage, in this case S3. You associate a storage credential with an AWS IAM role that defines what actions Unity Catalog can perform in the S3 bucket.

Follow the steps in the https://docs.databricks.com/aws/en/connect/unity-catalog/cloud-storage/storage-credentials[official Databricks documentation^] to create an AWS IAM role that has the required permissions for the bucket. When you have completed these steps, you will have the following configured in AWS and Databricks:
Reviewer comment (Contributor): The spelling of 'Databricks' is correct, and the use of 'should' can be replaced with 'will' to make the instruction more definitive.
|
|
* A self-assuming IAM role, meaning you've defined the role trust policy so the role trusts itself.
* Two IAM policies attached to the IAM role. The first policy grants Unity Catalog read and write access to the bucket. The second policy allows Unity Catalog to configure file events.
|
Reviewer comment (Author): Do these policies need to be restricted down to the catalog root?
* A storage credential in Databricks associated with the IAM role, using the role's ARN. You also use the storage credential's external ID in the role's trust relationship policy to make the role self-assuming.
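+
To make the role self-assuming, the trust relationship policy must allow both the Databricks-provided Unity Catalog principal and the role itself to assume the role. The following is a minimal sketch of such a policy; the Databricks role ARN, your account ID, the role name, and the external ID are all placeholders that you obtain while following the Databricks storage credential setup:
+
[,json]
----
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": [
          "<databricks-unity-catalog-role-arn>",
          "arn:aws:iam::<your-account-id>:role/<this-role-name>"
        ]
      },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": {
          "sts:ExternalId": "<storage-credential-external-id>"
        }
      }
    }
  ]
}
----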
|
|
== Create a Unity Catalog external location

The external location points to the location of the Iceberg data in your S3 bucket.

Follow the steps in the https://docs.databricks.com/aws/en/connect/unity-catalog/cloud-storage/external-locations[official Databricks documentation^] to *manually* create an external location. You can create the external location in Catalog Explorer, or by using SQL. You must create the external location manually because the location needs to be associated with the existing Tiered Storage bucket URL, `s3://<bucket-name>`.
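If you create the external location using SQL, the statement might look like the following sketch. The location and credential names (`redpanda_tiered_storage`, `<storage-credential-name>`) are hypothetical placeholders:

[,sql]
----
-- Associate the external location with the existing Tiered Storage bucket.
-- Names are hypothetical; replace them with your own.
CREATE EXTERNAL LOCATION IF NOT EXISTS `redpanda_tiered_storage`
URL 's3://<bucket-name>'
WITH (STORAGE CREDENTIAL `<storage-credential-name>`);
----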
|
Reviewer comment (Author): Is this location fine or does the catalog folder need to be used?
|
|
== Create a new catalog

Follow the steps in the official Databricks documentation to https://docs.databricks.com/aws/en/catalogs/create-catalog[create a standard catalog^]. When you create the catalog, specify the following as the storage location:

* The external location you created in the previous step.
* In the subpath field, enter `redpanda-iceberg-catalog`.

You use the catalog name when you set the Iceberg cluster configuration properties in Redpanda in a later step.
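If you create the catalog using SQL instead of the UI, the statement might look like the following sketch, where the managed location combines the external location URL with the `redpanda-iceberg-catalog` subpath:

[,sql]
----
-- <catalog-name> is a placeholder; the subpath matches the one entered above
CREATE CATALOG IF NOT EXISTS `<catalog-name>`
MANAGED LOCATION 's3://<bucket-name>/redpanda-iceberg-catalog';
----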
|
|
== Authorize access to Unity Catalog

Redpanda recommends using OAuth for service principals to grant Redpanda access to Unity Catalog.

. Follow the steps in the https://docs.databricks.com/aws/en/dev-tools/auth/oauth-m2m[official Databricks documentation^] to create a service principal, and then generate an OAuth secret. You use the client ID and secret to set Iceberg cluster configuration properties in Redpanda in the next step.
. Open your catalog in Catalog Explorer, then click *Permissions*.
. Click *Grant* to grant the service principal the following permissions on the catalog:
+
* `ALL PRIVILEGES`
* `EXTERNAL USE SCHEMA`
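+
As an alternative to Catalog Explorer, the grants can also be issued in SQL. This is a sketch that assumes `EXTERNAL USE SCHEMA` may be granted at the catalog level so that schemas inherit it; the grantee is the service principal's application ID:
+
[,sql]
----
-- Placeholders: <catalog-name> and <service-principal-application-id>
GRANT ALL PRIVILEGES ON CATALOG `<catalog-name>` TO `<service-principal-application-id>`;
GRANT EXTERNAL USE SCHEMA ON CATALOG `<catalog-name>` TO `<service-principal-application-id>`;
----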
|
|
The Iceberg integration for Redpanda also supports using bearer tokens.
|
Reviewer comment (Author): Is this fine to mention here? In which instances should we recommend the bearer token option?
|
|
== Update cluster configuration

To configure your Redpanda cluster to enable Iceberg on a topic and integrate with Unity Catalog:

. Edit your cluster configuration to set the `iceberg_enabled` property to `true`, and set the catalog integration properties listed in the following example. You can run `rpk cluster config edit` to update these properties:
+
[,yaml]
----
iceberg_enabled: true
iceberg_catalog_type: rest
iceberg_rest_catalog_endpoint: https://<workspace-instance>/api/2.1/unity-catalog/iceberg
iceberg_rest_catalog_authentication_mode: oauth2
iceberg_rest_catalog_oauth2_server_uri: https://<workspace-instance>/oidc/v1/token
iceberg_rest_catalog_oauth2_scope: all-apis
iceberg_rest_catalog_client_id: <service-principal-client-id>
iceberg_rest_catalog_client_secret: <service-principal-client-secret>
iceberg_rest_catalog_warehouse: <unity-catalog-name>
iceberg_disable_snapshot_tagging: true

# Optional
iceberg_translation_interval_ms_default: 1000
iceberg_catalog_commit_interval_ms: 1000
----
+
Use your own values for the following placeholders:
+
--
- `<workspace-instance>`: The URL of your https://docs.databricks.com/aws/en/workspace/workspace-details#workspace-instance-names-urls-and-ids[Databricks workspace instance^], for example, `cust-success.cloud.databricks.com`.
- `<service-principal-client-id>`: The client ID of the service principal you created in an earlier step.
- `<service-principal-client-secret>`: The client secret of the service principal you created in an earlier step.
- `<unity-catalog-name>`: The name of your catalog in Unity Catalog.
--
+
[,bash,role=no-copy]
----
Successfully updated configuration. New configuration version is 2.
----
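+
If you prefer not to open the editor, the same properties can be set individually with `rpk cluster config set`. A sketch, using the same placeholders as above:
+
[,bash]
----
rpk cluster config set iceberg_enabled true
rpk cluster config set iceberg_catalog_type rest
rpk cluster config set iceberg_rest_catalog_endpoint "https://<workspace-instance>/api/2.1/unity-catalog/iceberg"
rpk cluster config set iceberg_rest_catalog_authentication_mode oauth2
rpk cluster config set iceberg_rest_catalog_oauth2_server_uri "https://<workspace-instance>/oidc/v1/token"
rpk cluster config set iceberg_rest_catalog_oauth2_scope all-apis
rpk cluster config set iceberg_rest_catalog_client_id "<service-principal-client-id>"
rpk cluster config set iceberg_rest_catalog_client_secret "<service-principal-client-secret>"
rpk cluster config set iceberg_rest_catalog_warehouse "<unity-catalog-name>"
rpk cluster config set iceberg_disable_snapshot_tagging true
----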
|
|
. You must restart your cluster if you change the configuration for a running cluster.

. Enable the integration for a topic by configuring the topic property `redpanda.iceberg.mode`. The `key_value` mode creates an Iceberg table for the topic consisting of two columns: one for the record metadata, including the key, and a binary column for the record's value. See xref:manage:iceberg/topic-iceberg-integration.adoc#enable-iceberg-integration[Enable Iceberg integration] for more details on Iceberg modes. The following examples show how to use xref:get-started:rpk-install.adoc[`rpk`] to either create a new topic, or alter the configuration for an existing topic, to set the Iceberg mode to `key_value`.
+
.Create a new topic and set `redpanda.iceberg.mode`:
[,bash]
----
rpk topic create <topic-name> --topic-config=redpanda.iceberg.mode=key_value
----
+
.Set `redpanda.iceberg.mode` for an existing topic:
[,bash]
----
rpk topic alter-config <topic-name> --set redpanda.iceberg.mode=key_value
----
|
|
. Produce to the topic. For example:
+
[,bash]
----
echo -e "hello world\nfoo bar\nbaz qux" | rpk topic produce <topic-name> --format='%k %v\n'
----
+
You should see the topic as a table in Unity Catalog.

. In Catalog Explorer, open your catalog. You should see a `redpanda` schema, in addition to `default` and `information_schema`. The `redpanda` schema and the table within it are added automatically. The table name is the same as the topic name.
|
|
== Query Iceberg table using Databricks SQL

You can query the Iceberg table using different engines, such as Databricks SQL, PyIceberg, or Apache Spark. To query the table or view the table data in Catalog Explorer, ensure that your account has the necessary permissions to read the table. Review the official Databricks documentation on https://docs.databricks.com/aws/en/data-governance/unity-catalog/manage-privileges/?language=SQL#grant-permissions-on-objects-in-a-unity-catalog-metastore[granting permissions to objects^] and https://docs.databricks.com/aws/en/data-governance/unity-catalog/manage-privileges/privileges[Unity Catalog privileges^] for details.
|
|
The following example shows how to query the Iceberg table using SQL in Databricks SQL.

. In the Databricks console, open *SQL Editor*.
. In the query editor, run the following:
+
[,sql]
----
-- Quote the catalog and table names so they parse correctly if they contain special characters
SELECT * FROM `<catalog-name>`.redpanda.`<table-name>`;
----
+
Your query results should look like the following:
+
[,sql,role="no-copy no-wrap"]
----
-- Example for redpanda.iceberg.mode=key_value with 1 record produced to the topic
+--------------------------------------------------------------------------+------------+-------------------------+
| redpanda                                                                 | value      | redpanda.timestamp_hour |
+--------------------------------------------------------------------------+------------+-------------------------+
| {"partition":0,"offset":"0","timestamp":"2025-04-02T18:57:11.127Z",      | 776f726c64 | 2025-04-02-18           |
| "headers":null,"key":"68656c6c6f"}                                       |            |                         |
+--------------------------------------------------------------------------+------------+-------------------------+
----
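+
The key (inside the `redpanda` metadata column) and the `value` column hold the raw record bytes, displayed hex-encoded. You can decode them locally, for example with the standard `xxd` tool:
+
[,bash]
----
# Decode the hex-encoded key and value from the example row above
printf '68656c6c6f' | xxd -r -p; echo    # hello
printf '776f726c64' | xxd -r -p; echo    # world
----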
|
|
Reviewer comment (Author): Is there anything we should mention in this doc regarding metadata refresh? Does Unity automatically refresh table metadata?
== See also

- xref:manage:iceberg/query-iceberg-topics.adoc[]

Reviewer comment: Do we need to provide specific guidance regarding what level (e.g. account vs catalog vs schema) predictive optimization should be enabled?