PyIceberg Multi-Cloud Integration

Pangolin's credential vending and REST support extend beyond AWS to Azure Blob Storage (ADLS Gen2) and Google Cloud Storage (GCS).

🔷 Azure ADLS Gen2

Dependencies

Install PyIceberg with the adlfs and pyarrow extras:

pip install "pyiceberg[adlfs,pyarrow]"

Configuration (Authenticated)

When using Azure, the catalog URI has the same form as for AWS-backed catalogs; only the storage properties change.

from pyiceberg.catalog import load_catalog

catalog = load_catalog(
    "azure_catalog",
    **{
        "type": "rest",
        "uri": "http://localhost:8080/v1/azure_catalog",
        "token": "YOUR_JWT_TOKEN",
        
        # Enable Vending
        "header.X-Iceberg-Access-Delegation": "vended-credentials",
        
        # Optional: Direct Key Access
        # "adls.account-name": "mystorageaccount",
        # "adls.account-key": "YOUR_ACCOUNT_KEY"
    }
)
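
If several scripts share these settings, the property dictionary can be assembled by a small helper. The following is a minimal sketch: `rest_catalog_properties` is a hypothetical helper name, not part of PyIceberg or Pangolin, and it only assembles the keys shown above.

```python
from typing import Dict, Optional


def rest_catalog_properties(
    uri: str,
    token: str,
    vend_credentials: bool = True,
    storage_overrides: Optional[Dict[str, str]] = None,
) -> Dict[str, str]:
    """Assemble keyword properties for load_catalog (hypothetical helper)."""
    props = {"type": "rest", "uri": uri, "token": token}
    if vend_credentials:
        # Request vended storage credentials via the access-delegation header.
        props["header.X-Iceberg-Access-Delegation"] = "vended-credentials"
    if storage_overrides:
        # Static client-side credentials, e.g. adls.account-name / adls.account-key.
        props.update(storage_overrides)
    return props


azure_props = rest_catalog_properties(
    "http://localhost:8080/v1/azure_catalog", "YOUR_JWT_TOKEN"
)
# Pass to PyIceberg with: load_catalog("azure_catalog", **azure_props)
```

Keeping the vending header behind a flag makes it easy to switch one environment to direct key access without touching the rest of the configuration.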

Supported Properties (PyIceberg Compatible)

| PyIceberg Property | Description | Vended by Pangolin |
| --- | --- | --- |
| `adls.token` | OAuth2 access token for Azure AD authentication. | ✅ Yes (OAuth2 mode) |
| `adls.account-name` | Azure storage account name. | ✅ Yes (all modes) |
| `adls.account-key` | Azure storage account key. | ✅ Yes (account key mode) |
| `adls.container` | Container name within the storage account. | ✅ Yes (all modes) |

Note: When using Pangolin's credential vending, you don't need to provide these properties manually. Pangolin vends them automatically based on your warehouse configuration.
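
The table can be read as a mapping from the warehouse's authentication mode to the set of properties a client should expect to receive. A minimal sketch, assuming the mode labels `oauth2` and `account-key` (illustrative identifiers chosen here; they are not confirmed Pangolin setting names):

```python
# Properties vended in every mode, per the table above.
COMMON = {"adls.account-name", "adls.container"}

# Mode-specific additions. The mode keys are illustrative labels only.
BY_MODE = {
    "oauth2": {"adls.token"},
    "account-key": {"adls.account-key"},
}


def expected_vended_properties(mode: str) -> set:
    """Return the ADLS property names the table lists for a given mode."""
    if mode not in BY_MODE:
        raise ValueError(f"unknown mode: {mode!r}")
    return COMMON | BY_MODE[mode]


print(sorted(expected_vended_properties("oauth2")))
# ['adls.account-name', 'adls.container', 'adls.token']
```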

🔶 Google Cloud Storage (GCS)

Dependencies

Install PyIceberg with the gcsfs and pyarrow extras:

pip install "pyiceberg[gcsfs,pyarrow]"

Configuration (Authenticated)

from pyiceberg.catalog import load_catalog

catalog = load_catalog(
    "gcp_catalog",
    **{
        "type": "rest",
        "uri": "http://localhost:8080/v1/gcp_catalog",
        "token": "YOUR_JWT_TOKEN",
        
        # Enable Vending
        "header.X-Iceberg-Access-Delegation": "vended-credentials",
        
        # Optional: Direct Service Account Access
        # "gcs.project-id": "my-project",
        # "gcs.service-account-key": "/path/to/key.json"
    }
)

Supported Properties (PyIceberg Compatible)

| PyIceberg Property | Description | Vended by Pangolin |
| --- | --- | --- |
| `gcp-oauth-token` | OAuth2 access token for GCP authentication. | ✅ Yes (OAuth2 mode) |
| `gcp-project-id` | GCP project ID. | ✅ Yes (all modes) |
| `gcs.service-account-key` | Path to service account JSON key (client-provided). | ❌ No (client manages) |

Note: When using Pangolin's credential vending, the only credential you need to provide is `token`. Pangolin vends `gcp-oauth-token` and `gcp-project-id` automatically.
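
Because the JWT is the only secret the client supplies in this mode, it can be read from the environment rather than hard-coded. A minimal sketch, assuming an environment variable named `PANGOLIN_TOKEN` (a hypothetical name, not defined by Pangolin):

```python
import os


def gcp_catalog_properties(env=os.environ) -> dict:
    """Build REST catalog properties, reading the JWT from the environment."""
    token = env.get("PANGOLIN_TOKEN")  # hypothetical variable name
    if not token:
        raise RuntimeError("PANGOLIN_TOKEN is not set")
    return {
        "type": "rest",
        "uri": "http://localhost:8080/v1/gcp_catalog",
        "token": token,
        "header.X-Iceberg-Access-Delegation": "vended-credentials",
    }


# Usage: load_catalog("gcp_catalog", **gcp_catalog_properties())
```

Failing early when the variable is missing gives a clearer error than an authentication failure from the catalog later on.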

🗄️ On-Premise & S3-Compatible (MinIO)

For testing or private cloud deployments using MinIO or other S3-compatible storage.

Configuration

from pyiceberg.catalog import load_catalog

catalog = load_catalog(
    "minio",
    **{
        "type": "rest",
        "uri": "http://localhost:8080/v1/minio",
        "token": "YOUR_TOKEN",
        
        # Mandatory for MinIO
        "s3.endpoint": "http://localhost:9000",
        "s3.path-style-access": "true",
        "s3.region": "us-east-1",
    }
)

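| Property | Description |
| --- | --- |
| `s3.endpoint` | The full HTTP(S) URL of your MinIO server. |
| `s3.path-style-access` | Must be `true` for MinIO to use `endpoint/bucket/key` instead of `bucket.endpoint/key`. |
| `s3.region` | Region name used to sign requests. The example uses `us-east-1`. |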
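
To make the difference between the two addressing styles concrete, the sketch below builds the URL each style would produce for the same object. It is illustrative only: `object_url` is a hypothetical function, and the bucket name `warehouse` and object key are made up for the example.

```python
from urllib.parse import urlsplit


def object_url(endpoint: str, bucket: str, key: str, path_style: bool) -> str:
    """Show where the bucket name goes under each addressing style."""
    parts = urlsplit(endpoint)
    if path_style:
        # Path style: endpoint/bucket/key
        return f"{parts.scheme}://{parts.netloc}/{bucket}/{key}"
    # Virtual-hosted style: bucket.endpoint/key
    return f"{parts.scheme}://{bucket}.{parts.netloc}/{key}"


print(object_url("http://localhost:9000", "warehouse", "db/t/data.parquet", True))
# http://localhost:9000/warehouse/db/t/data.parquet
print(object_url("http://localhost:9000", "warehouse", "db/t/data.parquet", False))
# http://warehouse.localhost:9000/db/t/data.parquet
```

The second form only works if the hostname `warehouse.localhost` resolves to the MinIO server, which a local setup typically does not provide.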

🎛️ Hybrid Cloud Scenarios

Pangolin lets you manage catalogs across different clouds from a single control plane. For example, one tenant can have an S3 warehouse for analytics and a GCS warehouse for ML workloads.
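
On the client side, this amounts to one `load_catalog` call per catalog against the same Pangolin endpoint. A minimal sketch, assuming the URI pattern `<base>/v1/<catalog>` used in the examples above; the catalog names `analytics_s3` and `ml_gcs` are hypothetical:

```python
BASE_URL = "http://localhost:8080"  # same Pangolin server for every catalog


def catalog_properties(catalog_name: str, token: str) -> dict:
    """Properties for one catalog; storage credentials are left to vending."""
    return {
        "type": "rest",
        "uri": f"{BASE_URL}/v1/{catalog_name}",
        "token": token,
        "header.X-Iceberg-Access-Delegation": "vended-credentials",
    }


configs = {
    name: catalog_properties(name, "YOUR_JWT_TOKEN")
    for name in ("analytics_s3", "ml_gcs")  # hypothetical catalog names
}
# Usage: catalogs = {n: load_catalog(n, **p) for n, p in configs.items()}
```

With vending enabled, the client configuration is identical for both clouds apart from the catalog name, since the storage-specific properties come from each warehouse's configuration on the server.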

Key Considerations

Signer Implementation: Ensure your Pangolin server is built with the cloud SDK feature that matches your storage backend (`--features azure-oauth` or `--features gcp-oauth`).
Library Versions: Earlier versions of PyIceberg had limited support for non-S3 backends. We recommend using PyIceberg 0.7.0+ for the best multi-cloud experience.
Region Consistency: Ensure the region or location properties in your Pangolin Warehouse configuration match the physical bucket locations to minimize latency and costs.
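
The version recommendation can be checked at startup. The sketch below is one way to do it using only the standard library; `meets_minimum` is a hypothetical helper, and it compares only the leading numeric components, so a pre-release such as `0.7.0rc1` is treated the same as `0.7.0`.

```python
def meets_minimum(version: str, minimum=(0, 7, 0)) -> bool:
    """Compare the leading numeric components of a version string to a minimum."""
    parts = []
    for piece in version.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break  # stop at the first non-digit, e.g. the "rc1" in "0rc1"
            digits += ch
        parts.append(int(digits or 0))
    while len(parts) < 3:
        parts.append(0)  # pad short versions such as "1.0"
    return tuple(parts) >= tuple(minimum)


# With PyIceberg installed:
#   from importlib.metadata import version
#   assert meets_minimum(version("pyiceberg"))
print(meets_minimum("0.6.1"), meets_minimum("0.7.0"), meets_minimum("0.10.0"))
# False True True
```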
