Missing AWS Profile Support in PyIceberg #2841

@thomas-pfeiffer

Feature Request / Improvement

Feature: Missing AWS Profile Support in PyIceberg / PyIceberg should support AWS profiles

Description:
AWS profiles are a convenient way to work with multiple AWS configs / credentials in parallel. Ideally, PyIceberg should therefore support AWS profiles as well, which it currently does not.

Current state (as of writing: PyIceberg v0.10.0):

  • The Glue part of the GlueCatalog can be configured to use a profile, either by passing a Glue client explicitly to the GlueCatalog or via the glue.profile-name config parameter:
from boto3 import Session
from pyiceberg.catalog.glue import GlueCatalog
...
catalog = GlueCatalog(name="your_glue_catalog", client=Session(profile_name="your_aws_profile").client("glue"), ...)

or

catalog = GlueCatalog(
    name="your_glue_catalog",
    **{ 
        "glue.profile-name": "your_aws_profile",
        ...
    },
)
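
  • The S3 part has no equivalent profile parameter, even though s3fs itself can be given a profile through its session:
from s3fs import S3FileSystem
from aiobotocore.session import AioSession
...
fs = S3FileSystem(session=AioSession(profile="your_aws_profile"), ...)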

Workaround for this feature gap (resolve the profile's credentials with boto3 and pass them to the catalog explicitly):

session = Session(profile_name="your_aws_profile")
credentials = session.get_credentials()  
if credentials is None:
    raise ValueError("Could not retrieve credentials for profile")
catalog = GlueCatalog(
    name="your_glue_catalog",
    **{ 
        "client.access-key-id": credentials.access_key,
        "client.secret-access-key": credentials.secret_key,
        "client.session-token": credentials.token,
        ...
    },
)
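
The workaround above can be wrapped in a small helper. This is a minimal sketch, not part of PyIceberg: credentials_to_properties is a hypothetical name, and it assumes that leaving out client.session-token is an acceptable way to express that a profile has no session token (profiles with long-lived keys return token=None).

```python
from typing import Dict, Optional, Protocol


class CredentialsLike(Protocol):
    """Shape of the object returned by boto3's Session.get_credentials()."""

    access_key: str
    secret_key: str
    token: Optional[str]


def credentials_to_properties(credentials: Optional[CredentialsLike]) -> Dict[str, str]:
    """Map resolved AWS credentials to the unified client.* properties."""
    if credentials is None:
        raise ValueError("Could not retrieve credentials for profile")
    properties = {
        "client.access-key-id": credentials.access_key,
        "client.secret-access-key": credentials.secret_key,
    }
    # Assumption: profiles with long-lived keys have no session token,
    # so the key is left out instead of being set to None.
    if credentials.token:
        properties["client.session-token"] = credentials.token
    return properties
```

With boto3, the result of Session(profile_name="your_aws_profile").get_credentials() could be passed to this helper and the returned dict unpacked into GlueCatalog(...). The credentials are copied once, so temporary credentials are not refreshed when they expire.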

To-Be / Expected Behavior:

  1. PyIceberg should have new client.profile-name and s3.profile-name configuration parameters (next to the existing glue.profile-name).
  2. The new client.profile-name should also set glue.profile-name (same behaviour as for all the other unified AWS credentials).
  3. For now, AWS profile support should be implemented for the fsspec backend only, i.e. client.profile-name and s3.profile-name should only be supported when using the fsspec backend ("py-io-impl": "pyiceberg.io.fsspec.FsspecFileIO").
  4. Once PyArrow supports AWS profile names (see [Python][C++] Add Profile support to S3FileSystem arrow#47880), AWS profile support should be implemented for the PyArrow backend as well, so that client.profile-name and s3.profile-name are fully supported.
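
Points 1 to 3 can be illustrated with a short sketch of how the lookup might work. This is not PyIceberg code: resolve_profile_name is a hypothetical helper, the precedence (service-specific key first, then client.profile-name) is an assumption modelled on the unified client.* credentials mentioned in point 2, and raising an error for non-fsspec backends is only one possible way to express the restriction in point 3.

```python
from typing import Dict, Optional

FSSPEC_FILE_IO = "pyiceberg.io.fsspec.FsspecFileIO"


def resolve_profile_name(properties: Dict[str, str], service: str) -> Optional[str]:
    """Return the AWS profile for `service` ("glue" or "s3"), or None if unset.

    Assumed precedence: "<service>.profile-name" wins over "client.profile-name".
    """
    profile = properties.get(f"{service}.profile-name") or properties.get("client.profile-name")
    # Assumption: until PyArrow supports profiles, S3 profiles require fsspec.
    if service == "s3" and profile and properties.get("py-io-impl") != FSSPEC_FILE_IO:
        raise ValueError("s3 profile names are only supported with FsspecFileIO")
    return profile


props = {
    "client.profile-name": "shared",
    "glue.profile-name": "glue-only",
    "py-io-impl": FSSPEC_FILE_IO,
}
print(resolve_profile_name(props, "glue"))  # service-specific key takes precedence
print(resolve_profile_name(props, "s3"))    # falls back to client.profile-name
```

Falling back from the service-specific key to the client.* key would keep the new parameters consistent with the existing glue.profile-name, which would continue to work unchanged.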

Remark: I found this feature gap with the GlueCatalog; the RestCatalog might be equally affected, but I have not verified this.
Issues possibly related to this issue: #570, #1207, #2657
