Commit 382a15b

AWS profile support to glue and fsspec s3 fileio (#2948)
Closes #2841

# Rationale for this change

This PR adds explicit AWS profile support for both the Glue catalog client and fsspec-based S3 FileIO.

While `GlueCatalog` already supports profile configuration, fsspec-based S3 operations did not propagate profile selection to the underlying `S3FileSystem` or async AWS session. As a result, users had to rely on environment variables or the default AWS profile, which makes it difficult to work with multiple AWS configurations in parallel.

This change introduces two configuration properties:

- `client.profile-name`: a unified AWS profile for the catalog client and FileIO
- `s3.profile-name`: an AWS profile specifically for S3 FileIO

Profile resolution follows this precedence:

1. `s3.profile-name`
2. `client.profile-name`

This ensures consistent and explicit credential selection across catalog and FileIO layers when using the fsspec backend.

## Are these changes tested?

Yes. New unit tests were added to validate the profile propagation behavior.

- **Glue Catalog**
  - Verifies that `boto3.Session(profile_name=...)` is created when initializing `GlueCatalog` with `client.profile-name`.
- **S3 FileIO (fsspec)**
  - Verifies that `client.profile-name` or `s3.profile-name` results in the creation of an async AWS session with the correct profile, which is then passed to `S3FileSystem`.

The tests were run locally with:

```bash
pytest tests/catalog/test_glue_profile.py tests/io/test_fsspec_profile.py
```

Output would be:

```
==================== test session starts =====================
platform darwin -- Python 3.12.4, pytest-9.0.2, pluggy-1.6.0
rootdir: ${ROOTDIR}/iceberg-python
configfile: pyproject.toml
plugins: anyio-4.2.0, lazy-fixture-0.6.3, requests-mock-1.12.1
collected 3 items

tests/catalog/test_glue_profile.py .                    [ 33%]
tests/io/test_fsspec_profile.py ..                      [100%]

===================== 3 passed in 1.02s ======================
```

## Are there any user-facing changes?

Yes, this adds new configuration properties that users can set:

- `client.profile-name`: Sets the AWS profile for both the catalog client and FileIO (unified configuration).
- `s3.profile-name`: Sets the AWS profile specifically for S3 FileIO.

**Example Usage:**

```python
catalog = GlueCatalog(
    "my_catalog",
    **{
        "client.profile-name": "my-aws-profile",
        # ... other config
    }
)
```
1 parent 57fc3f1 commit 382a15b

6 files changed

+192
-4
lines changed

mkdocs/docs/configuration.md

Lines changed: 3 additions & 1 deletion
@@ -115,6 +115,7 @@ For the FileIO there are several configuration options available:
 | s3.access-key-id | admin | Configure the static access key id used to access the FileIO. |
 | s3.secret-access-key | password | Configure the static secret access key used to access the FileIO. |
 | s3.session-token | AQoDYXdzEJr... | Configure the static session token used to access the FileIO. |
+| s3.profile-name | default | Configure the AWS profile used to access the S3 FileIO. |
 | s3.role-session-name | session | An optional identifier for the assumed role session. |
 | s3.role-arn | arn:aws:... | AWS Role ARN. If provided instead of access_key and secret_key, temporary credentials will be fetched by assuming this role. |
 | s3.signer | bearer | Configure the signature version of the FileIO. |
@@ -720,7 +721,7 @@ catalog:
 | glue.id | 111111111111 | Configure the 12-digit ID of the Glue Catalog |
 | glue.skip-archive | true | Configure whether to skip the archival of older table versions. Default to true |
 | glue.endpoint | <https://glue.us-east-1.amazonaws.com> | Configure an alternative endpoint of the Glue service for GlueCatalog to access |
-| glue.profile-name | default | Configure the static profile used to access the Glue Catalog |
+| glue.profile-name | default | Configure the AWS profile used to access the Glue Catalog |
 | glue.region | us-east-1 | Set the region of the Glue Catalog |
 | glue.access-key-id | admin | Configure the static access key id used to access the Glue Catalog |
 | glue.secret-access-key | password | Configure the static secret access key used to access the Glue Catalog |
@@ -826,6 +827,7 @@ configures the AWS credentials for both Glue Catalog and S3 FileIO.
 | client.access-key-id | admin | Configure the static access key id used to access both the Glue/DynamoDB Catalog and the S3 FileIO |
 | client.secret-access-key | password | Configure the static secret access key used to access both the Glue/DynamoDB Catalog and the S3 FileIO |
 | client.session-token | AQoDYXdzEJr... | Configure the static session token used to access both the Glue/DynamoDB Catalog and the S3 FileIO |
+| client.profile-name | default | Configure the AWS profile used to access both the Glue/DynamoDB Catalog and the S3 FileIO |
 | client.role-session-name | session | An optional identifier for the assumed role session. |
 | client.role-arn | arn:aws:... | AWS Role ARN. If provided instead of access_key and secret_key, temporary credentials will be fetched by assuming this role. |
 
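
As a rough illustration of how the documented properties might be combined, the sketch below builds a property dictionary that uses one profile for the Glue client and a different one for S3. The catalog name and profile names are placeholders. The `py-io-impl` entry is included on the assumption that the fsspec FileIO has to be selected explicitly, since this commit only wires `s3.profile-name` into the fsspec backend.

```python
# Hypothetical catalog properties; the profile names are placeholders and
# would need to exist in the local AWS configuration (e.g. ~/.aws/config).
catalog_properties = {
    "type": "glue",
    # Assumption: select the fsspec FileIO, the backend this commit touches.
    "py-io-impl": "pyiceberg.io.fsspec.FsspecFileIO",
    # Used by the Glue client, and by S3 FileIO unless overridden below.
    "client.profile-name": "shared-profile",
    # Overrides client.profile-name for S3 FileIO only.
    "s3.profile-name": "data-lake-profile",
}

# With pyiceberg installed and the profiles configured, the dictionary
# could be passed on like this:
#   from pyiceberg.catalog import load_catalog
#   catalog = load_catalog("my_catalog", **catalog_properties)
```

Leaving out `s3.profile-name` would make both layers fall back to `client.profile-name`.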

pyiceberg/catalog/glue.py

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -48,7 +48,7 @@
4848
NoSuchTableError,
4949
TableAlreadyExistsError,
5050
)
51-
from pyiceberg.io import AWS_ACCESS_KEY_ID, AWS_REGION, AWS_SECRET_ACCESS_KEY, AWS_SESSION_TOKEN
51+
from pyiceberg.io import AWS_ACCESS_KEY_ID, AWS_PROFILE_NAME, AWS_REGION, AWS_SECRET_ACCESS_KEY, AWS_SESSION_TOKEN
5252
from pyiceberg.partitioning import UNPARTITIONED_PARTITION_SPEC, PartitionSpec
5353
from pyiceberg.schema import Schema, SchemaVisitor, visit
5454
from pyiceberg.serializers import FromInputFile
@@ -329,7 +329,7 @@ def __init__(self, name: str, client: Optional["GlueClient"] = None, **propertie
329329
retry_mode_prop_value = get_first_property_value(properties, GLUE_RETRY_MODE)
330330

331331
session = boto3.Session(
332-
profile_name=properties.get(GLUE_PROFILE_NAME),
332+
profile_name=get_first_property_value(properties, GLUE_PROFILE_NAME, AWS_PROFILE_NAME),
333333
region_name=get_first_property_value(properties, GLUE_REGION, AWS_REGION),
334334
botocore_session=properties.get(BOTOCORE_SESSION),
335335
aws_access_key_id=get_first_property_value(properties, GLUE_ACCESS_KEY_ID, AWS_ACCESS_KEY_ID),
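
The change above makes the Glue session prefer `glue.profile-name` and fall back to `client.profile-name`. The following is a minimal sketch of that first-match lookup. `first_property_value` is a hypothetical stand-in written for this example, not pyiceberg's own `get_first_property_value`, whose exact behavior is not shown in this diff.

```python
from typing import Optional


def first_property_value(properties: dict[str, str], *names: str) -> Optional[str]:
    """Return the value of the first listed key that is set, else None.

    Illustrative stand-in only; not the library's implementation.
    """
    for name in names:
        value = properties.get(name)
        if value:
            return value
    return None


# Both keys set: the service-specific key is listed first, so it wins.
both = {"glue.profile-name": "glue-profile", "client.profile-name": "client-profile"}
glue_profile = first_property_value(both, "glue.profile-name", "client.profile-name")

# Only the unified key set: the lookup falls through to it.
unified_only = {"client.profile-name": "client-profile"}
fallback_profile = first_property_value(unified_only, "glue.profile-name", "client.profile-name")
```

Under these assumptions, `glue_profile` resolves to the Glue-specific value and `fallback_profile` to the unified one, which matches what `test_glue_profile_precedence` asserts for the real catalog.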

pyiceberg/io/__init__.py

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -41,12 +41,14 @@
4141

4242
logger = logging.getLogger(__name__)
4343

44+
AWS_PROFILE_NAME = "client.profile-name"
4445
AWS_REGION = "client.region"
4546
AWS_ACCESS_KEY_ID = "client.access-key-id"
4647
AWS_SECRET_ACCESS_KEY = "client.secret-access-key"
4748
AWS_SESSION_TOKEN = "client.session-token"
4849
AWS_ROLE_ARN = "client.role-arn"
4950
AWS_ROLE_SESSION_NAME = "client.role-session-name"
51+
S3_PROFILE_NAME = "s3.profile-name"
5052
S3_ANONYMOUS = "s3.anonymous"
5153
S3_ENDPOINT = "s3.endpoint"
5254
S3_ACCESS_KEY_ID = "s3.access-key-id"

pyiceberg/io/fsspec.py

Lines changed: 12 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -51,6 +51,7 @@
5151
ADLS_TENANT_ID,
5252
ADLS_TOKEN,
5353
AWS_ACCESS_KEY_ID,
54+
AWS_PROFILE_NAME,
5455
AWS_REGION,
5556
AWS_SECRET_ACCESS_KEY,
5657
AWS_SESSION_TOKEN,
@@ -71,6 +72,7 @@
7172
S3_CONNECT_TIMEOUT,
7273
S3_ENDPOINT,
7374
S3_FORCE_VIRTUAL_ADDRESSING,
75+
S3_PROFILE_NAME,
7476
S3_PROXY_URI,
7577
S3_REGION,
7678
S3_REQUEST_TIMEOUT,
@@ -205,7 +207,16 @@ def _s3(properties: Properties) -> AbstractFileSystem:
205207
else:
206208
anon = False
207209

208-
fs = S3FileSystem(anon=anon, client_kwargs=client_kwargs, config_kwargs=config_kwargs)
210+
s3_fs_kwargs = {
211+
"anon": anon,
212+
"client_kwargs": client_kwargs,
213+
"config_kwargs": config_kwargs,
214+
}
215+
216+
if profile_name := get_first_property_value(properties, S3_PROFILE_NAME, AWS_PROFILE_NAME):
217+
s3_fs_kwargs["profile"] = profile_name
218+
219+
fs = S3FileSystem(**s3_fs_kwargs)
209220

210221
for event_name, event_function in register_events.items():
211222
fs.s3.meta.events.unregister(event_name, unique_id=1925)
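
The hunk above only adds a `profile` keyword when a profile is actually configured, so callers without a profile construct `S3FileSystem` exactly as before. Here is a small sketch of that pattern in isolation. `build_s3_fs_kwargs` is a hypothetical helper invented for this example; it leaves `client_kwargs` and `config_kwargs` empty and does not import s3fs.

```python
from typing import Any


def build_s3_fs_kwargs(properties: dict[str, str]) -> dict[str, Any]:
    """Assemble filesystem kwargs, adding "profile" only when one is set.

    Hypothetical helper mirroring the kwargs assembly in the diff; the
    real code also fills client_kwargs and config_kwargs from properties.
    """
    kwargs: dict[str, Any] = {"anon": False, "client_kwargs": {}, "config_kwargs": {}}
    # s3.profile-name takes precedence over client.profile-name.
    profile = properties.get("s3.profile-name") or properties.get("client.profile-name")
    if profile:
        kwargs["profile"] = profile
    return kwargs


with_s3_profile = build_s3_fs_kwargs(
    {"s3.profile-name": "s3-profile", "client.profile-name": "client-profile"}
)
without_profile = build_s3_fs_kwargs({"s3.endpoint": "http://localhost:9000"})
```

Omitting the key, rather than passing `profile=None`, keeps the call unchanged for existing users, which is presumably why the diff builds the dictionary first and unpacks it afterwards.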

tests/catalog/test_glue_profile.py

Lines changed: 67 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,67 @@
1+
# Licensed to the Apache Software Foundation (ASF) under one
2+
# or more contributor license agreements. See the NOTICE file
3+
# distributed with this work for additional information
4+
# regarding copyright ownership. The ASF licenses this file
5+
# to you under the Apache License, Version 2.0 (the
6+
# "License"); you may not use this file except in compliance
7+
# with the License. You may obtain a copy of the License at
8+
#
9+
# http://www.apache.org/licenses/LICENSE-2.0
10+
#
11+
# Unless required by applicable law or agreed to in writing,
12+
# software distributed under the License is distributed on an
13+
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
14+
# KIND, either express or implied. See the License for the
15+
# specific language governing permissions and limitations
16+
# under the License.
17+
18+
from unittest import mock
19+
20+
from moto import mock_aws
21+
22+
from pyiceberg.catalog.glue import GlueCatalog
23+
from pyiceberg.typedef import Properties
24+
from tests.conftest import UNIFIED_AWS_SESSION_PROPERTIES
25+
26+
27+
@mock_aws
28+
def test_passing_client_profile_name_properties_to_glue() -> None:
29+
session_properties: Properties = {
30+
"client.profile-name": "profile_name",
31+
**UNIFIED_AWS_SESSION_PROPERTIES,
32+
}
33+
34+
with mock.patch("boto3.Session") as mock_session:
35+
test_catalog = GlueCatalog("glue", **session_properties)
36+
37+
mock_session.assert_called_with(
38+
aws_access_key_id="client.access-key-id",
39+
aws_secret_access_key="client.secret-access-key",
40+
aws_session_token="client.session-token",
41+
region_name="client.region",
42+
profile_name="profile_name",
43+
botocore_session=None,
44+
)
45+
assert test_catalog.glue is mock_session().client()
46+
47+
48+
@mock_aws
49+
def test_glue_profile_precedence() -> None:
50+
session_properties: Properties = {
51+
"glue.profile-name": "glue-profile",
52+
"client.profile-name": "client-profile",
53+
**UNIFIED_AWS_SESSION_PROPERTIES,
54+
}
55+
56+
with mock.patch("boto3.Session") as mock_session:
57+
test_catalog = GlueCatalog("glue", **session_properties)
58+
59+
mock_session.assert_called_with(
60+
aws_access_key_id="client.access-key-id",
61+
aws_secret_access_key="client.secret-access-key",
62+
aws_session_token="client.session-token",
63+
region_name="client.region",
64+
profile_name="glue-profile",
65+
botocore_session=None,
66+
)
67+
assert test_catalog.glue is mock_session().client()

tests/io/test_fsspec_profile.py

Lines changed: 106 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,106 @@
1+
# Licensed to the Apache Software Foundation (ASF) under one
2+
# or more contributor license agreements. See the NOTICE file
3+
# distributed with this work for additional information
4+
# regarding copyright ownership. The ASF licenses this file
5+
# to you under the Apache License, Version 2.0 (the
6+
# "License"); you may not use this file except in compliance
7+
# with the License. You may obtain a copy of the License at
8+
#
9+
# http://www.apache.org/licenses/LICENSE-2.0
10+
#
11+
# Unless required by applicable law or agreed to in writing,
12+
# software distributed under the License is distributed on an
13+
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
14+
# KIND, either express or implied. See the License for the
15+
# specific language governing permissions and limitations
16+
# under the License.
17+
18+
19+
import uuid
20+
from unittest import mock
21+
22+
from pyiceberg.io.fsspec import FsspecFileIO
23+
from pyiceberg.typedef import Properties
24+
from tests.conftest import UNIFIED_AWS_SESSION_PROPERTIES
25+
26+
27+
def test_fsspec_s3_session_properties_with_profile() -> None:
28+
session_properties: Properties = {
29+
"s3.profile-name": "test-profile",
30+
"s3.endpoint": "http://localhost:9000",
31+
**UNIFIED_AWS_SESSION_PROPERTIES,
32+
}
33+
34+
with mock.patch("s3fs.S3FileSystem") as mock_s3fs:
35+
s3_fileio = FsspecFileIO(properties=session_properties)
36+
filename = str(uuid.uuid4())
37+
38+
s3_fileio.new_input(location=f"s3://warehouse/{filename}")
39+
40+
mock_s3fs.assert_called_with(
41+
anon=False,
42+
client_kwargs={
43+
"endpoint_url": "http://localhost:9000",
44+
"aws_access_key_id": "client.access-key-id",
45+
"aws_secret_access_key": "client.secret-access-key",
46+
"region_name": "client.region",
47+
"aws_session_token": "client.session-token",
48+
},
49+
config_kwargs={},
50+
profile="test-profile",
51+
)
52+
53+
54+
def test_fsspec_s3_session_properties_with_client_profile() -> None:
55+
session_properties: Properties = {
56+
"client.profile-name": "test-profile",
57+
"s3.endpoint": "http://localhost:9000",
58+
**UNIFIED_AWS_SESSION_PROPERTIES,
59+
}
60+
61+
with mock.patch("s3fs.S3FileSystem") as mock_s3fs:
62+
s3_fileio = FsspecFileIO(properties=session_properties)
63+
filename = str(uuid.uuid4())
64+
65+
s3_fileio.new_input(location=f"s3://warehouse/{filename}")
66+
67+
mock_s3fs.assert_called_with(
68+
anon=False,
69+
client_kwargs={
70+
"endpoint_url": "http://localhost:9000",
71+
"aws_access_key_id": "client.access-key-id",
72+
"aws_secret_access_key": "client.secret-access-key",
73+
"region_name": "client.region",
74+
"aws_session_token": "client.session-token",
75+
},
76+
config_kwargs={},
77+
profile="test-profile",
78+
)
79+
80+
81+
def test_fsspec_s3_session_properties_with_s3_and_client_profile() -> None:
82+
session_properties: Properties = {
83+
"s3.profile-name": "s3-profile",
84+
"client.profile-name": "client-profile",
85+
"s3.endpoint": "http://localhost:9000",
86+
**UNIFIED_AWS_SESSION_PROPERTIES,
87+
}
88+
89+
with mock.patch("s3fs.S3FileSystem") as mock_s3fs:
90+
s3_fileio = FsspecFileIO(properties=session_properties)
91+
filename = str(uuid.uuid4())
92+
93+
s3_fileio.new_input(location=f"s3://warehouse/{filename}")
94+
95+
mock_s3fs.assert_called_with(
96+
anon=False,
97+
client_kwargs={
98+
"endpoint_url": "http://localhost:9000",
99+
"aws_access_key_id": "client.access-key-id",
100+
"aws_secret_access_key": "client.secret-access-key",
101+
"region_name": "client.region",
102+
"aws_session_token": "client.session-token",
103+
},
104+
config_kwargs={},
105+
profile="s3-profile",
106+
)
