Commit 6437502

Disable resolve by default
1 parent 1e3e360 commit 6437502

4 files changed, with 32 additions and 24 deletions


mkdocs/docs/configuration.md

Lines changed: 17 additions & 16 deletions
@@ -108,22 +108,23 @@ For the FileIO there are several configuration options available:
 
 <!-- markdown-link-check-disable -->
 
-| Key | Example | Description |
-|----------------------|----------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
-| s3.endpoint | <https://10.0.19.25/> | Configure an alternative endpoint of the S3 service for the FileIO to access. This could be used to use S3FileIO with any s3-compatible object storage service that has a different endpoint, or access a private S3 endpoint in a virtual private cloud. |
-| s3.access-key-id | admin | Configure the static access key id used to access the FileIO. |
-| s3.secret-access-key | password | Configure the static secret access key used to access the FileIO. |
-| s3.session-token | AQoDYXdzEJr... | Configure the static session token used to access the FileIO. |
-| s3.role-session-name | session | An optional identifier for the assumed role session. |
-| s3.role-arn | arn:aws:... | AWS Role ARN. If provided instead of access_key and secret_key, temporary credentials will be fetched by assuming this role. |
-| s3.signer | bearer | Configure the signature version of the FileIO. |
-| s3.signer.uri | <http://my.signer:8080/s3> | Configure the remote signing uri if it differs from the catalog uri. Remote signing is only implemented for `FsspecFileIO`. The final request is sent to `<s3.signer.uri>/<s3.signer.endpoint>`. |
-| s3.signer.endpoint | v1/main/s3-sign | Configure the remote signing endpoint. Remote signing is only implemented for `FsspecFileIO`. The final request is sent to `<s3.signer.uri>/<s3.signer.endpoint>`. (default : v1/aws/s3/sign). |
-| s3.region | us-west-2 | Configure the default region used to initialize an `S3FileSystem`. `PyArrowFileIO` attempts to automatically resolve the region for each S3 bucket, falling back to this value if resolution fails. |
-| s3.proxy-uri | <http://my.proxy.com:8080> | Configure the proxy server to be used by the FileIO. |
-| s3.connect-timeout | 60.0 | Configure socket connection timeout, in seconds. |
-| s3.request-timeout | 60.0 | Configure socket read timeouts on Windows and macOS, in seconds. |
-| s3.force-virtual-addressing | False | Whether to use virtual addressing of buckets. If true, then virtual addressing is always enabled. If false, then virtual addressing is only enabled if endpoint_override is empty. This can be used for non-AWS backends that only support virtual hosted-style access. |
+| Key | Example | Description |
+|-----------------------------|----------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| s3.endpoint | <https://10.0.19.25/> | Configure an alternative endpoint of the S3 service for the FileIO to access. This could be used to use S3FileIO with any s3-compatible object storage service that has a different endpoint, or access a private S3 endpoint in a virtual private cloud. |
+| s3.access-key-id | admin | Configure the static access key id used to access the FileIO. |
+| s3.secret-access-key | password | Configure the static secret access key used to access the FileIO. |
+| s3.session-token | AQoDYXdzEJr... | Configure the static session token used to access the FileIO. |
+| s3.role-session-name | session | An optional identifier for the assumed role session. |
+| s3.role-arn | arn:aws:... | AWS Role ARN. If provided instead of access_key and secret_key, temporary credentials will be fetched by assuming this role. |
+| s3.signer | bearer | Configure the signature version of the FileIO. |
+| s3.signer.uri | <http://my.signer:8080/s3> | Configure the remote signing uri if it differs from the catalog uri. Remote signing is only implemented for `FsspecFileIO`. The final request is sent to `<s3.signer.uri>/<s3.signer.endpoint>`. |
+| s3.signer.endpoint | v1/main/s3-sign | Configure the remote signing endpoint. Remote signing is only implemented for `FsspecFileIO`. The final request is sent to `<s3.signer.uri>/<s3.signer.endpoint>`. (default : v1/aws/s3/sign). |
+| s3.region | us-west-2 | Configure the default region used to initialize an `S3FileSystem`. `PyArrowFileIO` automatically tries to resolve the region if this isn't set (only supported for AWS S3 Buckets). |
+| s3.resolve-region | False | Only supported for `PyArrowFileIO`. When enabled, it will always try to resolve the location of the bucket (only supported for AWS S3 Buckets). |
+| s3.proxy-uri | <http://my.proxy.com:8080> | Configure the proxy server to be used by the FileIO. |
+| s3.connect-timeout | 60.0 | Configure socket connection timeout, in seconds. |
+| s3.request-timeout | 60.0 | Configure socket read timeouts on Windows and macOS, in seconds. |
+| s3.force-virtual-addressing | False | Whether to use virtual addressing of buckets. If true, then virtual addressing is always enabled. If false, then virtual addressing is only enabled if endpoint_override is empty. This can be used for non-AWS backends that only support virtual hosted-style access. |
 
 <!-- markdown-link-check-enable-->
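
As a rough illustration of the two region-related rows in the table, the sketch below shows how a properties mapping might carry `s3.region` and `s3.resolve-region` as strings. The helper `parse_bool_property` and the accepted spellings of true and false are assumptions made for this example; it is not pyiceberg's own `property_as_bool`, whose exact parsing rules are not shown in this commit.

```python
from typing import Dict

# Property keys as they appear in the configuration table.
S3_REGION = "s3.region"
S3_RESOLVE_REGION = "s3.resolve-region"

# Assumed spellings; the real parser may accept a different set.
_TRUE = {"1", "true", "yes", "on"}
_FALSE = {"0", "false", "no", "off"}


def parse_bool_property(properties: Dict[str, str], key: str, default: bool) -> bool:
    """Hypothetical helper: read a string-valued property as a boolean."""
    raw = properties.get(key)
    if raw is None:
        return default
    value = raw.strip().lower()
    if value in _TRUE:
        return True
    if value in _FALSE:
        return False
    raise ValueError(f"Cannot interpret {key}={raw!r} as a boolean")


# Region pinned, resolution left at its documented default (disabled).
pinned = {S3_REGION: "us-west-2"}
# Region pinned, but bucket-region resolution explicitly requested.
resolving = {S3_REGION: "us-west-2", S3_RESOLVE_REGION: "true"}

print(parse_bool_property(pinned, S3_RESOLVE_REGION, False))
print(parse_bool_property(resolving, S3_RESOLVE_REGION, False))
```

Passing the flag as the string `"true"` matches how the updated test in this commit supplies it.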

pyiceberg/io/__init__.py

Lines changed: 1 addition & 0 deletions
@@ -59,6 +59,7 @@
 S3_SECRET_ACCESS_KEY = "s3.secret-access-key"
 S3_SESSION_TOKEN = "s3.session-token"
 S3_REGION = "s3.region"
+S3_RESOLVE_REGION = "s3.resolve-region"
 S3_PROXY_URI = "s3.proxy-uri"
 S3_CONNECT_TIMEOUT = "s3.connect-timeout"
 S3_REQUEST_TIMEOUT = "s3.request-timeout"

pyiceberg/io/pyarrow.py

Lines changed: 13 additions & 7 deletions
@@ -107,6 +107,7 @@
     S3_PROXY_URI,
     S3_REGION,
     S3_REQUEST_TIMEOUT,
+    S3_RESOLVE_REGION,
     S3_ROLE_ARN,
     S3_ROLE_SESSION_NAME,
     S3_SECRET_ACCESS_KEY,
@@ -427,15 +428,20 @@ def _initialize_oss_fs(self) -> FileSystem:
     def _initialize_s3_fs(self, netloc: Optional[str]) -> FileSystem:
         from pyarrow.fs import S3FileSystem
 
-        # Resolve region from netloc(bucket), fallback to user-provided region
         provided_region = get_first_property_value(self.properties, S3_REGION, AWS_REGION)
-        bucket_region = _cached_resolve_s3_region(bucket=netloc) or provided_region
 
-        if provided_region is not None and bucket_region != provided_region:
-            logger.warning(
-                f"PyArrow FileIO overriding S3 bucket region for bucket {netloc}: "
-                f"provided region {provided_region}, actual region {bucket_region}"
-            )
+        # Do this when we don't provide the region at all, or when we explicitly enable it
+        if provided_region is None or property_as_bool(self.properties, S3_RESOLVE_REGION, False) is True:
+            # Resolve region from netloc(bucket), fallback to user-provided region
+            # Only supported by buckets hosted by S3
+            bucket_region = _cached_resolve_s3_region(bucket=netloc) or provided_region
+            if provided_region is not None and bucket_region != provided_region:
+                logger.warning(
+                    f"PyArrow FileIO overriding S3 bucket region for bucket {netloc}: "
+                    f"provided region {provided_region}, actual region {bucket_region}"
+                )
+        else:
+            bucket_region = provided_region
 
         client_kwargs: Dict[str, Any] = {
             "endpoint_override": self.properties.get(S3_ENDPOINT),

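The branch added to `_initialize_s3_fs` can be read as a small decision function. The following is a minimal standalone sketch of that decision, not the library code: `select_region`, `fake_resolver`, and the bucket names are hypothetical, and the resolver is assumed to return `None` when it cannot determine a region.

```python
import logging
from typing import Callable, Optional

logger = logging.getLogger(__name__)


def select_region(
    bucket: Optional[str],
    provided_region: Optional[str],
    resolve_region: bool,
    resolver: Callable[[Optional[str]], Optional[str]],
) -> Optional[str]:
    """Pick a region for a bucket, following the branch structure in the hunk."""
    if provided_region is None or resolve_region:
        # Resolution runs; fall back to the provided region if it yields nothing.
        bucket_region = resolver(bucket) or provided_region
        if provided_region is not None and bucket_region != provided_region:
            logger.warning(
                "overriding region for bucket %s: provided %s, actual %s",
                bucket, provided_region, bucket_region,
            )
        return bucket_region
    # A region was given and resolution was not requested: use it unchanged.
    return provided_region


def fake_resolver(bucket: Optional[str]) -> Optional[str]:
    """Hypothetical stand-in for a region lookup; None means 'unknown bucket'."""
    return {"bucket-east": "us-east-1"}.get(bucket)


print(select_region("bucket-east", "us-west-2", False, fake_resolver))
print(select_region("bucket-east", "us-west-2", True, fake_resolver))
print(select_region("bucket-east", None, False, fake_resolver))
```

Under these assumptions the lookup is skipped entirely when a region is configured and the flag is off, which is the behavior change the commit title describes.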
tests/io/test_pyarrow.py

Lines changed: 1 addition & 1 deletion
@@ -2285,7 +2285,7 @@ def _s3_region_map(bucket: str) -> str:
         raise OSError("Unknown bucket")
 
     # For a pyarrow io instance with configured default s3 region
-    pyarrow_file_io = PyArrowFileIO({"s3.region": user_provided_region})
+    pyarrow_file_io = PyArrowFileIO({"s3.region": user_provided_region, "s3.resolve-region": "true"})
     with patch("pyarrow.fs.resolve_s3_region") as mock_s3_region_resolver:
         mock_s3_region_resolver.side_effect = _s3_region_map
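
The one-line test change suggests why the flag is needed: with resolution off by default, a FileIO that sets `s3.region` would presumably never reach the patched resolver. The sketch below reproduces that effect without pyiceberg or pyarrow. `region_for` is a hypothetical stand-in for the FileIO's region choice, the bucket and region names are invented, and treating `OSError` as "unknown" is an assumption about how the cached resolver behaves.

```python
from typing import Callable, Dict, Optional
from unittest.mock import Mock


def _s3_region_map(bucket: str) -> str:
    """Fake lookup in the style of the test helper; the mapping is invented."""
    regions = {"bucket-a": "us-east-1"}
    if bucket in regions:
        return regions[bucket]
    raise OSError("Unknown bucket")


def region_for(bucket: str, properties: Dict[str, str], resolver: Callable[[str], str]) -> Optional[str]:
    """Hypothetical stand-in for the FileIO's region choice."""
    provided = properties.get("s3.region")
    enabled = properties.get("s3.resolve-region", "false").lower() == "true"
    if provided is None or enabled:
        try:
            resolved = resolver(bucket)
        except OSError:
            resolved = None  # assumed: a failed lookup falls back to the provided region
        return resolved or provided
    return provided


resolver = Mock(side_effect=_s3_region_map)

# Without the flag, a configured region short-circuits the lookup.
assert region_for("bucket-a", {"s3.region": "eu-central-1"}, resolver) == "eu-central-1"
resolver.assert_not_called()

# With the flag, the resolver is consulted and its answer wins.
props = {"s3.region": "eu-central-1", "s3.resolve-region": "true"}
assert region_for("bucket-a", props, resolver) == "us-east-1"
resolver.assert_called_once_with("bucket-a")

# An unknown bucket falls back to the configured region.
assert region_for("bucket-missing", props, resolver) == "eu-central-1"
```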
