Commit 551f524

Fix read from multiple s3 regions (#1453)
* Take netloc into account for s3 filesystem when calling `_initialize_fs`
* Fix unit test for s3 filesystem
* Update ArrowScan to use different FileSystem per file
* Add unit test for `PyArrowFileIO.fs_by_scheme` cache behavior
* Add error handling
* Update tests/io/test_pyarrow.py
* Update `s3.region` document and a test case
* Add test case for `PyArrowFileIO.new_input` multi region
* Shuffle code location for better maintainability
* Comment for future integration test
* Typo fix
* Document wording
* Add warning when the bucket region for a file cannot be resolved (for `pyarrow.S3FileSystem`)
* Fix code linting
* Update mkdocs/docs/configuration.md
* Code refactoring
* Unit test
* Code refactoring
* Test cases
* Code format
* Code tidy-up
* Update pyiceberg/io/pyarrow.py

Co-authored-by: Kevin Liu <kevinjqliu@users.noreply.github.com>
1 parent e5bfa1e commit 551f524

4 files changed

Lines changed: 273 additions & 94 deletions

File tree

mkdocs/docs/configuration.md

Lines changed: 15 additions & 15 deletions
@@ -102,21 +102,21 @@ For the FileIO there are several configuration options available:
 
 <!-- markdown-link-check-disable -->
 
-| Key | Example | Description |
-|----------------------|----------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
-| s3.endpoint | <https://10.0.19.25/> | Configure an alternative endpoint of the S3 service for the FileIO to access. This could be used to use S3FileIO with any s3-compatible object storage service that has a different endpoint, or access a private S3 endpoint in a virtual private cloud. |
-| s3.access-key-id | admin | Configure the static access key id used to access the FileIO. |
-| s3.secret-access-key | password | Configure the static secret access key used to access the FileIO. |
-| s3.session-token | AQoDYXdzEJr... | Configure the static session token used to access the FileIO. |
-| s3.role-session-name | session | An optional identifier for the assumed role session. |
-| s3.role-arn | arn:aws:... | AWS Role ARN. If provided instead of access_key and secret_key, temporary credentials will be fetched by assuming this role. |
-| s3.signer | bearer | Configure the signature version of the FileIO. |
-| s3.signer.uri | <http://my.signer:8080/s3> | Configure the remote signing uri if it differs from the catalog uri. Remote signing is only implemented for `FsspecFileIO`. The final request is sent to `<s3.signer.uri>/<s3.signer.endpoint>`. |
-| s3.signer.endpoint | v1/main/s3-sign | Configure the remote signing endpoint. Remote signing is only implemented for `FsspecFileIO`. The final request is sent to `<s3.signer.uri>/<s3.signer.endpoint>`. (default : v1/aws/s3/sign). |
-| s3.region | us-west-2 | Sets the region of the bucket |
-| s3.proxy-uri | <http://my.proxy.com:8080> | Configure the proxy server to be used by the FileIO. |
-| s3.connect-timeout | 60.0 | Configure socket connection timeout, in seconds. |
-| s3.force-virtual-addressing | False | Whether to use virtual addressing of buckets. If true, then virtual addressing is always enabled. If false, then virtual addressing is only enabled if endpoint_override is empty. This can be used for non-AWS backends that only support virtual hosted-style access. |
+| Key | Example | Description |
+|----------------------|----------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| s3.endpoint | <https://10.0.19.25/> | Configure an alternative endpoint of the S3 service for the FileIO to access. This could be used to use S3FileIO with any s3-compatible object storage service that has a different endpoint, or access a private S3 endpoint in a virtual private cloud. |
+| s3.access-key-id | admin | Configure the static access key id used to access the FileIO. |
+| s3.secret-access-key | password | Configure the static secret access key used to access the FileIO. |
+| s3.session-token | AQoDYXdzEJr... | Configure the static session token used to access the FileIO. |
+| s3.role-session-name | session | An optional identifier for the assumed role session. |
+| s3.role-arn | arn:aws:... | AWS Role ARN. If provided instead of access_key and secret_key, temporary credentials will be fetched by assuming this role. |
+| s3.signer | bearer | Configure the signature version of the FileIO. |
+| s3.signer.uri | <http://my.signer:8080/s3> | Configure the remote signing uri if it differs from the catalog uri. Remote signing is only implemented for `FsspecFileIO`. The final request is sent to `<s3.signer.uri>/<s3.signer.endpoint>`. |
+| s3.signer.endpoint | v1/main/s3-sign | Configure the remote signing endpoint. Remote signing is only implemented for `FsspecFileIO`. The final request is sent to `<s3.signer.uri>/<s3.signer.endpoint>`. (default : v1/aws/s3/sign). |
+| s3.region | us-west-2 | Configure the default region used to initialize an `S3FileSystem`. `PyArrowFileIO` attempts to automatically resolve the region for each S3 bucket, falling back to this value if resolution fails. |
+| s3.proxy-uri | <http://my.proxy.com:8080> | Configure the proxy server to be used by the FileIO. |
+| s3.connect-timeout | 60.0 | Configure socket connection timeout, in seconds. |
+| s3.force-virtual-addressing | False | Whether to use virtual addressing of buckets. If true, then virtual addressing is always enabled. If false, then virtual addressing is only enabled if endpoint_override is empty. This can be used for non-AWS backends that only support virtual hosted-style access. |
 
 <!-- markdown-link-check-enable-->
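
The fallback described in the updated `s3.region` row can be illustrated with a minimal sketch. This is not the code from this commit: `region_for_location` and the injected `resolver` argument are hypothetical names chosen for illustration, and the exception types treated as "resolution failed" are an assumption. In practice the resolver could be something like `pyarrow.fs.resolve_s3_region`; it is passed in here so the sketch runs without network access.

```python
from typing import Callable, Dict, Optional
from urllib.parse import urlparse


def region_for_location(
    location: str,
    properties: Dict[str, str],
    resolver: Callable[[str], str],
) -> Optional[str]:
    """Pick a region for an s3:// location (hypothetical helper).

    The bucket is taken from the netloc of the URI. The resolver is asked
    for that bucket's region first; if it fails, the configured
    `s3.region` property is used as the default.
    """
    bucket = urlparse(location).netloc
    try:
        return resolver(bucket)
    except (OSError, TypeError):
        # Assumption: a failed lookup surfaces as one of these exceptions.
        # Fall back to the configured default, which may be absent (None).
        return properties.get("s3.region")
```

With this shape, files in buckets from different regions each get their own region, and `s3.region` only matters for buckets whose region cannot be resolved.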
