Commit ffaba2d

blarghmatey and Copilot committed
fix: use Duration strings for Polars object_store S3 timeout options
Polars' native Rust reader passes storage_options through to the object_store crate, which requires Duration strings (e.g. '10s', '120s') rather than plain numeric second values ('10', '120'). pyiceberg's PyArrowFileIO uses float(value) for the same keys, so the two callers need different formats.

Split into separate dicts: pyiceberg_s3_properties (plain numbers for GlueCatalog) and polars_s3_storage_options (duration strings for pl.scan_iceberg).

Fixes: object-store error: Generic Config error: failed to parse "120" as Duration

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
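
The format mismatch described in this message can be illustrated with a minimal Python sketch. `float()` is the parse the message attributes to pyiceberg's PyArrowFileIO. `parse_duration_seconds` and `accepts` are hypothetical names introduced here; the former is only a stand-in for object_store's Duration parsing and assumes a number followed by an `s` suffix (the real parser is written in Rust and likely accepts more units).

```python
import re


def parse_duration_seconds(value: str) -> float:
    """Hypothetical stand-in for a duration parser that requires a unit suffix.

    Assumption: only '<number>s' is accepted; this is not object_store's code.
    """
    match = re.fullmatch(r"(\d+(?:\.\d+)?)s", value.strip())
    if match is None:
        raise ValueError(f'failed to parse "{value}" as Duration')
    return float(match.group(1))


def accepts(parser, value: str) -> bool:
    """Return True if `parser` handles `value` without raising ValueError."""
    try:
        parser(value)
    except ValueError:
        return False
    return True
```

Under these assumptions, `accepts(float, "120")` holds while `accepts(float, "120s")` does not, and the reverse is true for `parse_duration_seconds`, so no single string satisfies both callers.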
1 parent fafb9d4 commit ffaba2d

1 file changed

Lines changed: 13 additions & 9 deletions

packages/ol-orchestrate-lib/src/ol_orchestrate/lib/glue_helper.py

@@ -106,22 +106,26 @@ def get_dbt_model_as_dataframe(database_name: str, table_name: str) -> pl.LazyFr
         KeyError: If the table metadata doesn't contain the expected fields
         boto3 exceptions: If the AWS Glue API call fails
     """
-    # s3.connect-timeout and s3.request-timeout apply to pyiceberg's PyArrowFileIO
-    # (used for manifest reads). They are also mapped by Polars 1.40+ to
-    # object_store's `connect_timeout` and `timeout` for its native Rust S3 reader,
-    # but only when passed explicitly as storage_options to pl.scan_iceberg().
-    # Without them, Polars' Tokio runtime accumulates CLOSE_WAIT S3 connections
-    # that block process exit indefinitely.
-    s3_storage_options = {
+    # pyiceberg's PyArrowFileIO expects timeout values as plain numeric seconds strings.
+    pyiceberg_s3_properties = {
         "s3.region": "us-east-1",
         "s3.connect-timeout": "10",
         "s3.request-timeout": "120",
     }
+    # Polars 1.40+ maps s3.connect-timeout / s3.request-timeout to object_store's
+    # `connect_timeout` / `timeout` for its native Rust S3 reader, preventing
+    # CLOSE_WAIT connections from blocking Tokio runtime shutdown at process exit.
+    # object_store requires Duration strings (e.g. "10s"), not bare integers.
+    polars_s3_storage_options = {
+        "s3.region": "us-east-1",
+        "s3.connect-timeout": "10s",
+        "s3.request-timeout": "120s",
+    }
     glue = GlueCatalog(
         "default",
         client=boto3.client("glue", region_name="us-east-1"),
-        **s3_storage_options,
+        **pyiceberg_s3_properties,
     )
     table = glue.load_table(f"{database_name}.{table_name}")
 
-    return pl.scan_iceberg(table, storage_options=s3_storage_options)
+    return pl.scan_iceberg(table, storage_options=polars_s3_storage_options)
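
The two dicts in this change differ only in the unit suffix on the timeout values, so one could be derived from the other rather than maintained by hand. The following is a minimal sketch of that idea, not part of the commit: `to_duration_options` and `TIMEOUT_KEYS` are hypothetical names, and the sketch assumes the pyiceberg values are always plain numeric seconds.

```python
# Keys whose values must carry a unit for the Polars/object_store side.
TIMEOUT_KEYS = frozenset({"s3.connect-timeout", "s3.request-timeout"})


def to_duration_options(properties: dict[str, str]) -> dict[str, str]:
    """Return a copy with plain-second timeout values rewritten as '<n>s'.

    Hypothetical helper: values already ending in 's' are passed through.
    """
    converted = {}
    for key, value in properties.items():
        if key in TIMEOUT_KEYS and not value.endswith("s"):
            float(value)  # raises ValueError if the value is not numeric
            converted[key] = f"{value}s"
        else:
            converted[key] = value
    return converted


pyiceberg_s3_properties = {
    "s3.region": "us-east-1",
    "s3.connect-timeout": "10",
    "s3.request-timeout": "120",
}
polars_s3_storage_options = to_duration_options(pyiceberg_s3_properties)
```

Whether this is preferable to two explicit literals is a judgment call: the literals in the commit are easier to read at a glance, while a derived dict keeps the timeouts from drifting apart.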
