fix: add S3 timeouts to GlueCatalog to prevent PyIceberg thread pool hangs

blarghmatey · Copilot · blarghmatey · commit 656275ec0d8d · 2026-05-20T11:56:43.000-04:00
PyIceberg's ExecutorFactory creates a singleton ThreadPoolExecutor that
uses executor.map() to read Iceberg manifest files from S3 in parallel
during plan_files(). Polars' native Iceberg reader path also invokes
plan_files() at collect() time.
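A minimal stdlib sketch of the pattern described above, assuming one shared (singleton) pool and a hypothetical `read_manifest` function standing in for the manifest reader. It is not PyIceberg's code. It shows that `executor.map()` yields results in input order, so consuming it waits on every task, and a single read that never returns stalls the whole call.

```python
from __future__ import annotations

import threading
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for PyIceberg's ExecutorFactory: one process-wide pool.
_EXECUTOR: ThreadPoolExecutor | None = None
_LOCK = threading.Lock()


def get_executor() -> ThreadPoolExecutor:
    """Return the shared pool, creating it on first use."""
    global _EXECUTOR
    with _LOCK:
        if _EXECUTOR is None:
            _EXECUTOR = ThreadPoolExecutor(max_workers=8)
        return _EXECUTOR


def read_manifest(path: str) -> list[str]:
    """Hypothetical manifest reader; a real one would fetch `path` from S3."""
    return [f"{path}#entry"]


def plan_files(manifest_paths: list[str]) -> list[str]:
    """Read all manifests in parallel and flatten their entries.

    executor.map() returns results in input order, so this loop cannot
    finish until every read_manifest call has returned. A read that never
    returns therefore blocks plan_files forever.
    """
    entries: list[str] = []
    for manifest_entries in get_executor().map(read_manifest, manifest_paths):
        entries.extend(manifest_entries)
    return entries
```

Passing `timeout=` to `executor.map()` would only bound how long the caller waits; a read that is already running keeps its worker thread, which is why bounding the S3 request itself is the more direct fix.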

When an S3 connection enters the CLOSE_WAIT state (the server has closed
its side but the client has not), the PyArrow S3FileSystem read blocks
indefinitely because no request timeout is configured. As a result,
executor.map() never returns, collect() never completes, and the Dagster
step hangs forever while holding a concurrency slot.
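The effect of a missing request timeout can be illustrated with a stdlib-only sketch: a local listener that completes the TCP handshake but never replies stands in for the stuck S3 endpoint. This is an analogy, not a reproduction of CLOSE_WAIT or of the AWS SDK's behavior; the helper names are hypothetical. With a socket timeout set, `recv()` raises instead of waiting forever.

```python
from __future__ import annotations

import socket


def start_silent_server() -> tuple[socket.socket, int]:
    """Listen on an ephemeral localhost port; connections never get a reply."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))
    server.listen(1)  # the kernel completes the handshake even without accept()
    port = server.getsockname()[1]
    return server, port


def read_with_timeout(port: int, timeout_s: float) -> str:
    """Send a request and wait at most timeout_s seconds for any reply."""
    with socket.create_connection(("127.0.0.1", port), timeout=timeout_s) as conn:
        conn.sendall(b"GET /manifest.avro HTTP/1.1\r\nHost: localhost\r\n\r\n")
        try:
            # With timeout=None this recv() would block indefinitely.
            data = conn.recv(1024)
        except socket.timeout:
            return "timed out"
        return f"received {len(data)} bytes"
```

Calling `read_with_timeout(port, 0.2)` against the silent server returns `"timed out"` after roughly 0.2 seconds, turning an unbounded wait into an error the caller can handle.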

Setting s3.connect-timeout (10 s) and s3.request-timeout (120 s) on the
GlueCatalog causes PyArrowFileIO to pass these values to
pyarrow.fs.S3FileSystem, bounding connection attempts to 10 seconds and
stuck S3 reads to 120 seconds. Affected steps now fail with a timeout
error that can be retried, rather than hanging indefinitely.
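A sketch of the property-to-filesystem translation described above. `s3_timeout_kwargs` is a hypothetical helper that mirrors the idea of PyArrowFileIO forwarding these properties; it is not PyIceberg's implementation. The output keys `connect_timeout` and `request_timeout` are the `pyarrow.fs.S3FileSystem` parameter names (seconds, as floats).

```python
from __future__ import annotations

# Property keys as used in the diff below.
S3_CONNECT_TIMEOUT = "s3.connect-timeout"
S3_REQUEST_TIMEOUT = "s3.request-timeout"


def s3_timeout_kwargs(properties: dict[str, str]) -> dict[str, float]:
    """Translate string catalog properties into S3FileSystem timeout kwargs.

    Hypothetical helper. Catalog properties arrive as strings, so they are
    parsed to float seconds. Absent keys are omitted so the filesystem
    falls back to its default.
    """
    kwargs: dict[str, float] = {}
    if (connect := properties.get(S3_CONNECT_TIMEOUT)) is not None:
        kwargs["connect_timeout"] = float(connect)
    if (request := properties.get(S3_REQUEST_TIMEOUT)) is not None:
        kwargs["request_timeout"] = float(request)
    return kwargs
```

With the values from this commit, `s3_timeout_kwargs({"s3.connect-timeout": "10", "s3.request-timeout": "120"})` yields `{"connect_timeout": 10.0, "request_timeout": 120.0}`, which could then be passed as `pyarrow.fs.S3FileSystem(region="us-east-1", **kwargs)`.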

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
diff --git a/packages/ol-orchestrate-lib/src/ol_orchestrate/lib/glue_helper.py b/packages/ol-orchestrate-lib/src/ol_orchestrate/lib/glue_helper.py
@@ -106,7 +106,14 @@ def get_dbt_model_as_dataframe(database_name: str, table_name: str) -> pl.LazyFr
         KeyError: If the table metadata doesn't contain the expected fields
         boto3 exceptions: If the AWS Glue API call fails
     """
-    glue = GlueCatalog("default", client=boto3.client("glue", region_name="us-east-1"))
+    glue = GlueCatalog(
+        "default",
+        client=boto3.client("glue", region_name="us-east-1"),
+        **{
+            "s3.connect-timeout": "10",
+            "s3.request-timeout": "120",
+        },
+    )
     table = glue.load_table(f"{database_name}.{table_name}")
 
     return table.to_polars()
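The `**{...}` unpacking in the diff is needed because property names such as `s3.connect-timeout` are not valid Python identifiers and cannot be written as keyword arguments. The sketch below uses a hypothetical `fake_catalog` with the same call shape as the diff's `GlueCatalog(...)` call to show that the keys reach `**properties` unchanged.

```python
from __future__ import annotations

from typing import Any


def fake_catalog(name: str, client: Any = None, **properties: str) -> dict[str, str]:
    """Hypothetical stand-in mirroring the (name, client=..., **properties) call."""
    return properties


# `s3.connect-timeout="10"` would be a syntax error; unpacking a dict
# passes non-identifier keys through **properties as-is.
props = fake_catalog(
    "default",
    client=None,
    **{
        "s3.connect-timeout": "10",
        "s3.request-timeout": "120",
    },
)
```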