feat(datafusion): add try_new_with_runtime to route catalog IO off the caller's runtime #20

Merged: toutane merged 1 commit into branch-0.9 from charlesantoine.leger/feat-scan-on-io-runtime on May 12, 2026
@toutane toutane commented May 12, 2026

When DataFusion runs with a split CPU/IO runtime, catalog IO (load_table, plan_files, manifest fetches) runs by default on the runtime that calls the provider, which is typically the CPU runtime. There it can block worker threads or starve CPU-bound tasks.

Changes

Adds try_new_with_runtime(…, runtime: Option<Runtime>) to IcebergTableProvider, IcebergSchemaProvider, and IcebergCatalogProvider.

When a Runtime is provided, the IO-bound work in the following operations is dispatched to runtime.io() via a new run_on_io helper:

  • scan: load_table + plan_files + manifest fetches
  • insert_into: load_table
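
The dispatch rule described above can be modeled with a small synchronous analogue. This is only a sketch built on `std` threads, not the PR's async implementation: `IoRuntime`, its `spawn` method, and this blocking `run_on_io` signature are hypothetical stand-ins, and running the work inline when no runtime is given is an assumption drawn from the description. The real helper spawns a future on `runtime.io()` and the caller awaits a join handle instead of blocking.

```rust
use std::sync::mpsc;
use std::thread;

/// A unit of work handed to the IO worker.
type Job = Box<dyn FnOnce() + Send + 'static>;

/// Hypothetical stand-in for the IO side of a split runtime:
/// one dedicated worker thread draining a queue of jobs.
pub struct IoRuntime {
    sender: mpsc::Sender<Job>,
}

impl IoRuntime {
    pub fn new() -> Self {
        let (sender, receiver) = mpsc::channel::<Job>();
        // The worker exits on its own once every sender has been dropped.
        let _worker = thread::Builder::new()
            .name("io-worker".to_string())
            .spawn(move || {
                for job in receiver {
                    job();
                }
            })
            .expect("failed to spawn io worker");
        Self { sender }
    }

    /// Queue `work` on the IO worker and return a receiver for its result
    /// (playing the role of a join handle in this sketch).
    fn spawn<T, F>(&self, work: F) -> mpsc::Receiver<T>
    where
        F: FnOnce() -> T + Send + 'static,
        T: Send + 'static,
    {
        let (tx, rx) = mpsc::channel();
        let job: Job = Box::new(move || {
            // Ignore the error if the caller stopped waiting for the result.
            let _ = tx.send(work());
        });
        self.sender.send(job).expect("io worker stopped");
        rx
    }
}

/// With a runtime, run `work` on the IO worker and wait for the result;
/// without one, run it on the calling thread (assumed fallback).
pub fn run_on_io<T, F>(runtime: Option<&IoRuntime>, work: F) -> T
where
    F: FnOnce() -> T + Send + 'static,
    T: Send + 'static,
{
    match runtime {
        Some(rt) => rt.spawn(work).recv().expect("io job did not complete"),
        None => work(),
    }
}

/// Helper used to observe which thread executed a job.
pub fn current_thread_name() -> String {
    thread::current().name().unwrap_or("unnamed").to_string()
}
```

Calling `run_on_io(Some(&rt), current_thread_name)` reports the worker thread's name, which makes it easy to check that the work left the calling thread. In the sketch the caller blocks on `recv()`; in an async setting the equivalent step is awaiting the join handle, so the CPU runtime's worker stays free for other tasks.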

The existing try_new signatures are unchanged (they delegate to try_new_with_runtime(…, None)), so this is fully backward-compatible.
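
The delegation that keeps `try_new` backward-compatible can be sketched as follows. The types here are hypothetical placeholders (`SketchRuntime`, `SketchProvider`, a string table name, a `String` error); the real constructors take catalog arguments that the description elides as `…`, and those are not reproduced.

```rust
/// Hypothetical placeholder for the PR's `Runtime` type.
#[derive(Clone, Debug, Default)]
pub struct SketchRuntime;

/// Hypothetical placeholder for a provider such as `IcebergTableProvider`.
#[derive(Debug)]
pub struct SketchProvider {
    table_name: String,
    runtime: Option<SketchRuntime>,
}

impl SketchProvider {
    /// Existing entry point: same signature as before, now a thin wrapper
    /// that passes `None` for the runtime.
    pub fn try_new(table_name: &str) -> Result<Self, String> {
        Self::try_new_with_runtime(table_name, None)
    }

    /// New entry point: the optional runtime is stored and consulted later
    /// when IO-bound work needs to be dispatched.
    pub fn try_new_with_runtime(
        table_name: &str,
        runtime: Option<SketchRuntime>,
    ) -> Result<Self, String> {
        if table_name.is_empty() {
            return Err("table name must not be empty".to_string());
        }
        Ok(Self {
            table_name: table_name.to_string(),
            runtime,
        })
    }

    pub fn table_name(&self) -> &str {
        &self.table_name
    }

    /// True when IO-bound work would be routed to a separate runtime.
    pub fn uses_io_runtime(&self) -> bool {
        self.runtime.is_some()
    }
}
```

Existing callers of `try_new` compile unchanged and get the old behavior, while callers that run a split runtime opt in by passing `Some(runtime)`.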

…gTableProvider

Adds `try_new_with_runtime` to `IcebergTableProvider`, `IcebergSchemaProvider`,
and `IcebergCatalogProvider`. When a `Runtime` is supplied, the IO-bound work
in `scan` (catalog reload, `plan_files`, manifest fetches) and `insert_into`
(catalog reload) is spawned on `runtime.io()` via a new `run_on_io` helper,
so the calling runtime — typically DataFusion's CPU runtime in a split setup —
only awaits a join handle. The existing `try_new` signatures are unchanged.
@toutane toutane changed the title from "feat(datafusion): IcebergTableProvider scan runs plan_files on IO" to "feat(datafusion): add try_new_with_runtime to route catalog IO off the caller's runtime" May 12, 2026
@toutane toutane marked this pull request as ready for review May 12, 2026 08:48
@toutane toutane merged commit 7c4e63d into branch-0.9 May 12, 2026
2 checks passed
@toutane toutane deleted the charlesantoine.leger/feat-scan-on-io-runtime branch May 12, 2026 09:10