feat: add HF Storage Bucket support via OpenDAL #3

Open
davanstrien wants to merge 99 commits into feature/hf-bucket-sink from hf-opendal-sink
@davanstrien

Owner

Summary

Add hf:// URL support for Polars cloud writes via OpenDAL, enabling:

df.lazy().sink_parquet("hf://buckets/org/name/data.parquet")

Approach

Follows the same pattern as existing cloud backends (S3/GCS/Azure):

  • crates/polars-io/src/cloud/hf.rs — URL parsing, token resolution, OpenDAL ObjectStore construction
  • object_store_setup.rs — 6-line CloudType::Hf match arm calling build_hf()
  • Feature flag hf propagated through workspace Cargo.toml chain

HF URLs flow through the standard FileSink — no custom sink node, no IR special-casing.
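
The URL-parsing step can be illustrated with a minimal, standard-library-only sketch. It assumes the `hf://buckets/<org>/<name>/<path>` shape shown in the Summary; the `HfBucketUrl` struct and `parse_hf_bucket_url` function are hypothetical names for illustration and are not taken from `hf.rs`.

```rust
/// Components of an `hf://buckets/<org>/<name>/<path>` URL.
/// Hypothetical type for illustration; not the one used in `hf.rs`.
#[derive(Debug, PartialEq, Eq)]
struct HfBucketUrl {
    org: String,
    name: String,
    path: String,
}

/// Parse a bucket URL, rejecting other schemes and incomplete paths.
fn parse_hf_bucket_url(url: &str) -> Result<HfBucketUrl, String> {
    let rest = url
        .strip_prefix("hf://")
        .ok_or_else(|| format!("not an hf:// URL: {url}"))?;

    // Split into at most four parts: "buckets", org, name, and the object
    // path, which may itself contain further '/' separators.
    let mut parts = rest.splitn(4, '/');
    match (parts.next(), parts.next(), parts.next(), parts.next()) {
        (Some("buckets"), Some(org), Some(name), Some(path))
            if !org.is_empty() && !name.is_empty() && !path.is_empty() =>
        {
            Ok(HfBucketUrl {
                org: org.to_string(),
                name: name.to_string(),
                path: path.to_string(),
            })
        }
        _ => Err(format!(
            "expected hf://buckets/<org>/<name>/<path>, got: {url}"
        )),
    }
}
```

Under these assumptions, `parse_hf_bucket_url("hf://buckets/org/name/data.parquet")` yields the org, bucket name, and object path as separate fields, while a URL with another scheme or a missing object path is returned as an error.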

Dependencies

opendal and object_store_opendal are local path dependencies during development. They will be swapped for published crate versions once apache/opendal#7185 ships.

Test plan

  • cargo check -p polars-stream --features hf compiles
  • cargo check -p polars-stream shows no regression without the feature
  • End-to-end test with real HF bucket (pending OpenDAL release)

🤖 Generated with Claude Code

Kevin-Patyk and others added 30 commits March 13, 2026 21:35
…ola-rs#26938)

Co-authored-by: gabriel <gabriel.g.robin@airbus.com>
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: moktamd <moktamd@users.noreply.github.com>
Co-authored-by: nameexhaustion <simonlin.rqmmw@slmail.me>
…6764)

Co-authored-by: Simon Lin <simonlin.rqmmw@slmail.me>
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: nameexhaustion <simonlin.rqmmw@slmail.me>
MarcoGorelli and others added 21 commits March 30, 2026 14:48
…#27104)

Co-authored-by: Orson Peters <orsonpeters@gmail.com>
pola-rs#27087)

Co-authored-by: nameexhaustion <simonlin.rqmmw@slmail.me>
Co-authored-by: Dani Pinyol <dani@avatarcognition.com>
…la-rs#27118)

Co-authored-by: Orson Peters <orsonpeters@gmail.com>
Add an ObjectStore implementation for `hf://` URLs backed by OpenDAL's
HF service, enabling `sink_parquet("hf://buckets/org/name/file.parquet")`
to stream directly to Hugging Face Storage Buckets.

The implementation follows the same pattern as existing cloud backends
(S3/GCS/Azure): a `build_hf()` function in a new `hf.rs` module
constructs the ObjectStore, and `object_store_setup.rs` calls it from
the `CloudType::Hf` match arm. HF URLs flow through the standard
FileSink path with no custom sink node or IR special-casing.
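
As a rough model of the dispatch described above, the sketch below routes a backend enum to a per-backend builder. Everything in it is a stand-in: this `CloudType` lists only four variants, the builders return a descriptive string instead of an ObjectStore, and the `hf_enabled` flag is a runtime substitute for the compile-time `hf` feature gate so the sketch runs without Cargo features.

```rust
/// Stand-in for the Polars backend enum (the real one has more variants).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum CloudType {
    Aws,
    Gcp,
    Azure,
    Hf,
}

// Hypothetical builders: each returns a label instead of a real ObjectStore.
fn build_aws(url: &str) -> Result<String, String> {
    Ok(format!("aws store for {url}"))
}
fn build_gcp(url: &str) -> Result<String, String> {
    Ok(format!("gcp store for {url}"))
}
fn build_azure(url: &str) -> Result<String, String> {
    Ok(format!("azure store for {url}"))
}
fn build_hf(url: &str) -> Result<String, String> {
    Ok(format!("opendal hf store for {url}"))
}

/// Pick a builder from the backend type. `hf_enabled` models the `hf`
/// feature flag at runtime; the real gate is a compile-time feature.
fn build_object_store(
    cloud_type: CloudType,
    url: &str,
    hf_enabled: bool,
) -> Result<String, String> {
    match cloud_type {
        CloudType::Aws => build_aws(url),
        CloudType::Gcp => build_gcp(url),
        CloudType::Azure => build_azure(url),
        // The HF arm stays small: check the gate, then delegate.
        CloudType::Hf => {
            if hf_enabled {
                build_hf(url)
            } else {
                Err("the 'hf' feature is required for hf:// writes".to_string())
            }
        }
    }
}
```

The point of the pattern is that the sink code only ever sees the value returned by `build_object_store`, which is why no HF-specific sink node is needed.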

New files:
- crates/polars-io/src/cloud/hf.rs — URL parsing, token resolution,
  OpenDAL ObjectStore construction
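
The commit does not spell out how token resolution works, so the following is only a sketch of one plausible order: an explicitly supplied token first, then the `HF_TOKEN` environment variable that Hugging Face tooling conventionally reads. The names `resolve_hf_token` and `non_empty` are hypothetical, and the precedence is an assumption rather than the behavior of `hf.rs`.

```rust
/// Trim a candidate token and discard it if nothing is left.
fn non_empty(token: String) -> Option<String> {
    let trimmed = token.trim();
    if trimmed.is_empty() {
        None
    } else {
        Some(trimmed.to_string())
    }
}

/// Resolve a token: explicit value first, then the `HF_TOKEN` variable.
/// `env_lookup` is injected so the function can be exercised without
/// touching the real process environment.
fn resolve_hf_token<F>(explicit: Option<&str>, env_lookup: F) -> Option<String>
where
    F: Fn(&str) -> Option<String>,
{
    explicit
        .map(str::to_owned)
        .and_then(non_empty)
        // Fall back to the environment only if no usable explicit token exists.
        .or_else(|| env_lookup("HF_TOKEN").and_then(non_empty))
}
```

In a real program the lookup would typically be `|key| std::env::var(key).ok()`. A `None` result means no token was found and the request would go out unauthenticated or fail, depending on the bucket.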

Feature flag: `hf` (opt-in, propagated through the workspace)

Dependencies: opendal + object_store_opendal (local path deps for now,
will switch to published crate versions once apache/opendal#7185 ships)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
davanstrien and others added 5 commits April 7, 2026 07:23
Point opendal and object_store_opendal at kszucs/opendal@4c70bd8
(hf-revamp branch) which uses published hf-xet 1.5.0 from crates.io.

Once apache/opendal#7185 merges and publishes, these become simple
version deps (e.g. opendal = "0.56").

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Manual dispatch workflow that builds Linux x64 and ARM64 wheels
with the hf feature enabled via maturin.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
apache/opendal#7185 merged today — HF backend XET support is now on
upstream main. Pin to commit 8d3dbcc3ef until a release ships.

Next: once opendal publishes a release with services-hf, swap these
git deps for a version number (opendal = "0.56" or whatever ships).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
apache/opendal v0.56.0 was published to crates.io on 2026-05-01
with the services-hf feature, so we no longer need to pin to a
git rev. Drop the [patch.crates-io] entries and bump the
workspace dep declarations from 0.55.0 to 0.56.0.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
opendal 0.56's HfBackend sits under a 5-deep stack of accessor layers
(TypeErase / Correctness / Complete / Simulate / ErrorContext), and
the dist-release profile overflows the default 128 query-depth limit
when computing async fn body layouts (e.g. HfBackend::create_dir).
Gate the bump to feature = "hf" so non-HF builds are unaffected.
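
A feature-gated bump of this kind is usually written as a crate-root attribute. The fragment below is a sketch only: the value `256` is an assumed placeholder (the commit message does not state the new limit), and it belongs at the root of whichever crate hits the overflow.

```rust
// Crate root (lib.rs or main.rs); inner attributes must precede all items.
// Raise the compiler's recursion limit above its default of 128 only when
// the `hf` feature is enabled, so builds without the feature keep the default.
// NOTE: 256 is an assumed value for illustration.
#![cfg_attr(feature = "hf", recursion_limit = "256")]
```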

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>