Add icechunk constructor to table provider #23

Closed
kylebarron wants to merge 3 commits into main from kyle/icechunk

Conversation

@kylebarron
Member

kylebarron commented Oct 23, 2025

The Python API to extract all information out of the icechunk session would be a lot of work.

I see that Icechunk has an undocumented Session::as_bytes method that looks like it's also piped through to the Python API.

@sharkinsspatial can you help me create an Icechunk Python session object and check whether Session.from_bytes(session.as_bytes()) round-trips correctly?
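A minimal sketch of what such a round-trip check could look like. It does not use the real Icechunk API: `StandInSession` and `roundtrips` are hypothetical names introduced here for illustration, and the only assumption carried over from the comment above is that the object exposes `as_bytes()` and its class exposes `from_bytes()`.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class StandInSession:
    """Hypothetical stand-in for a session object; NOT Icechunk's real class."""

    branch: str
    snapshot_id: str

    def as_bytes(self) -> bytes:
        # Serialize the state to bytes (JSON is used purely for illustration).
        return json.dumps(asdict(self), sort_keys=True).encode("utf-8")

    @classmethod
    def from_bytes(cls, data: bytes) -> "StandInSession":
        # Rebuild an instance from the bytes produced by as_bytes().
        return cls(**json.loads(data.decode("utf-8")))


def roundtrips(obj, cls) -> bool:
    """Return True if cls.from_bytes(obj.as_bytes()) re-serializes to the same bytes."""
    original = obj.as_bytes()
    restored = cls.from_bytes(original)
    return restored.as_bytes() == original


session = StandInSession(branch="main", snapshot_id="abc123")
print(roundtrips(session, StandInSession))
```

If the real session object does expose both methods, the same `roundtrips` helper could be pointed at it. Comparing re-serialized bytes rather than the objects themselves avoids relying on the class defining equality.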

@sharkinsspatial
Member

sharkinsspatial commented Oct 24, 2025

@kylebarron Running

import icechunk
import zarr
import numpy as np
import shapely
import tempfile
from zarr.dtype import VariableLengthUTF8
from zarr_datafusion_search import ZarrTable

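# mkdtemp() keeps the directory alive; TemporaryDirectory().name points at a
# directory that is deleted as soon as the temporary object is garbage collected.
location = tempfile.mkdtemp()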
storage = icechunk.local_filesystem_storage(location)
repo = icechunk.Repository.create(storage)
session = repo.writable_session("main")

root = zarr.open_group(session.store, mode="w", zarr_format=3)
meta = root.create_group("meta")

date_data = np.array(["2023-01-01", "2023-01-02", "2023-01-03"], dtype="datetime64[ms]")
meta.create_array("date", data=date_data)

meta.create_array(
    "collection",
    shape=(3,),
    dtype=VariableLengthUTF8(),
)
meta["collection"][...] = ["collection_a", "collection_b", "collection_c"]

bbox_data = shapely.to_wkt(
    [
        shapely.box(-10.0, -10.0, 10.0, 10.0),
        shapely.box(-20.0, -20.0, 20.0, 20.0),
        shapely.box(-30.0, -30.0, 30.0, 30.0),
    ]
)

meta.create_array(
    "bbox",
    shape=len(bbox_data),
    dtype=VariableLengthUTF8(),
)
meta["bbox"][...] = bbox_data

session.commit("First")

ZarrTable.from_icechunk(session=session, group_path="meta")

panics with

thread '<unnamed>' (3472762) panicked at src/table.rs:37:9:
not yet implemented
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Traceback (most recent call last):
  File "/Users/seanharkins/projects/zarr-datafusion-internal/test.py", line 57, in <module>
    ZarrTable.from_icechunk(session=session, group_path="meta")
pyo3_runtime.PanicException: not yet implemented
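The message "not yet implemented" is what Rust's `todo!()` macro prints when it panics, and PyO3 surfaces such panics as `PanicException`, which derives from `BaseException` rather than `Exception`. A small sketch of what that means for calling code, using a hypothetical stand-in class (`StandInPanic`) so it runs without the extension module installed:

```python
class StandInPanic(BaseException):
    """Hypothetical stand-in for pyo3_runtime.PanicException."""


def call_unimplemented():
    # Mimics a Rust `todo!()` surfacing in Python as a panic exception.
    raise StandInPanic("not yet implemented")


def classify(fn) -> str:
    """Report which except clause ends up handling the error raised by fn."""
    try:
        fn()
    except Exception:
        # Ordinary error handling; a BaseException-only subclass skips this.
        return "caught as Exception"
    except BaseException as err:
        return f"caught as BaseException: {err}"
    return "no error"


print(classify(call_unimplemented))
# → caught as BaseException: not yet implemented
```

A plain `except Exception` in a test script would therefore not intercept this failure. Catching `BaseException` broadly also catches `KeyboardInterrupt` and `SystemExit`, so it is usually better to let the panic propagate.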

@kylebarron
Member Author

Maybe I didn't make it clear enough, but it's not implemented yet.

The Python API to extract all information out of the icechunk session would be a lot of work.

I see that Icechunk has an undocumented Session::as_bytes method that looks like it's also piped through to the Python API.

@sharkinsspatial can you help me create an Icechunk Python session object and check whether Session.from_bytes(session.as_bytes()) round-trips correctly?

@sharkinsspatial
Member

😆 I was surprised by how quickly you set this up. Is this session example enough for you to get started?

@kylebarron
Member Author

Superseded by #24

@kylebarron kylebarron closed this Nov 4, 2025