Add example of Icechunk HTTPS virtual zarr, with caveat about auth #978

@maxrjones

I recently put together an example of using VirtualiZarr + Icechunk over HTTPS, which could be helpful for our docs. However, people still got stuck because authentication does not work with Icechunk + HTTPS. The docs example should therefore also carry a caveat that HTTPS auth is not yet implemented, with a link to earth-mover/icechunk#997 as the upstream tracking issue. Our team at https://github.com/NASA-IMPACT/veda-odd is evaluating whether we could contribute a fix for that in our next project increment, which starts this month. cc @edshred2000 @owenlittlejohns @abarciauskas-bgse

# /// script
# requires-python = ">=3.12"
# dependencies = [
#     "virtualizarr",
#     "virtual-tiff",
#     "obstore",
#     "obspec-utils",
#     "icechunk",
#     "xarray",
#     "zarr",
# ]
# ///

from obstore.store import HTTPStore
from obspec_utils.registry import ObjectStoreRegistry
from virtual_tiff import VirtualTIFF
from virtualizarr import open_virtual_dataset
import icechunk
import xarray as xr

base_url = "https://storage.googleapis.com/"
file = "solus100pub/cec7_0_cm_p.tif"
url = f"{base_url}{file}"  # base_url already ends with "/", so do not add another
print(f"URL: {url}")

store = HTTPStore.from_url(base_url)
registry = ObjectStoreRegistry({base_url: store})
parser = VirtualTIFF(ifd=4)

# Open as virtual dataset
with open_virtual_dataset(url=url, parser=parser, registry=registry) as vds:
    print("\nVirtual dataset:")
    print(vds)

    # Write to in-memory Icechunk store
    storage = icechunk.Storage.new_in_memory()
    config = icechunk.RepositoryConfig.default()

    container = icechunk.VirtualChunkContainer(
        url_prefix=base_url,
        store=icechunk.http_store(),
    )
    config.set_virtual_chunk_container(container)

    repo = icechunk.Repository.create(
        storage=storage,
        config=config,
        authorize_virtual_chunk_access={base_url: None},
    )
    session = repo.writable_session("main")

    vds.vz.to_icechunk(session.store)
    session.commit("Write virtual TIFF references")
    print("\nCommitted to Icechunk")

    # Read back from Icechunk
    read_session = repo.readonly_session("main")
    ds = xr.open_zarr(read_session.store, consolidated=False, zarr_format=3)
    print("\nRoundtripped dataset from Icechunk:")
    print(ds)

    # Load actual data
    loaded = ds.load()
    print("\nLoaded data:")
    print(loaded)
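
Because `base_url` ends with a trailing slash (and is reused as the container's `url_prefix`), joining it to the object path needs a little care to avoid a double slash in the resulting URL. Below is a minimal sketch of one way to do that; `join_url` is a hypothetical helper written for illustration only, not part of VirtualiZarr, obstore, or Icechunk.

```python
from urllib.parse import urlsplit


def join_url(base_url: str, path: str) -> str:
    """Join a base URL and an object path with exactly one "/" between them.

    Hypothetical helper for illustration; not a VirtualiZarr or Icechunk API.
    """
    return f"{base_url.rstrip('/')}/{path.lstrip('/')}"


base_url = "https://storage.googleapis.com/"
file = "solus100pub/cec7_0_cm_p.tif"

# Naive concatenation with an extra "/" yields a double slash in the path.
naive = f"{base_url}/{file}"
joined = join_url(base_url, file)

print(urlsplit(naive).path)   # //solus100pub/cec7_0_cm_p.tif
print(urlsplit(joined).path)  # /solus100pub/cec7_0_cm_p.tif
```

The helper normalizes only the single joint between the two parts; it does not touch slashes inside `path`.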

Labels: Icechunk 🧊, documentation
