The problem is that the current implementation of `ChunkManifest` uses a clever trick: it's just 3 numpy arrays in a trenchcoat. This gives us loads of stuff for free:
- Efficient contiguous in-memory representation
- Efficient handling of variable-length strings (via the numpy 2 dtype)
- Efficient functions for iterating over every element
- Efficient multi-dimensional concat/stack functions for merging chunk manifests
- No unusual or non-python dependencies
But I don't know how to keep that design and also store non-virtual chunks in arbitrary locations within those arrays.
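To make the "3 numpy arrays in a trenchcoat" idea concrete, here is a minimal sketch. It is not VirtualiZarr's actual `ChunkManifest`: the class name `ThreeArrayManifest`, the helper `concat_manifests`, the field names, and the `s3://bucket/...` paths are all hypothetical, and the choice of `uint64` for offsets and lengths is an assumption.

```python
from dataclasses import dataclass

import numpy as np

# NumPy >= 2.0 ships a variable-length string dtype. Fall back to fixed-width
# unicode on older versions so the sketch still runs.
_HAS_STRING_DTYPE = hasattr(getattr(np, "dtypes", None), "StringDType")
STR_DTYPE = np.dtypes.StringDType() if _HAS_STRING_DTYPE else "U"


@dataclass(eq=False)
class ThreeArrayManifest:
    """Hypothetical stand-in for a manifest: three arrays on one chunk grid."""

    paths: np.ndarray    # location of the file holding each chunk
    offsets: np.ndarray  # byte offset of each chunk within its file
    lengths: np.ndarray  # byte length of each chunk

    def __post_init__(self):
        if not (self.paths.shape == self.offsets.shape == self.lengths.shape):
            raise ValueError("all three arrays must share the chunk-grid shape")

    @property
    def shape(self):
        return self.paths.shape


def concat_manifests(manifests, axis):
    """Merge manifests along one chunk-grid axis using plain numpy concatenation."""
    return ThreeArrayManifest(
        paths=np.concatenate([m.paths for m in manifests], axis=axis),
        offsets=np.concatenate([m.offsets for m in manifests], axis=axis),
        lengths=np.concatenate([m.lengths for m in manifests], axis=axis),
    )


# Two manifests, each describing a 1x2 grid of chunks (hypothetical paths).
first = ThreeArrayManifest(
    paths=np.array([["s3://bucket/a.nc", "s3://bucket/a.nc"]], dtype=STR_DTYPE),
    offsets=np.array([[0, 100]], dtype=np.uint64),
    lengths=np.array([[100, 100]], dtype=np.uint64),
)
second = ThreeArrayManifest(
    paths=np.array([["s3://bucket/b.nc", "s3://bucket/b.nc"]], dtype=STR_DTYPE),
    offsets=np.array([[0, 100]], dtype=np.uint64),
    lengths=np.array([[100, 100]], dtype=np.uint64),
)

combined = concat_manifests([first, second], axis=0)
print(combined.shape)  # (2, 2)
```

Every slot in this layout has to be a (path, offset, length) triple, which is why there is no obvious place to put the bytes of a native chunk.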
Some alternatives:
- Numpy object array (almost certainly very inefficient)
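As a rough illustration of the object-array alternative, the sketch below stores either a virtual reference or raw bytes in each slot of a single `dtype=object` array. The `VirtualRef` record, the helper functions, and the paths are hypothetical and not part of any existing API. Each element is a reference to a separate Python object, so anything that inspects the contents has to loop at the Python level.

```python
from dataclasses import dataclass

import numpy as np


@dataclass(frozen=True)
class VirtualRef:
    """Hypothetical record pointing at bytes stored in another file."""

    path: str
    offset: int
    length: int


# One slot per chunk; each slot holds a reference to an arbitrary Python object.
chunks = np.empty((2, 2), dtype=object)
chunks[0, 0] = VirtualRef("s3://bucket/a.nc", 0, 100)
chunks[0, 1] = b"\x00" * 16  # native chunk stored inline
chunks[1, 0] = VirtualRef("s3://bucket/a.nc", 100, 100)
chunks[1, 1] = b"\x01" * 8   # another native chunk


def native_nbytes(arr):
    """Total size of inline chunks; requires a Python-level loop over elements."""
    return sum(len(item) for item in arr.flat if isinstance(item, bytes))


def virtual_paths(arr):
    """Sorted unique paths referenced by the virtual chunks in the array."""
    return sorted({item.path for item in arr.flat if isinstance(item, VirtualRef)})


# Concatenation still works, but it only copies object references.
stacked = np.concatenate([chunks, chunks], axis=0)
```

This keeps the multi-dimensional concat/stack behaviour, but gives up the contiguous representation and the variable-length string dtype listed among the benefits of the current design.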
Generalizing the `ChunkManifest` class to hold native chunks as well as virtual refs would unlock several features.
It's needed for:
- `IcechunkParser` (Reading virtual references back out into VirtualiZarr Manifests earth-mover/icechunk#104)
- […] `append_dim`)
- `ManifestStore.to_icechunk()`/`kerchunk()`, and thereby making xarray an optional dependency (Make xarray an optional dependency? #521)