Add optional chunk caching to get() function#296

Closed
aganders3 wants to merge 8 commits into
manzt:mainfrom
aganders3:chunk-caching-get

@aganders3
Contributor

@aganders3 aganders3 commented Aug 13, 2025

Summary

This adds optional chunk caching to the get() function to avoid repeated decompression of chunks when accessing overlapping selections or making multiple calls to the same data. I've noticed this can be a significant bottleneck when indexing data slice-by-slice, for example, where the chunks span many slices.

This may be a niche need, so no worries if this is out of scope for this library!

New API

  • cache?: ChunkCache in GetOptions
  • ChunkCache interface with get(key: string): Chunk<DataType> | undefined and set(key: string, value: Chunk<DataType>): any (meant to be compatible with simple Map)
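A minimal sketch of the interface described above, to show why a plain `Map` can be passed as the cache. The `Chunk` type here is a simplified stand-in for zarrita's generic `Chunk<DataType>`, and `CountingCache` is a hypothetical custom cache that is not part of the PR.

```typescript
// Simplified stand-in for zarrita's decoded chunk type (assumption for this sketch).
type Chunk = { data: Float32Array; shape: number[]; stride: number[] };

// The interface as described in the PR: only `get` and `set` are required.
interface ChunkCache {
  get(key: string): Chunk | undefined;
  set(key: string, value: Chunk): unknown;
}

// A plain Map satisfies the interface structurally, so no adapter is needed.
const mapCache: ChunkCache = new Map<string, Chunk>();

// Hypothetical custom cache that counts hits; the same two methods suffice.
class CountingCache implements ChunkCache {
  hits = 0;
  private entries = new Map<string, Chunk>();

  get(key: string): Chunk | undefined {
    const hit = this.entries.get(key);
    if (hit !== undefined) this.hits += 1;
    return hit;
  }

  set(key: string, value: Chunk): void {
    this.entries.set(key, value);
  }
}
```

Because the interface is structural, any object with compatible `get` and `set` methods should work, including third-party LRU implementations.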

Usage

  // Use built-in Map as cache
  const mapCache = new Map();
  const a = await get(array, selection, { cache: mapCache });

  // Custom LRU cache
  const lruCache = new LRUCache({ max: 100 });
  const b = await get(array, selection, { cache: lruCache });

  // No cache (default - unchanged behavior)
  const c = await get(array, selection);

Implementation

  • Cache keys use store_N:${array.path}:${chunkKey} format
  • Store isolation: WeakMap assigns unique IDs to store instances to prevent cache collisions when sharing caches across stores (perhaps not recommended)
  • Single cache can hold chunks from arrays with different data types
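The key scheme in the list above could look roughly like the following. This is a sketch under assumptions, not the PR's actual code: `storeId` and `cacheKey` are hypothetical helper names, and stores are treated as opaque objects.

```typescript
// Hypothetical helper state: each store object gets a stable numeric ID on first use.
// A WeakMap avoids keeping stores alive just because they were once cached.
const storeIds = new WeakMap<object, number>();
let nextStoreId = 0;

function storeId(store: object): number {
  let id = storeIds.get(store);
  if (id === undefined) {
    id = nextStoreId++;
    storeIds.set(store, id);
  }
  return id;
}

// Builds a key in the `store_N:${array.path}:${chunkKey}` shape described above.
function cacheKey(store: object, arrayPath: string, chunkKey: string): string {
  return `store_${storeId(store)}:${arrayPath}:${chunkKey}`;
}
```

With this scheme, two arrays at the same path in different stores produce different keys, which is what keeps a shared cache from returning the wrong chunk.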

@changeset-bot

changeset-bot Bot commented Aug 13, 2025

🦋 Changeset detected

Latest commit: 7fe18d9

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 2 packages
Name Type
zarrita Minor
@zarrita/ndarray Patch


@aganders3
Contributor Author

@manzt - any thoughts on this? We can work around the issue but some kind of support in Zarrita itself would make it cleaner. zarr-python added caching recently so we could also consider modeling a solution closer to that. This PR is just the simplest thing I could think of when I first ran into issues.

@d-v-b
Contributor

d-v-b commented Dec 17, 2025

out of scope but please let us know how we can improve our cache on the zarr-python side! it's still flagged as experimental, and I would love the kind of feedback / design input one gets from another implementation

@manzt
Owner

manzt commented Dec 18, 2025

Thanks for the PR and the ping! I've been thinking about this.

My hesitation is that this adds caching at the indexing level (decompressed chunks), whereas I've generally thought of caching as a store-level concern. The vizarr approach (lru-store.ts) wraps the store itself, which keeps get() simple.

You could even do decompression in a store wrapper - but then we'd need to expose the codec pipeline somehow, which opens a different can of worms.

Is decompression specifically the bottleneck? I'd be curious to explore that first since my assumption is that I/O latency is usually the limiting factor, so a store-level LRU (caching raw bytes) would help. That'd help me understand if this needs to live in get() or could be separate.

> out of scope but please let us know how we can improve our cache on the zarr-python side.

Would be curious to see/understand how other implementations are thinking about this use case.
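For comparison, the store-level alternative mentioned in this comment (caching raw bytes behind the store interface) might be sketched as below. This is not vizarr's `lru-store.ts` or any zarrita API: `withByteCache` is a hypothetical name, and the `AsyncReadable` interface is a simplified local stand-in with string keys.

```typescript
// Simplified stand-in for a readable store (assumption for this sketch).
interface AsyncReadable {
  get(key: string): Promise<Uint8Array | undefined>;
}

// Wraps a store so repeated reads of the same key reuse the same (possibly in-flight) promise.
function withByteCache(store: AsyncReadable, maxEntries = 128): AsyncReadable {
  // Map keeps insertion order, so the first key is the least recently used.
  const cache = new Map<string, Promise<Uint8Array | undefined>>();
  return {
    get(key: string): Promise<Uint8Array | undefined> {
      const hit = cache.get(key);
      if (hit !== undefined) {
        // Re-insert to mark the entry as most recently used.
        cache.delete(key);
        cache.set(key, hit);
        return hit;
      }
      const pending = store.get(key);
      cache.set(key, pending);
      // Do not keep failed reads around; a later call should retry.
      pending.catch(() => {
        if (cache.get(key) === pending) cache.delete(key);
      });
      while (cache.size > maxEntries) {
        const oldest = cache.keys().next().value;
        if (oldest === undefined) break;
        cache.delete(oldest);
      }
      return pending;
    },
  };
}
```

Note that this only saves I/O: every read of a cached chunk still pays the decompression cost, which is the distinction the comment is asking about.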

@aganders3
Contributor Author

In my testing decompression does seem to be a bottleneck. Here is a gist with my benchmark in case it helps.
[Image: screenshot, 2025-12-19 12:02 PM]

I think this is for two reasons:

  1. the browser cache is effective enough for us to cache raw bytes (compressed chunks), though I could understand wanting more control over this as well
  2. with a fetch store, the requests are async but decompression is blocking

So far we've worked around this by using getChunk directly and moving it to a WebWorker. However we'd like to have more control to manage data in our application layer as "logical chunks" with fixed size, uncoupled from the chunk size in the source zarr. I think this will amount to a re-implementation of much of zarr.get to avoid redundant decompression.

If these logical chunks are smaller than the source chunk size, it may decompress the same source chunk for each sub-chunk. As another example, we were using zarr.get to fetch complete z-slices of an image, but the source data had a z chunk size >> 1.
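The z-slice example can be made concrete with a small counting model. This is only an illustration under simplifying assumptions: each z-slice touches a single source chunk, the cache is unbounded, and `countDecodes` is a hypothetical helper rather than anything in zarrita.

```typescript
// Counts chunk decodes when reading `numSlices` z-slices one at a time from an
// array whose chunks each span `zChunkSize` slices.
function countDecodes(numSlices: number, zChunkSize: number, useCache: boolean): number {
  const decoded = new Set<number>();
  let decodes = 0;
  for (let z = 0; z < numSlices; z++) {
    const chunkIndex = Math.floor(z / zChunkSize);
    if (useCache && decoded.has(chunkIndex)) continue; // served from the cache
    decodes += 1;
    decoded.add(chunkIndex);
  }
  return decodes;
}

console.log(countDecodes(64, 16, false)); // 64: every slice decodes its whole chunk again
console.log(countDecodes(64, 16, true)); // 4: each source chunk is decoded once
```

In this model the redundant work grows with the z chunk size, which matches the slice-by-slice bottleneck described above.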

Again I totally understand if this is not an intended use case!

My other attempt was to create a wrapper around the Array, but this felt like a clunky hack (also please excuse that I am still somewhat new to JS/TS).

@manzt
Owner

manzt commented Apr 8, 2026

Hey @aganders3, sorry for the slow reply. I've been mulling this over.

Your benchmarks are convincing that decompression is a bottleneck. But I don't think get() is the right level to add caching: it's the indexing/selection layer, and threading a cache through it makes things harder to compose.

All zarr.get() really needs from an array is getChunk. So if you proxy getChunk with some caching behavior, everything else just works:

import type { Array, Chunk, DataType, Readable } from "zarrita";

function cached<D extends DataType, S extends Readable>(
  arr: Array<D, S>,
  maxSize = 256,
): Array<D, S> {
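  // Map preserves insertion order, so the first key is the least recently used.
  let cache = new Map<string, Chunk<D>>();
  return new Proxy(arr, {
    get(target, prop, receiver) {
      // Intercept only getChunk; everything else is forwarded to the array.
      if (prop === "getChunk") {
        return async (
          chunk_coords: number[],
          ...rest: unknown[]
        ): Promise<Chunk<D>> => {
          let key = chunk_coords.join("/");
          let hit = cache.get(key);
          if (hit) {
            // Re-insert on a hit so the entry becomes the most recently used.
            cache.delete(key);
            cache.set(key, hit);
            return hit;
          }
          let chunk = await target.getChunk(chunk_coords, ...rest);
          cache.set(key, chunk);
          // Evict the oldest entries until the cache is back under maxSize.
          while (cache.size > maxSize) {
            cache.delete(cache.keys().next().value!);
          }
          return chunk;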
        };
      }
      return Reflect.get(target, prop, receiver);
    },
  });
}

Then usage is just:

let arr = cached(await zarr.open(store, { kind: "array" }));
let a = await zarr.get(arr, [zarr.slice(0, 1), null]);
let b = await zarr.get(arr, [zarr.slice(1, 2), null]); // reuses cached chunks
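The eviction logic in the proxy above relies on `Map` iterating keys in insertion order. Pulled out on its own, the same technique looks like this; `LruMap` is a hypothetical name used only for this sketch.

```typescript
// Minimal LRU built on Map insertion order: the first key is the least recently used.
class LruMap<V> {
  private entries = new Map<string, V>();
  private maxSize: number;

  constructor(maxSize: number) {
    this.maxSize = maxSize;
  }

  get(key: string): V | undefined {
    const hit = this.entries.get(key);
    if (hit === undefined) return undefined;
    // Delete and re-insert so the key moves to the most recently used position.
    this.entries.delete(key);
    this.entries.set(key, hit);
    return hit;
  }

  set(key: string, value: V): void {
    this.entries.delete(key);
    this.entries.set(key, value);
    // Drop the oldest keys until the size limit is respected.
    while (this.entries.size > this.maxSize) {
      const oldest = this.entries.keys().next().value;
      if (oldest === undefined) break;
      this.entries.delete(oldest);
    }
  }

  keys(): string[] {
    return Array.from(this.entries.keys());
  }
}

const lru = new LruMap<number>(2);
lru.set("0/0", 1);
lru.set("0/1", 2);
lru.get("0/0"); // touching "0/0" makes "0/1" the oldest entry
lru.set("0/2", 3); // exceeds the limit, so "0/1" is evicted
console.log(lru.keys()); // ["0/0", "0/2"]
```

One thing this sketch does not handle is two concurrent misses for the same key, which would each fetch and decode the chunk; caching the pending promise instead of the resolved value is one way to address that.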

Hopefully that covers your use case. I'm going to close this out, but ping me if it doesn't. I do think a byte-level store cache is within scope for the library (similar to zarr-python's CacheStore) — see #349. Thanks for the PR and the thorough benchmarks!

@manzt manzt closed this Apr 8, 2026
@aganders3
Contributor Author

Thanks - no worries on closing. I appreciate the help here and I agree threading the cache through each call to get is not ergonomic.

I think what you propose covers my use case, and seems closer to what I was initially trying to do with this attempt but cleaner and more concise.

It looks like it will be even better once #384 lands, so I will keep an eye out for that.

@aganders3 aganders3 deleted the chunk-caching-get branch April 9, 2026 14:32
manzt added a commit that referenced this pull request Apr 13, 2026
* Drop `Options` generic and thread `signal` directly

`AsyncReadable<Options>` existed so stores could receive arbitrary
per-call state. Auth headers, presigning context, cancellation signals,
even chunk-layer concerns like caching and prefetch priority (#296,
vole-core's `wrapArray`).

It was a catch-all for extensions that had nowhere else to live, and the
cost was a pile of type magic: higher-kinded-type encoding in the store
middleware system (#384), threading generic types that didn't actually
provide that much type safety. (TypeScript could often bail out to `any`
when inference broke.)

Those extensions now have proper homes. Store middleware (#384) gives
transport-layer concerns (auth, presigning, request transformation) a
proper extension point, and the custom `fetch` option on `FetchStore`
(#388) handles the per-store cases at the callsite. Chunk-layer concerns
will move to `zarr.extendArray` in a follow-up.

What's left is `signal`, which now lives properly on the (non-generic)
`AsyncReadable` interface and can be passed directly in `zarr.get` and
`zarr.set` from the caller:

    // Before
    interface AsyncReadable<Options = unknown> {
      get(key: AbsolutePath, opts?: Options): Promise<Uint8Array | undefined>;
    }
    await zarr.get(arr, null, { opts: { signal: ctl.signal } });

    // After
    interface AsyncReadable {
      get(key: AbsolutePath, opts?: { signal?: AbortSignal }): Promise<Uint8Array | undefined>;
    }
    await zarr.get(arr, null, { signal: ctl.signal });

Batched caller signals in `withRangeBatching` are now merged with
`AbortSignal.any` instead of a user-supplied `mergeOptions` reducer. The
deprecated `opts?: { signal? }` shape still works for one major version
and is folded into the new `signal` via `AbortSignal.any`.

* Add regression tests for signal propagation

Covers the three paths that previously only worked incidentally through
the `Options` generic: `zarr.get()` with a first-class `signal`,
propagation through the sharded chunk getter (#306), and the deprecated
`opts.signal` shim.
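The signal folding described in the commit message above can be illustrated with the standard `AbortSignal.any`. The `GetOptions` shape and `resolveSignal` helper below are hypothetical and only mirror the message's description; they are not zarrita's actual implementation.

```typescript
// Hypothetical option shape: the new top-level `signal` plus the deprecated `opts.signal`.
interface GetOptions {
  signal?: AbortSignal;
  opts?: { signal?: AbortSignal };
}

// Folds whichever signals are present into one; returns undefined when none are given.
function resolveSignal(options: GetOptions = {}): AbortSignal | undefined {
  const signals = [options.signal, options.opts?.signal].filter(
    (s): s is AbortSignal => s !== undefined,
  );
  if (signals.length === 0) return undefined;
  if (signals.length === 1) return signals[0];
  // The combined signal aborts as soon as any of its sources aborts.
  return AbortSignal.any(signals);
}
```

A caller still using the deprecated shape would then get the same cancellation behavior as one passing `signal` directly, since aborting either source aborts the merged signal.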