Skip to content

Sub-chunk slicing for uncompressed ManifestArrays (part of #86)#996

Merged
TomNicholas merged 7 commits into
zarr-developers:mainfrom
TomNicholas:subchunk_uncompressed_axis0
May 18, 2026
Merged

Sub-chunk slicing for uncompressed ManifestArrays (part of #86)#996
TomNicholas merged 7 commits into
zarr-developers:mainfrom
TomNicholas:subchunk_uncompressed_axis0

Conversation

@TomNicholas

Copy link
Copy Markdown
Member

Summary

Lets you sub-divide a chunk by slicing an uncompressed ManifestArray along its largest-stride storage axis. The result is a new chunk reference with a bumped byte offset and a smaller length — no data loaded, no rechunking.

The motivating case is xarray.Dataset.isel(time=slice(...)) on a virtual dataset produced by a parser like the netCDF3 one, where a single source "chunk" covers many timesteps and the user wants to pick out a subset without splitting compressed bytes (which is impossible) or rewriting refs.

Eligible codec stacks:

  • [BytesCodec] — C-order layout, the axis is axis 0.
  • [TransposeCodec(order=...), BytesCodec] — stored layout is the logical array permuted by order. The eligible axis is order[0] of the logical array (e.g. the last axis when order=(N-1, ..., 0), i.e. F-order).

Out of scope and still raising SubChunkIndexingError:

  • Slices that span more than one source chunk along the eligible axis (would produce non-contiguous bytes or an irregular chunk grid).
  • step != 1 (would also break contiguity).
  • Sub-chunk indexing on any axis other than the eligible one (interleaved bytes).
  • Integer indexing into a multi-row chunk (chunk-aligned-only, per Support indexing by slicing along chunk boundaries? #51).
  • Any codec stack with compressors, checksums, or value-transforming array-array codecs.

Detection lives in _uncompressed_sub_chunk_axis(metadata) which returns the eligible axis or None. The byte-math is in _compute_sub_chunk_axis_selection alongside the other _compute_* helpers in indexing.py.

The dispatch is deterministic: a single if axis == sub_chunk_axis and _is_sub_chunk_slice(...) choice in apply_selection's per-axis loop, falling through to the existing chunk-aligned path otherwise.

Builds on #994 (chunk-aligned indexing). Addresses part of #86step != 1 along the eligible axis remains a follow-up.

Test plan

  • 11 new cases in TestSubChunkSlicingUncompressed: 4 C-order success cases (parametrized over 1D/2D shapes, single- and multi-chunk surviving along other axes), 3 raise cases (compressed, crossing chunk boundary, non-eligible axis), 2 F-order success cases via TransposeCodec(order=(1, 0)), 1 F-order axis-0 raise case (storage's fast axis).
  • No regressions across test_manifests/, test_xarray.py, test_integration.py, test_writers/ (344 passing).
  • mypy clean (using TypeGuard[slice] on _is_sub_chunk_slice to drop a runtime isinstance assert).
  • Pre-commit hooks clean.

Acceptance criteria

  • Closes part of #86 (leaves step != 1 along the eligible axis as a follow-up)
  • Tests added
  • Tests passing
  • No test coverage regression
  • Full type hint coverage
  • Changes are documented in docs/releases.md
  • New functionality has documentation (docs/data_structures.md, __getitem__ docstring)

When an array has only a BytesCodec (no compressors, no value transforms),
a slice that lands within a single source chunk along axis 0 can be
expressed as a rewritten byte offset and length on every surviving
chunk reference — no data loaded. Picks up part of issue zarr-developers#86: now you
can isel a single timestep from a multi-row chunk produced by, e.g.,
the netCDF3 parser without rechunking.

The dispatch is deterministic: axis 0, uncompressed, slice contained
in one source chunk, step == 1 → byte-shift path; everything else →
existing chunk-aligned path, which raises SubChunkIndexingError for
misaligned selections. step != 1 and sub-chunk indexing on other axes
remain out of scope here and still raise.
Replace the hardcoded axis-0 check with a detector that reads the codec
stack and returns the eligible axis (or None). [BytesCodec] gives C-order
with axis 0; [TransposeCodec(order=...), BytesCodec] gives whatever axis
order[0] points to in the logical array — for the reversed-order F-order
case that's the last axis. The byte stride is computed commutatively as
prod(chunks of the other axes) * itemsize, so the same formula works
for any chosen axis.

Also fix a latent no-op short-circuit: when the eligible axis happens
to have a single source chunk, its chunk-grid selector slice(0, 1, 1)
would equal slice(0, dim, 1) and skip the pending byte adjustment.
Suppress the short-circuit when a sub-chunk adjustment is queued.
The sub-chunk branch in apply_selection had inline arithmetic mixed with
the surrounding dispatch logic. Pull it out alongside the other
_compute_* helpers so apply_selection's per-axis loop reads as a
straight two-way choice between chunk-aligned and sub-chunk paths.
Removes the explicit isinstance assert at the call site: TypeGuard[slice]
tells mypy that when the function returns True, its int|slice argument
is in fact a slice. _compute_sub_chunk_axis_selection then accepts the
narrowed value directly. Pure typing-level change, no runtime effect.
@codecov

codecov Bot commented May 18, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 95.83333% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 90.04%. Comparing base (abd7d0f) to head (d18b331).
⚠️ Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
virtualizarr/manifests/indexing.py 95.83% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #996      +/-   ##
==========================================
+ Coverage   89.98%   90.04%   +0.06%     
==========================================
  Files          33       33              
  Lines        2066     2089      +23     
==========================================
+ Hits         1859     1881      +22     
- Misses        207      208       +1     
Files with missing lines Coverage Δ
virtualizarr/manifests/array.py 85.00% <ø> (ø)
virtualizarr/manifests/indexing.py 94.33% <95.83%> (+1.00%) ⬆️
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Reads a 2D netCDF3 file, .isel's within the single source chunk on axis
0, writes to an in-memory icechunk store, reads back via xr.open_zarr,
and asserts byte-equality against a direct xr.open_dataset + .isel on
the same file. Exercises the sub-chunk byte-adjustment path through
the parser, accessor, and store layers in one go.
Three small cleanups around test_integration.py:

- Move the netcdf3_file fixture from test_parsers/conftest.py to the root
  conftest.py and turn it into a factory that accepts a Dataset, so both
  the existing parser test and the new integration test can share it.
- Replace pytest.mark.skipif(not has_icechunk, ...) with the existing
  @requires_icechunk decorator from virtualizarr.tests.
- Drop @requires_zarr_python and its definition: zarr is a hard project
  dependency, so the decorator is a no-op import-skip that misleadingly
  suggests zarr is optional.
has_tifffile, requires_tifffile = _importorskip("tifffile")
has_imagecodecs, requires_imagecodecs = _importorskip("imagecodecs")
has_hdf5plugin, requires_hdf5plugin = _importorskip("hdf5plugin")
has_zarr_python, requires_zarr_python = _importorskip("zarr")

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Drive-by cleanup - zarr is a required dep so this is outdated


@requires_scipy
@requires_icechunk
def test_subchunk_slice_netcdf3_through_icechunk_roundtrip(

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This integration test tests the actual use case

@TomNicholas TomNicholas merged commit 9f90d2e into zarr-developers:main May 18, 2026
17 checks passed
@rsignell

Copy link
Copy Markdown
Collaborator

🚀

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants