Sub-chunk slicing for uncompressed ManifestArrays (part of #86) by TomNicholas · Pull Request #996 · zarr-developers/VirtualiZarr

TomNicholas · 2026-05-18T18:12:43Z

Summary

Lets you sub-divide a chunk by slicing an uncompressed ManifestArray along its largest-stride storage axis. The result is a new chunk reference with a bumped byte offset and a smaller length — no data loaded, no rechunking.

The motivating case is xarray.Dataset.isel(time=slice(...)) on a virtual dataset produced by a parser like the netCDF3 one, where a single source "chunk" covers many timesteps and the user wants to pick out a subset without splitting compressed bytes (which is impossible) or rewriting refs.

Eligible codec stacks:

[BytesCodec] — C-order layout, the axis is axis 0.
[TransposeCodec(order=...), BytesCodec] — stored layout is the logical array permuted by order. The eligible axis is order[0] of the logical array (e.g. the last axis when order=(N-1, ..., 0), i.e. F-order).

Out of scope and still raising SubChunkIndexingError:

Slices that span more than one source chunk along the eligible axis (would produce non-contiguous bytes or an irregular chunk grid).
step != 1 (would also break contiguity).
Sub-chunk indexing on any axis other than the eligible one (interleaved bytes).
Integer indexing into a multi-row chunk (chunk-aligned-only, per Support indexing by slicing along chunk boundaries? #51).
Any codec stack with compressors, checksums, or value-transforming array-array codecs.

Detection lives in _uncompressed_sub_chunk_axis(metadata) which returns the eligible axis or None. The byte-math is in _compute_sub_chunk_axis_selection alongside the other _compute_* helpers in indexing.py.

The dispatch is deterministic: a single if axis == sub_chunk_axis and _is_sub_chunk_slice(...) choice in apply_selection's per-axis loop, falling through to the existing chunk-aligned path otherwise.

Builds on #994 (chunk-aligned indexing). Addresses part of #86 — step != 1 along the eligible axis remains a follow-up.

Test plan

11 new cases in TestSubChunkSlicingUncompressed: 4 C-order success cases (parametrized over 1D/2D shapes, single- and multi-chunk surviving along other axes), 3 raise cases (compressed, crossing chunk boundary, non-eligible axis), 2 F-order success cases via TransposeCodec(order=(1, 0)), 1 F-order axis-0 raise case (storage's fast axis).
No regressions across test_manifests/, test_xarray.py, test_integration.py, test_writers/ (344 passing).
mypy clean (using TypeGuard[slice] on _is_sub_chunk_slice to drop a runtime isinstance assert).
Pre-commit hooks clean.

Acceptance criteria

Closes part of #86 (leaves step != 1 along the eligible axis as a follow-up)
Tests added
Tests passing
No test coverage regression
Full type hint coverage
Changes are documented in docs/releases.md
New functionality has documentation (docs/data_structures.md, __getitem__ docstring)

When an array has only a BytesCodec (no compressors, no value transforms), a slice that lands within a single source chunk along axis 0 can be expressed as a rewritten byte offset and length on every surviving chunk reference — no data loaded. Picks up part of issue zarr-developers#86: now you can isel a single timestep from a multi-row chunk produced by, e.g., the netCDF3 parser without rechunking. The dispatch is deterministic: axis 0, uncompressed, slice contained in one source chunk, step == 1 → byte-shift path; everything else → existing chunk-aligned path, which raises SubChunkIndexingError for misaligned selections. step != 1 and sub-chunk indexing on other axes remain out of scope here and still raise.

Replace the hardcoded axis-0 check with a detector that reads the codec stack and returns the eligible axis (or None). [BytesCodec] gives C-order with axis 0; [TransposeCodec(order=...), BytesCodec] gives whatever axis order[0] points to in the logical array — for the reversed-order F-order case that's the last axis. The byte stride is computed commutatively as prod(chunks of the other axes) * itemsize, so the same formula works for any chosen axis. Also fix a latent no-op short-circuit: when the eligible axis happens to have a single source chunk, its chunk-grid selector slice(0, 1, 1) would equal slice(0, dim, 1) and skip the pending byte adjustment. Suppress the short-circuit when a sub-chunk adjustment is queued.

The sub-chunk branch in apply_selection had inline arithmetic mixed with the surrounding dispatch logic. Pull it out alongside the other _compute_* helpers so apply_selection's per-axis loop reads as a straight two-way choice between chunk-aligned and sub-chunk paths.

Removes the explicit isinstance assert at the call site: TypeGuard[slice] tells mypy that when the function returns True, its int|slice argument is in fact a slice. _compute_sub_chunk_axis_selection then accepts the narrowed value directly. Pure typing-level change, no runtime effect.

codecov · 2026-05-18T18:16:05Z

Codecov Report

❌ Patch coverage is 95.83333% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 90.04%. Comparing base (abd7d0f) to head (d18b331).
⚠️ Report is 1 commits behind head on main.

Files with missing lines	Patch %	Lines
virtualizarr/manifests/indexing.py	95.83%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #996      +/-   ##
==========================================
+ Coverage   89.98%   90.04%   +0.06%     
==========================================
  Files          33       33              
  Lines        2066     2089      +23     
==========================================
+ Hits         1859     1881      +22     
- Misses        207      208       +1

Files with missing lines	Coverage Δ
virtualizarr/manifests/array.py	`85.00% <ø> (ø)`
virtualizarr/manifests/indexing.py	`94.33% <95.83%> (+1.00%)`	⬆️

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Reads a 2D netCDF3 file, .isel's within the single source chunk on axis 0, writes to an in-memory icechunk store, reads back via xr.open_zarr, and asserts byte-equality against a direct xr.open_dataset + .isel on the same file. Exercises the sub-chunk byte-adjustment path through the parser, accessor, and store layers in one go.

Three small cleanups around test_integration.py: - Move the netcdf3_file fixture from test_parsers/conftest.py to the root conftest.py and turn it into a factory that accepts a Dataset, so both the existing parser test and the new integration test can share it. - Replace pytest.mark.skipif(not has_icechunk, ...) with the existing @requires_icechunk decorator from virtualizarr.tests. - Drop @requires_zarr_python and its definition: zarr is a hard project dependency, so the decorator is a no-op import-skip that misleadingly suggests zarr is optional.

TomNicholas · 2026-05-18T18:52:31Z

 has_tifffile, requires_tifffile = _importorskip("tifffile")
 has_imagecodecs, requires_imagecodecs = _importorskip("imagecodecs")
 has_hdf5plugin, requires_hdf5plugin = _importorskip("hdf5plugin")
-has_zarr_python, requires_zarr_python = _importorskip("zarr")


Drive-by cleanup - zarr is a required dep so this is outdated

TomNicholas · 2026-05-18T18:52:54Z

+
+@requires_scipy
+@requires_icechunk
+def test_subchunk_slice_netcdf3_through_icechunk_roundtrip(


This integration test tests the actual use case

rsignell · 2026-05-18T20:03:10Z

🚀

TomNicholas added 4 commits May 18, 2026 13:31

TomNicholas temporarily deployed to test-release May 18, 2026 18:13 — with GitHub Actions Inactive

TomNicholas added 2 commits May 18, 2026 14:41

TomNicholas temporarily deployed to test-release May 18, 2026 18:51 — with GitHub Actions Inactive

TomNicholas commented May 18, 2026

View reviewed changes

Merge branch 'main' into subchunk_uncompressed_axis0

d18b331

TomNicholas temporarily deployed to test-release May 18, 2026 19:47 — with GitHub Actions Inactive

TomNicholas merged commit 9f90d2e into zarr-developers:main May 18, 2026
17 checks passed

TomNicholas mentioned this pull request May 18, 2026

Arbitrary chunking of uncompressed files (e.g. netCDF3) #86

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Sub-chunk slicing for uncompressed ManifestArrays (part of #86)#996

Sub-chunk slicing for uncompressed ManifestArrays (part of #86)#996
TomNicholas merged 7 commits into
zarr-developers:mainfrom
TomNicholas:subchunk_uncompressed_axis0

TomNicholas commented May 18, 2026

Uh oh!

codecov Bot commented May 18, 2026 •

edited

Loading

Uh oh!

TomNicholas May 18, 2026

Uh oh!

TomNicholas May 18, 2026

Uh oh!

Uh oh!

rsignell commented May 18, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

TomNicholas commented May 18, 2026

Summary

Test plan

Acceptance criteria

Uh oh!

codecov Bot commented May 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

TomNicholas May 18, 2026

Choose a reason for hiding this comment

Uh oh!

TomNicholas May 18, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

rsignell commented May 18, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

codecov Bot commented May 18, 2026 •

edited

Loading