feat: rectilinear chunks in Zarr backend#11279

Draft
maxrjones wants to merge 17 commits into pydata:main from maxrjones:poc/unified-zarr-chunk-grid

Conversation

@maxrjones
Contributor

Description

This PR accompanies zarr-developers/zarr-python#3802, adding support for rectilinear zarr chunks in Xarray.

The user-facing difference between this PR and zarr-developers/zarr-python#3369 / #10880 is that rectilinear chunks are gated behind zarr.config.set({'array.rectilinear_chunks': True}) (or ZARR_ARRAY__RECTILINEAR_CHUNKS=True), disabled by default. This gives zarr-python developers an opportunity to gracefully finalize the API, which is especially valuable given that rectilinear chunks are the largest feature addition in zarr-python since Zarr V3/sharding.
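The two spellings of the flag above are related by a naming rule: the `ZARR_` prefix is dropped, the remainder is lowercased, and a double underscore becomes a dot. A minimal sketch of that mapping, assuming this convention (which zarr-python's configuration appears to inherit from its config library) and assuming values are parsed as Python literals; `env_to_config_key` is a hypothetical helper for illustration, not part of zarr-python:

```python
import ast


def env_to_config_key(name: str, value: str, prefix: str = "ZARR_"):
    """Map an environment variable to a (config key, parsed value) pair.

    Hypothetical helper: it mimics the assumed convention, it is not zarr's code.
    """
    if not name.startswith(prefix):
        raise ValueError(f"{name!r} does not start with {prefix!r}")
    # Drop the prefix, lowercase, and turn "__" into the nesting separator ".".
    key = name[len(prefix):].lower().replace("__", ".")
    try:
        # "True" -> True, "3" -> 3; anything unparsable stays a string.
        parsed = ast.literal_eval(value)
    except (SyntaxError, ValueError):
        parsed = value
    return key, parsed


key, parsed = env_to_config_key("ZARR_ARRAY__RECTILINEAR_CHUNKS", "True")
print(key, parsed)
```

Under these assumptions, the environment variable and the `zarr.config.set` call address the same key, so either one should be enough to opt in.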

What changed

  • _determine_zarr_chunks now passes through variable (non-uniform) chunk sizes when writing to Zarr V3 with the unified ChunkGrid API, instead of raising an error.
  • Reading correctly reconstructs chunk information from both RegularChunkGrid and RectilinearChunkGrid metadata.
  • safe_chunks and align_chunks validation is skipped for rectilinear (tuple-of-tuples) chunks, since those checks assume uniform chunk sizes.
  • Error messages for chunk validation failures now distinguish between Zarr V2 and V3 and point users toward the rectilinear chunks extension.
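To make the regular/rectilinear distinction in the list above concrete, the sketch below classifies dask-style per-dimension chunk tuples. It assumes the usual convention that a regular grid has equal chunk sizes along a dimension except for a possibly smaller final chunk; `is_regular` and `classify_chunks` are hypothetical names and are not taken from this PR's implementation.

```python
def is_regular(dim_chunks):
    """True if all chunks are equal, except a possibly smaller last chunk."""
    if len(dim_chunks) <= 1:
        return True
    first = dim_chunks[0]
    interior_equal = all(c == first for c in dim_chunks[:-1])
    return interior_equal and dim_chunks[-1] <= first


def classify_chunks(chunks):
    """Classify a tuple-of-tuples of chunk sizes (one tuple per dimension)."""
    return "regular" if all(is_regular(dim) for dim in chunks) else "rectilinear"


print(classify_chunks(((10, 10, 10, 5), (20, 20))))  # regular
print(classify_chunks(((10, 5, 10), (20, 20))))      # rectilinear
```

A single non-uniform dimension is enough to make the whole grid rectilinear in this sketch, which is the case that previously raised an error on write.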

To-do

  • expand test coverage for error messages when using V2 or when the config flag is off, and add a multi-dimensional test case
  • decide whether to continue silently bypassing safe_chunks/align_chunks or add validations
  • remove upstream version pin

Checklist

  • Closes #xxxx
  • Tests added
  • User visible changes (including notable bug fixes) are documented in whats-new.rst
  • New functions/methods are listed in api.rst

AI Disclosure

  • This PR contains AI-generated content.
    • I have tested any AI-generated content in my PR.
    • I take responsibility for any AI-generated content in my PR. Tools: Claude Code

@github-actions github-actions bot added the topic-backends, topic-zarr (Related to zarr storage library), and io labels Apr 2, 2026
@headtr1ck
Collaborator

Is this a duplicate of #10880?

@maxrjones
Contributor Author

Is this a duplicate of #10880?

This would supersede #10880. It implements the same feature, but uses a different upstream implementation (zarr-developers/zarr-python#3802), which will likely be merged into Zarr-Python in the coming days. zarr-developers/zarr-python#3802 supersedes zarr-developers/zarr-python#3369, which #10880 was built on top of.

Collaborator

@keewis keewis left a comment

I'll need to look into this a bit more, but for now:

is skipped for rectilinear (tuple-of-tuples) chunks since those checks assume uniform chunk sizes.

That's what the current checks do, but their purpose is to support safely appending data without write conflicts between execution workers (dask / cubed / etc). Do we maybe need different checks that verify that zarr chunks do not overlap with multiple execution chunks?
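One way such a check could look, as a sketch rather than a proposal for the final API: along each dimension, every boundary between execution chunks must coincide with a boundary between zarr chunks, so that no zarr chunk is written by more than one worker. The function name and the tuple-of-sizes inputs are assumptions made for illustration, and the sketch ignores region or append offsets.

```python
from itertools import accumulate


def zarr_chunks_are_worker_exclusive(zarr_chunks, exec_chunks):
    """Check one dimension: does each zarr chunk lie inside a single execution chunk?

    Both arguments are tuples of chunk sizes along the same dimension.
    """
    if sum(zarr_chunks) != sum(exec_chunks):
        raise ValueError("chunk sizes must cover the same dimension length")
    zarr_edges = set(accumulate(zarr_chunks))
    # Every execution-chunk edge must also be a zarr-chunk edge; otherwise some
    # zarr chunk straddles two execution chunks and two workers would write to it.
    return all(edge in zarr_edges for edge in accumulate(exec_chunks))


print(zarr_chunks_are_worker_exclusive((10, 20, 5, 15), (30, 20)))  # True
print(zarr_chunks_are_worker_exclusive((10, 20, 5, 15), (25, 25)))  # False
```

Comparing cumulative edges works for both uniform and variable chunk sizes, so a check of this shape would not need a separate code path for rectilinear grids.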

Co-authored-by: Justus Magin <keewis@users.noreply.github.com>
pixi.toml Outdated
dask = { git = "https://github.com/dask/dask" }
distributed = { git = "https://github.com/dask/distributed" }
zarr = { git = "https://github.com/zarr-developers/zarr-python" }
zarr = { git = "https://github.com/maxrjones/zarr-python", branch = "poc/unified-chunk-grid" }
Contributor

@dcherian dcherian Apr 8, 2026

Suggested change
zarr = { git = "https://github.com/maxrjones/zarr-python", branch = "poc/unified-chunk-grid" }

Now that it's on main, we can apply the run-upstream label (which I will do now).

@dcherian dcherian added the run-upstream (Run upstream CI) label Apr 8, 2026
eendebakpt and others added 7 commits April 8, 2026 17:31
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Deepak Cherian <dcherian@users.noreply.github.com>
* add `zizmor` to the hooks

* set the default permissions to minimum

* don't persist credentials

* pin `actions/checkout`

* pin `xarray-contrib/ci-trigger`

* pin `actions/upload-artifact`

* pin `actions/download-artifact`

* pin `pypa/gh-action-pypi-publish`

* pin `actions/setup-python`

* pin `prefix-dev/setup-pixi`

* pin `codecov/codecov-action`

* pin `scientific-python/issue-from-pytest-log-action`

* pin `mamba-org/setup-micromamba`

* pin `WyriHaximus/github-action-get-previous-tag`

* pin `EnricoMi/publish-unit-test-result-action`

* pin `actions/labeler`

* pin `actions/cache`

* actions cooldown for dependabot

* avoid potential template injections

* broken condition

* ignore the `pull_request_target` warning

(because `actions/labeler` actually needs it)

* ignore zizmor's dangerous-triggers warning for publish-test-results

* fetch the `codecov` token from a github environment

* correct the pin for `setup-pixi`

* split the nightly wheels ci into build and publish jobs

* remove the codecov env and ignore the zizmor warning instead

* back to the codecov env, but disable deployments

* correct the pin for `actions/setup-python`

Co-authored-by: Nick Hodgskin <36369090+VeckoTheGecko@users.noreply.github.com>

---------

Co-authored-by: Nick Hodgskin <36369090+VeckoTheGecko@users.noreply.github.com>
This PR modifies the few places we relied on a generic `np.timedelta64` dtype to explicitly specify the time resolution:
- It removes `NAT_TYPES` and relies instead on checking the `dtype.kind` in `computation.nanops._maybe_null_out`.
- It infers the time `unit` using `np.datetime_data` from the input `dtype` to determine the `unit` on the returned `fill_value` in `core.dtypes.maybe_promote`.
- It explicitly constructs a zero-valued `np.timedelta64` or `np.datetime64` object for use downstream in `plot.utils._determine_cmap_params`.
Bumps the actions group with 2 updates: [prefix-dev/setup-pixi](https://github.com/prefix-dev/setup-pixi) and [pypa/gh-action-pypi-publish](https://github.com/pypa/gh-action-pypi-publish).


Updates `prefix-dev/setup-pixi` from 0.9.4 to 0.9.5
- [Release notes](https://github.com/prefix-dev/setup-pixi/releases)
- [Commits](prefix-dev/setup-pixi@a0af7a2...1b2de7f)

Updates `pypa/gh-action-pypi-publish` from 1.13.0 to 1.14.0
- [Release notes](https://github.com/pypa/gh-action-pypi-publish/releases)
- [Commits](pypa/gh-action-pypi-publish@ed0c539...cef2210)

---
updated-dependencies:
- dependency-name: prefix-dev/setup-pixi
  dependency-version: 0.9.5
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: actions
- dependency-name: pypa/gh-action-pypi-publish
  dependency-version: 1.14.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: actions
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…a#11282)

* remove the accidentally copied pypi publish step

* separate test execution from issue creation

* upload the artifact only if the tests failed

* correct source of the log-file path

* debug: print github output [skip-rtd]

* typo [skip-rtd]

* correct the path to the log file

Co-authored-by: Nick Hodgskin <36369090+VeckoTheGecko@users.noreply.github.com>

---------

Co-authored-by: Nick Hodgskin <36369090+VeckoTheGecko@users.noreply.github.com>
* use var._root._h5py to get h5py module in h5netcdf backend instead of importing it
* fix ros3 tests to use DANDI endpoint and include hdf5 version switch
* fix phony_dims for ros3 test
* fix import check for ros3 availability
* try using property to get around import issue
* add whats-new.rst entry
@github-actions github-actions bot added the topic-indexing, topic-plotting, and Automation (Github bots, testing workflows, release automation) labels Apr 8, 2026
@maxrjones maxrjones marked this pull request as draft April 9, 2026 13:54
Labels

Automation (Github bots, testing workflows, release automation), io, run-upstream (Run upstream CI), topic-backends, topic-indexing, topic-plotting, topic-zarr (Related to zarr storage library)

7 participants