Commit 88ed0ac
perf: cache lexicographic chunk coords in sharding codec (zarr-developers#4012)
* perf: cache lexicographic chunk coords in sharding codec
The subchunk_write_order feature (zarr-developers#3826) regressed sharded write
performance: _encode_partial_single rebuilt the full per-shard chunk
coordinate grid on every write via
`np.array(list(_subchunk_order_iter(..., "lexicographic")))`, and
`to_dict_vectorized` rebuilt a tuple key per row with `tuple(coords.ravel())`.
For a single-chunk write into a shard with tens of thousands of chunks this
roughly doubled write time (~22ms -> ~40ms on test_sharded_morton_write_single_chunk,
matching the -44% CodSpeed regression).
Add cached `_lexicographic_order` (array) and `_lexicographic_order_keys`
(tuples) helpers in indexing.py, mirroring `_morton_order`/`_morton_order_keys`,
and pass the cached keys into `to_dict_vectorized` instead of deriving them
row-by-row. This restores write throughput to the pre-zarr-developers#3826 baseline while
preserving identical chunk ordering (verified equal to np.ndindex across
shapes including 0-d and empty).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* chore(deps): bump the actions group across 1 directory with 8 updates (zarr-developers#176)
Bumps the actions group with 8 updates in the / directory:
| Package | From | To |
| --- | --- | --- |
| [prefix-dev/setup-pixi](https://github.com/prefix-dev/setup-pixi) | `0.9.5` | `0.9.6` |
| [codecov/codecov-action](https://github.com/codecov/codecov-action) | `6.0.0` | `6.0.1` |
| [github/issue-metrics](https://github.com/github/issue-metrics) | `4.2.2` | `4.2.7` |
| [j178/prek-action](https://github.com/j178/prek-action) | `2.0.3` | `2.0.4` |
| [actions/upload-artifact](https://github.com/actions/upload-artifact) | `7.0.0` | `7.0.1` |
| [actions/download-artifact](https://github.com/actions/download-artifact) | `7.0.0` | `8.0.1` |
| [pypa/gh-action-pypi-publish](https://github.com/pypa/gh-action-pypi-publish) | `1.13.0` | `1.14.0` |
| [zizmorcore/zizmor-action](https://github.com/zizmorcore/zizmor-action) | `0.5.3` | `0.5.6` |
Updates `prefix-dev/setup-pixi` from 0.9.5 to 0.9.6
- [Release notes](https://github.com/prefix-dev/setup-pixi/releases)
- [Commits](prefix-dev/setup-pixi@1b2de7f...5185adf)
Updates `codecov/codecov-action` from 6.0.0 to 6.0.1
- [Release notes](https://github.com/codecov/codecov-action/releases)
- [Changelog](https://github.com/codecov/codecov-action/blob/main/CHANGELOG.md)
- [Commits](codecov/codecov-action@57e3a13...e79a696)
Updates `github/issue-metrics` from 4.2.2 to 4.2.7
- [Release notes](https://github.com/github/issue-metrics/releases)
- [Commits](github-community-projects/issue-metrics@c9e9838...1e38d5e)
Updates `j178/prek-action` from 2.0.3 to 2.0.4
- [Release notes](https://github.com/j178/prek-action/releases)
- [Commits](j178/prek-action@6ad8027...bdca6f1)
Updates `actions/upload-artifact` from 7.0.0 to 7.0.1
- [Release notes](https://github.com/actions/upload-artifact/releases)
- [Commits](actions/upload-artifact@v7...043fb46)
Updates `actions/download-artifact` from 7.0.0 to 8.0.1
- [Release notes](https://github.com/actions/download-artifact/releases)
- [Commits](actions/download-artifact@v7...3e5f45b)
Updates `pypa/gh-action-pypi-publish` from 1.13.0 to 1.14.0
- [Release notes](https://github.com/pypa/gh-action-pypi-publish/releases)
- [Commits](pypa/gh-action-pypi-publish@v1.13.0...cef2210)
Updates `zizmorcore/zizmor-action` from 0.5.3 to 0.5.6
- [Release notes](https://github.com/zizmorcore/zizmor-action/releases)
- [Commits](zizmorcore/zizmor-action@b1d7e1f...5f14fd0)
---
updated-dependencies:
- dependency-name: prefix-dev/setup-pixi
dependency-version: 0.9.6
dependency-type: direct:production
update-type: version-update:semver-patch
dependency-group: actions
- dependency-name: codecov/codecov-action
dependency-version: 6.0.1
dependency-type: direct:production
update-type: version-update:semver-patch
dependency-group: actions
- dependency-name: github/issue-metrics
dependency-version: 4.2.7
dependency-type: direct:production
update-type: version-update:semver-patch
dependency-group: actions
- dependency-name: j178/prek-action
dependency-version: 2.0.4
dependency-type: direct:production
update-type: version-update:semver-patch
dependency-group: actions
- dependency-name: actions/upload-artifact
dependency-version: 7.0.1
dependency-type: direct:production
update-type: version-update:semver-patch
dependency-group: actions
- dependency-name: actions/download-artifact
dependency-version: 8.0.1
dependency-type: direct:production
update-type: version-update:semver-major
dependency-group: actions
- dependency-name: pypa/gh-action-pypi-publish
dependency-version: 1.14.0
dependency-type: direct:production
update-type: version-update:semver-minor
dependency-group: actions
- dependency-name: zizmorcore/zizmor-action
dependency-version: 0.5.6
dependency-type: direct:production
update-type: version-update:semver-patch
dependency-group: actions
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* refactor(sharding): derive coords inside to_dict_vectorized
Address review feedback: `_ShardReader.to_dict_vectorized` took the
lexicographic coordinate array and key tuples as parameters, even though
the reader already knows its own `chunks_per_shard` and both structures
are `lru_cache`d. Thread nothing in — fetch them inside the method via
`_lexicographic_order`/`_lexicographic_order_keys`. Same cache, so no
perf change; the call site collapses to `to_dict_vectorized()`.
Add a unit test covering the method directly across 0-d, 1-d, and 2-d
shard grids: present chunks map to their stored bytes, empty chunks to
None, and every lexicographic coordinate appears as a key.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* Update src/zarr/core/indexing.py
Co-authored-by: Ilan Gold <ilanbassgold@gmail.com>
* refactor(sharding): drop redundant lexicographic_order_iter
Address review feedback from @ilan-gold and @chuckwondo on the
`lexicographic_order_iter` helper.
`lexicographic_order_iter` returned a *lazy* iterator over an
*eagerly-built, cached* tuple (`_lexicographic_order_keys`), which
chuckwondo rightly flagged as confusing — and its output is byte-for-byte
identical to the pre-existing, genuinely-lazy `c_order_iter`
(verified across 0-d, empty, and N-d shapes). So the name promised
laziness the implementation didn't provide, over a sequence we could
already produce.
Remove the wrapper and use the cached `_lexicographic_order_keys` tuple
directly at the two `dict.fromkeys` call sites and in
`_subchunk_order_iter`. This keeps the eager/cached coordinate tuples —
which is the actual optimization: `dict.fromkeys` over the cached tuple
is ~1.4x faster than over lazy `c_order_iter` at 32^3 (≈900us vs ≈1300us),
because the cache amortizes tuple construction across repeated writes to
same-shaped shards. Switching to `c_order_iter` would have reintroduced
that cost, so it is deliberately not used here.
Also drop the now-dead `tuple()` wrap in `morton_order_iter` (its
argument is typed `tuple[int, ...]` and every caller passes one), per
ilan-gold.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* refactor(indexing): prefer lexicographic_order_iter, soft-deprecate c_order_iter
`c_order_iter` names a memory layout ("C order") rather than what the
iterator actually yields. Reintroduce `lexicographic_order_iter` as the
clearer name for the same row-major coordinate sequence, and make
`c_order_iter` a thin alias that delegates to it, with a docstring note
steering new code to the preferred name. No runtime warning — these are
internal helpers.
`lexicographic_order_iter` keeps the eager/cached implementation
(iter over the lru_cached `_lexicographic_order_keys` tuple), which is
~1.4x faster than the old lazy `itertools.product` on the
`dict.fromkeys` shard-write path and is the optimization this branch
exists to deliver. The alias therefore changes `c_order_iter` from lazy
to eager/cached; all in-repo callers (_ShardReader.__iter__,
_is_total_shard, _subchunk_order_iter, and two tests) are migrated to
`lexicographic_order_iter`, so nothing in-tree relies on the old laziness.
Output is unchanged: lexicographic_order_iter, the c_order_iter alias,
and np.ndindex all agree across 0-d, empty, and N-d shapes.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* refactor(indexing): make lexicographic_order_iter the lazy primitive
Per review from @mkitti: invert the relationship between the lazy
iterator and the eagerly-collected tuple. `lexicographic_order_iter` is
now a genuine lazy generator over the chunk-grid coordinates, and
`_lexicographic_order_keys` collects it into a cached tuple — the eager
version is "collect the lazy one", not the other way around.
Previously lexicographic_order_iter returned iter() over the cached
tuple, so any consumer that only needed a prefix still paid to
materialize the entire grid. _is_total_shard does exactly that — an
early-exit `all(coord in set for coord in ...)` — and on a cold cache
for a 32^3 shard whose first coordinate is absent this dropped from
~15.8ms to ~24us (the lazy generator builds one coordinate and bails).
The hot path is unchanged: the two dict.fromkeys sites consume the full
grid and use the cached `_lexicographic_order_keys` tuple directly
(~0.9ms at 32^3), so the regression fix this branch delivers is intact.
This also resolves @chuckwondo's point — the iterator is now actually
lazy rather than a thin wrapper over eager data.
Co-authored-by: Mark Kittisopikul <mkitti@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* refactor(indexing): make morton_order_iter the lazy primitive too
Per @mkitti: the morton pair was backwards in the same way the
lexicographic pair was. Invert it to match — `morton_order_iter` is now
the lazy generator primitive and `_morton_order_keys` collects it into a
cached tuple, mirroring `lexicographic_order_iter` /
`_lexicographic_order_keys`.
No behavioral change for the in-tree consumers (all fully consume the
sequence) and the Z-order is identical; this keeps the two coordinate-
order families symmetric and gives morton the same lazy/early-exit
option lexicographic now has.
Co-authored-by: Mark Kittisopikul <mkitti@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* refactor(indexing): expose chunk-order coordinates as cached sequences
Replace the morton/lexicographic order iterators (and the c_order_iter
alias) with two cached, numpy-backed sequences:
`morton_order_coords(shape)` and `lexicographic_order_coords(shape)`,
each returning the grid coordinates in that order as a tuple of
coordinate tuples.
This addresses several points from review:
- The earlier "lazy primitive" inversion de-optimized the hot write
path: `morton_order_iter` rebuilt every coordinate tuple from the
array on each call, and that path runs in `_encode_shard_dict` on
every shard write (~16ms/write at 32^3 chunks-per-shard). The coords
are a finite set of known length reused in full, so they are an
indexable sequence built once and cached, not a lazily-rebuilt
generator. (per @mkitti)
- `lexicographic_order_iter` was never genuinely lazy — `_lexicographic_order`
materializes the whole `np.indices` grid up front — so the early-exit
framing was inaccurate. (per @Copilot, @chuckwondo)
- Two functions differing only in caching vs laziness was redundant
(per @ilan-gold); there is now one sequence per order. `_ShardReader.__iter__`
wraps it in `iter()`, the only site that needs an iterator.
- `_is_total_shard` no longer iterates the order at all: `all_chunk_coords`
is always a subset of the shard grid (guaranteed by `validate`'s
shard/chunk divisibility check), so a count check proves totality. A
subset assertion documents the invariant.
Coordinates are Python int tuples because every consumer uses them as
dict keys / set members, which numpy arrays cannot be (unhashable,
mutable); the numpy array is kept only for the vectorized index lookup
in `to_dict_vectorized`. The per-shape cache holds ~prod(chunks_per_shard)
tuples (~0.07% of shard size for multi-GB shards with (64,64,64) chunks),
capped at 16 shapes per order.
Co-authored-by: Mark Kittisopikul <mkitti@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* test(bench): add warm-cache shard-write benchmark
The existing test_sharded_morton_write_single_chunk clears the chunk-order
cache before every iteration, so it only measures the cold grid-build cost.
That made it blind to a regression where the per-shard coordinate tuples
were rebuilt on every write instead of being reused from the cache — the
cold benchmark could not distinguish the two (both pay the build each
iteration).
Add test_sharded_morton_write_single_chunk_warm_cache, which warms the
cache once and then times repeated same-shape writes — the amortized
regime the cache exists to optimize (many shards of one shape per array).
Verified it discriminates: with the cached sequence it is ~4x faster than
the cold benchmark, and a rebuild-every-write regression shows up as a ~4x
slowdown here while staying invisible to the cold benchmark.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* docs: update changelog for full-shard write coverage
The fix caches the per-shard coordinate grid for every shard write, not
only partial writes, and the win is amortized across repeated writes to
same-shaped shards. Reword the note accordingly; keep it user-facing
(the internal indexing helper refactor is not part of the public API).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* perf: build order-coord tuples via .tolist(); document dual representation
`morton_order_coords` / `lexicographic_order_coords` built their tuple-of-
tuples with a row-by-row `tuple(int(x) for x in row)` comprehension. Using
`map(tuple, arr.tolist())` instead does the int conversion in a single
C-level call, producing byte-identical native-int tuples ~8-9x faster
(~16ms -> ~1.9ms cold build at 32^3). It is a per-shape cached build, so
this only speeds the first write to each shard shape, but it is free.
Also document in `to_dict_vectorized` why the chunk coordinates are needed
in two forms — a numpy array for the vectorized index lookup and hashable
tuples for the dict keys — since numpy rows are unhashable and a tuple list
can't be used for the vectorized modulo/advanced-indexing.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* perf,test: address code-review findings in the sharding coord cache
- Drop the O(n_chunks) assert in _is_total_shard. It built a fresh
set(lexicographic_order_coords(...)) on every partial read/write to
check an invariant `validate` already guarantees, regressing the very
partial-access hot path this PR optimizes (~673us vs ~112ns at 32^3
chunks-per-shard) and vanishing under -O. The invariant is documented
in the comment; the count check alone proves totality.
- Cache the colexicographic subchunk order. The colex branch of
_subchunk_order_iter rebuilt the grid via uncached np.ndindex on every
write while its morton/lexicographic siblings hit the cache; add
colexicographic_order_coords (cached, derived from lexicographic_order_coords
of the reversed shape) and use it.
- Fix two benchmark docstrings: the cold benchmark now clears the
lexicographic caches too (the write path builds that grid via
dict.fromkeys / to_dict_vectorized, so a morton-only clear left it warm
and under-reported the cold cost); the warm benchmark docstring now
describes what it actually exercises (repeated writes to one shard,
which reuse the cache identically to writes across same-shaped shards).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---------
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Ilan Gold <ilanbassgold@gmail.com>
Co-authored-by: Mark Kittisopikul <mkitti@users.noreply.github.com>1 parent 97d781b commit 88ed0ac
6 files changed
Lines changed: 229 additions & 69 deletions
File tree
- changes
- src/zarr
- codecs
- core
- tests
- benchmarks
- test_codecs
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
1 | | - | |
| 1 | + | |
| 2 | + | |
| 3 | + | |
| 4 | + | |
| 5 | + | |
| 6 | + | |
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
46 | 46 | | |
47 | 47 | | |
48 | 48 | | |
49 | | - | |
| 49 | + | |
| 50 | + | |
50 | 51 | | |
51 | | - | |
| 52 | + | |
| 53 | + | |
52 | 54 | | |
53 | 55 | | |
54 | 56 | | |
| |||
261 | 263 | | |
262 | 264 | | |
263 | 265 | | |
264 | | - | |
| 266 | + | |
265 | 267 | | |
266 | | - | |
267 | | - | |
268 | | - | |
269 | | - | |
| 268 | + | |
270 | 269 | | |
271 | 270 | | |
272 | | - | |
273 | | - | |
274 | | - | |
275 | | - | |
| 271 | + | |
| 272 | + | |
| 273 | + | |
| 274 | + | |
| 275 | + | |
276 | 276 | | |
277 | 277 | | |
278 | 278 | | |
279 | 279 | | |
280 | 280 | | |
| 281 | + | |
| 282 | + | |
| 283 | + | |
| 284 | + | |
| 285 | + | |
| 286 | + | |
| 287 | + | |
| 288 | + | |
| 289 | + | |
| 290 | + | |
| 291 | + | |
| 292 | + | |
| 293 | + | |
281 | 294 | | |
282 | 295 | | |
283 | 296 | | |
284 | | - | |
| 297 | + | |
285 | 298 | | |
286 | | - | |
| 299 | + | |
287 | 300 | | |
288 | | - | |
| 301 | + | |
289 | 302 | | |
290 | 303 | | |
291 | 304 | | |
| |||
533 | 546 | | |
534 | 547 | | |
535 | 548 | | |
| 549 | + | |
536 | 550 | | |
537 | 551 | | |
538 | | - | |
| 552 | + | |
539 | 553 | | |
540 | | - | |
| 554 | + | |
541 | 555 | | |
542 | | - | |
| 556 | + | |
543 | 557 | | |
544 | 558 | | |
545 | 559 | | |
| |||
565 | 579 | | |
566 | 580 | | |
567 | 581 | | |
568 | | - | |
569 | | - | |
570 | | - | |
| 582 | + | |
571 | 583 | | |
572 | 584 | | |
573 | 585 | | |
| |||
610 | 622 | | |
611 | 623 | | |
612 | 624 | | |
613 | | - | |
614 | | - | |
| 625 | + | |
615 | 626 | | |
616 | 627 | | |
617 | 628 | | |
618 | 629 | | |
619 | 630 | | |
620 | 631 | | |
621 | 632 | | |
622 | | - | |
623 | | - | |
624 | | - | |
625 | | - | |
| 633 | + | |
| 634 | + | |
| 635 | + | |
| 636 | + | |
626 | 637 | | |
627 | 638 | | |
628 | 639 | | |
| |||
692 | 703 | | |
693 | 704 | | |
694 | 705 | | |
695 | | - | |
696 | | - | |
697 | | - | |
| 706 | + | |
| 707 | + | |
| 708 | + | |
| 709 | + | |
| 710 | + | |
| 711 | + | |
| 712 | + | |
698 | 713 | | |
699 | 714 | | |
700 | 715 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
1521 | 1521 | | |
1522 | 1522 | | |
1523 | 1523 | | |
1524 | | - | |
1525 | | - | |
1526 | | - | |
| 1524 | + | |
| 1525 | + | |
| 1526 | + | |
1527 | 1527 | | |
1528 | 1528 | | |
1529 | 1529 | | |
1530 | 1530 | | |
1531 | 1531 | | |
1532 | 1532 | | |
1533 | | - | |
| 1533 | + | |
1534 | 1534 | | |
1535 | 1535 | | |
1536 | | - | |
| 1536 | + | |
1537 | 1537 | | |
1538 | 1538 | | |
1539 | 1539 | | |
| |||
1544 | 1544 | | |
1545 | 1545 | | |
1546 | 1546 | | |
1547 | | - | |
1548 | | - | |
| 1547 | + | |
| 1548 | + | |
1549 | 1549 | | |
1550 | 1550 | | |
1551 | 1551 | | |
| |||
1554 | 1554 | | |
1555 | 1555 | | |
1556 | 1556 | | |
1557 | | - | |
| 1557 | + | |
1558 | 1558 | | |
1559 | 1559 | | |
1560 | 1560 | | |
1561 | | - | |
| 1561 | + | |
1562 | 1562 | | |
1563 | 1563 | | |
1564 | 1564 | | |
| |||
1576 | 1576 | | |
1577 | 1577 | | |
1578 | 1578 | | |
1579 | | - | |
1580 | | - | |
| 1579 | + | |
| 1580 | + | |
| 1581 | + | |
| 1582 | + | |
| 1583 | + | |
| 1584 | + | |
| 1585 | + | |
| 1586 | + | |
| 1587 | + | |
| 1588 | + | |
1581 | 1589 | | |
1582 | 1590 | | |
1583 | | - | |
1584 | | - | |
| 1591 | + | |
| 1592 | + | |
| 1593 | + | |
| 1594 | + | |
| 1595 | + | |
| 1596 | + | |
| 1597 | + | |
| 1598 | + | |
| 1599 | + | |
| 1600 | + | |
| 1601 | + | |
| 1602 | + | |
| 1603 | + | |
| 1604 | + | |
| 1605 | + | |
| 1606 | + | |
1585 | 1607 | | |
1586 | 1608 | | |
1587 | | - | |
1588 | | - | |
| 1609 | + | |
| 1610 | + | |
| 1611 | + | |
| 1612 | + | |
| 1613 | + | |
| 1614 | + | |
| 1615 | + | |
| 1616 | + | |
| 1617 | + | |
| 1618 | + | |
| 1619 | + | |
| 1620 | + | |
| 1621 | + | |
| 1622 | + | |
| 1623 | + | |
| 1624 | + | |
| 1625 | + | |
| 1626 | + | |
| 1627 | + | |
| 1628 | + | |
1589 | 1629 | | |
1590 | 1630 | | |
1591 | 1631 | | |
| |||
0 commit comments