# Glossary

This page defines key terms used throughout the zarr-python documentation and API.

## Array Structure

### Array

An N-dimensional typed array stored in a Zarr [store](#store). An array's
[metadata](#metadata) defines its shape, data type, chunk layout, and codecs.

### Chunk

The fundamental unit of data in a Zarr array. An array is divided into chunks
along each dimension according to the [chunk grid](#chunk-grid), whose
implementation is currently part of zarr-python's private API. Each chunk is
encoded and compressed independently by the array's [codec](#codec) pipeline.
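
Dividing an array into chunks is integer arithmetic on a regular grid. The sketch below uses hypothetical helpers (`chunk_grid_shape` and `locate_element` are illustrative names, not zarr-python functions). It assumes a regular grid in which the last chunk along a dimension may extend past the array's edge. It counts chunks per dimension with ceiling division and finds the chunk that holds a given element, along with the element's offset inside that chunk.

```python
import math


def chunk_grid_shape(shape, chunks):
    """Number of chunks along each dimension (ceiling division)."""
    return tuple(-(-n // c) for n, c in zip(shape, chunks))


def locate_element(index, chunks):
    """Return (chunk coordinates, offset inside that chunk) for one element."""
    coords = tuple(i // c for i, c in zip(index, chunks))
    offsets = tuple(i % c for i, c in zip(index, chunks))
    return coords, offsets


# A hypothetical (100, 250) array split into (10, 100) chunks.
grid = chunk_grid_shape((100, 250), (10, 100))  # chunks per dimension
total = math.prod(grid)  # total number of chunks
where = locate_element((42, 215), (10, 100))  # chunk holding element (42, 215)
```

Here the grid is `(10, 3)`. The last chunk along the second dimension spans columns 200 to 299, so only its first 50 columns lie inside the array.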

When [sharding](#shard) is used, "chunk" refers to the inner chunks within each
shard, because those are the units that are compressed independently. Chunks are
the smallest units that can be read independently.

!!! warning "Convention specific to zarr-python"
    The use of "chunk" to mean the inner sub-chunk within a shard is a convention
    adopted by zarr-python's `Array` API. In the Zarr V3 specification and in other
    Zarr implementations, "chunk" may refer to the top-level grid cells (which
    zarr-python calls "shards" when the sharding codec is used). Be aware of this
    distinction when working across libraries.

**API**: [`Array.chunks`][zarr.Array.chunks] returns the chunk shape. When
sharding is used, this is the inner chunk shape.

### Chunk Grid

The partitioning of an array's elements into [chunks](#chunk). In Zarr V3, the
chunk grid is defined in the array [metadata](#metadata) and determines the
boundaries of each storage object.

When sharding is used, the chunk grid defines the [shard](#shard) boundaries,
not the inner chunk boundaries. The inner chunk shape is defined in the
configuration of the [sharding codec](#shard).

**API**: The `chunk_grid` field of the array metadata (`zarr.json`) describes the
storage-level grid.
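
Each cell of the chunk grid maps to one storage object, so it helps to know which elements a cell covers. The following is a minimal sketch that assumes a regular grid. `cell_bounds` is a hypothetical helper, not part of zarr-python. When sharding is used, pass the shard shape as `cell_shape`; otherwise pass the chunk shape.

```python
def cell_bounds(coords, cell_shape, array_shape):
    """Return the slices of the array covered by the grid cell at `coords`.

    The last cell along a dimension is clipped to the array's extent.
    """
    return tuple(
        slice(i * c, min((i + 1) * c, n))
        for i, c, n in zip(coords, cell_shape, array_shape)
    )
```

Consider `(50, 200)` shards on a `(100, 250)` array. The shard at grid position `(1, 1)` covers rows 50 to 99 and columns 200 to 249, and the rest of that shard lies beyond the array's edge.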

### Shard

A storage object that contains one or more [chunks](#chunk). Sharding reduces the
number of objects in a [store](#store) by grouping chunks together. This can
improve performance on file systems and object storage, where every additional
object adds overhead.

Within each shard, chunks are compressed independently and can be read
individually. However, updating any chunk in a shard requires rewriting the whole
shard object, including its chunk index. Shards are therefore the unit of
writing, and chunks are the unit of reading.
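
The difference between the read unit and the write unit shows up when a selection straddles a shard boundary. The sketch below uses a hypothetical helper, `touched_cells`. It assumes a regular grid and a non-empty, half-open selection box, and lists every grid cell the selection overlaps. Called with the chunk shape, it returns the chunks that must be read. Called with the shard shape, it returns the shards that a write would have to rewrite.

```python
from itertools import product


def touched_cells(start, stop, cell_shape):
    """Grid coordinates of every cell overlapping the half-open box [start, stop)."""
    ranges = [
        range(lo // c, (hi - 1) // c + 1)
        for lo, hi, c in zip(start, stop, cell_shape)
    ]
    return list(product(*ranges))
```

As an example, take `(10, 10)` chunks inside `(50, 50)` shards. The region `[45:55, 0:10]` overlaps only two chunks, but it also spans two shards, so writing it rewrites both `(50, 50)` shards.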

Sharding is implemented as a [codec](#codec), the `sharding_indexed` codec.
When sharding is used:

- The [chunk grid](#chunk-grid) in metadata defines the shard boundaries
- The sharding codec's `chunk_shape` defines the inner chunk size
- Each shard contains `shard_shape[i] // chunk_shape[i]` chunks along dimension
  `i`; the shard shape must be evenly divisible by the inner chunk shape
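
The sketch below covers the per-dimension count from the last bullet. It also shows how a chunk's grid coordinates map to the shard that stores it. Both helpers are hypothetical and assume regular grids.

```python
def chunks_per_shard(shard_shape, chunk_shape):
    """Inner chunks along each dimension of one shard; requires exact divisibility."""
    for s, c in zip(shard_shape, chunk_shape):
        if s % c != 0:
            raise ValueError(f"shard size {s} is not a multiple of chunk size {c}")
    return tuple(s // c for s, c in zip(shard_shape, chunk_shape))


def shard_of_chunk(chunk_coords, shard_shape, chunk_shape):
    """Return (shard coordinates, position of the chunk inside that shard)."""
    per_shard = chunks_per_shard(shard_shape, chunk_shape)
    shard = tuple(i // n for i, n in zip(chunk_coords, per_shard))
    inner = tuple(i % n for i, n in zip(chunk_coords, per_shard))
    return shard, inner
```

With `(50, 200)` shards and `(10, 100)` chunks, each shard holds `(5, 2)` chunks. Chunk `(7, 3)` sits at position `(2, 1)` inside shard `(1, 1)`. The helper raises an error when the shard shape is not a multiple of the chunk shape, because the sharding codec requires inner chunks to tile each shard exactly.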

**API**: [`Array.shards`][zarr.Array.shards] returns the shard shape, or `None`
if sharding is not used. [`Array.chunks`][zarr.Array.chunks] returns the inner
chunk shape.

## Storage

### Store

A key-value storage backend that holds Zarr data and metadata. Stores implement
the [`zarr.abc.store.Store`][] interface. Examples include local file systems,
cloud object storage (S3, GCS, Azure), zip files, and in-memory dictionaries.

Each [chunk](#chunk) is stored as a single value (object or file) in the store,
or each [shard](#shard) is when sharding is used. The value is addressed by a key
derived from its grid coordinates.
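
The array's chunk key encoding controls how grid coordinates become keys. Zarr V3's `default` encoding prefixes the coordinates with `c` and joins them with a separator, which is `/` by default. The `v2` encoding instead joins the coordinates with `.` (for example `0.1`). The function below sketches the `default` encoding only. It is illustrative rather than zarr-python API, and the key it produces is relative to the array's own path in the store.

```python
def default_chunk_key(coords, separator="/"):
    """Key for a chunk (or shard) under the V3 'default' chunk key encoding."""
    # A zero-dimensional array has a single chunk whose key is just "c".
    return separator.join(["c", *(str(i) for i in coords)])
```

Take a hypothetical array stored at `data/temperature` that uses the default separator. Its storage object at grid position `(1, 2)` would live under `data/temperature/c/1/2`.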

### Metadata

The JSON document (`zarr.json` in Zarr V3) that describes an [array](#array) or
group. For arrays, metadata includes the shape, data type, [chunk grid](#chunk-grid),
fill value, and [codec](#codec) pipeline. Metadata is stored alongside the data in
the [store](#store). Zarr-Python does not yet expose its internal metadata
representation as part of its public API.
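
Below is an illustrative `zarr.json` for a sharded array, written as a Python dict and serialized with `json`. The field layout follows our reading of the Zarr V3 specification. The shapes, data type, and codec choices are arbitrary examples. `inner_and_shard_shape` is a hypothetical helper that mimics how this page describes `Array.chunks` and `Array.shards`; it is not zarr-python's implementation.

```python
import json

# Illustrative Zarr V3 metadata: a float32 array stored in (50, 200) shards,
# each holding inner chunks of shape (10, 100).
metadata = {
    "zarr_format": 3,
    "node_type": "array",
    "shape": [100, 250],
    "data_type": "float32",
    "chunk_grid": {"name": "regular", "configuration": {"chunk_shape": [50, 200]}},
    "chunk_key_encoding": {"name": "default", "configuration": {"separator": "/"}},
    "fill_value": 0.0,
    "codecs": [
        {
            "name": "sharding_indexed",
            "configuration": {
                "chunk_shape": [10, 100],
                "codecs": [{"name": "bytes", "configuration": {"endian": "little"}}],
                "index_codecs": [
                    {"name": "bytes", "configuration": {"endian": "little"}},
                    {"name": "crc32c"},
                ],
                "index_location": "end",
            },
        }
    ],
}


def inner_and_shard_shape(meta):
    """Return (chunk shape, shard shape or None) in the sense used on this page."""
    grid = tuple(meta["chunk_grid"]["configuration"]["chunk_shape"])
    sharding = next(
        (c for c in meta["codecs"] if c["name"] == "sharding_indexed"), None
    )
    if sharding is None:
        return grid, None  # no sharding: grid cells are the chunks
    return tuple(sharding["configuration"]["chunk_shape"]), grid


document = json.dumps(metadata, indent=2)  # text that would be stored as zarr.json
```

In this sketch, the `chunk_grid` holds the shard shape and the sharding codec's configuration holds the inner chunk shape.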

## Codecs

### Codec

A transformation applied to array data when writing (encoding) and undone when
reading (decoding). Codecs are chained into a pipeline. The pipeline runs in order
on write and in reverse order on read, and codecs come in three types:

- **Array-to-array**: Transformations such as `transpose` that rearrange array
  elements
- **Array-to-bytes**: Serialization that converts an array to a byte sequence;
  exactly one is required per pipeline (for example, `bytes`)
- **Bytes-to-bytes**: Compression or checksums applied to the serialized bytes
  (for example, `gzip`, `zstd`, or `crc32c`)
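
The ordering of the three codec types can be shown with a toy pipeline built from the standard library. It uses a transpose (array-to-array), little-endian float64 serialization with `struct` (array-to-bytes), and `zlib` compression (bytes-to-bytes). This only sketches the pipeline concept, and all function names are hypothetical. zarr-python's real codecs operate on NumPy-backed buffers and are configured through array metadata.

```python
import struct
import zlib


def transpose(rows):
    """Array-to-array step: swap the two axes of a 2-D list of lists."""
    return [list(col) for col in zip(*rows)]


def encode(rows):
    """Apply the codecs in pipeline order: array -> array -> bytes -> bytes."""
    transposed = transpose(rows)
    flat = [x for row in transposed for x in row]
    raw = struct.pack(f"<{len(flat)}d", *flat)  # array-to-bytes: little-endian float64
    return zlib.compress(raw)  # bytes-to-bytes: compression


def decode(payload, shape):
    """Apply the same codecs in reverse; `shape` is the chunk's (rows, cols)."""
    n_rows, n_cols = shape
    raw = zlib.decompress(payload)
    flat = struct.unpack(f"<{n_rows * n_cols}d", raw)
    # The serialized layout is transposed: n_cols rows of n_rows values each.
    transposed = [list(flat[i * n_rows:(i + 1) * n_rows]) for i in range(n_cols)]
    return transpose(transposed)
```

The decoder receives the chunk shape as an argument. This mirrors how a Zarr reader knows each chunk's shape from the array metadata rather than from the stored bytes.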

The [sharding indexed codec](#shard) (`sharding_indexed`) is a special
array-to-bytes codec that groups multiple [chunks](#chunk) into a single storage
object. It applies its own inner codec pipeline to each chunk.

## API Properties

The following properties are available on [`zarr.Array`][]:

| Property | Description |
|----------|-------------|
| `.chunks` | Chunk shape (the inner chunk shape when sharding is used) |
| `.shards` | Shard shape, or `None` if sharding is not used |
| `.nchunks` | Total number of independently compressible units (chunks) in the array |
| `.cdata_shape` | Number of independently compressible units (chunks) along each dimension |
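
The following sketch relates the four properties for a hypothetical array. `expected_properties` is an illustrative helper that follows the descriptions in the table; it is not zarr-python code. It assumes regular grids, and `chunks` is the inner chunk shape when `shards` is given.

```python
import math


def expected_properties(shape, chunks, shards=None):
    """Values the table describes for a hypothetical array (regular grids only)."""
    # Chunks per dimension, counted with ceiling division.
    cdata_shape = tuple(-(-n // c) for n, c in zip(shape, chunks))
    return {
        "chunks": tuple(chunks),
        "shards": None if shards is None else tuple(shards),
        "nchunks": math.prod(cdata_shape),
        "cdata_shape": cdata_shape,
    }
```

Take shape `(100, 250)`, chunks `(10, 100)`, and shards `(50, 200)`. This gives `cdata_shape == (10, 3)` and `nchunks == 30`, yet only 2 x 2 = 4 shard objects are written to the store.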