Commit 7c78574

maxrjones and d-v-b authored
docs: add glossary (zarr-developers#3767)
* docs: add glossary
* Add glossary
* Update docs/user-guide/glossary.md
  Co-authored-by: Davis Bennett <davis.v.bennett@gmail.com>
* Add caveat

---------

Co-authored-by: Davis Bennett <davis.v.bennett@gmail.com>
1 parent fccf372 commit 7c78574

3 files changed: +115 -0 lines changed

docs/user-guide/glossary.md

Lines changed: 110 additions & 0 deletions
@@ -0,0 +1,110 @@
# Glossary

This page defines key terms used throughout the zarr-python documentation and API.

## Array Structure

### Array

An N-dimensional typed array stored in a Zarr [store](#store). An array's
[metadata](#metadata) defines its shape, data type, chunk layout, and codecs.

### Chunk

The fundamental unit of data in a Zarr array. An array is divided into chunks
along each dimension according to the [chunk grid](#chunk-grid), which is currently
part of Zarr's private API. Each chunk is independently compressed and encoded
through the array's [codec](#codec) pipeline.

When [sharding](#shard) is used, "chunk" refers to the inner chunks within each
shard, because those are the compressible units. The chunks are the smallest units
that can be read independently.

!!! warning "Convention specific to zarr-python"
    The use of "chunk" to mean the inner sub-chunk within a shard is a convention
    adopted by zarr-python's `Array` API. In the Zarr V3 specification and in other
    Zarr implementations, "chunk" may refer to the top-level grid cells (which
    zarr-python calls "shards" when the sharding codec is used). Be aware of this
    distinction when working across libraries.

**API**: [`Array.chunks`][zarr.Array.chunks] returns the chunk shape. When
sharding is used, this is the inner chunk shape.
33+
### Chunk Grid
34+
35+
The partitioning of an array's elements into [chunks](#chunk). In Zarr V3, the
36+
chunk grid is defined in the array [metadata](#metadata) and determines the
37+
boundaries of each storage object.
38+
39+
When sharding is used, the chunk grid defines the [shard](#shard) boundaries,
40+
not the inner chunk boundaries. The inner chunk shape is defined within the
41+
[sharding codec](#shard).
42+
43+
**API**: The `chunk_grid` field in array metadata contains the storage-level
44+
grid.

### Shard

A storage object that contains one or more [chunks](#chunk). Sharding reduces the
number of objects in a [store](#store) by grouping chunks together, which
improves performance on file systems and object storage.

Within each shard, chunks are compressed independently and can be read
individually. However, writing requires updating the full shard for consistency,
making shards the unit of writing and chunks the unit of reading.

Sharding is implemented as a [codec](#codec) (the sharding indexed codec).
When sharding is used:

- The [chunk grid](#chunk-grid) in metadata defines the shard boundaries
- The sharding codec's `chunk_shape` defines the inner chunk size
- Each shard contains `shard_shape / chunk_shape` chunks per dimension

**API**: [`Array.shards`][zarr.Array.shards] returns the shard shape, or `None`
if sharding is not used. [`Array.chunks`][zarr.Array.chunks] returns the inner
chunk shape.
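
The `shard_shape / chunk_shape` relationship above can be sketched with plain Python arithmetic. This is only an illustration: the helper name `chunks_per_shard` and the shapes are hypothetical, and the sketch assumes the inner chunk shape divides the shard shape evenly in every dimension.

```python
import math


def chunks_per_shard(shard_shape, chunk_shape):
    """Number of inner chunks along each dimension of one shard."""
    for s, c in zip(shard_shape, chunk_shape):
        if s % c != 0:
            # Assumption of this sketch: shards hold a whole number of chunks.
            raise ValueError(f"chunk size {c} does not evenly divide shard size {s}")
    return tuple(s // c for s, c in zip(shard_shape, chunk_shape))


# Hypothetical shapes: what Array.shards and Array.chunks might report.
shard_shape = (1000, 1000)
chunk_shape = (100, 250)

per_dim = chunks_per_shard(shard_shape, chunk_shape)
print(per_dim)             # (10, 4)
print(math.prod(per_dim))  # 40 inner chunks stored in each shard
```

With these shapes, a write that touches any one of the 40 inner chunks rewrites the whole shard, while a read can fetch a single inner chunk.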

## Storage

### Store

A key-value storage backend that holds Zarr data and metadata. Stores implement
the [`zarr.abc.store.Store`][] interface. Examples include local file systems,
cloud object storage (S3, GCS, Azure), zip files, and in-memory dictionaries.

Each [chunk](#chunk) or [shard](#shard) is stored as a single value (object or
file) in the store, addressed by a key derived from its grid coordinates.
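
To make "a key derived from its grid coordinates" concrete, here is a minimal sketch that assumes the Zarr V3 `default` chunk key encoding (a `c` prefix followed by the coordinates, joined by a separator). The function name `chunk_key` is hypothetical, and an array may be configured with a different key encoding, so treat this as an illustration and not as zarr-python's implementation.

```python
def chunk_key(coords, separator="/"):
    """Build a store key from grid coordinates, assuming the 'default' encoding."""
    return separator.join(["c", *(str(c) for c in coords)])


print(chunk_key((2, 1)))       # c/2/1
print(chunk_key((0, 0, 7)))    # c/0/0/7
print(chunk_key((2, 1), "."))  # c.2.1
```

The key is relative to the array's own location in the store, next to that array's metadata document.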

### Metadata

The JSON document (`zarr.json`) that describes an [array](#array) or group. For
arrays, metadata includes the shape, data type, [chunk grid](#chunk-grid), fill
value, and [codec](#codec) pipeline. Metadata is stored alongside the data in
the [store](#store). zarr-python does not yet expose its internal metadata
representation as part of its public API.
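
For orientation, the sketch below builds a small array metadata document with the fields listed above and round-trips it through JSON. The field names follow the Zarr V3 specification as best understood here; the values are illustrative assumptions, and a `zarr.json` written by zarr-python may contain additional fields.

```python
import json

# Illustrative values only; not taken from a real array.
metadata = {
    "zarr_format": 3,
    "node_type": "array",
    "shape": [1000, 1000],
    "data_type": "float32",
    "chunk_grid": {
        "name": "regular",
        "configuration": {"chunk_shape": [100, 100]},
    },
    "chunk_key_encoding": {
        "name": "default",
        "configuration": {"separator": "/"},
    },
    "fill_value": 0.0,
    "codecs": [
        {"name": "bytes", "configuration": {"endian": "little"}},
        {"name": "gzip", "configuration": {"level": 5}},
    ],
}

document = json.dumps(metadata, indent=2)  # what would be stored as zarr.json
parsed = json.loads(document)

print(parsed["chunk_grid"]["configuration"]["chunk_shape"])  # [100, 100]
print([codec["name"] for codec in parsed["codecs"]])         # ['bytes', 'gzip']
```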

## Codecs

### Codec

A transformation applied to array data during reading and writing. Codecs are
chained into a pipeline and come in three types:

- **Array-to-array**: Transforms like transpose that rearrange array elements
- **Array-to-bytes**: Serialization that converts an array to a byte sequence
  (exactly one required)
- **Bytes-to-bytes**: Compression or checksums applied to the serialized bytes

The [sharding indexed codec](#shard) is a special array-to-bytes codec that
groups multiple [chunks](#chunk) into a single storage object.
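
The three codec types can be illustrated with a toy pipeline built only from the Python standard library. This is a conceptual sketch, not zarr-python's codec API: every function name here is hypothetical, the "array" is a list of rows of integers, and `zlib` stands in for a bytes-to-bytes compressor.

```python
import struct
import zlib


def transpose(rows):
    """Array-to-array: swap the two axes of a 2-D list."""
    return [list(col) for col in zip(*rows)]


def to_bytes(rows):
    """Array-to-bytes: serialize as little-endian int32 in row-major order."""
    flat = [value for row in rows for value in row]
    return struct.pack(f"<{len(flat)}i", *flat)


def from_bytes(data, shape):
    """Inverse of to_bytes for a 2-D array of the given shape."""
    n_rows, n_cols = shape
    flat = struct.unpack(f"<{n_rows * n_cols}i", data)
    return [list(flat[r * n_cols:(r + 1) * n_cols]) for r in range(n_rows)]


def encode(chunk):
    """Write path: array-to-array, then array-to-bytes, then bytes-to-bytes."""
    return zlib.compress(to_bytes(transpose(chunk)))


def decode(blob, shape):
    """Read path: undo each codec in reverse order."""
    n_rows, n_cols = shape
    transposed = from_bytes(zlib.decompress(blob), (n_cols, n_rows))
    return transpose(transposed)


chunk = [[1, 2, 3], [4, 5, 6]]
blob = encode(chunk)
print(decode(blob, (2, 3)) == chunk)  # True
```

Decoding runs the same codecs in reverse, which is why exactly one array-to-bytes step sits between the array-level and byte-level stages.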

## API Properties

The following properties are available on [`zarr.Array`][]:

| Property | Description |
|----------|-------------|
| `.chunks` | Chunk shape (the inner chunk shape when sharding is used) |
| `.shards` | Shard shape, or `None` if no sharding |
| `.nchunks` | Total number of independently compressible units across the array |
| `.cdata_shape` | Number of independently compressible units per dimension |
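
The last two rows can be related to the array shape and chunk shape with ceiling division. The sketch below follows the table's descriptions for a regular grid and is not necessarily how zarr-python computes these properties; the helper names and shapes are hypothetical.

```python
import math


def units_per_dimension(array_shape, chunk_shape):
    """Chunks needed along each dimension, counting a partial edge chunk as one."""
    return tuple(-(-s // c) for s, c in zip(array_shape, chunk_shape))


def total_units(array_shape, chunk_shape):
    """Total number of chunks across the whole array."""
    return math.prod(units_per_dimension(array_shape, chunk_shape))


# Hypothetical array whose second dimension is not a multiple of the chunk size.
array_shape = (1000, 950)
chunk_shape = (100, 100)

print(units_per_dimension(array_shape, chunk_shape))  # (10, 10), compare .cdata_shape
print(total_units(array_shape, chunk_shape))          # 100, compare .nchunks
```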

docs/user-guide/index.md

Lines changed: 4 additions & 0 deletions
@@ -35,6 +35,10 @@ Take your skills to the next level:
 - **[Extending](extending.md)** - Extend functionality with custom code
 - **[Consolidated Metadata](consolidated_metadata.md)** - Advanced metadata management

+## Reference
+
+- **[Glossary](glossary.md)** - Definitions of key terms (chunks, shards, codecs, etc.)
+
 ## Need Help?

 - Browse the [API Reference](../api/zarr/index.md) for detailed function documentation

mkdocs.yml

Lines changed: 1 addition & 0 deletions
@@ -27,6 +27,7 @@ nav:
 - user-guide/gpu.md
 - user-guide/consolidated_metadata.md
 - user-guide/experimental.md
+- user-guide/glossary.md
 - Examples:
 - user-guide/examples/custom_dtype.md
 - API Reference:
