84 changes: 84 additions & 0 deletions chunk-key-encodings/suffix/README.md
# ZEP: `suffix` Chunk Key Encoding

## Summary

This document proposes a new Zarr v3 chunk-key-encoding extension named `suffix`. This encoding appends a user-defined string (the "suffix") to the key generated by a base chunk key encoding. The primary motivation is to allow chunk keys to have file extensions (e.g., `.tiff`, `.zip`), making them directly usable by operating systems and other software that identify file types by their extension.

---

## Motivation

Modern scientific workflows often involve a variety of tools. While Zarr provides excellent chunked, N-dimensional data access, individual chunks can sometimes be valid, standalone files in other formats. A prime example is a Zarr array sharded into TIFF files. Each shard is both a chunk in the Zarr hierarchy and a complete TIFF file.

Currently, Zarr chunk keys (like `c/0/0`) lack file extensions. This prevents a user or application from easily identifying and opening these chunks with standard tools (e.g., an image viewer). To work around this, data must be duplicated or accessed exclusively through a Zarr library.

The `suffix` encoding solves this problem by adding a file extension to the chunk key. This creates a dual-access system:
1. **Zarr Access**: The data remains a fully compliant Zarr array, accessible via the Zarr protocol.
2. **Direct File Access**: The individual chunk files can be directly opened, viewed, or processed by any tool that recognizes their file extension.

This enhances interoperability and simplifies workflows that bridge Zarr and traditional file-based tools without requiring data duplication.

---

## Specification

* **Name**: `suffix`
* **Version**: `0.1`
* **Identifier**: (A unique URI to be assigned upon formal adoption)

### Configuration

The configuration for this encoding is a JSON object with one required and one optional member.

* `"suffix"`: **(Required)** A string that will be appended to the encoded chunk key.
* `"base-encoding"`: **(Optional)** A chunk key encoding configuration object. This specifies the "base" encoding to be used *before* the suffix is appended. If omitted, the store's `default` chunk key encoding is used.

@normanrz (Member), Oct 16, 2025

Suggested change
* `"base-encoding"`: **(Optional)** A chunk key encoding configuration object. This specifies the "base" encoding to be used *before* the suffix is appended. If omitted, the store's `default` chunk key encoding is used.
* `"base_encoding"`: **(Optional)** A chunk key encoding configuration object. This specifies the "base" encoding to be used *before* the suffix is appended. If omitted, the store's `default` chunk key encoding is used.

Zarr uses snake_case in the JSON metadata.

I would make this required, because it is also required in the top-level array metadata. It helps implementations to make this explicit.

Contributor

Agreed, should be required.

mkitti marked this conversation as resolved (outdated).
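
The configuration rules above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: it assumes the member names exactly as written in this proposal (`suffix` and the hyphenated `base-encoding`) and treats an omitted base encoding as `{"name": "default"}`. The function `parse_suffix_config` is hypothetical and is not part of any Zarr library.

```python
import json
from typing import Any


def parse_suffix_config(configuration: dict[str, Any]) -> tuple[str, dict[str, Any]]:
    """Return (suffix, base_encoding) from a `suffix` chunk key encoding configuration.

    Hypothetical helper: member names follow this proposal's text, and an
    omitted base encoding falls back to the `default` chunk key encoding.
    """
    suffix = configuration.get("suffix")
    if not isinstance(suffix, str):
        raise ValueError('"suffix" is required and must be a string')
    base = configuration.get("base-encoding", {"name": "default"})
    if not isinstance(base, dict) or not isinstance(base.get("name"), str):
        raise ValueError('"base-encoding" must be an object with a string "name"')
    return suffix, base


metadata = json.loads("""
{
  "name": "suffix",
  "configuration": {"suffix": ".shard.zip", "base-encoding": {"name": "v2"}}
}
""")
print(parse_suffix_config(metadata["configuration"]))
# ('.shard.zip', {'name': 'v2'})
```

If the snake_case `base_encoding` spelling suggested in review is adopted, only the key string in this sketch changes; making the member required would mean raising an error instead of falling back to `default`.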

#### Example 1: Simple Suffix

This configuration appends `.tiff` to the key generated by the `default` chunk key encoding. For chunk coordinates `(1, 2)`, the resulting key is `c/1/2.tiff`.

```json
{
  "name": "suffix",
  "configuration": {
    "suffix": ".tiff"
  }
}
```
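
To make Example 1 concrete, here is a small sketch of the keys it produces, assuming the Zarr v3 `default` chunk key encoding with its default `/` separator (the key is `c` followed by the separator-joined coordinates). The helper name `default_key` is hypothetical.

```python
def default_key(coords: tuple[int, ...], separator: str = "/") -> str:
    """Zarr v3 `default` chunk key encoding: "c" followed by separator-joined coordinates."""
    # A zero-dimensional array has the empty coordinate tuple, giving the key "c".
    return separator.join(["c", *(str(i) for i in coords)])


suffix = ".tiff"  # the "suffix" member from Example 1
for coords in [(0, 0), (1, 2), ()]:
    print(coords, "->", default_key(coords) + suffix)
# (0, 0) -> c/0/0.tiff
# (1, 2) -> c/1/2.tiff
# () -> c.tiff
```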

#### Example 2: Suffix with a Custom Base Encoding

This configuration first encodes the chunk key using the `v2` encoding (whose separator defaults to `.`) and then appends `.shard.zip`, so chunk `(1, 2)` maps to `1.2.shard.zip`.

```json
{
  "name": "suffix",
  "configuration": {
    "suffix": ".shard.zip",
    "base-encoding": {
      "name": "v2"
    }
  }
}
```
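
A corresponding sketch for Example 2, assuming the Zarr v3 `v2` chunk key encoding: coordinates are joined with the separator (default `.`), and a zero-dimensional array uses the key `0`. The helper name `v2_key` is hypothetical.

```python
def v2_key(coords: tuple[int, ...], separator: str = ".") -> str:
    """Zarr v3 `v2` chunk key encoding: separator-joined coordinates, "0" if zero-dimensional."""
    if not coords:
        return "0"
    return separator.join(str(i) for i in coords)


suffix = ".shard.zip"  # the "suffix" member from Example 2
print(v2_key((1, 2)) + suffix)       # 1.2.shard.zip
print(v2_key((1, 2), "/") + suffix)  # 1/2.shard.zip
```

The resulting key contains several dots, but that is harmless: decoding strips the configured suffix by exact match before parsing the base key, rather than guessing an extension from the last dot.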

---

## Encoding and Decoding Logic

The implementation logic is a simple wrapper around an existing chunk key encoding.

### Encoding

1. Take the chunk coordinate tuple as input (e.g., `(1, 2)`).
2. Encode the coordinates using the specified **`base-encoding`** (or the `default` encoding if not specified). With the `default` encoding and its default `/` separator, `(1, 2)` becomes `"c/1/2"`.
3. Append the `suffix` from the configuration to the result of the base encoding.

The final key is `base_encoded_key + suffix` (e.g., `"c/1/2.tiff"`).
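
The encoding steps above can be sketched as a thin wrapper that dispatches on the base encoding's `name` and passes its optional `configuration` (for example `separator`) through. This is an illustration under assumptions: the registry `BASE_ENCODERS` and the function `encode_suffix_key` are hypothetical, only `default` and `v2` are covered, and the `base-encoding` key name follows this proposal's text.

```python
from typing import Any, Callable

Coords = tuple[int, ...]


def _default(coords: Coords, separator: str = "/") -> str:
    # Zarr v3 `default` encoding: "c" plus separator-joined coordinates.
    return separator.join(["c", *map(str, coords)])


def _v2(coords: Coords, separator: str = ".") -> str:
    # Zarr v3 `v2` encoding: separator-joined coordinates, "0" if zero-dimensional.
    return separator.join(map(str, coords)) if coords else "0"


# Hypothetical registry of base encodings, keyed by their "name" member.
BASE_ENCODERS: dict[str, Callable[..., str]] = {"default": _default, "v2": _v2}


def encode_suffix_key(coords: Coords, configuration: dict[str, Any]) -> str:
    """Steps 1-3: encode with the base encoding, then append the suffix."""
    base = configuration.get("base-encoding", {"name": "default"})
    encoder = BASE_ENCODERS[base["name"]]
    # Forward the base encoding's own configuration, e.g. {"separator": "."}.
    base_key = encoder(coords, **base.get("configuration", {}))
    return base_key + configuration["suffix"]


print(encode_suffix_key((1, 2), {"suffix": ".tiff"}))  # c/1/2.tiff
```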

### Decoding

1. Take the full chunk key string as input (e.g., `"c/1/2.tiff"`).
2. Verify that the key ends with the configured `suffix`. If not, it is an invalid key for this encoding.
3. Remove the `suffix` from the end of the key string to get the base key (e.g., `"c/1/2"`).
4. Decode the remaining base key using the specified **`base-encoding`** (or the `default` encoding if not specified) to retrieve the original chunk coordinate tuple `(1, 2)`.
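
The decoding steps can be sketched the same way. This minimal version takes the base decoder as a parameter and ships only a hypothetical decoder for the `default` encoding; the names `decode_default_key` and `decode_suffix_key` are illustrative. A `v2` decoder would additionally need the array's dimensionality, because the key `0` is used both for a zero-dimensional array and for chunk `(0,)` of a one-dimensional array.

```python
from typing import Callable

Coords = tuple[int, ...]


def decode_default_key(key: str, separator: str = "/") -> Coords:
    """Inverse of the `default` encoding: "c" -> (), "c/1/2" -> (1, 2)."""
    if key == "c":
        return ()
    prefix = "c" + separator
    if not key.startswith(prefix):
        raise ValueError(f"not a default-encoded chunk key: {key!r}")
    return tuple(int(part) for part in key[len(prefix):].split(separator))


def decode_suffix_key(
    key: str, suffix: str, base_decode: Callable[[str], Coords] = decode_default_key
) -> Coords:
    """Steps 1-4: verify the suffix, strip it, then decode the base key."""
    if not key.endswith(suffix):
        raise ValueError(f"chunk key {key!r} does not end with {suffix!r}")
    # endswith() succeeded, so removesuffix() strips exactly one copy of the suffix.
    base_key = key.removesuffix(suffix)
    return base_decode(base_key)


print(decode_suffix_key("c/1/2.tiff", ".tiff"))  # (1, 2)
```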