Commit c635cac

fix: classify Zarr v3 codecs so structural codecs don't hide viewers
The v3 path treated every codecs[].name as a compression codec, so the always-present bytes serialization codec made viewers with a real codec allowlist (e.g. Neuroglancer) falsely incompatible. Add classifyCodec and compare only compression codecs against compression_codecs; ignore structural codecs; warn (not error) on unrecognized codecs.
1 parent ecd670c commit c635cac

7 files changed

Lines changed: 218 additions & 10 deletions

README.md

Lines changed: 3 additions & 1 deletion
@@ -205,7 +205,7 @@ The checks performed, in order:
 | Check | Metadata field | Manifest field | Level | Result if mismatch |
 | --- | --- | --- | --- | --- |
 | OME-Zarr version | `version` or `multiscales[0].version` | `ome_zarr_versions` | **Compatibility** | **Error** — viewer cannot load this version |
-| Compression codec | `compressor.id` | `compression_codecs` | **Compatibility** | **Error** if codec not listed; **Warning** if viewer declares no codecs (unknown support) |
+| Compression codec | `compressor.id` (Zarr v2) or compression codecs in `codecs[]` (Zarr v3) | `compression_codecs` | **Compatibility** | **Error** if codec not listed; **Warning** if viewer declares no codecs (unknown support) |
 | Axes metadata | `axes` | `axes` | **Support** | **Warning** — axis names/units may be ignored |
 | Channel support | `axes` contains c/channel | `channels` | **Support** | **Warning** — multi-channel data may not render correctly |
 | Timepoint support | `axes` contains t/time | `timepoints` | **Support** | **Warning** — time-series data may not render correctly |
@@ -216,6 +216,8 @@ The checks performed, in order:
 | Translation offsets | `multiscales[].datasets[].coordinateTransformations` type `translation` | `translation` | **Support** | **Warning** — coordinate offsets may be ignored |
 | bioformats2raw layout | `bioformats2raw_layout` | `bioformats2raw_layout` | **Support** | **Warning** — layout may not be traversed correctly |

+> **Note on Zarr v3 codecs:** A Zarr v3 array declares an ordered codec pipeline (`codecs[]`) containing array_to_array transforms (e.g. `transpose`), an array_to_bytes serialization codec (`bytes`, `sharding_indexed` — always present), and bytes_to_bytes codecs (compression such as `blosc`/`zstd`, plus checksums such as `crc32c`). Only the *compression* codecs are compared against `compression_codecs`; serialization, transform, and checksum codecs are ignored. An unrecognized codec produces a **Warning** (compatibility unknown) rather than an error, so a novel codec never silently hides a viewer. See `classifyCodec`.
+
 > **Note on `rfcs_supported`:** Although `rfcs_supported` is a hard compatibility requirement (it determines whether a viewer can parse RFC-mandated metadata structures), no validation check is currently implemented. OME-NGFF metadata does not yet expose which RFCs a dataset requires — this is a spec-level gap. When the spec defines a `rfcs_required` field, the validator will compare it against `viewer.capabilities.rfcs_supported` and produce an error on mismatch.

 A viewer is considered **data-compatible** (`dataCompatible: true`) when there are zero errors — it should be shown to the user. `dataFeaturesSupported` is `false` when there are warnings, indicating the viewer can open the data but may not display all features.
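
The last README paragraph describes how the two result flags relate to errors and warnings. A minimal sketch of that relationship, assuming both flags are derived purely from the issue counts; the `Issue` and `Summary` shapes and the `summarize` helper are hypothetical names for illustration, not the project's actual types:

```typescript
// Hypothetical, simplified shapes; the project's real types are not shown in this diff.
interface Issue {
  capability: string;
  message: string;
}

interface Summary {
  dataCompatible: boolean;
  dataFeaturesSupported: boolean;
}

// Assumption: zero errors means compatible, zero warnings means full feature support.
function summarize(errors: Issue[], warnings: Issue[]): Summary {
  return {
    dataCompatible: errors.length === 0,
    dataFeaturesSupported: warnings.length === 0,
  };
}

// An unrecognized codec yields a warning only, so the viewer stays visible.
const unknownCodecWarning: Issue = {
  capability: "compression_codecs",
  message: "Data uses unrecognized codec 'some-future-codec' - compatibility unknown",
};
const onlyWarning = summarize([], [unknownCodecWarning]);
console.log(onlyWarning);
```

Under this assumption, a dataset with an unrecognized codec is still offered to the user, but flagged as possibly not fully supported.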

src/codecs.test.ts

Lines changed: 33 additions & 0 deletions
@@ -0,0 +1,33 @@
+import { describe, it, expect } from 'vitest';
+import { classifyCodec } from './codecs.js';
+
+describe('classifyCodec', () => {
+  describe('compression codecs', () => {
+    it.each(['blosc', 'gzip', 'zstd', 'lz4', 'lzma', 'zlib'])(
+      'classifies %s as compression',
+      (name) => {
+        expect(classifyCodec(name)).toBe('compression');
+      }
+    );
+
+    it('classifies numcodecs-namespaced compressors as compression', () => {
+      expect(classifyCodec('numcodecs.blosc')).toBe('compression');
+      expect(classifyCodec('numcodecs.zstd')).toBe('compression');
+    });
+  });
+
+  describe('structural codecs', () => {
+    it.each(['bytes', 'endian', 'transpose', 'sharding_indexed', 'crc32c'])(
+      'classifies %s as structural',
+      (name) => {
+        expect(classifyCodec(name)).toBe('structural');
+      }
+    );
+  });
+
+  describe('unknown codecs', () => {
+    it('classifies an unrecognized codec name as unknown', () => {
+      expect(classifyCodec('some-future-codec')).toBe('unknown');
+    });
+  });
+});

src/codecs.ts

Lines changed: 77 additions & 0 deletions
@@ -0,0 +1,77 @@
+/**
+ * Zarr codec classification, dependency-free so it stays portable alongside the
+ * canonical OME-Zarr metadata types in `omezarr.ts`.
+ *
+ * A Zarr v3 array declares an ordered codec pipeline (`codecs[]`) made up of
+ * three transform kinds (per the Zarr v3 spec):
+ *   - array_to_array   e.g. transpose
+ *   - array_to_bytes   serialization; exactly one required, e.g. bytes,
+ *                      sharding_indexed
+ *   - bytes_to_bytes   e.g. blosc, gzip, zstd (compression) and crc32c (checksum)
+ *
+ * Only *compression* codecs determine whether a viewer can read the data: a
+ * viewer that cannot decompress the bytes cannot open the dataset. Serialization
+ * codecs, transposes, and checksums are not compression and must not be compared
+ * against a viewer's declared `compression_codecs` (doing so is the original
+ * false-incompatible bug — every v3 array carries a `bytes` serialization codec).
+ *
+ * Zarr v2 is simpler: a single `compressor.id` that is always an actual
+ * compression codec, so it does not need classification.
+ */
+
+/** Classification of a codec for compression-compatibility purposes. */
+export type CodecCompression = "compression" | "structural" | "unknown";
+
+/**
+ * Codecs that compress data. A viewer must be able to decompress these to read
+ * the dataset, so they are compared against the viewer's `compression_codecs`.
+ * Both the bare Zarr v3 names and the `numcodecs.*`-namespaced variants emitted
+ * by some writers are recognized.
+ */
+const COMPRESSION_CODECS = new Set([
+  "blosc",
+  "gzip",
+  "zstd",
+  "lz4",
+  "lzma",
+  "zlib",
+  "numcodecs.blosc",
+  "numcodecs.gzip",
+  "numcodecs.zstd",
+  "numcodecs.lz4",
+  "numcodecs.lzma",
+  "numcodecs.zlib",
+]);
+
+/**
+ * Codecs that do not compress data: array_to_array transforms (transpose,
+ * bitround), array_to_bytes serialization codecs (bytes, sharding_indexed,
+ * vlen-utf8, json2), and bytes_to_bytes checksums (crc32c). A viewer does not
+ * need to declare support for these to read compressed data, so they are not
+ * compared against `compression_codecs`. Kept in sync with zarrita's codec
+ * registry by codecs.zarrita.test.ts.
+ */
+const STRUCTURAL_CODECS = new Set([
+  "bytes",
+  "endian", // legacy name for the array_to_bytes codec
+  "transpose",
+  "sharding_indexed",
+  "crc32c",
+  "vlen-utf8",
+  "json2",
+  "bitround",
+]);
+
+/**
+ * Classify a Zarr codec by its relevance to compression compatibility.
+ *
+ * Returns `"unknown"` for any codec not in the static registries above, so the
+ * caller can surface uncertainty (a warning) rather than guess — guessing
+ * "compression" re-introduces the false-incompatible bug for novel serialization
+ * codecs, while guessing "structural" silently hides genuine incompatibility.
+ */
+export function classifyCodec(name: string): CodecCompression {
+  if (COMPRESSION_CODECS.has(name)) return "compression";
+  if (STRUCTURAL_CODECS.has(name)) return "structural";
+  return "unknown";
+}
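
To see what the classification does to a whole pipeline, here is a small self-contained sketch. The `classify` function is an abbreviated stand-in for `classifyCodec` (shorter registries, same three-way logic) so the snippet runs on its own; `partitionPipeline` and the example pipeline are hypothetical and not part of this commit:

```typescript
type CodecKind = "compression" | "structural" | "unknown";

// Abbreviated stand-in for classifyCodec, with trimmed registries.
const COMPRESSION = new Set(["blosc", "gzip", "zstd", "lz4", "lzma", "zlib"]);
const STRUCTURAL = new Set(["bytes", "endian", "transpose", "sharding_indexed", "crc32c"]);

function classify(name: string): CodecKind {
  if (COMPRESSION.has(name)) return "compression";
  if (STRUCTURAL.has(name)) return "structural";
  return "unknown";
}

// Hypothetical helper: group the names in a codecs[] pipeline by kind, keeping order.
function partitionPipeline(codecs: { name: string }[]): Record<CodecKind, string[]> {
  const groups: Record<CodecKind, string[]> = {
    compression: [],
    structural: [],
    unknown: [],
  };
  for (const codec of codecs) {
    groups[classify(codec.name)].push(codec.name);
  }
  return groups;
}

// Made-up pipeline: a transform, the serializer, a compressor, a checksum, and a novel codec.
const pipeline = [
  { name: "transpose" },
  { name: "bytes" },
  { name: "zstd" },
  { name: "crc32c" },
  { name: "some-future-codec" },
];
const groups = partitionPipeline(pipeline);
console.log(groups);
```

Only the `compression` group would be compared against a viewer's `compression_codecs`; the `unknown` group is what the validator turns into warnings.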

src/index.ts

Lines changed: 4 additions & 0 deletions
@@ -65,6 +65,10 @@ export { validateViewer, isCompatible } from "./validator.js";
 // Re-export logo utility
 export { getLogoUrl } from "./logo.js";

+// Re-export codec classification
+export { classifyCodec } from "./codecs.js";
+export type { CodecCompression } from "./codecs.js";
+
 // Re-export types for consumers
 export type {
   ViewerManifest,

src/validator.test.ts

Lines changed: 63 additions & 0 deletions
@@ -207,6 +207,69 @@ describe('validateViewer', () => {
     expect(result.warnings[0].message).toContain('compatibility unknown');
   });

+  it('is compatible when v3 data uses only a structural serialization codec', () => {
+    // Regression: every Zarr v3 array carries a `bytes` (array_to_bytes)
+    // serialization codec. It must not be compared against the viewer's
+    // compression codec list, or every real v3 dataset hides such viewers.
+    const viewer = createViewer({
+      ome_zarr_versions: [0.4, 0.5],
+      compression_codecs: ['blosc', 'zstd', 'zlib', 'lz4', 'gzip']
+    });
+    const metadata = createMetadata({
+      version: '0.5',
+      codecs: [{ name: 'bytes' }]
+    });
+
+    const result = validateViewer(viewer, metadata);
+
+    expect(result.dataCompatible).toBe(true);
+    expect(result.errors).toHaveLength(0);
+    expect(result.warnings.filter(w => w.capability === 'compression_codecs')).toHaveLength(0);
+  });
+
+  it('ignores structural codecs when checking a supported compression codec', () => {
+    const viewer = createViewer({ ome_zarr_versions: [0.4, 0.5], compression_codecs: ['blosc'] });
+    const metadata = createMetadata({
+      version: '0.5',
+      codecs: [{ name: 'transpose' }, { name: 'bytes' }, { name: 'blosc' }, { name: 'crc32c' }]
+    });
+
+    const result = validateViewer(viewer, metadata);
+
+    expect(result.dataCompatible).toBe(true);
+    expect(result.errors).toHaveLength(0);
+  });
+
+  it('warns but stays compatible when data uses an unrecognized codec the viewer does not list', () => {
+    const viewer = createViewer({ ome_zarr_versions: [0.4, 0.5], compression_codecs: ['blosc'] });
+    const metadata = createMetadata({
+      version: '0.5',
+      codecs: [{ name: 'bytes' }, { name: 'some-future-codec' }]
+    });
+
+    const result = validateViewer(viewer, metadata);
+
+    expect(result.dataCompatible).toBe(true);
+    expect(result.errors).toHaveLength(0);
+    expect(result.warnings).toHaveLength(1);
+    expect(result.warnings[0].capability).toBe('compression_codecs');
+    expect(result.warnings[0].message).toContain('some-future-codec');
+    expect(result.warnings[0].message).toContain('compatibility unknown');
+  });
+
+  it('does not warn for an unrecognized codec the viewer explicitly lists', () => {
+    const viewer = createViewer({ ome_zarr_versions: [0.4, 0.5], compression_codecs: ['blosc', 'some-future-codec'] });
+    const metadata = createMetadata({
+      version: '0.5',
+      codecs: [{ name: 'bytes' }, { name: 'some-future-codec' }]
+    });
+
+    const result = validateViewer(viewer, metadata);
+
+    expect(result.dataCompatible).toBe(true);
+    expect(result.warnings.filter(w => w.capability === 'compression_codecs')).toHaveLength(0);
+  });
+
   it('returns warning when viewer has empty codec list but data uses compression', () => {
     const viewer = createViewer({ compression_codecs: [] });
     const metadata = createMetadata({ compressor: { id: 'blosc' } });

src/validator.ts

Lines changed: 36 additions & 8 deletions
@@ -6,6 +6,7 @@ import type {
   ValidationError,
   ValidationWarning,
 } from "./types.js";
+import { classifyCodec } from "./codecs.js";

 function hasTransformationType(
   multiscales: MultiscaleMetadata[],
@@ -75,27 +76,42 @@ export function validateViewer(
     });
   }

-  // Collect codecs from Zarr v2 (compressor.id) or Zarr v3 (codecs[].name)
-  const dataCodecs: string[] = [];
+  // Collect the codecs the viewer must be able to decompress. Zarr v2 exposes a
+  // single compressor via compressor.id (always an actual compression codec);
+  // Zarr v3 exposes an ordered pipeline via codecs[] that also contains
+  // serialization codecs, transforms, and checksums, which must NOT be compared
+  // against the viewer's compression codec list. See classifyCodec().
+  const compressionCodecs: string[] = [];
+  const unknownCodecs: string[] = [];
   if (metadata.compressor?.id) {
-    dataCodecs.push(metadata.compressor.id);
-  } else if (metadata.codecs && metadata.codecs.length > 0) {
-    dataCodecs.push(...metadata.codecs.map((c) => c.name));
+    compressionCodecs.push(metadata.compressor.id);
+  } else if (metadata.codecs) {
+    for (const codec of metadata.codecs) {
+      switch (classifyCodec(codec.name)) {
+        case "compression":
+          compressionCodecs.push(codec.name);
+          break;
+        case "unknown":
+          unknownCodecs.push(codec.name);
+          break;
+        // "structural" codecs are not relevant to compression compatibility.
+      }
+    }
   }

-  if (dataCodecs.length > 0) {
+  if (compressionCodecs.length > 0) {
     if (
       !viewer.capabilities.compression_codecs ||
       viewer.capabilities.compression_codecs.length === 0
     ) {
       // Viewer doesn't declare codec support - can't guarantee compatibility
-      const codecList = dataCodecs.join("', '");
+      const codecList = compressionCodecs.join("', '");
       warnings.push({
         capability: "compression_codecs",
         message: `Data uses codec '${codecList}' but viewer doesn't declare codec support - compatibility unknown`,
       });
     } else {
-      for (const codec of dataCodecs) {
+      for (const codec of compressionCodecs) {
         if (!viewer.capabilities.compression_codecs.includes(codec)) {
           errors.push({
             capability: "compression_codecs",
@@ -108,6 +124,18 @@ export function validateViewer(
     }
   }

+  // Codecs we cannot classify can be neither confirmed compatible nor ruled
+  // incompatible. Warn (rather than error) unless the viewer explicitly lists
+  // the codec, so a novel codec never silently hides a viewer or passes as fine.
+  for (const codec of unknownCodecs) {
+    if (!viewer.capabilities.compression_codecs?.includes(codec)) {
+      warnings.push({
+        capability: "compression_codecs",
+        message: `Data uses unrecognized codec '${codec}' - compatibility unknown`,
+      });
+    }
+  }
+
   // TODO: Check rfcs_supported (hard compatibility requirement)
   // Blocker: OME-NGFF metadata does not expose which RFCs a dataset requires.
   // After determing how to implement this check, compare metadata.rfcs_required against
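
The validator hunks above amount to a two-step check: compression codecs can produce errors, unknown codecs can only produce warnings. A standalone sketch of that decision logic, assuming the codec names have already been split into the two lists; `checkCodecs` and `Issue` are hypothetical names, and the error message text is an assumption because the diff elides the real one:

```typescript
interface Issue {
  capability: string;
  message: string;
}

// Hypothetical helper mirroring the two-step check in the validator hunks.
function checkCodecs(
  viewerCodecs: string[] | undefined,
  compression: string[],
  unknown: string[],
): { errors: Issue[]; warnings: Issue[] } {
  const errors: Issue[] = [];
  const warnings: Issue[] = [];
  const capability = "compression_codecs";

  if (compression.length > 0) {
    if (!viewerCodecs || viewerCodecs.length === 0) {
      // The viewer declares nothing, so support is unknown: warn once for the whole list.
      const codecList = compression.join("', '");
      warnings.push({
        capability,
        message: `Data uses codec '${codecList}' but viewer doesn't declare codec support - compatibility unknown`,
      });
    } else {
      for (const codec of compression) {
        if (!viewerCodecs.includes(codec)) {
          // Assumed wording; the actual message is not visible in the diff.
          errors.push({ capability, message: `Viewer does not list codec '${codec}'` });
        }
      }
    }
  }

  // Unclassified codecs warn unless the viewer explicitly lists them.
  for (const codec of unknown) {
    if (!viewerCodecs?.includes(codec)) {
      warnings.push({
        capability,
        message: `Data uses unrecognized codec '${codec}' - compatibility unknown`,
      });
    }
  }

  return { errors, warnings };
}

// A `bytes`-only pipeline contributes nothing to either list, so nothing is reported.
const bytesOnly = checkCodecs(["blosc"], [], []);
// A novel codec the viewer does not list: one warning, no error.
const novel = checkCodecs(["blosc"], [], ["some-future-codec"]);
// A known compressor the viewer does not list: one error.
const unsupported = checkCodecs(["blosc"], ["zstd"], []);
console.log(bytesOnly, novel, unsupported);
```

The three calls correspond roughly to the regression tests added in `src/validator.test.ts`: a structural-only pipeline passes cleanly, a novel codec warns, and a missing compressor errors.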

tsconfig.lib.json

Lines changed: 2 additions & 1 deletion
@@ -13,7 +13,8 @@
     "src/validator.ts",
     "src/logo.ts",
     "src/types.ts",
-    "src/omezarr.ts"
+    "src/omezarr.ts",
+    "src/codecs.ts"
   ],
   "exclude": ["src/app.ts"]
 }
