
Commit c79425e

d-v-b and claude authored
deprecate blosc enums (zarr-developers#3963)
* docs(spec): deprecate BloscShuffle and BloscCname enums

  Design for steering BloscCodec users toward literal-string parameters, with the enum classes kept importable but deprecated on member access.

  Canonical: https://gist.github.com/d-v-b/9fd3fe92f82a24c929129f42a6f11f60

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(codecs): deprecate BloscShuffle/BloscCname enums

  Member access on BloscShuffle / BloscCname now emits DeprecationWarning and returns the equivalent string. BloscCodec stores cname/shuffle as literal strings; passing a real enum.Enum instance to __init__ warns. BloscShuffle.from_int returns a str. Internal call sites in migrate_v3 continue to work because BloscCodec accepts both forms.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: use literal strings for BloscCodec shuffle parameter

  The BloscShuffle and BloscCname enums are deprecated; update doc examples to the recommended literal-string form.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(changes): add changelog entry for blosc enum deprecation

  The 0000 filename is a placeholder; rename to the PR number when the pull request is opened.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: drop local copy of blosc enum deprecation spec

  The design doc is published as a public gist linked from the PR description; the in-tree copy is no longer needed.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* style(codecs): use plain backticks in blosc docstrings

  Replace Sphinx :class: roles and double-backticks with single backticks in the new docstrings added by the blosc enum deprecation.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(codecs): cover blosc error branches

  Add tests for the ValueError paths in _parse_cname / _parse_shuffle and the AttributeError path in the deprecated-enum metaclass, which codecov reported as uncovered.

  Drop a few "type: ignore" markers that mypy now flags as unused after the init signature widened.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(codecs): use typing.cast instead of type-ignore in blosc tests

  mypy's per-file (pre-commit) and project-wide views disagree on whether the deliberately-wrong arguments need a type ignore. Using typing.cast keeps both views happy and is more explicit about what each test is asserting.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(codecs): rename blosc literal aliases to Blosc*Literal

  Rename Shuffle -> BloscShuffleLiteral and CName -> BloscCnameLiteral. The bare Shuffle name collided with numcodecs.Shuffle re-exported from zarr.codecs, which would have caused mkdocstrings cross-refs in the BloscCodec docstring to resolve to the wrong symbol. The Literal suffix also clearly distinguishes the type alias from the deprecated BloscShuffle / BloscCname enum classes.

  Update the BloscCodec docstring to reference the new names in the Attributes and Parameters sections (Convention A from cast_value.py), with literal values enumerated in prose.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: rename blosc constants

* test(codecs): address review feedback on blosc deprecation tests

  - Parametrize the BloscShuffle / BloscCname member-access warning tests into a single test_blosc_enum_member_access_warns.
  - Parametrize the cname / shuffle reject-unknown tests into a single test_blosc_codec_rejects_unknown driven by **kwargs.
  - Parametrize the AttributeError-on-unknown-member tests into a single test_blosc_enum_attribute_error_for_unknown_member.
  - Add a docstring to every new test explaining what behavior it verifies, so reviewers don't have to read the body to understand the intent.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(codecs): parametrize blosc cname/shuffle coverage and JSON roundtrip

  Backfill missing coverage: previously every test in the blosc suite used only "lz4" or "zstd" for cname and only "bitshuffle" or "shuffle" for shuffle. Add four parametrized tests driven by BLOSC_CNAME / BLOSC_SHUFFLE:

  - accepts_all_cnames / accepts_all_shuffles: every value in the runtime tuple is accepted by BloscCodec and round-trips on the stored attribute. Catches drift between the BloscCnameLiteral / BloscShuffleLiteral type aliases and their runtime BLOSC_* counterparts.
  - json_roundtrip_all_cnames / json_roundtrip_all_shuffles: BloscCodec to_dict / from_dict preserves every value. Codec fields are fully specified so the test doesn't trip over tunable-attribute state, which is not part of the JSON form.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(codecs): collapse blosc JSON roundtrip tests into one parametrize

  Replace the cname/shuffle JSON-roundtrip pair with a single parametrized test driven by [("cname", v) for v in BLOSC_CNAME] + [("shuffle", v) for v in BLOSC_SHUFFLE]. They asserted the same property (to_dict / from_dict preserves every literal) on two independent axes, so a single test covers both with clear per-case IDs (e.g. cname-lz4, shuffle-bitshuffle).

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(codecs): cross-product blosc JSON roundtrip over cname x shuffle

  Stacked parametrize over BLOSC_CNAME and BLOSC_SHUFFLE so the JSON roundtrip exercises every (cname, shuffle) pair (18 cases instead of 9 in a disjoint union). Drops the **kwargs/dict[str, Any] indirection the disjoint form needed, since the cross-product form passes typed arguments directly.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Rename 0000.removal.md to 3963.removal.md

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent ad90884 commit c79425e
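
The test-related bullets in the commit message mention two ideas that are easy to check in isolation: guarding against drift between the `Literal` aliases and their runtime tuples, and the case counts of a disjoint union versus a cross product. The following is a minimal standalone sketch, not the project's actual test code. The names `disjoint_cases` and `cross_cases` are hypothetical, and the alias and tuple values are copied from the diff so the snippet runs without zarr installed.

```python
from itertools import product
from typing import Literal, get_args

# Values copied from the src/zarr/codecs/blosc.py diff in this commit.
BloscCnameLiteral = Literal["lz4", "lz4hc", "blosclz", "snappy", "zlib", "zstd"]
BloscShuffleLiteral = Literal["noshuffle", "shuffle", "bitshuffle"]
BLOSC_CNAME = ("lz4", "lz4hc", "blosclz", "snappy", "zlib", "zstd")
BLOSC_SHUFFLE = ("noshuffle", "shuffle", "bitshuffle")

# Drift check: the type alias and the runtime tuple should list the same values.
assert get_args(BloscCnameLiteral) == BLOSC_CNAME
assert get_args(BloscShuffleLiteral) == BLOSC_SHUFFLE

# Disjoint union: vary one parameter at a time (hypothetical name).
disjoint_cases = [("cname", v) for v in BLOSC_CNAME] + [("shuffle", v) for v in BLOSC_SHUFFLE]

# Cross product: every (cname, shuffle) pair (hypothetical name).
cross_cases = list(product(BLOSC_CNAME, BLOSC_SHUFFLE))

print(len(disjoint_cases), len(cross_cases))
```

With six compressor names and three shuffle modes, the disjoint form yields 6 + 3 = 9 cases and the cross product yields 6 x 3 = 18, matching the counts quoted in the last test bullet.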

5 files changed

Lines changed: 276 additions & 74 deletions


changes/3963.removal.md

Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
+The ``BloscShuffle`` and ``BloscCname`` enums (``zarr.codecs.BloscShuffle``,
+``zarr.codecs.BloscCname``) are now deprecated. Pass the equivalent literal
+string (e.g. ``"zstd"``, ``"bitshuffle"``) when constructing a ``BloscCodec``.
+The enum classes remain importable but emit ``DeprecationWarning`` on member
+access, and will be removed in a future release. ``BloscCodec.cname`` and
+``BloscCodec.shuffle`` are now plain strings rather than enum members.
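
A downstream project may want to confirm that it no longer touches the deprecated members described in this changelog entry. The sketch below shows one possible approach using only the standard `warnings` module. `legacy_member_access` is a hypothetical stand-in that mimics what `BloscShuffle.bitshuffle` does after this commit, and `uses_deprecated_api` is likewise an illustrative helper, not part of zarr.

```python
import warnings
from typing import Callable


def legacy_member_access() -> str:
    """Hypothetical stand-in for `BloscShuffle.bitshuffle` after this commit."""
    warnings.warn(
        "BloscShuffle.bitshuffle is deprecated; pass the string 'bitshuffle' instead.",
        DeprecationWarning,
        stacklevel=2,
    )
    return "bitshuffle"


def uses_deprecated_api(func: Callable[[], object]) -> bool:
    """Return True if calling `func` emits at least one DeprecationWarning."""
    with warnings.catch_warnings(record=True) as caught:
        # "always" ensures the warning is recorded even if it was shown before.
        warnings.simplefilter("always")
        func()
    return any(issubclass(w.category, DeprecationWarning) for w in caught)


old_style = uses_deprecated_api(legacy_member_access)
new_style = uses_deprecated_api(lambda: "bitshuffle")
print(old_style, new_style)
```

In a real test suite the same effect can usually be had by escalating the warning to an error, for example by running Python or pytest with `-W error::DeprecationWarning`.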

docs/quick-start.md

Lines changed: 1 addition & 1 deletion
@@ -58,7 +58,7 @@ z = zarr.create_array(
     compressors=zarr.codecs.BloscCodec(
         cname="zstd",
         clevel=3,
-        shuffle=zarr.codecs.BloscShuffle.shuffle
+        shuffle="shuffle"
     )
 )

docs/user-guide/arrays.md

Lines changed: 2 additions & 2 deletions
@@ -201,7 +201,7 @@ Different compressors can be provided via the `compressors` keyword
 argument accepted by all array creation functions. For example:
 
 ```python exec="true" session="arrays" source="above" result="ansi"
-compressors = zarr.codecs.BloscCodec(cname='zstd', clevel=3, shuffle=zarr.codecs.BloscShuffle.bitshuffle)
+compressors = zarr.codecs.BloscCodec(cname='zstd', clevel=3, shuffle='bitshuffle')
 data = np.arange(100000000, dtype='int32').reshape(10000, 10000)
 z = zarr.create_array(store='data/example-5.zarr', shape=data.shape, dtype=data.dtype, chunks=(1000, 1000), compressors=compressors)
 z[:] = data
@@ -298,7 +298,7 @@ Here is an example using a delta filter with the Blosc compressor:
 from zarr.codecs.numcodecs import Delta
 
 filters = [Delta(dtype='int32')]
-compressors = zarr.codecs.BloscCodec(cname='zstd', clevel=1, shuffle=zarr.codecs.BloscShuffle.shuffle)
+compressors = zarr.codecs.BloscCodec(cname='zstd', clevel=1, shuffle='shuffle')
 data = np.arange(100000000, dtype='int32').reshape(10000, 10000)
 z = zarr.create_array(store='data/example-9.zarr', shape=data.shape, dtype=data.dtype, chunks=(1000, 1000), filters=filters, compressors=compressors)
 print(z.info_complete())

src/zarr/codecs/blosc.py

Lines changed: 119 additions & 61 deletions
@@ -1,18 +1,19 @@
 from __future__ import annotations
 
 import asyncio
+import warnings
 from dataclasses import dataclass, field, replace
 from enum import Enum
 from functools import cached_property
-from typing import TYPE_CHECKING, Final, Literal, NotRequired, TypedDict
+from typing import TYPE_CHECKING, ClassVar, Final, Literal, NotRequired, TypedDict
 
 import numcodecs
 from numcodecs.blosc import Blosc
 from packaging.version import Version
 
 from zarr.abc.codec import BytesBytesCodec
 from zarr.core.buffer.cpu import as_numpy_array_wrapper
-from zarr.core.common import JSON, NamedRequiredConfig, parse_enum, parse_named_configuration
+from zarr.core.common import JSON, NamedRequiredConfig, parse_named_configuration
 from zarr.core.dtype.common import HasItemSize
 
 if TYPE_CHECKING:
@@ -21,19 +22,21 @@
     from zarr.core.array_spec import ArraySpec
     from zarr.core.buffer import Buffer
 
-Shuffle = Literal["noshuffle", "shuffle", "bitshuffle"]
+BloscShuffleLiteral = Literal["noshuffle", "shuffle", "bitshuffle"]
 """The shuffle values permitted for the blosc codec"""
 
-SHUFFLE: Final = ("noshuffle", "shuffle", "bitshuffle")
+BLOSC_SHUFFLE: Final = ("noshuffle", "shuffle", "bitshuffle")
 
-CName = Literal["lz4", "lz4hc", "blosclz", "snappy", "zlib", "zstd"]
-"""The codec identifiers used in the blosc codec """
+BloscCnameLiteral = Literal["lz4", "lz4hc", "blosclz", "snappy", "zlib", "zstd"]
+"""The codec identifiers used in the blosc codec"""
+
+BLOSC_CNAME: Final = ("lz4", "lz4hc", "blosclz", "snappy", "zlib", "zstd")
 
 
 class BloscConfigV2(TypedDict):
     """Configuration for the V2 Blosc codec"""
 
-    cname: CName
+    cname: BloscCnameLiteral
     clevel: int
     shuffle: int
     blocksize: int
@@ -43,9 +46,9 @@ class BloscConfigV2(TypedDict):
 class BloscConfigV3(TypedDict):
     """Configuration for the V3 Blosc codec"""
 
-    cname: CName
+    cname: BloscCnameLiteral
     clevel: int
-    shuffle: Shuffle
+    shuffle: BloscShuffleLiteral
     blocksize: int
     typesize: int

@@ -56,38 +59,66 @@ class BloscJSON_V3(NamedRequiredConfig[Literal["blosc"], BloscConfigV3]):
     """
 
 
-class BloscShuffle(Enum):
+class _DeprecatedStrEnumMeta(type):
     """
-    Enum for shuffle filter used by blosc.
+    Metaclass for the legacy `BloscShuffle` / `BloscCname` classes. Accessing
+    a member name (e.g. `BloscShuffle.bitshuffle`) emits a `DeprecationWarning`
+    and returns the equivalent string.
     """
 
-    noshuffle = "noshuffle"
-    shuffle = "shuffle"
-    bitshuffle = "bitshuffle"
+    _members: dict[str, str]
 
-    @classmethod
-    def from_int(cls, num: int) -> BloscShuffle:
-        blosc_shuffle_int_to_str = {
+    def __getattr__(cls, name: str) -> str:
+        members: dict[str, str] = type.__getattribute__(cls, "_members")
+        if name in members:
+            warnings.warn(
+                f"{cls.__name__}.{name} is deprecated; pass the string {members[name]!r} instead.",
+                DeprecationWarning,
+                stacklevel=2,
+            )
+            return members[name]
+        raise AttributeError(name)
+
+
+class BloscShuffle(metaclass=_DeprecatedStrEnumMeta):
+    """
+    Deprecated. Pass a literal string (`"noshuffle"`, `"shuffle"`, or
+    `"bitshuffle"`) directly to `BloscCodec` instead.
+    """
+
+    _members: ClassVar[dict[str, str]] = {
+        "noshuffle": "noshuffle",
+        "shuffle": "shuffle",
+        "bitshuffle": "bitshuffle",
+    }
+
+    @staticmethod
+    def from_int(num: int) -> BloscShuffleLiteral:
+        mapping: dict[int, BloscShuffleLiteral] = {
             0: "noshuffle",
             1: "shuffle",
             2: "bitshuffle",
         }
-        if num not in blosc_shuffle_int_to_str:
+        if num not in mapping:
             raise ValueError(f"Value must be between 0 and 2. Got {num}.")
-        return BloscShuffle[blosc_shuffle_int_to_str[num]]
+        return mapping[num]
 
 
-class BloscCname(Enum):
+class BloscCname(metaclass=_DeprecatedStrEnumMeta):
     """
-    Enum for compression library used by blosc.
+    Deprecated. Pass a literal string (one of `"lz4"`, `"lz4hc"`,
+    `"blosclz"`, `"snappy"`, `"zlib"`, `"zstd"`) directly to
+    `BloscCodec` instead.
     """
 
-    lz4 = "lz4"
-    lz4hc = "lz4hc"
-    blosclz = "blosclz"
-    zstd = "zstd"
-    snappy = "snappy"
-    zlib = "zlib"
+    _members: ClassVar[dict[str, str]] = {
+        "lz4": "lz4",
+        "lz4hc": "lz4hc",
+        "blosclz": "blosclz",
+        "snappy": "snappy",
+        "zstd": "zstd",
+        "zlib": "zlib",
+    }
 
 
 # See https://zarr.readthedocs.io/en/stable/user-guide/performance.html#configuring-blosc
@@ -118,6 +149,34 @@ def parse_blocksize(data: JSON) -> int:
     raise TypeError(f"Value should be an int. Got {type(data)} instead.")
 
 
+def _coerce_enum_input(value: object, param_name: str) -> object:
+    """
+    If `value` is a real `enum.Enum` instance, emit a deprecation warning
+    and return `value.value`. Otherwise return `value` unchanged.
+    """
+    if isinstance(value, Enum):
+        warnings.warn(
+            f"Passing an enum to BloscCodec(..., {param_name}=...) is deprecated; "
+            "pass the equivalent literal string instead.",
+            DeprecationWarning,
+            stacklevel=3,
+        )
+        return value.value
+    return value
+
+
+def _parse_cname(data: object) -> BloscCnameLiteral:
+    if isinstance(data, str) and data in BLOSC_CNAME:
+        return data  # type: ignore[return-value]
+    raise ValueError(f"cname must be one of {list(BLOSC_CNAME)!r}. Got {data!r}.")
+
+
+def _parse_shuffle(data: object) -> BloscShuffleLiteral:
+    if isinstance(data, str) and data in BLOSC_SHUFFLE:
+        return data  # type: ignore[return-value]
+    raise ValueError(f"shuffle must be one of {list(BLOSC_SHUFFLE)!r}. Got {data!r}.")
+
+
 @dataclass(frozen=True)
 class BloscCodec(BytesBytesCodec):
     """
@dataclass(frozen=True)
122181
class BloscCodec(BytesBytesCodec):
123182
"""
@@ -133,12 +192,14 @@ class BloscCodec(BytesBytesCodec):
         Always False for Blosc codec, as compression produces variable-sized output.
     typesize : int
         The data type size in bytes used for shuffle filtering.
-    cname : BloscCname
-        The compression algorithm being used (lz4, lz4hc, blosclz, snappy, zlib, or zstd).
+    cname : BloscCnameLiteral
+        The compression algorithm being used; one of "lz4", "lz4hc",
+        "blosclz", "snappy", "zlib", or "zstd".
     clevel : int
         The compression level (0-9).
-    shuffle : BloscShuffle
-        The shuffle filter mode (noshuffle, shuffle, or bitshuffle).
+    shuffle : BloscShuffleLiteral
+        The shuffle filter mode; one of "noshuffle", "shuffle", or
+        "bitshuffle".
     blocksize : int
         The size of compressed blocks in bytes (0 for automatic).
 
@@ -148,13 +209,16 @@ class BloscCodec(BytesBytesCodec):
         The data type size in bytes. This affects how the shuffle filter processes
         the data. If None, defaults to 1 and the attribute is marked as tunable.
         Default: 1.
-    cname : BloscCname or {'lz4', 'lz4hc', 'blosclz', 'snappy', 'zlib', 'zstd'}, optional
-        The compression algorithm to use. Default: 'zstd'.
+    cname : BloscCnameLiteral, optional
+        The compression algorithm to use; one of "lz4", "lz4hc", "blosclz",
+        "snappy", "zlib", or "zstd". Default is "zstd". Passing a `BloscCname`
+        enum is deprecated.
     clevel : int, optional
         The compression level, from 0 (no compression) to 9 (maximum compression).
         Higher values provide better compression at the cost of speed. Default: 5.
-    shuffle : BloscShuffle or {'noshuffle', 'shuffle', 'bitshuffle'}, optional
-        The shuffle filter to apply before compression:
+    shuffle : BloscShuffleLiteral or None, optional
+        The shuffle filter to apply before compression; one of "noshuffle",
+        "shuffle", or "bitshuffle":
 
         - 'noshuffle': No shuffling
         - 'shuffle': Byte shuffling (better for typesize > 1)
@@ -183,57 +247,51 @@ class BloscCodec(BytesBytesCodec):
     >>> codec.typesize
     1
     >>> codec.shuffle
-    <BloscShuffle.bitshuffle: 'bitshuffle'>
+    'bitshuffle'
 
     Create a codec with specific compression settings:
 
     >>> codec = BloscCodec(cname='zstd', clevel=9, shuffle='shuffle')
     >>> codec.cname
-    <BloscCname.zstd: 'zstd'>
-
-    See Also
-    --------
-    BloscShuffle : Enum for shuffle filter options
-    BloscCname : Enum for compression algorithm options
+    'zstd'
     """
 
     # This attribute tracks parameters were set to None at init time, and thus tunable
     _tunable_attrs: set[Literal["typesize", "shuffle"]] = field(init=False)
     is_fixed_size = False
 
     typesize: int
-    cname: BloscCname
+    cname: BloscCnameLiteral
     clevel: int
-    shuffle: BloscShuffle
+    shuffle: BloscShuffleLiteral
     blocksize: int
 
     def __init__(
         self,
         *,
         typesize: int | None = None,
-        cname: BloscCname | CName = BloscCname.zstd,
+        cname: BloscCname | BloscCnameLiteral = "zstd",
         clevel: int = 5,
-        shuffle: BloscShuffle | Shuffle | None = None,
+        shuffle: BloscShuffle | BloscShuffleLiteral | None = None,
         blocksize: int = 0,
     ) -> None:
         object.__setattr__(self, "_tunable_attrs", set())
 
-        # If typesize was set to None, replace it with a valid typesize
-        # and flag the typesize attribute as safe to replace later
         if typesize is None:
             typesize = 1
             self._tunable_attrs.update({"typesize"})
 
-        # If shuffle was set to None, replace it with a valid shuffle
-        # and flag the shuffle attribute as safe to replace later
         if shuffle is None:
-            shuffle = BloscShuffle.bitshuffle
+            shuffle = "bitshuffle"
             self._tunable_attrs.update({"shuffle"})
 
+        cname = _coerce_enum_input(cname, "cname")  # type: ignore[assignment]
+        shuffle = _coerce_enum_input(shuffle, "shuffle")  # type: ignore[assignment]
+
         typesize_parsed = parse_typesize(typesize)
-        cname_parsed = parse_enum(cname, BloscCname)
+        cname_parsed = _parse_cname(cname)
         clevel_parsed = parse_clevel(clevel)
-        shuffle_parsed = parse_enum(shuffle, BloscShuffle)
+        shuffle_parsed = _parse_shuffle(shuffle)
         blocksize_parsed = parse_blocksize(blocksize)
 
         object.__setattr__(self, "typesize", typesize_parsed)
@@ -252,9 +310,9 @@ def to_dict(self) -> dict[str, JSON]:
             "name": "blosc",
             "configuration": {
                 "typesize": self.typesize,
-                "cname": self.cname.value,
+                "cname": self.cname,
                 "clevel": self.clevel,
-                "shuffle": self.shuffle.value,
+                "shuffle": self.shuffle,
                 "blocksize": self.blocksize,
             },
         }
@@ -276,20 +334,20 @@ def evolve_from_array_spec(self, array_spec: ArraySpec) -> Self:
         if "shuffle" in self._tunable_attrs:
             new_codec = replace(
                 new_codec,
-                shuffle=(BloscShuffle.bitshuffle if item_size == 1 else BloscShuffle.shuffle),
+                shuffle=("bitshuffle" if item_size == 1 else "shuffle"),
             )
 
         return new_codec
 
     @cached_property
     def _blosc_codec(self) -> Blosc:
-        map_shuffle_str_to_int = {
-            BloscShuffle.noshuffle: 0,
-            BloscShuffle.shuffle: 1,
-            BloscShuffle.bitshuffle: 2,
+        map_shuffle_str_to_int: dict[BloscShuffleLiteral, int] = {
+            "noshuffle": 0,
+            "shuffle": 1,
+            "bitshuffle": 2,
         }
         config_dict: BloscConfigV2 = {
-            "cname": self.cname.name,  # type: ignore[typeddict-item]
+            "cname": self.cname,
             "clevel": self.clevel,
             "shuffle": map_shuffle_str_to_int[self.shuffle],
             "blocksize": self.blocksize,
0 commit comments

Comments
 (0)