consider removing default compressor / filters / serializer from config

Our config right now contains this logic for defining a default encoding scheme for a given data type:

https://github.com/zarr-developers/zarr-python/blob/af55fcfaefa42b5ef556b1b5be33dcdd06a7fd0b/src/zarr/core/config.py#L85-L107

This approach is problematic because it requires sorting our data types into separate categories that are not well defined. For example, is a fixed-length UTF-32 data type a "string" type or a "numeric" type?
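To make the ambiguity concrete, here is a hypothetical classifier that mirrors the `v3_default_serializer` table in the linked config. The `categorize` function and its keying on NumPy-style dtype strings (`"<i4"`, `"<U10"`, `"|S5"`) are assumptions for illustration, not zarr-python code. Whatever category a fixed-length type lands in, the choice is arbitrary, and here it pulls in a variable-length codec for fixed-length data.

```python
# Hypothetical sketch: category-based default lookup, mirroring the
# "v3_default_serializer" entries from the config.
V3_DEFAULT_SERIALIZER = {
    "numeric": {"name": "bytes", "configuration": {"endian": "little"}},
    "string": {"name": "vlen-utf8"},
    "bytes": {"name": "vlen-bytes"},
}


def categorize(dtype_str: str) -> str:
    """Map a NumPy-style dtype string to a config category (hypothetical)."""
    # Strip the byte-order prefix, then look at the kind character.
    kind = dtype_str.lstrip("<>|=")[0]
    if kind in "biufcmM":
        return "numeric"
    if kind == "U":
        # Fixed-length UTF-32: calling this "string" is an arbitrary choice.
        return "string"
    if kind == "S":
        # Fixed-length bytes: calling this "bytes" is equally arbitrary.
        return "bytes"
    raise KeyError(f"no category for dtype {dtype_str!r}")


# A fixed-length UTF-32 dtype ends up with a variable-length serializer.
print(categorize("<U10"), V3_DEFAULT_SERIALIZER[categorize("<U10")])
```

Classifying `"<U10"` as `"numeric"` instead would select the `bytes` serializer, but then the category name no longer describes the type. Either way, the categories carry no reliable meaning.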

Given the changes coming in #2874, I propose the following alteration to our approach here:

- Remove these defaults from the config entirely.

- Confine all of this logic to a single function that automatically picks a chunk encoding from a data type and a requested chunk encoding. This function should also check for incompatibilities between the data type and the requested chunk encoding. For example, if someone requests a variable-length string data type but does not specify `vlen-utf8` as the serializer, they should get a clear, early error.
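A minimal sketch of what such a function could look like, assuming codecs are passed around as plain JSON-like dicts (as in the current config) and data types are identified by name. `resolve_chunk_encoding`, `ChunkEncoding`, and the data type names `"string"` and `"bytes"` are hypothetical, not existing zarr-python API. The sketch reads "does not specify `vlen-utf8`" as "explicitly requests a different serializer": an omitted serializer is filled in automatically, while an incompatible one raises immediately.

```python
import copy
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional

CodecSpec = Dict[str, Any]

# Hypothetical: variable-length data types and the serializer each one requires.
VLEN_SERIALIZERS: Dict[str, str] = {"string": "vlen-utf8", "bytes": "vlen-bytes"}

# Defaults taken from the current config.
DEFAULT_SERIALIZER: CodecSpec = {"name": "bytes", "configuration": {"endian": "little"}}
DEFAULT_COMPRESSORS: List[CodecSpec] = [
    {"name": "zstd", "configuration": {"level": 0, "checksum": False}},
]


@dataclass
class ChunkEncoding:
    serializer: CodecSpec
    filters: List[CodecSpec] = field(default_factory=list)
    compressors: List[CodecSpec] = field(default_factory=list)


def resolve_chunk_encoding(
    dtype_name: str,
    *,
    serializer: Optional[CodecSpec] = None,
    filters: Optional[List[CodecSpec]] = None,
    compressors: Optional[List[CodecSpec]] = None,
) -> ChunkEncoding:
    """Pick a chunk encoding for a data type, failing early on incompatibility."""
    required = VLEN_SERIALIZERS.get(dtype_name)
    if serializer is None:
        # Nothing requested: choose the serializer the data type needs.
        serializer = {"name": required} if required else copy.deepcopy(DEFAULT_SERIALIZER)
    elif required is not None and serializer.get("name") != required:
        raise ValueError(
            f"data type {dtype_name!r} requires the {required!r} serializer, "
            f"got {serializer.get('name')!r}"
        )
    elif required is None and serializer.get("name") in VLEN_SERIALIZERS.values():
        raise ValueError(
            f"serializer {serializer['name']!r} is only valid for variable-length "
            f"data types, not {dtype_name!r}"
        )
    return ChunkEncoding(
        serializer=serializer,
        filters=list(filters) if filters is not None else [],
        compressors=(
            list(compressors) if compressors is not None
            else copy.deepcopy(DEFAULT_COMPRESSORS)
        ),
    )


print(resolve_chunk_encoding("string").serializer)  # the vlen-utf8 codec
```

Because every compatibility rule lives in one function, a call like `resolve_chunk_encoding("string", serializer={"name": "bytes"})` fails when the array is created instead of when data is first written. No data type ever has to be assigned to a "string" or "numeric" category.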

These would be breaking changes, but our current approach is, IMO, unworkable.

For reference, here are the linked lines from `config.py`:

```python
"v2_default_filters": {
    "numeric": None,
    "string": [{"id": "vlen-utf8"}],
    "bytes": [{"id": "vlen-bytes"}],
    "raw": None,
},
"v3_default_filters": {"numeric": [], "string": [], "bytes": []},
"v3_default_serializer": {
    "numeric": {"name": "bytes", "configuration": {"endian": "little"}},
    "string": {"name": "vlen-utf8"},
    "bytes": {"name": "vlen-bytes"},
},
"v3_default_compressors": {
    "numeric": [
        {"name": "zstd", "configuration": {"level": 0, "checksum": False}},
    ],
    "string": [
        {"name": "zstd", "configuration": {"level": 0, "checksum": False}},
    ],
    "bytes": [
        {"name": "zstd", "configuration": {"level": 0, "checksum": False}},
    ],
},
```
