2 changes: 1 addition & 1 deletion .readthedocs.yaml
Original file line number Diff line number Diff line change
@@ -12,4 +12,4 @@ build:
- asdf install uv latest
- asdf global uv latest
- uv sync --group docs
- - uv run mkdocs build --site-dir $READTHEDOCS_OUTPUT/html
+ - uv run zensical build --site-dir $READTHEDOCS_OUTPUT/html
68 changes: 64 additions & 4 deletions README.md
@@ -30,9 +30,69 @@

# cast-value.py

Python implementation of the `cast_value` codec for Zarr.
A Python implementation of the `cast_value` codec for [Zarr](https://zarr.dev/),
with [zarr-python](https://zarr.readthedocs.io/en/stable/) integration.

## `cast_value` codec
## what

The `cast_value` codec defines an operation for safely converting an array from
one numeric data type to another.
The `cast_value` codec defines how to _safely_ convert arrays between integer
and float data types. In Zarr terminology, this codec is an "array -> array"
codec, which means its input and output are both arrays.

You can find the
[specification for this codec](https://github.com/zarr-developers/zarr-extensions/tree/main/codecs/cast_value)
in the
[zarr-extensions repository](https://github.com/zarr-developers/zarr-extensions).

## why

This codec is commonly used for lossy data compression: when decoded data
should be high-precision floats, but the absolute range of the values fits
within the range of a smaller integer data type, then encoding the floats as
ints before writing data can greatly reduce the size of the stored data.

For example, if your data is a sequence of `float64` values like
`[100.1, 120.3, 125.5]`, storing those values as `uint8`, e.g.
`[100, 120, 125]`, offers an 8-fold reduction in storage size, provided the
precision lost due to rounding is acceptable.
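
The size arithmetic above can be checked with a few lines of standard-library Python. This is only an illustrative sketch and does not use the codec itself; note that Python's `round()` rounds halves to the nearest even integer, so `125.5` becomes `126` here, whereas truncating toward zero would give the `125` shown in the text.

```python
from array import array

values = [100.1, 120.3, 125.5]

# "d" stores each element as a C double (float64): 8 bytes per element.
as_float64 = array("d", values)

# "B" stores each element as an unsigned byte (uint8): 1 byte per element.
# round() rounds halves to the nearest even integer, so 125.5 becomes 126;
# truncating toward zero instead would give 125.
as_uint8 = array("B", (round(v) for v in values))

float_bytes = as_float64.itemsize * len(as_float64)
uint8_bytes = as_uint8.itemsize * len(as_uint8)

print(list(as_uint8))              # [100, 120, 126]
print(float_bytes // uint8_bytes)  # 8
```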

## how

```python
# Import NumPy, zarr, and the codec that uses the Rust backend
import numpy as np
import zarr
from cast_value import CastValueRustV1

# Create an in-memory zarr array with float64 dtype, stored as uint8.
# The cast_value codec handles the conversion: float64 -> uint8 on write,
# uint8 -> float64 on read.

codec = CastValueRustV1(
    data_type="uint8",
    rounding="nearest-even",
    out_of_range="clamp",
    # Map non-finite floats to reserved integers on write, and back on read
    scalar_map={
        "encode": [(np.nan, 0), (np.inf, 1), (-np.inf, 2)],
        "decode": [(0, np.nan), (1, np.inf), (2, -np.inf)],
    },
)
# Create array and write float64 data — values are rounded and clamped to [0, 255]
data = np.array([np.nan, np.inf, -np.inf, 3.3, 4])
arr = zarr.create_array(data=data, store=zarr.storage.MemoryStore(), filters=codec)

# Read it back — comes back as float64, but with uint8 precision
result = arr[:]

print(f"Array dtype: {arr.dtype}")
print(f"Values written: {data}")
print(f"Values read: {result}")

"""
Array dtype: float64
Values written: [ nan inf -inf 3.3 4. ]
Values read: [ nan inf -inf 3. 4.]
"""
```

## who

Davis Bennett (@d-v-b)
22 changes: 21 additions & 1 deletion docs/api.md
@@ -1 +1,21 @@
- # ::: cast_value.example
+ # Python API
+
+ ## Zarr Codec
+
+ ::: cast_value.CastValueRustV1
+
+ ::: cast_value.CastValueNumpyV1
+
+ ::: cast_value.cast_array
+
+ ::: cast_value.ScalarMapJSON
+
+ ::: cast_value.ScalarMapEntry
+
+ ::: cast_value.ScalarMapEntries
+
+ ::: cast_value.RoundingMode
+
+ ::: cast_value.OutOfRangeMode
+
+ ::: cast_value.NumericScalar
61 changes: 52 additions & 9 deletions docs/index.md
@@ -1,18 +1,61 @@
# cast-value

Here you can document whatever you'd like on your main page. Common choices
include installation instructions, a minimal usage example, BibTex citations,
and contribution guidelines.
Python implementation of the
[`cast_value` codec](https://github.com/zarr-developers/zarr-extensions/tree/main/codecs/cast_value)
for [Zarr V3](https://zarr-specs.readthedocs.io/en/latest/v3/core/index.html).

See [this link](https://squidfunk.github.io/mkdocs-material/reference/) for all
the easy references and components you can use with mkdocs-material, or feel
free to go through through
[from the top](https://squidfunk.github.io/mkdocs-material/).
The `cast_value` codec converts array elements between numeric data types during
encoding and decoding, with configurable rounding, out-of-range handling, and
explicit scalar mappings.
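
As a rough illustration of how these three options can interact, here is a pure-Python sketch for a `uint8` target. It is not this package's implementation: the function name `sketch_cast_to_uint8` is hypothetical, and the order of operations (scalar map first, then rounding, then clamping) is an assumption made for the example rather than something stated here.

```python
import math


def sketch_cast_to_uint8(values, scalar_map=()):
    """Hypothetical helper: map special scalars, round half-to-even, clamp to [0, 255]."""
    out = []
    for v in values:
        for src, dst in scalar_map:
            # NaN never compares equal to itself, so match it explicitly.
            if v == src or (math.isnan(v) and math.isnan(src)):
                out.append(dst)
                break
        else:
            # No explicit mapping applied. This sketch does not handle
            # unmapped NaN or infinity, for which round() raises an error.
            rounded = round(v)  # Python's round() rounds halves to even
            out.append(min(max(rounded, 0), 255))  # "clamp" to the uint8 range
    return out


result = sketch_cast_to_uint8(
    [float("nan"), float("inf"), -5.0, 3.3, 2.5, 300.0],
    scalar_map=[(float("nan"), 0), (float("inf"), 1)],
)
print(result)  # [0, 1, 0, 3, 2, 255]
```

Handling the scalar map before rounding is what lets the sketch give NaN and infinity, which have no integer equivalent, a well-defined stored value.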

## Installation

You can install this package via running:
```bash
pip install cast-value
```

For the optional Rust-accelerated backend:

```bash
pip install cast_value
pip install cast-value[rs]
```

## Quick start

```python
import numpy as np
import zarr
from cast_value import CastValueNumpyV1

zarr.registry.register_codec("cast_value", CastValueNumpyV1)

codec = CastValueNumpyV1(
    data_type="uint8",
    rounding="nearest-even",
    out_of_range="clamp",
)

arr = zarr.create(
    shape=(100,),
    dtype="float64",
    chunks=(10,),
    store=zarr.storage.MemoryStore(),
    codecs=[codec, zarr.codecs.BytesCodec()],
    fill_value=0.0,
)

arr[:] = np.linspace(0, 300, 100)
print(arr[:10]) # [0. 3. 6. 9. 12. 15. 18. 21. 24. 27.]
```

## Backends

Two backends are available:

- **`CastValueNumpyV1`** — Pure Python + NumPy. Always available.
- **`CastValueRustV1`** — Rust via
  [cast-value-rs](https://pypi.org/project/cast-value-rs/). Faster for
  non-default rounding modes, uses SIMD-accelerated float-to-integer casts
  when clamping, and uses memory more efficiently.

Both implement the same codec interface and produce identical results.
2 changes: 1 addition & 1 deletion examples/benchmarks/bench_numpy_vs_rust.py
@@ -21,7 +21,7 @@
import numpy as np
from cast_value_rs import cast_array as rs_cast_array

- from cast_value.core import cast_array as numpy_cast_array
+ from cast_value.impl._numpy import cast_array as numpy_cast_array

SIZE = 1_000_000
WARMUP = 3
8 changes: 5 additions & 3 deletions examples/zarr_integration/zarr_cast_value.py
@@ -17,18 +17,20 @@

import numpy as np
import zarr
+ import zarr.registry
+ import zarr.storage

- from cast_value.zarr_compat.v1 import CastValueRust
+ from cast_value.zarr_compat.v1 import CastValueRustV1

# Register the codec so zarr can discover it by name
- zarr.registry.register_codec("cast_value", CastValueRust)
+ zarr.registry.register_codec("cast_value", CastValueRustV1)


def main() -> None:
# Create an in-memory zarr array with float64 dtype, stored as uint8.
# The cast_value codec handles the conversion: float64 -> uint8 on write,
# uint8 -> float64 on read.
- codec = CastValueRust(
+ codec = CastValueRustV1(
data_type="uint8",
rounding="nearest-even",
out_of_range="clamp",
12 changes: 6 additions & 6 deletions mkdocs.yml
@@ -2,8 +2,8 @@ site_name: cast-value
site_url: https://cast-value.readthedocs.io/
site_author: "Davis Bennett"

repo_name: "zarr-developers/cast_value"
repo_url: "https://github.com/zarr-developers/cast-value"
repo_name: "zarr-developers/cast-value.py"
repo_url: "https://github.com/zarr-developers/cast-value.py"

theme:
name: material
@@ -16,8 +16,6 @@ theme:
- navigation.tracking
- toc.follow
palette:
# See options to customise your color scheme here:
# https://squidfunk.github.io/mkdocs-material/setup/changing-the-colors/
- media: "(prefers-color-scheme: light)"
scheme: default
toggle:
@@ -34,17 +32,19 @@ plugins:
mkdocstrings:
handlers:
python:
paths: [.]
paths: [src]
inventories:
- https://docs.python.org/3/objects.inv
- https://docs.pydantic.dev/latest/objects.inv
options:
docstring_style: numpy
members_order: source
separate_signature: true
filters: ["!^_"]
show_root_heading: true
show_if_no_docstring: true
show_signature_annotations: true
signature_crossrefs: true
scoped_crossrefs: true
search: {}

nav:
4 changes: 2 additions & 2 deletions noxfile.py
@@ -64,9 +64,9 @@ def docs(session: nox.Session) -> None:
session.install("-e.", *doc_deps)

if session.interactive:
- session.run("mkdocs", "serve", "--clean", *session.posargs)
+ session.run("zensical", "serve", *session.posargs)
else:
- session.run("mkdocs", "build", "--clean", *session.posargs)
+ session.run("zensical", "build", *session.posargs)


@nox.session(default=False)
7 changes: 2 additions & 5 deletions pyproject.toml
@@ -62,12 +62,8 @@ dev = [
{ include-group = "bench" },
]
docs = [
- "markdown>=3.9",
- "mdx-include>=1.4.2",
- "mkdocs-material>=9.1.19",
- "mkdocs>=1.1.2",
+ "zensical",
"mkdocstrings-python>=1.18.2",
- "pyyaml>=6.0.1",
]


@@ -106,6 +102,7 @@ report.exclude_also = [

[tool.ruff]
show-fixes = true
+ exclude = ["src/cast_value/_version.py"]

[tool.ruff.lint]
extend-select = [
23 changes: 22 additions & 1 deletion src/cast_value/__init__.py
@@ -6,5 +6,26 @@
from __future__ import annotations

from ._version import version as __version__
+ from .impl._numpy import cast_array
+ from .types import (
+ NumericScalar,
+ OutOfRangeMode,
+ RoundingMode,
+ ScalarMapEntries,
+ ScalarMapEntry,
+ ScalarMapJSON,
+ )
+ from .zarr_compat.v1 import CastValueNumpyV1, CastValueRustV1

- __all__ = ["__version__"]
+ __all__ = [
+ "CastValueNumpyV1",
+ "CastValueRustV1",
+ "NumericScalar",
+ "OutOfRangeMode",
+ "RoundingMode",
+ "ScalarMapEntries",
+ "ScalarMapEntry",
+ "ScalarMapJSON",
+ "__version__",
+ "cast_array",
+ ]
Empty file added src/cast_value/impl/__init__.py
Empty file.