
Commit 22a27cd

Copilot and shaypal5 committed
Document tobytes() copy behavior in NumPy array hashing (#343)
* Initial plan
* Document tobytes(order="C") behavior in _hash_numpy_array docstring

Co-authored-by: shaypal5 <917954+shaypal5@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: shaypal5 <917954+shaypal5@users.noreply.github.com>
1 parent 698b9c0 commit 22a27cd


1 file changed, +13 -0 lines changed


src/cachier/config.py

Lines changed: 13 additions & 0 deletions
@@ -29,13 +29,26 @@ def _is_numpy_array(value: Any) -> bool:
 def _hash_numpy_array(hasher: "hashlib._Hash", value: Any) -> None:
     """Update hasher with NumPy array metadata and buffer content.
 
+    The array content is converted to bytes using C-order (row-major) layout
+    to ensure consistent hashing regardless of memory layout. This operation
+    may create a copy if the array is not already C-contiguous (e.g., for
+    transposed arrays, sliced views, or Fortran-ordered arrays), which has
+    performance implications for large arrays.
+
     Parameters
     ----------
     hasher : hashlib._Hash
         The hasher to update.
     value : Any
         A NumPy ndarray instance.
 
+    Notes
+    -----
+    The ``tobytes(order="C")`` call ensures deterministic hash values by
+    normalizing the memory layout, but may incur a memory copy for
+    non-contiguous arrays. For optimal performance with large arrays,
+    consider using C-contiguous arrays when possible.
+
     """
     hasher.update(b"numpy.ndarray")
     hasher.update(value.dtype.str.encode("utf-8"))
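
The layout normalization described in the new docstring can be illustrated with a minimal sketch. The helper `hash_array_sketch` is hypothetical and is not cachier's implementation: the diff shows only the first two `hasher.update` calls, so the shape step, the final `tobytes(order="C")` step, and the choice of SHA-256 are assumptions made for illustration.

```python
import hashlib

import numpy as np


def hash_array_sketch(value: np.ndarray) -> str:
    """Hypothetical stand-in for the function in the diff, not cachier's code."""
    hasher = hashlib.sha256()  # assumed hash algorithm
    hasher.update(b"numpy.ndarray")
    hasher.update(value.dtype.str.encode("utf-8"))
    # Assumed step: include the shape so equal bytes with different shapes differ.
    hasher.update(str(value.shape).encode("utf-8"))
    # Serialize elements in row-major order, whatever the memory layout is.
    hasher.update(value.tobytes(order="C"))
    return hasher.hexdigest()


c_array = np.arange(6, dtype=np.int64).reshape(2, 3)  # C-contiguous
f_array = np.asfortranarray(c_array)  # same values, column-major memory
transposed = c_array.T  # non-contiguous view with shape (3, 2)

# Same logical content hashes identically despite different memory layouts.
same_hash = hash_array_sketch(c_array) == hash_array_sketch(f_array)

# A transposed view hashes like its C-contiguous copy.
view_matches_copy = hash_array_sketch(transposed) == hash_array_sketch(
    np.ascontiguousarray(transposed)
)
```

In this sketch `f_array` is not C-contiguous, so it exercises the non-contiguous path the docstring warns about. As far as the NumPy documentation describes it, `tobytes` returns a new `bytes` object holding a copy of the data in every case, so the additional cost for non-contiguous input is likely the strided traversal rather than the copy alone.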
