Write chunks with negative zero values and a zero fill value #3216
d-v-b merged 8 commits into zarr-developers:main
Oh, oops, thanks 😅
this test failure seems significant: https://github.com/zarr-developers/zarr-python/actions/runs/16172926021/job/45650861381?pr=3216#step:8:420
Yes, looks like this approach doesn't work for complex number types.
what if we view the array as raw bytes (should be cheap) and compare the raw bytes?
>>> import numpy as np
>>> np.array([0.0]) == np.array([-0.0])
array([ True])
>>> np.array([0.0]).view('V') == np.array([-0.0]).view('V')
array([False])
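A minimal sketch of the raw-bytes idea as a standalone helper. The name `all_bitwise_equal` is hypothetical and not part of zarr-python, and it uses `tobytes()` (which copies) instead of a zero-copy `.view('V')` to keep the example simple. Because it only looks at bytes, it should also cover complex dtypes:

```python
import numpy as np


def all_bitwise_equal(chunk: np.ndarray, fill_value) -> bool:
    """Hypothetical helper: True if every element of `chunk` has exactly
    the same byte representation as `fill_value` (cast to the chunk dtype)."""
    # Cast the fill value to the chunk's dtype so both sides use the same layout.
    fill = np.asarray(fill_value, dtype=chunk.dtype)
    # Compare the chunk's bytes against the fill value's bytes repeated per element.
    return chunk.tobytes() == fill.tobytes() * chunk.size


print(all_bitwise_equal(np.array([0.0, 0.0]), 0.0))    # every element is +0.0
print(all_bitwise_equal(np.array([0.0, -0.0]), 0.0))   # one element is -0.0
print(all_bitwise_equal(np.array([complex(0.0, -0.0)]), 0j))  # -0.0 imaginary part
```

Only the first call should report a match; the other two contain a negative zero that `==` would treat as equal to the fill value.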
I wonder if that would somehow break with floating-point subnormals and the like. Will have to experiment 🤔
Co-authored-by: Davis Bennett <davis.v.bennett@gmail.com>
Took me a bit, but finally got around to it. Subnormals are fine and behave as expected. The only differences between Python's float equality and bitwise float equality are that signed zeroes compare as unequal when comparing their bits, and that NaNs can sometimes compare as equal when comparing their bits. The former is exactly what we want, and the latter won't occur since the code path is triggered only for signed zero fill values.
>>> import numpy as np
>>> np.array(1e-323).view('V') == np.array(0.0).view('V'), 1e-323 == 0.0
(array(False), False)
>>> np.array(1e-324).view('V') == np.array(0.0).view('V'), 1e-324 == 0.0
(array(True), True)
>>> np.array(-1e-323).view('V') == np.array(-0.0).view('V'), -1e-323 == -0.0
(array(False), False)
>>> np.array(-1e-324).view('V') == np.array(-0.0).view('V'), -1e-324 == -0.0
(array(True), True)
>>> np.array(-0.0).view('V') == np.array(0.0).view('V'), 0.0 == -0.0
(array(False), True)
>>> np.inf * 0.0
nan
>>> np.array(np.nan).view('V') == np.array(np.nan).view('V'), np.nan == np.nan
(array(True), False)
>>> np.array(np.inf * 0.0).view('V') == np.array(np.nan).view('V'), np.inf * 0.0 == np.nan
(array(False), False)
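The same observations can be read directly off the IEEE 754 bit patterns. This is only an illustrative sketch using the standard library; `float_bits` is a hypothetical helper name, and a 64-bit float is assumed:

```python
import struct


def float_bits(x: float) -> int:
    """Return the IEEE 754 binary64 bit pattern of `x` as an unsigned integer."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


print(hex(float_bits(0.0)))   # 0x0
print(hex(float_bits(-0.0)))  # 0x8000000000000000 (only the sign bit is set)
print(float_bits(5e-324))     # 1 (smallest positive subnormal, distinct from zero)
print(float_bits(1e-324))     # 0 (too small to represent, rounds to +0.0)

# A NaN never equals itself as a float, but its bit pattern equals itself.
nan = float("nan")
print(nan == nan, float_bits(nan) == float_bits(nan))  # False True
```

Signed zeroes differ only in the sign bit, which is why a bitwise comparison separates them while `==` does not.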
This is actually potentially super useful, because the zarr v3 spec distinguishes between different types of NaNs, even though numpy does not. In order to ensure that arrays round-trip correctly through zarr python, we need to generate exactly the specific NaN defined in the metadata. I did a quick check and numpy will preserve the underlying byte representation of different NaNs, so this should be possible.
>>> np.array([b'\x00\x00\x00\x00\x00\x00\xFF\xFF'], dtype='|V8').view('float').view('V')
array([b'\x00\x00\x00\x00\x00\x00\xFF\xFF'], dtype='|V8')
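A small sketch of that check using integer bit patterns instead of byte strings. The three patterns below are arbitrary quiet NaNs chosen for illustration; whether signalling NaNs survive every code path is not something this example tests:

```python
import numpy as np

# Three distinct quiet-NaN bit patterns (different payload or sign bit).
nan_bits = np.array(
    [0x7FF8000000000000, 0x7FF8000000000001, 0xFFF8000000000000],
    dtype=np.uint64,
)

# Reinterpret the bits as float64; all three values are NaN to NumPy.
as_float = nan_bits.view(np.float64)
print(np.isnan(as_float))

# Copy the float array, then reinterpret the copy as integers again.
copied = as_float.copy()
round_tripped = copied.view(np.uint64)
print(np.array_equal(round_tripped, nan_bits))
```

If the second line printed is `True`, the individual NaN payloads survived the copy, which is what producing the exact NaN named in the metadata would rely on.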
Oh, that's curious! Probably not something I can quite incorporate in the code here... unless we make all floating point arrays use bitwise comparison for empty chunks... 🤔
That |
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@           Coverage Diff           @@
##             main    #3216   +/-   ##
=======================================
  Coverage   94.54%   94.54%
=======================================
  Files          78       78
  Lines        9419     9423    +4
=======================================
+ Hits         8905     8909    +4
  Misses        514      514
# initialize the array with the negated fill value (-0.0 for +0.0, +0.0 for -0.0)
arr[:] = -fill_value
assert arr.nchunks_initialized == arr.nchunks
this test is fine but ideally we would be testing the altered function explicitly, instead of indirectly via array creation + chunk writing. this is not a blocker for this PR, just something to sort out down the road
That test is basically copied from test_write_empty_chunks_behavior right above it. But yeah, it might be worth having both a unit and an integration test in this case (:
d-v-b left a comment
thanks for this fix @bojidar-bg!
…ues and a zero fill value
…fill value (#3349) Co-authored-by: Bojidar Marinov <bojidar.marinov.bg@gmail.com>
Fixes #3144.
Using np.any(self._data) was inspired by how Zarr v2 checks for equality with a falsey fill value.
TODO:
docs/user-guide/*.rst
changes/
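For context, a rough sketch of what an `np.any`-based check does with signed zeroes. This is not the code from this PR; `is_all_falsy` is a hypothetical name used only to illustrate the behavior:

```python
import numpy as np


def is_all_falsy(data: np.ndarray) -> bool:
    """Hypothetical helper: True if no element of `data` is truthy."""
    return not np.any(data)


print(is_all_falsy(np.array([0.0, 0.0])))      # all +0.0
print(is_all_falsy(np.array([0.0, -0.0])))     # -0.0 is falsy too
print(is_all_falsy(np.array([0.0, 1e-323])))   # a subnormal is non-zero
```

Both zero-only arrays count as all falsy, so a check of this kind cannot tell a chunk of `-0.0` apart from a `+0.0` fill value on its own; the subnormal, being non-zero, is truthy.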