fix: NaN pushdown correctly pushes down NaNs correctness issue #2351
fix: NaN pushdown correctly pushes down NaNs correctness issue #2351blackmwk merged 8 commits intoapache:mainfrom
Conversation
df83b15 to
f964398
Compare
|
Could be a candidate for #2303 |
|
Thanks for catching this, @xanderbailey! The A few things I noticed: NULL propagation semantics
The Iceberg spec (and the Java implementation in
With null propagation, One approach: build the array without null propagation, e.g. iterate and treat null slots explicitly as "not NaN". Or apply Non-float fallbackThe Test assertionsThe test currently spot-checks indices 0, 1, and 3 but skips the NULL at index 2. It might be worth asserting the full truth table from the spec — something like: // Input: [Some(1.0), Some(NaN), None, Some(0.0)]
// is_nan: NaN→true, value→false, NULL→false (not null-propagating per spec)
let result = apply_predicate_to_batch(Reference::new("qux").is_nan(), schema.clone(), batch);
assert!(!result.is_null(0));
assert!(!result.value(0)); // 1.0 is not NaN
assert!(!result.is_null(1));
assert!(result.value(1)); // NaN is NaN
assert!(!result.is_null(2));
assert!(!result.value(2)); // NULL → false, not NULL
assert!(!result.is_null(3));
assert!(!result.value(3)); // 0.0 is not NaN
// not_nan: NaN→false, value→true, NULL→true (not null-propagating per spec)
let result = apply_predicate_to_batch(Reference::new("qux").is_not_nan(), schema, batch);
assert!(!result.is_null(0));
assert!(result.value(0)); // 1.0 is not NaN
assert!(!result.is_null(1));
assert!(!result.value(1)); // NaN is NaN
assert!(!result.is_null(2));
assert!(result.value(2)); // NULL → true, not NULL
assert!(!result.is_null(3));
assert!(result.value(3)); // 0.0 is not NaNThis locks in the non-null-propagating semantics from the spec table above, and would catch the current |
| }; | ||
|
|
||
| let values = match nulls { | ||
| Some(nulls) => &values & nulls.inner(), |
There was a problem hiding this comment.
nulls.inner() is an "is not null" mask
blackmwk
left a comment
There was a problem hiding this comment.
Thanks @xanderbailey for this fix!
Which issue does this PR close?
Not tied to a specific issue - found during an audit of pushdown filter gaps.
What changes are included in this PR?
PredicateConverter::is_nanandnot_nanwere never actually implemented -is_nanreturnedalways_true(matches every row) andnot_nanreturnedalways_false(matches no rows).Every other predicate in
PredicateConverterprojects the column from the batch and runs an arrow compute kernel, but these two just returned constants.This adds a
compute_is_nanhelper that downcasts toFloat32Array/Float64Arrayand checks each value withf.is_nan(), preserving nulls. Non-float types return all false.is_nanandnot_nannow use it the same wayis_null/not_nullusearrow::is_null/is_not_null.Are these changes tested?
Yes, test added