Add definitely_all_null() for cheap all-null detection #8475
joseph-isaacs wants to merge 5 commits into
Foundation for representing all-null arrays as Constant(null) and removing Validity::AllInvalid (#8443).

- ConstantArray::null(dtype, len) constructs the canonical all-null array: a single null scalar repeated, with no values buffer or validity child.
- ArrayRef::all_null() is a cheap, non-executing, conservative check for "entirely null". It returns true for a constant-null array or a statically all-invalid validity (including a constant-false validity array, the representation all-null arrays will use once AllInvalid is gone). It runs no compute, so a false result means "not provably all-null", not "has valid values".

Compute entry points will call all_null() to short-circuit an entirely-null input to Constant(null) and skip canonicalization.

Signed-off-by: Joseph Isaacs <joe.isaacs@live.co.uk>
https://claude.ai/code/session_01Q8K741TL4zABgsL1N4kLWw
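The conservative contract described in this commit can be sketched with a toy model. Everything below (`Array`, `Validity`, `BoolArray`, and the `bool` return type) is a hypothetical stand-in written for illustration, not the Vortex types or signatures. The point it tries to show is that the check inspects only the encoding, so a materialized validity buffer that happens to be all `false` is reported as "not provably all-null".

```rust
// Toy model only: these types are hypothetical stand-ins, not the Vortex API.

/// A boolean validity child, either constant or materialized bit by bit.
#[derive(Debug, Clone, PartialEq)]
pub enum BoolArray {
    Constant { value: bool, len: usize },
    Bits(Vec<bool>),
}

#[derive(Debug, Clone, PartialEq)]
pub enum Validity {
    NonNullable,
    AllValid,
    AllInvalid,
    Array(BoolArray),
}

#[derive(Debug, Clone, PartialEq)]
pub enum Array {
    /// One scalar repeated `len` times; `None` stands for the null scalar.
    Constant { scalar: Option<i64>, len: usize },
    Primitive { values: Vec<i64>, validity: Validity },
}

impl Array {
    /// Canonical all-null array: no values buffer and no validity child.
    pub fn null(len: usize) -> Self {
        Array::Constant { scalar: None, len }
    }

    /// Static check: looks at the encoding only and never scans a buffer.
    pub fn definitely_all_null(&self) -> bool {
        match self {
            Array::Constant { scalar, .. } => scalar.is_none(),
            Array::Primitive { validity, .. } => matches!(
                validity,
                Validity::AllInvalid
                    | Validity::Array(BoolArray::Constant { value: false, .. })
            ),
        }
    }
}
```

In this model, `Validity::Array(BoolArray::Bits(vec![false, false]))` yields `false` even though every element is null, which is why a caller has to fall back to its normal path rather than treat `false` as "has valid values".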
Validity::not had no callers anywhere in the workspace: a repo-wide audit of every `.not()` site found only Mask, BitBuffer, and ArrayRef receivers, with no UFCS Validity::not call and no `impl Not for Validity`.

It is also the only place that constructs Validity::AllInvalid without a length in scope (AllValid -> AllInvalid). Removing it eliminates the one structural blocker to deleting the AllInvalid variant (#8443): every remaining producer already has a length, so no length-threading through the Validity algebra is required.

Signed-off-by: Joseph Isaacs <joe.isaacs@live.co.uk>
https://claude.ai/code/session_01Q8K741TL4zABgsL1N4kLWw
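To illustrate the length problem, here is a minimal sketch of what a negation might have to look like once all-invalid is encoded as a constant-false array. The types and the function names `all_invalid` and `not_with_len` are hypothetical and exist only for this example. The model also leaves out the non-nullable case.

```rust
// Hypothetical model of validity after `AllInvalid` is removed; not the Vortex types.
#[derive(Debug, Clone, PartialEq)]
pub enum BoolArray {
    Constant { value: bool, len: usize },
    Bits(Vec<bool>),
}

#[derive(Debug, Clone, PartialEq)]
pub enum Validity {
    AllValid,
    Array(BoolArray),
}

/// In this model "all invalid" is a constant-false array, so it needs a length.
pub fn all_invalid(len: usize) -> Validity {
    Validity::Array(BoolArray::Constant { value: false, len })
}

/// Negation must receive `len`, because `AllValid` carries no length of its
/// own. This extra parameter is the length-threading that deleting the
/// unused method avoids.
pub fn not_with_len(validity: &Validity, len: usize) -> Validity {
    match validity {
        Validity::AllValid => all_invalid(len),
        Validity::Array(BoolArray::Constant { value: false, .. }) => Validity::AllValid,
        Validity::Array(BoolArray::Constant { value: true, len: n }) => all_invalid(*n),
        Validity::Array(BoolArray::Bits(bits)) => {
            Validity::Array(BoolArray::Bits(bits.iter().map(|b| !b).collect()))
        }
    }
}
```

A length-free `not(&self)` has nothing to return for the `AllValid` arm in this model, which is one way to read the "structural blocker" wording of the commit.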
…8443) Step 3 of removing Validity::AllInvalid. Exercises the foundation helpers:

- list filter and struct take now return ConstantArray::null(...) for the all-null result instead of constructing an all-null concrete array.
- is_null / is_not_null gain a cheap ArrayRef::all_null() short-circuit for entirely-null concrete inputs (the constant-input case is already handled).

All changes are logically behavior-preserving: an all-null result has the same values and null mask whether it is encoded as Constant(null) or as a concrete array. This also confirms that the previously unused all_null() and ConstantArray::null helpers now have real (non-test) call sites.

Signed-off-by: Joseph Isaacs <joe.isaacs@live.co.uk>
https://claude.ai/code/session_01Q8K741TL4zABgsL1N4kLWw
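The short-circuit pattern for is_null / is_not_null might look roughly like the following. This is a sketch over hypothetical toy types (`Primitive`, `Validity`, `BoolArray`), not the actual kernels. It assumes the cheap path returns a constant boolean result while the normal path materializes one bit per element.

```rust
// Toy model only: hypothetical types, not the Vortex API.
#[derive(Debug, Clone, PartialEq)]
pub enum Validity {
    AllValid,
    AllInvalid,
    /// One flag per element; `true` means the element is valid.
    Bits(Vec<bool>),
}

#[derive(Debug, Clone, PartialEq)]
pub struct Primitive {
    pub values: Vec<i64>,
    pub validity: Validity,
}

#[derive(Debug, Clone, PartialEq)]
pub enum BoolArray {
    Constant { value: bool, len: usize },
    Bits(Vec<bool>),
}

impl BoolArray {
    /// Expand to plain booleans so two encodings can be compared logically.
    pub fn to_vec(&self) -> Vec<bool> {
        match self {
            BoolArray::Constant { value, len } => vec![*value; *len],
            BoolArray::Bits(bits) => bits.clone(),
        }
    }
}

impl Primitive {
    /// Static check only; a materialized all-false buffer is not detected.
    pub fn definitely_all_null(&self) -> bool {
        matches!(self.validity, Validity::AllInvalid)
    }
}

pub fn is_null(array: &Primitive) -> BoolArray {
    let len = array.values.len();
    // Cheap path: provably all-null, so the answer is constant `true`.
    if array.definitely_all_null() {
        return BoolArray::Constant { value: true, len };
    }
    // Normal path: derive one result bit per element from the validity.
    let bits: Vec<bool> = match &array.validity {
        Validity::AllValid => vec![false; len],
        Validity::AllInvalid => vec![true; len],
        Validity::Bits(valid) => valid.iter().map(|v| !v).collect(),
    };
    BoolArray::Bits(bits)
}

pub fn is_not_null(array: &Primitive) -> BoolArray {
    let len = array.values.len();
    // Cheap path: provably all-null, so the answer is constant `false`.
    if array.definitely_all_null() {
        return BoolArray::Constant { value: false, len };
    }
    // Normal path: negate the element-wise is_null result.
    BoolArray::Bits(is_null(array).to_vec().iter().map(|b| !b).collect())
}
```

Comparing `to_vec()` of the two paths on an all-null input gives identical booleans, which is the sense in which the change is behavior-preserving: only the encoding of the result differs.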
The check is conservative and non-executing: a false result means "not provably all-null", not "has valid values". Rename to definitely_all_null to make that contract explicit and mirror the existing Validity::definitely_no_nulls.

Signed-off-by: Joseph Isaacs <joe.isaacs@live.co.uk>
https://claude.ai/code/session_01Q8K741TL4zABgsL1N4kLWw
…null (#8443) Replace the explicit `matches!(validity, Validity::AllInvalid)` check in the fill_null precondition with `array.definitely_all_null()?`.

Behavior-preserving and slightly more general: it also short-circuits a constant-null input or a constant-false validity array (the representations all-null arrays move to), without matching the variant directly. Prepares the consumer for the eventual .validity() pivot.

Signed-off-by: Joseph Isaacs <joe.isaacs@live.co.uk>
https://claude.ai/code/session_01Q8K741TL4zABgsL1N4kLWw
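A small sketch of why the new precondition is "slightly more general" than the variant match. The types, `old_check`, and the `Result<bool, String>` return type are assumptions made for this example. The only detail taken from the commit is that the real call is fallible, as the `?` suggests. What fill_null does after the check is not modeled here.

```rust
// Hypothetical stand-ins for illustration; not the Vortex types or signatures.
#[derive(Debug, Clone, PartialEq)]
pub enum Validity {
    AllValid,
    AllInvalid,
    /// A validity child that is itself a constant boolean array.
    ConstantBool(bool),
}

#[derive(Debug, Clone, PartialEq)]
pub enum Array {
    /// One scalar repeated `len` times; `None` stands for the null scalar.
    Constant { scalar: Option<i64>, len: usize },
    Primitive { values: Vec<i64>, validity: Validity },
}

/// Old precondition: matches the `AllInvalid` variant directly.
pub fn old_check(array: &Array) -> bool {
    matches!(
        array,
        Array::Primitive { validity: Validity::AllInvalid, .. }
    )
}

/// New precondition. Fallible to mirror the `?` in the commit message;
/// this toy version never actually returns an error.
pub fn definitely_all_null(array: &Array) -> Result<bool, String> {
    Ok(match array {
        Array::Constant { scalar, .. } => scalar.is_none(),
        Array::Primitive { validity, .. } => matches!(
            validity,
            Validity::AllInvalid | Validity::ConstantBool(false)
        ),
    })
}
```

In this model both checks agree on `Validity::AllInvalid`, while only the new one also accepts a constant-null array and a constant-false validity child.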
Merging this PR will degrade performance by 29.83%
Warning: Please fix the performance issues or acknowledge them on CodSpeed.
Summary
This PR introduces a new `definitely_all_null()` method on `ArrayRef` that performs cheap, static detection of entirely-null arrays without executing compute. This lets compute kernels short-circuit and avoid unnecessary work and canonicalization when processing all-null inputs.

The method returns `true` only when all-null-ness can be proven without computation:
- a constant-null array
- `Validity::AllInvalid`
- a constant-`false` validity array

A `false` result means "not provably all-null" (conservative), not "contains valid values", so callers must fall back to their normal path.

Changes
- New `definitely_all_null()` method (`vortex-array/src/array/erased.rs`)
- Removed unused `Validity::not()` method (`vortex-array/src/validity.rs`)
- Updated compute kernels to use the new method:
  - `is_null.rs`: Short-circuit entirely-null inputs to return all-`true`
  - `is_not_null.rs`: Short-circuit entirely-null inputs to return all-`false`
  - `struct/compute/take.rs`: Return the canonical all-null constant array instead of manually constructing one with `Validity::AllInvalid`
  - `list/compute/filter.rs`: Return the canonical all-null constant array instead of manually constructing one
- Added `ConstantArray::null()` helper (`vortex-array/src/arrays/constant/array.rs`)

Testing
Added unit tests in `vortex-array/src/array/erased.rs`:
- `definitely_all_null_detects_constant_null`: Verifies detection of constant-null and constant-non-null arrays
- `definitely_all_null_via_validity`: Verifies detection via `AllInvalid` validity, constant-`false` validity arrays, and non-nullable/all-valid cases

Existing tests pass with the updated compute kernels using the new short-circuit path.
https://claude.ai/code/session_01Q8K741TL4zABgsL1N4kLWw