ScalarFnArray Validity causes eager BoolArray compute #8471

@joseph-isaacs

Description

When a scalar function does not define its own validity (validity_opt returns None, which currently applies to Kleene and/or), ScalarFn::validity uses the generic fallback is_not_null(self). This is self-referential: the validity of the ScalarFn array X = and(c0, c1) is defined as is_not_null(X), but evaluating is_not_null(X) needs X's validity, which is again is_not_null(X). Represented as a lazy array DAG this is a genuine call cycle (X.validity() → is_not_null(X) → per-row eval → X.is_invalid() → X.validity() → …) and overflows the stack. The None branch in vortex-array/src/arrays/scalar_fn/vtable/validity.rs avoids the cycle only by materializing the result with execute_expr: once X is a concrete array its validity is a real buffer, so is_not_null reads it directly instead of re-entering the validity() vtable.
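
To make the cycle concrete, here is a minimal toy model in plain Rust. It does not use the Vortex API: `Node`, `validity`, `is_not_null_lazy`, `execute`, and `MAX_DEPTH` are all hypothetical names invented for this sketch, and a depth guard stands in for the real stack overflow so the example terminates.

```rust
// Toy model only; none of these names are the real Vortex types or functions.

/// Depth guard standing in for the stack overflow seen in practice.
const MAX_DEPTH: usize = 8;

enum Node {
    /// Materialized rows; None marks a null row.
    Concrete(Vec<Option<bool>>),
    /// Lazy Kleene `and` with no validity of its own.
    And(Box<Node>, Box<Node>),
}

/// Kleene AND: false dominates, otherwise null propagates.
fn kleene_and(a: Option<bool>, b: Option<bool>) -> Option<bool> {
    match (a, b) {
        (Some(false), _) | (_, Some(false)) => Some(false),
        (Some(true), Some(true)) => Some(true),
        _ => None,
    }
}

/// Generic fallback: a lazy node's validity is defined as is_not_null(node).
fn validity(node: &Node, depth: usize) -> Result<Vec<bool>, String> {
    match node {
        Node::Concrete(rows) => Ok(rows.iter().map(|r| r.is_some()).collect()),
        Node::And(..) => is_not_null_lazy(node, depth + 1),
    }
}

/// Lazy is_not_null: asks the node for its validity, which re-enters `validity`.
fn is_not_null_lazy(node: &Node, depth: usize) -> Result<Vec<bool>, String> {
    if depth > MAX_DEPTH {
        return Err("validity cycle detected".to_string());
    }
    validity(node, depth)
}

/// Eagerly materialize a node into concrete rows.
fn execute(node: &Node) -> Vec<Option<bool>> {
    match node {
        Node::Concrete(rows) => rows.clone(),
        Node::And(l, r) => execute(l)
            .into_iter()
            .zip(execute(r))
            .map(|(a, b)| kleene_and(a, b))
            .collect(),
    }
}

/// Eager workaround: materialize first, then read nullness from the rows.
fn validity_eager(node: &Node) -> Vec<bool> {
    execute(node).iter().map(|r| r.is_some()).collect()
}
```

In this sketch, calling `validity` on a `Node::And` bounces between `validity` and `is_not_null_lazy` until the guard trips, while `validity_eager` terminates because the concrete rows carry their own nullness. The cost is the eager compute named in the issue title.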

The cleaner fix is to break the cycle at its source by implementing is_not_null (and the related null-predicate functions) as array kernels. A kernel computes the not-null mask directly from an input array by canonicalizing it and reading its concrete validity, rather than routing back through that array's validity() vtable. With such a kernel, is_not_null(X) no longer depends on X.validity(), so the fallback can be represented lazily without recursion and the special-cased eager execute_expr branch can be removed entirely. (As an interim measure, and/or were given explicit Kleene validity expressions over their operands' masks and null-filled values, which sidesteps the cycle per-operator; the is_not_null kernel would generalize this to every function that relies on the fallback.)
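
A rough sketch of both ideas, again as a toy model rather than Vortex code. `Expr`, `validity`, `execute`, and `and_validity` are hypothetical names, and filling nulls with `true` for `and` is an assumption of this sketch, not something confirmed by the codebase. The `IsNotNull` arm of `execute` plays the role of the kernel: it canonicalizes its input and reads nullness from the result, never calling `validity`.

```rust
// Toy model only; none of these names are the real Vortex types or functions.

#[derive(Clone, Debug, PartialEq)]
enum Expr {
    /// Materialized rows; None marks a null row.
    Concrete(Vec<Option<bool>>),
    /// Lazy Kleene `and`.
    And(Box<Expr>, Box<Expr>),
    /// Lazy null predicate over an input expression.
    IsNotNull(Box<Expr>),
}

/// Kleene AND: false dominates, otherwise null propagates.
fn kleene_and(a: Option<bool>, b: Option<bool>) -> Option<bool> {
    match (a, b) {
        (Some(false), _) | (_, Some(false)) => Some(false),
        (Some(true), Some(true)) => Some(true),
        _ => None,
    }
}

/// Lazy fallback validity: only builds an expression, evaluates nothing.
fn validity(x: &Expr) -> Expr {
    Expr::IsNotNull(Box::new(x.clone()))
}

/// Execute an expression to concrete rows.
fn execute(e: &Expr) -> Vec<Option<bool>> {
    match e {
        Expr::Concrete(rows) => rows.clone(),
        Expr::And(l, r) => execute(l)
            .into_iter()
            .zip(execute(r))
            .map(|(a, b)| kleene_and(a, b))
            .collect(),
        // "Kernel": canonicalize the input, then read nullness directly.
        // The result is itself never null.
        Expr::IsNotNull(inner) => execute(inner)
            .iter()
            .map(|row| Some(row.is_some()))
            .collect(),
    }
}

/// Interim per-operator approach: Kleene `and` validity computed from the
/// operands' masks and null-filled values (nulls filled with `true` here,
/// which is an assumption of this sketch).
fn and_validity(a: &[Option<bool>], b: &[Option<bool>]) -> Vec<bool> {
    a.iter()
        .zip(b.iter())
        .map(|(a, b)| {
            let (mask_a, mask_b) = (a.is_some(), b.is_some());
            let (fill_a, fill_b) = (a.unwrap_or(true), b.unwrap_or(true));
            // Valid when both operands are valid, or either is a definite false.
            (mask_a && mask_b) || !fill_a || !fill_b
        })
        .collect()
}
```

Here `validity(&x)` returns immediately with a lazy `IsNotNull` node, and executing that node terminates because the kernel arm depends only on the input's materialized rows. For the same operands, `and_validity` should agree with the mask produced through the kernel path.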
