@@ -25,6 +25,11 @@ binary values. Row-backed constants become the representation for non-null list,
struct, variant, and other complex values where nested scalar materialization is expensive or
requires array-level storage.

Scalar functions already operate over `ArrayRef` inputs. This RFC does not change that calling
convention. It only requires literals and constants to enter scalar-function execution as
`ConstantArray(len = row_count, value = ...)`, with complex values represented by row-backed
constants.
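
A minimal sketch of the literal convention, assuming a toy model: `Value`, `Array`, `Expr`, and `lower_arg` are hypothetical stand-ins, not Vortex types. It shows a literal becoming one constant of logical length `row_count` rather than `row_count` copies.

```rust
// Hypothetical toy model; none of these types are the Vortex API.
#[derive(Clone, Debug, PartialEq)]
enum Value {
    I64(i64),
}

#[derive(Clone, Debug, PartialEq)]
enum Array {
    /// Stores one value but is logically `len` rows long.
    Constant { len: usize, value: Value },
    /// Stores one value per row.
    Materialized(Vec<Value>),
}

impl Array {
    fn len(&self) -> usize {
        match self {
            Array::Constant { len, .. } => *len,
            Array::Materialized(values) => values.len(),
        }
    }
}

enum Expr {
    Literal(Value),
    Column(usize),
}

/// Hypothetical helper: turn an argument expression into an input of logical length `row_count`.
fn lower_arg(expr: &Expr, columns: &[Vec<Value>], row_count: usize) -> Array {
    match expr {
        // The literal is wrapped once, never copied `row_count` times.
        Expr::Literal(value) => Array::Constant { len: row_count, value: value.clone() },
        Expr::Column(index) => Array::Materialized(columns[*index].clone()),
    }
}

fn main() {
    let columns = vec![vec![Value::I64(1), Value::I64(2), Value::I64(3)]];
    let literal = lower_arg(&Expr::Literal(Value::I64(10)), &columns, 3);
    let column = lower_arg(&Expr::Column(0), &columns, 3);
    // Both arguments have the same logical length, so a kernel never sees a length-1 input.
    assert_eq!(literal.len(), column.len());
    println!("{literal:?}");
}
```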

## Motivation

Vortex currently uses `ScalarValue::Tuple(Vec<Option<ScalarValue>>)` for list, fixed-size-list, and
@@ -56,6 +61,8 @@ expensive nested scalar object.
- Avoid requiring an `ExecutionCtx` to construct or validate a `Scalar`.
- Allow in-memory complex constants to hold device-resident array buffers without copying them into
  host scalar values.
- Define a scalar-function broadcasting model where every input has logical length `row_count`, and
  constants are represented as `ConstantArray`s.
- Preserve compatibility with existing scalar literal and constant-array encodings.

## Non-Goals
@@ -64,6 +71,7 @@ expensive nested scalar object.
- This RFC does not require every scalar-like API to move to arrays in one change.
- This RFC does not define a device-resident `Scalar`.
- This RFC does not require canonicalizing complex constants during expression deserialization.
- This RFC does not redesign scalar-function child execution or `Columnar`.

## Design

@@ -228,6 +236,28 @@ Row-backed constants should canonicalize by broadcasting the singleton row struc
The key rule is that row-backed constants should not be converted into recursive `ScalarValue::Tuple`
except when an API explicitly asks for a `Scalar`.
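
To make the rule concrete, here is a sketch under an assumed toy model in which the only complex type is a list of `i64`. `ListArray`, `ConstantValue`, `canonicalize_list`, and `to_scalar` are hypothetical names, not Vortex APIs. Canonicalization repeats the singleton row in array form, and the recursive tuple is built only when a caller explicitly asks for a scalar.

```rust
// Hypothetical toy model: only lists of i64 are "complex" here.
#[derive(Clone, Debug, PartialEq)]
enum ScalarValue {
    I64(i64),
    Tuple(Vec<Option<ScalarValue>>),
}

/// Array form of lists: row `i` spans `values[offsets[i]..offsets[i + 1]]`.
#[derive(Clone, Debug, PartialEq)]
struct ListArray {
    offsets: Vec<usize>,
    values: Vec<i64>,
}

#[derive(Clone, Debug, PartialEq)]
enum ConstantValue {
    Scalar(ScalarValue),
    /// Row-backed: a singleton `ListArray` holding the one list value.
    Row(ListArray),
}

struct ConstantArray {
    len: usize,
    value: ConstantValue,
}

impl ConstantArray {
    /// Canonicalize a row-backed list constant by repeating its singleton row `len` times,
    /// without ever building a `ScalarValue::Tuple`.
    fn canonicalize_list(&self) -> Option<ListArray> {
        let ConstantValue::Row(row) = &self.value else {
            return None;
        };
        let (start, end) = (row.offsets[0], row.offsets[1]);
        let mut offsets = Vec::with_capacity(self.len + 1);
        let mut values = Vec::with_capacity(self.len * (end - start));
        offsets.push(0);
        for _ in 0..self.len {
            values.extend_from_slice(&row.values[start..end]);
            offsets.push(values.len());
        }
        Some(ListArray { offsets, values })
    }
}

impl ConstantValue {
    /// Only for APIs that explicitly ask for a scalar: build the recursive tuple on demand.
    fn to_scalar(&self) -> ScalarValue {
        match self {
            ConstantValue::Scalar(scalar) => scalar.clone(),
            ConstantValue::Row(row) => ScalarValue::Tuple(
                row.values[row.offsets[0]..row.offsets[1]]
                    .iter()
                    .map(|v| Some(ScalarValue::I64(*v)))
                    .collect(),
            ),
        }
    }
}

fn main() {
    let row = ListArray { offsets: vec![0, 2], values: vec![7, 8] };
    let constant = ConstantArray { len: 3, value: ConstantValue::Row(row) };
    let canonical = constant.canonicalize_list().expect("row-backed list");
    assert_eq!(canonical.offsets, vec![0, 2, 4, 6]);
    assert_eq!(canonical.values, vec![7, 8, 7, 8, 7, 8]);
    println!("{:?}", constant.value.to_scalar());
}
```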

### Scalar functions and broadcasting

Scalar functions already take `ArrayRef` inputs. This RFC keeps that interface unchanged.

The required calling convention is:

- every input array has logical length `args.row_count()`
- scalar broadcasting is represented by `ConstantArray(len = args.row_count(), value = ...)`
- row-backed broadcasting is represented by
  `ConstantArray(len = args.row_count(), value = Row(singleton_array))`
- naked length-1 arrays are not normal scalar-function inputs, except for private sub-executions
  such as evaluating an all-constant expression once
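
One way to picture this convention is as an invariant checked where arguments are assembled. The sketch assumes a toy `Array` of `i64` values and a hypothetical `Args::new` constructor; `Args` here only mirrors the `args.row_count()` accessor named above and is not the real Vortex type. A broadcast literal passes the check, while a naked length-1 array is rejected.

```rust
// Hypothetical toy model; `Args` only mirrors the `row_count()` accessor named in the RFC.
#[derive(Clone, Debug, PartialEq)]
enum Array {
    /// One stored value, logically `len` rows.
    Constant { len: usize, value: i64 },
    /// One stored value per row.
    Materialized(Vec<i64>),
}

impl Array {
    fn len(&self) -> usize {
        match self {
            Array::Constant { len, .. } => *len,
            Array::Materialized(values) => values.len(),
        }
    }
}

struct Args {
    row_count: usize,
    inputs: Vec<Array>,
}

impl Args {
    /// Reject any input whose logical length differs from `row_count`,
    /// including naked length-1 arrays when `row_count != 1`.
    fn new(row_count: usize, inputs: Vec<Array>) -> Result<Self, String> {
        for (index, input) in inputs.iter().enumerate() {
            if input.len() != row_count {
                return Err(format!(
                    "input {index} has length {}, expected {row_count}",
                    input.len()
                ));
            }
        }
        Ok(Args { row_count, inputs })
    }

    fn row_count(&self) -> usize {
        self.row_count
    }

    fn inputs(&self) -> &[Array] {
        &self.inputs
    }
}

fn main() {
    // A literal broadcast as a constant satisfies the convention.
    let ok = Args::new(3, vec![Array::Materialized(vec![1, 2, 3]), Array::Constant { len: 3, value: 10 }]);
    assert!(ok.is_ok());

    // A naked length-1 array does not.
    let bad = Args::new(3, vec![Array::Materialized(vec![1, 2, 3]), Array::Materialized(vec![10])]);
    assert!(bad.is_err());

    let args = ok.unwrap();
    println!("{} inputs of {} rows", args.inputs().len(), args.row_count());
}
```

In this sketch, the private sub-execution exception fits without relaxing the check: evaluating an all-constant expression once can simply run with `row_count = 1`.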

This means scalar functions do support broadcasting, but broadcasting is encoded in the array
representation rather than in per-kernel length checks. A scalar function should not need to handle
both `len == 1` and `len == row_count` inputs: its inputs always have length `row_count`, and some of
them are simply stored as a single constant value.
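
The sketch below, again over a hypothetical toy `Array` rather than the Vortex type, shows what this buys a kernel author: an element-wise `add` reads row `i` from each input, and a constant input answers the same stored value for every row. There is no `len == 1` branch because the convention rules those inputs out.

```rust
// Hypothetical toy model, not the Vortex API.
#[derive(Clone, Debug, PartialEq)]
enum Array {
    Constant { len: usize, value: i64 },
    Materialized(Vec<i64>),
}

impl Array {
    fn len(&self) -> usize {
        match self {
            Array::Constant { len, .. } => *len,
            Array::Materialized(values) => values.len(),
        }
    }

    /// Logical value of row `i`; a constant returns its single stored value for every row.
    fn get(&self, i: usize) -> i64 {
        match self {
            Array::Constant { len, value } => {
                assert!(i < *len, "row {i} out of bounds for length {len}");
                *value
            }
            Array::Materialized(values) => values[i],
        }
    }
}

/// Element-wise addition. Both inputs must have length `row_count`,
/// so no `len == 1` special case appears.
fn add(lhs: &Array, rhs: &Array, row_count: usize) -> Array {
    assert_eq!(lhs.len(), row_count);
    assert_eq!(rhs.len(), row_count);
    Array::Materialized((0..row_count).map(|i| lhs.get(i) + rhs.get(i)).collect())
}

fn main() {
    let column = Array::Materialized(vec![1, 2, 3]);
    let literal = Array::Constant { len: 3, value: 10 };
    assert_eq!(add(&column, &literal, 3), Array::Materialized(vec![11, 12, 13]));
}
```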

Scalar functions may still use constant fast paths by checking whether an input is a `ConstantArray`.
This RFC only broadens what a constant can contain. It does not require changing the scalar-function
execution API, adding argument materialization helpers, or changing `Columnar`.
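
A sketch of such a fast path, assuming the same kind of hypothetical toy `Array`: the illustrative `add` matches on which inputs are constant. Two constants are combined once and the result stays constant, which is also the shape of the all-constant sub-execution mentioned in the calling convention. A single constant is hoisted out of the loop, and only two materialized inputs take the general path.

```rust
// Hypothetical toy model, not the Vortex API.
#[derive(Clone, Debug, PartialEq)]
enum Array {
    Constant { len: usize, value: i64 },
    Materialized(Vec<i64>),
}

/// Element-wise addition with constant fast paths.
/// Assumes both inputs already have logical length `row_count`.
fn add(lhs: &Array, rhs: &Array, row_count: usize) -> Array {
    match (lhs, rhs) {
        // Both constant: compute one value and keep the result constant.
        (Array::Constant { value: a, .. }, Array::Constant { value: b, .. }) => {
            Array::Constant { len: row_count, value: a + b }
        }
        // One side constant: addition is commutative, so both orders share this arm.
        (Array::Materialized(values), Array::Constant { value: c, .. })
        | (Array::Constant { value: c, .. }, Array::Materialized(values)) => {
            Array::Materialized(values.iter().map(|x| x + c).collect())
        }
        // General path: both inputs hold one value per row.
        (Array::Materialized(a), Array::Materialized(b)) => {
            Array::Materialized(a.iter().zip(b).map(|(x, y)| x + y).collect())
        }
    }
}

fn main() {
    let two = Array::Constant { len: 4, value: 2 };
    let three = Array::Constant { len: 4, value: 3 };
    assert_eq!(add(&two, &three, 4), Array::Constant { len: 4, value: 5 });

    let column = Array::Materialized(vec![1, 2, 3, 4]);
    assert_eq!(add(&three, &column, 4), Array::Materialized(vec![4, 5, 6, 7]));
}
```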

### Serialization of ConstantArray

The existing serialized form for `vortex.constant` should remain readable: