
Commit 59dd543

Scalar Values
Signed-off-by: Nicholas Gates <nick@nickgates.com>
1 parent 69bcb09 commit 59dd543

1 file changed

Lines changed: 30 additions & 0 deletions


rfcs/0028-scalar-values.md

@@ -25,6 +25,11 @@ binary values. Row-backed constants become the representation for non-null list,
struct, variant, and other complex values where nested scalar materialization is expensive or
requires array-level storage.

Scalar functions already operate over `ArrayRef` inputs. This RFC does not change that calling
convention. It only requires literals and constants to enter scalar-function execution as
`ConstantArray(len = row_count, value = ...)`, with complex values represented by row-backed
constants.
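As a minimal illustration of this convention, the sketch below lifts an expression literal into a constant input of logical length `row_count`. A list literal stays array-backed as the single row of a one-row array. `Literal`, `ConstValue`, `ConstantInput`, and `lift_literal` are hypothetical stand-ins for this sketch and are not the Vortex API. A `Vec<Vec<i64>>` with exactly one element plays the role of the singleton array.

```rust
// Hypothetical, simplified stand-ins for illustration only; these are not the
// real Vortex `Scalar`, `ConstantArray`, or row-backed constant types.

/// A literal as it might appear in an expression tree.
#[derive(Debug, Clone, PartialEq)]
pub enum Literal {
    I64(i64),
    List(Vec<i64>),
}

/// What a constant input holds.
#[derive(Debug, Clone, PartialEq)]
pub enum ConstValue {
    /// A primitive value, stored directly.
    Scalar(i64),
    /// A row-backed constant: a singleton array holding exactly one list row.
    Row(Vec<Vec<i64>>),
}

/// Logically `len` copies of `value`, physically stored once.
#[derive(Debug, Clone, PartialEq)]
pub struct ConstantInput {
    pub len: usize,
    pub value: ConstValue,
}

/// Lift a literal into a scalar-function input of logical length `row_count`.
pub fn lift_literal(lit: Literal, row_count: usize) -> ConstantInput {
    let value = match lit {
        Literal::I64(v) => ConstValue::Scalar(v),
        // Complex values stay array-backed: the list becomes the single row of a
        // one-row array rather than a nested scalar tuple.
        Literal::List(items) => ConstValue::Row(vec![items]),
    };
    ConstantInput { len: row_count, value }
}
```

For example, `lift_literal(Literal::List(vec![1, 2]), 3)` yields a constant of length 3 whose singleton array is `[[1, 2]]`. No per-row copies and no nested scalar are created.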
## Motivation

Vortex currently uses `ScalarValue::Tuple(Vec<Option<ScalarValue>>)` for list, fixed-size-list, and
@@ -56,6 +61,8 @@ expensive nested scalar object.
- Avoid requiring an `ExecutionCtx` to construct or validate a `Scalar`.
- Allow in-memory complex constants to hold device-resident array buffers without copying them into
  host scalar values.
- Define a scalar-function broadcasting model where every input has logical length `row_count`, and
  constants are represented as `ConstantArray`s.
- Preserve compatibility with existing scalar literal and constant-array encodings.

## Non-Goals
@@ -64,6 +71,7 @@ expensive nested scalar object.
- This RFC does not require every scalar-like API to move to arrays in one change.
- This RFC does not define a device-resident `Scalar`.
- This RFC does not require canonicalizing complex constants during expression deserialization.
- This RFC does not redesign scalar function child execution or `Columnar`.

## Design

@@ -228,6 +236,28 @@ Row-backed constants should canonicalize by broadcasting the singleton row struc
The key rule is that row-backed constants should not be converted into recursive `ScalarValue::Tuple`
except when an API explicitly asks for a `Scalar`.

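The sketch below illustrates this rule using hypothetical types `RowConstant` and `NestedScalar`. These are simplified stand-ins, not Vortex's `ScalarValue` or constant-array types. Canonicalization repeats the singleton row `len` times. The recursive tuple is built only in `to_scalar`, the one place that explicitly asks for a scalar.

```rust
// Hypothetical sketch; `NestedScalar` and `RowConstant` are illustrative
// stand-ins, not Vortex's `ScalarValue` or constant-array types.

/// A recursive scalar, analogous in shape to `ScalarValue::Tuple`.
#[derive(Debug, Clone, PartialEq)]
pub enum NestedScalar {
    Int(i64),
    Tuple(Vec<Option<NestedScalar>>),
}

/// A constant of logical length `len`, backed by a singleton array whose one
/// row is a list of nullable integers.
pub struct RowConstant {
    pub len: usize,
    singleton: Vec<Vec<Option<i64>>>,
}

impl RowConstant {
    pub fn new(len: usize, row: Vec<Option<i64>>) -> Self {
        RowConstant { len, singleton: vec![row] }
    }

    /// Canonicalize by broadcasting the singleton row `len` times.
    /// No nested scalar is built along the way.
    pub fn canonicalize(&self) -> Vec<Vec<Option<i64>>> {
        let row = &self.singleton[0];
        (0..self.len).map(|_| row.clone()).collect()
    }

    /// Build the recursive tuple only when a caller explicitly asks for a scalar.
    pub fn to_scalar(&self) -> NestedScalar {
        let fields = self.singleton[0]
            .iter()
            .copied()
            .map(|v| v.map(NestedScalar::Int))
            .collect();
        NestedScalar::Tuple(fields)
    }
}
```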
### Scalar functions and broadcasting

Scalar functions already take `ArrayRef` inputs. This RFC keeps that interface unchanged.

The required calling convention is:

- every input array has logical length `args.row_count()`
- scalar broadcasting is represented by `ConstantArray(len = args.row_count(), value = ...)`
- row-backed broadcasting is represented by
  `ConstantArray(len = args.row_count(), value = Row(singleton_array))`
- naked length-1 arrays are not normal scalar function inputs, except for private sub-executions
  such as evaluating an all-constant expression once
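A small validation pass can check the convention above. This is only a sketch: `Input` and `check_inputs` are hypothetical names, and real inputs would be `ArrayRef`s rather than this enum. A private sub-execution over a single row (`row_count == 1`) passes naturally, because its inputs legitimately have length 1.

```rust
// Hypothetical sketch: `Input` and `check_inputs` are illustrative names, not
// the Vortex scalar-function API.

#[derive(Debug, Clone, PartialEq)]
pub enum Input {
    /// A materialized column; its logical length is `values.len()`.
    Array(Vec<i64>),
    /// `len` logical copies of one value, stored once.
    Constant { len: usize, value: i64 },
}

impl Input {
    pub fn len(&self) -> usize {
        match self {
            Input::Array(values) => values.len(),
            Input::Constant { len, .. } => *len,
        }
    }
}

/// Enforce the convention: every input has logical length `row_count`.
/// A naked length-1 array is therefore rejected unless `row_count` is also 1,
/// as in a private sub-execution of an all-constant expression.
pub fn check_inputs(inputs: &[Input], row_count: usize) -> Result<(), String> {
    for (i, input) in inputs.iter().enumerate() {
        if input.len() != row_count {
            return Err(format!(
                "input {} has length {}, expected {}",
                i,
                input.len(),
                row_count
            ));
        }
    }
    Ok(())
}
```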
This means scalar functions do support broadcasting, but broadcasting is encoded in the array
representation rather than in per-kernel length checks. A scalar function should not need to handle
both `len == 1` and `len == row_count` inputs. Its inputs are always length `row_count`; some are
physically constant.
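The following hypothetical element-wise `add` kernel is written against this convention. `Input`, `get`, and `add` are illustrative names, not part of the scalar-function API. The only assumption taken from this RFC is that every input has logical length `row_count`. Because of that, reading a constant at any row simply returns its single value, and the kernel never branches on `len == 1`.

```rust
// Hypothetical sketch: `Input` and `add` are illustrative, not Vortex APIs.

#[derive(Debug, Clone, PartialEq)]
pub enum Input {
    /// A materialized column.
    Array(Vec<i64>),
    /// `len` logical copies of one value, stored once.
    Constant { len: usize, value: i64 },
}

impl Input {
    pub fn len(&self) -> usize {
        match self {
            Input::Array(values) => values.len(),
            Input::Constant { len, .. } => *len,
        }
    }

    /// Logical element at row `i`; a constant returns the same value for every row.
    pub fn get(&self, i: usize) -> i64 {
        match self {
            Input::Array(values) => values[i],
            Input::Constant { value, .. } => *value,
        }
    }
}

/// Element-wise addition. Both inputs always have logical length `row_count`,
/// so there is no separate broadcasting branch for length-1 inputs.
pub fn add(lhs: &Input, rhs: &Input, row_count: usize) -> Vec<i64> {
    debug_assert_eq!(lhs.len(), row_count);
    debug_assert_eq!(rhs.len(), row_count);
    (0..row_count).map(|i| lhs.get(i) + rhs.get(i)).collect()
}
```

With an array `[1, 2, 3]` and a constant `10` of length 3, `add` returns `[11, 12, 13]`. The kernel body is the same whichever side is constant.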
Scalar functions may still use constant fast paths by checking whether an input is a `ConstantArray`.
This RFC only broadens what a constant can contain. It does not require changing the scalar-function
execution API, adding argument materialization helpers, or changing `Columnar`.
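A constant fast path can sit on top of the same convention by matching on the input representation. This sketch again uses a hypothetical `Input` enum and `add` function, not the Vortex API. The three cases behave as follows:

- Two constant inputs produce a constant output that is computed once.
- One constant input avoids per-row lookups on that side.
- Only the all-array case does fully general work.

The mixed arm covers both argument orders with a single or-pattern, which is valid only because addition is commutative.

```rust
// Hypothetical sketch: `Input` and `add` are illustrative, not Vortex APIs.

#[derive(Debug, Clone, PartialEq)]
pub enum Input {
    /// A materialized column.
    Array(Vec<i64>),
    /// `len` logical copies of one value, stored once.
    Constant { len: usize, value: i64 },
}

/// Element-wise addition with constant fast paths.
pub fn add(lhs: &Input, rhs: &Input, row_count: usize) -> Input {
    match (lhs, rhs) {
        // Both sides constant: compute once and keep the result constant.
        (Input::Constant { value: a, .. }, Input::Constant { value: b, .. }) => {
            Input::Constant { len: row_count, value: a + b }
        }
        // One side constant: add the single value to every element of the other.
        // Sharing one arm for both orders relies on addition being commutative.
        (Input::Array(xs), Input::Constant { value: c, .. })
        | (Input::Constant { value: c, .. }, Input::Array(xs)) => {
            Input::Array(xs.iter().map(|x| x + c).collect())
        }
        // General case: both sides materialized with length `row_count`.
        (Input::Array(xs), Input::Array(ys)) => {
            Input::Array(xs.iter().zip(ys).map(|(x, y)| x + y).collect())
        }
    }
}
```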
### Serialization of ConstantArray

The existing serialized form for `vortex.constant` should remain readable:
