11# Execution
22
3- Vortex defers computation wherever possible. When an expression is applied to an array, the
4- result is not computed immediately. Instead, a ` ScalarFnArray ` is constructed that captures the
5- operation and its inputs as a new array node. The actual computation happens later, when the
6- array is materialized -- either by a scan, a query engine, or an explicit ` execute() ` call.
3+ Vortex defers computation wherever possible. Instead of immediately materializing intermediate
4+ results, it represents them as arrays that still describe work to be done, such as ` FilterArray `
5+ and [ ` ScalarFnArray ` ] ( ../../concepts/expressions.md ) . The actual computation happens later, when
6+ the array is materialized -- either by a scan, a query engine, or an explicit ` execute() ` call.
77
88## Why Defer
99
@@ -32,19 +32,201 @@ pub trait Executable: Sized {
3232}
3333```
3434
35- The ` ExecutionCtx ` carries the session and provides access to registered encodings during
36- execution. Arrays can be executed into different target types:
35+ The ` ExecutionCtx ` carries the session and accumulates a trace of execution steps for debugging.
36+ Arrays can be executed into different target types:
3737
3838- ** ` Canonical ` ** -- fully materializes the array into its canonical form.
3939- ** ` Columnar ` ** -- like ` Canonical ` , but with a variant for constant arrays to avoid
4040 unnecessary expansion.
4141- ** Specific array types** (` PrimitiveArray ` , ` BoolArray ` , ` StructArray ` , etc.) -- executes
4242 to canonical form and unwraps to the expected type, panicking if the dtype does not match.
4343
44+ ## Constant Short-Circuiting
45+
46+ Executing to ` Columnar ` rather than ` Canonical ` enables an important optimization: if the
47+ array is constant (a scalar repeated to a given length), execution returns the ` ConstantArray `
48+ directly rather than expanding it. This avoids allocating and filling a buffer with repeated
49+ values.
50+
51+ ``` rust
52+ pub enum Columnar {
53+ Canonical (Canonical ),
54+ Constant (ConstantArray ),
55+ }
56+ ```
57+
58+ Almost all compute functions can make use of constant input values, and many query engines
59+ support constant vectors directly, avoiding unnecessary expansion.
60+
61+ ## Execution Overview
62+
63+ The ` execute_until<M: Matcher> ` method on ` ArrayRef ` drives execution. The scheduler is
64+ iterative: it rewrites and executes arrays in small steps until the current array matches the
65+ requested target form.
66+
67+ At a high level, each iteration works like this:
68+
69+ 1 . ` optimize(current) ` runs metadata-only rewrites to fixpoint:
70+ ` reduce ` lets an array simplify itself, and ` reduce_parent ` lets a child rewrite its parent.
71+ 2 . If optimization does not finish execution, each child gets a chance to ` execute_parent ` ,
72+ meaning "execute my parent's operation using my representation".
73+ 3 . If no child can do that, the array's own ` execute ` method returns the next ` ExecutionStep ` .
74+
75+ This keeps execution iterative rather than recursive, and it gives optimization rules another
76+ chance to fire after every structural or computational step.
77+
78+ ## The Four Layers
79+
80+ The execution model has four layers, but they are not all invoked in the same way. Layers 1 and
81+ 2 make up ` optimize ` , which runs to fixpoint before and after execution steps. Layers 3 and 4
82+ run only after optimization has stalled.
83+
84+ ```
85+ execute_until(root):
86+ current = optimize(root) # Layers 1-2 to fixpoint
87+
88+ loop:
89+ if current matches target:
90+ return / reattach to parent
91+
92+ Layer 3: try execute_parent on each child
93+ if one succeeds:
94+ current = optimize(result)
95+ continue
96+
97+ Layer 4: call execute(current)
98+ ExecuteChild(i, pred) -> focus child[i], then optimize
99+ Done -> current = optimize(result)
100+ ```
101+
102+ ### Layer 1: ` reduce ` -- self-rewrite rules
103+
104+ An encoding applies ` ArrayReduceRule ` rules to itself. These are structural simplifications
105+ that look only at the array's own metadata and children types, not buffer contents.
106+
107+ Examples:
108+ - A ` FilterArray ` with an all-true mask reduces to its child.
109+ - A ` FilterArray ` with an all-false mask reduces to an empty canonical array.
110+ - A ` ScalarFnArray ` whose children are all constants evaluates once and returns a ` ConstantArray ` .
111+
112+ ### Layer 2: ` reduce_parent ` -- child-driven rewrite rules
113+
114+ Each child is given the opportunity to rewrite its parent via ` ArrayParentReduceRule ` . The child
115+ matches on the parent's type via a ` Matcher ` and can return a replacement. This is still
116+ metadata-only.
117+
118+ Examples:
119+ - A ` FilterArray ` child of another ` FilterArray ` merges the two masks into one.
120+ - A ` PrimitiveArray ` inside a ` MaskedArray ` absorbs the mask into its own validity field.
121+ - A ` DictArray ` child of a ` ScalarFnArray ` pushes the scalar function into the dictionary
122+ values, applying the function to ` N ` unique values instead of ` M >> N ` total rows.
123+ - A ` RunEndArray ` child of a ` ScalarFnArray ` pushes the function into the run values.
124+
125+ ### Layer 3: ` execute_parent ` -- parent kernels
126+
127+ Each child is given the opportunity to execute its parent in a fused manner via
128+ ` ExecuteParentKernel ` . Unlike reduce rules, parent kernels may read buffers and perform real
129+ computation.
130+
131+ An encoding declares its parent kernels in a ` ParentKernelSet ` , specifying which parent types
132+ each kernel handles via a ` Matcher ` :
133+
134+ ``` rust
135+ pub trait ExecuteParentKernel <V : VTable > {
136+ type Parent : Matcher ; // which parent types this kernel handles
137+
138+ fn execute_parent (
139+ & self ,
140+ array : & V :: Array , // the child
141+ parent : <Self :: Parent as Matcher >:: Match <'_ >, // the matched parent
142+ child_idx : usize ,
143+ ctx : & mut ExecutionCtx ,
144+ ) -> VortexResult <Option <ArrayRef >>;
145+ }
146+ ```
147+
148+ Examples:
149+ - A ` RunEndArray ` child of a ` SliceArray ` performs a binary search on the run ends to produce a
150+ new ` RunEndArray ` with adjusted offsets, or a ` ConstantArray ` if the slice falls within a
151+ single run.
152+ - A ` PrimitiveArray ` child of a ` FilterArray ` applies the filter mask directly over its buffer,
153+ producing a filtered ` PrimitiveArray ` in one pass.
154+
155+ ### Layer 4: ` execute ` -- the encoding's own decode step
156+
157+ If no reduce rule or parent kernel handled the array, the encoding's ` VTable::execute ` method
158+ is called. This is the encoding's chance to decode itself one step closer to canonical form.
159+
160+ Instead of recursively executing children inline, ` execute ` returns an ` ExecutionResult `
161+ containing an ` ExecutionStep ` that tells the scheduler what to do next:
162+
163+ ``` rust
164+ pub enum ExecutionStep {
165+ /// Ask the scheduler to execute child[idx] until it matches the predicate,
166+ /// then replace the child and re-enter execution for this array.
167+ ExecuteChild (usize , DonePredicate ),
168+
169+ /// Execution is complete. The array in the ExecutionResult is the result.
170+ Done ,
171+ }
172+ ```
173+
174+ ## The Execution Loop
175+
176+ The full ` execute_until<M: Matcher> ` loop uses an explicit work stack to manage
177+ parent-child relationships without recursion:
178+
179+ ```
180+ execute_until<M>(root):
181+ stack = []
182+ current = optimize(root)
183+
184+ loop:
185+ ┌─────────────────────────────────────────────────────┐
186+ │ Is current "done"? │
187+ │ (matches M if at root, or matches the stack │
188+ │ frame's DonePredicate if inside a child) │
189+ ├──────────────────────┬──────────────────────────────┘
190+ │ yes │ no
191+ │ │
192+ │ stack empty? │ Already canonical?
193+ │ ├─ yes → return │ ├─ yes → pop stack (can't make more progress)
194+ │ └─ no → pop frame, │ └─ no → continue to execution steps
195+ │ replace child, │
196+ │ optimize, loop │
197+ │ ▼
198+ │ ┌────────────────────────────────────┐
199+ │ │ Try execute_parent on each child │
200+ │ │ (Layer 3 parent kernels) │
201+ │ ├────────┬───────────────────────────┘
202+ │ │ Some │ None
203+ │ │ │
204+ │ │ ▼
205+ │ │ ┌─────────────────────────────────┐
206+ │ │ │ Call execute (Layer 4) │
207+ │ │ │ Returns ExecutionResult │
208+ │ │ ├────────┬────────────────────────┘
209+ │ │ │ │
210+ │ │ │ ExecuteChild(i, pred)?
211+ │ │ │ ├─ yes → push (array, i, pred)
212+ │ │ │ │ current = child[i]
213+ │ │ │ │ optimize, loop
214+ │ │ │ └─ Done → current = result
215+ │ │ │ loop
216+ │ │ │
217+ │ ▼ ▼
218+ │ optimize result, loop
219+ └──────────────────────────
220+ ```
221+
222+ Note that ` optimize ` runs after every transformation. This is what enables cross-step
223+ optimizations: after a child is decoded, new ` reduce_parent ` rules may now match that were
224+ previously blocked.
225+
44226## Incremental Execution
45227
46228Execution is incremental: each call to ` execute ` moves the array one step closer to canonical
47- form, not necessarily all the way. This gives each child the opportunity to optimize before the
229+ form, not necessarily all the way. This gives each child the opportunity to optimize before the
48230next iteration of execution.
49231
50232For example, consider a ` DictArray ` whose codes are a sliced ` RunEndArray ` . Dict-RLE is a common
@@ -60,59 +242,82 @@ If execution jumped straight to canonicalizing the dict's children, it would exp
60242codes through the slice, missing the Dict-RLE optimization entirely. Incremental execution
61243avoids this:
62244
63- 1 . First iteration: the slice executes and returns a new ` RunEndArray ` whose offsets have been binary searched.
245+ 1 . First iteration: the slice ` execute_parent ` (parent kernel on RunEnd for Slice) performs a
246+ binary search on run ends, returning a new ` RunEndArray ` with adjusted offsets.
64247
652482 . Second iteration: the ` RunEndArray ` codes child now matches the Dict-RLE pattern. Its
66249 ` execute_parent ` provides a fused kernel that expands runs while performing dictionary
67250 lookups in a single pass, returning the canonical array directly.
68251
69- The execution loop runs until the array is canonical or constant:
70-
71- 1 . ** Child optimization** -- each child is given the opportunity to optimize its parent's
72- execution by calling ` execute_parent ` on the child's vtable. If a child can handle the
73- parent more efficiently, it returns the result directly.
74-
75- 2 . ** Incremental execution** -- if no child provides an optimized path, the array's own
76- ` execute ` vtable method is called. This executes children one step and returns a new
77- array that is closer to canonical form, or executes the array itself.
78-
79- ## Constant Short-Circuiting
252+ ## Walkthrough: Executing a RunEnd-Encoded Array
80253
81- Executing to ` Columnar ` rather than ` Canonical ` enables an important optimization: if the
82- array is constant (a scalar repeated to a given length), execution returns the ` ConstantArray `
83- directly rather than expanding it. This avoids allocating and filling a buffer with repeated
84- values.
254+ To make the execution flow concrete, here is a step-by-step trace of executing a
255+ ` RunEndArray ` to ` Canonical ` :
85256
86- ``` rust
87- pub enum Columnar {
88- Canonical (Canonical ),
89- Constant (ConstantArray ),
90- }
257+ ```
258+ Input: RunEndArray { ends: [3, 7, 10], values: [A, B, C], len: 10 }
259+ Goal: Canonical (PrimitiveArray or similar)
260+
261+ Iteration 1:
262+ reduce? → None (no self-rewrite rules match)
263+ reduce_parent? → None (no parent, this is root)
264+ execute_parent? → None (no parent)
265+ execute → ends are not Primitive yet?
266+ ExecuteChild(0, Primitive::matches)
267+ Stack: [(RunEnd, child_idx=0, Primitive::matches)]
268+ Focus on: ends
269+
270+ Iteration 2:
271+ Current: ends array
272+ Already Primitive? → yes, done.
273+ Pop stack → replace child 0 in RunEnd, optimize.
274+
275+ Iteration 3:
276+ reduce? → None
277+ reduce_parent? → None
278+ execute_parent? → None
279+ execute → values are not Canonical yet?
280+ ExecuteChild(1, AnyCanonical::matches)
281+ Stack: [(RunEnd, child_idx=1, AnyCanonical::matches)]
282+ Focus on: values
283+
284+ Iteration 4:
285+ Current: values array
286+ Already Canonical? → yes, done.
287+ Pop stack → replace child 1 in RunEnd, optimize.
288+
289+ Iteration 5:
290+ reduce? → None
291+ reduce_parent? → None
292+ execute_parent? → None
293+ execute → all children ready, decode runs:
294+ [A, A, A, B, B, B, B, C, C, C]
295+ Done → return PrimitiveArray
296+
297+ → Result: PrimitiveArray [A, A, A, B, B, B, B, C, C, C]
91298```
92299
93- Almost all compute functions can make use of constant input values, and many query engines support constant vectors
94- avoiding unnecessary expansion.
95-
96- ## ScalarFnArray
300+ ## Implementing an Encoding: Where Does My Logic Go?
97301
98- A ` ScalarFnArray ` holds a scalar function (the operation to perform), a list of child arrays
99- (the inputs), and the expected output dtype and length. It is itself a valid Vortex array and
100- can be nested, sliced, and passed through the same APIs as any other array.
302+ When adding a new encoding or optimizing an existing one, the key question is whether the
303+ transformation needs to read buffer data:
101304
102- When an expression like ` {x: $, y: $ + 1} ` is applied to a bit-packed integer array, the
103- result is a tree of ` ScalarFnArray ` nodes rather than a materialized struct:
305+ | If you need to... | Put it in | Example |
306+ | -------------------| -----------| ---------|
307+ | Rewrite the array by looking only at its own structure | ` reduce ` (Layer 1) | ` FilterArray ` removes itself when the mask is all true |
308+ | Rewrite the parent by looking at your type and the parent's structure | ` reduce_parent ` (Layer 2) | ` DictArray ` pushes a scalar function into its values |
309+ | Execute the parent's operation using your compressed representation | ` execute_parent ` / parent kernel (Layer 3) | ` PrimitiveArray ` applies a filter mask directly over its buffer |
310+ | Decode yourself toward canonical form | ` execute ` (Layer 4) | ` RunEndArray ` expands runs into a ` PrimitiveArray ` |
104311
105- ```
106- scalar_fn(struct.pack):
107- children:
108- - bitpacked(...) # x: passed through unchanged
109- - scalar_fn(binary.add): # y: deferred addition
110- children:
111- - bitpacked(...) # original array
112- - constant(1) # literal 1
113- ```
312+ Rules of thumb:
114313
115- Nothing is computed until the tree is executed.
314+ - Prefer ` reduce ` over ` execute ` when possible. Reduce rules are cheaper because they are
315+ metadata-only and run before any buffers are touched.
316+ - Parent rules and parent kernels enable the "child sees parent" pattern. A child encoding often
317+ knows how to handle its parent's operation more efficiently than the parent knows how to handle
318+ the child.
319+ - Treat ` execute ` as the fallback. If no reduce rule or parent kernel applies, the encoding
320+ decodes itself and uses ` ExecuteChild ` to request child execution when needed.
116321
117322## Future Work
118323
0 commit comments