Commit 66b0f40

Expand execution docs (#7158)
## Summary

This PR extends the execution flow docs, both on the website and in the codebase itself. The general points I've tried to clear up are:

1. How things are called and how all the types interact.
2. "Where should feature X go", which has been a pain point in the past.
3. Docs for more types, even if it's just a pointer to some other type.

Signed-off-by: Adam Gutglick <adam@spiraldb.com>
1 parent 97e6c45 commit 66b0f40

8 files changed

+365
-54
lines changed

docs/developer-guide/internals/execution.md

Lines changed: 252 additions & 47 deletions
@@ -1,9 +1,9 @@
11
# Execution
22

3-
Vortex defers computation wherever possible. When an expression is applied to an array, the
4-
result is not computed immediately. Instead, a `ScalarFnArray` is constructed that captures the
5-
operation and its inputs as a new array node. The actual computation happens later, when the
6-
array is materialized -- either by a scan, a query engine, or an explicit `execute()` call.
3+
Vortex defers computation wherever possible. Instead of immediately materializing intermediate
4+
results, it represents them as arrays that still describe work to be done, such as `FilterArray`
5+
and [`ScalarFnArray`](../../concepts/expressions.md). The actual computation happens later, when
6+
the array is materialized -- either by a scan, a query engine, or an explicit `execute()` call.
77

88
## Why Defer
99

@@ -32,19 +32,201 @@ pub trait Executable: Sized {
3232
}
3333
```
3434

35-
The `ExecutionCtx` carries the session and provides access to registered encodings during
36-
execution. Arrays can be executed into different target types:
35+
The `ExecutionCtx` carries the session and accumulates a trace of execution steps for debugging.
36+
Arrays can be executed into different target types:
3737

3838
- **`Canonical`** -- fully materializes the array into its canonical form.
3939
- **`Columnar`** -- like `Canonical`, but with a variant for constant arrays to avoid
4040
unnecessary expansion.
4141
- **Specific array types** (`PrimitiveArray`, `BoolArray`, `StructArray`, etc.) -- executes
4242
to canonical form and unwraps to the expected type, panicking if the dtype does not match.
4343

44+
## Constant Short-Circuiting
45+
46+
Executing to `Columnar` rather than `Canonical` enables an important optimization: if the
47+
array is constant (a scalar repeated to a given length), execution returns the `ConstantArray`
48+
directly rather than expanding it. This avoids allocating and filling a buffer with repeated
49+
values.
50+
51+
```rust
52+
pub enum Columnar {
53+
Canonical(Canonical),
54+
Constant(ConstantArray),
55+
}
56+
```
57+
58+
Almost all compute functions can make use of constant input values, and many query engines
59+
support constant vectors directly, avoiding unnecessary expansion.
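
As a rough illustration of why the `Constant` variant matters, the sketch below uses toy stand-ins rather than the real Vortex types: a hypothetical `Columnar` over `i64` values and a hypothetical `add_scalar` function. It only shows that a compute function can handle a constant input by touching a single scalar instead of a full buffer.

```rust
/// Toy stand-in for the real `Columnar` enum (hypothetical, for illustration only).
#[derive(Debug, Clone, PartialEq)]
pub enum Columnar {
    /// Fully materialized values.
    Canonical(Vec<i64>),
    /// A single scalar that is logically repeated `len` times.
    Constant { value: i64, len: usize },
}

/// Adds `rhs` to every logical element of `input`.
/// A constant input stays constant, so no buffer of `len` elements is allocated.
pub fn add_scalar(input: &Columnar, rhs: i64) -> Columnar {
    match input {
        Columnar::Canonical(values) => {
            Columnar::Canonical(values.iter().map(|v| v + rhs).collect())
        }
        Columnar::Constant { value, len } => Columnar::Constant {
            value: value + rhs,
            len: *len,
        },
    }
}
```

With this shape, adding to a constant of a million rows costs one addition, while the canonical branch still does the per-element work.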
60+
61+
## Execution Overview
62+
63+
The `execute_until<M: Matcher>` method on `ArrayRef` drives execution. The scheduler is
64+
iterative: it rewrites and executes arrays in small steps until the current array matches the
65+
requested target form.
66+
67+
At a high level, each iteration works like this:
68+
69+
1. `optimize(current)` runs metadata-only rewrites to fixpoint:
70+
`reduce` lets an array simplify itself, and `reduce_parent` lets a child rewrite its parent.
71+
2. If optimization does not finish execution, each child gets a chance to `execute_parent`,
72+
meaning "execute my parent's operation using my representation".
73+
3. If no child can do that, the array's own `execute` method returns the next `ExecutionStep`.
74+
75+
This keeps execution iterative rather than recursive, and it gives optimization rules another
76+
chance to fire after every structural or computational step.
77+
78+
## The Four Layers
79+
80+
The execution model has four layers, but they are not all invoked in the same way. Layers 1 and
81+
2 make up `optimize`, which runs to fixpoint before and after execution steps. Layers 3 and 4
82+
run only after optimization has stalled.
83+
84+
```
85+
execute_until(root):
86+
current = optimize(root) # Layers 1-2 to fixpoint
87+
88+
loop:
89+
if current matches target:
90+
return / reattach to parent
91+
92+
Layer 3: try execute_parent on each child
93+
if one succeeds:
94+
current = optimize(result)
95+
continue
96+
97+
Layer 4: call execute(current)
98+
ExecuteChild(i, pred) -> focus child[i], then optimize
99+
Done -> current = optimize(result)
100+
```
101+
102+
### Layer 1: `reduce` -- self-rewrite rules
103+
104+
An encoding applies `ArrayReduceRule` rules to itself. These are structural simplifications
105+
that look only at the array's own metadata and its children's types, not at buffer contents.
106+
107+
Examples:
108+
- A `FilterArray` with an all-true mask reduces to its child.
109+
- A `FilterArray` with an all-false mask reduces to an empty canonical array.
110+
- A `ScalarFnArray` whose children are all constants evaluates once and returns a `ConstantArray`.
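
The first two examples can be sketched with a toy model. Everything here (`Mask`, `Array`, and the free function `reduce`) is a hypothetical simplification, not the `ArrayReduceRule` API. The rule inspects only the mask's variant, which stands in for metadata, and never reads the child's values.

```rust
/// Toy mask whose all-true / all-false state is known without scanning values.
#[derive(Debug, Clone, PartialEq)]
pub enum Mask {
    AllTrue(usize),
    AllFalse(usize),
    Values(Vec<bool>),
}

/// Toy array tree (hypothetical, for illustration only).
#[derive(Debug, Clone, PartialEq)]
pub enum Array {
    Primitive(Vec<i64>),
    Filter { child: Box<Array>, mask: Mask },
}

/// Self-rewrite rule: returns a simpler array, or `None` if no rule applies.
pub fn reduce(array: &Array) -> Option<Array> {
    match array {
        // An all-true mask keeps every row, so the filter node can be dropped.
        Array::Filter { child, mask: Mask::AllTrue(_) } => Some((**child).clone()),
        // An all-false mask keeps nothing, so the result is an empty array.
        Array::Filter { mask: Mask::AllFalse(_), .. } => Some(Array::Primitive(Vec::new())),
        // Anything else needs real work and is left for later layers.
        _ => None,
    }
}
```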
111+
112+
### Layer 2: `reduce_parent` -- child-driven rewrite rules
113+
114+
Each child is given the opportunity to rewrite its parent via `ArrayParentReduceRule`. The child
115+
matches on the parent's type via a `Matcher` and can return a replacement. This is still
116+
metadata-only.
117+
118+
Examples:
119+
- A `FilterArray` child of another `FilterArray` merges the two masks into one.
120+
- A `PrimitiveArray` inside a `MaskedArray` absorbs the mask into its own validity field.
121+
- A `DictArray` child of a `ScalarFnArray` pushes the scalar function into the dictionary
122+
values, applying the function to `N` unique values instead of `M >> N` total rows.
123+
- A `RunEndArray` child of a `ScalarFnArray` pushes the function into the run values.
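
The dictionary pushdown can be sketched as a purely structural rewrite. The types below (`Op`, `Array`) and the free function `reduce_parent` are hypothetical and much simpler than `ArrayParentReduceRule`; in particular, the toy clones the codes, whereas a real implementation would presumably share them. The point is only that the tree is rearranged without evaluating the function.

```rust
/// Toy scalar functions (hypothetical).
#[derive(Debug, Clone, PartialEq)]
pub enum Op {
    AddOne,
    Negate,
}

/// Toy array tree (hypothetical, for illustration only).
#[derive(Debug, Clone, PartialEq)]
pub enum Array {
    Primitive(Vec<i64>),
    Dict { codes: Vec<usize>, values: Box<Array> },
    ScalarFn { op: Op, child: Box<Array> },
}

/// Child-driven rewrite: `ScalarFn(Dict(codes, values))` becomes
/// `Dict(codes, ScalarFn(values))`, so the function later runs over the
/// unique values only. Returns `None` when the pattern does not match.
pub fn reduce_parent(parent: &Array) -> Option<Array> {
    let Array::ScalarFn { op, child } = parent else {
        return None;
    };
    let Array::Dict { codes, values } = &**child else {
        return None;
    };
    Some(Array::Dict {
        codes: codes.clone(),
        values: Box::new(Array::ScalarFn {
            op: op.clone(),
            child: values.clone(),
        }),
    })
}
```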
124+
125+
### Layer 3: `execute_parent` -- parent kernels
126+
127+
Each child is given the opportunity to execute its parent in a fused manner via
128+
`ExecuteParentKernel`. Unlike reduce rules, parent kernels may read buffers and perform real
129+
computation.
130+
131+
An encoding declares its parent kernels in a `ParentKernelSet`, specifying which parent types
132+
each kernel handles via a `Matcher`:
133+
134+
```rust
135+
pub trait ExecuteParentKernel<V: VTable> {
136+
type Parent: Matcher; // which parent types this kernel handles
137+
138+
fn execute_parent(
139+
&self,
140+
array: &V::Array, // the child
141+
parent: <Self::Parent as Matcher>::Match<'_>, // the matched parent
142+
child_idx: usize,
143+
ctx: &mut ExecutionCtx,
144+
) -> VortexResult<Option<ArrayRef>>;
145+
}
146+
```
147+
148+
Examples:
149+
- A `RunEndArray` child of a `SliceArray` performs a binary search on the run ends to produce a
150+
new `RunEndArray` with adjusted offsets, or a `ConstantArray` if the slice falls within a
151+
single run.
152+
- A `PrimitiveArray` child of a `FilterArray` applies the filter mask directly over its buffer,
153+
producing a filtered `PrimitiveArray` in one pass.
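
The `PrimitiveArray` example might look roughly like the following. This is a toy sketch, not the `ExecuteParentKernel` trait: `Array`, its `Other` variant, and `execute_filter_parent` are hypothetical names, and the mask is a plain `&[bool]`. It keeps the same convention as the trait above, where `None` means "this kernel does not apply".

```rust
/// Toy array type (hypothetical, for illustration only).
#[derive(Debug, Clone, PartialEq)]
pub enum Array {
    Primitive(Vec<i64>),
    /// Placeholder for any encoding this kernel does not handle.
    Other,
}

/// Parent kernel sketch: the child executes its parent's filter directly
/// over its own buffer in a single pass.
pub fn execute_filter_parent(child: &Array, mask: &[bool]) -> Option<Array> {
    let Array::Primitive(values) = child else {
        return None; // not our encoding, let another kernel or layer try
    };
    assert_eq!(values.len(), mask.len(), "mask must cover every value");

    let mut out = Vec::with_capacity(values.len());
    for (value, keep) in values.iter().zip(mask.iter()) {
        if *keep {
            out.push(*value);
        }
    }
    Some(Array::Primitive(out))
}
```

Unlike the reduce rules, this function reads the buffer contents, which is what places it in Layer 3 rather than Layers 1 or 2.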
154+
155+
### Layer 4: `execute` -- the encoding's own decode step
156+
157+
If no reduce rule or parent kernel handled the array, the encoding's `VTable::execute` method
158+
is called. This is the encoding's chance to decode itself one step closer to canonical form.
159+
160+
Instead of recursively executing children inline, `execute` returns an `ExecutionResult`
161+
containing an `ExecutionStep` that tells the scheduler what to do next:
162+
163+
```rust
164+
pub enum ExecutionStep {
165+
/// Ask the scheduler to execute child[idx] until it matches the predicate,
166+
/// then replace the child and re-enter execution for this array.
167+
ExecuteChild(usize, DonePredicate),
168+
169+
/// Execution is complete. The array in the ExecutionResult is the result.
170+
Done,
171+
}
172+
```
173+
174+
## The Execution Loop
175+
176+
The full `execute_until<M: Matcher>` loop uses an explicit work stack to manage
177+
parent-child relationships without recursion:
178+
179+
```
180+
execute_until<M>(root):
181+
stack = []
182+
current = optimize(root)
183+
184+
loop:
185+
┌─────────────────────────────────────────────────────┐
186+
│ Is current "done"? │
187+
│ (matches M if at root, or matches the stack │
188+
│ frame's DonePredicate if inside a child) │
189+
├──────────────────────┬──────────────────────────────┘
190+
│ yes │ no
191+
│ │
192+
│ stack empty? │ Already canonical?
193+
│ ├─ yes → return │ ├─ yes → pop stack (can't make more progress)
194+
│ └─ no → pop frame, │ └─ no → continue to execution steps
195+
│ replace child, │
196+
│ optimize, loop │
197+
│ ▼
198+
│ ┌────────────────────────────────────┐
199+
│ │ Try execute_parent on each child │
200+
│ │ (Layer 3 parent kernels) │
201+
│ ├────────┬───────────────────────────┘
202+
│ │ Some │ None
203+
│ │ │
204+
│ │ ▼
205+
│ │ ┌─────────────────────────────────┐
206+
│ │ │ Call execute (Layer 4) │
207+
│ │ │ Returns ExecutionResult │
208+
│ │ ├────────┬────────────────────────┘
209+
│ │ │ │
210+
│ │ │ ExecuteChild(i, pred)?
211+
│ │ │ ├─ yes → push (array, i, pred)
212+
│ │ │ │ current = child[i]
213+
│ │ │ │ optimize, loop
214+
│ │ │ └─ Done → current = result
215+
│ │ │ loop
216+
│ │ │
217+
│ ▼ ▼
218+
│ optimize result, loop
219+
└──────────────────────────
220+
```
221+
222+
Note that `optimize` runs after every transformation. This is what enables cross-step
223+
optimizations: after a child is decoded, new `reduce_parent` rules may now match that were
224+
previously blocked.
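
A heavily simplified version of the work-stack loop is sketched below. It is not the real scheduler: the `Array`, `Step`, `execute`, and `execute_until_primitive` names are hypothetical, each array has at most one child, the done predicate is hard-coded to "is primitive", and `optimize` and the Layer 3 parent kernels are omitted entirely. It only demonstrates how `ExecuteChild` and an explicit stack replace recursion.

```rust
/// Toy array tree (hypothetical): a lazy "add a constant" node over a child.
#[derive(Debug, Clone, PartialEq)]
pub enum Array {
    Primitive(Vec<i64>),
    AddConst { child: Box<Array>, delta: i64 },
}

/// Toy counterpart of `ExecutionStep`.
pub enum Step {
    /// The (single) child must be executed before this array can make progress.
    ExecuteChild,
    /// Execution of this array is complete.
    Done(Array),
}

/// Layer 4 sketch: decode one step, or ask the scheduler to execute the child.
pub fn execute(array: &Array) -> Step {
    match array {
        Array::Primitive(_) => Step::Done(array.clone()),
        Array::AddConst { child, delta } => match &**child {
            Array::Primitive(values) => {
                Step::Done(Array::Primitive(values.iter().map(|v| v + delta).collect()))
            }
            _ => Step::ExecuteChild,
        },
    }
}

/// Iterative scheduler: parents wait on the stack while a child is in focus.
pub fn execute_until_primitive(root: Array) -> Array {
    let mut stack: Vec<Array> = Vec::new();
    let mut current = root;
    loop {
        if matches!(current, Array::Primitive(_)) {
            match stack.pop() {
                // Nothing is waiting: the root is done.
                None => return current,
                // Reattach the finished child to its waiting parent
                // (the parent's stale copy of the child is discarded).
                Some(Array::AddConst { delta, .. }) => {
                    current = Array::AddConst { child: Box::new(current), delta };
                }
                Some(Array::Primitive(_)) => unreachable!("primitive arrays are never pushed"),
            }
            continue;
        }
        match execute(&current) {
            Step::Done(result) => current = result,
            Step::ExecuteChild => {
                let child = match &current {
                    Array::AddConst { child, .. } => (**child).clone(),
                    Array::Primitive(_) => unreachable!("primitive arrays have no children"),
                };
                stack.push(current);
                current = child;
            }
        }
    }
}
```

In the real loop, each point where `current` is reassigned would also be followed by `optimize`, which is where newly unblocked reduce rules get their chance to fire.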
225+
44226
## Incremental Execution
45227

46228
Execution is incremental: each call to `execute` moves the array one step closer to canonical
47-
form, not necessarily all the way. This gives each child the opportunity to optimize before the
229+
form, not necessarily all the way. This gives each child the opportunity to optimize before the
48230
next iteration of execution.
49231

50232
For example, consider a `DictArray` whose codes are a sliced `RunEndArray`. Dict-RLE is a common
@@ -60,59 +242,82 @@ If execution jumped straight to canonicalizing the dict's children, it would exp
60242
codes through the slice, missing the Dict-RLE optimization entirely. Incremental execution
61243
avoids this:
62244

63-
1. First iteration: the slice executes and returns a new `RunEndArray` whose offsets have been binary searched.
245+
1. First iteration: the `RunEndArray` child's `execute_parent` (its parent kernel for `SliceArray`)
246+
performs a binary search on the run ends, returning a new `RunEndArray` with adjusted offsets.
64247

65248
2. Second iteration: the `RunEndArray` codes child now matches the Dict-RLE pattern. Its
66249
`execute_parent` provides a fused kernel that expands runs while performing dictionary
67250
lookups in a single pass, returning the canonical array directly.
68251

69-
The execution loop runs until the array is canonical or constant:
70-
71-
1. **Child optimization** -- each child is given the opportunity to optimize its parent's
72-
execution by calling `execute_parent` on the child's vtable. If a child can handle the
73-
parent more efficiently, it returns the result directly.
74-
75-
2. **Incremental execution** -- if no child provides an optimized path, the array's own
76-
`execute` vtable method is called. This executes children one step and returns a new
77-
array that is closer to canonical form, or executes the array itself.
78-
79-
## Constant Short-Circuiting
252+
## Walkthrough: Executing a RunEnd-Encoded Array
80253

81-
Executing to `Columnar` rather than `Canonical` enables an important optimization: if the
82-
array is constant (a scalar repeated to a given length), execution returns the `ConstantArray`
83-
directly rather than expanding it. This avoids allocating and filling a buffer with repeated
84-
values.
254+
To make the execution flow concrete, here is a step-by-step trace of executing a
255+
`RunEndArray` to `Canonical`:
85256

86-
```rust
87-
pub enum Columnar {
88-
Canonical(Canonical),
89-
Constant(ConstantArray),
90-
}
257+
```
258+
Input: RunEndArray { ends: [3, 7, 10], values: [A, B, C], len: 10 }
259+
Goal: Canonical (PrimitiveArray or similar)
260+
261+
Iteration 1:
262+
reduce? → None (no self-rewrite rules match)
263+
reduce_parent? → None (no parent, this is root)
264+
execute_parent? → None (no parent)
265+
execute → ends are not Primitive yet?
266+
ExecuteChild(0, Primitive::matches)
267+
Stack: [(RunEnd, child_idx=0, Primitive::matches)]
268+
Focus on: ends
269+
270+
Iteration 2:
271+
Current: ends array
272+
Already Primitive? → yes, done.
273+
Pop stack → replace child 0 in RunEnd, optimize.
274+
275+
Iteration 3:
276+
reduce? → None
277+
reduce_parent? → None
278+
execute_parent? → None
279+
execute → values are not Canonical yet?
280+
ExecuteChild(1, AnyCanonical::matches)
281+
Stack: [(RunEnd, child_idx=1, AnyCanonical::matches)]
282+
Focus on: values
283+
284+
Iteration 4:
285+
Current: values array
286+
Already Canonical? → yes, done.
287+
Pop stack → replace child 1 in RunEnd, optimize.
288+
289+
Iteration 5:
290+
reduce? → None
291+
reduce_parent? → None
292+
execute_parent? → None
293+
execute → all children ready, decode runs:
294+
[A, A, A, B, B, B, B, C, C, C]
295+
Done → return PrimitiveArray
296+
297+
→ Result: PrimitiveArray [A, A, A, B, B, B, B, C, C, C]
91298
```
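
The final decode step of the trace (iteration 5) can be reproduced with a small standalone function. `decode_run_end` is a hypothetical helper over plain slices, not the actual `RunEndArray` implementation; it assumes `ends` is strictly increasing and holds exclusive end positions, as in the trace above.

```rust
/// Expands run-end encoded data into a flat vector.
/// `ends[i]` is the exclusive logical end position of the run holding `values[i]`.
pub fn decode_run_end<T: Clone>(ends: &[usize], values: &[T]) -> Vec<T> {
    assert_eq!(ends.len(), values.len(), "one end per run value");

    let len = ends.last().copied().unwrap_or(0);
    let mut out = Vec::with_capacity(len);
    let mut start = 0;
    for (&end, value) in ends.iter().zip(values) {
        assert!(end > start, "run ends must be strictly increasing");
        // This run covers logical positions start..end.
        out.extend(std::iter::repeat(value.clone()).take(end - start));
        start = end;
    }
    out
}
```

Calling `decode_run_end(&[3, 7, 10], &['A', 'B', 'C'])` yields three `A`s, four `B`s, and three `C`s, matching the result shown in the trace.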
92299

93-
Almost all compute functions can make use of constant input values, and many query engines support constant vectors
94-
avoiding unnecessary expansion.
95-
96-
## ScalarFnArray
300+
## Implementing an Encoding: Where Does My Logic Go?
97301

98-
A `ScalarFnArray` holds a scalar function (the operation to perform), a list of child arrays
99-
(the inputs), and the expected output dtype and length. It is itself a valid Vortex array and
100-
can be nested, sliced, and passed through the same APIs as any other array.
302+
When adding a new encoding or optimizing an existing one, the key question is whether the
303+
transformation needs to read buffer data:
101304

102-
When an expression like `{x: $, y: $ + 1}` is applied to a bit-packed integer array, the
103-
result is a tree of `ScalarFnArray` nodes rather than a materialized struct:
305+
| If you need to... | Put it in | Example |
306+
|-------------------|-----------|---------|
307+
| Rewrite the array by looking only at its own structure | `reduce` (Layer 1) | `FilterArray` removes itself when the mask is all true |
308+
| Rewrite the parent by looking at your type and the parent's structure | `reduce_parent` (Layer 2) | `DictArray` pushes a scalar function into its values |
309+
| Execute the parent's operation using your compressed representation | `execute_parent` / parent kernel (Layer 3) | `PrimitiveArray` applies a filter mask directly over its buffer |
310+
| Decode yourself toward canonical form | `execute` (Layer 4) | `RunEndArray` expands runs into a `PrimitiveArray` |
104311

105-
```
106-
scalar_fn(struct.pack):
107-
children:
108-
- bitpacked(...) # x: passed through unchanged
109-
- scalar_fn(binary.add): # y: deferred addition
110-
children:
111-
- bitpacked(...) # original array
112-
- constant(1) # literal 1
113-
```
312+
Rules of thumb:
114313

115-
Nothing is computed until the tree is executed.
314+
- Prefer `reduce` over `execute` when possible. Reduce rules are cheaper because they are
315+
metadata-only and run before any buffers are touched.
316+
- Parent rules and parent kernels enable the "child sees parent" pattern. A child encoding often
317+
knows how to handle its parent's operation more efficiently than the parent knows how to handle
318+
the child.
319+
- Treat `execute` as the fallback. If no reduce rule or parent kernel applies, the encoding
320+
decodes itself and uses `ExecuteChild` to request child execution when needed.
116321

117322
## Future Work
118323

uv.lock

Lines changed: 1 addition & 1 deletion

vortex-array/src/arrays/filter/kernel.rs

Lines changed: 9 additions & 0 deletions
@@ -1,6 +1,13 @@
11
// SPDX-License-Identifier: Apache-2.0
22
// SPDX-FileCopyrightText: Copyright the Vortex contributors
33

4+
//! Reduce and execute adaptors for filter operations.
5+
//!
6+
//! Encodings that know how to filter themselves implement [`FilterReduce`] (metadata-only)
7+
//! or [`FilterKernel`] (buffer-reading). The adaptors [`FilterReduceAdaptor`] and
8+
//! [`FilterExecuteAdaptor`] bridge these into the execution model as
9+
//! [`ArrayParentReduceRule`] and [`ExecuteParentKernel`] respectively.
10+
411
use vortex_error::VortexResult;
512
use vortex_mask::Mask;
613

@@ -70,6 +77,7 @@ fn precondition<V: VTable>(array: &V::Array, mask: &Mask) -> Option<ArrayRef> {
7077
None
7178
}
7279

80+
/// Adaptor that wraps a [`FilterReduce`] impl as an [`ArrayParentReduceRule`].
7381
#[derive(Default, Debug)]
7482
pub struct FilterReduceAdaptor<V>(pub V);
7583

@@ -93,6 +101,7 @@ where
93101
}
94102
}
95103

104+
/// Adaptor that wraps a [`FilterKernel`] impl as an [`ExecuteParentKernel`].
96105
#[derive(Default, Debug)]
97106
pub struct FilterExecuteAdaptor<V>(pub V);
98107
