feat(vortex-row): row-oriented byte encoder (size + encode passes) by joseph-isaacs · Pull Request #8056 · vortex-data/vortex

joseph-isaacs · 2026-05-22T09:49:07Z

Summary

Adds vortex-row, a new crate that encodes one or more columnar Vortex arrays into a single
ListView<u8> whose per-row byte slices are lexicographically comparable. The byte order
matches tuple ordering of the input values under per-column sort options, so the output works
directly as a sort key / row key — the Vortex analogue of arrow-row.

This is the base of the row-encoding work: the byte codec, the two-pass converter, the
public API, tests, and benches. Codec hot-path performance tuning and the per-encoding
(Constant / Dict / Patched / RunEnd / BitPacked / FoR / Delta) fast-path kernels land in
follow-up PRs and are intentionally out of scope here.

Design

Encoding runs as two scalar functions wired behind the RowEncoder API:

Size pass — RowSize. Walks the N input columns once, classifies each column as
fixed- or variable-width, accumulates the fixed-width prefix per row, and lazily collects
per-row variable lengths. Returns Struct { fixed: u32, var: u32 } so callers read per-row
widths without materializing the constant fixed slot as a per-row buffer.
Encode pass — RowEncode. Uses those sizes to compute totals, allocate one contiguous
elements buffer, build per-row absolute offsets, then writes each column left-to-right into
its per-row slot via a write cursor that doubles as the ListView sizes array — so no
separate finalize step is needed.

The converter is effectively 2 passes for the pure-fixed-width case and 3 when
variable-length columns require the prefix-sum offsets pass.
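To make the size/encode split and the cursor-as-sizes idea concrete, here is a minimal sketch for the pure-fixed-width case. It is not the crate's code: `encode_rows_sketch`, the two toy columns (non-null `u32` and `u8`), and the `0x01` non-null sentinel are assumptions made for illustration only.

```rust
/// Toy row encoder for two non-null fixed-width columns (u32, u8).
/// Returns (elements, offsets, sizes), mirroring the parts of a ListView<u8>.
/// Assumes `a.len() * 7` fits in u32; a real encoder would check this.
pub fn encode_rows_sketch(a: &[u32], b: &[u8]) -> (Vec<u8>, Vec<u32>, Vec<u32>) {
    assert_eq!(a.len(), b.len(), "columns must have equal lengths");
    let n = a.len();

    // Size pass: every column is fixed-width, so one constant describes each row.
    // Each value costs 1 sentinel byte plus its payload bytes.
    let fixed_per_row: u32 = (1 + 4) + (1 + 1);

    // Encode pass: one contiguous buffer; row i starts at i * fixed_per_row.
    let mut elements = vec![0u8; n * fixed_per_row as usize];
    let offsets: Vec<u32> = (0..n as u32).map(|i| i * fixed_per_row).collect();
    let mut cursors = vec![0u32; n];

    // Column 0: sentinel followed by the big-endian u32.
    for i in 0..n {
        let at = (offsets[i] + cursors[i]) as usize;
        elements[at] = 0x01; // assumed non-null sentinel
        elements[at + 1..at + 5].copy_from_slice(&a[i].to_be_bytes());
        cursors[i] += 5;
    }

    // Column 1: sentinel followed by the single byte.
    for i in 0..n {
        let at = (offsets[i] + cursors[i]) as usize;
        elements[at] = 0x01; // assumed non-null sentinel
        elements[at + 1] = b[i];
        cursors[i] += 2;
    }

    // After the last column, each cursor equals the bytes written for its row,
    // so the cursor vector can be handed over as the sizes array.
    (elements, offsets, cursors)
}
```

Because the cursors end up equal to the per-row byte counts, no extra pass is needed to derive the sizes in this sketch.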

Per-column ordering is controlled by RowSortField { descending, nulls_first }: descending
inverts the encoded value bytes so that they compare in reverse order, and a leading sentinel
byte (0x00 / 0x01 / 0x02) places nulls before or after non-nulls independently of sort direction.
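The interaction between the sentinel byte and the sort direction can be sketched as follows. The mapping of sentinel values (0x00 for nulls-first, 0x01 for non-null, 0x02 for nulls-last) and the bitwise inversion used for descending order are assumptions; the PR text lists the three byte values but does not spell out their roles. `SortFieldSketch` and `encode_opt_u16_sketch` are hypothetical names.

```rust
/// Hypothetical stand-in for the PR's RowSortField.
#[derive(Clone, Copy)]
pub struct SortFieldSketch {
    pub descending: bool,
    pub nulls_first: bool,
}

/// Encode an optional u16 as one sentinel byte plus two value bytes.
pub fn encode_opt_u16_sketch(value: Option<u16>, field: SortFieldSketch) -> [u8; 3] {
    match value {
        // Assumed mapping: 0x00 sorts nulls before non-nulls, 0x02 after.
        // The null sentinel ignores `descending`, so null placement is independent.
        None => [if field.nulls_first { 0x00 } else { 0x02 }, 0, 0],
        Some(v) => {
            let mut bytes = v.to_be_bytes();
            if field.descending {
                // Assumed: descending flips every value bit so larger values sort first.
                for b in &mut bytes {
                    *b = !*b;
                }
            }
            [0x01, bytes[0], bytes[1]]
        }
    }
}
```

Sorting the resulting byte arrays with a plain byte-wise comparison then yields the requested order, with nulls placed by the sentinel alone.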

Public API

RowEncoder — primary entry point: new / with_options, encode, row_sizes, options.
convert_columns / compute_row_sizes — convenience helpers around RowEncoder.
RowEncodingOptions + RowSortField — per-column sort configuration.
initialize(session) — registers RowSize and RowEncode on a VortexSession so row
encoding is reachable through the expression layer.

`convert_columns` and where the API lives

convert_columns(cols: &[ArrayRef], fields: &[RowSortField], ctx) -> VortexResult<ListViewArray>
is the one-shot entry point; RowEncoder is the reusable form (build once with options, encode
many). Each public item is defined in a single file:

Item	File
`RowEncoder`, `convert_columns(_with_options)`, `compute_row_sizes(_with_options)`	`src/encoder.rs`
`RowEncode` scalar fn + the 5-phase encode driver	`src/encode.rs`
`RowSize` scalar fn + the size/classify pass (`compute_sizes`)	`src/size.rs`
`RowEncodingOptions`, `RowSortField`	`src/options.rs`
per-dtype byte codec (`field_size` / `field_encode`)	`src/codec.rs`
`initialize(session)` + re-exports	`src/lib.rs`

Implementation. convert_columns validates the columns (non-empty, equal lengths, one
RowSortField per column) and delegates to a RowEncoder, which runs RowSize then
RowEncode: the size pass canonicalizes each column once and classifies it fixed- vs
variable-width, accumulating a constant fixed_per_row and — only when needed — per-row
var_lengths; the encode pass sums those into the total byte length, allocates one contiguous
elements buffer, computes per-row offsets (i * fixed_per_row plus a varlen prefix sum), then
writes each column left-to-right into its per-row slot using a cursor that becomes the ListView
sizes array, producing the final ListView<u8> with no separate finalize step.
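The offset rule described above (i * fixed_per_row plus a varlen prefix sum) can be sketched like this. `row_offsets_sketch` is a hypothetical helper, not the crate's function; it assumes the varlen lengths arrive as an optional slice and reports u32 overflow as `None`.

```rust
/// Compute the start offset of every row in the shared elements buffer.
/// `var_lengths` is `None` when all columns are fixed-width.
/// Returns `None` if any offset or running sum would overflow u32.
pub fn row_offsets_sketch(
    n: usize,
    fixed_per_row: u32,
    var_lengths: Option<&[u32]>,
) -> Option<Vec<u32>> {
    if let Some(lens) = var_lengths {
        assert_eq!(lens.len(), n, "one varlen length per row");
    }
    let mut offsets = Vec::with_capacity(n);
    // Exclusive running sum of the varlen bytes of all earlier rows.
    let mut var_prefix: u32 = 0;
    for i in 0..n {
        let fixed_part = u32::try_from(i).ok()?.checked_mul(fixed_per_row)?;
        offsets.push(fixed_part.checked_add(var_prefix)?);
        if let Some(lens) = var_lengths {
            var_prefix = var_prefix.checked_add(lens[i])?;
        }
    }
    Some(offsets)
}
```

In the pure-fixed-width case the prefix sum stays at zero and the offsets reduce to plain multiples of `fixed_per_row`.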

Type coverage

Supported: nulls, booleans, integer/float primitives, decimals up to 128 bits, UTF-8 and
binary, structs, fixed-size lists, and extensions whose storage type is supported. Variant,
union, and variable-size list arrays are rejected — this crate does not define an ordering for
them.

Testing

cargo nextest run -p vortex-row — sort-order round-trip tests (bool, i64 asc/desc, u32,
f64, utf8, multi-column, nulls first/last, struct), the single-buffer ListView invariant, and
RowSize output shape.
cargo bench -p vortex-row — row_encode divan benchmarks against an arrow-row baseline
(primitive i64, utf8, struct_mixed).

Add an empty `vortex-row` crate with a minimal `initialize` stub so the following commits can layer in the row-encoder, codec, scalar functions, and per-encoding kernels without touching the workspace skeleton each time. The crate is wired into the workspace members list and workspace dependency table; `public-api.lock` is generated against the stub. Signed-off-by: Claude <noreply@anthropic.com>

Introduce the per-column sort-field options and the variadic-function options struct used by the upcoming RowSize / RowEncode scalar functions. `RowEncodeOptions::fields` uses a `SmallVec<[SortField; 4]>` so typical 1-4 column keys avoid a heap allocation. Includes a compact serialize / deserialize helper used later by the scalar-function metadata round-trip. Signed-off-by: Claude <noreply@anthropic.com>

Add the byte-encoding kernels for the fixed-width portion of the row encoder: Null, Bool, Primitive (12 PTypes), and Decimal (i8..i128). Each encoder writes a 1-byte sentinel followed by the value's row-comparable bytes (sign-flipped big-endian for signed ints, sign-aware mask for floats, etc.). The size pass is a constant `width-per-row` add for these types; the encode pass walks rows and writes into the shared output buffer at `offsets[i] + cursors[i]`. `row_width_for_dtype` classifies the column based purely on its DType. Scalar-level encoders (`encode_scalar_primitive` / `encode_scalar_bool` / `encode_scalar_null` / `encode_scalar` / `encoded_size_for_scalar`) are included for the same fixed-width subset; varlen and nested canonical variants bail with a clear "not yet supported" error and land in follow-up commits. The implementation is deliberately the simplest correct version: bounds-checked array indexing, no `copy_nonoverlapping`, no validity fast-path helper. Subsequent PRs evolve this toward the optimized form. Signed-off-by: Claude <noreply@anthropic.com>
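As an illustration of the "sign-flipped big-endian" and "sign-aware mask" transforms mentioned above, the sketch below shows the standard versions of both for `i64` and `f64`. The function names are hypothetical, and the leading sentinel byte is omitted so that only the value bytes are shown; whether the crate handles NaN or signed zero specially is not stated in the source and is not modelled here.

```rust
/// Order-preserving bytes for a signed integer: flip the sign bit, then
/// write big-endian, so byte-wise comparison matches numeric comparison.
pub fn encode_i64_sketch(v: i64) -> [u8; 8] {
    ((v as u64) ^ (1u64 << 63)).to_be_bytes()
}

/// Order-preserving bytes for a float: negative values have all bits
/// flipped, non-negative values have only the sign bit flipped.
pub fn encode_f64_sketch(v: f64) -> [u8; 8] {
    let bits = v.to_bits();
    let mask = if bits >> 63 == 1 { u64::MAX } else { 1u64 << 63 };
    (bits ^ mask).to_be_bytes()
}
```

With these transforms, comparing the output arrays byte by byte agrees with the numeric order of the inputs, which is what lets a fixed-width column contribute a constant number of bytes per row.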

Extend the codec to handle Utf8/Binary via VarBinView arrays. Each value encodes as a 1-byte sentinel followed by 32-byte chunks: every full chunk has a 0xFF continuation marker; the final partial chunk pads with zeros and writes the partial length (1..=32) as its trailing byte. `encode_varlen_value` uses the simple byte-at-a-time XOR loop here; a faster `copy_nonoverlapping` + stamped continuation version replaces it in PR 2. `encode_varbinview` uses `arr.with_iterator(...)` for both the nullable and non-nullable branches; a direct view walk for the no-nulls branch lands in PR 2 too. `row_width_for_dtype` now returns `Variable` for Utf8/Binary; the size pass and encode dispatchers route through `add_size_varbinview` / `encode_varbinview` correspondingly. The scalar encoder gains `encode_scalar_varlen` and the matching Utf8/Binary arms. Signed-off-by: Claude <noreply@anthropic.com>
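The 32-byte chunk scheme described above can be sketched as below. This is an illustrative version only: the names are hypothetical, the leading sentinel and any descending-order transform are left out, and the handling of empty values (a single zero block with a length byte of 0) is an assumption, since the source only gives 1..=32 for the final length byte.

```rust
const BLOCK: usize = 32;

/// Append the block-encoded body of `value` to `out`.
/// Every non-final block is 32 bytes followed by 0xFF; the final block is
/// zero-padded to 32 bytes and followed by the number of bytes it holds.
pub fn encode_varlen_sketch(value: &[u8], out: &mut Vec<u8>) {
    let mut rest = value;
    while rest.len() > BLOCK {
        out.extend_from_slice(&rest[..BLOCK]);
        out.push(0xFF); // continuation marker: more blocks follow
        rest = &rest[BLOCK..];
    }
    // Final block holds 1..=32 bytes (0 only for an empty value, by assumption).
    out.extend_from_slice(rest);
    out.resize(out.len() + (BLOCK - rest.len()), 0);
    out.push(rest.len() as u8);
}

/// Convenience wrapper returning a fresh buffer.
pub fn encode_varlen_to_vec_sketch(value: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(encoded_varlen_len_sketch(value.len()));
    encode_varlen_sketch(value, &mut out);
    out
}

/// Size-pass counterpart: encoded body length for a value of `len` bytes.
pub fn encoded_varlen_len_sketch(len: usize) -> usize {
    len.div_ceil(BLOCK).max(1) * (BLOCK + 1)
}
```

The trailing length byte is what keeps a value ordered before any longer value that merely extends it with zero bytes, and the 0xFF marker orders a value that exactly fills a block before one that continues into the next block.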

Extend the codec to handle Struct, FixedSizeList, and Extension canonical variants. Each nested row encodes as `outer_sentinel | child bytes...`; for null rows the child bytes are zero-filled after the recursive encoders run so two null rows compare equal regardless of which non-null values would have been written by the children. `row_width_for_dtype` recurses through Struct fields and FSL elements to return `Fixed(w)` when every leaf is fixed; otherwise `Variable`. Extension delegates to its storage dtype. List remains `Variable` and ListView still bails (the row encoder's output is itself a ListView, so nested ListView isn't a near-term use case). Variant and Union bail explicitly. Signed-off-by: Claude <noreply@anthropic.com>
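The recursive width classification described above might look roughly like this. `DTypeSketch` and `RowWidthSketch` are toy types invented for the example, not Vortex's `DType` or the crate's private enums, and the per-leaf widths (1 sentinel byte plus the payload) are assumptions.

```rust
/// Toy dtype with just enough variants to show the recursion.
pub enum DTypeSketch {
    Bool,
    I64,
    Utf8,
    Struct(Vec<DTypeSketch>),
    FixedSizeList(Box<DTypeSketch>, u32),
}

#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum RowWidthSketch {
    Fixed(u32),
    Variable,
}

/// Classify a dtype as fixed-width (with its per-row byte count) or variable.
pub fn row_width_sketch(dtype: &DTypeSketch) -> RowWidthSketch {
    use RowWidthSketch::{Fixed, Variable};
    match dtype {
        DTypeSketch::Bool => Fixed(1 + 1), // sentinel + 1 value byte
        DTypeSketch::I64 => Fixed(1 + 8),  // sentinel + 8 value bytes
        DTypeSketch::Utf8 => Variable,
        DTypeSketch::Struct(fields) => {
            let mut total = 1u32; // outer sentinel
            for field in fields {
                match row_width_sketch(field) {
                    Fixed(w) => total += w,
                    // One variable-width leaf makes the whole struct variable.
                    Variable => return Variable,
                }
            }
            Fixed(total)
        }
        DTypeSketch::FixedSizeList(elem, n) => match row_width_sketch(elem) {
            Fixed(w) => Fixed(1 + w * n), // outer sentinel + n elements
            Variable => Variable,
        },
    }
}
```

A nested type is fixed-width only when every leaf below it is, which is why a single Utf8 field is enough to push an entire struct onto the variable-width path.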

Add the size-pass machinery used by both RowSize and the upcoming RowEncode pipeline. `compute_sizes` walks the N input columns once, classifying each via `row_width_for_dtype` and accumulating fixed-width-prefix sums in `fixed_per_row` while pushing per-row sums of variable-length columns into a lazily allocated `var_lengths` vec. The classification result (`ColKind` + `SizePassResult`) is private to the crate; RowEncode consumes it in a later commit to choose between the arithmetic and cursor encode paths. `RowSize` returns a `Struct { fixed: U32, var: U32 }` so callers can read the per-row width without realizing the constant `fixed` slot as a per-row buffer (it's a `ConstantArray`); the `var` slot is a `ConstantArray(0)` when no varlen column is present. `dispatch_size` is the fallback-only path for PR 1 (canonicalize, then codec::field_size). The `RowSizeKernel` trait exists but is unused; per-encoding fast paths and the inventory registry arrive in PR 3. `initialize()` does NOT register RowSize yet; that lands once RowEncode is in place, so the session-registered pair appears together. Signed-off-by: Claude <noreply@anthropic.com>
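A minimal sketch of the lazy `var_lengths` accumulation described in this commit message follows. The types are invented for the example: `ColSketch` stands in for an already-classified column, with variable-width columns carrying precomputed per-row encoded lengths, which is an assumption about how the inputs would be presented.

```rust
/// Toy classified column: a constant width, or per-row encoded lengths.
pub enum ColSketch<'a> {
    Fixed(u32),
    Variable(&'a [u32]),
}

/// Result of the size pass in this sketch.
pub struct SizePassSketch {
    pub fixed_per_row: u32,
    /// Allocated only if at least one variable-width column was seen.
    pub var_lengths: Option<Vec<u32>>,
}

/// Walk the columns once, summing fixed widths into a single constant and
/// variable widths into a per-row vector that is allocated on first use.
pub fn compute_sizes_sketch(cols: &[ColSketch<'_>], n_rows: usize) -> SizePassSketch {
    let mut fixed_per_row = 0u32;
    let mut var_lengths: Option<Vec<u32>> = None;
    for col in cols {
        match col {
            ColSketch::Fixed(width) => fixed_per_row += width,
            ColSketch::Variable(lens) => {
                assert_eq!(lens.len(), n_rows, "one length per row");
                let acc = var_lengths.get_or_insert_with(|| vec![0; n_rows]);
                for (total, len) in acc.iter_mut().zip(lens.iter()) {
                    *total += len;
                }
            }
        }
    }
    SizePassSketch { fixed_per_row, var_lengths }
}
```

When every column is fixed-width, `var_lengths` stays `None` and the per-row size is fully described by the single `fixed_per_row` constant.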

Add the RowEncode variadic scalar function: encode N input columns into a single ListView<u8> in a five-phase pipeline.

Phase 1: size pass via `compute_sizes`.
Phase 2: allocate a zero-initialized output buffer sized to fit every row's encoded bytes; bail if the total exceeds u32::MAX.
Phase 3: build per-row `listview_offsets`: i * fixed_per_row for the pure-fixed case, or i * fixed_per_row + exclusive cumsum of varlen lengths otherwise. Uses the simple `Vec::push` + `checked_add` loop.
Phase 4: walk columns left-to-right and call `dispatch_encode` for every column (cursor path for all). Each call writes its per-row bytes at `offsets[i] + cursors[i]` and advances the cursor.
Phase 5: build the ListView<u8> via the validating `try_new` constructor.

`dispatch_encode` is the canonicalize-then-`codec::field_encode` fallback; in-crate kernel arms and the inventory registry land in PR 3. The `RowEncodeKernel` trait is defined but unused. PR 2 will iterate on this pipeline (skip zero-init, skip ListView validation, auto-vectorize the offsets loop, etc.). Signed-off-by: Claude <noreply@anthropic.com>

Wire the RowSize/RowEncode scalar functions to the user-facing API:
- `convert_columns` accepts a slice of input arrays and per-column SortFields, constructs `RowEncodeOptions` + `VecExecutionArgs`, and returns the encoded `ListViewArray<u8>`.
- `compute_row_sizes` returns just the per-row sizes (the `Struct { fixed: u32, var: u32 }` output of `RowSize`).
- `initialize()` now registers `RowSize` and `RowEncode` on the given session so they are reachable via the expression layer.

Tests cover sort-order round-trips for bool, primitive (i64 asc/desc, u32, f64), utf8, multi-column, nulls_first/last, struct sort-order, the single-buffer invariant of the ListView output, and the structural shape of `RowSize`. Tests that exercise per-encoding fast paths (`constant_path_matches_canonical`, `dict_path_matches_canonical`) land together with their respective kernels in PR 3.

The bench file uses divan + mimalloc and reports throughput in GB/s of encoded output bytes for primitive_i64, utf8, and struct_mixed. Each has an `arrow_row` baseline and a `vortex` measurement. Per-encoding fast-path scenarios (constant/dict/patched/bitpacked/for/delta) gain their triplets in PR 3.

Baseline measurements at this commit (sample-count=10):
  primitive_i64_vortex ~1.97 GB/s (vs arrow-row 4.12 GB/s)
  utf8_vortex ~0.87 GB/s (vs arrow-row 1.56 GB/s)
  struct_mixed_vortex ~0.95 GB/s (vs arrow-row 1.19 GB/s)

PR 2 closes most of the gap by replacing the validating `ListViewArray::try_new` with `new_unchecked`, skipping the buffer zero-init, auto-vectorizing the offsets and varlen-block paths, etc. Signed-off-by: Claude <noreply@anthropic.com>

codspeed-hq · 2026-05-22T09:50:19Z

Merging this PR will degrade performance by 11.29%

⚠️

Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

⚠️

Different runtime environments detected

Some benchmarks with significant performance changes were compared across different runtime environments,
which may affect the accuracy of the results.

Open the report in CodSpeed to investigate

❌ 1 regressed benchmark
✅ 1506 untouched benchmarks

Warning

Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

	Mode	Benchmark	`BASE`	`HEAD`	Efficiency
❌	Simulation	`baseline_lt[16, 65536]`	216.1 µs	243.7 µs	-11.29%

Tip

Investigate this regression by commenting @codspeedbot fix this regression on this PR, or directly use the CodSpeed MCP with your agent.

Comparing ji/row-pr1-base (e5c07bb) with develop (3ac6c77)

Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>

claude added 8 commits May 17, 2026 22:00

joseph-isaacs changed the title ~~feat: vortex row crate~~ feat(vortex-row): row-oriented byte encoder (size + encode passes) Jun 4, 2026

joseph-isaacs added 2 commits June 4, 2026 14:19

t

10667d5

Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>

t

e5c07bb

Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>

joseph-isaacs force-pushed the ji/row-pr1-base branch from 48f59ce to e5c07bb Compare June 4, 2026 14:21

joseph-isaacs closed this Jun 5, 2026

joseph-isaacs wants to merge 10 commits into develop from ji/row-pr1-base