perf[buffer]: iteration for fallible operations with validity by joseph-isaacs · Pull Request #8120 · vortex-data/vortex

joseph-isaacs · 2026-05-27T09:59:24Z

Currently, we (and Arrow) handle fallible operations with scalar (non-SIMD) code.

This PR adds a trait and methods for fast SIMD checked operations (including cast). I verified separately that `checked_add` benefits as well.
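A minimal sketch of the idea, assuming plain `i32` slices and a `&[bool]` validity mask (the PR's actual trait and mask types are not shown here, and `checked_add_masked` is a hypothetical name). Every lane is computed with `overflowing_add`, the overflow flags are OR-ed into one accumulator, and the result is checked once after the loop. With no early `return` inside the loop, the compiler is free to vectorize it, though whether it actually does depends on the target and the optimizer.

```rust
/// Hypothetical element-wise checked addition over two columns with validity.
///
/// Every lane is computed with `overflowing_add`, so the loop body has no
/// early exit; overflow flags are OR-ed into a single accumulator and
/// checked once after the loop. Overflow in a null lane (`valid[i] == false`)
/// is ignored, since the value in a null lane is meaningless anyway.
pub fn checked_add_masked(lhs: &[i32], rhs: &[i32], valid: &[bool]) -> Option<Vec<i32>> {
    assert_eq!(lhs.len(), rhs.len());
    assert_eq!(lhs.len(), valid.len());

    let mut out = Vec::with_capacity(lhs.len());
    let mut overflow = false;
    for ((&a, &b), &v) in lhs.iter().zip(rhs).zip(valid) {
        let (sum, overflowed) = a.overflowing_add(b);
        out.push(sum);
        // Only an overflow in a valid lane counts as a failure.
        overflow |= overflowed & v;
    }

    if overflow { None } else { Some(out) }
}
```

For example, `checked_add_masked(&[1, i32::MAX], &[2, 1], &[true, false])` succeeds even though lane 1 wraps, because that lane is null; the same inputs with both lanes valid return `None`.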

codspeed-hq · 2026-05-27T10:12:35Z

Merging this PR will not alter performance

⚠️ Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

⚡ 8 improved benchmarks
❌ 12 regressed benchmarks
✅ 1524 untouched benchmarks
🆕 3 new benchmarks
⏩ 11 skipped benchmarks¹

Warning

Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

| | Mode | Benchmark | `BASE` | `HEAD` | Efficiency |
|---|---|---|---|---|---|
| ❌ | Simulation | `chunked_dict_primitive_into_canonical[u32, (1000, 10, 10)]` | 120.7 µs | 183 µs | -34.05% |
| ❌ | Simulation | `encode_varbin[(1000, 2)]` | 176.1 µs | 237.1 µs | -25.74% |
| ❌ | Simulation | `take_10k_random` | 196.9 µs | 255.6 µs | -22.97% |
| ❌ | Simulation | `patched_take_10k_contiguous_patches` | 230.6 µs | 290.8 µs | -20.69% |
| ❌ | Simulation | `patched_take_10k_random` | 242.6 µs | 302.8 µs | -19.89% |
| ❌ | Simulation | `chunked_varbinview_canonical_into[(1000, 10)]` | 161.8 µs | 198 µs | -18.28% |
| ❌ | Simulation | `chunked_varbinview_into_canonical[(1000, 10)]` | 177.1 µs | 213.5 µs | -17.06% |
| ❌ | Simulation | `bench_many_codes_few_values[1024]` | 393.2 µs | 465.6 µs | -15.55% |
| ❌ | Simulation | `decompress_rd[f64, (100000, 0.0)]` | 845.5 µs | 982.8 µs | -13.97% |
| ❌ | Simulation | `varbinview_large` | 112.2 µs | 130.4 µs | -13.97% |
| ❌ | Simulation | `chunked_varbinview_canonical_into[(100, 100)]` | 273.8 µs | 307.9 µs | -11.08% |
| ❌ | Simulation | `chunked_varbinview_into_canonical[(100, 100)]` | 326.4 µs | 365 µs | -10.58% |
| ⚡ | Simulation | `sum_i32_nullable_all_valid` | 69.2 µs | 35.4 µs | +95.64% |
| ⚡ | Simulation | `null_count_run_end[(10000, 4, 0.01)]` | 125.4 µs | 91.6 µs | +36.98% |
| ⚡ | Simulation | `encode_varbinview[(1000, 2)]` | 189 µs | 157.1 µs | +20.26% |
| ⚡ | Simulation | `chunked_varbinview_opt_into_canonical[(1000, 10)]` | 229.3 µs | 192.7 µs | +18.96% |
| ⚡ | Simulation | `and_bool_nullable` | 93.7 µs | 82.6 µs | +13.41% |
| ⚡ | Simulation | `baseline_lt[4, 1024]` | 78.5 µs | 69.9 µs | +12.23% |
| ⚡ | Simulation | `decompress_rd[f64, (100000, 0.01)]` | 981.2 µs | 890.4 µs | +10.2% |
| ⚡ | Simulation | `decompress_rd[f64, (100000, 0.1)]` | 981.2 µs | 890.4 µs | +10.19% |
| ... | ... | ... | ... | ... | ... |

ℹ️ Only the first 20 benchmarks are displayed. Go to the app to view all benchmarks.

Tip

Investigate this regression by commenting @codspeedbot fix this regression on this PR, or directly use the CodSpeed MCP with your agent.

Comparing `ji/fast-iter-valid` (7ed76bc) with `develop` (679e2c5)²

¹ 11 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, archive them on CodSpeed to remove them from the performance reports.
² No successful run was found on `develop` (9f494a1) when this report was generated, so 679e2c5 was used as the comparison base instead. This report may include changes unrelated to this pull request.


joseph-isaacs · 2026-05-27T15:16:16Z

Open question: where should this code live?

robert3005 · 2026-05-27T23:14:22Z

Sounds like we want a crate between the array crate and vortex-buffer, or this could be a feature flag in vortex-buffer.

joseph-isaacs · 2026-05-28T08:44:14Z

I was thinking vortex-compute, which only works with dtypes, buffers, and Rust native types?

I can't remember whether this was a problem before.

robert3005 · 2026-05-28T13:24:19Z

There was no problem with vortex-compute, IIRC. Some constructs made it hard in the past, but I think we're fine now.

github-actions · 2026-06-12T02:17:35Z

This PR has been marked as stale because it has been open for 14 days with no activity. Please comment or remove the stale label if you wish to keep it active; otherwise, it will be closed in 7 days.

AdamGS · 2026-06-12T11:39:13Z

+/// The kernels in this crate require this trait instead of `Iterator` so that lane
+/// reads carry no inter-iteration data dependency — the autovectorizer treats each
+/// lane independently.
+pub trait IndexedSource {


These traits look a lot like `Buf`/`BufMut` from `bytes`, which we already implement for buffers and which are implemented for similar types.

I think the random access read/write

AdamGS · 2026-06-12T11:40:38Z

+///
+/// Use this to drive a binary kernel from two columns. Length equality is enforced
+/// at construction.
+pub struct LaneZip<A, B>(pub A, pub B);


Seems like a good candidate for implementing `Iterator` and `ExactSizeIterator`.
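A hypothetical `LaneZip` over two slices that implements `Iterator` and `ExactSizeIterator`, as suggested. It is a sketch, not the PR's type (which wraps two sources as `LaneZip<A, B>(pub A, pub B)`). Note the trade-off: `next` advances a cursor (`idx`) between calls, the kind of inter-iteration state the `IndexedSource` doc comment aims to avoid; the optimizer can often turn this into a plain counted loop, but that is not guaranteed.

```rust
/// Hypothetical iterator form of a two-column zip: yields `(a[i], b[i])`
/// pairs and reports an exact length, so `collect` can preallocate.
pub struct LaneZip<'a, A, B> {
    a: &'a [A],
    b: &'a [B],
    idx: usize,
}

impl<'a, A: Copy, B: Copy> LaneZip<'a, A, B> {
    /// Length equality is enforced once, at construction.
    pub fn new(a: &'a [A], b: &'a [B]) -> Self {
        assert_eq!(a.len(), b.len(), "LaneZip columns must have equal length");
        Self { a, b, idx: 0 }
    }
}

impl<'a, A: Copy, B: Copy> Iterator for LaneZip<'a, A, B> {
    type Item = (A, B);

    fn next(&mut self) -> Option<(A, B)> {
        if self.idx == self.a.len() {
            return None;
        }
        let item = (self.a[self.idx], self.b[self.idx]);
        self.idx += 1;
        Some(item)
    }

    fn size_hint(&self) -> (usize, Option<usize>) {
        let remaining = self.a.len() - self.idx;
        (remaining, Some(remaining))
    }
}

// `len()` is provided by `ExactSizeIterator` using the exact `size_hint` above.
impl<'a, A: Copy, B: Copy> ExactSizeIterator for LaneZip<'a, A, B> {}
```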

AdamGS · 2026-06-12T11:41:45Z

+/// `impl<S: IndexedSource> IndexedSourceExt for S` below. Bring the trait into
+/// scope (`use vortex_compute::lane_kernels::IndexedSourceExt;`) to call
+/// them with method syntax: `values.try_map_masked_into(&mask, &mut out, f)`.
+pub trait IndexedSourceExt: IndexedSource + Sized {


Can we skip the ext traits? They just increase the mental load of tracking where things happen, IMO.

What's another way of doing this?

Why can't these be default methods on the core trait?

I really don't think it's worth having default methods here; we'll likely want more of these.

Not sure I understand.

Does this really matter?
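The two patterns under discussion, sketched with hypothetical `Source`/`SourceExt` traits: a provided (default) method on the core trait, versus an extension trait with a blanket impl. One practical difference: a generic method such as `count_where` (which takes a closure type parameter) would make the core trait not dyn-compatible unless it were marked `where Self: Sized`, while moving it to an ext trait keeps the core trait minimal. The cost, as raised in review, is that callers must import the ext trait to get method syntax.

```rust
/// Pattern 1: the kernel is a provided (default) method on the core trait.
/// Implementors may override it.
pub trait Source {
    fn len(&self) -> usize;
    fn get(&self, i: usize) -> i64;

    fn wrapping_sum(&self) -> i64 {
        (0..self.len()).fold(0i64, |acc, i| acc.wrapping_add(self.get(i)))
    }
}

/// Pattern 2: the kernel lives on an extension trait with a blanket impl.
/// Callers must `use` this trait; implementors cannot override its methods.
pub trait SourceExt: Source + Sized {
    fn count_where<F: Fn(i64) -> bool>(&self, pred: F) -> usize {
        (0..self.len()).filter(|&i| pred(self.get(i))).count()
    }
}

impl<S: Source> SourceExt for S {}

/// A hypothetical concrete column used to exercise both patterns.
pub struct Column(pub Vec<i64>);

impl Source for Column {
    fn len(&self) -> usize {
        self.0.len()
    }

    fn get(&self, i: usize) -> i64 {
        self.0[i]
    }
}
```

With `Column(vec![1, -2, 3])`, `wrapping_sum()` returns `2` and `count_where(|x| x < 0)` returns `1`; only the second call requires `SourceExt` to be in scope.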

AdamGS · 2026-06-12T13:10:08Z

+    // Skip the fallible kernel when type widening or (cached) min/max prove every value fits.
+    let target_dtype = DType::Primitive(T::PTYPE, Nullability::NonNullable);
+    let infallible = casts_losslessly_to(F::PTYPE, T::PTYPE)
+        || cached_values_fit_in(array, &target_dtype) == Some(true);


nit: `unwrap_or_default`
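The nit, checked on all three inputs: `bool::default()` is `false`, so `cached.unwrap_or_default()` agrees with `cached == Some(true)` for `None`, `Some(false)`, and `Some(true)`. The function names below are hypothetical stand-ins for the expression in the diff.

```rust
/// The form in the diff: only a cached `Some(true)` proves every value fits.
pub fn fits_eq(cached: Option<bool>) -> bool {
    cached == Some(true)
}

/// The suggested form: an unknown statistic (`None`) falls back to
/// `bool::default()`, i.e. `false`, meaning "not proven to fit".
pub fn fits_default(cached: Option<bool>) -> bool {
    cached.unwrap_or_default()
}
```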

AdamGS · 2026-06-12T13:10:28Z

-        Mask::AllTrue(_) => BufferMut::try_from_trusted_len_iter(
+
+    // Returns `true` if every value of `from` is representable in `to` without loss.
+    fn casts_losslessly_to(from: PType, to: PType) -> bool {


This doesn't need to be a function.

I prefer it this way, since the body doesn't read as easily inline.
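A sketch of what a `casts_losslessly_to`-style predicate could look like, using a hypothetical `PType` subset (no `F16` or other types) rather than the crate's real enum. The integer-to-float rules assume IEEE 754 significands of 24 bits (`f32`) and 53 bits (`f64`). Keeping it as a named function, as the author prefers, lets the call site read as a sentence.

```rust
/// Hypothetical subset of a primitive-type enum, for illustration only.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum PType { U8, U16, U32, U64, I8, I16, I32, I64, F32, F64 }

impl PType {
    fn bits(self) -> u32 {
        match self {
            PType::U8 | PType::I8 => 8,
            PType::U16 | PType::I16 => 16,
            PType::U32 | PType::I32 | PType::F32 => 32,
            PType::U64 | PType::I64 | PType::F64 => 64,
        }
    }

    fn is_unsigned(self) -> bool {
        matches!(self, PType::U8 | PType::U16 | PType::U32 | PType::U64)
    }
}

/// Returns `true` if every value of `from` is representable in `to` without loss.
pub fn casts_losslessly_to(from: PType, to: PType) -> bool {
    use PType::*;
    if from == to {
        return true;
    }
    match (from, to) {
        // Float widening is exact; float narrowing and float -> int are not.
        (F32, F64) => true,
        (F32 | F64, _) => false,
        // An integer fits a float if it fits the significand (24 / 53 bits).
        (_, F32) => from.bits() <= 16,
        (_, F64) => from.bits() <= 32,
        // Integer -> integer.
        _ if from.is_unsigned() && to.is_unsigned() => to.bits() >= from.bits(),
        _ if from.is_unsigned() => to.bits() > from.bits(), // needs a spare sign bit
        _ if to.is_unsigned() => false,                      // negatives are lost
        _ => to.bits() >= from.bits(),                       // signed -> signed
    }
}
```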

AdamGS · 2026-06-17T10:16:29Z

@@ -0,0 +1,15 @@
+// SPDX-License-Identifier: Apache-2.0


Should this be in vortex-buffer?

I did think about putting this there, but it just seemed like the wrong place.

robert3005 · 2026-06-17T15:36:01Z

We need to add the crate to the benchmarks; since it's a new crate, it has to be enumerated in the CI matrix.

joseph-isaacs added 10 commits May 27, 2026 15:25

wip

7b5828f

Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>

wip

85ef2f8

Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>

wip

5cf469a

Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>

f

502a286

Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>

f

2f6df63

Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>

f

769a258

Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>

f

3a30290

Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>

f

d2bca93

Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>

f

6fd7fc1

Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>

f

72bca8b

Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>

joseph-isaacs force-pushed the ji/fast-iter-valid branch from 4b444dd to 72bca8b May 27, 2026 14:25

joseph-isaacs added 3 commits May 27, 2026 15:44

f

fe34ccb

Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>

f

4299cf0

Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>

f

8e5945f

Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>

joseph-isaacs changed the title ~~faster iteration infra~~ perf[buffer]: iteration for fallible operations with validity May 27, 2026

joseph-isaacs marked this pull request as ready for review May 27, 2026 15:13

joseph-isaacs added 8 commits May 27, 2026 16:58

f

e9aac1d

Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>

f

d8d5463

Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>

f

2556d53

Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>

f

aa8a6d1

Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>

f

ca2ad88

Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>

f

608111c

Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>

f

d0a7806

Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>

f

fc9b5e8

Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>

joseph-isaacs added the changelog/performance A performance improvement label May 27, 2026

f

2c314ab

Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>

joseph-isaacs requested a review from robert3005 May 28, 2026 11:30

github-actions added the stale This PR is stale and will be auto-closed soon label Jun 12, 2026

joseph-isaacs removed the stale This PR is stale and will be auto-closed soon label Jun 12, 2026

Merge branch 'develop' into ji/fast-iter-valid

73dbb94

Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>

joseph-isaacs requested a review from a team June 12, 2026 11:23

AdamGS reviewed Jun 12, 2026

Comment thread vortex-array/benches/cast_primitive.rs

AdamGS reviewed Jun 12, 2026

Comment thread vortex-compute/src/lane_kernels/map_in_place.rs Outdated

AdamGS reviewed Jun 12, 2026

Comment thread vortex-array/src/arrays/primitive/compute/cast.rs Outdated

w

49ec12a

Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>

AdamGS reviewed Jun 17, 2026

joseph-isaacs added 3 commits June 17, 2026 11:49

w

337147c

Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>

w

74216c1

Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>

w

7ed76bc

Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>

AdamGS approved these changes Jun 17, 2026

joseph-isaacs merged commit f0b8fb9 into develop Jun 17, 2026
64 of 65 checks passed

joseph-isaacs deleted the ji/fast-iter-valid branch June 17, 2026 13:26


3 participants
