
Commit 2818abb

bench: first_last remove noisy benchmarks, add update_batch (#21487)
## Which issue does this PR close?

- Refers to #21409.

## Rationale for this change

Reliable benchmarks for the `GroupsAccumulator` operations of `FIRST_VALUE` and `LAST_VALUE`.

## What changes are included in this PR?

1. As discussed in #21383 (comment), it is better to remove the noisy fast benchmarks - done.
2. Added a bench for `Accumulator` (allows measuring one of the improvements in the "perf" PR) - this one measures reliably.
3. Added initial benchmarks for `update_batch` and `merge_batch`. They exercise the heavy paths in `first_last`, but remain unpredictable in performance: the measured change ranges from −20% to +20%, even though statistical significance is good (p = 0.0), the running time is higher (on the order of microseconds), and the variance is low (less than 10%). I would suggest dropping bench (3), but it is kept in this PR for future reference.

## Are these changes tested?

Manual comparison of the optimised version in #21383 against the baseline. Raw output:

<details>

first_value evaluate_bench nulls=0%, filter=false, first(2): time [7.1169 µs 7.4442 µs 7.8068 µs], change [−59.076% −56.116% −53.288%] (p = 0.00) improved; 7 outliers: 2 low mild, 2 high mild, 3 high severe
first_value evaluate_bench nulls=0%, filter=false, all: time [61.417 µs 62.884 µs 64.445 µs], change [+16.139% +18.982% +21.955%] (p = 0.00) regressed; 3 outliers: 2 high mild, 1 high severe
first_value update_bench nulls=0%, filter=false: time [698.62 µs 715.94 µs 734.97 µs], change [+14.291% +18.719% +23.681%] (p = 0.00) regressed; 11 outliers: 8 high mild, 3 high severe
first_value merge_bench nulls=0%, filter=false: time [790.57 µs 803.35 µs 816.87 µs], change [+21.098% +22.993% +24.543%] (p = 0.00) regressed; 14 outliers: 11 low mild, 2 high mild, 1 high severe
first_value evaluate_bench nulls=0%, filter=true, first(2): time [6.9505 µs 7.2537 µs 7.5774 µs], change [−58.159% −56.529% −54.753%] (p = 0.00) improved; 6 outliers: 1 low mild, 5 high mild
first_value evaluate_bench nulls=0%, filter=true, all: time [61.186 µs 62.361 µs 63.591 µs], change [+22.124% +25.679% +29.909%] (p = 0.00) regressed; 6 outliers: 2 high mild, 4 high severe
first_value update_bench nulls=0%, filter=true: time [1.0514 ms 1.0802 ms 1.1132 ms], change [+7.1670% +10.215% +13.276%] (p = 0.00) regressed; 7 outliers: 7 high mild
first_value merge_bench nulls=0%, filter=true: time [1.0923 ms 1.1098 ms 1.1277 ms], change [+3.0074% +5.0368% +7.0955%] (p = 0.00) regressed; 1 outlier: 1 high mild
first_value trivial_update_bench nulls=0%, ignore_nulls=false: time [622.71 ns 631.10 ns 640.70 ns], change [−50.114% −49.416% −48.705%] (p = 0.00) improved; 8 outliers: 8 high mild
first_value trivial_update_bench nulls=0%, ignore_nulls=true: time [679.38 ns 694.90 ns 712.37 ns], change [−43.205% −41.668% −39.912%] (p = 0.00) improved; 2 outliers: 2 high mild
first_value evaluate_bench nulls=90%, filter=false, first(2): time [7.4166 µs 7.8253 µs 8.2654 µs], change [−48.325% −45.555% −42.419%] (p = 0.00) improved; 1 outlier: 1 high mild
first_value evaluate_bench nulls=90%, filter=false, all: time [58.517 µs 59.706 µs 60.970 µs], change [−15.981% −11.746% −7.7891%] (p = 0.00) improved; 2 outliers: 2 high mild
first_value update_bench nulls=90%, filter=false: time [780.07 µs 791.71 µs 804.95 µs], change [+5.9470% +8.0014% +10.066%] (p = 0.00) regressed
first_value merge_bench nulls=90%, filter=false: time [958.86 µs 970.42 µs 981.14 µs], change [+18.439% +20.316% +22.356%] (p = 0.00) regressed
first_value evaluate_bench nulls=90%, filter=true, first(2): time [6.8537 µs 7.1723 µs 7.5444 µs], change [−56.908% −54.593% −52.231%] (p = 0.00) improved; 7 outliers: 1 low mild, 4 high mild, 2 high severe
first_value evaluate_bench nulls=90%, filter=true, all: time [63.052 µs 64.334 µs 65.771 µs], change [−7.9370% −3.3294% +1.2769%] (p = 0.18 > 0.05) no change; 6 outliers: 1 low mild, 3 high mild, 2 high severe
first_value update_bench nulls=90%, filter=true: time [973.31 µs 987.93 µs 1.0051 ms], change [−14.982% −13.123% −11.211%] (p = 0.00) improved
first_value merge_bench nulls=90%, filter=true: time [1.0484 ms 1.0733 ms 1.1015 ms], change [−13.327% −8.5896% −4.0916%] (p = 0.00) improved; 12 outliers: 7 high mild, 5 high severe
first_value trivial_update_bench nulls=90%, ignore_nulls=false: time [531.48 ns 540.09 ns 549.00 ns], change [−53.396% −51.782% −50.199%] (p = 0.00) improved; 4 outliers: 4 high mild
first_value trivial_update_bench nulls=90%, ignore_nulls=true: time [915.47 ns 940.51 ns 964.57 ns], change [−42.061% −40.291% −38.529%] (p = 0.00) improved; 5 outliers: 5 high mild
last_value evaluate_bench nulls=0%, filter=false, first(2): time [7.0199 µs 7.3045 µs 7.6053 µs], change [−77.228% −67.635% −58.431%] (p = 0.00) improved; 6 outliers: 4 high mild, 2 high severe
last_value evaluate_bench nulls=0%, filter=false, all: time [59.921 µs 61.048 µs 62.232 µs], change [+4.8794% +8.3094% +11.439%] (p = 0.00) regressed; 1 outlier: 1 high severe
last_value update_bench nulls=0%, filter=false: time [700.36 µs 713.46 µs 726.83 µs], change [+9.3963% +11.898% +14.265%] (p = 0.00) regressed
last_value merge_bench nulls=0%, filter=false: time [836.97 µs 858.80 µs 884.97 µs], change [+23.796% +29.496% +37.261%] (p = 0.00) regressed; 19 outliers: 8 high mild, 11 high severe
last_value evaluate_bench nulls=0%, filter=true, first(2): time [7.3937 µs 7.8961 µs 8.4694 µs], change [−50.815% −47.152% −43.605%] (p = 0.00) improved; 10 outliers: 6 high mild, 4 high severe
last_value evaluate_bench nulls=0%, filter=true, all: time [68.529 µs 72.235 µs 76.626 µs], change [−10.658% −4.0677% +2.9060%] (p = 0.26 > 0.05) no change; 3 outliers: 1 high mild, 2 high severe
last_value update_bench nulls=0%, filter=true: time [1.0699 ms 1.0877 ms 1.1064 ms], change [−10.966% −8.9078% −6.9281%] (p = 0.00) improved; 10 outliers: 4 low mild, 4 high mild, 2 high severe
last_value merge_bench nulls=0%, filter=true: time [1.2981 ms 1.3350 ms 1.3750 ms], change [+3.5348% +7.0744% +10.890%] (p = 0.00) regressed; 6 outliers: 6 high mild
last_value trivial_update_bench nulls=0%, ignore_nulls=false: time [675.96 ns 691.10 ns 707.11 ns], change [−52.010% −51.201% −50.366%] (p = 0.00) improved; 17 outliers: 1 low severe, 11 low mild, 4 high mild, 1 high severe
last_value trivial_update_bench nulls=0%, ignore_nulls=true: time [722.68 ns 752.73 ns 786.26 ns], change [−47.665% −45.601% −43.316%] (p = 0.00) improved; 3 outliers: 2 high mild, 1 high severe
last_value evaluate_bench nulls=90%, filter=false, first(2): time [7.1491 µs 7.5298 µs 7.9733 µs], change [−58.264% −56.205% −53.750%] (p = 0.00) improved; 5 outliers: 4 high mild, 1 high severe
last_value evaluate_bench nulls=90%, filter=false, all: time [61.176 µs 63.371 µs 65.812 µs], change [−6.7601% −3.4004% +0.0769%] (p = 0.05 > 0.05) no change; 22 outliers: 11 low mild, 3 high mild, 8 high severe
last_value update_bench nulls=90%, filter=false: time [998.11 µs 1.0286 ms 1.0609 ms], change [+6.6045% +9.8044% +13.224%] (p = 0.00) regressed; 7 outliers: 7 high mild
last_value merge_bench nulls=90%, filter=false: time [1.1076 ms 1.1214 ms 1.1338 ms], change [+5.6096% +9.2265% +12.860%] (p = 0.00) regressed
last_value evaluate_bench nulls=90%, filter=true, first(2): time [7.3496 µs 7.7714 µs 8.2713 µs], change [−58.463% −54.783% −50.403%] (p = 0.00) improved; 9 outliers: 6 high mild, 3 high severe
last_value evaluate_bench nulls=90%, filter=true, all: time [58.656 µs 59.702 µs 60.853 µs], change [+0.5598% +3.4200% +6.1524%] (p = 0.01 < 0.05) change within noise threshold; 3 outliers: 3 high mild
last_value update_bench nulls=90%, filter=true: time [1.2175 ms 1.2303 ms 1.2426 ms], change [−0.2830% +1.3593% +3.0630%] (p = 0.12 > 0.05) no change; 26 outliers: 13 low severe, 1 low mild, 8 high mild, 4 high severe
last_value merge_bench nulls=90%, filter=true: time [1.3002 ms 1.3176 ms 1.3346 ms], change [−10.464% −7.8770% −5.2989%] (p = 0.00) improved
last_value trivial_update_bench nulls=90%, ignore_nulls=false: time [533.70 ns 539.66 ns 545.06 ns], change [−63.723% −62.553% −61.383%] (p = 0.00) improved; 19 outliers: 4 low severe, 5 low mild, 7 high mild, 3 high severe
last_value trivial_update_bench nulls=90%, ignore_nulls=true: time [1.4710 µs 1.4893 µs 1.5120 µs], change [−34.994% −33.485% −31.932%] (p = 0.00) improved; 1 outlier: 1 high mild

cargo bench --bench first_last -- --baseline main4  1158.42s user 40.19s system 140% cpu 14:11.29 total

</details>

## Are there any user-facing changes?
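The "reliable measurement" point above rests on criterion's `iter_batched`, which the reworked benches use so that accumulator construction and pre-filling stay outside the timed region. A minimal std-only sketch of that separation, with illustrative names (`prepare`, `routine`) that are not part of the real benchmark code:

```rust
use std::time::Instant;

// Stands in for the untimed setup closure: build and pre-fill state,
// like the benches pre-fill the accumulator via update_batch.
fn prepare() -> Vec<i64> {
    (0..1_000).collect()
}

// Stands in for the timed routine: consume the prepared state by value,
// as iter_batched hands each batch's fresh setup output to the routine.
fn routine(state: Vec<i64>) -> i64 {
    state.iter().sum()
}

fn main() {
    let state = prepare(); // setup, excluded from timing
    let start = Instant::now();
    let result = routine(state); // only this part is measured
    println!("{result} computed in {:?}", start.elapsed());
}
```

Because the routine takes the state by value, each timed invocation gets a fresh, untimed copy; this is the same reason the benches rebuild the accumulator in the setup closure rather than inside the measured loop.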
1 parent e6b32fe commit 2818abb

File tree

1 file changed: +172 −75 lines

datafusion/functions-aggregate/benches/first_last.rs

Lines changed: 172 additions & 75 deletions
@@ -15,22 +15,24 @@
 // specific language governing permissions and limitations
 // under the License.
 
-use std::hint::black_box;
-use std::sync::Arc;
-
-use arrow::array::{ArrayRef, BooleanArray};
+use arrow::array::{ArrayRef, BooleanArray, Int64Array};
 use arrow::compute::SortOptions;
 use arrow::datatypes::{DataType, Field, Int64Type, Schema};
 use arrow::util::bench_util::{create_boolean_array, create_primitive_array};
+use datafusion_common::instant::Instant;
+use std::hint::black_box;
+use std::sync::Arc;
 
 use datafusion_expr::{
     Accumulator, AggregateUDFImpl, EmitTo, GroupsAccumulator, function::AccumulatorArgs,
 };
-use datafusion_functions_aggregate::first_last::{FirstValue, LastValue};
+use datafusion_functions_aggregate::first_last::{
+    FirstValue, LastValue, TrivialFirstValueAccumulator, TrivialLastValueAccumulator,
+};
 use datafusion_physical_expr::PhysicalSortExpr;
 use datafusion_physical_expr::expressions::col;
 
-use criterion::{Criterion, criterion_group, criterion_main};
+use criterion::{BatchSize, Criterion, criterion_group, criterion_main};
 
 fn prepare_groups_accumulator(is_first: bool) -> Box<dyn GroupsAccumulator> {
     let schema = Arc::new(Schema::new(vec![
@@ -72,92 +74,120 @@ fn prepare_groups_accumulator(is_first: bool) -> Box<dyn GroupsAccumulator> {
     }
 }
 
-fn prepare_accumulator(is_first: bool) -> Box<dyn Accumulator> {
-    let schema = Arc::new(Schema::new(vec![
-        Field::new("value", DataType::Int64, true),
-        Field::new("ord", DataType::Int64, true),
-    ]));
-
-    let order_expr = col("ord", &schema).unwrap();
-    let sort_expr = PhysicalSortExpr {
-        expr: order_expr,
-        options: SortOptions::default(),
-    };
-
-    let value_field: Arc<Field> = Field::new("value", DataType::Int64, true).into();
-    let accumulator_args = AccumulatorArgs {
-        return_field: Arc::clone(&value_field),
-        schema: &schema,
-        expr_fields: &[value_field],
-        ignore_nulls: false,
-        order_bys: std::slice::from_ref(&sort_expr),
-        is_reversed: false,
-        name: if is_first {
-            "FIRST_VALUE(value ORDER BY ord)"
-        } else {
-            "LAST_VALUE(value ORDER BY ord)"
-        },
-        is_distinct: false,
-        exprs: &[col("value", &schema).unwrap()],
-    };
-
+fn create_trivial_accumulator(
+    is_first: bool,
+    ignore_nulls: bool,
+) -> Box<dyn Accumulator> {
     if is_first {
-        FirstValue::new().accumulator(accumulator_args).unwrap()
+        Box::new(
+            TrivialFirstValueAccumulator::try_new(&DataType::Int64, ignore_nulls)
+                .unwrap(),
+        )
     } else {
-        LastValue::new().accumulator(accumulator_args).unwrap()
+        Box::new(
+            TrivialLastValueAccumulator::try_new(&DataType::Int64, ignore_nulls).unwrap(),
+        )
     }
 }
 
 #[expect(clippy::needless_pass_by_value)]
-fn convert_to_state_bench(
+#[expect(clippy::too_many_arguments)]
+fn evaluate_bench(
     c: &mut Criterion,
     is_first: bool,
+    emit_to: EmitTo,
     name: &str,
     values: ArrayRef,
+    ord: ArrayRef,
     opt_filter: Option<&BooleanArray>,
+    num_groups: usize,
 ) {
+    let n = values.len();
+    let group_indices: Vec<usize> = (0..n).map(|i| i % num_groups).collect();
+
     c.bench_function(name, |b| {
-        b.iter(|| {
-            let accumulator = prepare_groups_accumulator(is_first);
-            black_box(
+        b.iter_batched(
+            || {
+                let mut accumulator = prepare_groups_accumulator(is_first);
                 accumulator
-                    .convert_to_state(std::slice::from_ref(&values), opt_filter)
-                    .unwrap(),
-            )
-        })
+                    .update_batch(
+                        &[Arc::clone(&values), Arc::clone(&ord)],
+                        &group_indices,
+                        opt_filter,
+                        num_groups,
+                    )
+                    .unwrap();
+                accumulator
+            },
+            |mut accumulator| {
+                black_box(accumulator.evaluate(emit_to).unwrap());
+            },
+            BatchSize::SmallInput,
+        )
    });
 }
 
 #[expect(clippy::needless_pass_by_value)]
-fn evaluate_accumulator_bench(
+fn update_bench(
     c: &mut Criterion,
     is_first: bool,
     name: &str,
     values: ArrayRef,
     ord: ArrayRef,
+    opt_filter: Option<&BooleanArray>,
+    num_groups: usize,
 ) {
+    let n = values.len();
+    let group_indices: Vec<usize> = (0..n).map(|i| i % num_groups).collect();
+
+    // Initialize with worst-case ordering so update_batch forces rows comparison for all groups.
+    let worst_ord: ArrayRef = Arc::new(Int64Array::from(vec![
+        if is_first {
+            i64::MAX
+        } else {
+            i64::MIN
+        };
+        n
+    ]));
+
     c.bench_function(name, |b| {
         b.iter_batched(
             || {
-                // setup, not timed
-                let mut accumulator = prepare_accumulator(is_first);
+                let mut accumulator = prepare_groups_accumulator(is_first);
                 accumulator
-                    .update_batch(&[Arc::clone(&values), Arc::clone(&ord)])
+                    .update_batch(
+                        &[Arc::clone(&values), Arc::clone(&worst_ord)],
+                        &group_indices,
+                        None, // no filter: ensure all groups are initialised
+                        num_groups,
+                    )
                     .unwrap();
                 accumulator
             },
-            |mut accumulator| black_box(accumulator.evaluate().unwrap()),
-            criterion::BatchSize::SmallInput,
+            |mut accumulator| {
+                for _ in 0..100 {
+                    #[expect(clippy::unit_arg)]
+                    black_box(
+                        accumulator
+                            .update_batch(
+                                &[Arc::clone(&values), Arc::clone(&ord)],
+                                &group_indices,
+                                opt_filter,
+                                num_groups,
+                            )
+                            .unwrap(),
+                    );
+                }
+            },
+            BatchSize::SmallInput,
         )
     });
 }
 
 #[expect(clippy::needless_pass_by_value)]
-#[expect(clippy::too_many_arguments)]
-fn evaluate_bench(
+fn merge_bench(
     c: &mut Criterion,
     is_first: bool,
-    emit_to: EmitTo,
     name: &str,
     values: ArrayRef,
     ord: ArrayRef,
@@ -166,28 +196,81 @@ fn evaluate_bench(
 ) {
     let n = values.len();
     let group_indices: Vec<usize> = (0..n).map(|i| i % num_groups).collect();
+    let is_set: ArrayRef = Arc::new(BooleanArray::from(vec![true; n]));
+
+    // Initialize with worst-case ordering so update_batch forces rows comparison for all groups.
+    let worst_ord: ArrayRef = Arc::new(Int64Array::from(vec![
+        if is_first {
+            i64::MAX
+        } else {
+            i64::MIN
+        };
+        n
+    ]));
 
     c.bench_function(name, |b| {
         b.iter_batched(
             || {
-                // setup, not timed
+                // Prebuild accumulator
                 let mut accumulator = prepare_groups_accumulator(is_first);
                 accumulator
                     .update_batch(
-                        &[Arc::clone(&values), Arc::clone(&ord)],
+                        &[Arc::clone(&values), Arc::clone(&worst_ord)],
                         &group_indices,
                         opt_filter,
                         num_groups,
                     )
                     .unwrap();
                 accumulator
             },
-            |mut accumulator| black_box(accumulator.evaluate(emit_to).unwrap()),
-            criterion::BatchSize::SmallInput,
+            |mut accumulator| {
+                for _ in 0..100 {
+                    #[expect(clippy::unit_arg)]
+                    black_box(
+                        accumulator
+                            .merge_batch(
+                                &[
+                                    Arc::clone(&values),
+                                    Arc::clone(&ord),
+                                    Arc::clone(&is_set),
+                                ],
+                                &group_indices,
+                                opt_filter,
+                                num_groups,
+                            )
+                            .unwrap(),
+                    );
+                }
+            },
+            BatchSize::SmallInput,
         )
     });
 }
 
+#[expect(clippy::needless_pass_by_value)]
+fn trivial_update_bench(
+    c: &mut Criterion,
+    is_first: bool,
+    ignore_nulls: bool,
+    name: &str,
+    values: ArrayRef,
+) {
+    c.bench_function(name, |b| {
+        b.iter_custom(|iters| {
+            // The bench is way too fast, so apply scaling factor
+            let mut accumulators: Vec<Box<dyn Accumulator>> = (0..iters * 100)
+                .map(|_| create_trivial_accumulator(is_first, ignore_nulls))
+                .collect();
+            let start = Instant::now();
+            for acc in &mut accumulators {
+                #[expect(clippy::unit_arg)]
+                black_box(acc.update_batch(&[Arc::clone(&values)]).unwrap());
+            }
+            start.elapsed()
+        })
+    });
+}
+
 fn first_last_benchmark(c: &mut Criterion) {
     const N: usize = 65536;
     const NUM_GROUPS: usize = 1024;
@@ -208,27 +291,10 @@ fn first_last_benchmark(c: &mut Criterion) {
             let ord = Arc::new(create_primitive_array::<Int64Type>(N, null_density))
                 as ArrayRef;
 
-            evaluate_accumulator_bench(
-                c,
-                is_first,
-                &format!("{fn_name} evaluate_accumulator_bench nulls={pct}%"),
-                values.clone(),
-                ord.clone(),
-            );
-
             for with_filter in [false, true] {
                 let filter = create_boolean_array(N, 0.0, 0.5);
                 let opt_filter = if with_filter { Some(&filter) } else { None };
 
-                convert_to_state_bench(
-                    c,
-                    is_first,
-                    &format!(
-                        "{fn_name} convert_to_state nulls={pct}%, filter={with_filter}"
-                    ),
-                    values.clone(),
-                    opt_filter,
-                );
                 evaluate_bench(
                     c,
                     is_first,
@@ -253,6 +319,37 @@ fn first_last_benchmark(c: &mut Criterion) {
                     opt_filter,
                     NUM_GROUPS,
                 );
+
+                update_bench(
+                    c,
+                    is_first,
+                    &format!("{fn_name} update_bench nulls={pct}%, filter={with_filter}"),
+                    values.clone(),
+                    ord.clone(),
+                    opt_filter,
+                    NUM_GROUPS,
+                );
+                merge_bench(
+                    c,
+                    is_first,
+                    &format!("{fn_name} merge_bench nulls={pct}%, filter={with_filter}"),
+                    values.clone(),
+                    ord.clone(),
+                    opt_filter,
+                    NUM_GROUPS,
+                );
+            }
+
+            for ignore_nulls in [false, true] {
+                trivial_update_bench(
+                    c,
+                    is_first,
+                    ignore_nulls,
+                    &format!(
+                        "{fn_name} trivial_update_bench nulls={pct}%, ignore_nulls={ignore_nulls}"
+                    ),
+                    values.clone(),
+                );
             }
         }
     }
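The worst-case ordering column that `update_bench` and `merge_bench` build boils down to Rust's `vec![elem; n]` repeat syntax, where the element expression can be a conditional: FIRST_VALUE starts from `i64::MAX` (any real ordering value beats it) and LAST_VALUE from `i64::MIN`, so every row forces a comparison. A std-only sketch of just that construction (the `worst_ord` helper is illustrative; the real bench wraps the values in an arrow `Int64Array`):

```rust
// Build n copies of the worst-possible ordering value for the given
// accumulator direction, mirroring the diff's `vec![if is_first { ... }; n]`.
fn worst_ord(is_first: bool, n: usize) -> Vec<i64> {
    vec![if is_first { i64::MAX } else { i64::MIN }; n]
}

fn main() {
    // FIRST_VALUE: every incoming ordering value is smaller than i64::MAX,
    // so update_batch must update every group.
    assert!(worst_ord(true, 4).iter().all(|&v| v == i64::MAX));
    // LAST_VALUE: symmetric, every incoming value is larger than i64::MIN.
    assert!(worst_ord(false, 4).iter().all(|&v| v == i64::MIN));
}
```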
