
Commit 2818abb

bench: first_last remove noisy benchmarks, add update_batch (#21487)
## Which issue does this PR close?

- Refers to #21409.

## Rationale for this change

Reliable benchmarks for the `GroupsAccumulator` operations of `FIRST_VALUE` and `LAST_VALUE`.

## What changes are included in this PR?

1. As discussed in #21383 (comment), it is better to remove the noisy fast benchmarks - done.
2. Added a bench for `Accumulator` (allows measuring one of the improvements in the "perf" PR) - this one measures reliably.
3. Added initial benchmarks for `update_batch` and `merge_batch`. They exercise the heavy paths in `first_last`, but remain unpredictable in performance: the measured change ranges from −20% to +20%, even though statistical significance is good (p = 0.0), the running time is higher (on the order of microseconds), and the variance is low (less than 10%). I would suggest dropping bench (3), but it is kept in this PR for future reference.

## Are these changes tested?

Manual comparison of the optimised version in #21383 against the baseline. Raw output:

<details>

first_value evaluate_bench nulls=0%, filter=false, first(2): time [7.1169 µs 7.4442 µs 7.8068 µs], change [−59.076% −56.116% −53.288%] (p = 0.00) improved; 7 outliers: 2 low mild, 2 high mild, 3 high severe
first_value evaluate_bench nulls=0%, filter=false, all: time [61.417 µs 62.884 µs 64.445 µs], change [+16.139% +18.982% +21.955%] (p = 0.00) regressed; 3 outliers: 2 high mild, 1 high severe
first_value update_bench nulls=0%, filter=false: time [698.62 µs 715.94 µs 734.97 µs], change [+14.291% +18.719% +23.681%] (p = 0.00) regressed; 11 outliers: 8 high mild, 3 high severe
first_value merge_bench nulls=0%, filter=false: time [790.57 µs 803.35 µs 816.87 µs], change [+21.098% +22.993% +24.543%] (p = 0.00) regressed; 14 outliers: 11 low mild, 2 high mild, 1 high severe
first_value evaluate_bench nulls=0%, filter=true, first(2): time [6.9505 µs 7.2537 µs 7.5774 µs], change [−58.159% −56.529% −54.753%] (p = 0.00) improved; 6 outliers: 1 low mild, 5 high mild
first_value evaluate_bench nulls=0%, filter=true, all: time [61.186 µs 62.361 µs 63.591 µs], change [+22.124% +25.679% +29.909%] (p = 0.00) regressed; 6 outliers: 2 high mild, 4 high severe
first_value update_bench nulls=0%, filter=true: time [1.0514 ms 1.0802 ms 1.1132 ms], change [+7.1670% +10.215% +13.276%] (p = 0.00) regressed; 7 outliers: 7 high mild
first_value merge_bench nulls=0%, filter=true: time [1.0923 ms 1.1098 ms 1.1277 ms], change [+3.0074% +5.0368% +7.0955%] (p = 0.00) regressed; 1 outlier: 1 high mild
first_value trivial_update_bench nulls=0%, ignore_nulls=false: time [622.71 ns 631.10 ns 640.70 ns], change [−50.114% −49.416% −48.705%] (p = 0.00) improved; 8 outliers: 8 high mild
first_value trivial_update_bench nulls=0%, ignore_nulls=true: time [679.38 ns 694.90 ns 712.37 ns], change [−43.205% −41.668% −39.912%] (p = 0.00) improved; 2 outliers: 2 high mild
first_value evaluate_bench nulls=90%, filter=false, first(2): time [7.4166 µs 7.8253 µs 8.2654 µs], change [−48.325% −45.555% −42.419%] (p = 0.00) improved; 1 outlier: 1 high mild
first_value evaluate_bench nulls=90%, filter=false, all: time [58.517 µs 59.706 µs 60.970 µs], change [−15.981% −11.746% −7.7891%] (p = 0.00) improved; 2 outliers: 2 high mild
first_value update_bench nulls=90%, filter=false: time [780.07 µs 791.71 µs 804.95 µs], change [+5.9470% +8.0014% +10.066%] (p = 0.00) regressed
first_value merge_bench nulls=90%, filter=false: time [958.86 µs 970.42 µs 981.14 µs], change [+18.439% +20.316% +22.356%] (p = 0.00) regressed
first_value evaluate_bench nulls=90%, filter=true, first(2): time [6.8537 µs 7.1723 µs 7.5444 µs], change [−56.908% −54.593% −52.231%] (p = 0.00) improved; 7 outliers: 1 low mild, 4 high mild, 2 high severe
first_value evaluate_bench nulls=90%, filter=true, all: time [63.052 µs 64.334 µs 65.771 µs], change [−7.9370% −3.3294% +1.2769%] (p = 0.18 > 0.05) no change; 6 outliers: 1 low mild, 3 high mild, 2 high severe
first_value update_bench nulls=90%, filter=true: time [973.31 µs 987.93 µs 1.0051 ms], change [−14.982% −13.123% −11.211%] (p = 0.00) improved
first_value merge_bench nulls=90%, filter=true: time [1.0484 ms 1.0733 ms 1.1015 ms], change [−13.327% −8.5896% −4.0916%] (p = 0.00) improved; 12 outliers: 7 high mild, 5 high severe
first_value trivial_update_bench nulls=90%, ignore_nulls=false: time [531.48 ns 540.09 ns 549.00 ns], change [−53.396% −51.782% −50.199%] (p = 0.00) improved; 4 outliers: 4 high mild
first_value trivial_update_bench nulls=90%, ignore_nulls=true: time [915.47 ns 940.51 ns 964.57 ns], change [−42.061% −40.291% −38.529%] (p = 0.00) improved; 5 outliers: 5 high mild
last_value evaluate_bench nulls=0%, filter=false, first(2): time [7.0199 µs 7.3045 µs 7.6053 µs], change [−77.228% −67.635% −58.431%] (p = 0.00) improved; 6 outliers: 4 high mild, 2 high severe
last_value evaluate_bench nulls=0%, filter=false, all: time [59.921 µs 61.048 µs 62.232 µs], change [+4.8794% +8.3094% +11.439%] (p = 0.00) regressed; 1 outlier: 1 high severe
last_value update_bench nulls=0%, filter=false: time [700.36 µs 713.46 µs 726.83 µs], change [+9.3963% +11.898% +14.265%] (p = 0.00) regressed
last_value merge_bench nulls=0%, filter=false: time [836.97 µs 858.80 µs 884.97 µs], change [+23.796% +29.496% +37.261%] (p = 0.00) regressed; 19 outliers: 8 high mild, 11 high severe
last_value evaluate_bench nulls=0%, filter=true, first(2): time [7.3937 µs 7.8961 µs 8.4694 µs], change [−50.815% −47.152% −43.605%] (p = 0.00) improved; 10 outliers: 6 high mild, 4 high severe
last_value evaluate_bench nulls=0%, filter=true, all: time [68.529 µs 72.235 µs 76.626 µs], change [−10.658% −4.0677% +2.9060%] (p = 0.26 > 0.05) no change; 3 outliers: 1 high mild, 2 high severe
last_value update_bench nulls=0%, filter=true: time [1.0699 ms 1.0877 ms 1.1064 ms], change [−10.966% −8.9078% −6.9281%] (p = 0.00) improved; 10 outliers: 4 low mild, 4 high mild, 2 high severe
last_value merge_bench nulls=0%, filter=true: time [1.2981 ms 1.3350 ms 1.3750 ms], change [+3.5348% +7.0744% +10.890%] (p = 0.00) regressed; 6 outliers: 6 high mild
last_value trivial_update_bench nulls=0%, ignore_nulls=false: time [675.96 ns 691.10 ns 707.11 ns], change [−52.010% −51.201% −50.366%] (p = 0.00) improved; 17 outliers: 1 low severe, 11 low mild, 4 high mild, 1 high severe
last_value trivial_update_bench nulls=0%, ignore_nulls=true: time [722.68 ns 752.73 ns 786.26 ns], change [−47.665% −45.601% −43.316%] (p = 0.00) improved; 3 outliers: 2 high mild, 1 high severe
last_value evaluate_bench nulls=90%, filter=false, first(2): time [7.1491 µs 7.5298 µs 7.9733 µs], change [−58.264% −56.205% −53.750%] (p = 0.00) improved; 5 outliers: 4 high mild, 1 high severe
last_value evaluate_bench nulls=90%, filter=false, all: time [61.176 µs 63.371 µs 65.812 µs], change [−6.7601% −3.4004% +0.0769%] (p = 0.05 > 0.05) no change; 22 outliers: 11 low mild, 3 high mild, 8 high severe
last_value update_bench nulls=90%, filter=false: time [998.11 µs 1.0286 ms 1.0609 ms], change [+6.6045% +9.8044% +13.224%] (p = 0.00) regressed; 7 outliers: 7 high mild
last_value merge_bench nulls=90%, filter=false: time [1.1076 ms 1.1214 ms 1.1338 ms], change [+5.6096% +9.2265% +12.860%] (p = 0.00) regressed
last_value evaluate_bench nulls=90%, filter=true, first(2): time [7.3496 µs 7.7714 µs 8.2713 µs], change [−58.463% −54.783% −50.403%] (p = 0.00) improved; 9 outliers: 6 high mild, 3 high severe
last_value evaluate_bench nulls=90%, filter=true, all: time [58.656 µs 59.702 µs 60.853 µs], change [+0.5598% +3.4200% +6.1524%] (p = 0.01 < 0.05) change within noise threshold; 3 outliers: 3 high mild
last_value update_bench nulls=90%, filter=true: time [1.2175 ms 1.2303 ms 1.2426 ms], change [−0.2830% +1.3593% +3.0630%] (p = 0.12 > 0.05) no change; 26 outliers: 13 low severe, 1 low mild, 8 high mild, 4 high severe
last_value merge_bench nulls=90%, filter=true: time [1.3002 ms 1.3176 ms 1.3346 ms], change [−10.464% −7.8770% −5.2989%] (p = 0.00) improved
last_value trivial_update_bench nulls=90%, ignore_nulls=false: time [533.70 ns 539.66 ns 545.06 ns], change [−63.723% −62.553% −61.383%] (p = 0.00) improved; 19 outliers: 4 low severe, 5 low mild, 7 high mild, 3 high severe
last_value trivial_update_bench nulls=90%, ignore_nulls=true: time [1.4710 µs 1.4893 µs 1.5120 µs], change [−34.994% −33.485% −31.932%] (p = 0.00) improved; 1 outlier: 1 high mild

cargo bench --bench first_last -- --baseline main4  1158.42s user 40.19s system 140% cpu 14:11.29 total

</details>

## Are there any user-facing changes?
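The "reliable measurement" point above rests on criterion's `iter_batched`, which the reworked benches use so that accumulator construction and pre-filling stay outside the timed region. A minimal std-only sketch of that separation, with illustrative names (`prepare`, `routine`) that are not part of the real benchmark code:

```rust
use std::time::Instant;

// Stands in for the untimed setup closure: build and pre-fill state,
// like the benches pre-fill the accumulator via update_batch.
fn prepare() -> Vec<i64> {
    (0..1_000).collect()
}

// Stands in for the timed routine: consume the prepared state by value,
// as iter_batched hands each batch's fresh setup output to the routine.
fn routine(state: Vec<i64>) -> i64 {
    state.iter().sum()
}

fn main() {
    let state = prepare(); // setup, excluded from timing
    let start = Instant::now();
    let result = routine(state); // only this part is measured
    println!("{result} computed in {:?}", start.elapsed());
}
```

Because the routine takes the state by value, each timed invocation gets a fresh, untimed copy; this is the same reason the benches rebuild the accumulator in the setup closure rather than inside the measured loop.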
1 parent e6b32fe commit 2818abb

File tree

1 file changed: +172 −75 lines

datafusion/functions-aggregate/benches/first_last.rs

Lines changed: 172 additions & 75 deletions
@@ -15,22 +15,24 @@
 // specific language governing permissions and limitations
 // under the License.
 
-use std::hint::black_box;
-use std::sync::Arc;
-
-use arrow::array::{ArrayRef, BooleanArray};
+use arrow::array::{ArrayRef, BooleanArray, Int64Array};
 use arrow::compute::SortOptions;
 use arrow::datatypes::{DataType, Field, Int64Type, Schema};
 use arrow::util::bench_util::{create_boolean_array, create_primitive_array};
+use datafusion_common::instant::Instant;
+use std::hint::black_box;
+use std::sync::Arc;
 
 use datafusion_expr::{
     Accumulator, AggregateUDFImpl, EmitTo, GroupsAccumulator, function::AccumulatorArgs,
 };
-use datafusion_functions_aggregate::first_last::{FirstValue, LastValue};
+use datafusion_functions_aggregate::first_last::{
+    FirstValue, LastValue, TrivialFirstValueAccumulator, TrivialLastValueAccumulator,
+};
 use datafusion_physical_expr::PhysicalSortExpr;
 use datafusion_physical_expr::expressions::col;
 
-use criterion::{Criterion, criterion_group, criterion_main};
+use criterion::{BatchSize, Criterion, criterion_group, criterion_main};
 
 fn prepare_groups_accumulator(is_first: bool) -> Box<dyn GroupsAccumulator> {
     let schema = Arc::new(Schema::new(vec![
@@ -72,92 +74,120 @@ fn prepare_groups_accumulator(is_first: bool) -> Box<dyn GroupsAccumulator> {
     }
 }
 
-fn prepare_accumulator(is_first: bool) -> Box<dyn Accumulator> {
-    let schema = Arc::new(Schema::new(vec![
-        Field::new("value", DataType::Int64, true),
-        Field::new("ord", DataType::Int64, true),
-    ]));
-
-    let order_expr = col("ord", &schema).unwrap();
-    let sort_expr = PhysicalSortExpr {
-        expr: order_expr,
-        options: SortOptions::default(),
-    };
-
-    let value_field: Arc<Field> = Field::new("value", DataType::Int64, true).into();
-    let accumulator_args = AccumulatorArgs {
-        return_field: Arc::clone(&value_field),
-        schema: &schema,
-        expr_fields: &[value_field],
-        ignore_nulls: false,
-        order_bys: std::slice::from_ref(&sort_expr),
-        is_reversed: false,
-        name: if is_first {
-            "FIRST_VALUE(value ORDER BY ord)"
-        } else {
-            "LAST_VALUE(value ORDER BY ord)"
-        },
-        is_distinct: false,
-        exprs: &[col("value", &schema).unwrap()],
-    };
-
+fn create_trivial_accumulator(
+    is_first: bool,
+    ignore_nulls: bool,
+) -> Box<dyn Accumulator> {
     if is_first {
-        FirstValue::new().accumulator(accumulator_args).unwrap()
+        Box::new(
+            TrivialFirstValueAccumulator::try_new(&DataType::Int64, ignore_nulls)
+                .unwrap(),
+        )
     } else {
-        LastValue::new().accumulator(accumulator_args).unwrap()
+        Box::new(
+            TrivialLastValueAccumulator::try_new(&DataType::Int64, ignore_nulls).unwrap(),
+        )
     }
 }
 
 #[expect(clippy::needless_pass_by_value)]
-fn convert_to_state_bench(
+#[expect(clippy::too_many_arguments)]
+fn evaluate_bench(
     c: &mut Criterion,
     is_first: bool,
+    emit_to: EmitTo,
     name: &str,
     values: ArrayRef,
+    ord: ArrayRef,
     opt_filter: Option<&BooleanArray>,
+    num_groups: usize,
 ) {
+    let n = values.len();
+    let group_indices: Vec<usize> = (0..n).map(|i| i % num_groups).collect();
+
     c.bench_function(name, |b| {
-        b.iter(|| {
-            let accumulator = prepare_groups_accumulator(is_first);
-            black_box(
+        b.iter_batched(
+            || {
+                let mut accumulator = prepare_groups_accumulator(is_first);
                 accumulator
-                    .convert_to_state(std::slice::from_ref(&values), opt_filter)
-                    .unwrap(),
-            )
-        })
+                    .update_batch(
+                        &[Arc::clone(&values), Arc::clone(&ord)],
+                        &group_indices,
+                        opt_filter,
+                        num_groups,
+                    )
+                    .unwrap();
+                accumulator
+            },
+            |mut accumulator| {
+                black_box(accumulator.evaluate(emit_to).unwrap());
+            },
+            BatchSize::SmallInput,
+        )
    });
 }
 
 #[expect(clippy::needless_pass_by_value)]
-fn evaluate_accumulator_bench(
+fn update_bench(
     c: &mut Criterion,
     is_first: bool,
     name: &str,
     values: ArrayRef,
     ord: ArrayRef,
+    opt_filter: Option<&BooleanArray>,
+    num_groups: usize,
 ) {
+    let n = values.len();
+    let group_indices: Vec<usize> = (0..n).map(|i| i % num_groups).collect();
+
+    // Initialize with worst-case ordering so update_batch forces rows comparison for all groups.
+    let worst_ord: ArrayRef = Arc::new(Int64Array::from(vec![
+        if is_first {
+            i64::MAX
+        } else {
+            i64::MIN
+        };
+        n
+    ]));
+
     c.bench_function(name, |b| {
         b.iter_batched(
             || {
-                // setup, not timed
-                let mut accumulator = prepare_accumulator(is_first);
+                let mut accumulator = prepare_groups_accumulator(is_first);
                 accumulator
-                    .update_batch(&[Arc::clone(&values), Arc::clone(&ord)])
+                    .update_batch(
+                        &[Arc::clone(&values), Arc::clone(&worst_ord)],
+                        &group_indices,
+                        None, // no filter: ensure all groups are initialised
+                        num_groups,
+                    )
                     .unwrap();
                 accumulator
             },
-            |mut accumulator| black_box(accumulator.evaluate().unwrap()),
-            criterion::BatchSize::SmallInput,
+            |mut accumulator| {
+                for _ in 0..100 {
+                    #[expect(clippy::unit_arg)]
+                    black_box(
+                        accumulator
+                            .update_batch(
+                                &[Arc::clone(&values), Arc::clone(&ord)],
+                                &group_indices,
+                                opt_filter,
+                                num_groups,
+                            )
+                            .unwrap(),
+                    );
+                }
+            },
+            BatchSize::SmallInput,
         )
     });
 }
 
 #[expect(clippy::needless_pass_by_value)]
-#[expect(clippy::too_many_arguments)]
-fn evaluate_bench(
+fn merge_bench(
     c: &mut Criterion,
     is_first: bool,
-    emit_to: EmitTo,
     name: &str,
     values: ArrayRef,
     ord: ArrayRef,
@@ -166,28 +196,81 @@ fn evaluate_bench(
 ) {
     let n = values.len();
     let group_indices: Vec<usize> = (0..n).map(|i| i % num_groups).collect();
+    let is_set: ArrayRef = Arc::new(BooleanArray::from(vec![true; n]));
+
+    // Initialize with worst-case ordering so update_batch forces rows comparison for all groups.
+    let worst_ord: ArrayRef = Arc::new(Int64Array::from(vec![
+        if is_first {
+            i64::MAX
+        } else {
+            i64::MIN
+        };
+        n
+    ]));
 
     c.bench_function(name, |b| {
         b.iter_batched(
             || {
-                // setup, not timed
+                // Prebuild accumulator
                 let mut accumulator = prepare_groups_accumulator(is_first);
                 accumulator
                     .update_batch(
-                        &[Arc::clone(&values), Arc::clone(&ord)],
+                        &[Arc::clone(&values), Arc::clone(&worst_ord)],
                         &group_indices,
                         opt_filter,
                         num_groups,
                     )
                     .unwrap();
                 accumulator
             },
-            |mut accumulator| black_box(accumulator.evaluate(emit_to).unwrap()),
-            criterion::BatchSize::SmallInput,
+            |mut accumulator| {
+                for _ in 0..100 {
+                    #[expect(clippy::unit_arg)]
+                    black_box(
+                        accumulator
+                            .merge_batch(
+                                &[
+                                    Arc::clone(&values),
+                                    Arc::clone(&ord),
+                                    Arc::clone(&is_set),
+                                ],
+                                &group_indices,
+                                opt_filter,
+                                num_groups,
+                            )
+                            .unwrap(),
+                    );
+                }
+            },
+            BatchSize::SmallInput,
         )
     });
 }
 
+#[expect(clippy::needless_pass_by_value)]
+fn trivial_update_bench(
+    c: &mut Criterion,
+    is_first: bool,
+    ignore_nulls: bool,
+    name: &str,
+    values: ArrayRef,
+) {
+    c.bench_function(name, |b| {
+        b.iter_custom(|iters| {
+            // The bench is way too fast, so apply scaling factor
+            let mut accumulators: Vec<Box<dyn Accumulator>> = (0..iters * 100)
+                .map(|_| create_trivial_accumulator(is_first, ignore_nulls))
+                .collect();
+            let start = Instant::now();
+            for acc in &mut accumulators {
+                #[expect(clippy::unit_arg)]
+                black_box(acc.update_batch(&[Arc::clone(&values)]).unwrap());
+            }
+            start.elapsed()
+        })
+    });
+}
+
 fn first_last_benchmark(c: &mut Criterion) {
     const N: usize = 65536;
     const NUM_GROUPS: usize = 1024;
@@ -208,27 +291,10 @@ fn first_last_benchmark(c: &mut Criterion) {
             let ord = Arc::new(create_primitive_array::<Int64Type>(N, null_density))
                 as ArrayRef;
 
-            evaluate_accumulator_bench(
-                c,
-                is_first,
-                &format!("{fn_name} evaluate_accumulator_bench nulls={pct}%"),
-                values.clone(),
-                ord.clone(),
-            );
-
             for with_filter in [false, true] {
                 let filter = create_boolean_array(N, 0.0, 0.5);
                 let opt_filter = if with_filter { Some(&filter) } else { None };
 
-                convert_to_state_bench(
-                    c,
-                    is_first,
-                    &format!(
-                        "{fn_name} convert_to_state nulls={pct}%, filter={with_filter}"
-                    ),
-                    values.clone(),
-                    opt_filter,
-                );
                 evaluate_bench(
                     c,
                     is_first,
@@ -253,6 +319,37 @@ fn first_last_benchmark(c: &mut Criterion) {
                     opt_filter,
                     NUM_GROUPS,
                 );
+
+                update_bench(
+                    c,
+                    is_first,
+                    &format!("{fn_name} update_bench nulls={pct}%, filter={with_filter}"),
+                    values.clone(),
+                    ord.clone(),
+                    opt_filter,
+                    NUM_GROUPS,
+                );
+                merge_bench(
+                    c,
+                    is_first,
+                    &format!("{fn_name} merge_bench nulls={pct}%, filter={with_filter}"),
+                    values.clone(),
+                    ord.clone(),
+                    opt_filter,
+                    NUM_GROUPS,
+                );
+            }
+
+            for ignore_nulls in [false, true] {
+                trivial_update_bench(
+                    c,
+                    is_first,
+                    ignore_nulls,
+                    &format!(
+                        "{fn_name} trivial_update_bench nulls={pct}%, ignore_nulls={ignore_nulls}"
+                    ),
+                    values.clone(),
+                );
             }
         }
     }
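The worst-case ordering column that `update_bench` and `merge_bench` build boils down to Rust's `vec![elem; n]` repeat syntax, where the element expression can be a conditional: FIRST_VALUE starts from `i64::MAX` (any real ordering value beats it) and LAST_VALUE from `i64::MIN`, so every row forces a comparison. A std-only sketch of just that construction (the `worst_ord` helper is illustrative; the real bench wraps the values in an arrow `Int64Array`):

```rust
// Build n copies of the worst-possible ordering value for the given
// accumulator direction, mirroring the diff's `vec![if is_first { ... }; n]`.
fn worst_ord(is_first: bool, n: usize) -> Vec<i64> {
    vec![if is_first { i64::MAX } else { i64::MIN }; n]
}

fn main() {
    // FIRST_VALUE: every incoming ordering value is smaller than i64::MAX,
    // so update_batch must update every group.
    assert!(worst_ord(true, 4).iter().all(|&v| v == i64::MAX));
    // LAST_VALUE: symmetric, every incoming value is larger than i64::MIN.
    assert!(worst_ord(false, 4).iter().all(|&v| v == i64::MIN));
}
```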
