feat: add ExecutionPlan::benefits_from_output_partitioning by adriangb · Pull Request #22440 · apache/datafusion

adriangb · 2026-05-21T19:20:51Z

Stacks on top of #22384 — both branches need to land for the diff to be coherent.

Summary

Add ExecutionPlan::benefits_from_output_partitioning() -> bool (default false) as the symmetric counterpart of the existing benefits_from_input_partitioning. The optimizer's EnforceDistribution already inserts a RepartitionExec(RoundRobinBatch(target_partitions)) when a parent's benefits_from_input_partitioning is true. With this addition it also fires when the child itself opts in via benefits_from_output_partitioning — no special handling in repartitioned() or DistributionContext bookkeeping.
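The decision described above can be sketched with a toy model. Everything here is an assumption for illustration: `PlanNode`, the four node structs, and `wants_round_robin` are hypothetical stand-ins, not DataFusion's real `ExecutionPlan` trait or `EnforceDistribution` code. The sketch also reduces both hooks to a plain `bool` and ignores the other conditions the real optimizer checks.

```rust
/// Toy stand-in for a physical plan node (hypothetical, not DataFusion's trait).
trait PlanNode {
    /// True when the node runs faster if its input is spread over more partitions.
    fn benefits_from_input_partitioning(&self) -> bool {
        false
    }
    /// True when the node wants its own output spread over more partitions.
    /// Defaults to `false`, so existing nodes keep their current behaviour.
    fn benefits_from_output_partitioning(&self) -> bool {
        false
    }
}

struct Sort; // single-partition consumer, opts into nothing
struct Filter; // parent that asks for partitioned input
struct PlainScan; // scan without an owned filter
struct FilteredScan; // scan that evaluates its filter post-decode

impl PlanNode for Sort {}
impl PlanNode for PlainScan {}
impl PlanNode for Filter {
    fn benefits_from_input_partitioning(&self) -> bool {
        true
    }
}
impl PlanNode for FilteredScan {
    fn benefits_from_output_partitioning(&self) -> bool {
        true
    }
}

/// Hypothetical helper: should a round-robin repartition sit between parent and child?
fn wants_round_robin(parent: &dyn PlanNode, child: &dyn PlanNode) -> bool {
    parent.benefits_from_input_partitioning() || child.benefits_from_output_partitioning()
}

fn main() {
    // Existing trigger: the parent asks for it.
    assert!(wants_round_robin(&Filter, &PlainScan));
    // New trigger: the child opts in even though the parent does not.
    assert!(wants_round_robin(&Sort, &FilteredScan));
    // Neither side opts in, so no repartition is added.
    assert!(!wants_round_robin(&Sort, &PlainScan));
}
```

Keeping the new hook as a defaulted method means only nodes that override it change plan shape.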

Why

When a parquet scan owns a filter and #22384 runs it post-decode inside the scan thread (the pushdown_filters = false path), there is no sibling FilterExec above the scan. Single-partition consumers — SortExec, CoalescePartitionsExec, a CollectLeft hash-join build — therefore inherit a single-thread scan + filter, even when the cluster has plenty of idle cores. The companion PRs (#22438 disabling join dynamic filter pushdown by default, #22439 lowering repartition_file_min_size to 1 MiB) close most of the regression but leave TPC-DS with ~18 queries still slower than main on small dim-table joins where byte-range splitting alone can't reach target_partitions. This PR closes the rest.

Wiring

ExecutionPlan ─┬─ DataSourceExec  -> DataSource::benefits_from_output_partitioning
               │
DataSource ─── FileScanConfig    -> FileSource::benefits_from_output_partitioning
               │
FileSource ─── ParquetSource     -> predicate.is_some() && !pushdown_filters()

The pushdown_filters = true gate is important: with RowFilter doing the work during decode, the round-robin wouldn't help and would also defeat limit-pushdown for ordered scans.
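A minimal sketch of the delegation chain and the gate, assuming simplified stand-in types. `FileSourceLike`, `ParquetLike`, `ScanConfigLike`, `ScanExecLike`, and `parquet_scan` are hypothetical names for illustration; only the gate expression `predicate.is_some() && !pushdown_filters()` comes from the wiring above.

```rust
/// Toy model of the file-source layer (hypothetical, not the real `FileSource` trait).
trait FileSourceLike {
    fn benefits_from_output_partitioning(&self) -> bool {
        false
    }
}

/// Stand-in for a parquet source holding an optional predicate.
struct ParquetLike {
    predicate: Option<&'static str>,
    pushdown_filters: bool,
}

impl FileSourceLike for ParquetLike {
    fn benefits_from_output_partitioning(&self) -> bool {
        // Opt in only when the scan owns a filter AND evaluates it post-decode.
        self.predicate.is_some() && !self.pushdown_filters
    }
}

/// Middle layer: forwards the question to the file source.
struct ScanConfigLike {
    file_source: Box<dyn FileSourceLike>,
}

impl ScanConfigLike {
    fn benefits_from_output_partitioning(&self) -> bool {
        self.file_source.benefits_from_output_partitioning()
    }
}

/// Plan-node layer: forwards the question to the data source.
struct ScanExecLike {
    data_source: ScanConfigLike,
}

impl ScanExecLike {
    fn benefits_from_output_partitioning(&self) -> bool {
        self.data_source.benefits_from_output_partitioning()
    }
}

/// Hypothetical constructor used only to keep the example short.
fn parquet_scan(predicate: Option<&'static str>, pushdown_filters: bool) -> ScanExecLike {
    let file_source = Box::new(ParquetLike { predicate, pushdown_filters });
    ScanExecLike { data_source: ScanConfigLike { file_source } }
}

fn main() {
    // Filter owned by the scan, evaluated post-decode: opt in.
    assert!(parquet_scan(Some("x > 1"), false).benefits_from_output_partitioning());
    // Filter evaluated during decode: the gate keeps the round-robin out.
    assert!(!parquet_scan(Some("x > 1"), true).benefits_from_output_partitioning());
    // No filter at all: nothing to parallelise.
    assert!(!parquet_scan(None, false).benefits_from_output_partitioning());
}
```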

Benchmark numbers (12 cores, SF1)

Run with the companion PRs (#22438 + #22439) applied so the dynamic-filter and split-size doors are open:

Suite	PR #22384 alone	+ this PR
TPC-H slower-than-main	2	2
TPC-DS slower-than-main	18	2
ClickBench slower-than-main	3	4

The remaining residuals (TPC-H Q5 ~3%, TPC-DS Q41 ~4% on a 15 ms query, ClickBench Q13 ~5%) look like fixed-cost per-batch overhead in the post-scan filter path itself; the rest are within run-to-run variance.

Test plan

cargo test --test sqllogictests: all 472 files pass after snapshot updates. Every update shows RepartitionExec: partitioning=RoundRobinBatch(N) inserted above a filtered scan that sits under a single-partition parent.
cargo test -p datafusion --test core_integration
run benchmarks

…ream

`RowGroupsPrunedParquetOpen::build_stream` used to inline the `build_projection_read_plan` + `reassign_expr_columns` + `make_projector` + `replace_schema` triple right next to the decoder / stream wiring, which made the opener's main orchestration body harder to follow.

Move that triple into a new `post_scan_filter` module exposing a single `DecoderProjection::build(projection, physical_file_schema, parquet_schema, output_schema)` entry point that returns the projection mask, projector, and replace_schema flag. The opener becomes a single call.

`replace_schema` is now derived from the projector's output schema (rather than the read plan's projected schema) so it stays correct under future widening of the decoder mask.

`DecoderBuilderConfig` now carries the projection mask directly (`projection_mask: &ProjectionMask`) instead of the full `ParquetReadPlan`, since the read plan's `projected_schema` is no longer needed in this layer.

No behaviour change. All existing parquet tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

`build_row_filter` (and its `RowFilterGenerator` wrapper) silently dropped conjuncts that `FilterCandidateBuilder::build` rejected (`Ok(None)` was `.flatten()`-ed away) and swallowed whole-build errors. By the time `build_row_filter` runs, `ParquetSource::try_pushdown_filters` has already accepted the filter and the parent `FilterExec` has been removed, so those dropped conjuncts were never applied anywhere, which gave wrong results.

Most reproducible trigger: the per-file expr adapter rewrites a predicate that was pushable at *table schema* time into something the `PushdownChecker` rejects at *physical file schema* time (schema evolution / coercion / whole-struct references introduced by the rewrite).

Surface the rejected conjuncts instead of dropping them:

- `build_row_filter` now returns `Result<(Option<RowFilter>, Vec<Arc<dyn PhysicalExpr>>)>`. The second element is the conjuncts it could not place. Bench / in-file test call sites updated.
- `RowFilterGenerator` exposes `rejected_conjuncts()`. On a whole-file build error it routes every conjunct through that list, so an error no longer relaxes the predicate.
- `DecoderProjection::build` grows a `post_scan_conjuncts` parameter and a `post_scan_filter: Option<PostScanFilter>` field. When non-empty it widens the decoder mask (over the user projection ∪ post-scan filter columns), rebases the conjuncts onto the stream schema, and returns a `PostScanFilter` that the stream applies to every decoded batch with SQL `WHERE` semantics (mirroring `FilterExec`'s `batch_filter`).
- `PushDecoderStreamState` carries the optional `PostScanFilter` and applies it in the `DecodeResult::Data` arm, skipping empty batches.
- The decoder-local LIMIT is unsafe with a post-scan filter (the decoder would short-circuit before the filter rejects enough rows), so the opener routes the limit to `remaining_limit` whenever a post-scan filter is present.
- New `post_scan_rows_pruned` / `post_scan_rows_matched` counters and `post_scan_filter_eval_time` `Time` on `ParquetFileMetrics`, mirroring the existing `pushdown_rows_*` / `row_pushdown_eval_time` so `EXPLAIN ANALYZE` keeps surfacing filter cost.

Two regression tests:

- `build_row_filter_surfaces_rejected_struct_conjunct` (`row_filter.rs`) asserts the new API contract directly: the rejected struct conjunct is returned, not dropped.
- `rejected_struct_conjunct_runs_post_scan_not_dropped` (`opener/mod.rs`) is end-to-end: with `pushdown_filters=true` and a `s IS NOT NULL` predicate over a struct column where one row is NULL, `main` returns 3 rows (conjunct silently dropped, predicate relaxed); after this fix it correctly returns 2.

The `pushdown_filters = false` path is intentionally unchanged in this commit: `try_pushdown_filters` still leaves the `FilterExec` above the scan in that case. Always-accepting filters and removing the `FilterExec` unconditionally is a separate behaviour change in a follow-up commit.

`push_down_filter_parquet.slt` updated for the new `post_scan_rows_*` metric lines on `EXPLAIN ANALYZE` output.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
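The difference between dropping and surfacing rejected conjuncts can be illustrated with a small sketch. All names are hypothetical stand-ins: `Conjunct`, `try_build_candidate`, `build_dropping`, and `build_surfacing` only mimic the shape of the change and are not the real `build_row_filter` or `FilterCandidateBuilder` code.

```rust
/// Hypothetical stand-in for one conjunct of a filter predicate.
#[derive(Debug, Clone, PartialEq)]
struct Conjunct {
    expr: &'static str,
    /// Whether the (toy) candidate builder accepts it as a row filter.
    row_filterable: bool,
}

/// Toy candidate builder: `Some` when the conjunct can become a row filter,
/// `None` when it is rejected.
fn try_build_candidate(c: &Conjunct) -> Option<Conjunct> {
    c.row_filterable.then(|| c.clone())
}

/// Old shape: rejected conjuncts vanish because the `None`s are filtered away.
fn build_dropping(conjuncts: &[Conjunct]) -> Vec<Conjunct> {
    conjuncts.iter().filter_map(try_build_candidate).collect()
}

/// New shape: returns (placed, rejected) so the caller can still apply
/// the rejected conjuncts somewhere else, e.g. after decoding.
fn build_surfacing(conjuncts: &[Conjunct]) -> (Vec<Conjunct>, Vec<Conjunct>) {
    conjuncts
        .iter()
        .cloned()
        .partition(|c| try_build_candidate(c).is_some())
}

fn main() {
    let pushable = Conjunct { expr: "a > 1", row_filterable: true };
    let rejected = Conjunct { expr: "s IS NOT NULL", row_filterable: false };
    let predicate = vec![pushable.clone(), rejected.clone()];

    // The old shape loses `s IS NOT NULL` without any signal to the caller.
    assert_eq!(build_dropping(&predicate), vec![pushable.clone()]);

    // The new shape keeps every conjunct accounted for.
    let (placed, leftover) = build_surfacing(&predicate);
    assert_eq!(placed, vec![pushable]);
    assert_eq!(leftover, vec![rejected]);
    assert_eq!(placed.len() + leftover.len(), predicate.len());
}
```

The invariant worth checking in such a design is the last assertion: placed plus rejected must always cover the whole predicate.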

…st-scan

`ParquetSource::try_pushdown_filters` always returns the per-filter `Yes` / `No` discriminant from `can_expr_be_pushed_down_with_schemas`, regardless of the `pushdown_filters` config. The parent `FilterExec` is always removed for pushable filters, and the scan owns the predicate.

The opener routes the predicate to the post-scan filter when `pushdown_filters = false`, in addition to the rejected-conjunct path that already exists for `pushdown_filters = true`:

- `pushdown_filters = true` → row-filterable conjuncts via the parquet `RowFilter`; any rejected conjuncts via the post-scan filter (the correctness fix from the previous commit).
- `pushdown_filters = false` → the whole predicate runs as a post-scan filter on decoded batches (behaviorally identical to a `FilterExec`).

The `pushdown_filters` config keeps its meaning ("build a parquet `RowFilter`"); doc comments updated.

Plan / test consequences (all results unchanged, plan shape and metrics change):

- The `FilterExec` no longer appears above a `DataSourceExec` for pushable parquet filters. The predicate appears as `predicate=…` on the `DataSourceExec`. Parquet `.slt` files are regenerated to reflect this (clickbench, push_down_filter_parquet, projection_pushdown, parquet*, etc.). Spurious whitespace churn from `--complete` was reverted.
- Opener / integration tests that asserted "row group not pruned ⇒ all rows returned" (e.g. `a = 1` over `[1, 2, 3]` returning 3 rows) are updated to reflect the matching-row count, since the scan now applies the predicate row-level via the post-scan filter.
- `FilterExec: id@0 = 1` assertions in DataFrame / view tests become `predicate=id@0 = 1` on the `DataSourceExec`.
- Insta inline snapshots in `parquet.rs` and `explain_analyze.rs` are re-accepted (`output_rows=8` → `output_rows=5` plus `post_scan_rows_pruned=3`, multi-line plans collapse where the `FilterExec`/`RepartitionExec` chain is gone).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
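The two routing rules above reduce to a small decision function. This is a sketch under stated assumptions: `Route` and `route_conjunct` are hypothetical names, and the real opener works on whole predicates and physical expressions rather than two booleans.

```rust
/// Where a single conjunct ends up being evaluated (hypothetical enum).
#[derive(Debug, PartialEq)]
enum Route {
    /// Evaluated during decode through a parquet row filter.
    RowFilter,
    /// Evaluated on decoded batches inside the scan.
    PostScan,
}

/// Toy routing rule: a conjunct goes to the row filter only when
/// `pushdown_filters` is on AND the conjunct is row-filterable;
/// in every other case the scan applies it after decoding.
fn route_conjunct(pushdown_filters: bool, row_filterable: bool) -> Route {
    if pushdown_filters && row_filterable {
        Route::RowFilter
    } else {
        Route::PostScan
    }
}

fn main() {
    assert_eq!(route_conjunct(true, true), Route::RowFilter);
    // Rejected conjunct with pushdown on: applied post-scan, never dropped.
    assert_eq!(route_conjunct(true, false), Route::PostScan);
    // Pushdown off: the whole predicate runs post-scan.
    assert_eq!(route_conjunct(false, true), Route::PostScan);
    assert_eq!(route_conjunct(false, false), Route::PostScan);
}
```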

When a parquet scan owns a filter and runs it post-decode inside the scan thread (which the post-scan-filter work in apache#22384 introduces for the `pushdown_filters = false` case), there is no sibling `FilterExec` above the scan, and `EnforceDistribution` no longer inserts the `RoundRobinBatch(target_partitions)` repartition it used to trigger from the filter's `benefits_from_input_partitioning`. Single-partition consumers (`SortExec`, `CoalescePartitionsExec`, a `CollectLeft` hash join build) therefore inherit a single-thread scan + filter, even when the cluster has plenty of idle cores.

Add `ExecutionPlan::benefits_from_output_partitioning() -> bool` (default `false`) as the symmetric counterpart of `benefits_from_input_partitioning`. The optimizer consults it in the same branch that already decides whether to wrap a child in a round-robin, so the existing `add_roundrobin_on_top` path does the work, with no special handling in `repartitioned()` or `DistributionContext` bookkeeping.

Wire it through the data-source stack:

ExecutionPlan ─┬─ DataSourceExec  -> DataSource::benefits_from_output_partitioning
               │
DataSource ─── FileScanConfig    -> FileSource::benefits_from_output_partitioning
               │
FileSource ─── ParquetSource     -> predicate.is_some() && !pushdown_filters()

With `pushdown_filters = true` parquet evaluates conjuncts via `RowFilter` during decode (so the round-robin wouldn't help and would also defeat limit pushdown), hence the gate.

Restores the parallelism a sibling `FilterExec` used to provide. On TPC-DS SF1 (12 cores, with `enable_join_dynamic_filter_pushdown=false` + `repartition_file_min_size=1 MiB` applied via the companion PRs) the slower-than-main query count drops from 18 → 2 (and the residuals are ~3-5% noise around the post-scan filter's fixed per-batch cost).

adriangb · 2026-05-21T19:20:58Z

run benchmarks

adriangbot · 2026-05-21T19:23:44Z

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4512018538-265-8dkwq 6.12.68+ #1 SMP Wed Apr 1 02:23:28 UTC 2026 aarch64 GNU/Linux

CPU Details (lscpu)

Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected

Comparing benefits-from-output-partitioning (6e5c241) to c8b784a (merge-base) diff using: tpcds
Results will be posted here when complete

adriangbot · 2026-05-21T19:24:01Z

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4512018538-266-lhz9l 6.12.68+ #1 SMP Wed Apr 1 02:23:28 UTC 2026 aarch64 GNU/Linux

Comparing benefits-from-output-partitioning (6e5c241) to c8b784a (merge-base) diff using: tpch
Results will be posted here when complete

adriangbot · 2026-05-21T19:24:19Z

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4512018538-264-q6phn 6.12.68+ #1 SMP Wed Apr 1 02:23:28 UTC 2026 aarch64 GNU/Linux

Comparing benefits-from-output-partitioning (6e5c241) to c8b784a (merge-base) diff using: clickbench_partitioned
Results will be posted here when complete

adriangbot · 2026-05-21T19:38:21Z

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Details

Comparing HEAD and benefits-from-output-partitioning
--------------------
Benchmark tpch_sf1.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┓
┃ Query     ┃                           HEAD ┃ benefits-from-output-partitioning ┃       Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━┩
│ QQuery 1  │ 38.94 / 40.24 ±1.20 / 42.17 ms │    38.71 / 39.79 ±1.19 / 41.82 ms │    no change │
│ QQuery 2  │ 20.56 / 21.17 ±0.57 / 22.22 ms │    20.05 / 20.28 ±0.22 / 20.70 ms │    no change │
│ QQuery 3  │ 33.82 / 35.49 ±1.42 / 37.41 ms │    55.36 / 55.84 ±0.49 / 56.74 ms │ 1.57x slower │
│ QQuery 4  │ 17.74 / 17.98 ±0.21 / 18.36 ms │    19.34 / 19.54 ±0.15 / 19.80 ms │ 1.09x slower │
│ QQuery 5  │ 42.83 / 43.39 ±0.32 / 43.73 ms │    63.65 / 65.83 ±2.41 / 70.51 ms │ 1.52x slower │
│ QQuery 6  │ 16.57 / 16.88 ±0.26 / 17.23 ms │    16.45 / 16.94 ±0.43 / 17.63 ms │    no change │
│ QQuery 7  │ 45.76 / 47.77 ±1.80 / 50.84 ms │    57.18 / 58.29 ±1.49 / 61.16 ms │ 1.22x slower │
│ QQuery 8  │ 45.41 / 45.78 ±0.22 / 46.08 ms │    64.04 / 64.44 ±0.27 / 64.81 ms │ 1.41x slower │
│ QQuery 9  │ 50.20 / 51.79 ±1.67 / 54.88 ms │    73.92 / 74.61 ±0.70 / 75.87 ms │ 1.44x slower │
│ QQuery 10 │ 64.21 / 64.92 ±0.91 / 66.71 ms │    70.86 / 72.79 ±2.90 / 78.53 ms │ 1.12x slower │
│ QQuery 11 │ 13.62 / 14.23 ±0.74 / 15.65 ms │    14.22 / 14.67 ±0.62 / 15.89 ms │    no change │
│ QQuery 12 │ 24.54 / 24.93 ±0.45 / 25.74 ms │    33.93 / 34.52 ±0.79 / 36.06 ms │ 1.38x slower │
│ QQuery 13 │ 34.06 / 36.13 ±2.06 / 39.96 ms │    46.85 / 49.09 ±2.63 / 54.12 ms │ 1.36x slower │
│ QQuery 14 │ 25.74 / 25.97 ±0.19 / 26.27 ms │    37.11 / 37.87 ±0.87 / 39.54 ms │ 1.46x slower │
│ QQuery 15 │ 31.87 / 32.31 ±0.59 / 33.45 ms │    32.73 / 33.19 ±0.51 / 34.13 ms │    no change │
│ QQuery 16 │ 15.24 / 15.33 ±0.09 / 15.50 ms │    18.82 / 18.84 ±0.02 / 18.88 ms │ 1.23x slower │
│ QQuery 17 │ 75.26 / 77.26 ±1.69 / 79.29 ms │ 158.24 / 160.32 ±2.38 / 163.34 ms │ 2.08x slower │
│ QQuery 18 │ 62.94 / 64.30 ±1.10 / 66.24 ms │    83.40 / 84.12 ±0.91 / 85.90 ms │ 1.31x slower │
│ QQuery 19 │ 35.68 / 36.08 ±0.67 / 37.42 ms │    37.79 / 37.96 ±0.11 / 38.14 ms │ 1.05x slower │
│ QQuery 20 │ 38.45 / 39.38 ±0.72 / 40.30 ms │    47.16 / 48.88 ±2.49 / 53.82 ms │ 1.24x slower │
│ QQuery 21 │ 56.11 / 58.47 ±2.90 / 64.08 ms │    57.43 / 57.94 ±0.53 / 58.67 ms │    no change │
│ QQuery 22 │ 23.58 / 24.35 ±0.75 / 25.55 ms │    30.55 / 30.96 ±0.31 / 31.40 ms │ 1.27x slower │
└───────────┴────────────────────────────────┴───────────────────────────────────┴──────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Benchmark Summary                                ┃           ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ Total Time (HEAD)                                │  834.15ms │
│ Total Time (benefits-from-output-partitioning)   │ 1096.71ms │
│ Average Time (HEAD)                              │   37.92ms │
│ Average Time (benefits-from-output-partitioning) │   49.85ms │
│ Queries Faster                                   │         0 │
│ Queries Slower                                   │        16 │
│ Queries with No Change                           │         6 │
│ Queries with Failure                             │         0 │
└──────────────────────────────────────────────────┴───────────┘

Resource Usage

tpch — base (merge-base)

Metric	Value
Wall time	5.0s
Peak memory	5.5 GiB
Avg memory	5.0 GiB
CPU user	30.0s
CPU sys	2.3s
Peak spill	0 B

tpch — branch

Metric	Value
Wall time	10.0s
Peak memory	5.5 GiB
Avg memory	4.8 GiB
CPU user	41.4s
CPU sys	2.1s
Peak spill	0 B

adriangbot · 2026-05-21T19:40:22Z

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Details

Comparing HEAD and benefits-from-output-partitioning
--------------------
Benchmark tpcds_sf1.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃                                  HEAD ┃     benefits-from-output-partitioning ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1  │           6.88 / 7.43 ±0.90 / 9.22 ms │           6.59 / 7.13 ±0.87 / 8.86 ms │     no change │
│ QQuery 2  │        83.53 / 83.98 ±0.26 / 84.28 ms │        48.40 / 48.91 ±0.41 / 49.63 ms │ +1.72x faster │
│ QQuery 3  │        30.24 / 30.76 ±0.31 / 31.08 ms │        32.31 / 32.59 ±0.24 / 33.00 ms │  1.06x slower │
│ QQuery 4  │    555.09 / 577.58 ±17.72 / 608.61 ms │     346.25 / 353.89 ±5.49 / 362.40 ms │ +1.63x faster │
│ QQuery 5  │        55.62 / 56.08 ±0.31 / 56.57 ms │        81.76 / 82.54 ±0.65 / 83.42 ms │  1.47x slower │
│ QQuery 6  │        38.87 / 39.46 ±0.47 / 40.29 ms │        36.54 / 37.14 ±0.35 / 37.56 ms │ +1.06x faster │
│ QQuery 7  │     114.81 / 116.75 ±2.47 / 121.49 ms │     140.95 / 144.04 ±2.73 / 149.09 ms │  1.23x slower │
│ QQuery 8  │        41.08 / 41.52 ±0.36 / 42.16 ms │        19.50 / 19.66 ±0.14 / 19.89 ms │ +2.11x faster │
│ QQuery 9  │        56.85 / 57.69 ±0.66 / 58.62 ms │        56.72 / 59.03 ±1.44 / 61.18 ms │     no change │
│ QQuery 10 │        85.20 / 85.95 ±0.53 / 86.77 ms │     112.93 / 114.62 ±1.65 / 117.61 ms │  1.33x slower │
│ QQuery 11 │     368.71 / 373.61 ±3.93 / 378.59 ms │     229.54 / 234.22 ±3.07 / 238.82 ms │ +1.60x faster │
│ QQuery 12 │        30.12 / 30.55 ±0.36 / 31.11 ms │        25.84 / 26.03 ±0.24 / 26.49 ms │ +1.17x faster │
│ QQuery 13 │     134.78 / 135.45 ±0.48 / 136.21 ms │     219.49 / 221.23 ±1.60 / 223.98 ms │  1.63x slower │
│ QQuery 14 │     517.28 / 522.76 ±4.92 / 529.30 ms │     494.69 / 499.13 ±3.64 / 505.54 ms │     no change │
│ QQuery 15 │        65.70 / 66.11 ±0.51 / 67.12 ms │        30.10 / 30.50 ±0.34 / 30.95 ms │ +2.17x faster │
│ QQuery 16 │           7.58 / 7.66 ±0.09 / 7.82 ms │          7.23 / 7.94 ±1.33 / 10.59 ms │     no change │
│ QQuery 17 │        83.23 / 84.16 ±0.58 / 84.98 ms │     136.18 / 137.16 ±0.62 / 137.96 ms │  1.63x slower │
│ QQuery 18 │     155.13 / 156.24 ±1.12 / 158.15 ms │     359.90 / 363.57 ±2.53 / 367.17 ms │  2.33x slower │
│ QQuery 19 │        42.63 / 42.92 ±0.24 / 43.25 ms │        55.31 / 55.99 ±0.43 / 56.65 ms │  1.30x slower │
│ QQuery 20 │        37.32 / 37.79 ±0.43 / 38.31 ms │        28.79 / 29.00 ±0.11 / 29.12 ms │ +1.30x faster │
│ QQuery 21 │        18.94 / 19.10 ±0.18 / 19.43 ms │        17.66 / 17.94 ±0.23 / 18.21 ms │ +1.06x faster │
│ QQuery 22 │        64.99 / 65.70 ±0.55 / 66.66 ms │        66.22 / 67.59 ±0.79 / 68.67 ms │     no change │
│ QQuery 23 │    509.76 / 545.90 ±27.27 / 575.16 ms │    368.82 / 389.24 ±17.32 / 409.49 ms │ +1.40x faster │
│ QQuery 24 │     243.03 / 250.72 ±6.68 / 261.19 ms │     582.06 / 590.32 ±8.40 / 601.64 ms │  2.35x slower │
│ QQuery 25 │     116.93 / 117.67 ±0.75 / 119.04 ms │     160.93 / 161.35 ±0.35 / 161.81 ms │  1.37x slower │
│ QQuery 26 │        72.56 / 73.37 ±0.49 / 74.09 ms │        86.88 / 88.36 ±1.76 / 91.80 ms │  1.20x slower │
│ QQuery 27 │           7.55 / 7.65 ±0.10 / 7.81 ms │           7.42 / 7.61 ±0.15 / 7.79 ms │     no change │
│ QQuery 28 │        60.06 / 63.43 ±1.72 / 64.72 ms │        59.85 / 62.99 ±2.51 / 65.55 ms │     no change │
│ QQuery 29 │     100.37 / 102.73 ±3.08 / 108.55 ms │     170.53 / 175.14 ±4.89 / 184.42 ms │  1.70x slower │
│ QQuery 30 │        31.99 / 32.41 ±0.38 / 33.11 ms │        37.49 / 38.18 ±0.64 / 38.95 ms │  1.18x slower │
│ QQuery 31 │     116.26 / 117.97 ±2.68 / 123.29 ms │     156.78 / 158.71 ±1.49 / 161.25 ms │  1.35x slower │
│ QQuery 32 │        23.32 / 23.59 ±0.22 / 23.95 ms │        23.61 / 24.22 ±0.37 / 24.74 ms │     no change │
│ QQuery 33 │        42.75 / 43.20 ±0.28 / 43.54 ms │        50.50 / 51.18 ±0.46 / 51.85 ms │  1.18x slower │
│ QQuery 34 │        11.04 / 11.34 ±0.23 / 11.73 ms │        10.72 / 10.94 ±0.22 / 11.33 ms │     no change │
│ QQuery 35 │        87.67 / 87.95 ±0.24 / 88.25 ms │     113.18 / 114.87 ±2.01 / 117.65 ms │  1.31x slower │
│ QQuery 36 │           7.30 / 7.41 ±0.12 / 7.63 ms │           7.00 / 7.14 ±0.15 / 7.38 ms │     no change │
│ QQuery 37 │          8.10 / 8.76 ±0.70 / 10.04 ms │           7.28 / 7.55 ±0.18 / 7.78 ms │ +1.16x faster │
│ QQuery 38 │        78.06 / 78.47 ±0.35 / 78.95 ms │        81.86 / 85.07 ±2.55 / 88.38 ms │  1.08x slower │
│ QQuery 39 │     112.65 / 117.06 ±3.55 / 121.58 ms │       97.92 / 99.55 ±1.21 / 101.43 ms │ +1.18x faster │
│ QQuery 40 │        24.97 / 26.30 ±1.73 / 29.70 ms │        22.88 / 23.14 ±0.20 / 23.41 ms │ +1.14x faster │
│ QQuery 41 │        15.60 / 15.73 ±0.18 / 16.08 ms │        15.96 / 16.23 ±0.15 / 16.37 ms │     no change │
│ QQuery 42 │        25.47 / 25.89 ±0.35 / 26.49 ms │        32.60 / 32.81 ±0.22 / 33.21 ms │  1.27x slower │
│ QQuery 43 │           5.71 / 5.81 ±0.13 / 6.06 ms │           5.39 / 5.57 ±0.13 / 5.77 ms │     no change │
│ QQuery 44 │        11.62 / 11.80 ±0.10 / 11.91 ms │        11.33 / 11.51 ±0.12 / 11.70 ms │     no change │
│ QQuery 45 │        44.02 / 45.76 ±0.97 / 46.70 ms │        31.76 / 32.00 ±0.22 / 32.33 ms │ +1.43x faster │
│ QQuery 46 │        14.09 / 14.54 ±0.29 / 15.00 ms │        14.28 / 14.60 ±0.32 / 15.17 ms │     no change │
│ QQuery 47 │     251.80 / 254.83 ±1.58 / 256.31 ms │     243.29 / 247.81 ±3.21 / 253.11 ms │     no change │
│ QQuery 48 │     105.81 / 106.60 ±0.51 / 107.40 ms │     186.44 / 187.69 ±1.31 / 190.12 ms │  1.76x slower │
│ QQuery 49 │        82.63 / 83.58 ±0.58 / 84.10 ms │        80.93 / 81.68 ±0.67 / 82.80 ms │     no change │
│ QQuery 50 │        61.66 / 63.12 ±2.58 / 68.26 ms │     139.63 / 141.39 ±1.61 / 144.12 ms │  2.24x slower │
│ QQuery 51 │        95.44 / 96.23 ±0.65 / 97.27 ms │       98.71 / 99.98 ±1.22 / 102.21 ms │     no change │
│ QQuery 52 │        25.38 / 25.46 ±0.10 / 25.66 ms │        32.42 / 32.91 ±0.30 / 33.23 ms │  1.29x slower │
│ QQuery 53 │        31.44 / 31.53 ±0.06 / 31.61 ms │        35.31 / 35.60 ±0.17 / 35.83 ms │  1.13x slower │
│ QQuery 54 │        57.07 / 57.36 ±0.31 / 57.89 ms │        33.42 / 34.50 ±0.88 / 35.97 ms │ +1.66x faster │
│ QQuery 55 │        24.64 / 24.97 ±0.23 / 25.34 ms │        31.49 / 31.70 ±0.15 / 31.95 ms │  1.27x slower │
│ QQuery 56 │        41.39 / 41.63 ±0.18 / 41.80 ms │        53.59 / 54.21 ±0.37 / 54.73 ms │  1.30x slower │
│ QQuery 57 │     183.80 / 185.65 ±1.14 / 187.15 ms │     163.09 / 164.01 ±0.83 / 165.52 ms │ +1.13x faster │
│ QQuery 58 │     120.48 / 121.34 ±0.70 / 122.35 ms │        82.91 / 83.18 ±0.32 / 83.74 ms │ +1.46x faster │
│ QQuery 59 │     119.87 / 120.42 ±0.58 / 121.30 ms │        79.56 / 80.03 ±0.45 / 80.88 ms │ +1.50x faster │
│ QQuery 60 │        41.28 / 41.65 ±0.41 / 42.46 ms │        48.81 / 49.37 ±0.31 / 49.70 ms │  1.19x slower │
│ QQuery 61 │        14.54 / 15.11 ±1.02 / 17.15 ms │        13.66 / 13.77 ±0.11 / 13.93 ms │ +1.10x faster │
│ QQuery 62 │        47.57 / 48.09 ±0.41 / 48.72 ms │        41.47 / 41.77 ±0.23 / 42.13 ms │ +1.15x faster │
│ QQuery 63 │        31.93 / 32.33 ±0.75 / 33.83 ms │        35.57 / 35.76 ±0.13 / 35.96 ms │  1.11x slower │
│ QQuery 64 │     474.56 / 481.06 ±5.01 / 487.71 ms │    906.50 / 914.89 ±10.51 / 935.29 ms │  1.90x slower │
│ QQuery 65 │     146.69 / 148.68 ±2.23 / 152.64 ms │ 1419.98 / 1467.95 ±27.10 / 1495.16 ms │  9.87x slower │
│ QQuery 66 │        85.36 / 86.48 ±0.88 / 87.91 ms │        73.33 / 73.78 ±0.28 / 74.04 ms │ +1.17x faster │
│ QQuery 67 │     263.55 / 268.23 ±3.37 / 272.47 ms │     274.60 / 281.80 ±5.02 / 287.18 ms │  1.05x slower │
│ QQuery 68 │        14.55 / 14.69 ±0.12 / 14.89 ms │        14.89 / 15.06 ±0.13 / 15.26 ms │     no change │
│ QQuery 69 │        79.50 / 79.85 ±0.36 / 80.49 ms │     101.87 / 102.64 ±0.55 / 103.31 ms │  1.29x slower │
│ QQuery 70 │     108.81 / 113.28 ±4.58 / 119.50 ms │     116.61 / 121.10 ±3.63 / 127.51 ms │  1.07x slower │
│ QQuery 71 │        36.60 / 37.30 ±0.46 / 38.03 ms │        44.73 / 45.51 ±0.58 / 46.15 ms │  1.22x slower │
│ QQuery 72 │ 2148.90 / 2227.84 ±54.81 / 2320.53 ms │     235.02 / 240.51 ±3.83 / 246.38 ms │ +9.26x faster │
│ QQuery 73 │        10.50 / 10.58 ±0.08 / 10.72 ms │        10.18 / 10.44 ±0.22 / 10.73 ms │     no change │
│ QQuery 74 │     208.65 / 215.24 ±6.40 / 226.46 ms │     148.16 / 151.07 ±2.45 / 154.45 ms │ +1.42x faster │
│ QQuery 75 │     154.27 / 157.60 ±2.39 / 161.73 ms │     205.59 / 207.21 ±2.41 / 211.95 ms │  1.31x slower │
│ QQuery 76 │        37.90 / 38.30 ±0.47 / 39.21 ms │        43.52 / 43.78 ±0.28 / 44.32 ms │  1.14x slower │
│ QQuery 77 │        65.44 / 66.67 ±1.30 / 69.06 ms │        73.30 / 74.09 ±1.22 / 76.51 ms │  1.11x slower │
│ QQuery 78 │     201.10 / 203.86 ±1.82 / 206.33 ms │     166.37 / 171.50 ±3.65 / 176.90 ms │ +1.19x faster │
│ QQuery 79 │        71.31 / 72.56 ±1.30 / 74.88 ms │        84.06 / 84.93 ±0.60 / 85.85 ms │  1.17x slower │
│ QQuery 80 │     105.90 / 107.65 ±1.04 / 108.86 ms │      99.29 / 102.71 ±4.02 / 108.95 ms │     no change │
│ QQuery 81 │        26.19 / 27.91 ±2.95 / 33.81 ms │        28.90 / 29.47 ±0.43 / 29.95 ms │  1.06x slower │
│ QQuery 82 │        18.06 / 18.65 ±0.50 / 19.49 ms │        18.95 / 19.12 ±0.16 / 19.38 ms │     no change │
│ QQuery 83 │        40.69 / 41.40 ±0.86 / 43.06 ms │        36.56 / 37.11 ±0.35 / 37.60 ms │ +1.12x faster │
│ QQuery 84 │        45.27 / 45.55 ±0.17 / 45.72 ms │        58.84 / 60.45 ±1.17 / 61.52 ms │  1.33x slower │
│ QQuery 85 │     141.92 / 143.59 ±1.21 / 145.42 ms │     251.39 / 252.89 ±1.69 / 256.09 ms │  1.76x slower │
│ QQuery 86 │        27.28 / 27.48 ±0.19 / 27.81 ms │        30.05 / 30.96 ±1.01 / 32.89 ms │  1.13x slower │
│ QQuery 87 │        74.46 / 76.02 ±0.82 / 76.73 ms │        85.56 / 86.39 ±0.98 / 88.27 ms │  1.14x slower │
│ QQuery 88 │        67.93 / 68.68 ±0.78 / 70.10 ms │        70.18 / 70.61 ±0.23 / 70.83 ms │     no change │
│ QQuery 89 │        37.92 / 39.05 ±0.90 / 40.23 ms │        47.54 / 49.40 ±3.06 / 55.50 ms │  1.26x slower │
│ QQuery 90 │        18.78 / 18.94 ±0.14 / 19.14 ms │        20.04 / 20.24 ±0.15 / 20.48 ms │  1.07x slower │
│ QQuery 91 │        54.27 / 55.09 ±0.52 / 55.87 ms │        69.38 / 69.93 ±0.32 / 70.26 ms │  1.27x slower │
│ QQuery 92 │        31.52 / 31.94 ±0.29 / 32.38 ms │        36.88 / 37.30 ±0.24 / 37.64 ms │  1.17x slower │
│ QQuery 93 │        53.10 / 54.41 ±0.81 / 55.44 ms │        52.57 / 53.56 ±0.98 / 55.39 ms │     no change │
│ QQuery 94 │        40.56 / 41.37 ±0.78 / 42.74 ms │        47.12 / 47.82 ±0.38 / 48.17 ms │  1.16x slower │
│ QQuery 95 │        87.59 / 88.76 ±0.83 / 89.71 ms │     129.38 / 130.09 ±0.61 / 131.05 ms │  1.47x slower │
│ QQuery 96 │        25.66 / 25.77 ±0.14 / 26.06 ms │        27.97 / 28.71 ±0.81 / 29.95 ms │  1.11x slower │
│ QQuery 97 │        48.40 / 49.69 ±1.74 / 53.08 ms │        51.80 / 52.55 ±0.42 / 53.00 ms │  1.06x slower │
│ QQuery 98 │        44.94 / 46.31 ±1.10 / 47.99 ms │        36.37 / 36.58 ±0.16 / 36.79 ms │ +1.27x faster │
│ QQuery 99 │        72.01 / 72.74 ±0.81 / 74.25 ms │        60.99 / 61.19 ±0.16 / 61.39 ms │ +1.19x faster │
└───────────┴───────────────────────────────────────┴───────────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                                ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                                │ 11127.86ms │
│ Total Time (benefits-from-output-partitioning)   │ 11496.85ms │
│ Average Time (HEAD)                              │   112.40ms │
│ Average Time (benefits-from-output-partitioning) │   116.13ms │
│ Queries Faster                                   │         27 │
│ Queries Slower                                   │         49 │
│ Queries with No Change                           │         23 │
│ Queries with Failure                             │          0 │
└──────────────────────────────────────────────────┴────────────┘

Resource Usage

tpcds — base (merge-base)

Metric	Value
Wall time	60.0s
Peak memory	6.9 GiB
Avg memory	6.1 GiB
CPU user	245.7s
CPU sys	6.1s
Peak spill	0 B

tpcds — branch

Metric	Value
Wall time	60.0s
Peak memory	6.9 GiB
Avg memory	6.2 GiB
CPU user	155.5s
CPU sys	5.2s
Peak spill	0 B

adriangbot · 2026-05-21T19:42:45Z

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

CPU Details (lscpu)

Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected

Details

Comparing HEAD and benefits-from-output-partitioning
--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃                                  HEAD ┃     benefits-from-output-partitioning ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0  │          1.24 / 4.70 ±6.80 / 18.29 ms │          1.25 / 4.70 ±6.84 / 18.39 ms │     no change │
│ QQuery 1  │        12.49 / 12.77 ±0.16 / 12.94 ms │        13.16 / 13.36 ±0.12 / 13.53 ms │     no change │
│ QQuery 2  │        35.32 / 35.69 ±0.21 / 35.88 ms │        36.41 / 36.77 ±0.28 / 37.25 ms │     no change │
│ QQuery 3  │        30.59 / 31.14 ±0.61 / 32.30 ms │        30.78 / 31.19 ±0.44 / 31.79 ms │     no change │
│ QQuery 4  │     218.81 / 224.98 ±3.97 / 231.21 ms │     217.92 / 223.47 ±4.30 / 228.57 ms │     no change │
│ QQuery 5  │     268.04 / 270.47 ±1.89 / 272.55 ms │     264.73 / 270.21 ±3.58 / 275.65 ms │     no change │
│ QQuery 6  │           1.28 / 1.42 ±0.21 / 1.83 ms │           1.29 / 1.44 ±0.22 / 1.87 ms │     no change │
│ QQuery 7  │        13.80 / 13.85 ±0.04 / 13.92 ms │        14.22 / 14.48 ±0.16 / 14.73 ms │     no change │
│ QQuery 8  │     314.89 / 321.68 ±5.18 / 327.59 ms │     314.39 / 317.41 ±1.80 / 319.74 ms │     no change │
│ QQuery 9  │    430.50 / 449.63 ±13.18 / 468.51 ms │     448.13 / 455.42 ±8.89 / 472.35 ms │     no change │
│ QQuery 10 │        69.96 / 70.75 ±0.61 / 71.55 ms │        71.44 / 71.57 ±0.11 / 71.77 ms │     no change │
│ QQuery 11 │        80.53 / 81.80 ±0.74 / 82.81 ms │        80.94 / 81.87 ±0.67 / 82.58 ms │     no change │
│ QQuery 12 │     261.53 / 266.61 ±4.04 / 273.47 ms │     252.50 / 259.05 ±6.17 / 268.44 ms │     no change │
│ QQuery 13 │     366.45 / 374.52 ±9.12 / 392.27 ms │    379.29 / 405.09 ±25.46 / 449.33 ms │  1.08x slower │
│ QQuery 14 │     275.64 / 280.16 ±4.21 / 286.33 ms │     269.66 / 274.20 ±6.25 / 286.39 ms │     no change │
│ QQuery 15 │     262.75 / 267.39 ±4.76 / 274.40 ms │    258.95 / 272.36 ±14.27 / 299.36 ms │     no change │
│ QQuery 16 │     602.03 / 607.95 ±3.30 / 611.18 ms │     609.01 / 611.62 ±1.82 / 614.71 ms │     no change │
│ QQuery 17 │     600.74 / 609.07 ±6.41 / 616.74 ms │     607.64 / 614.08 ±7.46 / 628.24 ms │     no change │
│ QQuery 18 │  1232.45 / 1244.44 ±6.63 / 1250.73 ms │ 1226.94 / 1246.90 ±17.60 / 1274.01 ms │     no change │
│ QQuery 19 │        28.02 / 28.27 ±0.23 / 28.68 ms │        27.08 / 31.94 ±5.61 / 40.69 ms │  1.13x slower │
│ QQuery 20 │    515.34 / 527.02 ±17.01 / 560.84 ms │    513.35 / 528.33 ±17.89 / 562.86 ms │     no change │
│ QQuery 21 │     591.08 / 594.75 ±4.84 / 603.97 ms │     591.58 / 594.28 ±3.75 / 601.55 ms │     no change │
│ QQuery 22 │ 1046.00 / 1065.70 ±15.07 / 1089.58 ms │  1051.42 / 1064.06 ±7.26 / 1074.07 ms │     no change │
│ QQuery 23 │ 3151.05 / 3178.54 ±17.64 / 3197.76 ms │     706.33 / 722.10 ±8.65 / 729.46 ms │ +4.40x faster │
│ QQuery 24 │        42.33 / 43.69 ±1.02 / 45.27 ms │        39.78 / 43.14 ±6.14 / 55.41 ms │     no change │
│ QQuery 25 │     110.70 / 112.43 ±1.78 / 115.84 ms │     107.23 / 110.22 ±2.69 / 114.76 ms │     no change │
│ QQuery 26 │        42.63 / 42.98 ±0.54 / 44.06 ms │        41.59 / 42.37 ±1.39 / 45.14 ms │     no change │
│ QQuery 27 │     665.96 / 671.33 ±4.35 / 676.34 ms │     644.48 / 650.88 ±5.26 / 658.03 ms │     no change │
│ QQuery 28 │ 2996.10 / 3016.95 ±12.56 / 3034.36 ms │  3008.42 / 3019.59 ±8.50 / 3030.30 ms │     no change │
│ QQuery 29 │        41.26 / 45.01 ±5.52 / 55.90 ms │      41.68 / 54.15 ±24.48 / 103.11 ms │  1.20x slower │
│ QQuery 30 │     297.06 / 300.64 ±3.86 / 307.89 ms │     302.21 / 306.47 ±4.28 / 313.32 ms │     no change │
│ QQuery 31 │     282.28 / 291.44 ±9.82 / 309.31 ms │     311.47 / 321.58 ±8.18 / 332.34 ms │  1.10x slower │
│ QQuery 32 │    901.68 / 916.53 ±10.51 / 928.96 ms │    886.90 / 907.43 ±13.50 / 920.89 ms │     no change │
│ QQuery 33 │ 1396.36 / 1410.93 ±10.57 / 1423.68 ms │ 1366.18 / 1403.96 ±20.35 / 1426.30 ms │     no change │
│ QQuery 34 │ 1417.23 / 1443.94 ±20.30 / 1477.03 ms │ 1401.07 / 1417.44 ±14.50 / 1434.86 ms │     no change │
│ QQuery 35 │     274.63 / 281.18 ±6.91 / 294.21 ms │    271.16 / 290.85 ±25.52 / 341.32 ms │     no change │
│ QQuery 36 │        64.11 / 69.33 ±5.30 / 79.20 ms │        63.67 / 70.92 ±7.07 / 84.06 ms │     no change │
│ QQuery 37 │        35.00 / 37.74 ±2.97 / 42.66 ms │        35.00 / 35.97 ±1.26 / 38.42 ms │     no change │
│ QQuery 38 │        39.89 / 45.36 ±4.10 / 52.29 ms │        41.28 / 44.07 ±2.61 / 49.01 ms │     no change │
│ QQuery 39 │     138.16 / 147.18 ±5.92 / 154.81 ms │     133.60 / 142.16 ±8.41 / 157.21 ms │     no change │
│ QQuery 40 │        13.84 / 16.14 ±3.99 / 24.11 ms │        13.84 / 14.05 ±0.21 / 14.45 ms │ +1.15x faster │
│ QQuery 41 │        13.52 / 14.75 ±1.74 / 18.14 ms │        13.41 / 13.54 ±0.08 / 13.62 ms │ +1.09x faster │
│ QQuery 42 │        12.93 / 18.27 ±6.78 / 29.54 ms │        13.42 / 15.33 ±3.63 / 22.57 ms │ +1.19x faster │
└───────────┴───────────────────────────────────────┴───────────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                                ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                                │ 19489.11ms │
│ Total Time (benefits-from-output-partitioning)   │ 17049.98ms │
│ Average Time (HEAD)                              │   453.24ms │
│ Average Time (benefits-from-output-partitioning) │   396.51ms │
│ Queries Faster                                   │          4 │
│ Queries Slower                                   │          4 │
│ Queries with No Change                           │         35 │
│ Queries with Failure                             │          0 │
└──────────────────────────────────────────────────┴────────────┘

Resource Usage

clickbench_partitioned — base (merge-base)

Metric	Value
Wall time	100.0s
Peak memory	30.9 GiB
Avg memory	23.3 GiB
CPU user	1017.4s
CPU sys	61.3s
Peak spill	0 B

clickbench_partitioned — branch

Metric	Value
Wall time	90.0s
Peak memory	31.2 GiB
Avg memory	23.3 GiB
CPU user	882.7s
CPU sys	50.4s
Peak spill	0 B

adriangb and others added 5 commits May 20, 2026 00:53

fmt

1404755

github-actions bot added labels on May 21, 2026: optimizer (Optimizer rules), core (Core DataFusion crate), sqllogictest (SQL Logic Tests (.slt)), datasource (Changes to the datasource crate), physical-plan (Changes to the physical-plan crate)


feat: add ExecutionPlan::benefits_from_output_partitioning #22440
adriangb wants to merge 5 commits into apache:main from adriangb:benefits-from-output-partitioning
