
feat: add ExecutionPlan::benefits_from_output_partitioning #22440

Draft
adriangb wants to merge 5 commits into apache:main from adriangb:benefits-from-output-partitioning

@adriangb (Contributor)

Stacks on top of #22384 — both branches need to land for the diff to be coherent.

Summary

Add ExecutionPlan::benefits_from_output_partitioning() -> bool (default false) as the symmetric counterpart of the existing benefits_from_input_partitioning. The optimizer's EnforceDistribution already inserts a RepartitionExec(RoundRobinBatch(target_partitions)) when a parent's benefits_from_input_partitioning is true. With this addition, it also inserts one when the child itself opts in via benefits_from_output_partitioning. The existing round-robin path does the work, with no special handling in repartitioned() or DistributionContext bookkeeping.
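The decision described above can be sketched as a toy model. Everything in it is hypothetical and simplified: `PlanNode` stands in for `ExecutionPlan`, both hints are collapsed to a single bool, and `should_add_roundrobin` reduces the `EnforceDistribution` logic (which also weighs target_partitions, existing partitioning and ordering) to one `||`.

```rust
/// Hypothetical stand-in for `ExecutionPlan`, reduced to the two hints.
trait PlanNode {
    /// Existing hint (simplified to one bool): this node wants its input
    /// spread across partitions.
    fn benefits_from_input_partitioning(&self) -> bool {
        false
    }

    /// The new hint: this node's own output is worth spreading across
    /// partitions. Defaults to false, so existing nodes are unaffected.
    fn benefits_from_output_partitioning(&self) -> bool {
        false
    }
}

/// Simplified decision: insert a round-robin repartition between `parent`
/// and `child` if either side opts in.
fn should_add_roundrobin(parent: &dyn PlanNode, child: &dyn PlanNode) -> bool {
    parent.benefits_from_input_partitioning() || child.benefits_from_output_partitioning()
}

/// Single-partition consumer, e.g. a SortExec.
struct Sort;
impl PlanNode for Sort {}

/// A FilterExec above the scan (the plan shape before #22384).
struct Filter;
impl PlanNode for Filter {
    fn benefits_from_input_partitioning(&self) -> bool {
        true
    }
}

/// A scan that owns its predicate and evaluates it post-decode.
struct FilteredScan;
impl PlanNode for FilteredScan {
    fn benefits_from_output_partitioning(&self) -> bool {
        true
    }
}

/// A scan with no predicate.
struct PlainScan;
impl PlanNode for PlainScan {}

fn main() {
    // Old shape: the FilterExec parent asks for partitioned input.
    println!("Filter over PlainScan: {}", should_add_roundrobin(&Filter, &PlainScan)); // true
    // New shape: the Sort parent does not ask, but the filtered scan opts in.
    println!("Sort over FilteredScan: {}", should_add_roundrobin(&Sort, &FilteredScan)); // true
    // Nobody opts in: no round-robin.
    println!("Sort over PlainScan: {}", should_add_roundrobin(&Sort, &PlainScan)); // false
}
```

Defaulting the new hint to false is what keeps the change opt-in: only nodes that override it change plan shape.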

Why

When a parquet scan owns a filter and #22384 runs it post-decode inside the scan thread (the pushdown_filters = false path), there is no sibling FilterExec above the scan. Single-partition consumers (SortExec, CoalescePartitionsExec, a CollectLeft hash-join build) therefore inherit a single-threaded scan + filter, even when the cluster has plenty of idle cores. The companion PRs (#22438, which disables join dynamic filter pushdown by default, and #22439, which lowers repartition_file_min_size to 1 MiB) close most of the regression #22384 introduces, but leave ~18 TPC-DS queries still slower than main on small dimension-table joins where byte-range splitting alone can't reach target_partitions. This PR closes the rest.

Wiring

ExecutionPlan ─┬─ DataSourceExec  -> DataSource::benefits_from_output_partitioning
               │
DataSource ─── FileScanConfig    -> FileSource::benefits_from_output_partitioning
               │
FileSource ─── ParquetSource     -> predicate.is_some() && !pushdown_filters()

Gating on pushdown_filters matters: with pushdown_filters = true, RowFilter does the work during decode, so the round-robin wouldn't help and would also defeat limit pushdown for ordered scans.
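A minimal sketch of the delegation in the wiring diagram, assuming simplified mirrors of the real types: the trait and struct names follow the diagram, but the fields (a string predicate, a plain pushdown_filters bool) and the `scan` helper are illustrative, not DataFusion's actual definitions.

```rust
// Hypothetical, simplified mirrors of the layers in the wiring diagram.
// Each layer forwards the question down; only ParquetSource answers it.

trait FileSource {
    fn benefits_from_output_partitioning(&self) -> bool {
        false
    }
}

struct ParquetSource {
    predicate: Option<String>, // stand-in for the real physical expression
    pushdown_filters: bool,
}

impl ParquetSource {
    fn pushdown_filters(&self) -> bool {
        self.pushdown_filters
    }
}

impl FileSource for ParquetSource {
    fn benefits_from_output_partitioning(&self) -> bool {
        // Only when the scan owns a predicate that it evaluates post-decode;
        // with pushdown_filters = true the RowFilter already does the work.
        self.predicate.is_some() && !self.pushdown_filters()
    }
}

trait DataSource {
    fn benefits_from_output_partitioning(&self) -> bool {
        false
    }
}

struct FileScanConfig {
    file_source: Box<dyn FileSource>,
}

impl DataSource for FileScanConfig {
    fn benefits_from_output_partitioning(&self) -> bool {
        self.file_source.benefits_from_output_partitioning()
    }
}

struct DataSourceExec {
    data_source: Box<dyn DataSource>,
}

impl DataSourceExec {
    // Stands in for the ExecutionPlan trait method on the real node.
    fn benefits_from_output_partitioning(&self) -> bool {
        self.data_source.benefits_from_output_partitioning()
    }
}

/// Hypothetical helper that builds the three-layer stack.
fn scan(predicate: Option<&str>, pushdown_filters: bool) -> DataSourceExec {
    DataSourceExec {
        data_source: Box::new(FileScanConfig {
            file_source: Box::new(ParquetSource {
                predicate: predicate.map(str::to_string),
                pushdown_filters,
            }),
        }),
    }
}

fn main() {
    println!("{}", scan(Some("a = 1"), false).benefits_from_output_partitioning()); // true
    println!("{}", scan(Some("a = 1"), true).benefits_from_output_partitioning()); // false
    println!("{}", scan(None, false).benefits_from_output_partitioning()); // false
}
```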

Benchmark numbers (12 cores, SF1)

Run with the companion PRs (#22438 + #22439) applied, i.e. with join dynamic filter pushdown disabled and repartition_file_min_size lowered to 1 MiB:

Queries slower than main:

Suite        PR #22384 alone   + this PR
TPC-H        2                 2
TPC-DS       18                2
ClickBench   3                 4

The remaining residuals (TPC-H Q5 ~3%, TPC-DS Q41 ~4% on a 15 ms query, ClickBench Q13 ~5%) look like fixed per-batch overhead in the post-scan filter path itself; the other slower-than-main queries are within run-to-run variance.

Test plan

  • cargo test --test sqllogictests: all 472 files pass; every snapshot update shows RepartitionExec: partitioning=RoundRobinBatch(N) inserted above a filtered scan that sits below a single-partition parent.
  • cargo test -p datafusion --test core_integration
  • run benchmarks

adriangb and others added 5 commits May 20, 2026 00:53
…ream

`RowGroupsPrunedParquetOpen::build_stream` used to inline the
`build_projection_read_plan` + `reassign_expr_columns` + `make_projector`
+ `replace_schema` triple right next to the decoder / stream wiring,
which made the opener's main orchestration body harder to follow.

Move that triple into a new `post_scan_filter` module exposing a single
`DecoderProjection::build(projection, physical_file_schema, parquet_schema,
output_schema)` entry point that returns the projection mask, projector,
and replace_schema flag. The opener becomes a single call. `replace_schema`
is now derived from the projector's output schema (rather than the read
plan's projected schema) so it stays correct under future widening of the
decoder mask.

`DecoderBuilderConfig` now carries the projection mask directly
(`projection_mask: &ProjectionMask`) instead of the full `ParquetReadPlan`,
since the read plan's `projected_schema` is no longer needed in this layer.

No behaviour change. All existing parquet tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`build_row_filter` (and its `RowFilterGenerator` wrapper) silently dropped
conjuncts that `FilterCandidateBuilder::build` rejected (`Ok(None)` was
`.flatten()`-ed away) and swallowed whole-build errors. By the time
`build_row_filter` runs, `ParquetSource::try_pushdown_filters` has already
accepted the filter and the parent `FilterExec` has been removed, so those
dropped conjuncts were never applied anywhere — wrong results.

Most reproducible trigger: the per-file expr adapter rewrites a predicate
that was pushable at *table schema* time into something the
`PushdownChecker` rejects at *physical file schema* time (schema
evolution / coercion / whole-struct references introduced by the rewrite).

Surface the rejected conjuncts instead of dropping them:

  - `build_row_filter` now returns
    `Result<(Option<RowFilter>, Vec<Arc<dyn PhysicalExpr>>)>`. The second
    element is the conjuncts it could not place. Bench / in-file test call
    sites updated.
  - `RowFilterGenerator` exposes `rejected_conjuncts()`. On a whole-file
    build error it routes every conjunct through that list, so an error no
    longer relaxes the predicate.
  - `DecoderProjection::build` grows a `post_scan_conjuncts` parameter and
    a `post_scan_filter: Option<PostScanFilter>` field. When non-empty it
    widens the decoder mask (over the user projection ∪ post-scan filter
    columns), rebases the conjuncts onto the stream schema, and returns a
    `PostScanFilter` that the stream applies to every decoded batch with
    SQL `WHERE` semantics (mirroring `FilterExec`'s `batch_filter`).
  - `PushDecoderStreamState` carries the optional `PostScanFilter` and
    applies it in the `DecodeResult::Data` arm, skipping empty batches.
  - The decoder-local LIMIT is unsafe with a post-scan filter (the decoder
    would short-circuit before the filter rejects enough rows), so the
    opener routes the limit to `remaining_limit` whenever a post-scan
    filter is present.
  - New `post_scan_rows_pruned` / `post_scan_rows_matched` counters and
    `post_scan_filter_eval_time` `Time` on `ParquetFileMetrics`, mirroring
    the existing `pushdown_rows_*` / `row_pushdown_eval_time` so
    `EXPLAIN ANALYZE` keeps surfacing filter cost.
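A sketch of the new contract for rejected conjuncts, under toy assumptions: conjuncts are strings, and the hypothetical `split_conjuncts` / `CandidateCheck` stand in for `build_row_filter` and `FilterCandidateBuilder::build`. It only illustrates the routing rule (rejected or errored conjuncts are returned, never dropped), not the parquet RowFilter itself.

```rust
/// Hypothetical whole-build failure marker.
#[derive(Debug)]
struct BuildError;

/// Stand-in for a per-conjunct candidate check: Ok(Some(..)) = usable in
/// the RowFilter, Ok(None) = rejected for this file schema, Err = build failure.
type CandidateCheck = fn(&str) -> Result<Option<String>, BuildError>;

/// Returns (row-filter conjuncts, rejected conjuncts). On a build error,
/// every conjunct is routed to the rejected list so the predicate is never
/// relaxed.
fn split_conjuncts(conjuncts: &[&str], check: CandidateCheck) -> (Vec<String>, Vec<String>) {
    let mut accepted = Vec::new();
    let mut rejected = Vec::new();
    for &c in conjuncts {
        match check(c) {
            Ok(Some(candidate)) => accepted.push(candidate),
            // Previously flattened away; now kept for the post-scan filter.
            Ok(None) => rejected.push(c.to_string()),
            // Whole-build error: nothing goes into the RowFilter.
            Err(_) => return (Vec::new(), conjuncts.iter().map(|s| s.to_string()).collect()),
        }
    }
    (accepted, rejected)
}

/// Toy rule: pretend the struct column `s` cannot be row-filtered.
fn reject_structs(c: &str) -> Result<Option<String>, BuildError> {
    if c.starts_with("s ") { Ok(None) } else { Ok(Some(c.to_string())) }
}

fn always_fail(_: &str) -> Result<Option<String>, BuildError> {
    Err(BuildError)
}

fn main() {
    let conjuncts = ["a > 1", "s IS NOT NULL"];
    println!("{:?}", split_conjuncts(&conjuncts, reject_structs));
    // (["a > 1"], ["s IS NOT NULL"])
    println!("{:?}", split_conjuncts(&conjuncts, always_fail));
    // ([], ["a > 1", "s IS NOT NULL"])
}
```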

Two regression tests:

  - `build_row_filter_surfaces_rejected_struct_conjunct` (`row_filter.rs`)
    asserts the new API contract directly — the rejected struct conjunct
    is returned, not dropped.
  - `rejected_struct_conjunct_runs_post_scan_not_dropped` (`opener/mod.rs`)
    is end-to-end: with `pushdown_filters=true` and a `s IS NOT NULL`
    predicate over a struct column where one row is NULL, `main` returns 3
    rows (conjunct silently dropped, predicate relaxed); after this fix it
    correctly returns 2.
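The end-to-end scenario, and why the decoder-local LIMIT has to move, can be modelled with a toy three-row struct column where `None` is a NULL struct. The `Row` type and helpers are hypothetical; `where_filter` mirrors the SQL WHERE rule of keeping only rows whose predicate is TRUE.

```rust
/// Hypothetical row model: the struct column `s`, where `None` is a NULL
/// struct value and `Some((n, tag))` a non-NULL one.
type Row = Option<(i32, &'static str)>;

/// SQL WHERE semantics (as in FilterExec): keep a row only when the
/// predicate is TRUE; FALSE and NULL (`None`) both drop it.
fn where_filter(rows: &[Row], pred: impl Fn(&Row) -> Option<bool>) -> Vec<Row> {
    rows.iter().copied().filter(|r| pred(r) == Some(true)).collect()
}

/// `s IS NOT NULL` is never NULL itself: always TRUE or FALSE.
fn s_is_not_null(r: &Row) -> Option<bool> {
    Some(r.is_some())
}

fn main() {
    // Three rows, one of them a NULL struct.
    let file: Vec<Row> = vec![Some((1, "x")), None, Some((3, "z"))];

    // Conjunct silently dropped: the predicate is relaxed, all 3 rows return.
    println!("conjunct dropped: {}", file.len()); // 3

    // Conjunct applied post-scan: 2 rows, the expected answer.
    println!("post-scan filter: {}", where_filter(&file, s_is_not_null).len()); // 2

    // A predicate that can itself be NULL: `s.n > 1` is NULL for the NULL
    // struct, so WHERE drops that row too.
    let gt1 = where_filter(&file, |r| r.map(|(n, _)| n > 1));
    println!("s.n > 1: {}", gt1.len()); // 1

    // LIMIT 2 inside the decoder, i.e. before the filter: decoding stops
    // after two rows, the filter rejects one, and only 1 row remains.
    println!("limit before filter: {}", where_filter(&file[..2], s_is_not_null).len()); // 1

    // LIMIT applied after the filter (the remaining_limit route): 2 rows.
    let after: Vec<Row> = where_filter(&file, s_is_not_null).into_iter().take(2).collect();
    println!("limit after filter: {}", after.len()); // 2
}
```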

The `pushdown_filters = false` path is intentionally unchanged in this
commit — `try_pushdown_filters` still leaves the `FilterExec` above the
scan in that case. Always-accepting filters and removing the `FilterExec`
unconditionally is a separate behaviour change in a follow-up commit.

`push_down_filter_parquet.slt` updated for the new `post_scan_rows_*`
metric lines on `EXPLAIN ANALYZE` output.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…st-scan

`ParquetSource::try_pushdown_filters` always returns the per-filter
`Yes` / `No` discriminant from `can_expr_be_pushed_down_with_schemas`,
regardless of the `pushdown_filters` config. The parent `FilterExec` is
always removed for pushable filters, and the scan owns the predicate.

The opener routes the predicate to the post-scan filter when
`pushdown_filters = false`, in addition to the rejected-conjunct path
that already exists for `pushdown_filters = true`:

  - `pushdown_filters = true`  → row-filterable conjuncts via the parquet
    `RowFilter`; any rejected conjuncts via the post-scan filter (the
    correctness fix from the previous commit).
  - `pushdown_filters = false` → the whole predicate runs as a post-scan
    filter on decoded batches (behaviorally identical to a `FilterExec`).
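A sketch of the resulting routing by config, assuming string conjuncts and a caller-supplied `row_filterable` check; `Routing` and `route` are hypothetical stand-ins for the opener's real inputs and outputs.

```rust
/// Where each conjunct of the scan's predicate ends up (hypothetical).
#[derive(Debug, PartialEq)]
struct Routing {
    row_filter: Vec<&'static str>,
    post_scan: Vec<&'static str>,
}

fn route(
    conjuncts: &[&'static str],
    pushdown_filters: bool,
    row_filterable: impl Fn(&str) -> bool,
) -> Routing {
    if pushdown_filters {
        // RowFilter during decode for what it accepts; anything rejected
        // still runs, just post-scan.
        let (row_filter, post_scan): (Vec<&'static str>, Vec<&'static str>) =
            conjuncts.iter().copied().partition(|&c| row_filterable(c));
        Routing { row_filter, post_scan }
    } else {
        // No RowFilter: the whole predicate runs post-scan on decoded
        // batches, which is what a FilterExec above the scan used to do.
        Routing { row_filter: Vec::new(), post_scan: conjuncts.to_vec() }
    }
}

fn main() {
    let conjuncts = ["a > 1", "s IS NOT NULL"];
    // Toy check: pretend the struct column `s` cannot be row-filtered.
    let toy_check = |c: &str| !c.starts_with("s ");
    println!("{:?}", route(&conjuncts, true, toy_check));
    println!("{:?}", route(&conjuncts, false, toy_check));
}
```

Either way every conjunct lands in exactly one bucket, so the scan's output is the same; only where the work happens changes.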

The `pushdown_filters` config keeps its meaning ("build a parquet
`RowFilter`"); doc comments updated.

Plan / test consequences (all results unchanged, plan shape and metrics
change):

  - The `FilterExec` no longer appears above a `DataSourceExec` for
    pushable parquet filters. The predicate appears as `predicate=…` on
    the `DataSourceExec`. Parquet `.slt` files are regenerated to reflect
    this (clickbench, push_down_filter_parquet, projection_pushdown,
    parquet*, etc.). Spurious whitespace churn from `--complete` was
    reverted.
  - Opener / integration tests that asserted "row group not pruned ⇒ all
    rows returned" (e.g. `a = 1` over `[1, 2, 3]` returning 3 rows) are
    updated to reflect the matching-row count, since the scan now applies
    the predicate row-level via the post-scan filter.
  - `FilterExec: id@0 = 1` assertions in DataFrame / view tests become
    `predicate=id@0 = 1` on the `DataSourceExec`.
  - Insta inline snapshots in `parquet.rs` and `explain_analyze.rs` are
    re-accepted (`output_rows=8` → `output_rows=5` plus
    `post_scan_rows_pruned=3`, multi-line plans collapse where the
    `FilterExec`/`RepartitionExec` chain is gone).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When a parquet scan owns a filter and runs it post-decode inside the
scan thread (which the post-scan-filter work in apache#22384 introduces for
the `pushdown_filters = false` case), there is no sibling `FilterExec`
above the scan, and `EnforceDistribution` no longer inserts the
`RoundRobinBatch(target_partitions)` repartition it used to trigger from
the filter's `benefits_from_input_partitioning`. Single-partition
consumers — `SortExec`, `CoalescePartitionsExec`, a `CollectLeft` hash
join build — therefore inherit a single-thread scan + filter, even when
the cluster has plenty of idle cores.

Add `ExecutionPlan::benefits_from_output_partitioning() -> bool`
(default `false`) as the symmetric counterpart of
`benefits_from_input_partitioning`. The optimizer consults it in the
same branch that already decides whether to wrap a child in a
round-robin, so the existing `add_roundrobin_on_top` path does the
work — no special handling in `repartitioned()` or
`DistributionContext` bookkeeping.

Wire it through the data-source stack:

  ExecutionPlan ─┬─ DataSourceExec  -> DataSource::benefits_from_output_partitioning
                 │
  DataSource ─── FileScanConfig    -> FileSource::benefits_from_output_partitioning
                 │
  FileSource ─── ParquetSource     -> predicate.is_some() && !pushdown_filters()

With `pushdown_filters = true` parquet evaluates conjuncts via `RowFilter`
during decode (so the round-robin wouldn't help and would also defeat
limit pushdown), hence the gate.

Restores the parallelism a sibling `FilterExec` used to provide. On
TPC-DS SF1 (12 cores, with `enable_join_dynamic_filter_pushdown=false`
+ `repartition_file_min_size=1 MiB` applied via the companion PRs) the
slower-than-main query count drops from 18 → 2 (and the residuals are
~3-5% noise around the post-scan filter's fixed per-batch cost).
@adriangb (Contributor, Author)

run benchmarks

github-actions (bot) added the labels optimizer (Optimizer rules), core (Core DataFusion crate), sqllogictest (SQL Logic Tests (.slt)), datasource (Changes to the datasource crate) and physical-plan (Changes to the physical-plan crate) on May 21, 2026
@adriangbot

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4512018538-265-8dkwq 6.12.68+ #1 SMP Wed Apr 1 02:23:28 UTC 2026 aarch64 GNU/Linux

CPU Details (lscpu)
Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected

Comparing benefits-from-output-partitioning (6e5c241) to c8b784a (merge-base) diff using: tpcds
Results will be posted here when complete


File an issue against this benchmark runner

@adriangbot

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4512018538-266-lhz9l 6.12.68+ #1 SMP Wed Apr 1 02:23:28 UTC 2026 aarch64 GNU/Linux

Comparing benefits-from-output-partitioning (6e5c241) to c8b784a (merge-base) diff using: tpch
Results will be posted here when complete

@adriangbot

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4512018538-264-q6phn 6.12.68+ #1 SMP Wed Apr 1 02:23:28 UTC 2026 aarch64 GNU/Linux

Comparing benefits-from-output-partitioning (6e5c241) to c8b784a (merge-base) diff using: clickbench_partitioned
Results will be posted here when complete

@adriangbot

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Comparing HEAD and benefits-from-output-partitioning
--------------------
Benchmark tpch_sf1.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┓
┃ Query     ┃                           HEAD ┃ benefits-from-output-partitioning ┃       Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━┩
│ QQuery 1  │ 38.94 / 40.24 ±1.20 / 42.17 ms │    38.71 / 39.79 ±1.19 / 41.82 ms │    no change │
│ QQuery 2  │ 20.56 / 21.17 ±0.57 / 22.22 ms │    20.05 / 20.28 ±0.22 / 20.70 ms │    no change │
│ QQuery 3  │ 33.82 / 35.49 ±1.42 / 37.41 ms │    55.36 / 55.84 ±0.49 / 56.74 ms │ 1.57x slower │
│ QQuery 4  │ 17.74 / 17.98 ±0.21 / 18.36 ms │    19.34 / 19.54 ±0.15 / 19.80 ms │ 1.09x slower │
│ QQuery 5  │ 42.83 / 43.39 ±0.32 / 43.73 ms │    63.65 / 65.83 ±2.41 / 70.51 ms │ 1.52x slower │
│ QQuery 6  │ 16.57 / 16.88 ±0.26 / 17.23 ms │    16.45 / 16.94 ±0.43 / 17.63 ms │    no change │
│ QQuery 7  │ 45.76 / 47.77 ±1.80 / 50.84 ms │    57.18 / 58.29 ±1.49 / 61.16 ms │ 1.22x slower │
│ QQuery 8  │ 45.41 / 45.78 ±0.22 / 46.08 ms │    64.04 / 64.44 ±0.27 / 64.81 ms │ 1.41x slower │
│ QQuery 9  │ 50.20 / 51.79 ±1.67 / 54.88 ms │    73.92 / 74.61 ±0.70 / 75.87 ms │ 1.44x slower │
│ QQuery 10 │ 64.21 / 64.92 ±0.91 / 66.71 ms │    70.86 / 72.79 ±2.90 / 78.53 ms │ 1.12x slower │
│ QQuery 11 │ 13.62 / 14.23 ±0.74 / 15.65 ms │    14.22 / 14.67 ±0.62 / 15.89 ms │    no change │
│ QQuery 12 │ 24.54 / 24.93 ±0.45 / 25.74 ms │    33.93 / 34.52 ±0.79 / 36.06 ms │ 1.38x slower │
│ QQuery 13 │ 34.06 / 36.13 ±2.06 / 39.96 ms │    46.85 / 49.09 ±2.63 / 54.12 ms │ 1.36x slower │
│ QQuery 14 │ 25.74 / 25.97 ±0.19 / 26.27 ms │    37.11 / 37.87 ±0.87 / 39.54 ms │ 1.46x slower │
│ QQuery 15 │ 31.87 / 32.31 ±0.59 / 33.45 ms │    32.73 / 33.19 ±0.51 / 34.13 ms │    no change │
│ QQuery 16 │ 15.24 / 15.33 ±0.09 / 15.50 ms │    18.82 / 18.84 ±0.02 / 18.88 ms │ 1.23x slower │
│ QQuery 17 │ 75.26 / 77.26 ±1.69 / 79.29 ms │ 158.24 / 160.32 ±2.38 / 163.34 ms │ 2.08x slower │
│ QQuery 18 │ 62.94 / 64.30 ±1.10 / 66.24 ms │    83.40 / 84.12 ±0.91 / 85.90 ms │ 1.31x slower │
│ QQuery 19 │ 35.68 / 36.08 ±0.67 / 37.42 ms │    37.79 / 37.96 ±0.11 / 38.14 ms │ 1.05x slower │
│ QQuery 20 │ 38.45 / 39.38 ±0.72 / 40.30 ms │    47.16 / 48.88 ±2.49 / 53.82 ms │ 1.24x slower │
│ QQuery 21 │ 56.11 / 58.47 ±2.90 / 64.08 ms │    57.43 / 57.94 ±0.53 / 58.67 ms │    no change │
│ QQuery 22 │ 23.58 / 24.35 ±0.75 / 25.55 ms │    30.55 / 30.96 ±0.31 / 31.40 ms │ 1.27x slower │
└───────────┴────────────────────────────────┴───────────────────────────────────┴──────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Benchmark Summary                                ┃           ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ Total Time (HEAD)                                │  834.15ms │
│ Total Time (benefits-from-output-partitioning)   │ 1096.71ms │
│ Average Time (HEAD)                              │   37.92ms │
│ Average Time (benefits-from-output-partitioning) │   49.85ms │
│ Queries Faster                                   │         0 │
│ Queries Slower                                   │        16 │
│ Queries with No Change                           │         6 │
│ Queries with Failure                             │         0 │
└──────────────────────────────────────────────────┴───────────┘
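The Change column appears to be the ratio of the two mean times (the middle number in each min / mean ±stddev / max cell). The sketch below reproduces it under that assumption; the ~5% no-change band is inferred from the rows themselves (TPC-H Q2 at ~4.4% faster is "no change" while Q19 at ~5.2% slower is "1.05x slower"), not taken from the runner's documentation.

```rust
/// Reconstructs a "Change" cell from the two mean times in milliseconds.
/// Assumption: ratio of means with a ~5% "no change" band (inferred).
fn change(head_mean_ms: f64, branch_mean_ms: f64) -> String {
    let ratio = branch_mean_ms / head_mean_ms;
    let band = 1.05; // inferred threshold, not documented by the runner
    if ratio > band {
        format!("{:.2}x slower", ratio)
    } else if ratio < 1.0 / band {
        format!("+{:.2}x faster", 1.0 / ratio)
    } else {
        "no change".to_string()
    }
}

fn main() {
    // (query, HEAD mean, branch mean) taken from the TPC-H table above.
    let rows = [
        ("QQuery 2", 21.17, 20.28),  // no change
        ("QQuery 3", 35.49, 55.84),  // 1.57x slower
        ("QQuery 17", 77.26, 160.32), // 2.08x slower
        ("QQuery 19", 36.08, 37.96), // 1.05x slower
    ];
    for (query, head, branch) in rows {
        println!("{query}: {}", change(head, branch));
    }
}
```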

Resource Usage

tpch, base (merge-base) vs. branch

Metric        base      branch
Wall time     5.0s      10.0s
Peak memory   5.5 GiB   5.5 GiB
Avg memory    5.0 GiB   4.8 GiB
CPU user      30.0s     41.4s
CPU sys       2.3s      2.1s
Peak spill    0 B       0 B


@adriangbot

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Comparing HEAD and benefits-from-output-partitioning
--------------------
Benchmark tpcds_sf1.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃                                  HEAD ┃     benefits-from-output-partitioning ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1  │           6.88 / 7.43 ±0.90 / 9.22 ms │           6.59 / 7.13 ±0.87 / 8.86 ms │     no change │
│ QQuery 2  │        83.53 / 83.98 ±0.26 / 84.28 ms │        48.40 / 48.91 ±0.41 / 49.63 ms │ +1.72x faster │
│ QQuery 3  │        30.24 / 30.76 ±0.31 / 31.08 ms │        32.31 / 32.59 ±0.24 / 33.00 ms │  1.06x slower │
│ QQuery 4  │    555.09 / 577.58 ±17.72 / 608.61 ms │     346.25 / 353.89 ±5.49 / 362.40 ms │ +1.63x faster │
│ QQuery 5  │        55.62 / 56.08 ±0.31 / 56.57 ms │        81.76 / 82.54 ±0.65 / 83.42 ms │  1.47x slower │
│ QQuery 6  │        38.87 / 39.46 ±0.47 / 40.29 ms │        36.54 / 37.14 ±0.35 / 37.56 ms │ +1.06x faster │
│ QQuery 7  │     114.81 / 116.75 ±2.47 / 121.49 ms │     140.95 / 144.04 ±2.73 / 149.09 ms │  1.23x slower │
│ QQuery 8  │        41.08 / 41.52 ±0.36 / 42.16 ms │        19.50 / 19.66 ±0.14 / 19.89 ms │ +2.11x faster │
│ QQuery 9  │        56.85 / 57.69 ±0.66 / 58.62 ms │        56.72 / 59.03 ±1.44 / 61.18 ms │     no change │
│ QQuery 10 │        85.20 / 85.95 ±0.53 / 86.77 ms │     112.93 / 114.62 ±1.65 / 117.61 ms │  1.33x slower │
│ QQuery 11 │     368.71 / 373.61 ±3.93 / 378.59 ms │     229.54 / 234.22 ±3.07 / 238.82 ms │ +1.60x faster │
│ QQuery 12 │        30.12 / 30.55 ±0.36 / 31.11 ms │        25.84 / 26.03 ±0.24 / 26.49 ms │ +1.17x faster │
│ QQuery 13 │     134.78 / 135.45 ±0.48 / 136.21 ms │     219.49 / 221.23 ±1.60 / 223.98 ms │  1.63x slower │
│ QQuery 14 │     517.28 / 522.76 ±4.92 / 529.30 ms │     494.69 / 499.13 ±3.64 / 505.54 ms │     no change │
│ QQuery 15 │        65.70 / 66.11 ±0.51 / 67.12 ms │        30.10 / 30.50 ±0.34 / 30.95 ms │ +2.17x faster │
│ QQuery 16 │           7.58 / 7.66 ±0.09 / 7.82 ms │          7.23 / 7.94 ±1.33 / 10.59 ms │     no change │
│ QQuery 17 │        83.23 / 84.16 ±0.58 / 84.98 ms │     136.18 / 137.16 ±0.62 / 137.96 ms │  1.63x slower │
│ QQuery 18 │     155.13 / 156.24 ±1.12 / 158.15 ms │     359.90 / 363.57 ±2.53 / 367.17 ms │  2.33x slower │
│ QQuery 19 │        42.63 / 42.92 ±0.24 / 43.25 ms │        55.31 / 55.99 ±0.43 / 56.65 ms │  1.30x slower │
│ QQuery 20 │        37.32 / 37.79 ±0.43 / 38.31 ms │        28.79 / 29.00 ±0.11 / 29.12 ms │ +1.30x faster │
│ QQuery 21 │        18.94 / 19.10 ±0.18 / 19.43 ms │        17.66 / 17.94 ±0.23 / 18.21 ms │ +1.06x faster │
│ QQuery 22 │        64.99 / 65.70 ±0.55 / 66.66 ms │        66.22 / 67.59 ±0.79 / 68.67 ms │     no change │
│ QQuery 23 │    509.76 / 545.90 ±27.27 / 575.16 ms │    368.82 / 389.24 ±17.32 / 409.49 ms │ +1.40x faster │
│ QQuery 24 │     243.03 / 250.72 ±6.68 / 261.19 ms │     582.06 / 590.32 ±8.40 / 601.64 ms │  2.35x slower │
│ QQuery 25 │     116.93 / 117.67 ±0.75 / 119.04 ms │     160.93 / 161.35 ±0.35 / 161.81 ms │  1.37x slower │
│ QQuery 26 │        72.56 / 73.37 ±0.49 / 74.09 ms │        86.88 / 88.36 ±1.76 / 91.80 ms │  1.20x slower │
│ QQuery 27 │           7.55 / 7.65 ±0.10 / 7.81 ms │           7.42 / 7.61 ±0.15 / 7.79 ms │     no change │
│ QQuery 28 │        60.06 / 63.43 ±1.72 / 64.72 ms │        59.85 / 62.99 ±2.51 / 65.55 ms │     no change │
│ QQuery 29 │     100.37 / 102.73 ±3.08 / 108.55 ms │     170.53 / 175.14 ±4.89 / 184.42 ms │  1.70x slower │
│ QQuery 30 │        31.99 / 32.41 ±0.38 / 33.11 ms │        37.49 / 38.18 ±0.64 / 38.95 ms │  1.18x slower │
│ QQuery 31 │     116.26 / 117.97 ±2.68 / 123.29 ms │     156.78 / 158.71 ±1.49 / 161.25 ms │  1.35x slower │
│ QQuery 32 │        23.32 / 23.59 ±0.22 / 23.95 ms │        23.61 / 24.22 ±0.37 / 24.74 ms │     no change │
│ QQuery 33 │        42.75 / 43.20 ±0.28 / 43.54 ms │        50.50 / 51.18 ±0.46 / 51.85 ms │  1.18x slower │
│ QQuery 34 │        11.04 / 11.34 ±0.23 / 11.73 ms │        10.72 / 10.94 ±0.22 / 11.33 ms │     no change │
│ QQuery 35 │        87.67 / 87.95 ±0.24 / 88.25 ms │     113.18 / 114.87 ±2.01 / 117.65 ms │  1.31x slower │
│ QQuery 36 │           7.30 / 7.41 ±0.12 / 7.63 ms │           7.00 / 7.14 ±0.15 / 7.38 ms │     no change │
│ QQuery 37 │          8.10 / 8.76 ±0.70 / 10.04 ms │           7.28 / 7.55 ±0.18 / 7.78 ms │ +1.16x faster │
│ QQuery 38 │        78.06 / 78.47 ±0.35 / 78.95 ms │        81.86 / 85.07 ±2.55 / 88.38 ms │  1.08x slower │
│ QQuery 39 │     112.65 / 117.06 ±3.55 / 121.58 ms │       97.92 / 99.55 ±1.21 / 101.43 ms │ +1.18x faster │
│ QQuery 40 │        24.97 / 26.30 ±1.73 / 29.70 ms │        22.88 / 23.14 ±0.20 / 23.41 ms │ +1.14x faster │
│ QQuery 41 │        15.60 / 15.73 ±0.18 / 16.08 ms │        15.96 / 16.23 ±0.15 / 16.37 ms │     no change │
│ QQuery 42 │        25.47 / 25.89 ±0.35 / 26.49 ms │        32.60 / 32.81 ±0.22 / 33.21 ms │  1.27x slower │
│ QQuery 43 │           5.71 / 5.81 ±0.13 / 6.06 ms │           5.39 / 5.57 ±0.13 / 5.77 ms │     no change │
│ QQuery 44 │        11.62 / 11.80 ±0.10 / 11.91 ms │        11.33 / 11.51 ±0.12 / 11.70 ms │     no change │
│ QQuery 45 │        44.02 / 45.76 ±0.97 / 46.70 ms │        31.76 / 32.00 ±0.22 / 32.33 ms │ +1.43x faster │
│ QQuery 46 │        14.09 / 14.54 ±0.29 / 15.00 ms │        14.28 / 14.60 ±0.32 / 15.17 ms │     no change │
│ QQuery 47 │     251.80 / 254.83 ±1.58 / 256.31 ms │     243.29 / 247.81 ±3.21 / 253.11 ms │     no change │
│ QQuery 48 │     105.81 / 106.60 ±0.51 / 107.40 ms │     186.44 / 187.69 ±1.31 / 190.12 ms │  1.76x slower │
│ QQuery 49 │        82.63 / 83.58 ±0.58 / 84.10 ms │        80.93 / 81.68 ±0.67 / 82.80 ms │     no change │
│ QQuery 50 │        61.66 / 63.12 ±2.58 / 68.26 ms │     139.63 / 141.39 ±1.61 / 144.12 ms │  2.24x slower │
│ QQuery 51 │        95.44 / 96.23 ±0.65 / 97.27 ms │       98.71 / 99.98 ±1.22 / 102.21 ms │     no change │
│ QQuery 52 │        25.38 / 25.46 ±0.10 / 25.66 ms │        32.42 / 32.91 ±0.30 / 33.23 ms │  1.29x slower │
│ QQuery 53 │        31.44 / 31.53 ±0.06 / 31.61 ms │        35.31 / 35.60 ±0.17 / 35.83 ms │  1.13x slower │
│ QQuery 54 │        57.07 / 57.36 ±0.31 / 57.89 ms │        33.42 / 34.50 ±0.88 / 35.97 ms │ +1.66x faster │
│ QQuery 55 │        24.64 / 24.97 ±0.23 / 25.34 ms │        31.49 / 31.70 ±0.15 / 31.95 ms │  1.27x slower │
│ QQuery 56 │        41.39 / 41.63 ±0.18 / 41.80 ms │        53.59 / 54.21 ±0.37 / 54.73 ms │  1.30x slower │
│ QQuery 57 │     183.80 / 185.65 ±1.14 / 187.15 ms │     163.09 / 164.01 ±0.83 / 165.52 ms │ +1.13x faster │
│ QQuery 58 │     120.48 / 121.34 ±0.70 / 122.35 ms │        82.91 / 83.18 ±0.32 / 83.74 ms │ +1.46x faster │
│ QQuery 59 │     119.87 / 120.42 ±0.58 / 121.30 ms │        79.56 / 80.03 ±0.45 / 80.88 ms │ +1.50x faster │
│ QQuery 60 │        41.28 / 41.65 ±0.41 / 42.46 ms │        48.81 / 49.37 ±0.31 / 49.70 ms │  1.19x slower │
│ QQuery 61 │        14.54 / 15.11 ±1.02 / 17.15 ms │        13.66 / 13.77 ±0.11 / 13.93 ms │ +1.10x faster │
│ QQuery 62 │        47.57 / 48.09 ±0.41 / 48.72 ms │        41.47 / 41.77 ±0.23 / 42.13 ms │ +1.15x faster │
│ QQuery 63 │        31.93 / 32.33 ±0.75 / 33.83 ms │        35.57 / 35.76 ±0.13 / 35.96 ms │  1.11x slower │
│ QQuery 64 │     474.56 / 481.06 ±5.01 / 487.71 ms │    906.50 / 914.89 ±10.51 / 935.29 ms │  1.90x slower │
│ QQuery 65 │     146.69 / 148.68 ±2.23 / 152.64 ms │ 1419.98 / 1467.95 ±27.10 / 1495.16 ms │  9.87x slower │
│ QQuery 66 │        85.36 / 86.48 ±0.88 / 87.91 ms │        73.33 / 73.78 ±0.28 / 74.04 ms │ +1.17x faster │
│ QQuery 67 │     263.55 / 268.23 ±3.37 / 272.47 ms │     274.60 / 281.80 ±5.02 / 287.18 ms │  1.05x slower │
│ QQuery 68 │        14.55 / 14.69 ±0.12 / 14.89 ms │        14.89 / 15.06 ±0.13 / 15.26 ms │     no change │
│ QQuery 69 │        79.50 / 79.85 ±0.36 / 80.49 ms │     101.87 / 102.64 ±0.55 / 103.31 ms │  1.29x slower │
│ QQuery 70 │     108.81 / 113.28 ±4.58 / 119.50 ms │     116.61 / 121.10 ±3.63 / 127.51 ms │  1.07x slower │
│ QQuery 71 │        36.60 / 37.30 ±0.46 / 38.03 ms │        44.73 / 45.51 ±0.58 / 46.15 ms │  1.22x slower │
│ QQuery 72 │ 2148.90 / 2227.84 ±54.81 / 2320.53 ms │     235.02 / 240.51 ±3.83 / 246.38 ms │ +9.26x faster │
│ QQuery 73 │        10.50 / 10.58 ±0.08 / 10.72 ms │        10.18 / 10.44 ±0.22 / 10.73 ms │     no change │
│ QQuery 74 │     208.65 / 215.24 ±6.40 / 226.46 ms │     148.16 / 151.07 ±2.45 / 154.45 ms │ +1.42x faster │
│ QQuery 75 │     154.27 / 157.60 ±2.39 / 161.73 ms │     205.59 / 207.21 ±2.41 / 211.95 ms │  1.31x slower │
│ QQuery 76 │        37.90 / 38.30 ±0.47 / 39.21 ms │        43.52 / 43.78 ±0.28 / 44.32 ms │  1.14x slower │
│ QQuery 77 │        65.44 / 66.67 ±1.30 / 69.06 ms │        73.30 / 74.09 ±1.22 / 76.51 ms │  1.11x slower │
│ QQuery 78 │     201.10 / 203.86 ±1.82 / 206.33 ms │     166.37 / 171.50 ±3.65 / 176.90 ms │ +1.19x faster │
│ QQuery 79 │        71.31 / 72.56 ±1.30 / 74.88 ms │        84.06 / 84.93 ±0.60 / 85.85 ms │  1.17x slower │
│ QQuery 80 │     105.90 / 107.65 ±1.04 / 108.86 ms │      99.29 / 102.71 ±4.02 / 108.95 ms │     no change │
│ QQuery 81 │        26.19 / 27.91 ±2.95 / 33.81 ms │        28.90 / 29.47 ±0.43 / 29.95 ms │  1.06x slower │
│ QQuery 82 │        18.06 / 18.65 ±0.50 / 19.49 ms │        18.95 / 19.12 ±0.16 / 19.38 ms │     no change │
│ QQuery 83 │        40.69 / 41.40 ±0.86 / 43.06 ms │        36.56 / 37.11 ±0.35 / 37.60 ms │ +1.12x faster │
│ QQuery 84 │        45.27 / 45.55 ±0.17 / 45.72 ms │        58.84 / 60.45 ±1.17 / 61.52 ms │  1.33x slower │
│ QQuery 85 │     141.92 / 143.59 ±1.21 / 145.42 ms │     251.39 / 252.89 ±1.69 / 256.09 ms │  1.76x slower │
│ QQuery 86 │        27.28 / 27.48 ±0.19 / 27.81 ms │        30.05 / 30.96 ±1.01 / 32.89 ms │  1.13x slower │
│ QQuery 87 │        74.46 / 76.02 ±0.82 / 76.73 ms │        85.56 / 86.39 ±0.98 / 88.27 ms │  1.14x slower │
│ QQuery 88 │        67.93 / 68.68 ±0.78 / 70.10 ms │        70.18 / 70.61 ±0.23 / 70.83 ms │     no change │
│ QQuery 89 │        37.92 / 39.05 ±0.90 / 40.23 ms │        47.54 / 49.40 ±3.06 / 55.50 ms │  1.26x slower │
│ QQuery 90 │        18.78 / 18.94 ±0.14 / 19.14 ms │        20.04 / 20.24 ±0.15 / 20.48 ms │  1.07x slower │
│ QQuery 91 │        54.27 / 55.09 ±0.52 / 55.87 ms │        69.38 / 69.93 ±0.32 / 70.26 ms │  1.27x slower │
│ QQuery 92 │        31.52 / 31.94 ±0.29 / 32.38 ms │        36.88 / 37.30 ±0.24 / 37.64 ms │  1.17x slower │
│ QQuery 93 │        53.10 / 54.41 ±0.81 / 55.44 ms │        52.57 / 53.56 ±0.98 / 55.39 ms │     no change │
│ QQuery 94 │        40.56 / 41.37 ±0.78 / 42.74 ms │        47.12 / 47.82 ±0.38 / 48.17 ms │  1.16x slower │
│ QQuery 95 │        87.59 / 88.76 ±0.83 / 89.71 ms │     129.38 / 130.09 ±0.61 / 131.05 ms │  1.47x slower │
│ QQuery 96 │        25.66 / 25.77 ±0.14 / 26.06 ms │        27.97 / 28.71 ±0.81 / 29.95 ms │  1.11x slower │
│ QQuery 97 │        48.40 / 49.69 ±1.74 / 53.08 ms │        51.80 / 52.55 ±0.42 / 53.00 ms │  1.06x slower │
│ QQuery 98 │        44.94 / 46.31 ±1.10 / 47.99 ms │        36.37 / 36.58 ±0.16 / 36.79 ms │ +1.27x faster │
│ QQuery 99 │        72.01 / 72.74 ±0.81 / 74.25 ms │        60.99 / 61.19 ±0.16 / 61.39 ms │ +1.19x faster │
└───────────┴───────────────────────────────────────┴───────────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                                ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                                │ 11127.86ms │
│ Total Time (benefits-from-output-partitioning)   │ 11496.85ms │
│ Average Time (HEAD)                              │   112.40ms │
│ Average Time (benefits-from-output-partitioning) │   116.13ms │
│ Queries Faster                                   │         27 │
│ Queries Slower                                   │         49 │
│ Queries with No Change                           │         23 │
│ Queries with Failure                             │          0 │
└──────────────────────────────────────────────────┴────────────┘
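The Change column can be reproduced from the middle figure of each timing cell. The cells appear to read min / mean ±stddev / max, and Change looks like the ratio of the slower mean to the faster mean, rounded to two decimals. Ratios within roughly ±5% are reported as `no change`. That 5% band is inferred from rows such as QQuery 14 (1.047x, `no change`) and QQuery 67 (1.051x, `1.05x slower`). It is an assumption, not the benchmark runner's documented rule. Here is a minimal Python sketch under that assumption (`classify` is a hypothetical helper):

```python
# Hypothetical helper that reproduces the "Change" column from mean timings.
# Assumption (inferred from the table, not documented): the ratio of means is
# rounded to 2 decimals, and anything within +/-5% is reported as "no change".
NO_CHANGE_BAND = 0.05


def classify(head_mean_ms: float, branch_mean_ms: float) -> str:
    ratio = branch_mean_ms / head_mean_ms  # > 1 means the branch is slower
    if ratio > 1 + NO_CHANGE_BAND:
        return f"{ratio:.2f}x slower"
    if 1 / ratio > 1 + NO_CHANGE_BAND:
        return f"+{1 / ratio:.2f}x faster"
    return "no change"


# (HEAD mean, branch mean) in ms, copied from the TPC-DS table above.
rows = {
    "QQuery 11": (373.61, 234.22),
    "QQuery 14": (522.76, 499.13),
    "QQuery 65": (148.68, 1467.95),
    "QQuery 67": (268.23, 281.80),
    "QQuery 72": (2227.84, 240.51),
}

for name, (head, branch) in rows.items():
    print(f"{name}: {classify(head, branch)}")
```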

Resource Usage

tpcds — base (merge-base)

Metric        Value
Wall time     60.0s
Peak memory   6.9 GiB
Avg memory    6.1 GiB
CPU user      245.7s
CPU sys       6.1s
Peak spill    0 B

tpcds — branch

Metric        Value
Wall time     60.0s
Peak memory   6.9 GiB
Avg memory    6.2 GiB
CPU user      155.5s
CPU sys       5.2s
Peak spill    0 B
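Both TPC-DS runs report the same 60.0s wall time, so the CPU-time figures are where the two runs differ. The following quick Python calculation uses only the numbers in the two resource tables above. It gives the relative change in user CPU time and in total (user + sys) CPU time. The `pct_change` helper is illustrative.

```python
# Resource figures copied from the two TPC-DS tables above (seconds).
base = {"user": 245.7, "sys": 6.1}
branch = {"user": 155.5, "sys": 5.2}


def pct_change(before: float, after: float) -> float:
    """Relative change in percent; negative means the branch used less."""
    return (after - before) / before * 100


user_delta = pct_change(base["user"], branch["user"])
total_delta = pct_change(sum(base.values()), sum(branch.values()))

print(f"user CPU: {user_delta:+.1f}%")    # user CPU: -36.7%
print(f"total CPU: {total_delta:+.1f}%")  # total CPU: -36.2%
```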

File an issue against this benchmark runner

@adriangbot

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

CPU Details (lscpu)
Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected

Comparing HEAD and benefits-from-output-partitioning
--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃                                  HEAD ┃     benefits-from-output-partitioning ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0  │          1.24 / 4.70 ±6.80 / 18.29 ms │          1.25 / 4.70 ±6.84 / 18.39 ms │     no change │
│ QQuery 1  │        12.49 / 12.77 ±0.16 / 12.94 ms │        13.16 / 13.36 ±0.12 / 13.53 ms │     no change │
│ QQuery 2  │        35.32 / 35.69 ±0.21 / 35.88 ms │        36.41 / 36.77 ±0.28 / 37.25 ms │     no change │
│ QQuery 3  │        30.59 / 31.14 ±0.61 / 32.30 ms │        30.78 / 31.19 ±0.44 / 31.79 ms │     no change │
│ QQuery 4  │     218.81 / 224.98 ±3.97 / 231.21 ms │     217.92 / 223.47 ±4.30 / 228.57 ms │     no change │
│ QQuery 5  │     268.04 / 270.47 ±1.89 / 272.55 ms │     264.73 / 270.21 ±3.58 / 275.65 ms │     no change │
│ QQuery 6  │           1.28 / 1.42 ±0.21 / 1.83 ms │           1.29 / 1.44 ±0.22 / 1.87 ms │     no change │
│ QQuery 7  │        13.80 / 13.85 ±0.04 / 13.92 ms │        14.22 / 14.48 ±0.16 / 14.73 ms │     no change │
│ QQuery 8  │     314.89 / 321.68 ±5.18 / 327.59 ms │     314.39 / 317.41 ±1.80 / 319.74 ms │     no change │
│ QQuery 9  │    430.50 / 449.63 ±13.18 / 468.51 ms │     448.13 / 455.42 ±8.89 / 472.35 ms │     no change │
│ QQuery 10 │        69.96 / 70.75 ±0.61 / 71.55 ms │        71.44 / 71.57 ±0.11 / 71.77 ms │     no change │
│ QQuery 11 │        80.53 / 81.80 ±0.74 / 82.81 ms │        80.94 / 81.87 ±0.67 / 82.58 ms │     no change │
│ QQuery 12 │     261.53 / 266.61 ±4.04 / 273.47 ms │     252.50 / 259.05 ±6.17 / 268.44 ms │     no change │
│ QQuery 13 │     366.45 / 374.52 ±9.12 / 392.27 ms │    379.29 / 405.09 ±25.46 / 449.33 ms │  1.08x slower │
│ QQuery 14 │     275.64 / 280.16 ±4.21 / 286.33 ms │     269.66 / 274.20 ±6.25 / 286.39 ms │     no change │
│ QQuery 15 │     262.75 / 267.39 ±4.76 / 274.40 ms │    258.95 / 272.36 ±14.27 / 299.36 ms │     no change │
│ QQuery 16 │     602.03 / 607.95 ±3.30 / 611.18 ms │     609.01 / 611.62 ±1.82 / 614.71 ms │     no change │
│ QQuery 17 │     600.74 / 609.07 ±6.41 / 616.74 ms │     607.64 / 614.08 ±7.46 / 628.24 ms │     no change │
│ QQuery 18 │  1232.45 / 1244.44 ±6.63 / 1250.73 ms │ 1226.94 / 1246.90 ±17.60 / 1274.01 ms │     no change │
│ QQuery 19 │        28.02 / 28.27 ±0.23 / 28.68 ms │        27.08 / 31.94 ±5.61 / 40.69 ms │  1.13x slower │
│ QQuery 20 │    515.34 / 527.02 ±17.01 / 560.84 ms │    513.35 / 528.33 ±17.89 / 562.86 ms │     no change │
│ QQuery 21 │     591.08 / 594.75 ±4.84 / 603.97 ms │     591.58 / 594.28 ±3.75 / 601.55 ms │     no change │
│ QQuery 22 │ 1046.00 / 1065.70 ±15.07 / 1089.58 ms │  1051.42 / 1064.06 ±7.26 / 1074.07 ms │     no change │
│ QQuery 23 │ 3151.05 / 3178.54 ±17.64 / 3197.76 ms │     706.33 / 722.10 ±8.65 / 729.46 ms │ +4.40x faster │
│ QQuery 24 │        42.33 / 43.69 ±1.02 / 45.27 ms │        39.78 / 43.14 ±6.14 / 55.41 ms │     no change │
│ QQuery 25 │     110.70 / 112.43 ±1.78 / 115.84 ms │     107.23 / 110.22 ±2.69 / 114.76 ms │     no change │
│ QQuery 26 │        42.63 / 42.98 ±0.54 / 44.06 ms │        41.59 / 42.37 ±1.39 / 45.14 ms │     no change │
│ QQuery 27 │     665.96 / 671.33 ±4.35 / 676.34 ms │     644.48 / 650.88 ±5.26 / 658.03 ms │     no change │
│ QQuery 28 │ 2996.10 / 3016.95 ±12.56 / 3034.36 ms │  3008.42 / 3019.59 ±8.50 / 3030.30 ms │     no change │
│ QQuery 29 │        41.26 / 45.01 ±5.52 / 55.90 ms │      41.68 / 54.15 ±24.48 / 103.11 ms │  1.20x slower │
│ QQuery 30 │     297.06 / 300.64 ±3.86 / 307.89 ms │     302.21 / 306.47 ±4.28 / 313.32 ms │     no change │
│ QQuery 31 │     282.28 / 291.44 ±9.82 / 309.31 ms │     311.47 / 321.58 ±8.18 / 332.34 ms │  1.10x slower │
│ QQuery 32 │    901.68 / 916.53 ±10.51 / 928.96 ms │    886.90 / 907.43 ±13.50 / 920.89 ms │     no change │
│ QQuery 33 │ 1396.36 / 1410.93 ±10.57 / 1423.68 ms │ 1366.18 / 1403.96 ±20.35 / 1426.30 ms │     no change │
│ QQuery 34 │ 1417.23 / 1443.94 ±20.30 / 1477.03 ms │ 1401.07 / 1417.44 ±14.50 / 1434.86 ms │     no change │
│ QQuery 35 │     274.63 / 281.18 ±6.91 / 294.21 ms │    271.16 / 290.85 ±25.52 / 341.32 ms │     no change │
│ QQuery 36 │        64.11 / 69.33 ±5.30 / 79.20 ms │        63.67 / 70.92 ±7.07 / 84.06 ms │     no change │
│ QQuery 37 │        35.00 / 37.74 ±2.97 / 42.66 ms │        35.00 / 35.97 ±1.26 / 38.42 ms │     no change │
│ QQuery 38 │        39.89 / 45.36 ±4.10 / 52.29 ms │        41.28 / 44.07 ±2.61 / 49.01 ms │     no change │
│ QQuery 39 │     138.16 / 147.18 ±5.92 / 154.81 ms │     133.60 / 142.16 ±8.41 / 157.21 ms │     no change │
│ QQuery 40 │        13.84 / 16.14 ±3.99 / 24.11 ms │        13.84 / 14.05 ±0.21 / 14.45 ms │ +1.15x faster │
│ QQuery 41 │        13.52 / 14.75 ±1.74 / 18.14 ms │        13.41 / 13.54 ±0.08 / 13.62 ms │ +1.09x faster │
│ QQuery 42 │        12.93 / 18.27 ±6.78 / 29.54 ms │        13.42 / 15.33 ±3.63 / 22.57 ms │ +1.19x faster │
└───────────┴───────────────────────────────────────┴───────────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                                ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                                │ 19489.11ms │
│ Total Time (benefits-from-output-partitioning)   │ 17049.98ms │
│ Average Time (HEAD)                              │   453.24ms │
│ Average Time (benefits-from-output-partitioning) │   396.51ms │
│ Queries Faster                                   │          4 │
│ Queries Slower                                   │          4 │
│ Queries with No Change                           │         35 │
│ Queries with Failure                             │          0 │
└──────────────────────────────────────────────────┴────────────┘
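The aggregate ClickBench improvement comes almost entirely from one query. The short Python check below uses the HEAD and branch means from the table above. It shows that QQuery 23 alone saves slightly more than the whole drop in Total Time, so the other 42 queries net out to a small regression of roughly 17 ms. It also confirms that the reported averages are Total Time divided by the 43 queries (QQuery 0 to QQuery 42). This assumes Total Time is the sum of the per-query means, which matches the table to within display rounding.

```python
# Figures copied from the ClickBench table and summary above (milliseconds).
total_head, total_branch = 19489.11, 17049.98
q23_head, q23_branch = 3178.54, 722.10
n_queries = 43  # QQuery 0 through QQuery 42

total_saved = total_head - total_branch  # about 2439.13 ms
q23_saved = q23_head - q23_branch        # about 2456.44 ms
rest = total_saved - q23_saved           # negative: the other queries got slower overall

print(f"total saved: {total_saved:.2f} ms")
print(f"QQuery 23 saved: {q23_saved:.2f} ms")
print(f"other 42 queries, net slowdown: {-rest:+.2f} ms")
print(f"averages: {total_head / n_queries:.2f} / {total_branch / n_queries:.2f} ms")
```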

Resource Usage

clickbench_partitioned — base (merge-base)

Metric        Value
Wall time     100.0s
Peak memory   30.9 GiB
Avg memory    23.3 GiB
CPU user      1017.4s
CPU sys       61.3s
Peak spill    0 B

clickbench_partitioned — branch

Metric        Value
Wall time     90.0s
Peak memory   31.2 GiB
Avg memory    23.3 GiB
CPU user      882.7s
CPU sys       50.4s
Peak spill    0 B



Labels: core (Core DataFusion crate), datasource (Changes to the datasource crate), optimizer (Optimizer rules), physical-plan (Changes to the physical-plan crate), sqllogictest (SQL Logic Tests (.slt))
