Skip to content

Commit 9619b00

Browse files
Dandandanclaude
andcommitted
fix: only emit early after skip probe evaluates and decides not to skip
The early emission was firing before the skip aggregation probe could evaluate (needs 100K rows), causing regressions for very high-cardinality GROUP BY queries (e.g. Q32 GROUP BY WatchID, ClientIP was 1.53x slower). Fix: only enable early emission AFTER the skip probe has evaluated and decided NOT to skip. This ensures: - Before 100K rows: no early emission (let probe evaluate first) - High cardinality (ratio >= 0.8): skip probe takes over, no emission - Medium cardinality (ratio < 0.8): early emission keeps hash table cache-friendly Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent 966b507 commit 9619b00

1 file changed

Lines changed: 5 additions & 8 deletions

File tree

datafusion/physical-plan/src/aggregates/row_hash.rs

Lines changed: 5 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -825,14 +825,11 @@ impl Stream for GroupedHashAggregateStream {
825825
// prevents early emission from interfering with
826826
// very-high-cardinality queries (ratio >= 0.8)
827827
// where the skip probe should take over entirely.
828-
let early_emit_enabled =
829-
self.early_emit_max_table_size > 0
830-
&& match &self.skip_aggregation_probe {
831-
None => true,
832-
Some(p) => {
833-
p.ratio().is_some() && !p.should_skip()
834-
}
835-
};
828+
let early_emit_enabled = self.early_emit_max_table_size > 0
829+
&& match &self.skip_aggregation_probe {
830+
None => true,
831+
Some(p) => p.ratio().is_some() && !p.should_skip(),
832+
};
836833
if early_emit_enabled {
837834
let table_size = self.group_values.size()
838835
+ self

0 commit comments

Comments
 (0)