Commit 240fbdb
fix: suppress nondeterministic metrics in agg_dyn_e2e sqllogictest (#21657)
## Which issue does this PR close?

- Closes #.

## Rationale for this change

PR #21620 (commit 5c653be) ported `test_aggregate_dynamic_filter_parquet_e2e` from Rust to sqllogictest using `analyze_categories = 'rows'`, which includes exact pushdown metrics. These metrics are nondeterministic under parallel execution: the order in which Partial aggregates publish dynamic filter updates races against when the scan reads each partition, so the expected output is flaky.

Noticed on #21629 ([CI run](https://github.com/apache/datafusion/actions/runs/24472961090/job/71518239089?pr=21629)) and confirmed on main ([CI run](https://github.com/apache/datafusion/actions/runs/24467213913/job/71497866154)).

## What changes are included in this PR?

Switch the `agg_dyn_e2e` test to `analyze_level = summary` + `analyze_categories = 'none'`, suppressing nondeterministic metrics. This matches the approach already used by the other aggregate dynamic filter tests in the same file.

The original Rust test only asserted `matched < 4` (i.e. some pruning happened); the important invariant, that the `DynamicFilter [ column1@0 > 4 ]` text and pruning predicate are correct, is still verified.

## Are these changes tested?

Yes: the test itself is what is being fixed.

## Are there any user-facing changes?

No.
1 parent 7bf39b5 commit 240fbdb
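
The race described in the rationale can be illustrated with a toy model. The sketch below is not DataFusion code: the per-file maxima are hypothetical, and it assumes that "check file statistics, read the file, publish the new bound" happens as one atomic step per file, which is coarser than the real engine. It enumerates every interleaving of two partitions that share one dynamic bound and records the query result alongside how many files survived pruning.

```python
from itertools import permutations

# Hypothetical per-file maxima of column1 (NOT the values in the real test files).
# The grouping mirrors the two file groups shown in the plan.
FILE_GROUPS = [
    [("file_0", 4), ("file_1", 3)],  # partition 0
    [("file_2", 2), ("file_3", 1)],  # partition 1
]
STATIC_BOUND = 1  # bound from the static predicate `column1 > 1`


def run(schedule):
    """Replay one interleaving; return (max found, number of files not pruned)."""
    current_max = None           # value tracked by the toy "partial aggregates"
    cursors = [0] * len(FILE_GROUPS)
    matched = 0
    for part in schedule:
        _name, file_max = FILE_GROUPS[part][cursors[part]]
        cursors[part] += 1
        # Shared dynamic bound: the static bound until some partition publishes a max.
        bound = STATIC_BOUND if current_max is None else max(STATIC_BOUND, current_max)
        if file_max > bound:     # statistics cannot rule the file out, so it is read
            matched += 1
            current_max = file_max
    return current_max, matched


def all_schedules():
    """Every distinct order in which the partitions can take their turns."""
    slots = [p for p, group in enumerate(FILE_GROUPS) for _ in group]
    return sorted(set(permutations(slots)))


outcomes = {schedule: run(schedule) for schedule in all_schedules()}
results = {result for result, _ in outcomes.values()}
matched_counts = {matched for _, matched in outcomes.values()}

for schedule, (result, matched) in outcomes.items():
    print(schedule, "max =", result, "files matched =", matched)
```

In this model every schedule yields the same query result, while the number of files that survive pruning depends on which partition runs first. That is the property that makes exact pruning counts unsuitable as expected output, whereas the plan text and a loose bound such as `matched < 4` stay stable.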

1 file changed: 16 additions, 5 deletions
datafusion/sqllogictest/test_files/push_down_filter_regression.slt

Lines changed: 16 additions & 5 deletions
@@ -218,21 +218,32 @@ LOCATION 'test_files/scratch/push_down_filter_regression/agg_dyn/';
 statement ok
 set datafusion.execution.collect_statistics = true;
 
+# Suppress metrics: pruning counts are nondeterministic under parallel
+# execution (the order in which Partial aggregates publish dynamic filter
+# updates races against when the scan reads each partition). The original
+# Rust test only asserted matched < 4; the important invariant here is
+# that the DynamicFilter text is correct.
 statement ok
-set datafusion.explain.analyze_categories = 'rows';
+set datafusion.explain.analyze_level = summary;
+
+statement ok
+set datafusion.explain.analyze_categories = 'none';
 
 query TT
 EXPLAIN ANALYZE select max(column1) from agg_dyn_e2e where column1 > 1;
 ----
 Plan with Metrics
-01)AggregateExec: mode=Final, gby=[], aggr=[max(agg_dyn_e2e.column1)], metrics=[output_rows=1, output_batches=1]
-02)--CoalescePartitionsExec, metrics=[output_rows=2, output_batches=2]
-03)----AggregateExec: mode=Partial, gby=[], aggr=[max(agg_dyn_e2e.column1)], metrics=[output_rows=2, output_batches=2]
-04)------DataSourceExec: file_groups={2 groups: [[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/scratch/push_down_filter_regression/agg_dyn/file_0.parquet, WORKSPACE_ROOT/datafusion/sqllogictest/test_files/scratch/push_down_filter_regression/agg_dyn/file_1.parquet], [WORKSPACE_ROOT/datafusion/sqllogictest/test_files/scratch/push_down_filter_regression/agg_dyn/file_2.parquet, WORKSPACE_ROOT/datafusion/sqllogictest/test_files/scratch/push_down_filter_regression/agg_dyn/file_3.parquet]]}, projection=[column1], file_type=parquet, predicate=column1@0 > 1 AND DynamicFilter [ column1@0 > 4 ], pruning_predicate=column1_null_count@1 != row_count@2 AND column1_max@0 > 1 AND column1_null_count@1 != row_count@2 AND column1_max@0 > 4, required_guarantees=[], metrics=[output_rows=2, output_batches=2, files_ranges_pruned_statistics=4 total → 4 matched, row_groups_pruned_statistics=4 total → 2 matched -> 2 fully matched, row_groups_pruned_bloom_filter=2 total → 2 matched, page_index_pages_pruned=2 total → 2 matched, page_index_rows_pruned=2 total → 2 matched, limit_pruned_row_groups=0 total → 0 matched, batches_split=0, file_open_errors=0, file_scan_errors=0, files_opened=4, files_processed=4, num_predicate_creation_errors=0, predicate_evaluation_errors=0, pushdown_rows_matched=2, pushdown_rows_pruned=0, predicate_cache_inner_records=2, predicate_cache_records=4, scan_efficiency_ratio=25.15% (130/517)]
+01)AggregateExec: mode=Final, gby=[], aggr=[max(agg_dyn_e2e.column1)], metrics=[]
+02)--CoalescePartitionsExec, metrics=[]
+03)----AggregateExec: mode=Partial, gby=[], aggr=[max(agg_dyn_e2e.column1)], metrics=[]
+04)------DataSourceExec: file_groups={2 groups: [[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/scratch/push_down_filter_regression/agg_dyn/file_0.parquet, WORKSPACE_ROOT/datafusion/sqllogictest/test_files/scratch/push_down_filter_regression/agg_dyn/file_1.parquet], [WORKSPACE_ROOT/datafusion/sqllogictest/test_files/scratch/push_down_filter_regression/agg_dyn/file_2.parquet, WORKSPACE_ROOT/datafusion/sqllogictest/test_files/scratch/push_down_filter_regression/agg_dyn/file_3.parquet]]}, projection=[column1], file_type=parquet, predicate=column1@0 > 1 AND DynamicFilter [ column1@0 > 4 ], pruning_predicate=column1_null_count@1 != row_count@2 AND column1_max@0 > 1 AND column1_null_count@1 != row_count@2 AND column1_max@0 > 4, required_guarantees=[], metrics=[]
 
 statement ok
 reset datafusion.explain.analyze_categories;
 
+statement ok
+reset datafusion.explain.analyze_level;
+
 statement ok
 reset datafusion.execution.collect_statistics;
 