feat: disable join dynamic filter pushdown by default
When a hash join's build-side dynamic filter contains a `hash_lookup` term,
evaluating it on every probe-side row inside the scan duplicates the work
the join's own probe is about to do. On TPC-H Q17 this doubles end-to-end
query time (the equivalent of running the hash probe twice), and similar
~20-100% regressions show up across TPC-H Q3/Q5/Q8/Q9/Q14/Q18 and many
TPC-DS queries that join a small dim table to a large fact table.
Flip the default so users don't pay this cost on the common shape; the
config is still available per-query when the build-side filter is
selective enough to make scan-level pruning worthwhile (e.g. a small
dimension table that prunes most of a fact table's row groups / pages).
The reset block at the end of dynamic_filter_pushdown_config.slt is
switched from `SET ... = true` to `RESET ...` so the test ends in the
configured default regardless of what that default is.
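
For reference, a minimal sketch of opting back in for a single query and then returning to the default. The option name comes from this change; the tables `fact_orders` and `dim_region` and their columns are hypothetical and exist only for illustration.

```sql
-- Opt back in when the build side is selective enough to prune the scan.
SET datafusion.optimizer.enable_join_dynamic_filter_pushdown = true;

-- Hypothetical tables: a small, filtered dimension joined to a large fact table.
SELECT f.order_id, d.region
FROM fact_orders AS f
JOIN dim_region AS d ON f.region_id = d.region_id
WHERE d.region = 'EMEA';

-- Return to the configured default (false after this change).
RESET datafusion.optimizer.enable_join_dynamic_filter_pushdown;
```

Using `RESET` rather than `SET ... = false` keeps the snippet correct if the default changes again, which is the same reasoning applied to the test's reset block.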
@@ -451,7 +451,7 @@ datafusion.optimizer.default_filter_selectivity 20 The default filter selectivit
datafusion.optimizer.enable_aggregate_dynamic_filter_pushdown true When set to true, the optimizer will attempt to push down Aggregate dynamic filters into the file scan phase.
datafusion.optimizer.enable_distinct_aggregation_soft_limit true When set to true, the optimizer will push a limit operation into grouped aggregations which have no aggregate expressions, as a soft limit, emitting groups once the limit is reached, before all rows in the group are read.
datafusion.optimizer.enable_dynamic_filter_pushdown true When set to true attempts to push down dynamic filters generated by operators (TopK, Join & Aggregate) into the file scan phase. For example, for a query such as `SELECT * FROM t ORDER BY timestamp DESC LIMIT 10`, the optimizer will attempt to push down the current top 10 timestamps that the TopK operator references into the file scans. This means that if we already have 10 timestamps in the year 2025 any files that only have timestamps in the year 2024 can be skipped / pruned at various stages in the scan. The config will suppress `enable_join_dynamic_filter_pushdown`, `enable_topk_dynamic_filter_pushdown` & `enable_aggregate_dynamic_filter_pushdown` So if you disable `enable_topk_dynamic_filter_pushdown`, then enable `enable_dynamic_filter_pushdown`, the `enable_topk_dynamic_filter_pushdown` will be overridden.
-datafusion.optimizer.enable_join_dynamic_filter_pushdown true When set to true, the optimizer will attempt to push down Join dynamic filters into the file scan phase.
+datafusion.optimizer.enable_join_dynamic_filter_pushdown false When set to true, the optimizer will attempt to push down Join dynamic filters into the file scan phase. Disabled by default: when a join's build-side dynamic filter contains a `hash_lookup` term, evaluating it on every probe-side row inside the scan duplicates the work the join's own probe is about to do, which on benchmarks like TPC-H Q17 doubles the query time (the equivalent of running the probe twice). Re-enable per-query when the build-side filter is selective enough to make scan-level pruning worthwhile (e.g. a small dimension table that prunes most of a large fact table's row groups / pages).
datafusion.optimizer.enable_leaf_expression_pushdown true When set to true, the optimizer will extract leaf expressions (such as `get_field`) from filter/sort/join nodes into projections closer to the leaf table scans, and push those projections down towards the leaf nodes.
datafusion.optimizer.enable_piecewise_merge_join false When set to true, piecewise merge join is enabled. PiecewiseMergeJoin is currently experimental. Physical planner will opt for PiecewiseMergeJoin when there is only one range filter.
datafusion.optimizer.enable_round_robin_repartition true When set to true, the physical plan optimizer will try to add round robin repartitioning to increase parallelism to leverage more CPU cores
@@ -895,7 +895,7 @@ show functions
statement ok
reset datafusion.catalog.information_schema;
-# The SLT runner sets `target_partitions` to 4 instead of using the default, so
+# The SLT runner sets `target_partitions` to 4 instead of using the default, so
SELECT t1_id, t1_name, t1_int FROM right_semi_anti_join_table_t1 t1 WHERE EXISTS (SELECT * FROM right_semi_anti_join_table_t2 t2 where t2.t2_id = t1.t1_id and t2.t2_name <> t1.t1_name) ORDER BY t1_id
SELECT t1_id, t1_name, t1_int FROM right_semi_anti_join_table_t2 t2 RIGHT SEMI JOIN right_semi_anti_join_table_t1 t1 on (t2.t2_id = t1.t1_id and t2.t2_name <> t1.t1_name) ORDER BY t1_id
SELECT t1_id, t1_name, t1_int FROM right_semi_anti_join_table_t1 t1 WHERE EXISTS (SELECT * FROM right_semi_anti_join_table_t2 t2 where t2.t2_id = t1.t1_id and t2.t2_name <> t1.t1_name) ORDER BY t1_id
SELECT t1_id, t1_name, t1_int FROM right_semi_anti_join_table_t2 t2 RIGHT SEMI JOIN right_semi_anti_join_table_t1 t1 on (t2.t2_id = t1.t1_id and t2.t2_name <> t1.t1_name) ORDER BY t1_id