Optimize logical optimizer: skip map_subqueries + in-place rewriting #20837

Draft
adriangb wants to merge 1 commit into apache:main from pydantic:optimize-logical-optimizer

Summary

  • map_subqueries short-circuit: skip expression tree walks when no subquery expressions exist (biggest contributor)
  • plan_has_subqueries per-pass check: bypass rewrite_with_subqueries entirely when plan has no subqueries
  • rewrite_plan_in_place with Arc::make_mut: avoid Arc::unwrap_or_clone + Arc::new cycle during tree traversal
  • Adds optimizer-only benchmarks that isolate optimizer perf from SQL parsing/analysis
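
As a rough illustration of the short-circuit idea in the first two points, the sketch below uses a toy `Expr` enum rather than DataFusion's real types. The names `contains_subquery` and `map_subqueries_owned`, and the boolean returned next to the expressions, are hypothetical and exist only to make the fast path observable; the actual change in this PR may be structured differently.

```rust
// Toy expression type (an assumption for illustration); DataFusion's real
// `Expr` has many more variants.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Literal(i64),
    BinaryOp(Box<Expr>, Box<Expr>),
    // Stands in for any subquery-bearing expression (EXISTS, IN, scalar).
    Subquery(String),
}

/// Cheap read-only check: does this expression contain a subquery anywhere?
fn contains_subquery(expr: &Expr) -> bool {
    match expr {
        Expr::Subquery(_) => true,
        Expr::BinaryOp(l, r) => contains_subquery(l) || contains_subquery(r),
        Expr::Column(_) | Expr::Literal(_) => false,
    }
}

/// Ownership-based rewrite: takes the tree apart and rebuilds every
/// `BinaryOp` node, whether or not a subquery sits below it.
fn map_subqueries_owned(expr: Expr, f: &mut impl FnMut(String) -> String) -> Expr {
    match expr {
        Expr::Subquery(s) => Expr::Subquery(f(s)),
        Expr::BinaryOp(l, r) => Expr::BinaryOp(
            Box::new(map_subqueries_owned(*l, f)),
            Box::new(map_subqueries_owned(*r, f)),
        ),
        other => other,
    }
}

/// Short-circuited entry point. The returned flag reports whether the
/// expensive rewrite actually ran.
fn map_subqueries(
    exprs: Vec<Expr>,
    mut f: impl FnMut(String) -> String,
) -> (Vec<Expr>, bool) {
    // Fast path: nothing to rewrite, so hand the expressions back untouched.
    if !exprs.iter().any(contains_subquery) {
        return (exprs, false);
    }
    let rewritten: Vec<Expr> = exprs
        .into_iter()
        .map(|e| map_subqueries_owned(e, &mut f))
        .collect();
    (rewritten, true)
}
```

The read-only scan borrows the expressions and allocates nothing, so a plan without subqueries pays only for the scan and skips the rebuild.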

Benchmark Results (optimizer-only)

| Benchmark | Baseline | Optimized | Change |
|---|---|---|---|
| optimizer_select_one_from_700 | 196 µs | 201 µs | +2.7% (noise) |
| optimizer_select_all_from_1000 | 4.84 ms | 4.25 ms | -12% |
| optimizer_join_chain_4 | 150 µs | 136 µs | -9% |
| optimizer_join_chain_8 | 462 µs | 426 µs | -8% |
| optimizer_wide_filter_200 | 4.95 ms | 3.41 ms | -31% |
| optimizer_wide_aggregate_100 | 1.98 ms | 1.49 ms | -25% |
| optimizer_join_4_with_agg_filter | 429 µs | 358 µs | -17% |
| optimizer_tpch_all | 13.96 ms | 11.54 ms | -17% |
| optimizer_tpcds_all | 255.9 ms | 213.1 ms | -17% |
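
For reading the Change column: it appears to be the relative difference between the optimized and baseline timings, so negative values mean the optimizer got faster. A minimal sketch of that arithmetic follows; `percent_change` is a hypothetical helper, not part of the benchmark suite.

```rust
/// Relative change of `optimized` versus `baseline`, in percent.
/// Negative values mean the optimized run took less time.
fn percent_change(baseline: f64, optimized: f64) -> f64 {
    (optimized - baseline) / baseline * 100.0
}
```

For example, the optimizer_wide_filter_200 row gives (3.41 - 4.95) / 4.95 × 100, which rounds to -31%.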

Test plan

  • All 642 optimizer unit tests pass
  • Benchmarks confirmed across 2 independent runs
  • Run full CI

🤖 Generated with Claude Code

Three optimizations that together yield ~17% faster optimization on
TPC-H/TPC-DS and up to 31% on expression-heavy queries:

1. map_subqueries short-circuit: skip expression tree walks when no
   subquery expressions exist. Previously rewrite_with_subqueries
   called map_subqueries at every plan node, walking all expression
   trees via ownership-based transform_down even with no subqueries.

2. plan_has_subqueries per-pass check: when no subqueries exist in
   the plan, bypass rewrite_with_subqueries entirely and use the
   cheaper rewrite_plan_in_place path.

3. rewrite_plan_in_place with Arc::make_mut: new map_children_mut
   method that mutates children in-place, avoiding the
   Arc::unwrap_or_clone + Arc::new allocation cycle at every node.
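
The difference between the two traversal styles in item 3 can be sketched on a toy plan tree. Everything here is an assumption for illustration: `Plan`, `rewrite_owned`, and `rewrite_in_place` are simplified stand-ins, not DataFusion's `LogicalPlan` or the `map_children_mut` method added by this PR. Only `Arc::unwrap_or_clone` (stable since Rust 1.76), `Arc::make_mut`, and `Arc::new` are real standard-library calls.

```rust
use std::sync::Arc;

// Toy plan node (an assumption for illustration).
#[derive(Debug, Clone, PartialEq)]
struct Plan {
    name: String,
    children: Vec<Arc<Plan>>,
}

/// Ownership-based traversal: every child is taken out of its `Arc`,
/// rewritten, and wrapped in a freshly allocated `Arc` again.
fn rewrite_owned(plan: Plan, f: &impl Fn(&mut Plan)) -> Plan {
    let Plan { name, children } = plan;
    let children = children
        .into_iter()
        .map(|child| Arc::new(rewrite_owned(Arc::unwrap_or_clone(child), f)))
        .collect();
    let mut plan = Plan { name, children };
    f(&mut plan);
    plan
}

/// In-place traversal: `Arc::make_mut` yields `&mut Plan` without any
/// allocation when the `Arc` is uniquely owned, and clones the node only
/// when it is shared (copy-on-write).
fn rewrite_in_place(plan: &mut Plan, f: &impl Fn(&mut Plan)) {
    for child in plan.children.iter_mut() {
        rewrite_in_place(Arc::make_mut(child), f);
    }
    f(plan);
}
```

Both functions apply `f` bottom-up and produce the same tree. The in-place version keeps each uniquely owned child in its existing allocation, which is where the savings described above would come from.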

Also adds optimizer-only benchmarks that isolate optimizer performance
from SQL parsing and analysis overhead.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
github-actions bot added the logical-expr (Logical plan and expressions), optimizer (Optimizer rules), and core (Core DataFusion crate) labels on Mar 10, 2026