feat: eliminate GlobalLimitExec when input statistics prove limit is already satisfied #22150

Open
xiedeyantu wants to merge 2 commits into apache:main from xiedeyantu:limit
@xiedeyantu
Member

Which issue does this PR close?

  • Closes #.

Rationale for this change

GlobalLimitExec (and LocalLimitExec) are sometimes redundant: if the input can be proven via exact statistics to produce no more rows than the fetch value, the limit node does nothing and should be removed entirely.

Previously, the LimitPushdown rule had no mechanism to eliminate such trivially satisfied limits. A query like SELECT * FROM (VALUES ...) LIMIT 10, where the input is a single-row PlaceholderRowExec, still carried an unnecessary GlobalLimitExec in the physical plan. Similarly, a LIMIT N over an EmptyExec or any other zero-row plan was retained.

What changes are included in this PR?

  • Adds limit_satisfied_by_input() in limit_pushdown.rs: checks whether a plan's child provably produces at most fetch rows (requires skip == 0 and a single output partition).
  • Adds limit_eliminable_exact_num_rows(): iteratively unwraps ProjectionExec wrappers and recognises EmptyExec (0 rows), PlaceholderRowExec (1 row), and any plan reporting Precision::Exact(0) rows as eliminable producers.
  • When a limit is statically satisfied, marks global_state.satisfied = true and returns early — without resetting fetch/skip — so nested limit nodes still receive the correct outer constraints to merge against.
  • Updates the merges_local_limit_with_local_limit snapshot: the result is now bare EmptyExec (limit eliminated).
  • Updates union.slt: ProjectionExec over PlaceholderRowExec (1 row) with fetch=3 no longer carries a redundant GlobalLimitExec.
  • Adds explain_tree.slt test: SELECT count(*) … LIMIT 10 over a two-row VALUES clause is correctly reduced to ProjectionExec → PlaceholderRowExec with no limit node.
  • Updates copy.slt: fetch=10 is now correctly pushed all the way down to DataSourceExec.
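The first two bullets can be illustrated with a simplified, self-contained model. This is only a sketch, not the code in limit_pushdown.rs: the `Plan` enum, its variants, and the `usize` fetch are hypothetical stand-ins for DataFusion's `ExecutionPlan` and statistics types, and the function names merely mirror the ones named above.

```rust
/// Toy stand-in for a physical plan node (hypothetical; not DataFusion's `ExecutionPlan`).
#[derive(Debug)]
enum Plan {
    /// Models EmptyExec: produces 0 rows.
    Empty,
    /// Models PlaceholderRowExec: produces exactly 1 row.
    PlaceholderRow,
    /// Models ProjectionExec: row count is the same as its input.
    Projection(Box<Plan>),
    /// Any other operator, with an optional exact row count from statistics.
    Other { exact_rows: Option<usize>, partitions: usize },
}

impl Plan {
    /// Number of output partitions in this toy model.
    fn output_partitions(&self) -> usize {
        match self {
            Plan::Empty | Plan::PlaceholderRow => 1,
            Plan::Projection(input) => input.output_partitions(),
            Plan::Other { partitions, .. } => *partitions,
        }
    }
}

/// Iteratively unwraps projections and returns an exact row count only for
/// the producers listed in the description: 0-row and 1-row leaves, and any
/// plan whose statistics report exactly 0 rows.
fn limit_eliminable_exact_num_rows(plan: &Plan) -> Option<usize> {
    let mut current = plan;
    loop {
        match current {
            Plan::Projection(input) => current = &**input,
            Plan::Empty => return Some(0),
            Plan::PlaceholderRow => return Some(1),
            Plan::Other { exact_rows: Some(0), .. } => return Some(0),
            Plan::Other { .. } => return None,
        }
    }
}

/// A limit is redundant when nothing is skipped, the input has a single
/// output partition, and the input provably yields at most `fetch` rows.
fn limit_satisfied_by_input(input: &Plan, skip: usize, fetch: usize) -> bool {
    skip == 0
        && input.output_partitions() == 1
        && limit_eliminable_exact_num_rows(input).map_or(false, |rows| rows <= fetch)
}
```

In this model, a `Projection` over `PlaceholderRow` with `skip = 0` and `fetch = 3` is reported as satisfied, matching the union.slt case above. Any non-zero `skip`, a multi-partition input, or an operator without a recognised exact row count keeps the limit in place.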

Are these changes tested?

Yes.

  • cargo fmt --all
  • cargo clippy --all-targets --all-features -- -D warnings
  • cargo test -p datafusion-core --test physical_optimizer limit
  • cargo test --features backtrace,parquet_encryption --profile ci --package datafusion-sqllogictest --test sqllogictests -- copy.slt union.slt explain_tree.slt

Are there any user-facing changes?

No API changes. Physical plans for queries with LIMIT over statically small inputs (EmptyExec, PlaceholderRowExec, or zero-row tables) will now have the redundant GlobalLimitExec/LocalLimitExec nodes eliminated, resulting in simpler and slightly more efficient plans.

github-actions bot added the optimizer (Optimizer rules), core (Core DataFusion crate), and sqllogictest (SQL Logic Tests (.slt)) labels on May 13, 2026