feat: eliminate GlobalLimitExec when input statistics prove limit is already satisfied#22150
Open
xiedeyantu wants to merge 2 commits into
Open
feat: eliminate GlobalLimitExec when input statistics prove limit is already satisfied#22150xiedeyantu wants to merge 2 commits into
xiedeyantu wants to merge 2 commits into
Conversation
…already satisfied
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Which issue does this PR close?
Rationale for this change
GlobalLimitExec(andLocalLimitExec) are sometimes redundant: if the input can be proven via exact statistics to produce no more rows than the fetch value, the limit node does nothing and should be removed entirely.Previously, the
LimitPushdownrule had no mechanism to eliminate such trivially-satisfied limits. A query likeSELECT * FROM (VALUES ...) LIMIT 10— where the input is a single-rowPlaceholderRowExec— still carried an unnecessaryGlobalLimitExecin the physical plan. Similarly, aLIMIT Nover anEmptyExecor any zero-row plan was retained.What changes are included in this PR?
limit_satisfied_by_input()inlimit_pushdown.rs: checks whether a plan's child provably produces at mostfetchrows (requiresskip == 0and a single output partition).limit_eliminable_exact_num_rows(): iteratively unwrapsProjectionExecwrappers and recognisesEmptyExec(0 rows),PlaceholderRowExec(1 row), and any plan reportingPrecision::Exact(0)rows as eliminable producers.global_state.satisfied = trueand returns early — without resettingfetch/skip— so nested limit nodes still receive the correct outer constraints to merge against.merges_local_limit_with_local_limitsnapshot: the result is now bareEmptyExec(limit eliminated).union.slt:ProjectionExecoverPlaceholderRowExec(1 row) withfetch=3no longer carries a redundantGlobalLimitExec.explain_tree.slttest:SELECT count(*) … LIMIT 10over a two-row VALUES clause is correctly reduced toProjectionExec → PlaceholderRowExecwith no limit node.fetch=10is now correctly pushed all the way down toDataSourceExec.Are these changes tested?
Yes.
cargo fmt --allcargo clippy --all-targets --all-features -- -D warningscargo test -p datafusion-core --test physical_optimizer limitcargo test --features backtrace,parquet_encryption --profile ci --package datafusion-sqllogictest --test sqllogictests -- copy.slt union.slt explain_tree.sltAre there any user-facing changes?
No API changes. Physical plans for queries with
LIMITover statically small inputs (EmptyExec,PlaceholderRowExec, or zero-row tables) will now have the redundantGlobalLimitExec/LocalLimitExecnodes eliminated, resulting in simpler and slightly more efficient plans.