planner: short-circuit cross join with limit via pushdown#67568
planner: short-circuit cross join with limit via pushdown#67568ljluestc wants to merge 1 commit into
Conversation
For pure cartesian inner joins under LIMIT, push a derived limit to both children so the plan can short-circuit earlier instead of scanning full join inputs.\n\nAlso update TestTopNPushDown expected output for the existing cross-join limit case.\n\nCo-Authored-By: Oz <oz-agent@warp.dev>
|
Important Review skippedDraft detected. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Repository UI Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
Hi @ljluestc. Thanks for your PR. I'm waiting for a pingcap member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
|
|
Hi @ljluestc. Thanks for your PR. PRs from untrusted users cannot be marked as trusted with I understand the commands that are listed here. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
/ok-to-test |
There was a problem hiding this comment.
Pull request overview
This PR improves planner TopN/LIMIT pushdown for cartesian inner joins so CROSS JOIN ... LIMIT can short-circuit earlier, reducing unnecessary input processing and memory pressure (issue #63872).
Changes:
- Add an
InnerJoinbranch inLogicalJoin.PushDownTopNto push a pure LIMIT (no ORDER BY) down to both sides of a cartesian inner join. - Compute pushed-down limit as
count + offsetwith uint64 overflow saturation tomath.MaxUint64. - Update the plan suite golden output for
TestTopNPushDownto reflect the new pushed-down LIMIT shape.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| pkg/planner/core/operator/logicalop/logical_join.go | Adds cartesian-inner-join LIMIT pushdown logic during TopN/Limit pushdown. |
| pkg/planner/core/testdata/plan_suite_unexported_out.json | Updates expected plan output for TopN pushdown tests to match new optimization. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| func (p *LogicalJoin) canPushLimitDownToCartesianChildren(topN *LogicalTopN) bool { | ||
| if topN == nil || !topN.IsLimit() { | ||
| return false | ||
| } | ||
| if _, isApply := p.Self().(*LogicalApply); isApply { | ||
| return false | ||
| } | ||
| return len(p.EqualConditions) == 0 && | ||
| len(p.NAEQConditions) == 0 && | ||
| len(p.LeftConditions) == 0 && | ||
| len(p.RightConditions) == 0 && | ||
| len(p.OtherConditions) == 0 |
There was a problem hiding this comment.
LogicalTopN.IsLimit() only checks len(ByItems)==0 and does not consider PartitionBy. This new pushdown will also treat an “enhanced TopN” (derived from window, with PartitionBy set) as a plain LIMIT and push it into both join children, which can change semantics. Please add a guard like len(topN.GetPartitionBy()) == 0 (or len(topN.PartitionBy)==0) to canPushLimitDownToCartesianChildren (and/or require topN.PartitionBy be empty) so the optimization only applies to true global LIMIT without ORDER BY/PartitionBy.
| case base.InnerJoin: | ||
| if p.canPushLimitDownToCartesianChildren(topN) { | ||
| limitCount, carry := bits.Add64(topN.Count, topN.Offset, 0) | ||
| if carry > 0 { | ||
| limitCount = math.MaxUint64 | ||
| } | ||
| leftLimit := LogicalTopN{ | ||
| Count: limitCount, | ||
| Offset: 0, | ||
| PreferLimitToCop: topN.PreferLimitToCop, | ||
| }.Init(topN.SCtx(), topN.QueryBlockOffset()) | ||
| rightLimit := LogicalTopN{ | ||
| Count: limitCount, | ||
| Offset: 0, | ||
| PreferLimitToCop: topN.PreferLimitToCop, | ||
| }.Init(topN.SCtx(), topN.QueryBlockOffset()) | ||
| p.Children()[0] = p.Children()[0].PushDownTopN(leftLimit) | ||
| p.Children()[1] = p.Children()[1].PushDownTopN(rightLimit) | ||
| return topN.AttachChild(p.Self()) | ||
| } |
There was a problem hiding this comment.
The new cartesian-inner-join LIMIT pushdown has several non-trivial branches (offset handling via Count+Offset with overflow saturation, apply exclusion, and only-on-empty-conditions gating), but the updated plan golden only covers the simple limit case. Please add unit test cases that exercise at least: (1) LIMIT with OFFSET, (2) overflow/saturation path, and (3) ensuring the optimization does not trigger for LogicalApply and for non-cartesian inner joins (any join condition present).
| leftLimit := LogicalTopN{ | ||
| Count: limitCount, | ||
| Offset: 0, | ||
| PreferLimitToCop: topN.PreferLimitToCop, | ||
| }.Init(topN.SCtx(), topN.QueryBlockOffset()) | ||
| rightLimit := LogicalTopN{ | ||
| Count: limitCount, | ||
| Offset: 0, | ||
| PreferLimitToCop: topN.PreferLimitToCop, | ||
| }.Init(topN.SCtx(), topN.QueryBlockOffset()) |
There was a problem hiding this comment.
Minor: leftLimit and rightLimit are constructed with identical fields. Consider factoring this into a small helper (e.g. newLimitTopN(limitCount)), or cloning a single initialized LogicalTopN, to reduce duplication and the risk of future field divergence when LogicalTopN gains new fields.
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## master #67568 +/- ##
================================================
- Coverage 77.6380% 77.1571% -0.4810%
================================================
Files 1961 1944 -17
Lines 543853 544022 +169
================================================
- Hits 422237 419752 -2485
- Misses 120806 124262 +3456
+ Partials 810 8 -802
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
|
@ljluestc: The following tests failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
planner: short-circuit cross join with limit via pushdown
What problem does this PR solve?
Issue Number: close #63872
Problem Summary:
在
CROSS JOIN场景下,语句SELECT * FROM t1 a CROSS JOIN t2 b LIMIT 1仍可能读取和处理过多数据,导致内存压力过高,甚至触发 memory limit cancel。该语句没有
ORDER BY,语义上允许返回任意 1 行,因此应尽早短路,而不是让笛卡尔积路径继续读取大规模输入。What changed and how does it work?
本 PR 在 planner 的 TopN/Limit 下推阶段新增了对“纯笛卡尔内连接”的优化:
LogicalJoin.PushDownTopN中新增InnerJoin分支处理。LIMIT(按count + offset,溢出时饱和到math.MaxUint64)分别下推到 join 两侧子计划:topN是纯LIMIT(无ORDER BY)InnerJoinEqualConditions/NAEQConditions/LeftConditions/RightConditions/OtherConditions均为空)LogicalApply(避免影响相关子查询语义)LIMIT,保证最终结果语义不变。TestTopNPushDown的期望计划输出,覆盖已有select * from t, t s limit 5用例的新计划形态。Check List
Tests
Side effects
Documentation
Release note