Skip to content

planner: short-circuit cross join with limit via pushdown#67568

Draft
ljluestc wants to merge 1 commit into
pingcap:masterfrom
ljluestc:fix/issue-63872-cross-join-limit
Draft

planner: short-circuit cross join with limit via pushdown#67568
ljluestc wants to merge 1 commit into
pingcap:masterfrom
ljluestc:fix/issue-63872-cross-join-limit

Conversation

@ljluestc
Copy link
Copy Markdown

@ljluestc ljluestc commented Apr 5, 2026

planner: short-circuit cross join with limit via pushdown

What problem does this PR solve?

Issue Number: close #63872

Problem Summary:

CROSS JOIN 场景下,语句 SELECT * FROM t1 a CROSS JOIN t2 b LIMIT 1 仍可能读取和处理过多数据,导致内存压力过高,甚至触发 memory limit cancel。

该语句没有 ORDER BY,语义上允许返回任意 1 行,因此应尽早短路,而不是让笛卡尔积路径继续读取大规模输入。

What changed and how does it work?

本 PR 在 planner 的 TopN/Limit 下推阶段新增了对“纯笛卡尔内连接”的优化:

  1. LogicalJoin.PushDownTopN 中新增 InnerJoin 分支处理。
  2. 当满足以下条件时,将根部 LIMIT(按 count + offset,溢出时饱和到 math.MaxUint64)分别下推到 join 两侧子计划:
    • topN 是纯 LIMIT(无 ORDER BY
    • join 是 InnerJoin
    • join 条件为空(EqualConditions/NAEQConditions/LeftConditions/RightConditions/OtherConditions 均为空)
    • 当前节点不是 LogicalApply(避免影响相关子查询语义)
  3. 保留 join 上层原始 LIMIT,保证最终结果语义不变。
  4. 同步更新 TestTopNPushDown 的期望计划输出,覆盖已有 select * from t, t s limit 5 用例的新计划形态。

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
    • ./tools/check/failpoint-go-test.sh pkg/planner/core -run TestTopNPushDown -count=1
    • make lint
  • No need to test
    • I checked and no code files have been changed.

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

Optimize `CROSS JOIN ... LIMIT` by pushing limit down to both sides of pure cartesian inner joins, so execution can short-circuit earlier and avoid unnecessary large input processing.

For pure cartesian inner joins under LIMIT, push a derived limit to both children so the plan can short-circuit earlier instead of scanning full join inputs.\n\nAlso update TestTopNPushDown expected output for the existing cross-join limit case.\n\nCo-Authored-By: Oz <oz-agent@warp.dev>
@ti-chi-bot ti-chi-bot Bot added release-note Denotes a PR that will be considered when it comes time to generate release notes. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. labels Apr 5, 2026
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 5, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 5a94696a-f26f-46e7-805b-2c484873a8a0

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@ti-chi-bot
Copy link
Copy Markdown

ti-chi-bot Bot commented Apr 5, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign qw4990 for approval. For more information see the Code Review Process.
Please ensure that each of them provides their approval before proceeding.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot Bot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. contribution This PR is from a community contributor. needs-ok-to-test Indicates a PR created by contributors and need ORG member send '/ok-to-test' to start testing. labels Apr 5, 2026
@ti-chi-bot
Copy link
Copy Markdown

ti-chi-bot Bot commented Apr 5, 2026

Hi @ljluestc. Thanks for your PR.

I'm waiting for a pingcap member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@ti-chi-bot ti-chi-bot Bot added the sig/planner SIG: Planner label Apr 5, 2026
@pingcap-cla-assistant
Copy link
Copy Markdown

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

@tiprow
Copy link
Copy Markdown

tiprow Bot commented Apr 5, 2026

Hi @ljluestc. Thanks for your PR.

PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test all.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@terry1purcell
Copy link
Copy Markdown
Contributor

/ok-to-test

@ti-chi-bot ti-chi-bot Bot added ok-to-test Indicates a PR is ready to be tested. and removed needs-ok-to-test Indicates a PR created by contributors and need ORG member send '/ok-to-test' to start testing. labels Apr 5, 2026
@terry1purcell terry1purcell requested a review from Copilot April 5, 2026 21:42
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR improves planner TopN/LIMIT pushdown for cartesian inner joins so CROSS JOIN ... LIMIT can short-circuit earlier, reducing unnecessary input processing and memory pressure (issue #63872).

Changes:

  • Add an InnerJoin branch in LogicalJoin.PushDownTopN to push a pure LIMIT (no ORDER BY) down to both sides of a cartesian inner join.
  • Compute pushed-down limit as count + offset with uint64 overflow saturation to math.MaxUint64.
  • Update the plan suite golden output for TestTopNPushDown to reflect the new pushed-down LIMIT shape.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File Description
pkg/planner/core/operator/logicalop/logical_join.go Adds cartesian-inner-join LIMIT pushdown logic during TopN/Limit pushdown.
pkg/planner/core/testdata/plan_suite_unexported_out.json Updates expected plan output for TopN pushdown tests to match new optimization.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +555 to +566
func (p *LogicalJoin) canPushLimitDownToCartesianChildren(topN *LogicalTopN) bool {
if topN == nil || !topN.IsLimit() {
return false
}
if _, isApply := p.Self().(*LogicalApply); isApply {
return false
}
return len(p.EqualConditions) == 0 &&
len(p.NAEQConditions) == 0 &&
len(p.LeftConditions) == 0 &&
len(p.RightConditions) == 0 &&
len(p.OtherConditions) == 0
Copy link

Copilot AI Apr 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LogicalTopN.IsLimit() only checks len(ByItems)==0 and does not consider PartitionBy. This new pushdown will also treat an “enhanced TopN” (derived from window, with PartitionBy set) as a plain LIMIT and push it into both join children, which can change semantics. Please add a guard like len(topN.GetPartitionBy()) == 0 (or len(topN.PartitionBy)==0) to canPushLimitDownToCartesianChildren (and/or require topN.PartitionBy be empty) so the optimization only applies to true global LIMIT without ORDER BY/PartitionBy.

Copilot uses AI. Check for mistakes.
Comment on lines +513 to +532
case base.InnerJoin:
if p.canPushLimitDownToCartesianChildren(topN) {
limitCount, carry := bits.Add64(topN.Count, topN.Offset, 0)
if carry > 0 {
limitCount = math.MaxUint64
}
leftLimit := LogicalTopN{
Count: limitCount,
Offset: 0,
PreferLimitToCop: topN.PreferLimitToCop,
}.Init(topN.SCtx(), topN.QueryBlockOffset())
rightLimit := LogicalTopN{
Count: limitCount,
Offset: 0,
PreferLimitToCop: topN.PreferLimitToCop,
}.Init(topN.SCtx(), topN.QueryBlockOffset())
p.Children()[0] = p.Children()[0].PushDownTopN(leftLimit)
p.Children()[1] = p.Children()[1].PushDownTopN(rightLimit)
return topN.AttachChild(p.Self())
}
Copy link

Copilot AI Apr 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new cartesian-inner-join LIMIT pushdown has several non-trivial branches (offset handling via Count+Offset with overflow saturation, apply exclusion, and only-on-empty-conditions gating), but the updated plan golden only covers the simple limit case. Please add unit test cases that exercise at least: (1) LIMIT with OFFSET, (2) overflow/saturation path, and (3) ensuring the optimization does not trigger for LogicalApply and for non-cartesian inner joins (any join condition present).

Copilot uses AI. Check for mistakes.
Comment on lines +519 to +528
leftLimit := LogicalTopN{
Count: limitCount,
Offset: 0,
PreferLimitToCop: topN.PreferLimitToCop,
}.Init(topN.SCtx(), topN.QueryBlockOffset())
rightLimit := LogicalTopN{
Count: limitCount,
Offset: 0,
PreferLimitToCop: topN.PreferLimitToCop,
}.Init(topN.SCtx(), topN.QueryBlockOffset())
Copy link

Copilot AI Apr 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: leftLimit and rightLimit are constructed with identical fields. Consider factoring this into a small helper (e.g. newLimitTopN(limitCount)), or cloning a single initialized LogicalTopN, to reduce duplication and the risk of future field divergence when LogicalTopN gains new fields.

Copilot uses AI. Check for mistakes.
@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 5, 2026

Codecov Report

❌ Patch coverage is 81.25000% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 77.1571%. Comparing base (3cebd11) to head (64a25aa).
⚠️ Report is 1 commits behind head on master.

Additional details and impacted files
@@               Coverage Diff                @@
##             master     #67568        +/-   ##
================================================
- Coverage   77.6380%   77.1571%   -0.4810%     
================================================
  Files          1961       1944        -17     
  Lines        543853     544022       +169     
================================================
- Hits         422237     419752      -2485     
- Misses       120806     124262      +3456     
+ Partials        810          8       -802     
Flag Coverage Δ
integration 40.8901% <81.2500%> (+6.5596%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Components Coverage Δ
dumpling 61.5065% <ø> (ø)
parser ∅ <ø> (∅)
br 48.8579% <ø> (-11.4800%) ⬇️
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@ti-chi-bot
Copy link
Copy Markdown

ti-chi-bot Bot commented Apr 5, 2026

@ljluestc: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
idc-jenkins-ci-tidb/mysql-test 64a25aa link true /test mysql-test
idc-jenkins-ci-tidb/check_dev_2 64a25aa link true /test check-dev2
pull-unit-test-next-gen 64a25aa link true /test pull-unit-test-next-gen
idc-jenkins-ci-tidb/unit-test 64a25aa link true /test unit-test

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

contribution This PR is from a community contributor. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. ok-to-test Indicates a PR is ready to be tested. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/planner SIG: Planner size/M Denotes a PR that changes 30-99 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

short-circuit evaluation for cross join with limit 1

3 participants