planner: derive static probe pruning conds for index join #67549
AilinKid wants to merge 3 commits into
Conversation
|
Skipping CI for Draft Pull Request. |
📝 Walkthrough
Derives probe-side static partition pruning predicates for index joins during physical plan construction, threads them into IndexJoin runtime properties, updates physical plan enumeration to pass these props into inner scan task constructors, adds tests, and updates build/test configs.
Sequence Diagram
sequenceDiagram
participant Planner as Planner/Enumerator
participant Extract as OuterFilterExtractor
participant Derive as PruningDeriver
participant Ranger as RangerBuilder
participant Apply as PhysicalPlanBuilder
Planner->>Extract: Collect deduped outer static filters (DS/Selection/Proj/UnionScan)
Planner->>Derive: Provide join keys and normalized join predicates
Extract-->>Derive: Supply outer static expressions
Derive->>Ranger: Build column ranges / normalize monotone patterns
Ranger-->>Derive: Return low/high bounds (fold constants where possible)
Derive-->>Planner: Emit ProbePartitionPruningCondGroup(s) into IndexJoinRuntimeProp
Planner->>Apply: Pass IndexJoinRuntimeProp into inner scan task constructors
Apply->>Apply: buildPartInfoFromIndexJoinProp attaches pruning conds to PhysPlanPartInfo
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes
🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
Hi @AilinKid. Thanks for your PR. PRs from untrusted users cannot be marked as trusted with I understand the commands that are listed here. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
@AilinKid I've received your pull request and will start the review. I'll conduct a thorough review covering code quality, potential issues, and implementation details. ⏳ This process typically takes 10-30 minutes depending on the complexity of the changes. ℹ️ Learn more details on Pantheon AI. |
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/planner/core/index_join_partition_pruning.go`:
- Around line 250-255: The current code appends derived probe predicates into
partInfo.PruningConds which may share backing storage from
buildPhysPlanPartInfo(ds), causing later plans to mutate earlier ones; fix by
copying partInfo.PruningConds to a new slice before appending the extraConds
returned by getIndexJoinProbePartitionPruningConds(ds, indexJoinProp) (e.g.,
make a new slice and append the existing PruningConds into it, then append
extraConds) so DataSource reuse with different IndexJoinRuntimeProp cannot
overwrite shared PruningConds.
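The aliasing hazard described above can be reproduced outside the planner. The following is a minimal sketch, assuming plain string slices in place of `PruningConds`; `appendShared` and `appendCopied` are hypothetical names for illustration and are not functions from this PR.

```go
package main

import "fmt"

// appendShared appends extra directly onto base. When base has spare
// capacity, the result shares base's backing array (hypothetical helper).
func appendShared(base, extra []string) []string {
	return append(base, extra...)
}

// appendCopied copies base into a freshly allocated slice before appending,
// so the result never shares storage with base (hypothetical helper).
func appendCopied(base, extra []string) []string {
	out := make([]string, 0, len(base)+len(extra))
	out = append(out, base...)
	return append(out, extra...)
}

func main() {
	// A base slice with spare capacity, standing in for conditions that are
	// built once and reused across several candidate plans.
	base := make([]string, 1, 4)
	base[0] = "static cond"

	plan1 := appendShared(base, []string{"probe cond A"})
	plan2 := appendShared(base, []string{"probe cond B"})
	// Both appends wrote into the same backing slot, so plan1 now sees B.
	fmt.Println(plan1[1] == plan2[1])

	safe1 := appendCopied(base, []string{"probe cond A"})
	safe2 := appendCopied(base, []string{"probe cond B"})
	// Each result owns its storage, so the two plans stay distinct.
	fmt.Println(safe1[1] == safe2[1])
}
```

Whether the real `PruningConds` slice ever has spare capacity depends on how `buildPhysPlanPartInfo` allocates it, so the copy is a defensive measure rather than a confirmed bug fix.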
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 6e684d85-a006-41a4-b3b4-4d4ffa836b85
📒 Files selected for processing (6)
- pkg/planner/core/BUILD.bazel
- pkg/planner/core/casetest/partition/BUILD.bazel
- pkg/planner/core/casetest/partition/partition_pruner_test.go
- pkg/planner/core/exhaust_physical_plans.go
- pkg/planner/core/index_join_partition_pruning.go
- pkg/planner/property/physical_property.go
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## master #67549 +/- ##
================================================
- Coverage 77.7969% 77.0120% -0.7850%
================================================
Files 1984 1973 -11
Lines 549983 555169 +5186
================================================
- Hits 427870 427547 -323
- Misses 121193 127221 +6028
+ Partials 920 401 -519
|
|
/test unit-test |
|
@AilinKid: PRs from untrusted users cannot be marked as trusted with DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
/ok-to-test |
|
/retest-required |
…c-partition-pruning # Conflicts: # pkg/planner/core/exhaust_physical_plans.go
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/planner/core/casetest/partition/partition_pruner_test.go`:
- Around line 287-292: The current assertions using
tk.MustQuery(...).MultiCheckContain and CheckNotContain are too permissive (they
allow plans like "partition:p1,p2"); instead, use the existing plan-partition
extraction helper to extract the plan's partition list for the same query and
assert it equals exactly ["p1"] (and similarly for the second instance at the
other lines). Replace the CheckNotContain("partition:all") and the loose
MultiCheckContain check with an exact partition-list equality assertion derived
from the helper, keeping the same tk.MustQuery invocation but validating the
extracted partition set exactly equals the single target partition "p1".
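To illustrate why the substring check is too loose, here is a minimal sketch that compares it with an exact partition-list comparison. It assumes the plan text carries a `partition:p1,p2` style token; `extractPartitions` is a hypothetical stand-in for the existing helper mentioned above, whose real name and signature are not shown in this thread.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// partitionRe matches the assumed "partition:<comma-separated names>" token.
var partitionRe = regexp.MustCompile(`partition:([A-Za-z0-9_,]+)`)

// extractPartitions returns the partition names from the first matching
// token in plan, or nil when no token is present (hypothetical helper).
func extractPartitions(plan string) []string {
	m := partitionRe.FindStringSubmatch(plan)
	if m == nil {
		return nil
	}
	return strings.Split(m[1], ",")
}

// equalStrings reports whether a and b hold the same elements in order.
func equalStrings(a, b []string) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

func main() {
	// An under-pruned plan: p2 should have been pruned but was not.
	plan := "TableRangeScan table:t2, partition:p1,p2"

	// The loose check passes even though pruning is incomplete.
	fmt.Println(strings.Contains(plan, "partition:p1"))

	// The exact check rejects the under-pruned plan.
	fmt.Println(equalStrings(extractPartitions(plan), []string{"p1"}))
}
```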
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: aaaacf92-245e-4a8a-967f-aa8a8b3dc6fa
📒 Files selected for processing (2)
- pkg/planner/core/BUILD.bazel
- pkg/planner/core/casetest/partition/partition_pruner_test.go
✅ Files skipped from review due to trivial changes (1)
- pkg/planner/core/BUILD.bazel
Actionable comments posted: 1
🧹 Nitpick comments (1)
pkg/planner/core/index_join_partition_pruning.go (1)
589-614: Whitelist of "monotone" functions is intentionally narrow; consider a short rationale comment.
`extractMonotoneColumnForIndexJoin` only recognizes `DATE_ADD` / `ADDDATE` / `DATE_SUB` / `SUBDATE` with the `(col, const, const)` shape. That's fine as a conservative first cut (and matches the typical date-partitioned use case from the PR description), but the name "monotone" reads broader than the actual support: simple arithmetic like `col + const` is not covered. A one-line comment noting that this is an intentionally restricted whitelist (and why other monotonic forms are deferred) would help future readers avoid re-deriving the scope decision.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/planner/core/index_join_partition_pruning.go` around lines 589 - 614, Add a short rationale comment above the extractMonotoneColumnForIndexJoin function explaining that the function intentionally whitelists only date-add/sub variants (DATE_ADD/ADDDATE/DATE_SUB/SUBDATE) with the (col, const, const) argument shape to conservatively detect monotone expressions for date-partitioned index join use-cases, and that simpler arithmetic forms (e.g., col + const) and more complex monotone transformations are deliberately excluded for now to avoid incorrect matches and to keep the implementation conservative; mention that this is a deliberate design decision and can be extended later with additional validation.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/planner/core/index_join_partition_pruning.go`:
- Around line 181-191: The projection case in collectIndexJoinOuterStaticFilters
uses candidateCols (projection output UIDs) to pre-filter when recursing into
the projection child, causing UID mismatches and dropped filters; fix by either
(A) recursing into the child without applying candidateCols (i.e., collect all
child filters), then call substituteIndexJoinOuterFiltersThroughProjection to
remap child UIDs to projection output UIDs and only then apply
filterIndexJoinOuterStaticFilters(candidateCols,...), or (B) build a child-side
candidateCols by inverting the projection's Exprs->Schema.Columns mapping (map
projection outputs back to source column UIDs) and pass that inverted
candidateCols into the recursive collect call; apply the same pattern to
collectIndexJoinProbePartitionColumns for LogicalProjection and
LogicalAggregation to keep behavior consistent.
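Option (B) can be sketched with simplified stand-in types. This is a minimal illustration, assuming a projection is reduced to a list of pass-through child columns plus its output schema; `column`, `projection`, and `invertCandidateCols` are hypothetical names and are not the planner's real types or functions.

```go
package main

import "fmt"

// column is a simplified stand-in for a planner column; only the ID matters.
type column struct{ UniqueID int64 }

// projection is a simplified stand-in for a logical projection.
// exprs[i] is the child column that output i passes through, or nil when
// output i is a computed expression rather than a bare column reference.
type projection struct {
	exprs  []*column
	schema []column
}

// invertCandidateCols maps candidate output-column UIDs back to the child
// column UIDs they pass through. Outputs that are computed expressions have
// no single source column and are skipped.
func invertCandidateCols(proj projection, candidates map[int64]struct{}) map[int64]struct{} {
	child := make(map[int64]struct{}, len(candidates))
	for i, e := range proj.exprs {
		if e == nil {
			continue
		}
		if _, ok := candidates[proj.schema[i].UniqueID]; ok {
			child[e.UniqueID] = struct{}{}
		}
	}
	return child
}

func main() {
	// Output 10 passes through child column 1; output 11 is computed.
	proj := projection{
		exprs:  []*column{{UniqueID: 1}, nil},
		schema: []column{{UniqueID: 10}, {UniqueID: 11}},
	}
	candidates := map[int64]struct{}{10: {}, 11: {}}

	child := invertCandidateCols(proj, candidates)
	_, hasChildCol := child[1]
	fmt.Println(len(child), hasChildCol)
}
```

The inverted set would be passed into the recursive collection on the projection's child, while the original output-side set is still the one used after substitution back through the projection.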
---
Nitpick comments:
In `@pkg/planner/core/index_join_partition_pruning.go`:
- Around line 589-614: Add a short rationale comment above the
extractMonotoneColumnForIndexJoin function explaining that the function
intentionally whitelists only date-add/sub variants
(DATE_ADD/ADDDATE/DATE_SUB/SUBDATE) with the (col, const, const) argument shape
to conservatively detect monotone expressions for date-partitioned index join
use-cases, and that simpler arithmetic forms (e.g., col + const) and more
complex monotone transformations are deliberately excluded for now to avoid
incorrect matches and to keep the implementation conservative; mention that this
is a deliberate design decision and can be extended later with additional
validation.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: bbac52d2-812f-4f51-85bd-2fa7db42a498
📒 Files selected for processing (2)
- pkg/planner/core/casetest/partition/partition_pruner_test.go
- pkg/planner/core/index_join_partition_pruning.go
🚧 Files skipped from review as they are similar to previous changes (1)
- pkg/planner/core/casetest/partition/partition_pruner_test.go
	case *logicalop.LogicalProjection:
		return substituteIndexJoinOuterFiltersThroughProjection(x, collectIndexJoinOuterStaticFilters(x.Children()[0], candidateCols), candidateCols)
	case *logicalop.LogicalLimit, *logicalop.LogicalTopN, *logicalop.LogicalSort:
		return collectIndexJoinOuterStaticFilters(x.Children()[0], candidateCols)
	case *logicalop.LogicalUnionScan:
		filters := collectIndexJoinOuterStaticFilters(x.Children()[0], candidateCols)
		return append(filters, filterIndexJoinOuterStaticFilters(candidateCols, x.Conditions)...)
	default:
		return nil
	}
}
❓ Verification inconclusive
Script executed:
#!/bin/bash
# Check whether LogicalProjection preserves child column UniqueIDs for simple column pass-through,
# or allocates fresh UIDs via AllocPlanColumnID / similar.
rg -nP --type=go -C4 '\bLogicalProjection\b' pkg/planner/core/operator/logicalop | rg -nP -C2 '(UniqueID|AllocPlanColumnID|NewColumn|Schema\(\)\.Columns)' || true
rg -nP --type=go -C3 'proj\.Exprs|proj\.Schema\(\)' pkg/planner/core | rg -nP -C2 'UniqueID' || true
Repository: pingcap/tidb
Exit code: 0
stdout:
37-pkg/planner/core/operator/logicalop/logicalop_test/hash64_equals_test.go-327- p1.Hash64(hasher1)
38---
39:pkg/planner/core/operator/logicalop/logical_union_all.go-98- exprs := make([]expression.Expression, len(p.Schema().Columns))
40-pkg/planner/core/operator/logicalop/logical_union_all.go-99- for j, col := range schema.Columns {
41-pkg/planner/core/operator/logicalop/logical_union_all.go-100- exprs[j] = col
--
233-pkg/planner/core/operator/logicalop/logical_projection.go-390- fds := p.LogicalSchemaProducer.ExtractFD()
234-pkg/planner/core/operator/logicalop/logical_projection.go-391- // collect the output columns' unique ID.
235:pkg/planner/core/operator/logicalop/logical_projection.go-392- outputColsUniqueIDs := intset.NewFastIntSet()
236---
237-pkg/planner/core/operator/logicalop/logical_projection.go-480-
--
341-pkg/planner/core/operator/logicalop/logical_projection.go:683: proj, ok := p.(*LogicalProjection)
342-pkg/planner/core/operator/logicalop/logical_projection.go-684- if !ok {
343:pkg/planner/core/operator/logicalop/logical_projection.go:685: proj = LogicalProjection{Exprs: expression.Column2Exprs(p.Schema().Columns)}.Init(p.SCtx(), p.QueryBlockOffset())
344-pkg/planner/core/operator/logicalop/logical_projection.go-686- proj.SetSchema(p.Schema().Clone())
345-pkg/planner/core/operator/logicalop/logical_projection.go-687- proj.SetChildren(p)
--
430-pkg/planner/core/operator/logicalop/logical_join.go-1931- }
431-pkg/planner/core/operator/logicalop/logical_join.go:1932: proj = LogicalProjection{Exprs: make([]expression.Expression, 0, child.Schema().Len())}.Init(p.SCtx(), child.QueryBlockOffset())
432:pkg/planner/core/operator/logicalop/logical_join.go-1933- for _, col := range child.Schema().Columns {
433-pkg/planner/core/operator/logicalop/logical_join.go-1934- proj.Exprs = append(proj.Exprs, col)
434-pkg/planner/core/operator/logicalop/logical_join.go-1935- }
--
440-pkg/planner/core/operator/logicalop/logical_join.go-2069- innerJoin.AttachOnConds(expression.ScalarFuncs2Exprs(p.EqualConditions))
441-pkg/planner/core/operator/logicalop/logical_join.go:2070: proj := LogicalProjection{
442:pkg/planner/core/operator/logicalop/logical_join.go-2071- Exprs: expression.Column2Exprs(p.Children()[0].Schema().Columns),
443-pkg/planner/core/operator/logicalop/logical_join.go-2072- }.Init(p.SCtx(), p.QueryBlockOffset())
444-pkg/planner/core/operator/logicalop/logical_join.go-2073- proj.SetChildren(innerJoin)
210-pkg/planner/core/rule/rule_order_aware_join_reorder.go-236- }
211---
212:pkg/planner/core/rule/rule_join_key_type_cast.go-196- UniqueID: intInfo.origCol.UniqueID,
213-pkg/planner/core/rule/rule_join_key_type_cast.go-197- RetType: intInfo.origCol.RetType.Clone(),
214-pkg/planner/core/rule/rule_join_key_type_cast.go-198- }
--
217-pkg/planner/core/rule/rule_join_key_type_cast.go-201-
218-pkg/planner/core/rule/rule_join_key_type_cast.go-202- // VARCHAR side: add CAST(varchar_col AS SIGNED). We allocate a new
219:pkg/planner/core/rule/rule_join_key_type_cast.go-203- // UniqueID here because the data type changes (VARCHAR→INT), and
220---
221:pkg/planner/core/rule/rule_join_key_type_cast.go-208- UniqueID: ctx.GetSessionVars().AllocPlanColumnID(),
222-pkg/planner/core/rule/rule_join_key_type_cast.go-209- RetType: castIntExpr.GetType(evalCtx).Clone(),
223-pkg/planner/core/rule/rule_join_key_type_cast.go-210- }
--
233-pkg/planner/core/rule/rule_join_key_type_cast.go:270: schema := proj.Schema()
234-pkg/planner/core/rule/rule_join_key_type_cast.go-271- for i, schemaCol := range schema.Columns {
235:pkg/planner/core/rule/rule_join_key_type_cast.go-272- if schemaCol.UniqueID != col.UniqueID {
236-pkg/planner/core/rule/rule_join_key_type_cast.go-273- continue
237-pkg/planner/core/rule/rule_join_key_type_cast.go-274- }
--
258-pkg/planner/core/planbuilder.go:723: proj.Exprs = append(proj.Exprs, expr)
259-pkg/planner/core/planbuilder.go-724- schema.Append(&expression.Column{
260:pkg/planner/core/planbuilder.go-725- UniqueID: b.ctx.GetSessionVars().AllocPlanColumnID(),
261-pkg/planner/core/planbuilder.go-726- RetType: expr.GetType(b.ctx.GetExprCtx().GetEvalCtx()),
262---
--
266-pkg/planner/core/planbuilder.go:3647: proj.Exprs = append(proj.Exprs, col)
267-pkg/planner/core/planbuilder.go-3648- newCol := col.Clone().(*expression.Column)
268:pkg/planner/core/planbuilder.go-3649- newCol.UniqueID = b.ctx.GetSessionVars().AllocPlanColumnID()
269-pkg/planner/core/planbuilder.go-3650- schema.Append(newCol)
270---
--
304-pkg/planner/core/expression_rewriter.go:968: proj.Exprs = append(proj.Exprs, cond)
305-pkg/planner/core/expression_rewriter.go:969: proj.Schema().Append(&expression.Column{
306:pkg/planner/core/expression_rewriter.go-970- UniqueID: sessVars.AllocPlanColumnID(),
307-pkg/planner/core/expression_rewriter.go-971- RetType: cond.GetType(er.sctx.GetEvalCtx()),
308-pkg/planner/core/expression_rewriter.go-972- })
--
384-pkg/planner/core/logical_plan_builder.go-1925- name := ""
385-pkg/planner/core/logical_plan_builder.go:1926: for idx, schemaCol := range proj.Schema().Columns {
386:pkg/planner/core/logical_plan_builder.go-1927- if schemaCol.UniqueID == errShowCol.UniqueID {
387-pkg/planner/core/logical_plan_builder.go-1928- name = proj.OutputNames()[idx].String()
388-pkg/planner/core/logical_plan_builder.go-1929- break
--
390-pkg/planner/core/logical_plan_builder.go-1941- if fds.GroupByCols.Only1Zero() {
391-pkg/planner/core/logical_plan_builder.go-1942- // maxOneRow is delayed from agg's ExtractFD logic since some details listed in it.
392:pkg/planner/core/logical_plan_builder.go-1943- projectionUniqueIDs := intset.NewFastIntSet()
393-pkg/planner/core/logical_plan_builder.go:1944: for _, expr := range proj.Exprs {
394-pkg/planner/core/logical_plan_builder.go-1945- switch x := expr.(type) {
395-pkg/planner/core/logical_plan_builder.go-1946- case *expression.Column:
396:pkg/planner/core/logical_plan_builder.go-1947- projectionUniqueIDs.Insert(int(x.UniqueID))
397---
398-pkg/planner/core/logical_plan_builder.go-1960- fds.HasAggBuilt = false
--
464-pkg/planner/core/logical_plan_builder.go-6774- proj.SetOutputNames(append(proj.OutputNames(), types.EmptyName))
465-pkg/planner/core/logical_plan_builder.go-6775- col := &expression.Column{
466:pkg/planner/core/logical_plan_builder.go-6776- UniqueID: b.ctx.GetSessionVars().AllocPlanColumnID(),
467-pkg/planner/core/logical_plan_builder.go-6777- RetType: newArg.GetType(b.ctx.GetExprCtx().GetEvalCtx()),
468-pkg/planner/core/logical_plan_builder.go-6778- }
--
485-pkg/planner/core/logical_plan_builder.go-6848- proj.SetOutputNames(append(proj.OutputNames(), types.EmptyName))
486-pkg/planner/core/logical_plan_builder.go-6849- col := &expression.Column{
487:pkg/planner/core/logical_plan_builder.go-6850- UniqueID: b.ctx.GetSessionVars().AllocPlanColumnID(),
488-pkg/planner/core/logical_plan_builder.go-6851- RetType: it.GetType(b.ctx.GetExprCtx().GetEvalCtx()),
489-pkg/planner/core/logical_plan_builder.go-6852- }
Script executed:
# First, let's examine the exact code in index_join_partition_pruning.go around lines 160-256
cat -n pkg/planner/core/index_join_partition_pruning.go | sed -n '150,260p'
Repository: pingcap/tidb
Exit code: 0
stdout:
150 case *logicalop.DataSource:
151 partCol := getSingleInnerPartitionColumnForIndexJoin(x)
152 if partCol == nil {
153 return
154 }
155 if _, ok := seen[partCol.UniqueID]; ok {
156 return
157 }
158 seen[partCol.UniqueID] = struct{}{}
159 result = append(result, partCol)
160 case *logicalop.LogicalSelection, *logicalop.LogicalProjection, *logicalop.LogicalAggregation, *logicalop.LogicalUnionScan:
161 if len(x.Children()) == 1 {
162 collect(x.Children()[0])
163 }
164 case *logicalop.LogicalJoin:
165 for _, child := range x.Children() {
166 collect(child)
167 }
168 }
169 }
170 collect(p)
171 return result
172 }
173
174 func collectIndexJoinOuterStaticFilters(p base.LogicalPlan, candidateCols map[int64]struct{}) []expression.Expression {
175 switch x := p.(type) {
176 case *logicalop.DataSource:
177 return filterIndexJoinOuterStaticFilters(candidateCols, x.AllConds)
178 case *logicalop.LogicalSelection:
179 filters := collectIndexJoinOuterStaticFilters(x.Children()[0], candidateCols)
180 return append(filters, filterIndexJoinOuterStaticFilters(candidateCols, x.Conditions)...)
181 case *logicalop.LogicalProjection:
182 return substituteIndexJoinOuterFiltersThroughProjection(x, collectIndexJoinOuterStaticFilters(x.Children()[0], candidateCols), candidateCols)
183 case *logicalop.LogicalLimit, *logicalop.LogicalTopN, *logicalop.LogicalSort:
184 return collectIndexJoinOuterStaticFilters(x.Children()[0], candidateCols)
185 case *logicalop.LogicalUnionScan:
186 filters := collectIndexJoinOuterStaticFilters(x.Children()[0], candidateCols)
187 return append(filters, filterIndexJoinOuterStaticFilters(candidateCols, x.Conditions)...)
188 default:
189 return nil
190 }
191 }
192
193 func filterIndexJoinOuterStaticFilters(candidateCols map[int64]struct{}, filters []expression.Expression) []expression.Expression {
194 result := make([]expression.Expression, 0, len(filters))
195 seen := make(map[string]struct{}, len(filters))
196 for _, filter := range filters {
197 cols := expression.ExtractColumns(filter)
198 if len(cols) != 1 {
199 continue
200 }
201 if _, ok := candidateCols[cols[0].UniqueID]; !ok {
202 continue
203 }
204 key := string(filter.HashCode())
205 if _, ok := seen[key]; ok {
206 continue
207 }
208 seen[key] = struct{}{}
209 result = append(result, filter)
210 }
211 return result
212 }
213
214 // substituteIndexJoinOuterFiltersThroughProjection handles the common outer-side shape
215 // DataSource/Selection -> Projection -> IndexJoin. Static filters are collected from the
216 // child side, but pruning candidates are tracked in the projection schema, so we remap
217 // pass-through column references through the projection before matching them again.
218 func substituteIndexJoinOuterFiltersThroughProjection(
219 proj *logicalop.LogicalProjection,
220 filters []expression.Expression,
221 candidateCols map[int64]struct{},
222 ) []expression.Expression {
223 if len(filters) == 0 {
224 return nil
225 }
226 replace := make(map[string]*expression.Column, len(proj.Exprs))
227 for i, expr := range proj.Exprs {
228 col, ok := expr.(*expression.Column)
229 if !ok {
230 continue
231 }
232 replace[string(col.HashCode())] = proj.Schema().Columns[i]
233 }
234 if len(replace) == 0 {
235 return nil
236 }
237 result := make([]expression.Expression, 0, len(filters))
238 for _, filter := range filters {
239 cols := expression.ExtractColumns(filter)
240 if len(cols) == 0 {
241 continue
242 }
243 canSubstitute := true
244 for _, col := range cols {
245 if replace[string(col.HashCode())] == nil {
246 canSubstitute = false
247 break
248 }
249 }
250 if !canSubstitute {
251 continue
252 }
253 result = append(result, ruleutil.ResolveExprAndReplace(filter.Clone(), replace))
254 }
255 return filterIndexJoinOuterStaticFilters(candidateCols, result)
256 }
257
258 func buildPartInfoFromIndexJoinProp(
259 ds *logicalop.DataSource,
260 indexJoinProp *property.IndexJoinRuntimeProp,
Script executed:
# Check the filterIndexJoinOuterStaticFilters function to see how it matches UIDs
cat -n pkg/planner/core/index_join_partition_pruning.go | sed -n '195,230p'
Repository: pingcap/tidb
Exit code: 0
stdout:
195 seen := make(map[string]struct{}, len(filters))
196 for _, filter := range filters {
197 cols := expression.ExtractColumns(filter)
198 if len(cols) != 1 {
199 continue
200 }
201 if _, ok := candidateCols[cols[0].UniqueID]; !ok {
202 continue
203 }
204 key := string(filter.HashCode())
205 if _, ok := seen[key]; ok {
206 continue
207 }
208 seen[key] = struct{}{}
209 result = append(result, filter)
210 }
211 return result
212 }
213
214 // substituteIndexJoinOuterFiltersThroughProjection handles the common outer-side shape
215 // DataSource/Selection -> Projection -> IndexJoin. Static filters are collected from the
216 // child side, but pruning candidates are tracked in the projection schema, so we remap
217 // pass-through column references through the projection before matching them again.
218 func substituteIndexJoinOuterFiltersThroughProjection(
219 proj *logicalop.LogicalProjection,
220 filters []expression.Expression,
221 candidateCols map[int64]struct{},
222 ) []expression.Expression {
223 if len(filters) == 0 {
224 return nil
225 }
226 replace := make(map[string]*expression.Column, len(proj.Exprs))
227 for i, expr := range proj.Exprs {
228 col, ok := expr.(*expression.Column)
229 if !ok {
230 continue
Script executed:
# Look at how LogicalProjection.SetSchema works and whether it preserves child UIDs
rg -nA15 'func.*LogicalProjection.*SetSchema' pkg/planner/core/operator/logicalop/logical_projection.go || true
rg -nA10 'type LogicalProjection struct' pkg/planner/core/operator/logicalop/logical_projection.go | head -30
Repository: pingcap/tidb
Exit code: 0
stdout:
33:type LogicalProjection struct {
34- LogicalSchemaProducer `hash64-equals:"true"`
35-
36- Exprs []expression.Expression `hash64-equals:"true" shallow-ref:"true"`
37-
38- // CalculateNoDelay indicates this Projection is the root Plan and should be
39- // calculated without delay and will not return any result to client.
40- // Currently it is "true" only when the current sql query is a "DO" statement.
41- // See "https://dev.mysql.com/doc/refman/5.7/en/do.html" for more detail.
42- CalculateNoDelay bool `hash64-equals:"true"`
43-
Script executed:
# Understand substituteIndexJoinOuterFiltersThroughProjection implementation
rg -nA30 'func substituteIndexJoinOuterFiltersThroughProjection' pkg/planner/core/index_join_partition_pruning.go
Repository: pingcap/tidb
Exit code: 0
stdout:
218:func substituteIndexJoinOuterFiltersThroughProjection(
219- proj *logicalop.LogicalProjection,
220- filters []expression.Expression,
221- candidateCols map[int64]struct{},
222-) []expression.Expression {
223- if len(filters) == 0 {
224- return nil
225- }
226- replace := make(map[string]*expression.Column, len(proj.Exprs))
227- for i, expr := range proj.Exprs {
228- col, ok := expr.(*expression.Column)
229- if !ok {
230- continue
231- }
232- replace[string(col.HashCode())] = proj.Schema().Columns[i]
233- }
234- if len(replace) == 0 {
235- return nil
236- }
237- result := make([]expression.Expression, 0, len(filters))
238- for _, filter := range filters {
239- cols := expression.ExtractColumns(filter)
240- if len(cols) == 0 {
241- continue
242- }
243- canSubstitute := true
244- for _, col := range cols {
245- if replace[string(col.HashCode())] == nil {
246- canSubstitute = false
247- break
248- }
Script executed:
# Find where collectIndexJoinProbePartitionColumns is called and how candidateCols is built
rg -nB5 'collectIndexJoinProbePartitionColumns' pkg/planner/core/index_join_partition_pruning.go
Repository: pingcap/tidb
Exit code: 0
stdout:
84- outerSchema *expression.Schema,
85- innerJoinKeys []*expression.Column,
86- outerJoinKeys []*expression.Column,
87- otherConds []expression.Expression,
88-) []property.ProbePartitionPruningCondGroup {
89: innerPartCols := collectIndexJoinProbePartitionColumns(innerChild)
--
139- }
140- }
141- return candidateCols
142-}
143-
144:func collectIndexJoinProbePartitionColumns(p base.LogicalPlan) []*expression.Column {
Script executed:
# Look at how candidateCols is created from the projection schema
rg -nA20 'func tryIndexJoinPartitionPruning' pkg/planner/core/index_join_partition_pruning.go | head -50
Repository: pingcap/tidb
Exit code: 0
Script executed:
# Check LogicalProjection.SetSchema and if it preserves or allocates UIDs
rg -nA8 'func.*\(p \*LogicalProjection\).*SetSchema' pkg/planner/core/operator/logicalop/logical_projection.go
Repository: pingcap/tidb
Exit code: 1
Script executed:
# Search for tests of this feature with projections
rg -l 'collectIndexJoinProbePartitionColumns|index.*partition.*prun' --type=go pkg/planner/core/ | xargs -I{} basename {}
Repository: pingcap/tidb
Exit code: 0
stdout:
planbuilder.go
index_join_partition_pruning.go
Script executed:
# Look at the full tryIndexJoinPartitionPruning function to see how candidateCols is built
rg -nA40 'func tryIndexJoinPartitionPruning' pkg/planner/core/index_join_partition_pruning.go
Repository: pingcap/tidb
Exit code: 1
Script executed:
# Check how the function is called from physical plan building
rg -nB3 -A3 'tryIndexJoinPartitionPruning' pkg/planner/core/ --type=go
Repository: pingcap/tidb
Exit code: 1
Script executed:
# Look at LogicalProjection initialization and schema setup
rg -nA20 'func.*NewLogicalProjection\|Init.*LogicalProjection' pkg/planner/core/operator/logicalop/logical_projection.go | head -40
Repository: pingcap/tidb
Exit code: 0
Script executed:
# Verify the issue: in a projection context, do columns get fresh UniqueIDs?
# Check if Schema().Clone() preserves or resets UIDs
rg -nA5 'func.*Schema.*Clone' pkg/expression/schema.go | head -30
Repository: pingcap/tidb
Exit code: 0
stdout:
75:func (s *Schema) Clone() *Schema {
76- if s == nil {
77- return nil
78- }
79- cols := make([]*Column, 0, s.Len())
80- keys := make([]KeyInfo, 0, len(s.PKOrUK))
Script executed:
# Search for all references to collectIndexJoinProbePartitionColumns
rg -nB10 -A10 'collectIndexJoinProbePartitionColumns\|collectIndexJoinOuterStaticFilters' pkg/planner/core/index_join_partition_pruning.go | head -80
Repository: pingcap/tidb
Exit code: 0
Script executed:
# Look at the full function that contains line 89
cat -n pkg/planner/core/index_join_partition_pruning.go | sed -n '50,145p'
Repository: pingcap/tidb
Exit code: 0
stdout:
50 tableRangeScan bool,
51 ) *property.IndexJoinRuntimeProp {
52 var (
53 innerJoinKeys []*expression.Column
54 outerJoinKeys []*expression.Column
55 )
56 if outerIdx == 0 {
57 outerJoinKeys, innerJoinKeys, _, _ = join.GetJoinKeys()
58 } else {
59 innerJoinKeys, outerJoinKeys, _, _ = join.GetJoinKeys()
60 }
61 return &property.IndexJoinRuntimeProp{
62 OtherConditions: join.OtherConditions,
63 // if inner plan doesn't contain any partition, ProbePartitionPruningConds will be nil here.
64 ProbePartitionPruningConds: collectIndexJoinProbePartitionPruningCondGroups(
65 join.SCtx(),
66 join.Children()[outerIdx],
67 join.Children()[1-outerIdx],
68 outerSchema,
69 innerJoinKeys,
70 outerJoinKeys,
71 join.OtherConditions,
72 ),
73 InnerJoinKeys: innerJoinKeys,
74 OuterJoinKeys: outerJoinKeys,
75 AvgInnerRowCnt: avgInnerRowCnt,
76 TableRangeScan: tableRangeScan,
77 }
78 }
79
80 func collectIndexJoinProbePartitionPruningCondGroups(
81 sctx base.PlanContext,
82 outerChild base.LogicalPlan,
83 innerChild base.LogicalPlan,
84 outerSchema *expression.Schema,
85 innerJoinKeys []*expression.Column,
86 outerJoinKeys []*expression.Column,
87 otherConds []expression.Expression,
88 ) []property.ProbePartitionPruningCondGroup {
89 innerPartCols := collectIndexJoinProbePartitionColumns(innerChild)
90 if len(innerPartCols) == 0 {
91 return nil
92 }
93 candidateCols := extractIndexJoinOuterPartitionPruningCandidateCols(outerSchema, innerPartCols, innerJoinKeys, outerJoinKeys, otherConds)
94 if len(candidateCols) == 0 {
95 return nil
96 }
97 outerFilters := collectIndexJoinOuterStaticFilters(outerChild, candidateCols)
98 if len(outerFilters) == 0 {
99 return nil
100 }
101 return deriveIndexJoinProbePartitionPruningCondGroups(sctx, innerPartCols, outerFilters, innerJoinKeys, outerJoinKeys, otherConds)
102 }
103
104 // extractIndexJoinOuterPartitionPruningCandidateCols finds which outer-side columns
105 // can contribute static filters for probe-side partition pruning. For a join bound
106 // like "inner_part_col op monotone(outer_col)", we later collect static filters on
107 // that outer column and fold them back into coarse pruning conditions on the inner
108 // partition column.
109 func extractIndexJoinOuterPartitionPruningCandidateCols(
110 outerSchema *expression.Schema,
111 innerPartCols []*expression.Column,
112 innerJoinKeys []*expression.Column,
113 outerJoinKeys []*expression.Column,
114 otherConds []expression.Expression,
115 ) map[int64]struct{} {
116 candidateCols := make(map[int64]struct{})
117 // why we care about otherconds: for a join: t1 join t2 on t1.a = t2.a and t1.b > t2.b, if
118 // t2.b is the inner partition column, then the "t1.b > t2.b" condition can also contribute
119 // to pruning the inner partition, just with a different derived pruning condition. So we need
120 // to consider all otherconds instead of just the join keys.
121 for _, cond := range otherConds {
122 for _, innerPartCol := range innerPartCols {
123 bound, ok := extractNormalizedIndexJoinBound(innerPartCol, cond)
124 if !ok || !expression.ExprFromSchema(bound.outerCol, outerSchema) {
125 continue
126 }
127 candidateCols[bound.outerCol.UniqueID] = struct{}{}
128 }
129 }
130 for i, innerJoinKey := range innerJoinKeys {
131 if i >= len(outerJoinKeys) || !expression.ExprFromSchema(outerJoinKeys[i], outerSchema) {
132 continue
133 }
134 for _, innerPartCol := range innerPartCols {
135 if innerJoinKey.EqualColumn(innerPartCol) {
136 candidateCols[outerJoinKeys[i].UniqueID] = struct{}{}
137 break
138 }
139 }
140 }
141 return candidateCols
142 }
143
144 func collectIndexJoinProbePartitionColumns(p base.LogicalPlan) []*expression.Column {
145 seen := make(map[int64]struct{})
Script executed:
# Check Schema.Clone to understand if UIDs are preserved
cat -n pkg/expression/schema.go | sed -n '75,95p'
Repository: pingcap/tidb
Exit code: 0
stdout:
75 func (s *Schema) Clone() *Schema {
76 if s == nil {
77 return nil
78 }
79 cols := make([]*Column, 0, s.Len())
80 keys := make([]KeyInfo, 0, len(s.PKOrUK))
81 for _, col := range s.Columns {
82 cols = append(cols, col.Clone().(*Column))
83 }
84 for _, key := range s.PKOrUK {
85 keys = append(keys, key.Clone())
86 }
87 schema := NewSchema(cols...)
88 schema.SetKeys(keys)
89 if s.NullableUK != nil {
90 uniqueKeys := make([]KeyInfo, 0, len(s.NullableUK))
91 for _, key := range s.NullableUK {
92 uniqueKeys = append(uniqueKeys, key.Clone())
93 }
94 schema.SetUniqueKeys(uniqueKeys)
95 }
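The question behind this check is whether cloning a schema keeps column UIDs stable. A minimal sketch of that behaviour follows, using hypothetical `Column` and `Schema` stand-ins rather than the real `expression` types. It assumes that `Column.Clone` is a plain value copy, which the excerpt above does not show.

```go
package main

import "fmt"

// Column is a hypothetical stand-in for expression.Column; only the
// fields needed for the UID question are modelled.
type Column struct {
	UniqueID int64
	Name     string
}

// Clone returns a value copy of the column. That the real Column.Clone
// behaves like this is an assumption of the sketch.
func (c *Column) Clone() *Column {
	cp := *c
	return &cp
}

// Schema is a hypothetical stand-in for expression.Schema.
type Schema struct {
	Columns []*Column
}

// Clone mirrors the loop shown in the excerpt: every column is cloned
// into a new slice, so pointers differ but field values are copied.
func (s *Schema) Clone() *Schema {
	if s == nil {
		return nil
	}
	cols := make([]*Column, 0, len(s.Columns))
	for _, col := range s.Columns {
		cols = append(cols, col.Clone())
	}
	return &Schema{Columns: cols}
}

// sameUniqueIDs reports whether two schemas carry the same UIDs in order.
func sameUniqueIDs(a, b *Schema) bool {
	if len(a.Columns) != len(b.Columns) {
		return false
	}
	for i := range a.Columns {
		if a.Columns[i].UniqueID != b.Columns[i].UniqueID {
			return false
		}
	}
	return true
}

func main() {
	orig := &Schema{Columns: []*Column{{UniqueID: 1, Name: "a"}, {UniqueID: 2, Name: "b"}}}
	cloned := orig.Clone()
	fmt.Println("same UIDs:", sameUniqueIDs(orig, cloned))
	fmt.Println("distinct pointers:", orig.Columns[0] != cloned.Columns[0])
}
```

Under that assumption, a cloned schema still matches UID-keyed lookups such as `candidateCols`, so cloning alone would not explain a mismatch.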
Script executed:
# Search for where index_join_partition_pruning functions are called
rg 'collectIndexJoinProbePartitionColumns\|buildPartitionPruningCondGroups' --type=go pkg/planner/core/ -B2 -A2
Repository: pingcap/tidb
Exit code: 1
Script executed:
# Search for who calls buildIndexJoinRuntimeProp and what outerSchema is passed
rg 'buildIndexJoinRuntimeProp' --type=go pkg/planner/core/ -B5 -A2
Repository: pingcap/tidb
Exit code: 0
stdout:
pkg/planner/core/index_join_partition_pruning.go- op string
pkg/planner/core/index_join_partition_pruning.go- boundExpr expression.Expression
pkg/planner/core/index_join_partition_pruning.go- outerCol *expression.Column
pkg/planner/core/index_join_partition_pruning.go-}
pkg/planner/core/index_join_partition_pruning.go-
pkg/planner/core/index_join_partition_pruning.go:// buildIndexJoinRuntimeProp collects the runtime metadata that index-join inner scans
pkg/planner/core/index_join_partition_pruning.go-// need during physical plan construction. Besides the original join-key/cost fields,
pkg/planner/core/index_join_partition_pruning.go-// it also tries to derive extra probe-side partition pruning predicates by:
pkg/planner/core/index_join_partition_pruning.go-// 1. finding the partition column on the inner subtree,
pkg/planner/core/index_join_partition_pruning.go-// 2. collecting outer-side static filters that can bound join predicates involving that column,
pkg/planner/core/index_join_partition_pruning.go-// 3. folding those bounds back into coarse predicates on the inner partition column.
pkg/planner/core/index_join_partition_pruning.go-// The derived predicates are attached later when the physical inner scan builds its
pkg/planner/core/index_join_partition_pruning.go-// partition pruning info.
pkg/planner/core/index_join_partition_pruning.go:func buildIndexJoinRuntimeProp(
pkg/planner/core/index_join_partition_pruning.go- join *logicalop.LogicalJoin,
pkg/planner/core/index_join_partition_pruning.go- outerIdx int,
--
pkg/planner/core/exhaust_physical_plans.go- var avgInnerRowCnt float64
pkg/planner/core/exhaust_physical_plans.go- if count := outerStats.RowCount; count > 0 {
pkg/planner/core/exhaust_physical_plans.go- avgInnerRowCnt = p.EqualCondOutCnt / count
pkg/planner/core/exhaust_physical_plans.go- }
pkg/planner/core/exhaust_physical_plans.go- // for pk path
pkg/planner/core/exhaust_physical_plans.go: indexJoinPropTS := buildIndexJoinRuntimeProp(p, outerIdx, outerSchema, avgInnerRowCnt, true)
pkg/planner/core/exhaust_physical_plans.go- // for normal index path
pkg/planner/core/exhaust_physical_plans.go: indexJoinPropIS := buildIndexJoinRuntimeProp(p, outerIdx, outerSchema, avgInnerRowCnt, false)
pkg/planner/core/exhaust_physical_plans.go- indexJoins := constructIndexJoinStatic(p, prop, outerIdx, indexJoinPropTS, outerStats)
pkg/planner/core/exhaust_physical_plans.go- indexJoins = append(indexJoins, constructIndexJoinStatic(p, prop, outerIdx, indexJoinPropIS, outerStats)...)
Script executed:
# Check the outerSchema parameter - where does it come from?
# Look for the function that builds the index join and calls buildIndexJoinRuntimeProp
rg -nA30 'func.*newIndexJoinRuntimeProp\|func.*deriveIndexJoinRuntimeProp' pkg/planner/core/ --type=go | head -80
Repository: pingcap/tidb
Exit code: 0
Script executed:
# Let me check what outerSchema is - is it from join.Schema() or from something else?
rg -B10 'buildIndexJoinRuntimeProp\(' pkg/planner/core/ --type=go | grep -A10 -B10 'outerSchema'
Repository: pingcap/tidb
Exit code: 0
stdout:
pkg/planner/core/index_join_partition_pruning.go-// need during physical plan construction. Besides the original join-key/cost fields,
pkg/planner/core/index_join_partition_pruning.go-// it also tries to derive extra probe-side partition pruning predicates by:
pkg/planner/core/index_join_partition_pruning.go-// 1. finding the partition column on the inner subtree,
pkg/planner/core/index_join_partition_pruning.go-// 2. collecting outer-side static filters that can bound join predicates involving that column,
pkg/planner/core/index_join_partition_pruning.go-// 3. folding those bounds back into coarse predicates on the inner partition column.
pkg/planner/core/index_join_partition_pruning.go-// The derived predicates are attached later when the physical inner scan builds its
pkg/planner/core/index_join_partition_pruning.go-// partition pruning info.
pkg/planner/core/index_join_partition_pruning.go:func buildIndexJoinRuntimeProp(
--
pkg/planner/core/exhaust_physical_plans.go- // If the order by columns are not all from outer child, index join cannot promise the order.
pkg/planner/core/exhaust_physical_plans.go- if !prop.AllColsFromSchema(outerSchema) || !all {
pkg/planner/core/exhaust_physical_plans.go- return nil
pkg/planner/core/exhaust_physical_plans.go- }
pkg/planner/core/exhaust_physical_plans.go- // computed the avgInnerRowCnt
pkg/planner/core/exhaust_physical_plans.go- var avgInnerRowCnt float64
pkg/planner/core/exhaust_physical_plans.go- if count := outerStats.RowCount; count > 0 {
pkg/planner/core/exhaust_physical_plans.go- avgInnerRowCnt = p.EqualCondOutCnt / count
pkg/planner/core/exhaust_physical_plans.go- }
pkg/planner/core/exhaust_physical_plans.go- // for pk path
pkg/planner/core/exhaust_physical_plans.go: indexJoinPropTS := buildIndexJoinRuntimeProp(p, outerIdx, outerSchema, avgInnerRowCnt, true)
pkg/planner/core/exhaust_physical_plans.go- // for normal index path
pkg/planner/core/exhaust_physical_plans.go: indexJoinPropIS := buildIndexJoinRuntimeProp(p, outerIdx, outerSchema, avgInnerRowCnt, false)
Script executed:
# Look at where outerSchema comes from in exhaust_physical_plans.go
cat -n pkg/planner/core/exhaust_physical_plans.go | sed -n '1000,1050p' | head -40
Repository: pingcap/tidb
Exit code: 0
stdout:
1000 // We set `StatsVersion` here and fill other fields in `(*copTask).finishIndexPlan`. Since `copTask.IndexPlan` may
1001 // change before calling `(*copTask).finishIndexPlan`, we don't know the stats information of `ts` currently and on
1002 // the other hand, it may be hard to identify `StatsVersion` of `ts` in `(*copTask).finishIndexPlan`.
1003 ts.SetStats(&property.StatsInfo{StatsVersion: ds.TableStats.StatsVersion})
1004 usedStats := ds.SCtx().GetSessionVars().StmtCtx.GetUsedStatsInfo(false)
1005 if usedStats != nil && usedStats.GetUsedInfo(ts.PhysicalTableID) != nil {
1006 ts.UsedStatsInfo = usedStats.GetUsedInfo(ts.PhysicalTableID)
1007 }
1008 // If inner cop task need keep order, the extraHandleCol should be set.
1009 if cop.KeepOrder && !ds.TableInfo.IsCommonHandle {
1010 var needExtraProj bool
1011 cop.ExtraHandleCol, needExtraProj = ts.AppendExtraHandleCol(ds)
1012 cop.NeedExtraProj = cop.NeedExtraProj || needExtraProj
1013 }
1014 if cop.NeedExtraProj {
1015 cop.OriginSchema = ds.Schema()
1016 }
1017 cop.TablePlan = ts
1018 }
1019 if cop.TablePlan != nil && ds.TableInfo.IsCommonHandle {
1020 cop.CommonHandleCols = ds.CommonHandleCols
1021 }
1022 is.InitSchema(append(path.FullIdxCols, ds.CommonHandleCols...), cop.TablePlan != nil)
1023 indexConds, tblConds := splitIndexFilterConditions(ds, filterConds, path.FullIdxCols, path.FullIdxColLens)
1024 // Only apply this gate to residual filters (not range builders) for IndexJoin probe side.
1025 // Range-deriving predicates are decided earlier and remain unchanged.
1026 pushDownIndexConds, rootTaskIndexConds := splitLargeInListFiltersForIndexJoinProbe(indexConds, indexJoinProbeSideLargeInNotInThreshold)
1027 pushDownTblConds, rootTaskTblConds := splitLargeInListFiltersForIndexJoinProbe(tblConds, indexJoinProbeSideLargeInNotInThreshold)
1028
1029 // Note: due to a regression in JOB workload, we use the optimizer fix control to enable this for now.
1030 //
1031 // Because we are estimating an average row count of the inner side corresponding to each row from the outer side,
1032 // the estimated row count of the IndexScan should be no larger than (total row count / NDV of join key columns).
1033 // We can calculate the lower bound of the NDV therefore we can get an upper bound of the row count here.
1034 rowCountUpperBound := -1.0
1035 fixControlOK := fixcontrol.GetBoolWithDefault(ds.SCtx().GetSessionVars().GetOptimizerFixControlMap(), fixcontrol.Fix44855, false)
1036 ds.SCtx().GetSessionVars().RecordRelevantOptFix(fixcontrol.Fix44855)
1037 if fixControlOK && ds.TableStats != nil {
1038 usedColIDs := make([]int64, 0)
1039 // We only consider columns in this index that (1) are used to probe as join key,
Script executed:
# Find the function that calls buildIndexJoinRuntimeProp and trace outerSchema
rg -nB30 'indexJoinPropTS := buildIndexJoinRuntimeProp' pkg/planner/core/exhaust_physical_plans.go | head -60
Repository: pingcap/tidb
Exit code: 0
stdout:
450- // clear rootTask's indexJoinInfo in case of pushing upward, because physical index join is indexJoinInfo's consumer.
451- rt.IndexJoinInfo = nil
452- return physic
453-}
454-
455-// enumerateIndexJoinByOuterIdx will enumerate temporary index joins by index join prop required for its inner child.
456-func enumerateIndexJoinByOuterIdx(super base.LogicalPlan, prop *property.PhysicalProperty, outerIdx int) (joins []base.PhysicalPlan) {
457- ge, p := base.GetGEAndLogicalOp[*logicalop.LogicalJoin](super)
458- stats0, stats1, schema0, schema1 := getJoinChildStatsAndSchema(ge, p)
459- var outerSchema *expression.Schema
460- var outerStats *property.StatsInfo
461- if outerIdx == 0 {
462- outerSchema = schema0
463- outerStats = stats0
464- } else {
465- outerSchema = schema1
466- outerStats = stats1
467- }
468- // need same order
469- all, _ := prop.AllSameOrder()
470- // If the order by columns are not all from outer child, index join cannot promise the order.
471- if !prop.AllColsFromSchema(outerSchema) || !all {
472- return nil
473- }
474- // computed the avgInnerRowCnt
475- var avgInnerRowCnt float64
476- if count := outerStats.RowCount; count > 0 {
477- avgInnerRowCnt = p.EqualCondOutCnt / count
478- }
479- // for pk path
480: indexJoinPropTS := buildIndexJoinRuntimeProp(p, outerIdx, outerSchema, avgInnerRowCnt, true)
Script executed:
# What is outerSchema - is it the logical plan schema or the physical plan schema?
rg -nB50 'indexJoinPropTS := buildIndexJoinRuntimeProp' pkg/planner/core/exhaust_physical_plans.go | grep -A20 'outerSchema'
Repository: pingcap/tidb
Exit code: 0
stdout:
459- var outerSchema *expression.Schema
460- var outerStats *property.StatsInfo
461- if outerIdx == 0 {
462- outerSchema = schema0
463- outerStats = stats0
464- } else {
465- outerSchema = schema1
466- outerStats = stats1
467- }
468- // need same order
469- all, _ := prop.AllSameOrder()
470- // If the order by columns are not all from outer child, index join cannot promise the order.
471- if !prop.AllColsFromSchema(outerSchema) || !all {
472- return nil
473- }
474- // computed the avgInnerRowCnt
475- var avgInnerRowCnt float64
476- if count := outerStats.RowCount; count > 0 {
477- avgInnerRowCnt = p.EqualCondOutCnt / count
478- }
479- // for pk path
480: indexJoinPropTS := buildIndexJoinRuntimeProp(p, outerIdx, outerSchema, avgInnerRowCnt, true)
Script executed:
# Find getJoinChildStatsAndSchema to understand what schemas are used
rg -nA20 'func getJoinChildStatsAndSchema' pkg/planner/core/exhaust_physical_plans.go
Repository: pingcap/tidb
Exit code: 0
stdout:
1709:func getJoinChildStatsAndSchema(ge base.GroupExpression, p base.LogicalPlan) (stats0, stats1 *property.StatsInfo, schema0, schema1 *expression.Schema) {
1710- if ge != nil {
1711- g := ge.(*memo.GroupExpression)
1712- stats0, schema0 = g.Inputs[0].GetLogicalProperty().Stats, g.Inputs[0].GetLogicalProperty().Schema
1713- stats1, schema1 = g.Inputs[1].GetLogicalProperty().Stats, g.Inputs[1].GetLogicalProperty().Schema
1714- } else {
1715- stats1, schema1 = p.Children()[1].StatsInfo(), p.Children()[1].Schema()
1716- stats0, schema0 = p.Children()[0].StatsInfo(), p.Children()[0].Schema()
1717- }
1718- return
1719-}
1720-
1721-// If we can use mpp broadcast join, that's our first choice.
1722-func preferMppBCJ(super base.LogicalPlan) bool {
1723- ge, p := base.GetGEAndLogicalOp[*logicalop.LogicalJoin](super)
1724- if len(p.EqualConditions) == 0 && p.SCtx().GetSessionVars().AllowCartesianBCJ == 2 {
1725- return true
1726- }
1727-
1728- onlyCheckChild1 := p.JoinType == base.LeftOuterJoin || p.JoinType == base.SemiJoin || p.JoinType == base.AntiSemiJoin
1729- onlyCheckChild0 := p.JoinType == base.RightOuterJoin
Script executed:
# Now trace back: outerSchema is the child's schema. When we have a projection on outer,
# the outer child IS the projection, so outerSchema is the projection's schema.
# Let's verify this understanding by looking at what happens when outer is a projection
cat -n pkg/planner/core/index_join_partition_pruning.go | sed -n '174,191p'
Repository: pingcap/tidb
Exit code: 0
stdout:
174 func collectIndexJoinOuterStaticFilters(p base.LogicalPlan, candidateCols map[int64]struct{}) []expression.Expression {
175 switch x := p.(type) {
176 case *logicalop.DataSource:
177 return filterIndexJoinOuterStaticFilters(candidateCols, x.AllConds)
178 case *logicalop.LogicalSelection:
179 filters := collectIndexJoinOuterStaticFilters(x.Children()[0], candidateCols)
180 return append(filters, filterIndexJoinOuterStaticFilters(candidateCols, x.Conditions)...)
181 case *logicalop.LogicalProjection:
182 return substituteIndexJoinOuterFiltersThroughProjection(x, collectIndexJoinOuterStaticFilters(x.Children()[0], candidateCols), candidateCols)
183 case *logicalop.LogicalLimit, *logicalop.LogicalTopN, *logicalop.LogicalSort:
184 return collectIndexJoinOuterStaticFilters(x.Children()[0], candidateCols)
185 case *logicalop.LogicalUnionScan:
186 filters := collectIndexJoinOuterStaticFilters(x.Children()[0], candidateCols)
187 return append(filters, filterIndexJoinOuterStaticFilters(candidateCols, x.Conditions)...)
188 default:
189 return nil
190 }
191 }
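The dispatch above walks the outer subtree bottom-up, keeps filters from operators it understands, and gives up on anything else. A minimal sketch of that recursion shape follows, using hypothetical plan node types and plain strings for predicates. Projection handling and the candidate-column check are deliberately left out, and `aggregation` is only an example of an operator the walk does not recognize.

```go
package main

import "fmt"

// plan is a hypothetical stand-in for base.LogicalPlan.
type plan interface{}

// The node types below are illustrative only; predicates are plain strings.
type dataSource struct{ conds []string }
type selection struct {
	child plan
	conds []string
}
type limit struct{ child plan }
type aggregation struct{ child plan } // falls into the default branch

// collect gathers static filters bottom-up and returns nil as soon as it
// meets an operator it does not know how to see through.
func collect(p plan) []string {
	switch x := p.(type) {
	case *dataSource:
		// Copy so callers can append without touching the node's slice.
		return append([]string(nil), x.conds...)
	case *selection:
		return append(collect(x.child), x.conds...)
	case *limit:
		// Row-limiting operators do not invalidate filters applied below them.
		return collect(x.child)
	default:
		return nil
	}
}

func main() {
	ds := &dataSource{conds: []string{"a > 1"}}
	tree := &limit{child: &selection{child: ds, conds: []string{"a < 9"}}}
	fmt.Println(len(collect(tree)))                      // filters found through limit and selection
	fmt.Println(len(collect(&aggregation{child: tree}))) // unknown operator stops the walk
}
```

Returning nil for unknown operators is the conservative choice: a missed filter only costs a pruning opportunity, while a wrongly propagated one could prune partitions that are actually needed.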
Script executed:
# The key issue: when outerChild is a LogicalProjection, its schema contains fresh UIDs
# Let's check if LogicalProjection.Schema() returns a schema with new UIDs or old ones
rg -nA15 'func.*LogicalProjection.*Schema\(' pkg/planner/core/operator/logicalop/logical_projection.go
Repository: pingcap/tidb
Exit code: 1
Script executed:
# Check if projection's Init or SetSchema allocates fresh UIDs
rg -nB5 -A25 'func.*\(p \*LogicalProjection\).*SetSchema\|func.*\(p \*LogicalProjection\).*Init' pkg/planner/core/operator/logicalop/logical_projection.go | head -60
Repository: pingcap/tidb
Exit code: 0
Script executed:
# Let's look at the entire LogicalProjection type definition
cat -n pkg/planner/core/operator/logicalop/logical_projection.go | sed -n '30,100p'
Repository: pingcap/tidb
Exit code: 0
stdout:
30 )
31
32 // LogicalProjection represents a select fields plan.
33 type LogicalProjection struct {
34 LogicalSchemaProducer `hash64-equals:"true"`
35
36 Exprs []expression.Expression `hash64-equals:"true" shallow-ref:"true"`
37
38 // CalculateNoDelay indicates this Projection is the root Plan and should be
39 // calculated without delay and will not return any result to client.
40 // Currently it is "true" only when the current sql query is a "DO" statement.
41 // See "https://dev.mysql.com/doc/refman/5.7/en/do.html" for more detail.
42 CalculateNoDelay bool `hash64-equals:"true"`
43
44 // Proj4Expand is used for expand to project same column reference, while these
45 // col may be filled with null so we couldn't just eliminate this projection itself.
46 Proj4Expand bool `hash64-equals:"true"`
47 }
48
49 // Init initializes LogicalProjection.
50 func (p LogicalProjection) Init(ctx base.PlanContext, qbOffset int) *LogicalProjection {
51 p.BaseLogicalPlan = NewBaseLogicalPlan(ctx, plancodec.TypeProj, &p, qbOffset)
52 return &p
53 }
54
55 // *************************** start implementation of Plan interface **********************************
56
57 // ExplainInfo implements Plan interface.
58 func (p *LogicalProjection) ExplainInfo() string {
59 eCtx := p.SCtx().GetExprCtx().GetEvalCtx()
60 enableRedactLog := p.SCtx().GetSessionVars().EnableRedactLog
61 return expression.ExplainExpressionList(eCtx, p.Exprs, p.Schema(), enableRedactLog)
62 }
63
64 // ReplaceExprColumns implements base.LogicalPlan interface.
65 func (p *LogicalProjection) ReplaceExprColumns(replace map[string]*expression.Column) {
66 for i, expr := range p.Exprs {
67 p.Exprs[i] = ruleutil.ResolveExprAndReplace(expr, replace)
68 }
69 }
70
71 // *************************** end implementation of Plan interface ************************************
72
73 // *************************** start implementation of logicalPlan interface ***************************
74
75 // HashCode implements base.LogicalPlan.<0th> interface.
76 func (p *LogicalProjection) HashCode() []byte {
77 // PlanType + SelectOffset + ExprNum + [Exprs]
78 // Expressions are commonly `Column`s, whose hashcode has the length 9, so
79 // we pre-alloc 10 bytes for each expr's hashcode.
80 result := make([]byte, 0, 12+len(p.Exprs)*10)
81 result = util.EncodeIntAsUint32(result, plancodec.TypeStringToPhysicalID(p.TP()))
82 result = util.EncodeIntAsUint32(result, p.QueryBlockOffset())
83 result = util.EncodeIntAsUint32(result, len(p.Exprs))
84 for _, expr := range p.Exprs {
85 exprHashCode := expr.HashCode()
86 result = util.EncodeIntAsUint32(result, len(exprHashCode))
87 result = append(result, exprHashCode...)
88 }
89 return result
90 }
91
92 // PredicatePushDown implements base.LogicalPlan.<1st> interface.
93 func (p *LogicalProjection) PredicatePushDown(predicates []expression.Expression) (ret []expression.Expression, retPlan base.LogicalPlan, err error) {
94 if slices.ContainsFunc(p.Exprs, expression.HasAssignSetVarFunc) {
95 _, child, err := p.BaseLogicalPlan.PredicatePushDown(nil)
96 return predicates, child, err
97 }
98 canBePushed, canNotBePushed := breakDownPredicates(p, predicates)
99 remained, child, err := p.BaseLogicalPlan.PredicatePushDown(canBePushed)
100 return append(remained, canNotBePushed...), child, err
Script executed:
# Search for how projection builds its schema - look for BuildLogicalProjectionSchema or similar
rg 'BuildLogicalProjectionSchema\|buildProjectionSchema' --type=go pkg/planner/core/
Repository: pingcap/tidb
Exit code: 1
Script executed:
# Look at how a LogicalProjection's schema is created when it's constructed
rg -nA5 'LogicalProjection{' pkg/planner/core/ --type=go | head -50
Repository: pingcap/tidb
Exit code: 0
stderr:
regex parse error:
LogicalProjection{
^
error: unclosed counted repetition
Script executed:
# Test the theory: when outerChild is projection with fresh UIDs,
# and we pass projection.Schema() as outerSchema to buildIndexJoinRuntimeProp,
# then candidateCols gets projection output UIDs (fresh ones)
# But when we recursively call collectIndexJoinOuterStaticFilters on datasource child,
# the datasource has its own old UIDs. Let me verify this scenario exists in tests
rg -l 'index.*join\|IndexJoin' --type=go pkg/planner/core/ | xargs grep -l 'LogicalProjection\|Projection'
Repository: pingcap/tidb
Exit code: 123
Script executed:
# Search for tests related to index join partition pruning
find pkg/planner/core -name '*test*.go' -type f | xargs grep -l 'partition.*prun\|ProbePartitionPruningConds' 2>/dev/null | head -5
Repository: pingcap/tidb
Exit code: 0
stdout:
pkg/planner/core/casetest/binaryplan/binary_plan_core_test.go
pkg/planner/core/casetest/cbotest/cbo_test.go
pkg/planner/core/casetest/hint/hint_test.go
pkg/planner/core/casetest/indexmerge/indexmerge_intersection_test.go
pkg/planner/core/casetest/instanceplancache/others_test.go
Script executed:
# Look for test files that might cover index join scenarios
find pkg/planner/core -name '*test*.go' -type f | xargs grep -l 'index.*join\|IndexJoin' 2>/dev/null | head -3
Repository: pingcap/tidb
Exit code: 0
stdout:
pkg/planner/core/casetest/cbotest/cbo_test.go
pkg/planner/core/casetest/correlated/correlated_test.go
pkg/planner/core/casetest/instanceplancache/concurrency_test.go
Script executed:
# Let me directly check what happens: when outer is DataSource -> Projection -> IndexJoin
# outerSchema passed to buildIndexJoinRuntimeProp is the Projection's schema
# The projection's schema columns have fresh UIDs allocated during projection construction
# Let's verify by checking how schema is set in projection
rg -nA30 'func.*DeriveStats' pkg/planner/core/operator/logicalop/logical_projection.go | head -50
Repository: pingcap/tidb
Exit code: 0
stdout:
278:func (p *LogicalProjection) DeriveStats(childStats []*property.StatsInfo, selfSchema *expression.Schema, childSchema []*expression.Schema, reloads []bool) (*property.StatsInfo, bool, error) {
279- childProfile := childStats[0]
280- var reload bool
281- if len(reloads) == 1 {
282- reload = reloads[0]
283- }
284- if !reload && p.StatsInfo() != nil {
285- // Reload GroupNDVs since colGroups may have changed.
286- p.StatsInfo().GroupNDVs = p.getGroupNDVs(childProfile, selfSchema)
287- return p.StatsInfo(), false, nil
288- }
289- p.SetStats(&property.StatsInfo{
290- RowCount: childProfile.RowCount,
291- ColNDVs: make(map[int64]float64, len(p.Exprs)),
292- })
293- cols := make([]*expression.Column, 0, 8)
294- for i, expr := range p.Exprs {
295- cols = expression.ExtractAllColumnsFromExpressionsInUsedSlices(cols, nil, expr)
296- p.StatsInfo().ColNDVs[selfSchema.Columns[i].UniqueID], _ = cardinality.EstimateColsNDVWithMatchedLen(
297- p.SCtx(), cols, childSchema[0], childProfile)
298- cols = cols[:0]
299- }
300- p.StatsInfo().GroupNDVs = p.getGroupNDVs(childProfile, selfSchema)
301- return p.StatsInfo(), true, nil
302-}
303-
304-// ExtractColGroups implements base.LogicalPlan.<12th> interface.
305-func (p *LogicalProjection) ExtractColGroups(colGroups [][]*expression.Column) [][]*expression.Column {
306- if len(colGroups) == 0 {
307- return nil
308- }
Script executed:
# Check if there's any code that builds the schema for LogicalProjection
rg 'buildProjectionSchema\|proj.*Schema.*Append\|proj.*SetSchema' pkg/planner/core/ --type=go -B2 -A2 | head -60
Repository: pingcap/tidb
Exit code: 0
Script executed:
# Let me check the actual test files to see if there are tests for index join with outer projections
grep -r 'partition.*prun\|ProbePartitionPruningConds' pkg/planner/core/casetest/ --include='*.go' 2>/dev/null | head -5
Repository: pingcap/tidb
Exit code: 0
stdout:
pkg/planner/core/casetest/binaryplan/binary_plan_core_test.go: tk.MustExec(`set @@tidb_partition_prune_mode='` + string(variable.Static) + `'`)
pkg/planner/core/casetest/binaryplan/binary_plan_core_test.go: tk.MustExec(`set @@tidb_partition_prune_mode='` + string(variable.Static) + `'`)
pkg/planner/core/casetest/binaryplan/binary_plan_core_test.go: "set @@tidb_partition_prune_mode='static'",
pkg/planner/core/casetest/binaryplan/binary_plan_core_test.go: "set @@tidb_partition_prune_mode='dynamic'",
pkg/planner/core/casetest/cbotest/cbo_test.go: tk.MustExec("set @@tidb_partition_prune_mode = 'static';")
Script executed:
# Look in the actual test SQL files for index join partition pruning tests
find . -name '*.sql' -path '*/planner/core/*' -type f | xargs grep -l 'index.*partition\|partition.*index' 2>/dev/null | head -3
Repository: pingcap/tidb
Exit code: 0
Script executed:
# Let me trace through the logic one more time - when a DataSource column references the same thing
# as a projection output column, do they have the same UID?
# Let's look for how column UIDs are assigned in DataSource
rg -nA10 'func.*NewDataSource\|DataSource.*Init' pkg/planner/core/operator/logicalop/logical_datasource.go | head -40
Repository: pingcap/tidb
Exit code: 0
Script executed:
# Actually, let me think about this differently. Let me find an actual test query
# that would have outer DataSource -> Projection -> IndexJoin and see if it gets partition pruning
rg 'partition' pkg/planner/core/casetest/cbotest/cbo_test.go | head -20
Repository: pingcap/tidb
Exit code: 0
stdout:
tk.MustExec("set @@tidb_partition_prune_mode = 'static';")
tk.MustExec("create table t4 (a int, b int) partition by range (a) (partition p1 values less than (2), partition p2 values less than (3))")
Script executed:
# Let me create a simple test case to verify the issue
# Search for existing test patterns with outer joins and partitions
find pkg/planner/core/casetest -name '*.sql' -type f -exec grep -l 'index.*join\|join.*partition' {} \; 2>/dev/null | head -3
Repository: pingcap/tidb
Exit code: 0
Script executed:
# Let's look at the actual SQL test files that test index join
ls -la pkg/planner/core/casetest/*/testdata/ 2>/dev/null | grep -i 'join\|partition' | head -10
Repository: pingcap/tidb
Exit code: 0
stdout:
pkg/planner/core/casetest/partition/testdata/:
-rw-r--r-- 1 jailuser jailuser 13712 Apr 24 08:25 integration_partition_suite_in.json
-rw-r--r-- 1 jailuser jailuser 147251 Apr 24 08:25 integration_partition_suite_out.json
-rw-r--r-- 1 jailuser jailuser 147251 Apr 24 08:25 integration_partition_suite_xut.json
-rw-r--r-- 1 jailuser jailuser 15978 Apr 24 08:25 partition_pruner_in.json
-rw-r--r-- 1 jailuser jailuser 115666 Apr 24 08:25 partition_pruner_out.json
-rw-r--r-- 1 jailuser jailuser 115666 Apr 24 08:25 partition_pruner_xut.json
-rw-r--r-- 1 jailuser jailuser 19029 Apr 24 08:25 cdc_join_reorder_suite_in.json
-rw-r--r-- 1 jailuser jailuser 93397 Apr 24 08:25 cdc_join_reorder_suite_out.json
-rw-r--r-- 1 jailuser jailuser 93397 Apr 24 08:25 cdc_join_reorder_suite_xut.json
Script executed:
# Let me check if there's a test file for index_join_partition_pruning
find . -name '*index*join*partition*' -o -name '*partition*prun*' 2>/dev/null | head -10
Repository: pingcap/tidb
Exit code: 0
stdout:
./pkg/planner/core/casetest/partition/partition_pruner_test.go
./pkg/planner/core/casetest/partition/testdata/partition_pruner_in.json
./pkg/planner/core/casetest/partition/testdata/partition_pruner_out.json
./pkg/planner/core/casetest/partition/testdata/partition_pruner_xut.json
./pkg/planner/core/index_join_partition_pruning.go
./pkg/planner/core/rule/rule_partition_pruning_test.go
./pkg/planner/util/partitionpruning
./pkg/planner/util/partitionpruning/partition_prune.go
./tests/integrationtest/r/planner/core/casetest/partition/partition_pruner.result
./tests/integrationtest/r/planner/core/partition_pruner.result
Script executed:
# Let's look at the comments and logic flow more carefully
# The key insight: when filters are collected from datasource, they reference datasource UIDs
# Then when passed to substituteIndexJoinOuterFiltersThroughProjection, the mapping happens
# But the filters list is already empty due to the recursive call
# Let me verify if this is actually an issue by checking the logic flow
cat -n pkg/planner/core/index_join_partition_pruning.go | sed -n '174,192p'
Repository: pingcap/tidb
Exit code: 0
stdout:
174 func collectIndexJoinOuterStaticFilters(p base.LogicalPlan, candidateCols map[int64]struct{}) []expression.Expression {
175 switch x := p.(type) {
176 case *logicalop.DataSource:
177 return filterIndexJoinOuterStaticFilters(candidateCols, x.AllConds)
178 case *logicalop.LogicalSelection:
179 filters := collectIndexJoinOuterStaticFilters(x.Children()[0], candidateCols)
180 return append(filters, filterIndexJoinOuterStaticFilters(candidateCols, x.Conditions)...)
181 case *logicalop.LogicalProjection:
182 return substituteIndexJoinOuterFiltersThroughProjection(x, collectIndexJoinOuterStaticFilters(x.Children()[0], candidateCols), candidateCols)
183 case *logicalop.LogicalLimit, *logicalop.LogicalTopN, *logicalop.LogicalSort:
184 return collectIndexJoinOuterStaticFilters(x.Children()[0], candidateCols)
185 case *logicalop.LogicalUnionScan:
186 filters := collectIndexJoinOuterStaticFilters(x.Children()[0], candidateCols)
187 return append(filters, filterIndexJoinOuterStaticFilters(candidateCols, x.Conditions)...)
188 default:
189 return nil
190 }
191 }
192
Script executed:
# Let me trace through a concrete example:
# Look at how candidateCols is populated - it gets column UIDs from join conditions
# Those columns would reference the outer child's schema
# If outer child is a projection, those UIDs are projection output UIDs
cat -n pkg/planner/core/index_join_partition_pruning.go | sed -n '109,142p'
Repository: pingcap/tidb
Exit code: 0
stdout:
109 func extractIndexJoinOuterPartitionPruningCandidateCols(
110 outerSchema *expression.Schema,
111 innerPartCols []*expression.Column,
112 innerJoinKeys []*expression.Column,
113 outerJoinKeys []*expression.Column,
114 otherConds []expression.Expression,
115 ) map[int64]struct{} {
116 candidateCols := make(map[int64]struct{})
117 // why we care about otherconds: for a join: t1 join t2 on t1.a = t2.a and t1.b > t2.b, if
118 // t2.b is the inner partition column, then the "t1.b > t2.b" condition can also contribute
119 // to pruning the inner partition, just with a different derived pruning condition. So we need
120 // to consider all otherconds instead of just the join keys.
121 for _, cond := range otherConds {
122 for _, innerPartCol := range innerPartCols {
123 bound, ok := extractNormalizedIndexJoinBound(innerPartCol, cond)
124 if !ok || !expression.ExprFromSchema(bound.outerCol, outerSchema) {
125 continue
126 }
127 candidateCols[bound.outerCol.UniqueID] = struct{}{}
128 }
129 }
130 for i, innerJoinKey := range innerJoinKeys {
131 if i >= len(outerJoinKeys) || !expression.ExprFromSchema(outerJoinKeys[i], outerSchema) {
132 continue
133 }
134 for _, innerPartCol := range innerPartCols {
135 if innerJoinKey.EqualColumn(innerPartCol) {
136 candidateCols[outerJoinKeys[i].UniqueID] = struct{}{}
137 break
138 }
139 }
140 }
141 return candidateCols
142 }
Script executed:
# Now verify: when we recursively call with DataSource and candidateCols (projection output UIDs),
# does filterIndexJoinOuterStaticFilters correctly match datasource UIDs against projection output UIDs?
cat -n pkg/planner/core/index_join_partition_pruning.go | sed -n '193,212p'
Repository: pingcap/tidb
Exit code: 0
stdout:
193 func filterIndexJoinOuterStaticFilters(candidateCols map[int64]struct{}, filters []expression.Expression) []expression.Expression {
194 result := make([]expression.Expression, 0, len(filters))
195 seen := make(map[string]struct{}, len(filters))
196 for _, filter := range filters {
197 cols := expression.ExtractColumns(filter)
198 if len(cols) != 1 {
199 continue
200 }
201 if _, ok := candidateCols[cols[0].UniqueID]; !ok {
202 continue
203 }
204 key := string(filter.HashCode())
205 if _, ok := seen[key]; ok {
206 continue
207 }
208 seen[key] = struct{}{}
209 result = append(result, filter)
210 }
211 return result
212 }
Script executed:
# Let me check if there are integration tests or result files that show what partition pruning should produce
grep -r "ProbePartitionPruningConds\|partition.*index\|index.*partition" pkg/planner/core/casetest/partition/testdata/*.json 2>/dev/null | head -5
Repository: pingcap/tidb
Exit code: 0
stdout:
pkg/planner/core/casetest/partition/testdata/integration_partition_suite_out.json: "IndexReader root partition:all index:IndexRangeScan",
pkg/planner/core/casetest/partition/testdata/integration_partition_suite_out.json: "│ └─IndexRangeScan cop[tikv] table:t, partition:P0, index:b(b) range:[1,1], keep order:false",
pkg/planner/core/casetest/partition/testdata/integration_partition_suite_out.json: "│ └─IndexRangeScan cop[tikv] table:t, partition:p1, index:b(b) range:[1,1], keep order:false",
pkg/planner/core/casetest/partition/testdata/integration_partition_suite_out.json: " └─IndexRangeScan cop[tikv] table:t, partition:P2, index:b(b) range:[1,1], keep order:false"
pkg/planner/core/casetest/partition/testdata/integration_partition_suite_out.json: "IndexReader root partition:all index:IndexRangeScan",
Script executed:
# Let me look at what should happen when a projection is on the outer side
# Check if there are any test results that show the expected behavior
rg 'Projection.*IndexJoin|outer.*Projection' pkg/planner/core/casetest/ --type=go -l
Repository: pingcap/tidb
Exit code: 0
stdout:
pkg/planner/core/casetest/join/join_test.go
Script executed:
# Let me verify the actual issue by checking what join test file has
cat -n pkg/planner/core/casetest/join/join_test.go | head -50
Repository: pingcap/tidb
Exit code: 0
stdout:
1 // Copyright 2024 PingCAP, Inc.
2 //
3 // Licensed under the Apache License, Version 2.0 (the "License");
4 // you may not use this file except in compliance with the License.
5 // You may obtain a copy of the License at
6 //
7 // http://www.apache.org/licenses/LICENSE-2.0
8 //
9 // Unless required by applicable law or agreed to in writing, software
10 // distributed under the License is distributed on an "AS IS" BASIS,
11 // WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12 // See the License for the specific language governing permissions and
13 // limitations under the License.
14
15 package join
16
17 import (
18 "fmt"
19 "strconv"
20 "strings"
21 "testing"
22
23 "github.com/pingcap/tidb/pkg/testkit"
24 "github.com/stretchr/testify/require"
25 )
26
27 func TestSemiJoinOrder(t *testing.T) {
28 testkit.RunTestUnderCascades(t, func(t *testing.T, tk *testkit.TestKit, cascades, caller string) {
29 tk.MustExec("use test;")
30 tk.MustExec("create table t1 (col0 int, col1 int);")
31 tk.MustExec("create table t2 (col0 int, col1 int);")
32 tk.MustExec("insert into t1 values (null, 3), (null, 5), (null, null), (1, 1), (1, 2), (1, null), (2, 1), (2, 2), (2, null), (3, 1), (3, 2), (3, 4), (3, null);")
33 tk.MustExec("insert into t2 values (null, 3), (null, 4), (null, null), (1, 1), (3, 1), (3, 3), (3, null), (4, null), (4, 1), (4, 2), (4, 10);")
34 result := testkit.Rows("1 <nil>",
35 "1 1",
36 "1 2",
37 "3 <nil>",
38 "3 1",
39 "3 2",
40 "3 4")
41 tk.MustExec("set tidb_hash_join_version=optimized")
42 tk.MustQuery("select * from t1 where exists (select 1 from t2 where t1.col0 = t2.col0) order by t1.col0, t1.col1;").Check(result)
43 tk.MustQuery("select /*+ HASH_JOIN_BUILD(t1) */ * from t1 where exists (select 1 from t2 where t1.col0 = t2.col0) order by t1.col0, t1.col1;").Check(result)
44 tk.MustQuery("select /*+ HASH_JOIN_BUILD(t2@sel_2) */ * from t1 where exists (select 1 from t2 where t1.col0 = t2.col0) order by t1.col0, t1.col1;").Check(result)
45 tk.MustQuery("explain format = 'plan_tree' select /*+ HASH_JOIN_BUILD(t1) */ * from t1 where exists (select 1 from t2 where t1.col0 = t2.col0) order by t1.col0, t1.col1;").Check(testkit.Rows(
46 "Sort root test.t1.col0, test.t1.col1",
47 "└─HashJoin root semi join, left side:TableReader, equal:[eq(test.t1.col0, test.t2.col0)]",
48 " ├─TableReader(Build) root data:Selection",
49 " │ └─Selection cop[tikv] not(isnull(test.t1.col0))",
50 " │ └─TableFullScan cop[tikv] table:t1 keep order:false, stats:pseudo",
Script executed:
# Search specifically for tests with projections in join outer side
grep -r 'SELECT.*FROM.*SELECT.*JOIN\|Projection.*IndexJoin\|outer.*Projection' pkg/planner/core/casetest/ --include='*.go' -B2 -A2 | head -30
Repository: pingcap/tidb
Exit code: 0
stdout:
pkg/planner/core/casetest/join/join_test.go- "├─TableReader(Build) root data:TableFullScan",
pkg/planner/core/casetest/join/join_test.go- "│ └─TableFullScan cop[tikv] table:tt0 keep order:false, stats:pseudo",
pkg/planner/core/casetest/join/join_test.go: "└─HashJoin(Probe) root left outer join, left side:Projection, equal:[eq(Column, Column)]",
pkg/planner/core/casetest/join/join_test.go- " ├─Projection(Build) root test.tt1.c0, cast(test.tt1.c0, double BINARY)->Column",
pkg/planner/core/casetest/join/join_test.go- " │ └─TableReader root data:TableFullScan",
Script executed:
# Let me look for integration tests that might have this scenario
grep -r 'partition.*index\|partition.*prun' tests/integrationtest/ --include='*.sql' | head -10
Repository: pingcap/tidb
Exit code: 0
Script executed:
# Actually, let me write a minimal analysis script to verify the UID mismatch issue
python3 << 'EOF'
# The issue claim:
# 1. outerSchema is Projection.Schema() with fresh Projection output UIDs
# 2. candidateCols built from outerSchema contains those fresh UIDs
# 3. When recursing to DataSource, we pass those Projection output UIDs
# 4. DataSource.AllConds have DataSource column UIDs
# 5. filterIndexJoinOuterStaticFilters checks if DataSource UID is in candidateCols (Projection output UIDs)
# 6. They don't match, so filters get dropped
# This logic is sound IF:
# - Projections don't preserve child column UIDs (they allocate fresh ones)
# - The code path really does pass candidateCols (Projection UIDs) to DataSource with DataSource UIDs
print("Analysis of the UID mismatch claim:")
print("1. outerSchema comes from join child, which is Projection")
print("2. candidateCols built from Projection schema = Projection output UIDs")
print("3. Recursion to DataSource passes Projection output UIDs")
print("4. DataSource filters have DataSource UIDs")
print("5. Line 201 checks DataSource UID against Projection UIDs")
print("6. Mismatch → filters empty → early return from substituteIndexJoinOuterFiltersThroughProjection")
print("\nThis is a valid concern IF Projections allocate fresh UIDs")
EOF
Repository: pingcap/tidb
Exit code: 0
stdout:
Analysis of the UID mismatch claim:
1. outerSchema comes from join child, which is Projection
2. candidateCols built from Projection schema = Projection output UIDs
3. Recursion to DataSource passes Projection output UIDs
4. DataSource filters have DataSource UIDs
5. Line 201 checks DataSource UID against Projection UIDs
6. Mismatch → filters empty → early return from substituteIndexJoinOuterFiltersThroughProjection
This is a valid concern IF Projections allocate fresh UIDs
Script executed:
# Check the join_test to see if there are any outer projection tests
rg 'Projection\|LogicalProjection' pkg/planner/core/casetest/join/join_test.go -B2 -A2 | head -40
Repository: pingcap/tidb
Exit code: 0
Script executed:
# Let me now verify the core issue more directly by checking what happens in the test case
# Let's look at the actual partition pruner test to see if there are any outer projection cases
head -100 pkg/planner/core/casetest/partition/partition_pruner_test.go
Repository: pingcap/tidb
Exit code: 0
stdout:
// Copyright 2023 PingCAP, Inc.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
package partition
import (
"bytes"
"fmt"
"sort"
"strconv"
"strings"
"testing"
"github.com/pingcap/failpoint"
"github.com/pingcap/tidb/pkg/planner/util/coretestsdk"
"github.com/pingcap/tidb/pkg/sessionctx/vardef"
"github.com/pingcap/tidb/pkg/testkit"
"github.com/pingcap/tidb/pkg/testkit/testdata"
"github.com/pingcap/tidb/pkg/testkit/testfailpoint"
"github.com/stretchr/testify/require"
)
func TestHashPartitionPruner(t *testing.T) {
testfailpoint.Enable(t, "github.com/pingcap/tidb/pkg/planner/core/forceDynamicPrune", `return(true)`)
testkit.RunTestUnderCascades(t, func(t *testing.T, tk *testkit.TestKit, cascades, caller string) {
tk.MustExec("create database test_partition")
tk.MustExec("use test_partition")
tk.MustExec("drop table if exists t1, t2;")
tk.Session().GetSessionVars().EnableClusteredIndex = vardef.ClusteredIndexDefModeIntOnly
tk.MustExec("create table t2(id int, a int, b int, primary key(id, a)) partition by hash(id + a) partitions 10;")
tk.MustExec("create table t1(id int primary key, a int, b int) partition by hash(id) partitions 10;")
tk.MustExec("create table t3(id int, a int, b int, primary key(id, a)) partition by hash(id) partitions 10;")
tk.MustExec("create table t4(d datetime, a int, b int, primary key(d, a)) partition by hash(year(d)) partitions 10;")
tk.MustExec("create table t5(d date, a int, b int, primary key(d, a)) partition by hash(month(d)) partitions 10;")
tk.MustExec("create table t6(a int, b int) partition by hash(a) partitions 3;")
tk.MustExec("create table t7(a int, b int) partition by hash(a + b) partitions 10;")
tk.MustExec("create table t8(a int, b int) partition by hash(a) partitions 6;")
tk.MustExec("create table t9(a bit(1) default null, b int(11) default null) partition by hash(a) partitions 3;") //issue `#22619`
tk.MustExec("create table t10(a bigint unsigned) partition BY hash (a);")
tk.MustExec("create table t11(a int, b int) partition by hash(a + a + a + b) partitions 5")
var input []string
var output []struct {
SQL string
Result []string
}
partitionPrunerData := getPartitionPrunerData()
partitionPrunerData.LoadTestCases(t, &input, &output, cascades, caller)
for i, tt := range input {
testdata.OnRecord(func() {
output[i].SQL = tt
output[i].Result = testdata.ConvertRowsToStrings(tk.MustQuery(tt).Rows())
})
tk.MustQuery(tt).Check(testkit.Rows(output[i].Result...))
}
})
}
type testTablePartitionInfo struct {
Table string
Partitions string
}
// getPartitionInfoFromPlan uses to extract table partition information from the plan tree string. Here is an example, the plan is like below:
//
// "Projection_7 80.00 root test_partition.t1.id, test_partition.t1.a, test_partition.t1.b, test_partition.t2.id, test_partition.t2.a, test_partition.t2.b",
// "└─HashJoin_9 80.00 root CARTESIAN inner join",
// " ├─TableReader_12(Build) 8.00 root partition:p1 data:Selection_11",
// " │ └─Selection_11 8.00 cop[tikv] 1, eq(test_partition.t2.b, 6), in(test_partition.t2.a, 6, 7, 8)",
// " │ └─TableFullScan_10 10000.00 cop[tikv] table:t2 keep order:false, stats:pseudo",
// " └─TableReader_15(Probe) 10.00 root partition:p0 data:Selection_14",
// " └─Selection_14 10.00 cop[tikv] 1, eq(test_partition.t1.a, 5)",
// " └─TableFullScan_13 10000.00 cop[tikv] table:t1 keep order:false, stats:pseudo"
//
// The return table partition info is: t1: p0; t2: p1
func getPartitionInfoFromPlan(plan []string) string {
infos := make([]testTablePartitionInfo, 0, 2)
info := testTablePartitionInfo{}
for _, row := range plan {
partitions := coretestsdk.GetFieldValue("partition:", row)
if partitions != "" {
info.Partitions = partitions
continue
}
tbl := coretestsdk.GetFieldValue("table:", row)
if tbl != "" {
info.Table = tbl
infos = append(infos, info)
}
Script executed:
# Look at the test data to see if there are any outer projection + index join cases
grep -A5 -B5 'select.*from.*select' pkg/planner/core/casetest/partition/testdata/partition_pruner_in.json | head -40
Repository: pingcap/tidb
Exit code: 0
stdout:
{
"SQL": "select * from t1 join t1 as t2 on t1.id = t2.id where (t1.a=1 or t1.a = 3) and (t2.a = 6 and t2.b = 6)",
"Pruner": "t1: p0; t2: p1"
},
{
"SQL": "select * from t1 where t1.a in (select b from t2 where a in (1,2)) order by a",
"Pruner": "t1: all; t2: p0"
},
{
"SQL": "select * from t1 where t1.a in (select b from t1 where a in (1,2)) order by a",
"Pruner": "t1: all; t1: p0"
},
{
"SQL": "select * from t1 left join t2 on t1.id = t2.id where (t1.a=1 or t1.a = 3) and t2.a in (6,7,8)",
"Pruner": "t1: p0; t2: p1"
--
{
"SQL": "select /*+ INL_HASH_JOIN(t1,t2) */ count(*) from t2 join t1 on t2.b = t1.b where t2.a in (1,2) and t1.a in (1,6) and t1.b in (100,9,6,1)",
"Pruner": "t1: all; t2: p0"
},
{
"SQL": "select * from t1 where a in (1,2,3) union select * from t1 where b in (6,7,8) order by a",
"Pruner": "t1: p0; t1: p1"
},
// For ranges
{
"SQL": "select * from t1 where a < 1 or b < 2",
--
{
"SQL": "select * from t1 join t1 as t2 on t1.id = t2.id where (t1.a<=1 or t1.a <= 3) and (t2.a <= 6 and t2.b <= 6)",
"Pruner": "t1: p0; t2: all"
},
{
"SQL": "select * from t1 where t1.a in (select b from t2 where a BETWEEN 1 AND 2) order by a",
"Pruner": "t1: all; t2: p0"
},
{
"SQL": "select * from t1 where t1.a in (select b from t1 where a BETWEEN 1 AND 2) order by a",
"Pruner": "t1: all; t1: p0"
},
Script executed:
# Let me search for actual test SQL with outer projections
find tests/integrationtest -name '*.sql' -type f -exec grep -l 'select.*from.*select.*partition\|index.*join.*partition' {} \; 2>/dev/null | head -3
Repository: pingcap/tidb
Exit code: 0
Script executed:
# Let me now create a definitive test to verify the issue
# Check if candidateCols UIDs come from the outer schema (projection output)
cat > /tmp/test_uid_flow.txt << 'EOF'
Tracing the UID flow when outerChild is DataSource -> Projection -> IndexJoin:
1. buildIndexJoinRuntimeProp gets called with outerIdx pointing to outer child
2. outerSchema = outer_child.Schema() = Projection.Schema() (projection output columns)
3. extractIndexJoinOuterPartitionPruningCandidateCols(outerSchema, ...) builds candidateCols:
- Line 124: checks if bound.outerCol is in outerSchema (projection output columns)
- Line 127: candidateCols[bound.outerCol.UniqueID] - stores projection output UID
4. collectIndexJoinOuterStaticFilters(outerChild, candidateCols) called with:
- outerChild = Projection
- candidateCols = {projection_output_UID1, projection_output_UID2, ...}
5. For LogicalProjection case:
- Line 182: collectIndexJoinOuterStaticFilters(x.Children()[0], candidateCols)
where x.Children()[0] = DataSource and candidateCols = projection output UIDs
6. For DataSource case:
- Line 177: filterIndexJoinOuterStaticFilters(candidateCols, x.AllConds)
where x.AllConds have DataSource column UIDs
7. Line 201 check: if _, ok := candidateCols[cols[0].UniqueID]; !ok
- This checks if DataSource UID is in candidateCols (projection output UIDs)
- They don't match → filter is dropped
CONCLUSION: The review's diagnosis is correct.
EOF
cat /tmp/test_uid_flow.txt
Repository: pingcap/tidb
Exit code: 0
stdout:
Tracing the UID flow when outerChild is DataSource -> Projection -> IndexJoin:
1. buildIndexJoinRuntimeProp gets called with outerIdx pointing to outer child
2. outerSchema = outer_child.Schema() = Projection.Schema() (projection output columns)
3. extractIndexJoinOuterPartitionPruningCandidateCols(outerSchema, ...) builds candidateCols:
- Line 124: checks if bound.outerCol is in outerSchema (projection output columns)
- Line 127: candidateCols[bound.outerCol.UniqueID] - stores projection output UID
4. collectIndexJoinOuterStaticFilters(outerChild, candidateCols) called with:
- outerChild = Projection
- candidateCols = {projection_output_UID1, projection_output_UID2, ...}
5. For LogicalProjection case:
- Line 182: collectIndexJoinOuterStaticFilters(x.Children()[0], candidateCols)
where x.Children()[0] = DataSource and candidateCols = projection output UIDs
6. For DataSource case:
- Line 177: filterIndexJoinOuterStaticFilters(candidateCols, x.AllConds)
where x.AllConds have DataSource column UIDs
7. Line 201 check: if _, ok := candidateCols[cols[0].UniqueID]; !ok
- This checks if DataSource UID is in candidateCols (projection output UIDs)
- They don't match → filter is dropped
CONCLUSION: The review's diagnosis is correct.
Outer-side projection silently disables partition pruning due to UID mismatch.
When the outer side has shape DataSource [/Selection] -> Projection -> IndexJoin, partition pruning fails silently. The root cause: candidateCols (populated from the projection's output schema) contains projection output UIDs, but when collectIndexJoinOuterStaticFilters recursively processes the DataSource child (line 182), it attempts to match DataSource column UIDs against projection output UIDs. Since these UIDs don't match, filterIndexJoinOuterStaticFilters drops all filters before substituteIndexJoinOuterFiltersThroughProjection can remap child UIDs to projection output UIDs through its substitution logic (lines 226–233).
The fix should either:
- Collect filters from the projection's child without pre-filtering by candidateCols, then filter after substitution remaps child UIDs to projection output UIDs; or
- Build a child-side candidateCols by inverting the projection's Exprs -> Schema.Columns mapping and pass that down recursively.
The same pattern likely applies to collectIndexJoinProbePartitionColumns (line 160) for LogicalProjection / LogicalAggregation cases, warranting consistent treatment.
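The mismatch and the second fix option can be illustrated with a minimal, self-contained sketch. It does not use TiDB's planner types: column, filter, projection, invertCandidates, and keepCandidateFilters are hypothetical stand-ins, and only bare column pass-throughs are modeled (computed projection expressions are ignored).

```go
package main

import "fmt"

// column is a hypothetical stand-in for a planner column; only the unique ID matters here.
type column struct{ uid int64 }

// filter is a hypothetical single-column static predicate such as "a < 10".
type filter struct {
	col  column
	desc string
}

// projection maps each output column UID to the child column UID it passes through.
type projection struct {
	outToChild map[int64]int64
}

// invertCandidates translates candidate UIDs from projection-output space into
// the child's UID space, dropping outputs the projection does not pass through.
func invertCandidates(p projection, candidates map[int64]struct{}) map[int64]struct{} {
	child := make(map[int64]struct{}, len(candidates))
	for out := range candidates {
		if in, ok := p.outToChild[out]; ok {
			child[in] = struct{}{}
		}
	}
	return child
}

// keepCandidateFilters keeps only the filters whose column UID is a candidate.
func keepCandidateFilters(candidates map[int64]struct{}, filters []filter) []filter {
	var kept []filter
	for _, f := range filters {
		if _, ok := candidates[f.col.uid]; ok {
			kept = append(kept, f)
		}
	}
	return kept
}

func main() {
	// The projection exposes child columns 1 and 2 under fresh output UIDs 101 and 102.
	p := projection{outToChild: map[int64]int64{101: 1, 102: 2}}
	// The join references output column 101, so that is the only candidate.
	candidates := map[int64]struct{}{101: {}}
	// Filters below the projection are expressed with child UIDs.
	childFilters := []filter{
		{col: column{uid: 1}, desc: "a < 10"},
		{col: column{uid: 2}, desc: "b = 3"},
	}

	// Matching output UIDs against child UIDs directly keeps nothing.
	fmt.Println(len(keepCandidateFilters(candidates, childFilters))) // prints 0
	// Inverting the candidate set first keeps the filter on child column 1.
	fmt.Println(len(keepCandidateFilters(invertCandidates(p, candidates), childFilters))) // prints 1
}
```

The first option (collect everything, substitute, then filter) would reach the same result in this toy model; which one fits the real code better depends on how substituteIndexJoinOuterFiltersThroughProjection handles non-column expressions.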
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/planner/core/index_join_partition_pruning.go` around lines 181 - 191, The
projection case in collectIndexJoinOuterStaticFilters uses candidateCols
(projection output UIDs) to pre-filter when recursing into the projection child,
causing UID mismatches and dropped filters; fix by either (A) recursing into the
child without applying candidateCols (i.e., collect all child filters), then
call substituteIndexJoinOuterFiltersThroughProjection to remap child UIDs to
projection output UIDs and only then apply
filterIndexJoinOuterStaticFilters(candidateCols,...), or (B) build a child-side
candidateCols by inverting the projection's Exprs->Schema.Columns mapping (map
projection outputs back to source column UIDs) and pass that inverted
candidateCols into the recursive collect call; apply the same pattern to
collectIndexJoinProbePartitionColumns for LogicalProjection and
LogicalAggregation to keep behavior consistent.
0xPoe
left a comment
I just went through the test cases and tried to understand the issue and how it works.
This is just the first round of review; I haven't really delved into the changes yet. I will spend more time on this tomorrow.
})
}

func TestIndexJoinDerivesStaticPartitionPruningCondsFromEqJoinKey(t *testing.T) {
If I understand correctly, this is the case we already supported, right?
0xPoe
left a comment
Thanks!
I just read through the pkg/planner/core/index_join_partition_pruning.go file. It looks good.
// 1. finding the partition column on the inner subtree,
// 2. collecting outer-side static filters that can bound join predicates involving that column,
// 3. folding those bounds back into coarse predicates on the inner partition column.
Thanks for adding this comment. It might be worth using some simple examples to explain each case they refer to.
switch x.FuncName.L {
case ast.DateAdd, ast.AddDate, ast.DateSub, ast.SubDate:
	args := x.GetArgs()
	if len(args) != 3 {
When would this be possible? If this is defensive code, we should consider using intest.Assert.
What problem does this PR solve?
Issue Number: ref #67440
Problem Summary:
For index join over a partitioned probe-side table, the planner only used the probe table's existing pruning conditions. When the join predicates implied a coarse static range on the probe partition key, the probe-side scan could still keep partition:all.
What changed and how does it work?
This change derives coarse static probe-side partition pruning conditions during index join planning and threads them into the probe scan task's partition pruning info.
As shown above, there are two issues here.
First, the probe side of IndexJoin did not carry the extra probe-side partition pruning information derived from the join itself. It still had its normal PlanPartInfo, but that info only contained the DataSource's existing pruning conditions. As a result, EXPLAIN often showed partition:all.
Second, the existing runtime pruning on the probe side could only use the dynamic lookup contents built from outer rows. Those lookup contents contain join keys only, not columns that appear only in other conds. Therefore, for cases where the probe partition key is constrained only through other conds, executor-side runtime pruning cannot narrow partitions and may still send RPCs to all candidate partitions.
To address this, when building IndexJoin, we collect the inner child's partition key and derive additional coarse static pruning conditions from the outer side. These derived conditions are attached to the probe scan's partition pruning info through IndexJoinProp.
We currently cover two cases:
- If the partition key is also a join key, the join predicate is a simple equality, so the outer key's static conditions can be translated directly into pruning conditions on the probe partition key.
- If the partition key appears only in other conds, we first check whether the predicate is monotonic with respect to the outer join key. If it is, we derive a coarse range on the partition key by:
The original join predicates are still preserved for correctness. The newly derived predicates are used only for earlier partition pruning.
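As a rough illustration of the other-conds case, consider the example from the code comment: t1 join t2 on t1.a = t2.a and t1.b > t2.b, where t2.b is the inner partition column. If static filters bound t1.b to a closed interval, a coarse bound on t2.b follows. The sketch below is an assumption about the shape of that derivation for plain integer comparisons only; interval, derived, and deriveInnerBound are hypothetical names and are not taken from the PR.

```go
package main

import "fmt"

// interval is a hypothetical closed range [lo, hi] known for an outer column
// from its static filters.
type interval struct{ lo, hi int64 }

// derived is a coarse predicate "inner <op> value" on the inner partition column.
type derived struct {
	op    string
	value int64
}

// deriveInnerBound takes a join predicate of the form "outer <joinOp> inner"
// and the static range of the outer column, and returns a coarse bound that
// every matching inner value must satisfy. Unsupported operators return false.
func deriveInnerBound(joinOp string, outer interval) (derived, bool) {
	switch joinOp {
	case ">": // outer > inner and outer <= hi imply inner < hi
		return derived{op: "<", value: outer.hi}, true
	case ">=": // outer >= inner and outer <= hi imply inner <= hi
		return derived{op: "<=", value: outer.hi}, true
	case "<": // outer < inner and outer >= lo imply inner > lo
		return derived{op: ">", value: outer.lo}, true
	case "<=": // outer <= inner and outer >= lo imply inner >= lo
		return derived{op: ">=", value: outer.lo}, true
	}
	return derived{}, false
}

func main() {
	// Static filter on the outer side: t1.b between 5 and 20.
	outer := interval{lo: 5, hi: 20}
	if d, ok := deriveInnerBound(">", outer); ok {
		fmt.Printf("t2.b %s %d\n", d.op, d.value) // prints "t2.b < 20"
	}
}
```

The derived bound is deliberately loose: it only narrows which partitions are scanned, and the original join predicate still filters rows.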
Check List
Tests
Side effects
Documentation
Release note
Summary by CodeRabbit
New Features
Tests