Skip to content

planner, executor: include relevant opt vars in plan cache key#67836

Open
hawkingrei wants to merge 1 commit into
pingcap:masterfrom
hawkingrei:issue-64338-semi-join-rewrite-cache
Open

planner, executor: include relevant opt vars in plan cache key#67836
hawkingrei wants to merge 1 commit into
pingcap:masterfrom
hawkingrei:issue-64338-semi-join-rewrite-cache

Conversation

@hawkingrei
Copy link
Copy Markdown
Member

@hawkingrei hawkingrei commented Apr 16, 2026

What problem does this PR solve?

Issue Number: close #64338

Problem Summary:

Prepared plan cache can incorrectly reuse a cached plan after tidb_opt_enable_semi_join_rewrite
flips between ON and OFF. The optimizer already records this switch as relevant during plan
building, but the prepared plan cache key did not preserve and hash relevant optimizer vars/fixes
across the prepared statement boundary.

What changed and how does it work?

  • persist sorted relevant optimizer vars/fixes on PlanCacheStmt during
    GeneratePlanCacheStmtWithAST()
  • hash current session values for those persisted vars/fixes in
    newPlanCacheKeyWithMatchedBinding()
  • add a focused regression in testPreparePlanCache4DifferentSystemVars() covering
    tidb_opt_enable_semi_join_rewrite ON/OFF flips with prepared plan cache enabled for
    subqueries

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No need to test
    • I checked and no code files have been changed.

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

None

Summary by CodeRabbit

  • New Features
    • Enhanced query plan caching to account for optimizer-related session variables and fix controls, ensuring cached plans are more accurately reused based on relevant configuration settings for improved performance consistency.

@ti-chi-bot ti-chi-bot Bot added do-not-merge/needs-triage-completed release-note-none Denotes a PR that doesn't merit a release note. sig/planner SIG: Planner size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Apr 16, 2026
@ti-chi-bot
Copy link
Copy Markdown

ti-chi-bot Bot commented Apr 16, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign ailinkid for approval. For more information see the Code Review Process.
Please ensure that each of them provides their approval before proceeding.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 16, 2026

📝 Walkthrough

Walkthrough

This PR resolves a plan cache bug where optimizer-relevant session variables like tidb_opt_enable_semi_join_rewrite were not factored into the cache key, causing stale cached plans to be reused despite variable changes. The fix incorporates these variables and fix controls into cache key generation.

Changes

Cohort / File(s) Summary
Test Updates
pkg/executor/test/plancache/plan_cache_test.go
Extended test to verify plan cache correctly distinguishes between different values of tidb_opt_enable_semi_join_rewrite and tidb_enable_plan_cache_for_subquery. Asserts that cache status transitions correctly when toggling the semi-join rewrite variable between executions.
Plan Cache Key Generation
pkg/planner/core/plan_cache_utils.go
Added RelevantOptVarNames and RelevantOptFixIDs fields to PlanCacheStmt struct. Implemented collection and encoding of optimizer-relevant session variables and fix controls into the plan cache key via new helper functions collectRelevantOptVarsAndFixes, appendRelevantOptVarsAndFixes, and appendRelevantOptVarValue. Cache key now includes a delimited segment encoding each optimizer variable value and fix control ID to prevent cache collisions across variable state changes.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~35 minutes

Possibly related PRs

  • #67535 — Modifies PlanCacheStmt fields and plan cache key construction logic to extend cache key with hint/binding information, following a similar pattern of enriching the cache key structure.
  • #67134 — Updates plan cache key generation to condition stats-version hash inclusion based on binding/skip-stats settings, addressing related cache-key determinism concerns.

Suggested labels

sig/planner, size/L, approved, lgtm

Suggested reviewers

  • AilinKid
  • terry1purcell
  • qw4990

Poem

🐰 A cache that forgot its rules so tight,
Now remembers variables, day and night!
Semi joins and inner joins in harmony,
When toggles flip—plans dance so free! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 12.50% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The PR title accurately describes the main change: including relevant optimizer variables in the plan cache key to address the semi-join rewrite cache issue.
Description check ✅ Passed The PR description follows the template structure with issue number, problem summary, changes explained, and test checklist completed.
Linked Issues check ✅ Passed The code changes directly address issue #64338 by persisting relevant optimizer variables in the cache key and hashing their values during cache lookup, with a regression test covering the semi-join rewrite toggle scenario.
Out of Scope Changes check ✅ Passed All changes are focused on fixing the plan cache key to include relevant optimizer variables; no unrelated modifications detected in the files changed.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 golangci-lint (2.11.4)

Command failed


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (1)
pkg/planner/core/plan_cache_utils.go (1)

879-881: Add doc comments on the new exported PlanCacheStmt fields.

Briefly document that these are the sorted set of optimizer vars/fix IDs that influenced planning for this statement and are hashed into the plan-cache key. As per coding guidelines: "Keep exported-symbol doc comments, and prefer semantic constraints over name restatement."

📝 Proposed doc
-	RelevantOptVarNames []string
-	RelevantOptFixIDs   []uint64
+	// RelevantOptVarNames and RelevantOptFixIDs are the sorted, deterministic
+	// sets of optimizer variables / fix-control IDs that influenced this plan
+	// during GeneratePlanCacheStmtWithAST. They are read-only after creation
+	// and are hashed (together with their current session values) into the
+	// plan-cache key so that toggling any of them invalidates cached plans.
+	RelevantOptVarNames []string
+	RelevantOptFixIDs   []uint64
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/planner/core/plan_cache_utils.go` around lines 879 - 881, Add concise doc
comments to the exported PlanCacheStmt fields RelevantOptVarNames and
RelevantOptFixIDs explaining they are the sorted sets of optimizer variable
names and optimizer fixed IDs that affected planning for this statement and that
these values are included (hashed) in the plan-cache key; use semantic wording
(e.g., "sorted set of optimizer variable names that influenced planning and are
hashed into the plan-cache key") rather than restating field names, and place
the comments immediately above the RelevantOptVarNames and RelevantOptFixIDs
declarations in the PlanCacheStmt type.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/planner/core/plan_cache_utils.go`:
- Around line 413-417: The current pre-allocation for per-fix budget (hashLen +=
len(stmt.RelevantOptFixIDs) * 64) can under-estimate when OptimizerFixControl
values are longer than ~54 bytes; change the sizing to compute the exact
additional bytes by iterating the relevant fix IDs (stmt.RelevantOptFixIDs) and
summing 8 (codec.EncodeUint) + 1 + len(vars.OptimizerFixControl[fixID]) + 1 for
each fix value, adding that sum to hashLen (instead of the fixed 64 multiplier),
and/or replace the test-only panic/asserion around intest.InTest with a
non-fatal warning log if preferred; update references in the same function that
builds the cache key (uses stmt.RelevantOptVarNames, stmt.RelevantOptFixIDs,
vars.OptimizerFixControl, and hashLen).
- Around line 577-636: The switch in appendRelevantOptVarValue includes
vardef.TiDBOptAlwaysKeepJoinKey but that option is never registered via
RecordRelevantOptVar; either remove the TiDBOptAlwaysKeepJoinKey case from
appendRelevantOptVarValue (dead code) or add a matching
RecordRelevantOptVar(vardef.TiDBOptAlwaysKeepJoinKey) call where other optimizer
vars are registered (place it alongside the other RecordRelevantOptVar(...)
calls in the planner initialization/registration code) so the variable is
actually recorded and the case is used.

---

Nitpick comments:
In `@pkg/planner/core/plan_cache_utils.go`:
- Around line 879-881: Add concise doc comments to the exported PlanCacheStmt
fields RelevantOptVarNames and RelevantOptFixIDs explaining they are the sorted
sets of optimizer variable names and optimizer fixed IDs that affected planning
for this statement and that these values are included (hashed) in the plan-cache
key; use semantic wording (e.g., "sorted set of optimizer variable names that
influenced planning and are hashed into the plan-cache key") rather than
restating field names, and place the comments immediately above the
RelevantOptVarNames and RelevantOptFixIDs declarations in the PlanCacheStmt
type.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 367465da-abc7-4216-b8d9-b8da30240031

📥 Commits

Reviewing files that changed from the base of the PR and between 644ebc6 and 9a22c69.

📒 Files selected for processing (2)
  • pkg/executor/test/plancache/plan_cache_test.go
  • pkg/planner/core/plan_cache_utils.go

Comment on lines +413 to +417
if len(stmt.RelevantOptVarNames) > 0 || len(stmt.RelevantOptFixIDs) > 0 {
hashLen++
hashLen += len(stmt.RelevantOptVarNames) * 128
hashLen += len(stmt.RelevantOptFixIDs) * 64
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Per-fix hash budget may under-size the buffer for long OptimizerFixControl values.

Each fix contributes 8 (codec.EncodeUint) + 1 ('=') + len(fixVal) + 1 (';') bytes. fixVal is an arbitrary string from vars.OptimizerFixControl, so any user-set fix whose string value exceeds ~54 bytes will force a realloc and trip the intest.InTest assertion at Lines 526-530. Consider either computing the exact size (summing len(fixVal)s) or widening the per-fix allowance and/or softening the test-only panic to a warning.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/planner/core/plan_cache_utils.go` around lines 413 - 417, The current
pre-allocation for per-fix budget (hashLen += len(stmt.RelevantOptFixIDs) * 64)
can under-estimate when OptimizerFixControl values are longer than ~54 bytes;
change the sizing to compute the exact additional bytes by iterating the
relevant fix IDs (stmt.RelevantOptFixIDs) and summing 8 (codec.EncodeUint) + 1 +
len(vars.OptimizerFixControl[fixID]) + 1 for each fix value, adding that sum to
hashLen (instead of the fixed 64 multiplier), and/or replace the test-only
panic/asserion around intest.InTest with a non-fatal warning log if preferred;
update references in the same function that builds the cache key (uses
stmt.RelevantOptVarNames, stmt.RelevantOptFixIDs, vars.OptimizerFixControl, and
hashLen).

Comment on lines +577 to +636
func appendRelevantOptVarValue(hash []byte, vars *variable.SessionVars, varName string) []byte {
switch varName {
case vardef.TiDBOptIndexScanCostFactor:
return codec.EncodeUint(hash, math.Float64bits(vars.IndexScanCostFactor))
case vardef.TiDBOptIndexReaderCostFactor:
return codec.EncodeUint(hash, math.Float64bits(vars.IndexReaderCostFactor))
case vardef.TiDBOptTableReaderCostFactor:
return codec.EncodeUint(hash, math.Float64bits(vars.TableReaderCostFactor))
case vardef.TiDBOptTableFullScanCostFactor:
return codec.EncodeUint(hash, math.Float64bits(vars.TableFullScanCostFactor))
case vardef.TiDBOptTableRangeScanCostFactor:
return codec.EncodeUint(hash, math.Float64bits(vars.TableRangeScanCostFactor))
case vardef.TiDBOptTableRowIDScanCostFactor:
return codec.EncodeUint(hash, math.Float64bits(vars.TableRowIDScanCostFactor))
case vardef.TiDBOptTableTiFlashScanCostFactor:
return codec.EncodeUint(hash, math.Float64bits(vars.TableTiFlashScanCostFactor))
case vardef.TiDBOptIndexLookupCostFactor:
return codec.EncodeUint(hash, math.Float64bits(vars.IndexLookupCostFactor))
case vardef.TiDBOptIndexMergeCostFactor:
return codec.EncodeUint(hash, math.Float64bits(vars.IndexMergeCostFactor))
case vardef.TiDBOptSortCostFactor:
return codec.EncodeUint(hash, math.Float64bits(vars.SortCostFactor))
case vardef.TiDBOptTopNCostFactor:
return codec.EncodeUint(hash, math.Float64bits(vars.TopNCostFactor))
case vardef.TiDBOptLimitCostFactor:
return codec.EncodeUint(hash, math.Float64bits(vars.LimitCostFactor))
case vardef.TiDBOptStreamAggCostFactor:
return codec.EncodeUint(hash, math.Float64bits(vars.StreamAggCostFactor))
case vardef.TiDBOptHashAggCostFactor:
return codec.EncodeUint(hash, math.Float64bits(vars.HashAggCostFactor))
case vardef.TiDBOptMergeJoinCostFactor:
return codec.EncodeUint(hash, math.Float64bits(vars.MergeJoinCostFactor))
case vardef.TiDBOptHashJoinCostFactor:
return codec.EncodeUint(hash, math.Float64bits(vars.HashJoinCostFactor))
case vardef.TiDBOptIndexJoinCostFactor:
return codec.EncodeUint(hash, math.Float64bits(vars.IndexJoinCostFactor))
case vardef.TiDBOptOrderingIdxSelRatio:
return codec.EncodeUint(hash, math.Float64bits(vars.OptOrderingIdxSelRatio))
case vardef.TiDBOptRiskEqSkewRatio:
return codec.EncodeUint(hash, math.Float64bits(vars.RiskEqSkewRatio))
case vardef.TiDBOptRiskRangeSkewRatio:
return codec.EncodeUint(hash, math.Float64bits(vars.RiskRangeSkewRatio))
case vardef.TiDBOptRiskGroupNDVSkewRatio:
return codec.EncodeUint(hash, math.Float64bits(vars.RiskGroupNDVSkewRatio))
case vardef.TiDBOptCartesianJoinOrderThreshold:
return codec.EncodeUint(hash, math.Float64bits(vars.CartesianJoinOrderThreshold))
case vardef.TiDBOptSelectivityFactor:
return codec.EncodeUint(hash, math.Float64bits(vars.SelectivityFactor))
case vardef.TiDBOptPreferRangeScan:
return append(hash, bool2Byte(vars.GetAllowPreferRangeScan()))
case vardef.TiDBOptEnableNoDecorrelateInSelect:
return append(hash, bool2Byte(vars.EnableNoDecorrelateInSelect))
case vardef.TiDBOptEnableSemiJoinRewrite:
return append(hash, bool2Byte(vars.EnableSemiJoinRewrite))
case vardef.TiDBOptAlwaysKeepJoinKey:
return append(hash, bool2Byte(vars.AlwaysKeepJoinKey))
default:
return hash
}
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Enumerate all RecordRelevantOptVar(...) call sites and the constant/name argument used.
rg -nP --type=go -C1 '\bRecordRelevantOptVar\s*\(' 
# Cross-reference with the case labels in appendRelevantOptVarValue
ast-grep --pattern $'func appendRelevantOptVarValue($_, $_, $_) []byte {
  $$$
}'

Repository: pingcap/tidb

Length of output: 16784


Remove the unused TiDBOptAlwaysKeepJoinKey case or add its corresponding RecordRelevantOptVar call.

The switch in appendRelevantOptVarValue handles TiDBOptAlwaysKeepJoinKey (line 631–632) but this variable is never recorded anywhere in the codebase via RecordRelevantOptVar(). Either remove the dead case or add the missing recording call in the planner where this variable is checked. All other recorded optimizer variables have matching cases in this function.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/planner/core/plan_cache_utils.go` around lines 577 - 636, The switch in
appendRelevantOptVarValue includes vardef.TiDBOptAlwaysKeepJoinKey but that
option is never registered via RecordRelevantOptVar; either remove the
TiDBOptAlwaysKeepJoinKey case from appendRelevantOptVarValue (dead code) or add
a matching RecordRelevantOptVar(vardef.TiDBOptAlwaysKeepJoinKey) call where
other optimizer vars are registered (place it alongside the other
RecordRelevantOptVar(...) calls in the planner initialization/registration code)
so the variable is actually recorded and the case is used.

@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 16, 2026

Codecov Report

❌ Patch coverage is 76.00000% with 24 lines in your changes missing coverage. Please review.
✅ Project coverage is 77.1084%. Comparing base (644ebc6) to head (9a22c69).
⚠️ Report is 74 commits behind head on master.

Additional details and impacted files
@@               Coverage Diff                @@
##             master     #67836        +/-   ##
================================================
- Coverage   77.5976%   77.1084%   -0.4893%     
================================================
  Files          1982       1965        -17     
  Lines        548894     549006       +112     
================================================
- Hits         425929     423330      -2599     
- Misses       122160     125674      +3514     
+ Partials        805          2       -803     
Flag Coverage Δ
integration 41.0485% <42.0000%> (+6.7084%) ⬆️
unit 76.3103% <76.0000%> (-0.0303%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Components Coverage Δ
dumpling 61.4164% <ø> (ø)
parser ∅ <ø> (∅)
br 49.9964% <ø> (-10.5280%) ⬇️
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@hawkingrei hawkingrei added the AI-Correction Bugfix by AI label Apr 17, 2026
@hawkingrei
Copy link
Copy Markdown
Member Author

/retest

@ti-chi-bot
Copy link
Copy Markdown

ti-chi-bot Bot commented Apr 18, 2026

@hawkingrei: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pull-unit-test-next-gen 9a22c69 link true /test pull-unit-test-next-gen

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

AI-Correction Bugfix by AI do-not-merge/needs-triage-completed release-note-none Denotes a PR that doesn't merit a release note. sig/planner SIG: Planner size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

tidb_opt_enable_semi_join_rewrite is ignored once the rewritten plan gets cached

1 participant