Skip to content

ddl-test#4906

Open
hongyunyan wants to merge 1 commit intopingcap:masterfrom
hongyunyan:ddl-0424
Open

ddl-test#4906
hongyunyan wants to merge 1 commit intopingcap:masterfrom
hongyunyan:ddl-0424

Conversation

@hongyunyan
Copy link
Copy Markdown
Collaborator

@hongyunyan hongyunyan commented Apr 24, 2026

Reset syncpoint WAITING bookkeeping when a skipped syncpoint is shortcut into the pass phase so stale barrier coverage does not linger.

Reconcile forwarded dispatchers for skipped syncpoints during resend and add focused tests for the shortcut and cleanup paths.

What problem does this PR solve?

Issue Number: close #xxx

What is changed and how it works?

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Questions

Will it cause performance regression or break compatibility?
Do you need to update user documentation, design documentation or monitoring documentation?

Release note

Please refer to [Release Notes Language Style Guide](https://pingcap.github.io/tidb-dev-guide/contribute-to-tidb/release-notes-style-guide.html) to write a quality release note.

If you don't think this PR needs a release note then fill it with `None`.

Summary by CodeRabbit

Release Notes

  • Bug Fixes

    • Improved handling of replication checkpoint synchronization and dispatcher state management during forwarding operations
    • Enhanced reconciliation logic for distributed dispatcher coordination
  • Tests

    • Added test coverage for synchronization point reconciliation scenarios

Reset syncpoint WAITING bookkeeping when a skipped syncpoint is shortcut into the pass phase so stale barrier coverage does not linger.

Reconcile forwarded dispatchers for skipped syncpoints during resend and add focused tests for the shortcut and cleanup paths.
@ti-chi-bot ti-chi-bot Bot added do-not-merge/needs-linked-issue release-note Denotes a PR that will be considered when it comes time to generate release notes. labels Apr 24, 2026
@ti-chi-bot
Copy link
Copy Markdown

ti-chi-bot Bot commented Apr 24, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign wk989898 for approval. For more information see the Code Review Process.
Please ensure that each of them provides their approval before proceeding.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 24, 2026

📝 Walkthrough

Walkthrough

Introduces syncpoint-specific forward-reconciliation logic with lastForwardReconcileTime tracking and helper methods. Updates checkBlockedDispatchers() to take a shortcut path when forwarded checkpointTs is detected for syncpoints. Adds rate-limited periodic reconciliation (10s) in resend() to mark dispatchers done and advance events through PASS/DONE phases.

Changes

Cohort / File(s) Summary
Forward-Reconciliation Implementation
maintainer/barrier_event.go
Adds lastForwardReconcileTime field, new helper methods for lazy range coverage, WAITING bookkeeping reset, and dispatcher marking. Updates checkBlockedDispatchers() to provide syncpoint shortcut when forwarded checkpointTs detected. Modifies resend() for rate-limited periodic reconciliation of forwarded dispatchers.
Syncpoint Shortcut Behavior Tests
maintainer/barrier_event_test.go, maintainer/barrier_test.go
New test TestShortcutSyncPointToPassPhaseResetsWaitingCoverage() validates syncpoint shortcut behavior and waiting-coverage reset. New test TestSkippedSyncPointEventIsRemovedByReconcile() validates reconciliation behavior when syncpoint is skipped and event transitions through pass/done phases.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Suggested labels

lgtm, approved, size/L

Suggested reviewers

  • 3AceShowHand
  • wk989898
  • lidezhu

Poem

🐰 Forward-reconciling through the barrier gates so bright,
Where syncpoints take their shortcut path and dispatchers find their light,
Ten seconds ticking, bookkeeping reset with care,
From WAITING to DONE, the state machine's everywhere! ✨

🚥 Pre-merge checks | ✅ 2 | ❌ 3

❌ Failed checks (2 warnings, 1 inconclusive)

Check name Status Explanation Resolution
Description check ⚠️ Warning The description includes the initial summary of changes but lacks completion of required template sections including specific issue number, detailed explanation of changes, answers to compatibility questions, and release note content. Complete the required template sections: provide a specific issue number (replace 'close #xxx'), explain what changed in detail, answer the compatibility and documentation questions, and provide release note content or explicitly mark as 'None'.
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Title check ❓ Inconclusive The title 'ddl-test' is vague and does not clearly describe the main changes made in the pull request, which involve syncpoint WAITING bookkeeping reset and forward reconciliation. Replace the title with a more descriptive name that reflects the primary change, such as 'Reset syncpoint WAITING bookkeeping on shortcut to pass phase' or 'Add syncpoint forward reconciliation during resend'.
✅ Passed checks (2 passed)
Check name Status Explanation
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@ti-chi-bot ti-chi-bot Bot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Apr 24, 2026
Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a mechanism to shortcut syncpoint events in the barrier logic when dispatchers have already advanced past the barrier's commit timestamp. It adds state tracking for reconciliation timing and helper methods to identify relevant replications and reset event coverage. The review feedback highlights opportunities to reduce code duplication in the dispatcher check logic and to replace a magic number with a named constant for the reconciliation interval.

@@ -498,54 +586,89 @@ func (be *BarrierEvent) checkBlockedDispatchers() {
replications := be.spanController.GetTasksByTableID(tableId)
for _, replication := range replications {
if forwardBarrierEvent(replication, be) {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The logic inside this if block is almost identical to the ones in the InfluenceType_DB and InfluenceType_All cases below. This duplication makes the code harder to read and maintain.

Consider extracting this logic into a private helper method to improve code clarity and reduce redundancy. The helper method could take the replication and any case-specific logging fields as arguments.

For example, you could introduce a method like handleForwardedDispatcher:

func (be *BarrierEvent) handleForwardedDispatcher(replication *replica.SpanReplication, extraLogFields ...zap.Field) {
	if be.isSyncPoint {
		be.shortcutSyncPointToPassPhase()
		fields := []zap.Field{
			zap.String("changefeed", be.cfID.Name()),
			zap.Uint64("commitTs", be.commitTs),
			zap.Uint64("checkpointTs", replication.GetStatus().CheckpointTs),
			zap.String("dispatcher", replication.ID.String()),
			zap.Int64("mode", be.mode),
		}
		fields = append(fields, extraLogFields...)
		log.Info("one related dispatcher has forward checkpointTs, shortcut syncpoint to pass phase", fields...)
	} else {
		be.selected.Store(true)
		be.writerDispatcherAdvanced = true
		fields := []zap.Field{
			zap.String("changefeed", be.cfID.Name()),
			zap.Uint64("commitTs", be.commitTs),
			zap.Uint64("checkpointTs", replication.GetStatus().CheckpointTs),
			zap.String("dispatcher", replication.ID.String()),
			zap.Int64("mode", be.mode),
		}
		fields = append(fields, extraLogFields...)
		log.Info("one related dispatcher has forward checkpointTs, means the block event can be advanced", fields...)
	}
}

Then you can simplify this block and the others to a single call, for example:
be.handleForwardedDispatcher(replication, zap.Int64("tableId", tableId))


msgs = []*messaging.TargetMessage{be.newWriterActionMessage(stm.GetNodeID(), mode)}
} else {
if be.isSyncPoint && time.Since(be.lastForwardReconcileTime) > time.Second*10 {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The duration time.Second*10 is a magic number. It's better to define it as a named constant to improve readability and maintainability. This makes it easier to understand the purpose of the value and to change it in one place if needed.

Consider defining a constant at the package level, for example:

const forwardReconcileInterval = 10 * time.Second

And then use it here.

@ti-chi-bot
Copy link
Copy Markdown

ti-chi-bot Bot commented Apr 24, 2026

[FORMAT CHECKER NOTIFICATION]

Notice: To remove the do-not-merge/needs-linked-issue label, please provide the linked issue number on one line in the PR body, for example: Issue Number: close #123 or Issue Number: ref #456.

📖 For more info, you can check the "Contribute Code" section in the development guide.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (2)
maintainer/barrier_event.go (2)

582-675: Consider collapsing the triplicated forward-detected branches.

checkBlockedDispatchers now has three structurally identical case blocks (Normal / DB / All), each with the same if be.isSyncPoint { shortcut } else { advance } switch and near-duplicate log lines. The body just differs in the relevant replication list and a couple of extra log fields. Consider extracting something like:

♻️ Suggested helper
func (be *BarrierEvent) onForwardedReplicationDetected(replication *replica.SpanReplication, extra ...zap.Field) {
	base := []zap.Field{
		zap.String("changefeed", be.cfID.Name()),
		zap.Uint64("commitTs", be.commitTs),
		zap.Uint64("checkpointTs", replication.GetStatus().CheckpointTs),
		zap.String("dispatcher", replication.ID.String()),
		zap.Int64("mode", be.mode),
	}
	if be.isSyncPoint {
		be.shortcutSyncPointToPassPhase()
		log.Info("one related dispatcher has forwarded checkpointTs, shortcut syncpoint to pass phase", append(base, extra...)...)
		return
	}
	be.selected.Store(true)
	be.writerDispatcherAdvanced = true
	log.Info("one related dispatcher has forwarded checkpointTs, block event can be advanced", append(base, extra...)...)
}

Then each case just finds the first forwarded replication and calls the helper. Reduces the risk of future drift between the three branches (e.g. adding a new log field in one but forgetting the others). Keeping it optional since the current structure is still correct.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@maintainer/barrier_event.go` around lines 582 - 675, The three case branches
in checkBlockedDispatchers duplicate the same "found forwarded replication"
logic; extract that logic into a helper (e.g.
BarrierEvent.onForwardedReplicationDetected(replication
*replica.SpanReplication, extra ...zap.Field)) and call it from each case after
forwardBarrierEvent returns true. The helper should inspect be.isSyncPoint and
either call be.shortcutSyncPointToPassPhase() and log, or set
be.selected.Store(true) and be.writerDispatcherAdvanced = true and log, merging
common zap fields (changefeed, commitTs, checkpointTs, dispatcher, mode) and
appending case-specific fields (tableId/schemaID) before returning. Update calls
in checkBlockedDispatchers to use spanController.GetTasksByTableID,
GetTasksBySchemaID, and GetAllTasks as before and invoke the new helper when
forwardBarrierEvent(replication, be) is true.

243-261: Minor: short-circuit path doesn't filter nils.

dedupReplications filters nil entries only when len(replications) > 1. If any caller ever returns a single-element slice containing nil, it would flow through to reconcileForwardedDispatchers and NPE inside forwardBarrierEvent on replication.GetStatus(). Today none of the collectRelevantReplications paths can actually produce that, but the short-circuit is a subtle landmine if a caller ever changes. Cheap to harden:

🛡️ Suggested tweak
 func dedupReplications(replications []*replica.SpanReplication) []*replica.SpanReplication {
-	if len(replications) <= 1 {
-		return replications
-	}
-
 	seen := make(map[common.DispatcherID]struct{}, len(replications))
 	result := make([]*replica.SpanReplication, 0, len(replications))
 	for _, replication := range replications {

(Drop the short-circuit so nil-filtering always runs; the allocations are trivial at this call frequency.)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@maintainer/barrier_event.go` around lines 243 - 261, Remove the early-return
short-circuit in dedupReplications so nil entries are always filtered; instead
always iterate over the input slice, skip nil replication entries, and perform
the dedupe via the seen map (keep the existing seen map and result append
logic). This hardens dedupReplications (referenced by
reconcileForwardedDispatchers / forwardBarrierEvent) to avoid a single-element
[nil] passing through and causing an NPE on replication.GetStatus().
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@maintainer/barrier_event.go`:
- Around line 582-675: The three case branches in checkBlockedDispatchers
duplicate the same "found forwarded replication" logic; extract that logic into
a helper (e.g. BarrierEvent.onForwardedReplicationDetected(replication
*replica.SpanReplication, extra ...zap.Field)) and call it from each case after
forwardBarrierEvent returns true. The helper should inspect be.isSyncPoint and
either call be.shortcutSyncPointToPassPhase() and log, or set
be.selected.Store(true) and be.writerDispatcherAdvanced = true and log, merging
common zap fields (changefeed, commitTs, checkpointTs, dispatcher, mode) and
appending case-specific fields (tableId/schemaID) before returning. Update calls
in checkBlockedDispatchers to use spanController.GetTasksByTableID,
GetTasksBySchemaID, and GetAllTasks as before and invoke the new helper when
forwardBarrierEvent(replication, be) is true.
- Around line 243-261: Remove the early-return short-circuit in
dedupReplications so nil entries are always filtered; instead always iterate
over the input slice, skip nil replication entries, and perform the dedupe via
the seen map (keep the existing seen map and result append logic). This hardens
dedupReplications (referenced by reconcileForwardedDispatchers /
forwardBarrierEvent) to avoid a single-element [nil] passing through and causing
an NPE on replication.GetStatus().

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 58a8a181-1960-4b91-a2a2-66eb14deb64d

📥 Commits

Reviewing files that changed from the base of the PR and between 1cc0b28 and 6323303.

📒 Files selected for processing (3)
  • maintainer/barrier_event.go
  • maintainer/barrier_event_test.go
  • maintainer/barrier_test.go

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

do-not-merge/needs-linked-issue release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant