Conversation
Reset syncpoint WAITING bookkeeping when a skipped syncpoint is shortcut into the pass phase so stale barrier coverage does not linger. Reconcile forwarded dispatchers for skipped syncpoints during resend and add focused tests for the shortcut and cleanup paths.
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by:
The full list of commands accepted by this bot can be found here.
Details: Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
📝 Walkthrough
Introduces syncpoint-specific forward-reconciliation logic with
Changes
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 3
❌ Failed checks (2 warnings, 1 inconclusive)
✅ Passed checks (2 passed)
Code Review
This pull request introduces a mechanism to shortcut syncpoint events in the barrier logic when dispatchers have already advanced past the barrier's commit timestamp. It adds state tracking for reconciliation timing and helper methods to identify relevant replications and reset event coverage. The review feedback highlights opportunities to reduce code duplication in the dispatcher check logic and to replace a magic number with a named constant for the reconciliation interval.
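The shortcut mechanism described above reduces to comparing each dispatcher's checkpointTs against the barrier's commitTs. The following is a minimal, self-contained sketch of that decision; the type and field names (`spanReplication`, `barrierEvent`, `passed`) are simplified stand-ins for the real maintainer types, not the actual implementation:

```go
package main

import "fmt"

// Simplified stand-ins for the real maintainer/replica types.
type spanStatus struct{ CheckpointTs uint64 }

type spanReplication struct{ status spanStatus }

func (r *spanReplication) GetStatus() spanStatus { return r.status }

type barrierEvent struct {
	commitTs    uint64
	isSyncPoint bool
	passed      bool // stands in for the "pass phase" bookkeeping
	selected    bool
}

// forwardBarrierEvent reports whether the dispatcher has already advanced
// past the barrier: once its checkpointTs reaches commitTs, the block
// event can no longer be blocking it.
func forwardBarrierEvent(r *spanReplication, be *barrierEvent) bool {
	return r.GetStatus().CheckpointTs >= be.commitTs
}

func main() {
	be := &barrierEvent{commitTs: 100, isSyncPoint: true}
	r := &spanReplication{status: spanStatus{CheckpointTs: 150}}
	if forwardBarrierEvent(r, be) {
		if be.isSyncPoint {
			be.passed = true // shortcut the syncpoint into the pass phase
		} else {
			be.selected = true
		}
	}
	fmt.Println(be.passed)
}
```

The boundary choice (`>=` vs `>`) here is an assumption of the sketch; the PR's actual comparison lives in `forwardBarrierEvent` in `maintainer/barrier_event.go`.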
@@ -498,54 +586,89 @@ func (be *BarrierEvent) checkBlockedDispatchers() {
	replications := be.spanController.GetTasksByTableID(tableId)
	for _, replication := range replications {
		if forwardBarrierEvent(replication, be) {
The logic inside this if block is almost identical to the logic in the InfluenceType_DB and InfluenceType_All cases below. This duplication makes the code harder to read and maintain.
Consider extracting this logic into a private helper method to improve code clarity and reduce redundancy. The helper method could take the replication and any case-specific logging fields as arguments.
For example, you could introduce a method like handleForwardedDispatcher:
func (be *BarrierEvent) handleForwardedDispatcher(replication *replica.SpanReplication, extraLogFields ...zap.Field) {
if be.isSyncPoint {
be.shortcutSyncPointToPassPhase()
fields := []zap.Field{
zap.String("changefeed", be.cfID.Name()),
zap.Uint64("commitTs", be.commitTs),
zap.Uint64("checkpointTs", replication.GetStatus().CheckpointTs),
zap.String("dispatcher", replication.ID.String()),
zap.Int64("mode", be.mode),
}
fields = append(fields, extraLogFields...)
log.Info("one related dispatcher has forward checkpointTs, shortcut syncpoint to pass phase", fields...)
} else {
be.selected.Store(true)
be.writerDispatcherAdvanced = true
fields := []zap.Field{
zap.String("changefeed", be.cfID.Name()),
zap.Uint64("commitTs", be.commitTs),
zap.Uint64("checkpointTs", replication.GetStatus().CheckpointTs),
zap.String("dispatcher", replication.ID.String()),
zap.Int64("mode", be.mode),
}
fields = append(fields, extraLogFields...)
log.Info("one related dispatcher has forward checkpointTs, means the block event can be advanced", fields...)
}
}

Then you can simplify this block and the others to a single call, for example:
be.handleForwardedDispatcher(replication, zap.Int64("tableId", tableId))
	msgs = []*messaging.TargetMessage{be.newWriterActionMessage(stm.GetNodeID(), mode)}
} else {
	if be.isSyncPoint && time.Since(be.lastForwardReconcileTime) > time.Second*10 {
The duration time.Second*10 is a magic number. It's better to define it as a named constant to improve readability and maintainability. This makes it easier to understand the purpose of the value and to change it in one place if needed.
Consider defining a constant at the package level, for example:
const forwardReconcileInterval = 10 * time.Second

And then use it here.
|
[FORMAT CHECKER NOTIFICATION] Notice: To remove the
📖 For more info, you can check the "Contribute Code" section in the development guide.
🧹 Nitpick comments (2)
maintainer/barrier_event.go (2)
582-675: Consider collapsing the triplicated forward-detected branches.
checkBlockedDispatchers now has three structurally identical case blocks (Normal / DB / All), each with the same `if be.isSyncPoint { shortcut } else { advance }` switch and near-duplicate log lines. The body differs only in the relevant replication list and a couple of extra log fields. Consider extracting something like:
♻️ Suggested helper

func (be *BarrierEvent) onForwardedReplicationDetected(replication *replica.SpanReplication, extra ...zap.Field) {
	base := []zap.Field{
		zap.String("changefeed", be.cfID.Name()),
		zap.Uint64("commitTs", be.commitTs),
		zap.Uint64("checkpointTs", replication.GetStatus().CheckpointTs),
		zap.String("dispatcher", replication.ID.String()),
		zap.Int64("mode", be.mode),
	}
	if be.isSyncPoint {
		be.shortcutSyncPointToPassPhase()
		log.Info("one related dispatcher has forwarded checkpointTs, shortcut syncpoint to pass phase", append(base, extra...)...)
		return
	}
	be.selected.Store(true)
	be.writerDispatcherAdvanced = true
	log.Info("one related dispatcher has forwarded checkpointTs, block event can be advanced", append(base, extra...)...)
}

Then each case just finds the first forwarded replication and calls the helper. This reduces the risk of future drift between the three branches (e.g. adding a new log field in one but forgetting the others). Keeping it optional since the current structure is still correct.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@maintainer/barrier_event.go` around lines 582 - 675, The three case branches in checkBlockedDispatchers duplicate the same "found forwarded replication" logic; extract that logic into a helper (e.g. BarrierEvent.onForwardedReplicationDetected(replication *replica.SpanReplication, extra ...zap.Field)) and call it from each case after forwardBarrierEvent returns true. The helper should inspect be.isSyncPoint and either call be.shortcutSyncPointToPassPhase() and log, or set be.selected.Store(true) and be.writerDispatcherAdvanced = true and log, merging common zap fields (changefeed, commitTs, checkpointTs, dispatcher, mode) and appending case-specific fields (tableId/schemaID) before returning. Update calls in checkBlockedDispatchers to use spanController.GetTasksByTableID, GetTasksBySchemaID, and GetAllTasks as before and invoke the new helper when forwardBarrierEvent(replication, be) is true.
243-261: Minor: short-circuit path doesn't filter nils.
dedupReplications filters nil entries only when len(replications) > 1. If any caller ever returns a single-element slice containing nil, it would flow through to reconcileForwardedDispatchers and NPE inside forwardBarrierEvent on replication.GetStatus(). Today none of the collectRelevantReplications paths can actually produce that, but the short-circuit is a subtle landmine if a caller ever changes. Cheap to harden:
🛡️ Suggested tweak

 func dedupReplications(replications []*replica.SpanReplication) []*replica.SpanReplication {
-	if len(replications) <= 1 {
-		return replications
-	}
-
 	seen := make(map[common.DispatcherID]struct{}, len(replications))
 	result := make([]*replica.SpanReplication, 0, len(replications))
 	for _, replication := range replications {

(Drop the short-circuit so nil-filtering always runs; the allocations are trivial at this call frequency.)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@maintainer/barrier_event.go` around lines 243 - 261, Remove the early-return short-circuit in dedupReplications so nil entries are always filtered; instead always iterate over the input slice, skip nil replication entries, and perform the dedupe via the seen map (keep the existing seen map and result append logic). This hardens dedupReplications (referenced by reconcileForwardedDispatchers / forwardBarrierEvent) to avoid a single-element [nil] passing through and causing an NPE on replication.GetStatus().
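For reference, the hardened dedupReplications suggested above can be sketched end to end as a runnable example. `dispatcherID` and `spanReplication` here are simplified stand-ins for the real replica types (the actual code keys on `common.DispatcherID`):

```go
package main

import "fmt"

// Simplified stand-ins for the real replica types; a plain string ID
// is used purely for illustration.
type dispatcherID string

type spanReplication struct {
	ID dispatcherID
}

// dedupReplications drops nil entries and duplicate dispatcher IDs.
// There is deliberately no len <= 1 short-circuit, so even a
// single-element slice containing nil is filtered out.
func dedupReplications(replications []*spanReplication) []*spanReplication {
	seen := make(map[dispatcherID]struct{}, len(replications))
	result := make([]*spanReplication, 0, len(replications))
	for _, r := range replications {
		if r == nil {
			continue
		}
		if _, ok := seen[r.ID]; ok {
			continue
		}
		seen[r.ID] = struct{}{}
		result = append(result, r)
	}
	return result
}

func main() {
	a := &spanReplication{ID: "a"}
	b := &spanReplication{ID: "b"}
	out := dedupReplications([]*spanReplication{a, nil, b, a})
	fmt.Println(len(out)) // nil and the duplicate "a" are dropped
}
```

Always running the filter costs two small allocations per call, which is negligible at this call frequency, and removes the nil landmine the review describes.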
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 58a8a181-1960-4b91-a2a2-66eb14deb64d
📒 Files selected for processing (3)
maintainer/barrier_event.go
maintainer/barrier_event_test.go
maintainer/barrier_test.go
What problem does this PR solve?
Issue Number: close #xxx
What is changed and how it works?
Check List
Tests
Questions
Will it cause performance regression or break compatibility?
Do you need to update user documentation, design documentation or monitoring documentation?
Release note
Summary by CodeRabbit
Release Notes
Bug Fixes
Tests