Skip to content

eventservice: decouple dispatcher ready/handshake from scan triggering and activate the control plane immediately after register/reset#4840

Open
zier-one wants to merge 1 commit intopingcap:masterfrom
zier-one:20260417-speedup-2
Open

eventservice: decouple dispatcher ready/handshake from scan triggering and activate the control plane immediately after register/reset#4840
zier-one wants to merge 1 commit intopingcap:masterfrom
zier-one:20260417-speedup-2

Conversation

@zier-one
Copy link
Copy Markdown
Contributor

@zier-one zier-one commented Apr 16, 2026

What problem does this PR solve?

Issue Number: close #4873

This PR fixes a control-plane timing issue in the event service. For normal dispatchers, ReadyEvent after registration and HandshakeEvent after reset were still dependent on a later resolved-ts notification.

What is changed and how it works?

This PR extracts dispatcher control-plane activation into activateDispatcherControlPlane(), which is responsible for:

  • sending ReadyEvent for dispatchers in epoch == 0;
  • sending HandshakeEvent for dispatchers that are ready but not handshaked yet.

The broker now calls this helper proactively at the key transition points instead of waiting for the next scan/notifier trigger:

  • immediately after a normal dispatcher is registered successfully;
  • immediately after a reset installs the new epoch;
  • from the table-trigger tick path as the shared control-plane helper, while preserving existing behavior.

scanReady() still uses the same helper, so the existing scan path remains intact. The change only removes the hidden coupling that required another resolved-ts advance before the dispatcher could enter ready/handshake.

The PR also adds regression tests covering immediate ready after registration, immediate handshake after reset, and unchanged table-trigger registration behavior.

Check List

Tests

  • Unit test

Questions

Will it cause performance regression or break compatibility?
Do you need to update user documentation, design documentation or monitoring documentation?

Release note

Please refer to [Release Notes Language Style Guide](https://pingcap.github.io/tidb-dev-guide/contribute-to-tidb/release-notes-style-guide.html) to write a quality release note.


Fix a stall where a dispatcher could remain stuck after registration or reset when no new upstream resolved-ts arrived. The event service now emits the required ready/handshake control-plane events immediately instead of waiting for a later scan notification.

Summary by CodeRabbit

Release Notes

  • Bug Fixes

    • Fixed error handling during dispatcher initialization to ensure proper cleanup when registration fails.
    • Improved reliability of event sequencing during dispatcher lifecycle operations.
  • Tests

    • Expanded test coverage for event emission timing and dispatcher state transitions.
    • Hardened tests for edge cases in schema synchronization failures.

@ti-chi-bot ti-chi-bot Bot added do-not-merge/needs-linked-issue release-note Denotes a PR that will be considered when it comes time to generate release notes. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. labels Apr 16, 2026
@ti-chi-bot
Copy link
Copy Markdown

ti-chi-bot Bot commented Apr 16, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign flowbehappy for approval. For more information see the Code Review Process.
Please ensure that each of them provides their approval before proceeding.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 16, 2026

📝 Walkthrough

Walkthrough

The event broker decouples dispatcher ready/handshake event progression from scan-task range computation by introducing an activateDispatcherControlPlane helper. This helper is now called proactively during dispatcher creation and reset, and in the scan readiness path, enabling immediate event emission rather than waiting for upstream resolved-ts advancement.

Changes

Cohort / File(s) Summary
Dispatcher Control Plane Activation
pkg/eventservice/event_broker.go
Introduces activateDispatcherControlPlane helper method to handle ready/handshake progression independently of scan-task computation. Refactors ticker loop to call this helper before DDL fetching; modifies scanReady to advance control plane before scan-range eligibility checks; adds proactive activation in addDispatcher and resetDispatcher paths. Persists resolved-ts from schema-store into stat.
Lifecycle Event Tests
pkg/eventservice/event_broker_test.go
Adds collectWrapEventTypes helper for draining wrap-event channels. New test cases validate immediate ReadyEvent emission on dispatcher registration, immediate HandshakeEvent on dispatcher reset with epoch switch, and absence of immediate ReadyEvent for table-trigger dispatchers. Hardens inactive-dispatcher heartbeat test and strengthens schema-store failure test to verify dispatcher rollback and no queued events on registration failure.

Sequence Diagram(s)

sequenceDiagram
    participant Broker as Event Broker
    participant DispMgr as Dispatcher Mgr
    participant ControlPlane as Control Plane
    participant EventCollector as Event Collector

    Note over Broker,EventCollector: addDispatcher (New Flow)
    Broker->>DispMgr: addDispatcher(task)
    DispMgr->>ControlPlane: activateDispatcherControlPlane()
    ControlPlane->>EventCollector: emit ReadyEvent
    Note over DispMgr: Ready immediately

    Note over Broker,EventCollector: resetDispatcher (New Flow)
    Broker->>DispMgr: resetDispatcher(task)
    DispMgr->>ControlPlane: activateDispatcherControlPlane()
    ControlPlane->>EventCollector: emit HandshakeEvent
    Note over DispMgr: Handshake immediately

    Note over Broker,EventCollector: scanReady Path (Refactored)
    Broker->>DispMgr: scanReady(task)
    DispMgr->>ControlPlane: activateDispatcherControlPlane()
    ControlPlane->>EventCollector: emit ReadyEvent (if needed)
    DispMgr->>Broker: return eligibility after control-plane advance
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Suggested labels

lgtm, approved

Suggested reviewers

  • asddongmen
  • wk989898
  • 3AceShowHand

Poem

🐰 A dispatcher awaits no more,
Control plane hops right to the door,
Ready and Handshake, swift they fly,
No need to wait for time to sigh—
Activation brings them promptly near! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately describes the main change: decoupling dispatcher ready/handshake from scan triggering and activating the control plane immediately after register/reset, which aligns with the changeset's core objective.
Description check ✅ Passed The PR description follows the template with clear issue reference (close #4873), detailed explanation of problem and solution, test validation, and release notes addressing the fix.
Linked Issues check ✅ Passed The code changes fully address issue #4873 by extracting activateDispatcherControlPlane() helper and calling it proactively at registration and reset points, eliminating the coupling to resolved-ts notifications for ready/handshake emission.
Out of Scope Changes check ✅ Passed All changes are scoped to the dispatcher control-plane activation objective: the new activateDispatcherControlPlane() helper, its integration into broker paths, and supporting test updates for verifying the timing fix.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@ti-chi-bot ti-chi-bot Bot added contribution This PR is from a community contributor. first-time-contributor Indicates that the PR was contributed by an external member and is a first-time contributor. labels Apr 16, 2026
@ti-chi-bot
Copy link
Copy Markdown

ti-chi-bot Bot commented Apr 16, 2026

Hi @zier-one. Thanks for your PR.

I'm waiting for a pingcap member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@ti-chi-bot ti-chi-bot Bot added needs-ok-to-test Indicates a PR created by contributors and need ORG member send '/ok-to-test' to start testing. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Apr 16, 2026
Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refactors the event broker to decouple dispatcher control plane activation from scan task generation by introducing the activateDispatcherControlPlane method. This ensures that ready and handshake signals are sent immediately upon dispatcher registration or reset. A potential issue was identified in addDispatcher where a handshake event might be sent without necessary table metadata if the dispatcher is initialized with an epoch greater than zero, as the table information is not yet populated at that stage.

}
c.dispatchers.Store(id, dispatcherPtr)
c.metricsCollector.metricDispatcherCount.Inc()
c.activateDispatcherControlPlane(dispatcher)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Calling activateDispatcherControlPlane here for a dispatcher with epoch > 0 will trigger a HandshakeEvent. However, dispatcher was initialized with nil for startTableInfo at line 1349. This means the handshake event will be sent without table information, which might cause issues for the receiver if it expects metadata for a newly registered table. Consider if addDispatcher should fetch table info when epoch > 0, similar to how resetDispatcher does at line 1531.

@zier-one zier-one changed the title WIP eventservice: decouple dispatcher ready/handshake from scan triggering and activate the control plane immediately after register/reset Apr 23, 2026
@zier-one zier-one force-pushed the 20260417-speedup-2 branch from d4a649b to ed419ca Compare April 23, 2026 14:40
@zier-one zier-one marked this pull request as ready for review April 23, 2026 15:22
@ti-chi-bot ti-chi-bot Bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 23, 2026
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
pkg/eventservice/event_broker_test.go (1)

681-699: ⚠️ Potential issue | 🔴 Critical

Filter the new ReadyEvent before asserting the batch resolved event.

addDispatcher can now emit a ReadyEvent, so outputCh no longer guarantees the next message is TypeBatchResolvedTs; CI is already failing with actual type 3. Use the same filtering pattern as the heartbeat test. As per coding guidelines, **/*_test.go: favor deterministic tests and use testify/require.

Proposed fix
-	msg := <-outputCh
-	require.Equal(t, msg.Type, messaging.TypeBatchResolvedTs)
+	deadline, cancel := context.WithTimeout(context.Background(), 5*time.Second)
+	defer cancel()
+	for {
+		select {
+		case msg := <-outputCh:
+			if msg.Type != messaging.TypeBatchResolvedTs {
+				continue
+			}
+			return
+		case <-deadline.Done():
+			require.Fail(t, "expected batch resolved event")
+			return
+		}
+	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/eventservice/event_broker_test.go` around lines 681 - 699, The test
currently assumes the next message from outputCh is a BatchResolvedTs but
addDispatcher may emit a ReadyEvent first, so change the assertion to drain
messages from outputCh until you encounter messaging.TypeBatchResolvedTs: after
invoking broker.handleResolvedTs, loop receiving msg := <-outputCh and
ignore/discard messages of the ReadyEvent/type (check msg.Type or msg.GetType()
as in the heartbeat test) using require.NotNil, and only then
require.Equal(msg.Type, messaging.TypeBatchResolvedTs); keep references to
broker.addDispatcher, broker.getDispatcher, broker.handleResolvedTs, outputCh,
and messaging.TypeBatchResolvedTs to locate the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@pkg/eventservice/event_broker_test.go`:
- Around line 681-699: The test currently assumes the next message from outputCh
is a BatchResolvedTs but addDispatcher may emit a ReadyEvent first, so change
the assertion to drain messages from outputCh until you encounter
messaging.TypeBatchResolvedTs: after invoking broker.handleResolvedTs, loop
receiving msg := <-outputCh and ignore/discard messages of the ReadyEvent/type
(check msg.Type or msg.GetType() as in the heartbeat test) using require.NotNil,
and only then require.Equal(msg.Type, messaging.TypeBatchResolvedTs); keep
references to broker.addDispatcher, broker.getDispatcher,
broker.handleResolvedTs, outputCh, and messaging.TypeBatchResolvedTs to locate
the change.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 7ae110ec-5dd9-4ed2-ac2e-b68f14919a8b

📥 Commits

Reviewing files that changed from the base of the PR and between 94cd0a7 and ed419ca.

📒 Files selected for processing (2)
  • pkg/eventservice/event_broker.go
  • pkg/eventservice/event_broker_test.go

@asddongmen asddongmen self-assigned this Apr 24, 2026
@lidezhu
Copy link
Copy Markdown
Collaborator

lidezhu commented Apr 24, 2026

  1. The Ready event is used to indicate that the incremental scan has finished. We deliberately use it to avoid having too many incremental scan tasks running at the same time.
  2. It is OK to send the Handshake event immediately after a reset.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

contribution This PR is from a community contributor. do-not-merge/needs-triage-completed first-time-contributor Indicates that the PR was contributed by an external member and is a first-time contributor. needs-ok-to-test Indicates a PR created by contributors and need ORG member send '/ok-to-test' to start testing. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

Projects

None yet

3 participants