eventseivice: improve scan window by asddongmen · Pull Request #4901 · pingcap/ticdc

asddongmen · 2026-04-24T05:45:29Z

What problem does this PR solve?

Issue Number: close #xxx

What is changed and how it works?

Before

After

Check List

Tests

Unit test
Integration test
Manual test (add detailed scripts or steps below)
No code

Questions

Will it cause performance regression or break compatibility?

Do you need to update user documentation, design documentation or monitoring documentation?

Release note

Please refer to [Release Notes Language Style Guide](https://pingcap.github.io/tidb-dev-guide/contribute-to-tidb/release-notes-style-guide.html) to write a quality release note.

If you don't think this PR needs a release note then fill it with `None`.

Signed-off-by: dongmen <414110582@qq.com>

ti-chi-bot · 2026-04-24T05:45:32Z

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

ti-chi-bot · 2026-04-24T05:45:35Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign sdojjy for approval. For more information see the Code Review Process.
Please ensure that each of them provides their approval before proceeding.

The full list of commands accepted by this bot can be found here.

Details

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

coderabbitai · 2026-04-24T05:45:37Z

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: ddf074be-721a-4a64-b57d-796c97ecb2a5

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Comment @coderabbitai help to get the list of available commands and usage tips.

gemini-code-assist

Code Review

This pull request replaces the legacy scan interval adjustment logic with a new adaptiveScanWindowController that utilizes dual Exponential Moving Averages (EMA) and a pressure score mechanism for more robust congestion control. The update includes a new simulation test suite and trace-based validation. Feedback highlights a potential race condition during interval updates and suggests replacing hardcoded scaling factors with named constants. Furthermore, it is recommended to transition the EMA and pressure score decay to time-weighted calculations to maintain consistency across varying reporting frequencies.

gemini-code-assist · 2026-04-24T05:47:10Z

 	current := time.Duration(c.scanInterval.Load())
-	if current != defaultScanInterval {
-		c.scanInterval.Store(int64(defaultScanInterval))
-		metrics.EventServiceScanWindowIntervalGaugeVec.WithLabelValues(c.changefeedID.String()).Set(defaultScanInterval.Seconds())
-
-		log.Info("scan interval reset to default",
-			zap.Stringer("changefeedID", c.changefeedID),
-			zap.Duration("oldInterval", current),
-			zap.Duration("newInterval", defaultScanInterval))
+	decision := c.scanWindowController.OnCongestionReport(now, current, c.maxScanInterval(), scanWindowReport{
+		usageRatio:         normalizeUsageRatio(usageRatio),
+		memoryReleaseCount: memoryReleaseCount,
+	})
+	if decision.newInterval == current {
+		return
 	}

-	c.lastAdjustTime.Store(now)
-	c.lastTrendAdjustTime.Store(now)
+	c.scanInterval.Store(int64(decision.newInterval))


There is a potential race condition when updating scanInterval. The current interval is loaded at line 257 and the new decision is stored at line 266. Since these operations are not atomic with respect to each other, concurrent calls to updateMemoryUsage (which can happen if multiple collectors report congestion for the same changefeed) might result in a decision based on a stale current value, leading to inconsistent interval adjustments. Consider synchronizing the load, calculation, and store operations, for example by passing the atomic variable into the controller or wrapping the call in a mutex.

gemini-code-assist · 2026-04-24T05:47:10Z

+		newInterval := max(scaleDuration(current, 3, 4), defaultScanInterval)
+		c.noteAdjustmentLocked(now, true)
+		return scanWindowDecision{
+			newInterval:   newInterval,
+			maxInterval:   maxInterval,
+			reason:        scanWindowDecisionHighPressure,
+			usage:         usage,
+			fastUsageEMA:  c.fastUsageEMA,
+			slowUsageEMA:  c.slowUsageEMA,
+			pressureScore: c.pressureScore,
+		}
+	}
+
+	if c.shouldReduceForSustainedPressureLocked(now, usage) {
+		newInterval := max(scaleDuration(current, 9, 10), defaultScanInterval)
+		c.noteAdjustmentLocked(now, true)
+		return scanWindowDecision{
+			newInterval:   newInterval,
+			maxInterval:   maxInterval,
+			reason:        scanWindowDecisionSustainedPressure,
+			usage:         usage,
+			fastUsageEMA:  c.fastUsageEMA,
+			slowUsageEMA:  c.slowUsageEMA,
+			pressureScore: c.pressureScore,
+		}
+	}
+
+	if !c.allowedToIncreaseLocked(now, usage) {
+		return scanWindowDecision{
+			newInterval:   current,
+			maxInterval:   maxInterval,
+			reason:        scanWindowDecisionNone,
+			usage:         usage,
+			fastUsageEMA:  c.fastUsageEMA,
+			slowUsageEMA:  c.slowUsageEMA,
+			pressureScore: c.pressureScore,
+		}
+	}
+
+	if c.isVeryLowPressureLocked(usage) {
+		effectiveMaxInterval := maxScanInterval
+		newInterval := min(scaleDuration(current, 3, 2), effectiveMaxInterval)
+		if newInterval > current {
+			c.noteAdjustmentLocked(now, false)
+			return scanWindowDecision{
+				newInterval:   newInterval,
+				maxInterval:   effectiveMaxInterval,
+				reason:        scanWindowDecisionVeryLowRecovery,
+				usage:         usage,
+				fastUsageEMA:  c.fastUsageEMA,
+				slowUsageEMA:  c.slowUsageEMA,
+				pressureScore: c.pressureScore,
+			}
+		}
+	}
+
+	if current < maxInterval && c.isLowPressureLocked(usage) {
+		newInterval := min(scaleDuration(current, 5, 4), maxInterval)
+		if newInterval > current {
+			c.noteAdjustmentLocked(now, false)
+			return scanWindowDecision{
+				newInterval:   newInterval,
+				maxInterval:   maxInterval,
+				reason:        scanWindowDecisionLowRecovery,
+				usage:         usage,
+				fastUsageEMA:  c.fastUsageEMA,
+				slowUsageEMA:  c.slowUsageEMA,
+				pressureScore: c.pressureScore,
+			}
+		}
+	}


The scaling factors used for interval adjustments (e.g., 3/4, 9/10, 3/2, 5/4) are hardcoded magic numbers. These should be defined as named constants (e.g., scanWindowHighPressureScaleNumerator, etc.) to improve readability and facilitate future tuning of the adaptive algorithm.

gemini-code-assist · 2026-04-24T05:47:10Z

+func (c *adaptiveScanWindowController) updateUsageEMALocked(value float64) {
+	if !c.emaInitialized {
+		c.fastUsageEMA = value
+		c.slowUsageEMA = value
+		c.emaInitialized = true
+		return
+	}
+	c.fastUsageEMA = ema(c.fastUsageEMA, value, scanWindowFastUsageAlpha)
+	c.slowUsageEMA = ema(c.slowUsageEMA, value, scanWindowSlowUsageAlpha)
+}


The EMA calculation previous + alpha*(value-previous) assumes a constant sampling interval. If congestion reports arrive at irregular frequencies, the EMA will be biased. For more robust behavior in a distributed environment, consider a time-weighted EMA where the smoothing factor accounts for the time elapsed since the last sample.

gemini-code-assist · 2026-04-24T05:47:10Z

+func (c *adaptiveScanWindowController) updatePressureScoreLocked(usage memoryUsageStats) {
+	switch {
+	case c.fastUsageEMA >= scanWindowHighPressureThreshold ||
+		c.slowUsageEMA >= scanWindowHighPressureThreshold ||
+		usage.max >= memoryUsageHighThreshold:
+		c.pressureScore = min(c.pressureScore+2, scanWindowPressureScoreCeiling)
+	case c.fastUsageEMA >= scanWindowModeratePressureThreshold ||
+		c.slowUsageEMA >= scanWindowModeratePressureThreshold ||
+		usage.avg >= scanWindowModeratePressureThreshold:
+		c.pressureScore = min(c.pressureScore+1, scanWindowPressureScoreCeiling)
+	case c.fastUsageEMA < 0.30 && c.slowUsageEMA < 0.25 && usage.last < 0.30:
+		c.pressureScore = maxFloat64(0, c.pressureScore-1.5)
+	default:
+		c.pressureScore = maxFloat64(0, c.pressureScore-0.5)
+	}
+}


The pressure score logic uses several magic numbers for increments and decrements. Furthermore, the constant decay in the default case (-0.5) makes the pressure score's recovery rate dependent on the reporting frequency rather than actual time. This could lead to premature recovery if reports are frequent, or sluggish recovery if they are sparse. Consider making the decay time-proportional.

Signed-off-by: dongmen <414110582@qq.com>

ti-chi-bot · 2026-04-24T10:58:52Z

[FORMAT CHECKER NOTIFICATION]

Notice: To remove the do-not-merge/needs-linked-issue label, please provide the linked issue number on one line in the PR body, for example: Issue Number: close #123 or Issue Number: ref #456.

📖 For more info, you can check the "Contribute Code" section in the development guide.

eventseivice: improve scan window

23b17ae

Signed-off-by: dongmen <414110582@qq.com>

ti-chi-bot Bot added the do-not-merge/needs-linked-issue, release-note (denotes a PR that will be considered when it comes time to generate release notes), and do-not-merge/work-in-progress (indicates that a PR should not merge because it is a work in progress) labels Apr 24, 2026

ti-chi-bot Bot added the size/XXL label (denotes a PR that changes 1000+ lines, ignoring generated files) Apr 24, 2026

gemini-code-assist Bot reviewed Apr 24, 2026

asddongmen added 3 commits April 24, 2026 14:12

eventseivice: add more metrics

14ab445

Signed-off-by: dongmen <414110582@qq.com>

eventseivice: refine scan window 2

0f5d87e

Signed-off-by: dongmen <414110582@qq.com>

eventseivice: add trace test

91be4f7

Signed-off-by: dongmen <414110582@qq.com>

asddongmen wants to merge 4 commits into pingcap:master from asddongmen:0424-scanwindow
