Skip to content

sink: optimize causality slot mapping#4903

Open
wk989898 wants to merge 1 commit intopingcap:masterfrom
wk989898:slot-conflict-detector
Open

sink: optimize causality slot mapping#4903
wk989898 wants to merge 1 commit intopingcap:masterfrom
wk989898:slot-conflict-detector

Conversation

@wk989898
Copy link
Copy Markdown
Collaborator

@wk989898 wk989898 commented Apr 24, 2026

What problem does this PR solve?

Issue Number: ref #4582

What is changed and how it works?

This pull request optimizes the causality slot mapping by introducing a specialized mapping function that uses bitmasking for power-of-two slot counts, improving performance on the hot path.

Check List

Tests

  • Unit test

  • Integration test

  • Manual test (add detailed scripts or steps below)
    The selected power-of-two slot mapper reduces slot index cost from 8.39 ns/op to 3.46 ns/op, about 2.4x faster than variable modulo. In sortHashes, keeping dedup while only switching the mapper improves the benchmark by about 1.4x to 1.8x depending on hash count.
    hashes_8 unique:
    old_modulo_dedup 545.240 ns/op
    selected_mapper_dedup 366.320 ns/op 1.49x faster
    selected_mapper_no_dedup 340.500 ns/op 1.60x faster

    hashes_8 with_duplicates:
    old_modulo_dedup 489.960 ns/op
    selected_mapper_dedup 345.460 ns/op 1.42x faster
    selected_mapper_no_dedup 336.380 ns/op 1.46x faster

    hashes_64 unique:
    old_modulo_dedup 8108.800 ns/op
    selected_mapper_dedup 4721.400 ns/op 1.72x faster
    selected_mapper_no_dedup 4533.600 ns/op 1.79x faster

    hashes_64 with_duplicates:
    old_modulo_dedup 7490.400 ns/op
    selected_mapper_dedup 4355.200 ns/op 1.72x faster
    selected_mapper_no_dedup 4202.000 ns/op 1.78x faster

    hashes_1024 unique:
    old_modulo_dedup 266482.000 ns/op
    selected_mapper_dedup 149083.600 ns/op 1.79x faster
    selected_mapper_no_dedup 144516.600 ns/op 1.84x faster

    hashes_1024 with_duplicates:
    old_modulo_dedup 250754.600 ns/op
    selected_mapper_dedup 138317.200 ns/op 1.81x faster
    selected_mapper_no_dedup 136423.000 ns/op 1.84x faster

Questions

Will it cause performance regression or break compatibility?
Do you need to update user documentation, design documentation or monitoring documentation?

Release note

Please refer to [Release Notes Language Style Guide](https://pingcap.github.io/tidb-dev-guide/contribute-to-tidb/release-notes-style-guide.html) to write a quality release note.

If you don't think this PR needs a release note then fill it with `None`.

Summary by CodeRabbit

  • Tests

    • Added comprehensive unit test suite for slot indexing across various configurations
    • Introduced performance benchmarks for slot selection and hash sorting operations
  • Refactor

    • Optimized internal slot selection mechanism for improved efficiency
    • Enhanced hash ordering for concurrent lock acquisition
  • Documentation

    • Updated documentation for slot configuration defaults

@ti-chi-bot ti-chi-bot Bot added the release-note Denotes a PR that will be considered when it comes time to generate release notes. label Apr 24, 2026
@ti-chi-bot
Copy link
Copy Markdown

ti-chi-bot Bot commented Apr 24, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign flowbehappy for approval. For more information see the Code Review Process.
Please ensure that each of them provides their approval before proceeding.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 24, 2026

📝 Walkthrough

Walkthrough

The slot-selection mechanism in the causality package is optimized by preselecting appropriate mapping functions at creation time: bitmask operations for power-of-two slot counts and modulo arithmetic otherwise. Hash ordering logic is refactored, with new test and benchmark suites added to validate behavior and measure performance.

Changes

Cohort / File(s) Summary
Core slot logic
downstreamadapter/sink/mysql/causality/slot.go
Preselects slot-mapping function (getSlot) during construction using bitmask for power-of-two counts or modulo otherwise. Refactors hash ordering from sortAndDedupHashes to sortHashes for lock acquisition, separating concerns of sorting by slot index and deduplication. Adds zero-value validation to NewSlots.
Testing infrastructure
downstreamadapter/sink/mysql/causality/slot_test.go, downstreamadapter/sink/mysql/causality/slot_benchmark_test.go
Introduces unit tests validating getSlotByMask, getSlotByModulo, and NewSlots behavior across power-of-two and non-power-of-two configurations. Adds benchmarks measuring slot index selection methods and hash sorting performance with varying hash counts and deduplication patterns.
Documentation
downstreamadapter/sink/mysql/sink.go
Updates inline comment for defaultConflictDetectorSlots to clarify that the power-of-two requirement is due to bitmask optimization in the causality slot mapper's hot path.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 A rabbit hops through slots so fine,
Bitmask magic, power-of-two design,
Hashes sorted swift and true,
No dedup delays to slow the view!
Benchmarks prove the gains we made,
In causality's grand arcade! 🚀

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 25.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Description check ⚠️ Warning The PR description includes an Issue Number reference but omits critical details: performance regression/compatibility impact is unanswered, and release notes section is incomplete. Complete the unanswered sections: explicitly state whether this causes performance regression or breaks compatibility, and provide a release note following the style guide or state 'None' if not applicable.
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The PR title 'sink: optimize causality slot mapping' accurately and specifically describes the main change—optimizing the causality slot mapper by preselecting the appropriate mapping function at creation time and improving performance.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@ti-chi-bot ti-chi-bot Bot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Apr 24, 2026
@ti-chi-bot
Copy link
Copy Markdown

ti-chi-bot Bot commented Apr 24, 2026

@wk989898: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pull-error-log-review d01dbaa link true /test pull-error-log-review

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request optimizes the causality slot mapping by introducing a specialized mapping function that uses bitmasking for power-of-two slot counts, improving performance on the hot path. It also includes comprehensive benchmarks and unit tests for the new mapping logic. However, a critical regression was found in the sortHashes function, which removed the deduplication logic previously present in sortAndDedupHashes. This omission could lead to self-dependency panics in the Node structure when duplicate hashes are processed.

Comment on lines 175 to 179
sort.Slice(hashes, func(i, j int) bool {
return hashes[i]%numSlots < hashes[j]%numSlots
return getSlot(hashes[i]) < getSlot(hashes[j])
})

// Dedup hashes
last := hashes[0]
j := 1
for i, hash := range hashes {
if i == 0 {
// skip first one, start checking duplication from 2nd one
continue
}
if hash == last {
continue
}
last = hash
hashes[j] = hash
j++
}
hashes = hashes[:j]

return hashes
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

The sortHashes function is missing the deduplication logic that was present in the original sortAndDedupHashes. If the input hashes contains duplicate values, the Add method will attempt to add the same node to the dependency graph multiple times, causing it to depend on itself and triggering a panic in Node.dependOn. Additionally, the sort should use the hash value as a secondary key to ensure identical hashes are adjacent for deduplication.

	sort.Slice(hashes, func(i, j int) bool {
		si, sj := getSlot(hashes[i]), getSlot(hashes[j])
		if si != sj {
			return si < sj
		}
		return hashes[i] < hashes[j]
	})

	if len(hashes) < 2 {
		return hashes
	}

	j := 0
	for i := 1; i < len(hashes); i++ {
		if hashes[i] != hashes[j] {
			j++
			hashes[j] = hashes[i]
		}
	}
	return hashes[:j+1]

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (3)
downstreamadapter/sink/mysql/causality/slot.go (1)

170-180: Optional: precompute slot indices to avoid O(N log N) closure calls in the comparator.

sort.Slice's comparator invokes getSlot twice per comparison. For the hot-path slice lengths seen in the benchmarks (up to ~1024), precomputing the slot index once per element and sorting indices (or pairs) typically gives a meaningful speedup and also removes the indirect call from the comparator. Not required — bench numbers in slot_benchmark_test.go should tell whether it's worth doing.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@downstreamadapter/sink/mysql/causality/slot.go` around lines 170 - 180, The
comparator for sort.Slice in sortHashes calls getSlot repeatedly; precompute
each element's slot once to avoid O(N log N) closure calls by creating a
temporary slice of structs or pairs (e.g., {hash uint64, slot int}) populated by
calling getSlot(hash) for every hash, sort that temporary slice by the
precomputed slot (using sort.Slice), then build and return a slice of hashes
from the sorted pairs; update function sortHashes and reference getSlot only
during the precompute step.
downstreamadapter/sink/mysql/causality/slot_test.go (1)

22-49: Consider also covering the zero‑slot panic and the non‑power‑of‑two mask path.

Expected values are correct, but the new log.Panic("causality slot count must be positive") branch in NewSlots and the isPowerOfTwo false path (e.g. NewSlots(7)) are not exercised here. A small require.Panics(...) test plus a non‑power‑of‑two getSlot case would lock in the selection logic going forward.

🧪 Proposed additional test
 func TestNewSlotIndexFunc(t *testing.T) {
 	t.Parallel()

 	powerOfTwoSlots := NewSlots(16)
 	require.Equal(t, uint64(3), powerOfTwoSlots.getSlot(19))
 	require.Equal(t, uint64(15), powerOfTwoSlots.getSlot(31))

 	nonPowerOfTwoSlots := NewSlots(6)
 	require.Equal(t, uint64(1), nonPowerOfTwoSlots.getSlot(19))
 	require.Equal(t, uint64(5), nonPowerOfTwoSlots.getSlot(11))
 }
+
+func TestNewSlotsZeroPanics(t *testing.T) {
+	t.Parallel()
+	require.Panics(t, func() { NewSlots(0) })
+}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@downstreamadapter/sink/mysql/causality/slot_test.go` around lines 22 - 49,
Add tests to cover the panic and non-power-of-two branching: add a
require.Panics(...) that calls NewSlots(0) to exercise the log.Panic("causality
slot count must be positive") branch, and add another case that constructs a
Slots with a non-power-of-two value (e.g. NewSlots(7)) and asserts the expected
getSlot result to exercise the isPowerOfTwo == false path (same target function:
NewSlots and method getSlot; you can reuse existing patterns from
TestNewSlotIndexFunc).
downstreamadapter/sink/mysql/causality/slot_benchmark_test.go (1)

28-74: Indexing assumes len(hashes) is a power of two.

hashes[i&(len(hashes)-1)] only wraps correctly because makeBenchmarkSlotHashes(4096) happens to return a power‑of‑two length. If the 4096 constant is ever tweaked (e.g. to 1000) the wrap math is silently wrong. Consider either asserting the length in the helper or using i%len(hashes) — branch-predicted modulo on a constant-per-iteration divisor is fine in benchmark harness code.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@downstreamadapter/sink/mysql/causality/slot_benchmark_test.go` around lines
28 - 74, The benchmark currently indexes the hashes slice using bitmask wrapping
(hashes[i&(len(hashes)-1)]) inside BenchmarkSlotIndex which only works when
makeBenchmarkSlotHashes(...) returns a power-of-two length; update the benchmark
to avoid the silent bug by either (A) adding an explicit runtime assertion in
makeBenchmarkSlotHashes that len(hashes) is a power of two (so callers like
BenchmarkSlotIndex get a clear failure), or (B) change all occurrences inside
BenchmarkSlotIndex to use safe wrapping via i%len(hashes) (e.g., replace
hashes[i&(len(hashes)-1)] with hashes[i%len(hashes)]); reference symbols:
makeBenchmarkSlotHashes, BenchmarkSlotIndex, and the inner loops that currently
use i&(len(hashes)-1).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@downstreamadapter/sink/mysql/causality/slot_benchmark_test.go`:
- Around line 176-191: dedupSortedHashes currently dereferences hashes[0]
without checking length and uses a range loop with an i==0 continue; fix by
returning early for len(hashes) <= 1 to avoid panic, then iterate from index 1
(e.g., for i := 1; i < len(hashes); i++) comparing hashes[i] to last, compacting
into hashes[j] as before; keep the function signature and return hashes[:j]
unchanged so callers of dedupSortedHashes and any future copies won't panic on
empty input.

---

Nitpick comments:
In `@downstreamadapter/sink/mysql/causality/slot_benchmark_test.go`:
- Around line 28-74: The benchmark currently indexes the hashes slice using
bitmask wrapping (hashes[i&(len(hashes)-1)]) inside BenchmarkSlotIndex which
only works when makeBenchmarkSlotHashes(...) returns a power-of-two length;
update the benchmark to avoid the silent bug by either (A) adding an explicit
runtime assertion in makeBenchmarkSlotHashes that len(hashes) is a power of two
(so callers like BenchmarkSlotIndex get a clear failure), or (B) change all
occurrences inside BenchmarkSlotIndex to use safe wrapping via i%len(hashes)
(e.g., replace hashes[i&(len(hashes)-1)] with hashes[i%len(hashes)]); reference
symbols: makeBenchmarkSlotHashes, BenchmarkSlotIndex, and the inner loops that
currently use i&(len(hashes)-1).

In `@downstreamadapter/sink/mysql/causality/slot_test.go`:
- Around line 22-49: Add tests to cover the panic and non-power-of-two
branching: add a require.Panics(...) that calls NewSlots(0) to exercise the
log.Panic("causality slot count must be positive") branch, and add another case
that constructs a Slots with a non-power-of-two value (e.g. NewSlots(7)) and
asserts the expected getSlot result to exercise the isPowerOfTwo == false path
(same target function: NewSlots and method getSlot; you can reuse existing
patterns from TestNewSlotIndexFunc).

In `@downstreamadapter/sink/mysql/causality/slot.go`:
- Around line 170-180: The comparator for sort.Slice in sortHashes calls getSlot
repeatedly; precompute each element's slot once to avoid O(N log N) closure
calls by creating a temporary slice of structs or pairs (e.g., {hash uint64,
slot int}) populated by calling getSlot(hash) for every hash, sort that
temporary slice by the precomputed slot (using sort.Slice), then build and
return a slice of hashes from the sorted pairs; update function sortHashes and
reference getSlot only during the precompute step.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: e5aead7b-ea05-4dcb-b179-ca6f670d99b5

📥 Commits

Reviewing files that changed from the base of the PR and between 94cd0a7 and d01dbaa.

📒 Files selected for processing (4)
  • downstreamadapter/sink/mysql/causality/slot.go
  • downstreamadapter/sink/mysql/causality/slot_benchmark_test.go
  • downstreamadapter/sink/mysql/causality/slot_test.go
  • downstreamadapter/sink/mysql/sink.go

Comment on lines +176 to +191
func dedupSortedHashes(hashes []uint64) []uint64 {
last := hashes[0]
j := 1
for i, hash := range hashes {
if i == 0 {
continue
}
if hash == last {
continue
}
last = hash
hashes[j] = hash
j++
}
return hashes[:j]
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

dedupSortedHashes has a latent panic on empty input and an awkward loop.

hashes[0] is accessed before any length check; currently safe because both callers guard len == 0, but a copy of this helper elsewhere would panic. The for i, hash := range hashes { if i == 0 { continue } ... } shape is also harder to read than a plain index loop.

♻️ Proposed cleanup
 func dedupSortedHashes(hashes []uint64) []uint64 {
-	last := hashes[0]
-	j := 1
-	for i, hash := range hashes {
-		if i == 0 {
-			continue
-		}
-		if hash == last {
-			continue
-		}
-		last = hash
-		hashes[j] = hash
-		j++
-	}
-	return hashes[:j]
+	if len(hashes) == 0 {
+		return hashes
+	}
+	j := 1
+	for i := 1; i < len(hashes); i++ {
+		if hashes[i] != hashes[j-1] {
+			hashes[j] = hashes[i]
+			j++
+		}
+	}
+	return hashes[:j]
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
func dedupSortedHashes(hashes []uint64) []uint64 {
last := hashes[0]
j := 1
for i, hash := range hashes {
if i == 0 {
continue
}
if hash == last {
continue
}
last = hash
hashes[j] = hash
j++
}
return hashes[:j]
}
func dedupSortedHashes(hashes []uint64) []uint64 {
if len(hashes) == 0 {
return hashes
}
j := 1
for i := 1; i < len(hashes); i++ {
if hashes[i] != hashes[j-1] {
hashes[j] = hashes[i]
j++
}
}
return hashes[:j]
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@downstreamadapter/sink/mysql/causality/slot_benchmark_test.go` around lines
176 - 191, dedupSortedHashes currently dereferences hashes[0] without checking
length and uses a range loop with an i==0 continue; fix by returning early for
len(hashes) <= 1 to avoid panic, then iterate from index 1 (e.g., for i := 1; i
< len(hashes); i++) comparing hashes[i] to last, compacting into hashes[j] as
before; keep the function signature and return hashes[:j] unchanged so callers
of dedupSortedHashes and any future copies won't panic on empty input.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant