perf(y): same-length fast path in CompareKeys by shaunpatterson · Pull Request #2283 · dgraph-io/badger

shaunpatterson · 2026-05-23T18:09:30Z

Summary

y.CompareKeys is the dominant comparator for the LSM merge path and is called on every step of every forward/reverse scan that spans multiple iterators.
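To show why the comparator sits on the hot path, here is a generic heap-based k-way merge over sorted runs. Each run stands in for one child iterator. This is only a sketch and not badger's merge iterator. `keyWithTs`, `describe`, `merge`, and this `compareKeys` body (the split path, i.e. the behaviour before this PR) are hypothetical names chosen for illustration. The key layout (user-key followed by a big-endian `math.MaxUint64 - ts`, so newer versions sort first) is an assumption about badger's encoding.

```go
package main

import (
	"bytes"
	"container/heap"
	"encoding/binary"
	"fmt"
	"math"
	"strings"
)

// compareCalls counts comparator invocations to show how often a merge calls it.
var compareCalls int

// keyWithTs is a hypothetical helper. It assumes badger's layout: user-key
// followed by big-endian (math.MaxUint64 - ts), so newer versions sort first.
func keyWithTs(key []byte, ts uint64) []byte {
	out := make([]byte, len(key)+8)
	copy(out, key)
	binary.BigEndian.PutUint64(out[len(key):], math.MaxUint64-ts)
	return out
}

// describe renders a key as "user@ts" by undoing the assumed encoding.
func describe(k []byte) string {
	ts := math.MaxUint64 - binary.BigEndian.Uint64(k[len(k)-8:])
	return fmt.Sprintf("%s@%d", k[:len(k)-8], ts)
}

// compareKeys is the split comparator: user-keys first, then the ts suffix.
func compareKeys(k1, k2 []byte) int {
	compareCalls++
	if c := bytes.Compare(k1[:len(k1)-8], k2[:len(k2)-8]); c != 0 {
		return c
	}
	return bytes.Compare(k1[len(k1)-8:], k2[len(k2)-8:])
}

// cursor walks one sorted run; it stands in for one child iterator.
type cursor struct {
	keys [][]byte
	pos  int
}

// mergeHeap orders cursors by their current key, smallest first.
type mergeHeap []*cursor

func (h mergeHeap) Len() int      { return len(h) }
func (h mergeHeap) Swap(i, j int) { h[i], h[j] = h[j], h[i] }
func (h mergeHeap) Less(i, j int) bool {
	return compareKeys(h[i].keys[h[i].pos], h[j].keys[h[j].pos]) < 0
}
func (h *mergeHeap) Push(x any) { *h = append(*h, x.(*cursor)) }
func (h *mergeHeap) Pop() any {
	old := *h
	c := old[len(old)-1]
	*h = old[:len(old)-1]
	return c
}

// merge emits the keys of all runs in comparator order.
func merge(runs ...[][]byte) [][]byte {
	h := &mergeHeap{}
	for _, r := range runs {
		if len(r) > 0 {
			*h = append(*h, &cursor{keys: r})
		}
	}
	heap.Init(h)
	var out [][]byte
	for h.Len() > 0 {
		c := (*h)[0] // smallest current key across all runs
		out = append(out, c.keys[c.pos])
		c.pos++
		if c.pos == len(c.keys) {
			heap.Pop(h) // this run is exhausted
		} else {
			heap.Fix(h, 0) // re-sift: more comparator calls on every step
		}
	}
	return out
}

func main() {
	merged := merge(
		[][]byte{keyWithTs([]byte("a"), 3), keyWithTs([]byte("c"), 1)},
		[][]byte{keyWithTs([]byte("a"), 5), keyWithTs([]byte("b"), 2)},
	)
	names := make([]string, len(merged))
	for i, k := range merged {
		names[i] = describe(k)
	}
	fmt.Println(strings.Join(names, " "))
	fmt.Println("comparator calls:", compareCalls)
}
```

The first line printed is `a@5 a@3 b@2 c@1`. Every emitted key triggers a re-sift, so the comparator runs at least once per step, which is why shaving work off it shows up in merge-heavy benchmarks.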

When both keys have identical total length, their user-key portions have the same length too (every key carries an 8-byte timestamp suffix), so a single bytes.Compare over the full key buffers is equivalent to comparing user-keys and then timestamps. The first differing byte either falls inside the user-key, giving the same verdict as the user-key compare, or falls inside the timestamp, which can only happen when the user-keys are equal. This short-circuits the common case (equal-length user-keys) into one SIMD-vectorized library call instead of two sub-slice operations and two calls.

When the lengths differ, CompareKeys falls back to the previous split-compare path.
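The fast path and fallback described above can be sketched as follows. This is a minimal sketch, not the PR's diff. `compareKeys`, `splitCompare`, and `keyWithTs` are hypothetical names, and the key layout (user-key, then an 8-byte big-endian `math.MaxUint64 - ts`) is assumed rather than taken from the PR. The fallback here is the full split path as the summary describes it; the follow-up commit trims it further.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"math"
)

// keyWithTs is a hypothetical helper assuming badger's key layout:
// user-key || big-endian(math.MaxUint64 - ts), so newer versions sort first.
func keyWithTs(key []byte, ts uint64) []byte {
	out := make([]byte, len(key)+8)
	copy(out, key)
	binary.BigEndian.PutUint64(out[len(key):], math.MaxUint64-ts)
	return out
}

// splitCompare is the pre-existing path: user-keys first, then the ts suffix.
func splitCompare(k1, k2 []byte) int {
	if c := bytes.Compare(k1[:len(k1)-8], k2[:len(k2)-8]); c != 0 {
		return c
	}
	return bytes.Compare(k1[len(k1)-8:], k2[len(k2)-8:])
}

// compareKeys puts the same-length fast path in front of splitCompare.
func compareKeys(k1, k2 []byte) int {
	if len(k1) == len(k2) {
		// Equal total length means equal user-key length, so the first
		// differing byte is either in the user-key (same verdict as the
		// user-key compare) or, only when the user-keys match, in the ts.
		return bytes.Compare(k1, k2)
	}
	return splitCompare(k1, k2)
}

func main() {
	pairs := [][2][]byte{
		{keyWithTs([]byte("abc"), 5), keyWithTs([]byte("abd"), 1)}, // user-keys differ
		{keyWithTs([]byte("abc"), 5), keyWithTs([]byte("abc"), 9)}, // ts decides, newer first
		{keyWithTs([]byte("a"), 1), keyWithTs([]byte("aa"), 1)},    // lengths differ: fallback
	}
	for _, p := range pairs {
		fmt.Println(compareKeys(p[0], p[1]), splitCompare(p[0], p[1]))
	}
}
```

Running it prints `-1 -1`, `1 1`, and `-1 -1`: the fast path agrees with the split path in every case, and in the second pair the newer version (ts 9) sorts first.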

Measurement

Composite (74-benchmark stable subset): -1.88% ns/op, median of 3 runs. Improvements concentrate in iterator/merge-heavy benchmarks (BenchmarkReadMerged, BenchmarkRead, BenchmarkReadAndBuild).

Test plan

- `go test -short -race ./y/ ./table/`: all existing tests pass
- `go vet ./...`
- CI

New tests in `y/y_test.go`

- TestCompareKeys: 7 hand-picked cases covering:
  - identical keys (same length, same ts)
  - same total length, different user-keys
  - same total length and same user-key, different timestamps (newer-first ordering)
  - different user-key lengths (e.g. `a<ts>` vs `aa<ts>`): the user-key alone decides, so `a` sorts before `aa` whatever the timestamps
  - the timestamp tie-break triggers only when the user-keys match
  - antisymmetry: `CompareKeys(b, a) == -CompareKeys(a, b)`
- TestCompareKeysFuzz: 5000 randomized pairs cross-checked against a `referenceCompareKeys` helper that always splits the user-key from the ts. About 30% of cases force equal user-key lengths to exercise the fast path; about 20% force matching timestamps to exercise the tie-break.

🤖 Generated with Claude Code


Follow-up commit: when total lengths differ, the user-key lengths also differ, so the user-key compare can never return 0. The trailing ts tie-break was therefore dead code after the same-length fast path, and it is dropped for clarity.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
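The dead-code argument can be checked exhaustively on a small key space. In this sketch, `withTieBreak` and `userKeyOnly` are hypothetical names for the fallback before and after the follow-up commit, and the two 8-byte suffixes are arbitrary stand-ins for timestamps rather than real encoded values.

```go
package main

import (
	"bytes"
	"fmt"
)

// withTieBreak is the fallback as first written: user-keys, then ts suffix.
func withTieBreak(k1, k2 []byte) int {
	if c := bytes.Compare(k1[:len(k1)-8], k2[:len(k2)-8]); c != 0 {
		return c
	}
	return bytes.Compare(k1[len(k1)-8:], k2[len(k2)-8:])
}

// userKeyOnly is the trimmed fallback: the user-key compare alone.
func userKeyOnly(k1, k2 []byte) int {
	return bytes.Compare(k1[:len(k1)-8], k2[:len(k2)-8])
}

// allKeys enumerates every user-key over {a, b} of length 0..maxLen, each
// followed by one of two arbitrary 8-byte suffixes standing in for timestamps.
func allKeys(maxLen int) [][]byte {
	users := [][]byte{{}}
	frontier := [][]byte{{}}
	for l := 1; l <= maxLen; l++ {
		var next [][]byte
		for _, u := range frontier {
			for _, c := range []byte("ab") {
				next = append(next, append(append([]byte{}, u...), c))
			}
		}
		users = append(users, next...)
		frontier = next
	}
	suffixes := [][]byte{make([]byte, 8), bytes.Repeat([]byte{0xff}, 8)}
	var keys [][]byte
	for _, u := range users {
		for _, s := range suffixes {
			keys = append(keys, append(append([]byte{}, u...), s...))
		}
	}
	return keys
}

// crossCheck visits every pair whose total lengths differ (the pairs the
// fallback handles) and counts user-key ties or disagreements.
func crossCheck(keys [][]byte) (checked, mismatches int) {
	for _, a := range keys {
		for _, b := range keys {
			if len(a) == len(b) {
				continue // same-length pairs take the fast path instead
			}
			checked++
			if userKeyOnly(a, b) == 0 || userKeyOnly(a, b) != withTieBreak(a, b) {
				mismatches++
			}
		}
	}
	return checked, mismatches
}

func main() {
	fmt.Println(crossCheck(allKeys(3)))
}
```

It prints `560 0`: across the 560 pairs whose lengths differ, the user-key compare never ties, so the tie-break never changes the result.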

shaunpatterson requested a review from a team as a code owner May 23, 2026 18:09

shaunpatterson wants to merge 2 commits into dgraph-io:main from shaunpatterson:perf/comparekeys-same-length
