
perf(table): word-wise LCP in Builder.keyDiff #2282

Open
shaunpatterson wants to merge 1 commit into dgraph-io:main from shaunpatterson:perf/keydiff-word-wise-lcp

@shaunpatterson

Summary

Replaces the byte-by-byte longest-common-prefix loop in Builder.keyDiff with an 8-byte word-wise loop using binary.LittleEndian.Uint64 + bits.TrailingZeros64. The Go compiler lowers Uint64 to a single unaligned 64-bit load on amd64/arm64, and TrailingZeros64 is a compiler intrinsic that lowers to a short branch-free sequence (TZCNT or BSF on amd64, RBIT + CLZ on arm64), so each iteration covers 8 bytes for the cost of one load per key + xor + tzcnt.

A byte-by-byte tail loop handles the remaining < 8 bytes when key lengths aren't multiples of 8.

The bit math: because the load is little-endian, the lowest-position byte in memory is the lowest-order byte of the loaded uint64. When the two words differ (a^c != 0), TrailingZeros64(a^c) >> 3 therefore gives the index of the first differing byte within the word directly, with no second-pass byte scan needed.
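The loop described above can be sketched roughly as follows. This is a minimal illustration, not the code from the diff: the function name lcpWordwise and its signature are hypothetical, and it returns only the common-prefix length rather than whatever Builder.keyDiff returns.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math/bits"
)

// lcpWordwise is a hypothetical stand-in for the loop inside Builder.keyDiff.
// It returns the length of the longest common prefix of a and b.
func lcpWordwise(a, b []byte) int {
	n := len(a)
	if len(b) < n {
		n = len(b)
	}
	i := 0
	// Word loop: compare 8 bytes per iteration while a full word remains.
	for ; i+8 <= n; i += 8 {
		x := binary.LittleEndian.Uint64(a[i:]) ^ binary.LittleEndian.Uint64(b[i:])
		if x != 0 {
			// The lowest set bit of x lies in the first differing byte;
			// shifting right by 3 converts the bit index to a byte index.
			return i + (bits.TrailingZeros64(x) >> 3)
		}
	}
	// Tail loop: fewer than 8 comparable bytes remain.
	for ; i < n && a[i] == b[i]; i++ {
	}
	return i
}

func main() {
	// The keys share "abcdefgh1234" and first differ at index 12.
	fmt.Println(lcpWordwise([]byte("abcdefgh12345678"), []byte("abcdefgh1234X678"))) // prints 12
}
```

In the example, the first word matches, and in the second word the mismatch sits in byte 4, so the XOR has between 32 and 39 trailing zero bits and the shift yields 4, giving 8 + 4 = 12.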

Measurement

  • BenchmarkBuilder/no_compression: -12.5%
  • Composite (74-benchmark stable subset): -1.02% ns/op, median of 3 runs

Test plan

  • go test -short -race ./table/ — all existing builder/table tests pass
  • go vet ./...
  • CI

New tests in table/builder_test.go

  • TestKeyDiff — 19 hand-picked cases covering:
    • empty inputs (empty base, empty new)
    • identical keys at various lengths (3, 8, 16 bytes)
    • difference at several byte positions inside the first word (bytes 0, 3, 7)
    • difference crossing the word boundary (byte 8 of a 16-byte key)
    • tail-only paths (keys with length < 8, or remainder < 8 after the word loop)
    • asymmetric base vs new lengths (new longer, new shorter, full-prefix match)
    • binary keys containing zero bytes
  • TestKeyDiffMatchesNaive — 2000 randomized inputs cross-checked against a byte-wise reference, with ~50% of cases forced to share a random prefix to stress the LCP boundary.
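The randomized cross-check can be pictured along these lines. This is a sketch under stated assumptions, not the test from table/builder_test.go: the names lcpNaive, lcpWordwise, randomKey, and crossCheck are hypothetical, and the seed and the 40-byte key length cap are arbitrary choices made for illustration.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math/bits"
	"math/rand"
)

// lcpNaive is the byte-wise reference implementation.
func lcpNaive(a, b []byte) int {
	i := 0
	for i < len(a) && i < len(b) && a[i] == b[i] {
		i++
	}
	return i
}

// lcpWordwise is a hypothetical word-wise version of the same computation.
func lcpWordwise(a, b []byte) int {
	n := len(a)
	if len(b) < n {
		n = len(b)
	}
	i := 0
	for ; i+8 <= n; i += 8 {
		x := binary.LittleEndian.Uint64(a[i:]) ^ binary.LittleEndian.Uint64(b[i:])
		if x != 0 {
			return i + (bits.TrailingZeros64(x) >> 3)
		}
	}
	for ; i < n && a[i] == b[i]; i++ {
	}
	return i
}

// randomKey returns a key of random length in [0, maxLen] with random bytes.
func randomKey(r *rand.Rand, maxLen int) []byte {
	k := make([]byte, r.Intn(maxLen+1))
	for i := range k {
		k[i] = byte(r.Intn(256))
	}
	return k
}

// crossCheck compares both implementations on n random key pairs and
// returns how many pairs disagree.
func crossCheck(seed int64, n int) int {
	r := rand.New(rand.NewSource(seed))
	mismatches := 0
	for c := 0; c < n; c++ {
		a, b := randomKey(r, 40), randomKey(r, 40)
		if r.Intn(2) == 0 {
			// Roughly half the cases: force a shared prefix of random length.
			limit := len(a)
			if len(b) < limit {
				limit = len(b)
			}
			p := r.Intn(limit + 1)
			copy(b[:p], a[:p])
		}
		if lcpNaive(a, b) != lcpWordwise(a, b) {
			mismatches++
		}
	}
	return mismatches
}

func main() {
	fmt.Println("mismatches:", crossCheck(1, 2000)) // prints "mismatches: 0"
}
```

Forcing a shared prefix matters because two independently random keys almost always differ in the first byte, which would leave the word loop and the word-to-tail handoff largely unexercised.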

🤖 Generated with Claude Code

Replace the byte-by-byte longest common prefix loop in Builder.keyDiff
with an 8-byte word-wise loop using binary.LittleEndian.Uint64 +
bits.TrailingZeros64. The Go compiler lowers Uint64 to a single
unaligned 64-bit load on amd64/arm64, and TrailingZeros64 maps to a
single instruction (TZCNT/CLZ-based), so each iteration covers 8
bytes for the cost of one load + xor + trailing-zero-count op.

A byte-by-byte tail loop handles keys whose remaining length is < 8.

Adds TestKeyDiff covering: empty inputs, equal keys, diffs at byte
positions within and across word boundaries, length boundaries
(7/8/9/16+), and asymmetric base vs new lengths. Adds
TestKeyDiffMatchesNaive cross-checking against a byte-wise reference
over 2000 randomized cases.

Measured -1.0% composite ns/op across the 74-benchmark stable subset
(builder workloads see the largest single-benchmark improvement, e.g.
BenchmarkBuilder/no_compression -12.5%).
@shaunpatterson requested a review from a team as a code owner May 23, 2026 18:09