perf(table): word-wise LCP in Builder.keyDiff #2282
Open
shaunpatterson wants to merge 1 commit into
Replace the byte-by-byte longest-common-prefix (LCP) loop in Builder.keyDiff with an 8-byte word-wise loop using binary.LittleEndian.Uint64 + bits.TrailingZeros64. The Go compiler lowers Uint64 to a single unaligned 64-bit load on amd64/arm64, and TrailingZeros64 compiles to a hardware bit-scan (TZCNT or BSF on amd64, RBIT+CLZ on arm64). Each iteration therefore covers 8 bytes for the cost of one load from each key plus an XOR, and the trailing-zero count is needed only once a word differs. A byte-by-byte tail loop handles the final bytes when fewer than 8 remain in the shorter key.

Adds TestKeyDiff covering: empty inputs, equal keys, diffs at every byte position within and across word boundaries, length boundaries (7/8/9/16+), and asymmetric base vs new lengths. Adds TestKeyDiffMatchesNaive, which cross-checks against a byte-wise reference over 2000 randomized cases.

Measured -1.0% composite ns/op across the 74-benchmark stable subset. Builder workloads see the largest single-benchmark improvement (e.g. BenchmarkBuilder/no_compression: -12.5% ns/op).
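A minimal, standalone sketch of the word-wise loop described above, written as free functions so it runs on its own. `commonPrefixLen` and `keyDiff(baseKey, newKey)` are hypothetical names, not the PR's code; the sketch assumes keyDiff returns the part of the new key after the shared prefix, which this page does not show.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math/bits"
)

// commonPrefixLen returns the length of the longest common prefix of a and b.
// Hypothetical helper; it illustrates the word-wise technique, not the PR's code.
func commonPrefixLen(a, b []byte) int {
	n := len(a)
	if len(b) < n {
		n = len(b)
	}
	i := 0
	// Word phase: compare 8 bytes per iteration while 8 remain in the shorter key.
	for ; i+8 <= n; i += 8 {
		x := binary.LittleEndian.Uint64(a[i:]) ^ binary.LittleEndian.Uint64(b[i:])
		if x != 0 {
			// The lowest set bit of x falls in the first differing byte.
			return i + bits.TrailingZeros64(x)>>3
		}
	}
	// Tail phase: fewer than 8 bytes remain; compare them one at a time.
	for i < n && a[i] == b[i] {
		i++
	}
	return i
}

// keyDiff returns the part of newKey that follows its common prefix with baseKey.
// Assumption: this mirrors what Builder.keyDiff returns; the real method lives on
// Builder and presumably takes its base key from builder state.
func keyDiff(baseKey, newKey []byte) []byte {
	return newKey[commonPrefixLen(baseKey, newKey):]
}

func main() {
	base := []byte("user:0000001234:profile")
	next := []byte("user:0000001299:settings")
	fmt.Println(commonPrefixLen(base, next)) // 13
	fmt.Printf("%q\n", keyDiff(base, next))  // "99:settings"
}
```

In the example, the first word (`user:000`) matches; the second word differs at byte 13, which the trailing-zero count finds without entering the tail loop.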
Summary
Replaces the byte-by-byte longest-common-prefix loop in `Builder.keyDiff` with an 8-byte word-wise loop using `binary.LittleEndian.Uint64` + `bits.TrailingZeros64`. The Go compiler lowers `Uint64` to a single unaligned 64-bit load on amd64/arm64, and `TrailingZeros64` compiles to a hardware bit-scan (TZCNT-class on amd64), so each iteration covers 8 bytes for the cost of one load per key plus an XOR, with a trailing-zero count only on the word where the keys first differ.

A byte-by-byte tail loop handles the remaining < 8 bytes when the shorter key's length isn't a multiple of 8.
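Why a single trailing-zero count on the XOR locates the first mismatching byte can be checked on two 8-byte words directly. `firstDiffByte` and `firstDiffByteBE` are hypothetical helpers for illustration, not part of the PR; the big-endian variant is included only for contrast, since with that byte order the first memory byte lands in the high end of the word and a leading-zero count is needed instead.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math/bits"
)

// firstDiffByte returns the index (0..7) of the first differing byte of two
// 8-byte slices, or 8 if they are equal. Hypothetical helper for illustration.
func firstDiffByte(a, b []byte) int {
	// LittleEndian.Uint64 puts a[0] in bits 0..7, a[1] in bits 8..15, and so on,
	// so the lowest set bit of the XOR lies in the first differing byte.
	x := binary.LittleEndian.Uint64(a) ^ binary.LittleEndian.Uint64(b)
	// TrailingZeros64(0) == 64, so equal words yield 64>>3 == 8.
	return bits.TrailingZeros64(x) >> 3
}

// firstDiffByteBE is the big-endian counterpart, shown only for contrast:
// a[0] lands in the top byte, so the scan must start from the high end.
func firstDiffByteBE(a, b []byte) int {
	x := binary.BigEndian.Uint64(a) ^ binary.BigEndian.Uint64(b)
	return bits.LeadingZeros64(x) >> 3
}

func main() {
	a := []byte{0x10, 0x20, 0x30, 0x40, 0x50, 0x60, 0x70, 0x80}
	b := []byte{0x10, 0x20, 0x30, 0x41, 0x50, 0x60, 0x70, 0x81}
	// Bytes 3 and 7 differ. Little-endian: byte 3 occupies bits 24..31, and the
	// lowest set bit of the XOR is bit 24, so 24>>3 == 3.
	fmt.Println(firstDiffByte(a, b))   // 3
	fmt.Println(firstDiffByteBE(a, b)) // 3
	fmt.Println(firstDiffByte(a, a))   // 8
}
```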
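The bit math: `binary.LittleEndian.Uint64` always maps the byte at the lowest address to the lowest-order byte of the result, regardless of host byte order (on little-endian hosts this is also the native load). For the words `a` and `c` loaded from the two keys, each set bit of `a^c` marks a differing bit, so the lowest set bit lies in the first differing byte. `TrailingZeros64(a^c)` is that bit's index, and shifting right by 3 (dividing by 8) converts it to a byte index, so `TrailingZeros64(a^c) >> 3` gives the position of the first differing byte directly, with no second-pass byte scan.

Measurement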
`BenchmarkBuilder/no_compression`: -12.5% ns/op (median of 3 runs)

Test plan
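- `go test -short -race ./table/`: all existing builder/table tests pass
- `go vet ./...`
- New tests in `table/builder_test.go`:
  - `TestKeyDiff`: 19 hand-picked cases covering empty inputs, equal keys, diffs at every byte position within and across word boundaries, length boundaries (7/8/9/16+), and asymmetric base vs new lengths.
  - `TestKeyDiffMatchesNaive`: 2000 randomized inputs cross-checked against a byte-wise reference, with ~50% of cases forced to share a random prefix to stress the LCP boundary.

🤖 Generated with Claude Code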
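The two new tests in the test plan can be approximated by a standalone harness: a boundary sweep in the spirit of `TestKeyDiff` and a randomized comparison against a byte-wise reference in the spirit of `TestKeyDiffMatchesNaive`. This is a sketch, not the PR's test file: the helper names, the 4-letter alphabet, the 40-byte length cap, and the seed are assumptions of this example.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math/bits"
	"math/rand"
)

// naiveLCP is the byte-by-byte reference implementation.
func naiveLCP(a, b []byte) int {
	i := 0
	for i < len(a) && i < len(b) && a[i] == b[i] {
		i++
	}
	return i
}

// wordLCP compares 8 bytes at a time, then finishes with a byte loop.
// Hypothetical helper name; it follows the technique described in this PR.
func wordLCP(a, b []byte) int {
	n := len(a)
	if len(b) < n {
		n = len(b)
	}
	i := 0
	for ; i+8 <= n; i += 8 {
		if x := binary.LittleEndian.Uint64(a[i:]) ^ binary.LittleEndian.Uint64(b[i:]); x != 0 {
			return i + bits.TrailingZeros64(x)>>3
		}
	}
	for i < n && a[i] == b[i] {
		i++
	}
	return i
}

// randomKey returns a key of length 0..maxLen over a 4-letter alphabet,
// so accidental byte matches (and longer shared prefixes) are frequent.
func randomKey(r *rand.Rand, maxLen int) []byte {
	k := make([]byte, r.Intn(maxLen+1))
	for i := range k {
		k[i] = byte('a' + r.Intn(4))
	}
	return k
}

// crossCheck returns how many cases disagree with the naive reference.
func crossCheck(seed int64, randomCases int) int {
	bad := 0
	check := func(a, b []byte) {
		if got, want := wordLCP(a, b), naiveLCP(a, b); got != want {
			bad++
			fmt.Printf("a=%q b=%q: got %d, want %d\n", a, b, got, want)
		}
	}
	// Boundary sweep: one differing byte at every position for lengths around
	// multiples of 8, plus equal keys and truncated (asymmetric-length) keys.
	for _, n := range []int{0, 1, 7, 8, 9, 15, 16, 17, 24} {
		base := make([]byte, n)
		for i := range base {
			base[i] = byte('a' + i%26)
		}
		check(base, base)
		check(base, base[:n/2])
		check(base[:n/2], base)
		for p := 0; p < n; p++ {
			k := append([]byte(nil), base...)
			k[p] ^= 0xFF // force the first difference at byte p
			check(base, k)
		}
	}
	// Randomized cases; about half share a random prefix to stress the LCP boundary.
	r := rand.New(rand.NewSource(seed))
	for i := 0; i < randomCases; i++ {
		a, b := randomKey(r, 40), randomKey(r, 40)
		if r.Intn(2) == 0 {
			p := randomKey(r, 40)
			a = append(append([]byte(nil), p...), a...)
			b = append(append([]byte(nil), p...), b...)
		}
		check(a, b)
	}
	return bad
}

func main() {
	fmt.Println("disagreements:", crossCheck(1, 2000))
}
```

The small alphabet is a deliberate choice: with random bytes over all 256 values, most pairs would diverge at byte 0 and the word-to-tail hand-off would rarely be exercised.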