perf(iterator): lazy-allocate Item.slice #2292

Open
shaunpatterson wants to merge 1 commit into dgraph-io:main from shaunpatterson:perf/item-slice-lazy

@shaunpatterson

Summary

newItem eagerly allocated a y.Slice wrapper for every Item, even though both users of item.slice (yieldItemValue and prefetchValue) already nil-check and lazily initialize it before any access. Iterators that never read values (KeyOnly, or AllVersions=true with PrefetchValues=false, e.g. dgraph's posting list rollup walking ~8 versions per cache miss) were paying a 24-byte allocation per Item for a struct they never touched.

Drop the eager new(y.Slice) and rely on the existing lazy-init. Add a regression test covering all three value-reading paths (PrefetchValues=true, synchronous Item.Value, ValueCopy) to ensure the lazy initialization stays correct.
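
For reviewers who want the pattern in isolation, here is a minimal sketch of the lazy-init described above. It does not use badger's code: `slice`, `item`, `newItem`, `value`, and the `raw` field are simplified, hypothetical stand-ins for y.Slice, Item, and the value-reading paths, assumed only for illustration.

```go
package main

import "fmt"

// slice is a hypothetical stand-in for y.Slice: a reusable byte buffer wrapper.
type slice struct {
	buf []byte
}

// resize returns a buffer of length sz, reusing existing capacity when possible.
func (s *slice) resize(sz int) []byte {
	if cap(s.buf) < sz {
		s.buf = make([]byte, sz)
	}
	return s.buf[:sz]
}

// item is a simplified stand-in for Item.
type item struct {
	key   []byte
	raw   []byte // hypothetical: the bytes a value read would copy from
	slice *slice // nil until a value is first read
}

// newItem does not allocate the slice wrapper up front.
func newItem(key, raw []byte) *item {
	return &item{key: key, raw: raw}
}

// value copies the stored bytes into the item's buffer,
// allocating the wrapper on first use (the nil-check + lazy-init step).
func (it *item) value() []byte {
	if it.slice == nil {
		it.slice = new(slice)
	}
	buf := it.slice.resize(len(it.raw))
	copy(buf, it.raw)
	return buf
}

func main() {
	it := newItem([]byte("k"), []byte("v1"))
	fmt.Println(it.slice == nil)    // key-only use: wrapper not allocated yet
	fmt.Println(string(it.value())) // first value read allocates the wrapper
	fmt.Println(it.slice != nil)
}
```

In this sketch a key-only caller never triggers the `new(slice)` call, while any value read still gets a usable buffer, which is the property the regression test is meant to protect.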

Benchmark — BenchmarkRollupKeyIterator (Apple M4 Max, 5×3s)

|        | ns/op | B/op | allocs/op |
|--------|-------|------|-----------|
| before | ~910  | 873  | 18        |
| after  | ~855  | 825  | 16        |

~6% faster, 48 B/op saved, 2 fewer allocations per iterator setup.

Test plan

  • go test ./... passes
  • New TestRegressionLazyItemSliceValueRead covers all three value-read paths

🤖 Generated with Claude Code

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
shaunpatterson requested a review from a team as a code owner on May 26, 2026, 00:44