Use bulk writes instead of per-element writes in vector serialization#681
Use bulk writes instead of per-element writes in vector serialization#681r-devulap wants to merge 7 commits into
Conversation
Replace inefficient element-by-element write loops with bulk write operations in MemorySegmentVectorProvider: - writeFloatVector: Extract underlying float array and use writeFloats() instead of looping with writeFloat() - writeByteSequence: Extract underlying byte array and use write() instead of looping with writeByte()
|
Before you submit for review:
If you did not complete any of these, then please explain below. |
tlwillke
left a comment
There was a problem hiding this comment.
writeFloatVector is fine. writeByteSequence has a bug. You are ignoring slice offsets. MemorySegmentByteSequence supports slicing, but .heapBase().get() returns the original unsliced array. I have suggested a change that fixes this and added minimal testing.
Once this is addressed and the tests pass, it is good to go.
On performance, while I appreciate the end-to-end benchmarking and consider it necessary, I would also like to see a unit-level microbenchmark isolating writeFloatVector and writeByteSequence to quantify the exact bulk-write speedup.
…eSequence ,heapBase().get() ignores slicing and returns the base of the original ByteArray.
|
Had to move |
4d5f59d to
3c0057b
Compare
…tVector and writeByteSequence
|
Added microbenchmarks. Let me know if that is what you what you were looking for. Here is the comparison with main branch on a x8i.24xlarge (96 core Intel GNR) |
tlwillke
left a comment
There was a problem hiding this comment.
Thanks for the fix and adding the microbenchmarks. Looks good.
Replace inefficient element-by-element write loops with bulk write operations in MemorySegmentVectorProvider:
With highway as the SIMD backend #668 , the scalar writes show up as a bottleneck when constructing index on a AWS instance x8i.24xlarge (2-socket 96 core Intel GNR) .