feat(ipc): Avoid zero-filling IPC reads with typed buffer handling#9971
feat(ipc): Avoid zero-filling IPC reads with typed buffer handling#9971pchintar wants to merge 1 commit into
Conversation
b59bbe2 to
e8845d0
Compare
292cb21 to
cabe3f2
Compare
|
run benchmark ipc_reader |
|
🤖 Arrow criterion benchmark running (GKE) | trigger CPU Details (lscpu)Comparing ipc-typed-buffer-reads (cabe3f2) to 1ffd202 (merge-base) diff File an issue against this benchmark runner |
|
🤖 Arrow criterion benchmark completed (GKE) | trigger Instance: CPU Details (lscpu)Details
Resource Usagebase (merge-base)
branch
File an issue against this benchmark runner |
alamb
left a comment
There was a problem hiding this comment.
Thanks @pchintar -- this is getting closer but I think this still may cause issues.
I think any time we allocate (or reallcoate) a Vec<u8> we lose the alignment guarantee's. There are still several places in this PR that do Vec::new(...)
Poking around I found a way to to get an aligned vector that maybe we can try here
So instead of this code
let mut buf = MutableBuffer::from_len_zeroed(total_len);
reader.read_exact(&mut buf)?;
Ok(buf.into())We could do something like
let mut buf = aligned_vec(total_len);
reader.read_exact(&mut buf)?;
Ok(buf.into())And as long as aligned_vec was aligned to (1<<6) (64) I think it would be equivalent to the current mutable buffer. Maybe we should align to 128 to guarantee views work 🤔
| /// reads. | ||
| enum IpcBufferSource<'a> { | ||
| Buffer(&'a Buffer), | ||
| } |
There was a problem hiding this comment.
Stylistically if you want to make a wrapper type I think a more common pattern is a an "new type" struct -- something like
struct IpcBufferSource<'a>(&'a Buffer)That way you can still define methods on it, but you don't have to use match all over the place - you can just refer to the inner field as .0
There was a problem hiding this comment.
this is fixed now, thnx
| compression: Option<CompressionCodec>, | ||
| decompression_context: &mut DecompressionContext, | ||
| ) -> Result<Buffer, ArrowError> { | ||
| let byte_len = len |
There was a problem hiding this comment.
I found the naming a little confusing here as there are three buffers
I found this a little confusing as there three buffers -- self::Buffer, buf (the input buffer) and the output buffer
Can we maybe name them something more specific to distinguish them? I am in particular confused about the buffer in self and the one passed in via the argument
There was a problem hiding this comment.
I have replaced with more exact variable names & also added more documentation here
| ArrowError::IpcError("Buffer count mismatched with metadata".to_string()) | ||
| })?; | ||
|
|
||
| self.data.read_typed_buffer::<T>( |
There was a problem hiding this comment.
this part looks good, so far
| // Some invalid or legacy IPC inputs may contain shorter buffers than | ||
| // implied by the schema. Preserve the existing behavior and let array | ||
| // construction/validation report the error. | ||
| if buffer.len() <= byte_len || buffer.as_ptr().align_offset(std::mem::align_of::<T>()) != 0 |
There was a problem hiding this comment.
This doens't seem right -- our change should ensure that the buffer is always aligned correctly to the otutput requirements. The fact you have to check afterwards suggests this is not the case
There was a problem hiding this comment.
I removed the "buffer.as_ptr().align_offset(std::mem::align_of::()) != 0" part as it was redundant
| buf.len() | ||
| ))); | ||
| } | ||
| Ok(Buffer::from_vec(buf)) |
There was a problem hiding this comment.
I think this Vec and the one below (still) has no guaranteed alignment.
There was a problem hiding this comment.
Yes, but this is always safe though & there's a net gain in speed nonetheless, so for
aligned case:
old/current path: zero-fill + read
new path: read only
=> faster
unaligned case:
old/current path: zero-fill + read
new path: read + alignment copy
=> ~same almost
51fd033 to
f1d1c3a
Compare
|
@adriangbot and @Dandandan could you pls run benchmark ipc_reader |
|
run benchmark ipc_reader |
|
🤖 Arrow criterion benchmark running (GKE) | trigger CPU Details (lscpu)Comparing ipc-typed-buffer-reads (f1d1c3a) to 1ffd202 (merge-base) diff File an issue against this benchmark runner |
|
🤖 Arrow criterion benchmark completed (GKE) | trigger Instance: CPU Details (lscpu)Details
Resource Usagebase (merge-base)
branch
File an issue against this benchmark runner |
|
|
||
| fn deref(&self) -> &[u8] { | ||
| // Safety: `ptr` points to `len` bytes owned by this AlignedVec. | ||
| unsafe { std::slice::from_raw_parts(self.ptr.as_ptr(), self.len) } |
There was a problem hiding this comment.
This looks like it could expose uninitialized data from safe code. I think this implementation has the same soundness issue as the one using Vec::with_capactiy and Vec::set_len.
There was a problem hiding this comment.
I agree, this still exposes the allocation as initialized [u8] through Deref/DerefMut before the reader has actually filled it, so it has the same issue as the earlier Vec::with_capacity + set_len approach.
I also explored the aligned-allocation direction separately by allocating through an aligned backing type and then converting back to Vec<u8>, but that ran into layout/deallocation issues under Miri because the allocation alignment was no longer preserved through the Vec<u8> ownership path.
There was a problem hiding this comment.
🤔
Taking a step back I think what we would have to do is to change the structure of the reader from "read a Page and then convert it to the right kind of buffer" so that any time the code needs to read a page we pass in the resulting type.
I think you partly did this in this PR, but for some reasons I don't fully understand the read_page is still in terms of Vec rather than Vec<T>
One thing I did notice was that when the pages are compressed, they need to be read as Vec (or at least the alignment isn't critiical as they get decompressed again immediately)
There was a problem hiding this comment.
Yeah, I see the remaining gap now: next_typed_buffer<T>() centralizes the typed length handling, but it still receives data from the existing byte-oriented body buffer path.
I’ll rework this revision to remove the unsound aligned allocation experiment and keep the typed-buffer handling focused on the fixed-width decode paths. For compressed buffers I’ll leave the existing byte-oriented path unchanged, since decompression already materializes a new buffer anyway.
There was a problem hiding this comment.
To make the deeper Vec<T> version fully safe, the implementation would have to be something like:
uncompressed fixed-width buffer
-> read/slice bytes
-> copy/convert into Vec<T>
-> Buffer::from_vec(Vec<T>)
That would produce a properly typed/aligned final buffer, but it would also add a byte->typed copy for every uncompressed fixed-width buffer and that could diminish the speed gains imo.
The version I don’t think is safe is:
Vec<T>::with_capacity(len)
-> expose spare capacity as bytes
-> Read::read_exact(...)
-> Buffer::from_vec(Vec<T>)
because that still needs an uninitialized read path on stable Rust.
So the alignment tradeoff is this: a fully typed/aligned read path for uncompressed fixed-width buffers would need either an uninitialized read path into Vec<T>/typed storage, or a safe but extra byte->typed copy before Buffer::from_vec(Vec<T>).
So for this revision I’m using neither of both by trying to avoid adding a new copy to the uncompressed hot path, and keep the change focused on next_typed_buffer<T>() doing the typed physical length handling before array construction.
f1d1c3a to
6c63301
Compare
Which issue does this PR close?
Rationale for this change
This PR is a follow-up to the alignment concerns raised in #9778 when using
Vec<u8>for IPC body reads to replace the currentMutableBuffer::from_len_zeroedin IPC Reader.My earlier approach showed that reading directly into
Vec<u8>could substantially reduce redundant zero-filling in IPC reader paths, but some decode paths still relied on fixed-width typed buffers that could require additional alignment handling cost later during array construction.This PR keeps the
Vec<u8>-based read path for IPC message and block bodies, while adding typed IPC buffer handling for fixed-width physical buffers before array construction.This preserves the existing alignment behavior for those fixed-width decode paths while avoiding the additional alignment handling/copying costs that could otherwise occur later during array construction.
The typed-buffer handling now covers:
These paths now read their physical buffers through
next_typed_buffer::<T>()so the expected physical buffer lengths are derived from the native value type before array construction.Exception Case:
UnionArrayrequiresScalarBufferinputs directly, and external IPC producers may provide unaligned union type id or offset buffers. The updated Union handling therefore preserves the aligned fast path while falling back to alignedScalarBufferconstruction only for unaligned external union buffers.Container types such as
Struct,FixedSizeList,RunEndEncoded, and similar nested/container arrays were intentionally left on their existing decode paths because they do not directly own fixed-width value buffers at that level. Their child arrays continue to decode recursively through the updated typed-buffer paths where applicable.Are these changes tested?
Yes.
The existing IPC reader test suite was run with:
cargo test -p arrow-ipc --libIPC reader benchmark was also run with:
The non-compressed, non-mmap IPC reader paths showed consistent improvements locally. Compressed and mmap-heavy paths were mostly neutral, as expected.
Are there any user-facing changes?
No.