dd: optimize O_DIRECT buffer alignment to reduce syscall overhead #9104
naoNao89 wants to merge 7 commits into uutils:main
9f131bd to fecebff
Merging this PR will not alter performance
Implement page-aligned buffer allocation and optimize O_DIRECT flag handling to match GNU dd behavior.

Key changes:
- Add allocate_aligned_buffer() for page-aligned memory allocation
- Update buffer allocation to use aligned buffers
- Modify handle_o_direct_write() to only remove O_DIRECT for partial blocks
- Add Output::write_with_o_direct_handling() for proper O_DIRECT handling
- Add comprehensive unit and integration tests

Fixes uutils#6078
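For readers unfamiliar with aligned buffers, here is a minimal sketch of one way to obtain a page-aligned byte window using only safe Rust. It is not the PR's `allocate_aligned_buffer()`: the `AlignedBuf` type, its methods, and the `ASSUMED_PAGE_SIZE` constant are hypothetical names chosen for illustration, and the technique shown (over-allocating a `Vec<u8>` and slicing at the first aligned address) is a stand-in for whatever allocator call the PR actually uses. Real code would query the page size from the OS rather than hard-coding 4096.

```rust
/// Assumption for illustration only: a 4 KiB page size.
pub const ASSUMED_PAGE_SIZE: usize = 4096;

/// Hypothetical wrapper (not the PR's code): owns a slightly oversized Vec
/// and exposes a window into it whose start address is a multiple of `align`.
pub struct AlignedBuf {
    storage: Vec<u8>,
    offset: usize,
    len: usize,
}

impl AlignedBuf {
    /// Allocate `len` usable bytes starting at an address aligned to `align`.
    pub fn new(len: usize, align: usize) -> Self {
        assert!(align.is_power_of_two(), "alignment must be a power of two");
        // Over-allocate by `align` bytes so an aligned start always fits.
        let storage = vec![0u8; len + align];
        let addr = storage.as_ptr() as usize;
        // Distance from the allocation start to the next aligned address.
        let offset = (align - addr % align) % align;
        Self { storage, offset, len }
    }

    /// Read-only view of the aligned window.
    pub fn as_slice(&self) -> &[u8] {
        &self.storage[self.offset..self.offset + self.len]
    }

    /// Mutable view of the aligned window, suitable as a read/write buffer.
    pub fn as_mut_slice(&mut self) -> &mut [u8] {
        &mut self.storage[self.offset..self.offset + self.len]
    }
}
```

The heap allocation does not move when the `AlignedBuf` value itself is moved, so the stored offset stays valid. The cost of this approach is up to `align` wasted bytes per buffer, plus a zeroing pass from `vec!`.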
fecebff to 2560240
…IRECT on ARM

O_DIRECT requires page-aligned buffers and writes. The conv=sync flag pads output to block size, which may not be page-aligned, causing EINVAL errors on ARM systems.

The core O_DIRECT functionality is already well-tested by:
- test_o_direct_with_aligned_buffer_full_blocks
- test_o_direct_with_partial_final_block
- test_o_direct_various_block_sizes
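The interaction described in this commit message can be sketched as follows. The helpers `pad_to_block` and `is_aligned_len` are hypothetical names, not functions from the PR, and the 4096-byte page size used in the usage note is an assumption. The sketch only illustrates the arithmetic: padding a short block up to the block size does not make its length a multiple of the page size unless the block size itself is.

```rust
/// Hypothetical helper: pad a short block with zero bytes up to `block_size`,
/// in the spirit of what conv=sync does. Longer inputs are returned unchanged.
pub fn pad_to_block(mut data: Vec<u8>, block_size: usize) -> Vec<u8> {
    if data.len() < block_size {
        data.resize(block_size, 0);
    }
    data
}

/// Hypothetical helper: true when `len` is a whole multiple of `align`.
pub fn is_aligned_len(len: usize, align: usize) -> bool {
    align != 0 && len % align == 0
}
```

For example, a 100-byte read padded with a 512-byte block size becomes 512 bytes long, which `is_aligned_len(512, 4096)` rejects, while an 8192-byte block passes the same check.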
GNU testsuite comparison:
I need more dopamine when stuck on a bug, so new PRs might be good :))
/// This function allocates a `Vec<u8>` with proper alignment to support O_DIRECT
/// without triggering EINVAL errors.
#[cfg(any(target_os = "linux", target_os = "android"))]
fn allocate_aligned_buffer(size: usize) -> Vec<u8> {
@sylvestre is this something we could move to a more central location? Or is this the only place where we need aligned memory allocations?
Which programs will use it? Thanks.
- Remove dead code: non-Linux stub for handle_o_direct_write
  The stub was unreachable since write_with_o_direct_handling already has a non-Linux stub that doesn't call this helper function.
- Fix clippy::ptr-as-ptr lint error
  Replace unsafe `as *mut u8` cast with safer `.cast::<u8>()` method in allocate_aligned_buffer function.

Addresses review comments and CI/CD failures in PR uutils#9104.
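As a small illustration of the pointer-cast change, the sketch below converts an untyped raw pointer to a byte pointer with `pointer::cast`. The function `as_byte_ptr` is a hypothetical name and the `c_void` source type is an assumption, since the PR's actual allocation call is not shown here. Both `raw as *mut u8` and `raw.cast::<u8>()` yield the same address. Clippy's `ptr_as_ptr` lint prefers `cast` because it can only change the pointee type, so it cannot silently change mutability or turn the pointer into an integer.

```rust
use std::ffi::c_void;

/// Hypothetical helper: reinterpret an untyped raw pointer as a byte pointer.
/// Creating the pointer is safe; only dereferencing it would need `unsafe`.
pub fn as_byte_ptr(raw: *mut c_void) -> *mut u8 {
    // Equivalent to `raw as *mut u8`, written in the form clippy accepts.
    raw.cast::<u8>()
}
```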
Removed redundant buffer initialization in allocate_aligned_buffer that was causing performance regression, especially for large block sizes.
- Eliminated O(n) write_bytes overhead that scaled with buffer size
- Fixes 29.36% regression for 1M blocks and 6.22% for 64K blocks
- Buffer is correctly filled during copy operations, making pre-init redundant
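The idea of letting the copy itself fill the buffer can be sketched in safe Rust as below. This is not the PR's code: `fill_block` is a hypothetical function, and it uses `Vec::with_capacity` plus `extend_from_slice` rather than the raw allocation the commit refers to. The point it illustrates is that reserving capacity does not write to the memory, so the only pass over the bytes is the copy.

```rust
/// Hypothetical helper: copy at most `block_size` bytes from `src` into a
/// freshly reserved buffer, without a separate zeroing pass beforehand.
pub fn fill_block(src: &[u8], block_size: usize) -> Vec<u8> {
    // Reserves room for a full block; the length starts at zero.
    let mut buf = Vec::with_capacity(block_size);
    let n = src.len().min(block_size);
    // The copy is the only write to the buffer's memory.
    buf.extend_from_slice(&src[..n]);
    buf
}
```

The returned length reflects the bytes actually copied, so a caller can tell a full block from a partial one by comparing `buf.len()` with `block_size`.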
conflicting
Fixes #6078
page-aligned buffers + smarter O_DIRECT handling. Theory says 5x fewer syscalls. 🗿
Checklist: