fix: Ensure target file size is properly respected in vortex sink#33
Conversation
There was a problem hiding this comment.
Pull request overview
This PR refactors the Vortex DataFusion sink write path to better respect target_file_size across both partitioned and non-partitioned outputs, replacing reliance on DataFusion’s demux behavior with sink-managed routing and file rotation. It also adds an expanded test suite covering file splitting and multi-partition write scenarios.
Changes:
- Implement custom file rotation and partition routing in
VortexSink, including a compression-ratio-based estimate to approximate target compressed size. - Add a concurrency cap for partition writers and new tests for partitioned writes, file splitting, and write-id consistency.
- Force a single sink input stream via a repartition step to ensure one coordinated write per statement.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
vortex-datafusion/src/tests/nested_projection.rs |
Marks nested projection tests as ignored. |
vortex-datafusion/src/persistent/sink.rs |
Replaces the prior sink writing/demux approach with custom per-file/per-partition writers, rotation logic, and extensive tests. |
vortex-datafusion/src/persistent/format.rs |
Adds a repartition-to-1 step before DataSinkExec to ensure a single coordinated sink write. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
This reverts commit 6c3be0f.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
You can also share your feedback on Copilot code review. Take the survey.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
You can also share your feedback on Copilot code review. Take the survey.
* fix: Ensure target file sizes are respected * fix: Improve robustness for streams split over multiple cores * wip * wip * wip * wip * refactor: Improve robustness * refactor: Improve concurrent writer robustness for memory bounds * Update vortex-datafusion/src/persistent/sink.rs Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Update vortex-datafusion/src/persistent/format.rs Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Update vortex-datafusion/src/persistent/sink.rs Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Update vortex-datafusion/src/persistent/sink.rs Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Update vortex-datafusion/src/tests/nested_projection.rs Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * review: Address comments * Update vortex-datafusion/src/persistent/sink.rs Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * review: Address comments * Revert "Update vortex-datafusion/src/persistent/sink.rs" This reverts commit 6c3be0f. * refactor: Simplify * test: Add file writer finish notify test * review: Address comments --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* fix: Ensure target file size is properly respected in vortex sink (#33) * fix: Ensure target file sizes are respected * fix: Improve robustness for streams split over multiple cores * wip * wip * wip * wip * refactor: Improve robustness * refactor: Improve concurrent writer robustness for memory bounds * Update vortex-datafusion/src/persistent/sink.rs Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Update vortex-datafusion/src/persistent/format.rs Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Update vortex-datafusion/src/persistent/sink.rs Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Update vortex-datafusion/src/persistent/sink.rs Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Update vortex-datafusion/src/tests/nested_projection.rs Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * review: Address comments * Update vortex-datafusion/src/persistent/sink.rs Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * review: Address comments * Revert "Update vortex-datafusion/src/persistent/sink.rs" This reverts commit 6c3be0f. * refactor: Simplify * test: Add file writer finish notify test * review: Address comments --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Update cargo.lock --------- Co-authored-by: William <98815791+peasee@users.noreply.github.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Uh oh!
There was an error while loading. Please reload this page.