Skip to content

Commit 8a48a87

Browse files
ariel-miculasmartin-galamb
authored
perf: optimize object store requests when reading JSON (#20823)
## Which issue does this PR close? <!-- We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes #123` indicates that this PR will close issue #123. --> - Closes #. ## Rationale for this change This is an alternative approach to - #19687 Instead of reading the entire range in the json FileOpener, implement an AlignedBoundaryStream which scans the range for newlines as the FileStream requests data from the stream, by wrapping the original stream returned by the ObjectStore. This eliminated the overhead of the extra two get_opts requests needed by calculate_range and more importantly, it allows for efficient read-ahead implementations by the underlying ObjectStore. Previously this was inefficient because the streams opened by calculate_range included a stream from `(start - 1)` to file_size and another one from `(end - 1)` to end_of_file, just to find the two relevant newlines. ## What changes are included in this PR? Added the AlignedBoundaryStream which wraps a stream returned by the object store and finds the delimiting newlines for a particular file range. Notably it doesn't do any standalone reads (unlike the calculate_range function), eliminating two calls to get_opts. ## Are these changes tested? Yes, added unit tests. <!-- We typically require tests for all PRs in order to: 1. Prevent the code from being accidentally broken by subsequent changes 2. Serve as another way to document the expected behavior of the code If tests are not included in your PR, please explain why (for example, are they covered by existing tests)? --> ## Are there any user-facing changes? No --------- Co-authored-by: Martin Grigorov <martin-g@users.noreply.github.com> Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
1 parent 7c3b22c commit 8a48a87

File tree

4 files changed

+1672
-34
lines changed

4 files changed

+1672
-34
lines changed

0 commit comments

Comments
 (0)