Commit f5b3811
perf: optimize the json newline scanning
This is an alternative approach to #19687.
Instead of reading the entire range in the JSON FileOpener, implement an
AlignedBoundaryStream that wraps the original stream returned by the
ObjectStore and scans the range for newlines as the FileStream requests
data from it.
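The boundary alignment described above can be illustrated with a minimal, synchronous sketch. It is not the code from this commit: `AlignedChunks` and `read_partition` are hypothetical names, a plain `Iterator` of byte chunks stands in for the async stream returned by the ObjectStore, and the sketch assumes the wrapped source begins one byte before `start` and that a partition owns exactly the lines that begin in `[start, end)`.

```rust
/// Hypothetical, synchronous stand-in for the idea behind AlignedBoundaryStream.
/// `inner` must yield the file's bytes beginning at `start.saturating_sub(1)`.
struct AlignedChunks<I> {
    inner: I,
    pos: usize,     // absolute offset of the next byte produced by `inner`
    end: usize,     // exclusive end of the requested (unaligned) range
    skipping: bool, // still looking for the newline that precedes our first line
    done: bool,
}

impl<I: Iterator<Item = Vec<u8>>> AlignedChunks<I> {
    fn new(inner: I, start: usize, end: usize) -> Self {
        Self {
            inner,
            pos: start.saturating_sub(1),
            end,
            skipping: start > 0,
            done: start >= end,
        }
    }
}

impl<I: Iterator<Item = Vec<u8>>> Iterator for AlignedChunks<I> {
    type Item = Vec<u8>;

    fn next(&mut self) -> Option<Vec<u8>> {
        while !self.done {
            let chunk = match self.inner.next() {
                Some(c) => c,
                None => {
                    self.done = true;
                    break;
                }
            };
            let base = self.pos;
            self.pos += chunk.len();

            // Leading boundary: drop bytes up to and including the first
            // newline found at or after offset `start - 1`.
            let mut from = 0;
            if self.skipping {
                match chunk.iter().position(|&b| b == b'\n') {
                    Some(i) => {
                        self.skipping = false;
                        from = i + 1;
                        if base + from >= self.end {
                            // No line begins inside the requested range.
                            self.done = true;
                            break;
                        }
                    }
                    None => continue,
                }
            }

            // Trailing boundary: stop after the first newline found at or
            // after offset `end - 1`.
            let search_from = from.max((self.end - 1).saturating_sub(base));
            let mut to = chunk.len();
            if search_from < chunk.len() {
                if let Some(j) = chunk[search_from..].iter().position(|&b| b == b'\n') {
                    to = search_from + j + 1;
                    self.done = true;
                }
            }
            if from < to {
                return Some(chunk[from..to].to_vec());
            }
        }
        None
    }
}

/// Helper for the sketch: feeds `data` to the adapter in fixed-size chunks,
/// starting one byte before `start`, and collects the aligned bytes.
fn read_partition(data: &[u8], start: usize, end: usize, chunk_size: usize) -> Vec<u8> {
    let fetch_from = start.saturating_sub(1);
    let chunks = data[fetch_from..].chunks(chunk_size).map(|c| c.to_vec());
    AlignedChunks::new(chunks, start, end).flatten().collect()
}
```

Because the newlines are found while chunks flow through the adapter, a single forward read serves both boundaries. With three 8-byte lines and the split ranges `[0, 10)` and `[10, 24)`, the first partition yields the two lines that begin at offsets 0 and 8, and the second yields only the line that begins at offset 16.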
This eliminates the overhead of the two extra get_opts requests needed
by calculate_range and, more importantly, allows efficient read-ahead
implementations in the underlying ObjectStore. Previously this was
inefficient because calculate_range opened one stream from (start - 1)
to file_size and another from (end - 1) to end_of_file, just to find
the two relevant newlines.
1 parent fd145c4 commit f5b3811
6 files changed
Lines changed: 1176 additions & 28 deletions
File tree
- datafusion/datasource-json
- src