Commit f5b3811

perf: optimize the json newline scanning
This is an alternative approach to #19687. Instead of reading the entire range in the JSON FileOpener, this commit implements an AlignedBoundaryStream that wraps the original stream returned by the ObjectStore and scans the range for newlines as the FileStream requests data from it. This eliminates the overhead of the two extra get_opts requests needed by calculate_range and, more importantly, allows the underlying ObjectStore to implement efficient read-ahead. Previously this was inefficient because calculate_range opened one stream from (start - 1) to file_size and another from (end - 1) to end_of_file, just to find the two relevant newlines.
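As a rough illustration of the boundary rule described above, the sketch below computes the newline-aligned byte range for a partition over an in-memory buffer. It is not the AlignedBoundaryStream from this commit, which does the same scanning incrementally over a stream; the names `next_line_start` and `aligned_range` are hypothetical, and the rule that a line belongs to the partition containing its first byte is an assumption made for the example.

```rust
/// Hypothetical helper: returns the offset of the first line start at or
/// after `pos`. A line starts at offset 0 or right after a b'\n' byte, so
/// the scan begins at `pos - 1`, mirroring the (start - 1) and (end - 1)
/// offsets mentioned in the commit message.
fn next_line_start(data: &[u8], pos: usize) -> usize {
    if pos == 0 {
        return 0;
    }
    if pos >= data.len() {
        return data.len();
    }
    match data[pos - 1..].iter().position(|&b| b == b'\n') {
        // The newline sits at index (pos - 1 + i); the next line starts one past it.
        Some(i) => pos + i,
        // No newline left: nothing further starts in this buffer.
        None => data.len(),
    }
}

/// Hypothetical helper: aligns a raw partition `[start, end)` to line
/// boundaries. Assumes a line belongs to the partition that contains its
/// first byte, so adjacent partitions neither overlap nor drop lines.
fn aligned_range(data: &[u8], start: usize, end: usize) -> (usize, usize) {
    (next_line_start(data, start), next_line_start(data, end))
}
```

With three 8-byte newline-terminated records, the raw partitions `[0, 10)` and `[10, 24)` align to `(0, 16)` and `(16, 24)` under this rule, so the second record is read exactly once. A streaming variant would apply the same two scans to chunks as they arrive rather than to a complete buffer.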
1 parent fd145c4 commit f5b3811

6 files changed

Lines changed: 1176 additions & 28 deletions

File tree

Cargo.lock

Lines changed: 1 addition & 0 deletions

Cargo.toml

Lines changed: 1 addition & 1 deletion
@@ -185,7 +185,7 @@ strum = "0.28.0"
 strum_macros = "0.28.0"
 tempfile = "3"
 testcontainers-modules = { version = "0.15" }
-tokio = { version = "1.48", features = ["macros", "rt", "sync"] }
+tokio = { version = "1.48", features = ["macros", "rt", "sync", "fs"] }
 tokio-stream = "0.1"
 tokio-util = "0.7"
 url = "2.5.7"

datafusion/datasource-json/Cargo.toml

Lines changed: 1 addition & 0 deletions
@@ -46,6 +46,7 @@ futures = { workspace = true }
 object_store = { workspace = true }
 tokio = { workspace = true }
 tokio-stream = { workspace = true, features = ["sync"] }
+tokio-util = { workspace = true, features = ["io"] }
 
 # Note: add additional linter rules in lib.rs.
 # Rust does not support workspace + new linter rules in subcrates yet
