
Commit da05287

RatulDawar and alamb authored
Fix FileStream scanning_total to include sync next-file open time (#20627)
## Summary

- Include synchronous `start_next_file()` / `FileOpener::open()` setup time in `time_elapsed_scanning_total`
- Keep the existing `time_opening` and scanning timer lifecycles intact
- Avoid timer overlap by scoping the temporary timer before calling `time_scanning_total.start()`

## Details

In `FileStreamState::Open`, `start_next_file()` is invoked before `time_scanning_total.start()`. If `open()` performs synchronous work before returning the future, that time was previously unaccounted for in `time_elapsed_scanning_total`.

This change wraps the `start_next_file()` call in a scoped timer on the same `time_scanning_total` metric so the missing segment is recorded.

- Fixes #20571

## Validation

I tested by reading CSV files via AWS S3.

---------

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
1 parent 4bac1cf commit da05287

1 file changed: +9 -1 lines changed


datafusion/datasource/src/file_stream.rs

Lines changed: 9 additions & 1 deletion
@@ -127,7 +127,15 @@ impl FileStream {
                     self.file_stream_metrics.files_opened.add(1);
                     // include time needed to start opening in `start_next_file`
                     self.file_stream_metrics.time_opening.stop();
-                    let next = self.start_next_file().transpose();
+                    let next = {
+                        let scanning_total_metric = self
+                            .file_stream_metrics
+                            .time_scanning_total
+                            .metrics
+                            .clone();
+                        let _timer = scanning_total_metric.timer();
+                        self.start_next_file().transpose()
+                    };
                     self.file_stream_metrics.time_scanning_until_data.start();
                     self.file_stream_metrics.time_scanning_total.start();
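
The scoped-timer pattern used in this change can be illustrated in isolation. The sketch below is a minimal, standard-library-only model, not DataFusion's actual metric types: `Time`, `ScopedTimer`, `start_next_file`, and `open_with_accounting` are hypothetical stand-ins defined here for illustration. It assumes a cumulative metric that can be cloned cheaply and a guard that adds its elapsed time to the metric when dropped. Cloning the metric before creating the guard presumably mirrors the diff's `.metrics.clone()`, which lets the guard live independently of any borrow of the owning struct while a `&mut self` method is called.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Hypothetical cumulative time metric (a stand-in, not DataFusion's type).
/// Clones share the same underlying counter.
#[derive(Clone, Default)]
struct Time {
    nanos: Arc<AtomicU64>,
}

impl Time {
    /// Returns a guard that adds its own lifetime to the metric on drop.
    fn timer(&self) -> ScopedTimer {
        ScopedTimer {
            start: Instant::now(),
            metric: self.clone(),
        }
    }

    /// Total time recorded so far.
    fn value(&self) -> Duration {
        Duration::from_nanos(self.nanos.load(Ordering::Relaxed))
    }
}

/// Guard that records elapsed time when it goes out of scope.
struct ScopedTimer {
    start: Instant,
    metric: Time,
}

impl Drop for ScopedTimer {
    fn drop(&mut self) {
        let elapsed = self.start.elapsed().as_nanos() as u64;
        self.metric.nanos.fetch_add(elapsed, Ordering::Relaxed);
    }
}

/// Hypothetical opener that performs blocking setup before returning
/// (standing in for synchronous work done inside `open()`).
fn start_next_file() -> &'static str {
    sleep(Duration::from_millis(20));
    "future"
}

/// Wraps the synchronous call in a block so the guard is dropped, and the
/// time recorded, before any longer-lived timer on the same metric starts.
fn open_with_accounting(scanning_total: &Time) -> &'static str {
    let next = {
        let _timer = scanning_total.timer();
        start_next_file()
    }; // `_timer` is dropped here, adding the setup time to the metric
    next
}
```

Binding the guard to `_timer` rather than `_` matters in this sketch: a bare `_` pattern would drop the guard immediately and record almost no time.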
