Commit c2824b5

RatulDawar, 2010YOUY01, and rluvaton authored
test(sqllogictest): stabilize parquet output_rows_skew with WITH ORDER (apache#21898)
## Summary

Adds `WITH ORDER (x)` to `CREATE EXTERNAL TABLE skew_parquet` / `skew_parquet_single` in `explain_analyze.slt` so `FileScanConfig` preserves scan ordering (`preserve_order`), keeping per-partition `output_rows` stable under dynamic file scheduling (PR apache#21351).

## Related

- Follow-up to flaky skew assertions discussed around apache#21866 / apache#21850.

## Testing

- `cargo test -p datafusion-sqllogictest --test sqllogictests -- explain_analyze` (recommended before merge)

Sqllogictest-only change.

Made with [Cursor](https://cursor.com)

---------

Co-authored-by: Yongting You <2010youy01@gmail.com>
Co-authored-by: Raz Luvaton <16746759+rluvaton@users.noreply.github.com>
1 parent e18b1cf commit c2824b5

1 file changed

Lines changed: 8 additions & 1 deletion

datafusion/sqllogictest/test_files/explain_analyze.slt

@@ -24,9 +24,12 @@ EXPLAIN ANALYZE SELECT * FROM generate_series(100);
 Plan with Metrics LazyMemoryExec: partitions=1, batch_generators=[generate_series: start=0, end=100, batch_size=8192], metrics=[output_rows=101, elapsed_compute=<slt:ignore>, output_bytes=<slt:ignore>]
 
 # --------------------------------------------
-# Test parquet-only output_rows_skew metric
+# Test parquet-only output_rows_skew metric (WITH ORDER(x) → preserve_order scan)
 # --------------------------------------------
 
+# Parquet supports dynamic work scheduling by default; per-partition data
+# sources might steal work from siblings, making row counts per partition
+# non-deterministic. Forcing `WITH ORDER` disables this behavior.
 statement ok
 set datafusion.explain.analyze_level = dev;
 
@@ -48,6 +51,7 @@ STORED AS PARQUET;
 statement ok
 CREATE EXTERNAL TABLE skew_parquet
 STORED AS PARQUET
+WITH ORDER (x)
 LOCATION 'test_files/scratch/explain_analyze/output_rows_skew';
 
 # All partition's output_rows: [4]
@@ -70,6 +74,7 @@ STORED AS PARQUET;
 statement ok
 CREATE EXTERNAL TABLE skew_parquet
 STORED AS PARQUET
+WITH ORDER (x)
 LOCATION 'test_files/scratch/explain_analyze/output_rows_skew';
 
 query TT
@@ -98,6 +103,7 @@ STORED AS PARQUET;
 statement ok
 CREATE EXTERNAL TABLE skew_parquet
 STORED AS PARQUET
+WITH ORDER (x)
 LOCATION 'test_files/scratch/explain_analyze/output_rows_skew';
 
 query TT
@@ -109,6 +115,7 @@ Plan with Metrics DataSourceExec: <slt:ignore>output_rows_skew=84.31%<slt:ignore
 statement ok
 CREATE EXTERNAL TABLE skew_parquet_single
 STORED AS PARQUET
+WITH ORDER (x)
 LOCATION 'test_files/scratch/explain_analyze/output_rows_skew/f4.parquet';
 
 query TT
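
Reviewer note: the flakiness described in the new test comment can be pictured with a toy model. The sketch below is not DataFusion's scheduler; the function names, the round-robin static assignment, and the per-file row counts are all hypothetical and chosen only for illustration. It contrasts a fixed file-to-partition assignment, which always yields the same per-partition row counts, with a shared queue where the counts depend on which partition happens to ask for work next.

```python
from collections import deque


def static_assignment(file_rows, num_partitions):
    """Toy model: each partition scans a fixed, pre-assigned set of files.

    Files are dealt out round-robin, so the per-partition row counts
    depend only on the inputs, never on timing.
    """
    counts = [0] * num_partitions
    for i, rows in enumerate(file_rows):
        counts[i % num_partitions] += rows
    return counts


def dynamic_assignment(file_rows, num_partitions, poll_order):
    """Toy model: partitions pull the next file from a shared queue.

    `poll_order` is a stand-in for thread timing: it lists which
    partition requests work next. Different orders give different
    per-partition counts even though the total is unchanged.
    """
    queue = deque(file_rows)
    counts = [0] * num_partitions
    for partition in poll_order:
        if not queue:
            break
        counts[partition] += queue.popleft()
    return counts


# Hypothetical per-file row counts (not taken from the test data).
FILE_ROWS = [5, 1, 1, 1]

if __name__ == "__main__":
    print("static:        ", static_assignment(FILE_ROWS, 2))
    print("dynamic, run A:", dynamic_assignment(FILE_ROWS, 2, [0, 1, 0, 1]))
    print("dynamic, run B:", dynamic_assignment(FILE_ROWS, 2, [0, 0, 0, 1]))
```

In this model the total row count is identical in every run, but any metric derived from the per-partition distribution, such as a skew percentage, changes between the two dynamic runs. That is the kind of instability the `WITH ORDER (x)` clause is meant to avoid in these assertions.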
