Commit ae3e8c7

adriangb and claude committed
test(sort-pushdown): cover DESC multi-row-group files with a shared min
Add Test 8b: a DESC scan over a file whose row groups share the same min value (`[10,8,8,8]` → row groups `[10,8]` and `[8,8]`), listed out of order on disk so the Inexact→Exact upgrade fires and SortExec is eliminated.

Without the fix this returns `8,8,10,8,...` (id=10 in third position): the opener reorders row groups ASC-by-min then reverses, mis-ordering the two min=8 row groups, and the eliminated SortExec no longer masks it.

The test asserts both the clean plan (no runtime reorder hints) and the correct DESC result.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
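The tie-breaking problem described above can be reproduced with a small toy model. This is only a sketch, not DataFusion code: row groups are plain Python lists, `reorder_then_reverse` is a hypothetical stand-in for the opener's runtime reorder, and a stable sort is assumed (which matches the `8,8,10,8` result quoted in the message).

```python
# Toy model only: row groups are plain lists, and reorder_then_reverse is a
# hypothetical stand-in for the opener's runtime reorder (not DataFusion code).

def reorder_then_reverse(row_groups):
    """Sort row groups ascending by their min (stable sort), then reverse."""
    asc_by_min = sorted(row_groups, key=min)  # ties keep their on-disk order
    return list(reversed(asc_by_min))


def flatten(row_groups):
    """Concatenate row groups in read order."""
    return [value for rg in row_groups for value in rg]


# b_high.parquet: ids [10, 8, 8, 8] written with 2 rows per row group.
rg0, rg1 = [10, 8], [8, 8]  # both row groups have min = 8

natural = flatten([rg0, rg1])
reordered = flatten(reorder_then_reverse([rg0, rg1]))

print(natural)    # [10, 8, 8, 8]
print(reordered)  # [8, 8, 10, 8]
```

Because both row groups tie on min, the sort leaves them in place and the reverse swaps them, so id=10 lands in third position. Reading the row groups in natural order avoids the swap.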
1 parent 4eeced5 commit ae3e8c7

1 file changed

Lines changed: 72 additions & 0 deletions

datafusion/sqllogictest/test_files/sort_pushdown.slt

@@ -1486,6 +1486,78 @@ SELECT * FROM desc_reversed_parquet ORDER BY id DESC;
 2 200
 1 100
 
+# Test 8b: DESC with multiple row groups per file sharing a min value.
+# Regression test for the Inexact→Exact upgrade: when SortExec is eliminated
+# the files must be read in natural order. The opener's runtime row-group
+# reorder (sort ASC-by-min then reverse) mis-orders two row groups in one file
+# that share the same min — so the upgrade must NOT leave those hints active.
+#
+# File b_high is DESC-sorted [10,8,8,8] written with 2 rows per row group:
+# RG0 = [10, 8] (min 8, max 10)
+# RG1 = [ 8, 8] (min 8, max 8)
+# Both row groups have min=8. Naively reordering RGs ASC-by-min then reversing
+# yields [RG1, RG0] → 8,8,10,8 (wrong). Natural order [RG0, RG1] is correct.
+
+statement ok
+CREATE TABLE rg_desc_high(id INT, value INT) AS VALUES (10, 100), (8, 801), (8, 802), (8, 803);
+
+statement ok
+CREATE TABLE rg_desc_low(id INT, value INT) AS VALUES (3, 300), (2, 200), (1, 100);
+
+query I
+COPY (SELECT * FROM rg_desc_high ORDER BY id DESC)
+TO 'test_files/scratch/sort_pushdown/rg_desc/b_high.parquet'
+OPTIONS ('format.max_row_group_size' '2');
+----
+4
+
+query I
+COPY (SELECT * FROM rg_desc_low ORDER BY id DESC)
+TO 'test_files/scratch/sort_pushdown/rg_desc/a_low.parquet'
+OPTIONS ('format.max_row_group_size' '2');
+----
+3
+
+# Files named so filesystem order [a_low, b_high] is wrong for DESC → the
+# Inexact path fires, stats reorder makes file groups [b_high, a_low]
+# non-overlapping, and the upgrade eliminates SortExec.
+statement ok
+CREATE EXTERNAL TABLE rg_desc_parquet(id INT, value INT)
+STORED AS PARQUET
+LOCATION 'test_files/scratch/sort_pushdown/rg_desc/'
+WITH ORDER (id DESC);
+
+# SortExec eliminated, files reordered, NO sort_order_for_reorder /
+# reverse_row_groups (natural read is correct after the upgrade).
+query TT
+EXPLAIN SELECT id FROM rg_desc_parquet ORDER BY id DESC;
+----
+logical_plan
+01)Sort: rg_desc_parquet.id DESC NULLS FIRST
+02)--TableScan: rg_desc_parquet projection=[id]
+physical_plan DataSourceExec: file_groups={1 group: [[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/scratch/sort_pushdown/rg_desc/b_high.parquet, WORKSPACE_ROOT/datafusion/sqllogictest/test_files/scratch/sort_pushdown/rg_desc/a_low.parquet]]}, projection=[id], output_ordering=[id@0 DESC], file_type=parquet
+
+# Results must be in DESC order — id=10 first.
+query I
+SELECT id FROM rg_desc_parquet ORDER BY id DESC;
+----
+10
+8
+8
+8
+3
+2
+1
+
+statement ok
+DROP TABLE rg_desc_parquet;
+
+statement ok
+DROP TABLE rg_desc_high;
+
+statement ok
+DROP TABLE rg_desc_low;
+
 # Test 9: Multi-column sort key validation
 # Files have (category, id) ordering. Files share a boundary value on category='B'
 # so column-level min/max statistics overlap on the primary key column.
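
The file-level step that Test 8b relies on (filesystem order `[a_low, b_high]` becoming `[b_high, a_low]`) can be pictured with a similar toy model. Everything below is an illustrative assumption rather than DataFusion's implementation: `FileStats`, `order_for_desc`, and `non_overlapping_desc` are hypothetical names, and both the choice to sort by max and the `>=` comparison are guesses at how a non-overlap check might work.

```python
from typing import NamedTuple


class FileStats(NamedTuple):
    """Hypothetical per-file min/max statistics for the `id` column."""
    name: str
    min_id: int
    max_id: int


def order_for_desc(files):
    """Assumed rule: for a DESC scan, put the file with the largest max first."""
    return sorted(files, key=lambda f: f.max_id, reverse=True)


def non_overlapping_desc(files):
    """Assumed check: every file's min is >= the next file's max."""
    return all(a.min_id >= b.max_id for a, b in zip(files, files[1:]))


# Filesystem (alphabetical) listing order from the test.
listing = [
    FileStats("a_low.parquet", 1, 3),
    FileStats("b_high.parquet", 8, 10),
]
ordered = order_for_desc(listing)

print([f.name for f in ordered])      # ['b_high.parquet', 'a_low.parquet']
print(non_overlapping_desc(listing))  # False
print(non_overlapping_desc(ordered))  # True
```

In this model the listing order fails the check while the stats-ordered list passes it, which is the condition under which the test comments say SortExec can be dropped.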
