Commit f2455fb

test: Add comprehensive multi-column IN (LeftSemi) tests

Add test coverage for multi-column IN subqueries to verify that the struct expression support works correctly for both negated (NOT IN) and non-negated (IN) cases.

Tests added to subquery.slt:
- Test 1: Basic two-column IN
- Test 2: Multi-column IN with no matches
- Test 3: Multi-column IN with NULL values (verifies non-null-aware behavior)
- Test 4: Three-column IN
- Test 5: Correlated multi-column IN
- Test 6: Verify logical plan shows LeftSemi with multiple join conditions
- Test 7: Multi-column IN with empty subquery
- Test 8: Multi-column IN with WHERE clause in subquery

These tests complement the multi-column NOT IN tests in null_aware_anti_join.slt and verify that struct decomposition (converting `(a, b) IN (SELECT x, y ...)` into `a = x AND b = y`) works correctly for LeftSemi joins.

Key differences from NOT IN:
- IN uses LeftSemi join (not null-aware)
- IN does not use CollectLeft partition mode
- NULL values don't match in regular semi joins (two-valued logic)

Related to multi-column null-aware anti join implementation.
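
The decomposition described above can be modeled in a few lines of plain Python. This is only a sketch of standard SQL three-valued logic, not DataFusion's implementation: the helper names (`sql_eq`, `sql_and`, `sql_or`, `row_in`, `left_semi`, `not_in_filter`) are hypothetical, `None` stands in for NULL, and the data is copied from Test 3.

```python
# Sketch only: models standard SQL three-valued logic, not DataFusion internals.
# None plays the role of NULL / unknown. All function names are hypothetical.

def sql_eq(left, right):
    """SQL equality: a NULL on either side yields unknown (None)."""
    if left is None or right is None:
        return None
    return left == right


def sql_and(values):
    """Three-valued AND: False wins, then unknown, otherwise True."""
    result = True
    for value in values:
        if value is False:
            return False
        if value is None:
            result = None
    return result


def sql_or(values):
    """Three-valued OR: True wins, then unknown, otherwise False."""
    result = False
    for value in values:
        if value is True:
            return True
        if value is None:
            result = None
    return result


def row_in(row, subquery_rows):
    """(a, b, ...) IN (subquery): OR over rows of (a = x AND b = y AND ...)."""
    return sql_or(
        sql_and(sql_eq(l, r) for l, r in zip(row, sub)) for sub in subquery_rows
    )


def left_semi(left_rows, right_rows, n_keys):
    """Keep a left row only when the IN predicate is definitely True."""
    return [r for r in left_rows if row_in(r[:n_keys], right_rows) is True]


def not_in_filter(left_rows, right_rows, n_keys):
    """Keep a left row only when the IN predicate is definitely False."""
    return [r for r in left_rows if row_in(r[:n_keys], right_rows) is False]


# Data from Test 3 (multi_in_left_null / multi_in_right_null).
left = [(1, 2, "x"), (3, None, "y"), (None, 6, "z")]
right = [(1, 2), (None, 4)]

print(left_semi(left, right, 2))      # [(1, 2, 'x')]
print(not_in_filter(left, right, 2))  # [(None, 6, 'z')]
```

For the semi join, unknown and false are indistinguishable because both drop the row, which is why IN needs no null-aware join. For NOT IN the two cases differ: `(3, NULL)` compares as unknown and is dropped, while `(NULL, 6)` compares as false against every subquery row and is kept.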
1 parent 9c6e98f commit f2455fb

1 file changed: +193 -0 lines changed

datafusion/sqllogictest/test_files/subquery.slt

Lines changed: 193 additions & 0 deletions
@@ -1528,3 +1528,196 @@ logical_plan
 20)--------SubqueryAlias: set_cmp_s
 21)----------Projection: column1 AS v
 22)------------Values: (Int64(5)), (Int64(NULL))
+
+#############
+## Multi-column IN (LeftSemi) Tests
+#############
+## These tests verify that multi-column IN subqueries work correctly
+## Multi-column IN uses LeftSemi join (not null-aware)
+
+#############
+## Test 1: Basic two-column IN
+#############
+
+statement ok
+CREATE TABLE multi_in_left(a INT, b INT, value TEXT) AS VALUES
+(1, 2, 'match'),
+(3, 4, 'no_match'),
+(5, 6, 'match');
+
+statement ok
+CREATE TABLE multi_in_right(x INT, y INT) AS VALUES
+(1, 2),
+(5, 6);
+
+# Should return rows where (a, b) matches (x, y)
+query IIT rowsort
+SELECT * FROM multi_in_left
+WHERE (a, b) IN (SELECT x, y FROM multi_in_right);
+----
+1 2 match
+5 6 match
+
+#############
+## Test 2: Multi-column IN with no matches
+#############
+
+statement ok
+CREATE TABLE multi_in_right_no_match(x INT, y INT) AS VALUES
+(10, 20),
+(30, 40);
+
+# Should return empty result
+query IIT rowsort
+SELECT * FROM multi_in_left
+WHERE (a, b) IN (SELECT x, y FROM multi_in_right_no_match);
+----
+
+#############
+## Test 3: Multi-column IN with NULL values
+#############
+## Note: Unlike NOT IN, regular IN does NOT use null-aware semantics
+## NULL = NULL evaluates to unknown, which a regular semi join treats as no match
+
+statement ok
+CREATE TABLE multi_in_left_null(a INT, b INT, value TEXT) AS VALUES
+(1, 2, 'x'),
+(3, NULL, 'y'),
+(NULL, 6, 'z');
+
+statement ok
+CREATE TABLE multi_in_right_null(x INT, y INT) AS VALUES
+(1, 2),
+(NULL, 4);
+
+# Should return only (1, 2, 'x')
+# (3, NULL, 'y') doesn't match because NULL doesn't equal anything
+# (NULL, 6, 'z') doesn't match because NULL doesn't equal anything
+query IIT rowsort
+SELECT * FROM multi_in_left_null
+WHERE (a, b) IN (SELECT x, y FROM multi_in_right_null);
+----
+1 2 x
+
+#############
+## Test 4: Three-column IN
+#############
+
+statement ok
+CREATE TABLE three_col_left(a INT, b INT, c INT, value TEXT) AS VALUES
+(1, 2, 3, 'match1'),
+(4, 5, 6, 'no_match'),
+(7, 8, 9, 'match2');
+
+statement ok
+CREATE TABLE three_col_right(x INT, y INT, z INT) AS VALUES
+(1, 2, 3),
+(7, 8, 9);
+
+# Should return rows with matching three-column tuples
+query IIIT rowsort
+SELECT * FROM three_col_left
+WHERE (a, b, c) IN (SELECT x, y, z FROM three_col_right);
+----
+1 2 3 match1
+7 8 9 match2
+
+#############
+## Test 5: Correlated multi-column IN
+#############
+
+statement ok
+CREATE TABLE correlated_outer(id INT, a INT, b INT) AS VALUES
+(1, 10, 20),
+(2, 30, 40),
+(3, 10, 20);
+
+statement ok
+CREATE TABLE correlated_inner(id INT, x INT, y INT) AS VALUES
+(1, 10, 20),
+(1, 30, 40),
+(2, 50, 60),
+(3, 10, 20);
+
+# Should return outer rows where (a, b) matches (x, y) for the same id
+query III rowsort
+SELECT * FROM correlated_outer o
+WHERE (a, b) IN (
+SELECT x, y FROM correlated_inner i WHERE i.id = o.id
+);
+----
+1 10 20
+3 10 20
+
+#############
+## Test 6: Verify logical plan shows LeftSemi join with multiple conditions
+#############
+
+query TT
+EXPLAIN SELECT * FROM multi_in_left
+WHERE (a, b) IN (SELECT x, y FROM multi_in_right);
+----
+logical_plan
+01)LeftSemi Join: multi_in_left.a = __correlated_sq_1.x, multi_in_left.b = __correlated_sq_1.y
+02)--TableScan: multi_in_left projection=[a, b, value]
+03)--SubqueryAlias: __correlated_sq_1
+04)----TableScan: multi_in_right projection=[x, y]
+
+#############
+## Test 7: Multi-column IN with empty subquery
+#############
+
+statement ok
+CREATE TABLE multi_in_right_empty(x INT, y INT);
+
+# Should return empty result (empty subquery)
+query IIT rowsort
+SELECT * FROM multi_in_left
+WHERE (a, b) IN (SELECT x, y FROM multi_in_right_empty);
+----
+
+#############
+## Test 8: Multi-column IN with WHERE clause in subquery
+#############
+
+# Should return only rows matching filtered subquery
+query IIT rowsort
+SELECT * FROM multi_in_left
+WHERE (a, b) IN (SELECT x, y FROM multi_in_right WHERE x > 0 AND y < 10);
+----
+1 2 match
+5 6 match
+
+#############
+## Cleanup
+#############
+
+statement ok
+DROP TABLE multi_in_left;
+
+statement ok
+DROP TABLE multi_in_right;
+
+statement ok
+DROP TABLE multi_in_right_no_match;
+
+statement ok
+DROP TABLE multi_in_left_null;
+
+statement ok
+DROP TABLE multi_in_right_null;
+
+statement ok
+DROP TABLE three_col_left;
+
+statement ok
+DROP TABLE three_col_right;
+
+statement ok
+DROP TABLE correlated_outer;
+
+statement ok
+DROP TABLE correlated_inner;
+
+statement ok
+DROP TABLE multi_in_right_empty;
