Commit ffd1f91
Fix strict metrics evaluator over-pruning files for NotEqualTo/NotIn with partial nulls
_StrictMetricsEvaluator.visit_not_equal and visit_not_in short-circuited on
_can_contain_nulls / _can_contain_nans (null/NaN count > 0) and returned
ROWS_MUST_MATCH without checking the value bounds. A file holding any null or
NaN was therefore reported as fully matching the predicate, even when a
non-null value inside the bounds did not match.
The strict evaluator's result drives _DeleteFiles (table/update/snapshot.py): on
ROWS_MUST_MATCH the whole data file is dropped without being rewritten. So
delete(NotEqualTo("x", 5)) against a file with values [null, 5] and bounds
lower=upper=5 would delete the entire file, silently losing the row with value
5 that should have survived.
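The failure mode can be reproduced with a small toy model. The sketch below is not the pyiceberg code: ColumnStats, buggy_strict_not_equal and delete_where_not_equal are hypothetical names, and the rewrite path is reduced to a list filter. It assumes, as this commit message states, that null rows satisfy not-equal.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class ColumnStats:
    # Hypothetical stand-in for the per-file column metrics.
    value_count: int
    null_count: int
    lower: Optional[int]
    upper: Optional[int]


def buggy_strict_not_equal(stats: ColumnStats, literal: int) -> bool:
    """Return True for ROWS_MUST_MATCH, False for ROWS_MIGHT_NOT_MATCH."""
    # Pre-fix guard: any null at all short-circuits before the bounds check.
    if stats.null_count > 0:
        return True
    if stats.lower is not None and stats.lower > literal:
        return True
    if stats.upper is not None and stats.upper < literal:
        return True
    return False


def delete_where_not_equal(
    rows: List[Optional[int]],
    stats: ColumnStats,
    literal: int,
    strict_eval: Callable[[ColumnStats, int], bool],
) -> List[Optional[int]]:
    """Return the rows that survive delete(x != literal) in this toy model."""
    if strict_eval(stats, literal):
        return []  # whole file dropped, nothing rewritten
    # Rewrite path: keep only rows that do not satisfy x != literal.
    return [v for v in rows if v is not None and v == literal]


rows = [None, 5]
stats = ColumnStats(value_count=2, null_count=1, lower=5, upper=5)
survivors = delete_where_not_equal(rows, stats, 5, buggy_strict_not_equal)
```

With the buggy guard, survivors is empty: the row holding 5 is lost along with the null row, which is the over-pruning described above.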
Every other strict ROWS_MUST_MATCH path already guards on the "only" variants
(_contains_nulls_only / _contains_nans_only), matching the reference
StrictMetricsEvaluator. Switch both methods to the same guard so that an
all-null/all-NaN column still short-circuits to ROWS_MUST_MATCH (those rows
satisfy not-equal/not-in), while a partially-null column falls through to the
bounds check.
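A minimal sketch of the corrected guard, again on a toy model rather than the real evaluator. The names ColumnStats, contains_nulls_only, strict_not_equal and strict_not_in are illustrative assumptions; NaN handling is omitted because the example uses integers, and the nulls-only check is simplified to a count comparison.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class ColumnStats:
    # Hypothetical stand-in for the per-file column metrics.
    value_count: int
    null_count: int
    lower: Optional[int]
    upper: Optional[int]


def contains_nulls_only(stats: ColumnStats) -> bool:
    # Simplified "only" guard: every value in the column is null.
    return stats.value_count > 0 and stats.null_count == stats.value_count


def strict_not_equal(stats: ColumnStats, literal: int) -> bool:
    """Return True for ROWS_MUST_MATCH, False for ROWS_MIGHT_NOT_MATCH."""
    if contains_nulls_only(stats):
        return True  # all-null column: every row satisfies not-equal
    # Partially-null columns fall through to the bounds check.
    if stats.lower is not None and stats.lower > literal:
        return True
    if stats.upper is not None and stats.upper < literal:
        return True
    return False


def strict_not_in(stats: ColumnStats, literals: Set[int]) -> bool:
    """Same guard for not-in: match only if no literal can fall in the bounds."""
    if contains_nulls_only(stats):
        return True
    remaining = set(literals)
    if stats.lower is not None:
        remaining = {v for v in remaining if v >= stats.lower}
    if stats.upper is not None:
        remaining = {v for v in remaining if v <= stats.upper}
    return len(remaining) == 0


partial = ColumnStats(value_count=2, null_count=1, lower=5, upper=5)
all_null = ColumnStats(value_count=2, null_count=2, lower=None, upper=None)
```

Here strict_not_equal(partial, 5) no longer reports a full match, so the file would be rewritten instead of dropped, while all_null still short-circuits.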
Update the existing NotIn-on-some-nulls test that encoded the buggy result and
add a regression test covering the [null, value] / bounds-include-literal case
for both NotEqualTo and NotIn.
Fixes #3498 (partially)
1 parent d101879
2 files changed
Lines changed: 45 additions & 3 deletions
First file (name not captured): two hunks, at original lines 1668-1674 and 1728-1734. Each replaces 1 line with 4 lines (diff content not captured).
Second file (name not captured): one hunk, at original lines 1523-1534. Line 1526 is replaced by 1 line and 36 lines are added after line 1531 (diff content not captured).