Commit fe7c749
fix: correct NOT STARTS WITH projection for truncated partitions (#3528)
closes #3493
## Summary
Fixes incorrect projection of `NOT STARTS WITH` predicates for truncated
string/binary partition fields. The current implementation unsafely
truncates the filter literal without checking its length relative to the
truncate width.
## Root Cause
The `TruncateTransform.project` method calls `_truncate_array`, which
truncates the literal unconditionally for both `STARTS WITH` and `NOT
STARTS WITH` predicates:
```python
elif isinstance(pred, BoundNotStartsWith):
    return NotStartsWith(Reference(name), _transform_literal(transform, boundary))
```
For `NOT STARTS WITH "hello"` with `truncate[2]`, this produces:
- Current (unsafe): `NOT STARTS WITH "he"`
- Problem: The `"he"` partition holds every value starting with "he"
("hello", "heat", "hear", etc.). The projected predicate prunes that
whole partition, so rows such as "heat" and "hear", which do satisfy the
original predicate, are silently dropped
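The failure can be reproduced without PyIceberg. The snippet below is a minimal sketch that models `truncate[2]` as plain string slicing; the `truncate` helper and the sample rows are illustrative assumptions, not library code.

```python
def truncate(value: str, width: int) -> str:
    """Toy model of truncate[width] for strings: keep the first `width` characters."""
    return value[:width]


rows = ["hello", "heat", "hear", "world"]
width = 2
literal = "hello"

# Rows that satisfy the original predicate NOT STARTS WITH "hello".
matching = [r for r in rows if not r.startswith(literal)]

# Partition values kept by the unsafe projection NOT STARTS WITH "he".
unsafe_literal = truncate(literal, width)
kept_partitions = {
    truncate(r, width)
    for r in rows
    if not truncate(r, width).startswith(unsafe_literal)
}

# Matching rows whose partition was pruned by the unsafe projection.
lost = [r for r in matching if truncate(r, width) not in kept_partitions]
print(lost)
```

Only the `"wo"` partition survives the unsafe projection, so the matching rows stored in the `"he"` partition never reach the row-level filter.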
## Solution
Add special handling for `BoundNotStartsWith` in the `project` method
following the Java/Go reference behavior:
- **prefix_length < truncate_width**: Keep original `NOT STARTS WITH`
literal (safe)
- **prefix_length == truncate_width**: Project to `!=` instead (safe
inequality check)
- **prefix_length > truncate_width**: Return `None` (no inclusive
projection possible)
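The three rules above can be sketched in plain Python. This is an illustration under stated assumptions, not the code in `pyiceberg/transforms.py`: the function names and the `(operator, literal)` tuple are hypothetical stand-ins for PyIceberg's expression classes, and truncation is modelled as string slicing.

```python
from typing import Iterable, Optional, Tuple

# Hypothetical stand-in for a projected predicate: (operator, literal).
Projection = Optional[Tuple[str, str]]


def project_not_starts_with(literal: str, width: int) -> Projection:
    """Pick a partition-level predicate for NOT STARTS WITH under truncate[width]."""
    if len(literal) < width:
        # The literal fits inside the truncated value, so it can be kept as-is.
        return ("not_starts_with", literal)
    if len(literal) == width:
        # The truncated value is the whole prefix, so the check becomes !=.
        return ("not_eq", literal)
    # The literal is longer than the partition value: nothing can be pruned.
    return None


def partition_may_match(projection: Projection, partition_value: str) -> bool:
    """Return False only when the partition can be pruned safely."""
    if projection is None:
        return True
    op, lit = projection
    if op == "not_starts_with":
        return not partition_value.startswith(lit)
    return partition_value != lit


def is_inclusive(literal: str, width: int, rows: Iterable[str]) -> bool:
    """Check that every row matching the original predicate keeps its partition."""
    projection = project_not_starts_with(literal, width)
    return all(
        partition_may_match(projection, row[:width])
        for row in rows
        if not row.startswith(literal)
    )
```

Returning `None` for a literal longer than the width means the partition filter prunes nothing for that predicate, and the row-level filter does all the work.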
### pyiceberg/transforms.py
- Add explicit `NOT STARTS WITH` handling before calling
`_truncate_array`
- Check literal length vs truncate width and apply correct projection
rules
### tests/test_transforms.py
- Update `test_projection_truncate_string_not_starts_with` to expect
`None` (prefix_length > width is unsafe)
- Add `test_projection_truncate_string_not_starts_with_shorter_literal`
(prefix_length == width → `!=`)
- Add `test_projection_truncate_string_not_starts_with_original_literal`
(prefix_length < width → original)
## Validation
- `make lint` ✓ (all pre-commit hooks pass)
- `pytest tests/test_transforms.py` → 280 passed ✓
- All 13 string truncate projection tests pass
---------
Co-authored-by: Gayathri Srividya Rajavarapu <gayathrir@Gayathris-MacBook-Air.local>
Co-authored-by: Kevin Liu <kevinjqliu@users.noreply.github.com>
1 parent a38bbe3 commit fe7c749
2 files changed
Lines changed: 27 additions & 4 deletions