Commit fb85e35
fix: Prefer numeric in type coercion for comparisons (#20426)
## Which issue does this PR close?
- Closes #15161.
## Rationale for this change
In a comparison between a numeric column and a string literal (e.g.,
`WHERE int_col < '10'`), we previously coerced the numeric column to a
string type. The comparison was then lexicographic rather than numeric,
which produced incorrect query results.
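
To make the failure mode concrete, here is a minimal standalone sketch in plain Rust. It does not use DataFusion; `compare_as_strings` and `compare_as_numbers` are hypothetical helpers that only mimic the two ways `int_col < '10'` can be evaluated.

```rust
use std::cmp::Ordering;

/// Mimics the old behavior: the integer is turned into a string and the
/// two strings are compared lexicographically (byte by byte).
fn compare_as_strings(value: i64, literal: &str) -> Ordering {
    value.to_string().as_str().cmp(literal)
}

/// Mimics the new behavior: the literal is parsed as an integer and the
/// two numbers are compared. Returns `None` if the literal is not an integer.
fn compare_as_numbers(value: i64, literal: &str) -> Option<Ordering> {
    literal.parse::<i64>().ok().map(|rhs| value.cmp(&rhs))
}

fn main() {
    // `WHERE int_col < '10'` for a row with int_col = 9.
    // As strings, "9" sorts after "10" because '9' > '1', so the row is dropped.
    println!("as strings: {:?}", compare_as_strings(9, "10"));
    // As numbers, 9 < 10, so the row is kept.
    println!("as numbers: {:?}", compare_as_numbers(9, "10"));
}
```

With string semantics the row where `int_col = 9` fails the predicate even though 9 is less than 10; with numeric semantics it passes.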
Instead, we now split type coercion into two situations: type coercion
for comparisons (including `IN` lists, `BETWEEN`, and `CASE WHEN`), where
we want string->numeric coercion, and type coercion for places like
`UNION` or `CASE ... THEN/ELSE`, where DataFusion's traditional behavior
has been to tolerate type mismatches by coercing values to strings.
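
The split can be sketched with a toy type system. This is only an illustration of the rule described above: `ToyType`, `toy_comparison_coercion`, and `toy_type_union_coercion` are hypothetical names, and the real `comparison_coercion` and `type_union_coercion` functions operate on Arrow data types and handle many more cases.

```rust
/// A deliberately tiny stand-in for a data type. This is NOT DataFusion's
/// `DataType`; it exists only for this illustration.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ToyType {
    Int64,
    Float64,
    Utf8,
}

impl ToyType {
    fn is_numeric(self) -> bool {
        matches!(self, ToyType::Int64 | ToyType::Float64)
    }
}

/// Widen two numeric toy types: any float makes the result a float.
fn numeric_common(lhs: ToyType, rhs: ToyType) -> ToyType {
    if lhs == ToyType::Float64 || rhs == ToyType::Float64 {
        ToyType::Float64
    } else {
        ToyType::Int64
    }
}

/// Comparison context: when one side is numeric and the other is a string,
/// prefer the numeric type.
fn toy_comparison_coercion(lhs: ToyType, rhs: ToyType) -> ToyType {
    match (lhs.is_numeric(), rhs.is_numeric()) {
        (true, true) => numeric_common(lhs, rhs),
        (true, false) => lhs,
        (false, true) => rhs,
        (false, false) => ToyType::Utf8,
    }
}

/// Union context: on a numeric/string mismatch, fall back to a string.
fn toy_type_union_coercion(lhs: ToyType, rhs: ToyType) -> ToyType {
    if lhs.is_numeric() && rhs.is_numeric() {
        numeric_common(lhs, rhs)
    } else {
        ToyType::Utf8
    }
}

fn main() {
    // Comparison: Int64 vs Utf8 resolves to Int64.
    println!("{:?}", toy_comparison_coercion(ToyType::Int64, ToyType::Utf8));
    // Union: Int64 vs Utf8 resolves to Utf8.
    println!("{:?}", toy_type_union_coercion(ToyType::Int64, ToyType::Utf8));
}
```

The two functions agree whenever both inputs are numeric or both are strings; they differ only on the mixed numeric/string case.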
Here is a (not necessarily exhaustive) summary of the behavioral changes
(old -> new):
```
Comparisons (=, <, >, etc.):
  float_col = '5'          : string (wrong: '5'!='5.0')  -> numeric
  int_col > '100'          : string (wrong: '25'>'100')  -> numeric
  int_col = 'hello'        : string, always false        -> cast error
  int_col = ''             : string, always false        -> cast error
  int_col = '99.99'        : string, always false        -> cast error
  Dict(Int) = '5'          : string                      -> numeric
  REE(Int) = '5'           : string                      -> numeric
  struct(int)=struct(str)  : int field to Utf8           -> str field to int
IN lists:
  float_col IN ('1.0')     : string (wrong: '1.0'!='1')  -> numeric
  str_col IN ('a', 1)      : coerce to Utf8              -> coerce to Int64
CASE:
  CASE str WHEN float      : coerce to Utf8              -> coerce to Float
LIKE / regex:
  Dict(Int) LIKE '%5%'     : coerce to Utf8              -> error (matches int)
  REE(Int) LIKE '%5%'      : coerce to Utf8              -> error (matches int)
  Dict(Int) ~ '5'          : coerce to Utf8              -> error (matches int)
  REE(Int) ~ '5'           : error (no REE)              -> error (REE added)
  REE(Utf8) ~ '5'          : error (no REE)              -> works (REE added)
```
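
The "cast error" rows follow from the literal being cast to the column's numeric type. A rough standalone sketch of that idea uses Rust's own `str::parse` as a stand-in for the cast. This is an assumption for illustration only: the cast DataFusion actually performs may accept or reject slightly different inputs, and `int_equals_literal` and `float_equals_literal` are hypothetical helpers.

```rust
/// Hypothetical helper: evaluate `int_col = literal` by parsing the literal
/// as an integer first. A literal that is not an integer yields an error
/// instead of silently comparing as a string.
fn int_equals_literal(int_value: i64, literal: &str) -> Result<bool, String> {
    literal
        .parse::<i64>()
        .map(|rhs| int_value == rhs)
        .map_err(|e| format!("cannot cast '{}' to Int64: {}", literal, e))
}

/// Hypothetical helper: evaluate `float_col = literal` by parsing the
/// literal as a float first.
fn float_equals_literal(float_value: f64, literal: &str) -> Result<bool, String> {
    literal
        .parse::<f64>()
        .map(|rhs| float_value == rhs)
        .map_err(|e| format!("cannot cast '{}' to Float64: {}", literal, e))
}

fn main() {
    // Literals that are not valid integers produce an error.
    for literal in ["hello", "", "99.99"] {
        println!("int_col = '{}': {:?}", literal, int_equals_literal(5, literal));
    }
    // A float column compared with '5' now matches the value 5.0.
    println!("float_col = '5': {:?}", float_equals_literal(5.0, "5"));
}
```

Queries that relied on the old "always false" result for such literals need an explicit cast or a corrected literal.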
## What changes are included in this PR?
* Update `comparison_coercion` to coerce strings to numerics
* Remove previous `comparison_coercion_numeric` function
* Add a new function, `type_union_coercion`, and use it when appropriate
* Add support for REE types with regexp operators (this was unsupported
for no good reason I can see)
* Add unit and SLT tests for new coercion behavior
* Update existing SLT tests for changes in coercion behavior
* Fix the ClickBench unparser tests to avoid comparing int fields with
non-numeric string literals
## Are these changes tested?
Yes. New tests added, existing tests pass.
## Are there any user-facing changes?
Yes, see table above. In most cases the new behavior should be more
sensible and less error-prone, but it will likely break some user code.
---------
Co-authored-by: Martin Grigorov <martin-g@users.noreply.github.com>
1 parent 777ecb8, commit fb85e35
File tree
24 files changed (+1065, -393 lines)
- datafusion
  - core
    - src
    - tests
      - expr_api
      - sql
  - expr-common/src
    - type_coercion
      - binary/tests
  - expr/src/type_coercion
  - functions/src/core
  - optimizer/src/analyzer
  - physical-expr/src/expressions
  - sqllogictest/test_files
    - string
  - substrait/tests/cases
- docs/source/library-user-guide/upgrading