Commit 43657dd

Add upgrade notes for change
1 parent 7667bc1 commit 43657dd

1 file changed: +62 -0 lines changed
  • docs/source/library-user-guide/upgrading

docs/source/library-user-guide/upgrading/53.0.0.md

Lines changed: 62 additions & 0 deletions
@@ -440,6 +440,68 @@ let sort_proto = serialize_physical_sort_expr(
);
```

### String/numeric comparison coercion now prefers numeric types

Previously, comparing a numeric column with a string value (e.g.,
`WHERE int_col > '100'`) coerced both sides to strings and performed a
lexicographic comparison. This produced incorrect results: for example,
`325 < '1000'` was `false` under string comparison because `'3' < '1'` is
`false` lexicographically.

DataFusion now coerces the string side to the numeric type in comparison
contexts (`=`, `<`, `>`, `<=`, `>=`, `<>`, `IN`, `BETWEEN`, `CASE .. WHEN`).
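
The difference between the two comparison modes can be sketched in plain Rust. This is only an illustration using the standard library, not DataFusion code: the function names `string_less_than` and `numeric_less_than` are hypothetical, and Rust's `str::parse` stands in for the engine's cast.

```rust
use std::num::ParseIntError;

/// Models the OLD behavior: render the number as text and compare the two
/// strings lexicographically (byte by byte).
fn string_less_than(number: i64, literal: &str) -> bool {
    number.to_string().as_str() < literal
}

/// Models the NEW behavior: parse the string side as a number and compare
/// numerically. A literal that is not a valid integer yields an error
/// instead of a silent `false`.
fn numeric_less_than(number: i64, literal: &str) -> Result<bool, ParseIntError> {
    let parsed: i64 = literal.parse()?;
    Ok(number < parsed)
}
```

Under this sketch, `string_less_than(325, "1000")` is `false` because the first byte `'3'` sorts after `'1'`, while `numeric_less_than(325, "1000")` is `Ok(true)`.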

**Who is affected:**

- Queries that compare numeric values with string values
- Queries that use `IN` lists with mixed string and numeric types
- Queries that use `CASE expr WHEN` with mixed string and numeric types

**Behavioral changes (old → new):**

| Expression | Old behavior | New behavior |
|---|---|---|
| `int_col > '100'` | Lexicographic (incorrect) | Numeric (correct) |
| `float_col = '5'` | String `'5' != '5.0'` (incorrect) | Numeric `5.0 = 5.0` (correct) |
| `int_col = 'hello'` | String comparison, always false | Cast error |
| `str_col IN ('a', 1)` | Coerce to Utf8 | Cast error (`'a'` cannot be cast to Int64) |
| `float_col IN ('1.0')` | String `'1.0' != '1'` (incorrect) | Numeric `1.0 = 1.0` (correct) |
| `CASE str_col WHEN 1.0` | Coerce to Utf8 | Coerce to Float64 |
| `SELECT 1 UNION SELECT 'a'` | Coerce to Utf8 | Coerce to Utf8 (unchanged) |

**Migration guide:**

Most queries will produce more correct results with no changes needed.
However, queries that relied on the old string-comparison behavior may need
adjustment:

- **Queries comparing numeric columns with non-numeric strings** (e.g.,
  `int_col = 'hello'` or `int_col > text_col` where `text_col` contains
  non-numeric values) will now produce a cast error instead of silently
  returning no rows.
- **Mixed-type `IN` lists** (e.g., `str_col IN ('a', 1)`) are now rejected. Use
  consistent types in the list, or add an explicit `CAST`.
- **Queries comparing integer columns with decimal strings** (e.g.,
  `int_col = '99.99'`) will now produce a cast error because `'99.99'`
  cannot be cast to an integer. Use a float column or adjust the literal.
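
The cast failures listed above can be approximated with Rust's standard string parsing. This is a rough sketch under the assumption that `str::parse` behaves like the engine's string-to-number cast for these inputs. The helper names `cast_to_int` and `cast_to_float` are hypothetical, and the error messages are illustrative rather than DataFusion's actual wording.

```rust
/// Stand-in for casting a string literal to an integer type.
/// Fails for non-numeric text and for decimal strings such as "99.99".
fn cast_to_int(literal: &str) -> Result<i64, String> {
    literal
        .parse::<i64>()
        .map_err(|e| format!("cannot cast '{}' to Int64: {}", literal, e))
}

/// Stand-in for casting a string literal to a floating-point type.
/// Accepts both integer-looking and decimal strings.
fn cast_to_float(literal: &str) -> Result<f64, String> {
    literal
        .parse::<f64>()
        .map_err(|e| format!("cannot cast '{}' to Float64: {}", literal, e))
}
```

In this model, `cast_to_int("hello")` and `cast_to_int("99.99")` both return an error, while `cast_to_float("99.99")` succeeds, which mirrors the advice to compare decimal strings against a float column.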

See [#15161](https://github.com/apache/datafusion/issues/15161) and
[PR #20426](https://github.com/apache/datafusion/pull/20426) for details.

### `comparison_coercion_numeric` removed, replaced by `comparison_coercion`

The `comparison_coercion_numeric` function has been removed. Its behavior
(preferring numeric types for string/numeric comparisons) is now the default in
`comparison_coercion`. A new function, `type_union_coercion`, handles contexts
where string types are preferred (`UNION`, `CASE THEN/ELSE`, `NVL2`).

**Who is affected:**

- Crates that call `comparison_coercion_numeric` directly
- Crates that call `comparison_coercion` and relied on its old
  string-preferring behavior
- Crates that call `get_coerce_type_for_case_expression`
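
To make the split between the two rules concrete, here is a toy model in plain Rust. It is an assumption-laden sketch, not the DataFusion API: `ToyType`, `toy_comparison_coercion`, and `toy_type_union_coercion` are hypothetical names, only three types are modeled, and the real functions operate on Arrow data types with their own signatures.

```rust
/// Toy stand-in for a small subset of column types (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq)]
enum ToyType {
    Int64,
    Float64,
    Utf8,
}

fn is_numeric(t: ToyType) -> bool {
    matches!(t, ToyType::Int64 | ToyType::Float64)
}

/// Simplified numeric widening: identical types stay as-is,
/// mixed Int64/Float64 widens to Float64.
fn widen(a: ToyType, b: ToyType) -> ToyType {
    if a == b { a } else { ToyType::Float64 }
}

/// Models comparison contexts: a string paired with a number
/// resolves to the numeric type.
fn toy_comparison_coercion(lhs: ToyType, rhs: ToyType) -> ToyType {
    match (is_numeric(lhs), is_numeric(rhs)) {
        (true, true) => widen(lhs, rhs),
        (true, false) => lhs,
        (false, true) => rhs,
        (false, false) => ToyType::Utf8,
    }
}

/// Models union-style contexts (`UNION`, `CASE THEN/ELSE`, `NVL2`):
/// a string paired with a number resolves to the string type.
fn toy_type_union_coercion(lhs: ToyType, rhs: ToyType) -> ToyType {
    if lhs == ToyType::Utf8 || rhs == ToyType::Utf8 {
        ToyType::Utf8
    } else {
        widen(lhs, rhs)
    }
}
```

In this model, a `Utf8` value paired with `Float64` resolves to `Float64` for comparisons, matching the `CASE str_col WHEN 1.0` row in the table above, while `Int64` paired with `Utf8` in a union-style context stays `Utf8`, matching `SELECT 1 UNION SELECT 'a'`.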

### `generate_series` and `range` table functions changed

The `generate_series` and `range` table functions now return an empty set when the interval is invalid, instead of an error.
