);
```

### String/numeric comparison coercion now prefers numeric types

Previously, comparing a numeric column with a string value (e.g.,
`WHERE int_col > '100'`) coerced both sides to strings and performed a
lexicographic comparison. This produced incorrect results. For example,
`325 > '1000'` was `true` under string comparison because `'3' > '1'`
lexicographically, even though 325 is numerically smaller than 1000.
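
The difference between the two rules can be reproduced with plain Rust string and integer comparisons. This is a std-only sketch, not DataFusion code: `string_gt` and `numeric_gt` are hypothetical helpers standing in for the old (string) and new (numeric) handling of `325 > '1000'`.

```rust
use std::num::ParseIntError;

/// Old rule (sketch): both sides are strings, compared byte by byte.
fn string_gt(lhs: &str, rhs: &str) -> bool {
    lhs > rhs
}

/// New rule (sketch): the string side is parsed as an integer first,
/// and a literal that does not parse is an error.
fn numeric_gt(lhs: i64, rhs: &str) -> Result<bool, ParseIntError> {
    Ok(lhs > rhs.parse::<i64>()?)
}

fn main() {
    // '3' (0x33) sorts after '1' (0x31), so the string comparison is true.
    println!("old: {}", string_gt("325", "1000"));
    // 325 < 1000, so the numeric comparison is false.
    println!("new: {:?}", numeric_gt(325, "1000"));
}
```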

DataFusion now coerces the string side to the numeric type in comparison
contexts (`=`, `<`, `>`, `<=`, `>=`, `<>`, `IN`, `BETWEEN`, `CASE .. WHEN`).
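
At the type level, the rule can be sketched as follows. The three-variant `Ty` enum and the `comparison_type` / `in_list_type` helpers are hypothetical simplifications, not Arrow's `DataType` or DataFusion's API. Folding the rule across an `IN` list is an assumption, chosen because it reproduces the `str_col IN ('a', 1)` case, where the common type becomes `Int64`.

```rust
/// Toy stand-in for Arrow's `DataType`, limited to three types.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Ty {
    Int64,
    Float64,
    Utf8,
}

fn is_numeric(t: Ty) -> bool {
    matches!(t, Ty::Int64 | Ty::Float64)
}

/// Sketch of the new comparison rule: string vs. numeric resolves to the numeric type.
fn comparison_type(lhs: Ty, rhs: Ty) -> Option<Ty> {
    match (lhs, rhs) {
        (a, b) if a == b => Some(a),
        (Ty::Int64, Ty::Float64) | (Ty::Float64, Ty::Int64) => Some(Ty::Float64),
        (Ty::Utf8, n) | (n, Ty::Utf8) if is_numeric(n) => Some(n),
        _ => None,
    }
}

/// Sketch: `expr IN (item, ...)` folds the comparison rule over every list item.
fn in_list_type(expr: Ty, items: &[Ty]) -> Option<Ty> {
    items.iter().try_fold(expr, |acc, &item| comparison_type(acc, item))
}

fn main() {
    // int_col > '100': the string literal is coerced to Int64.
    println!("{:?}", comparison_type(Ty::Int64, Ty::Utf8));
    // str_col IN ('a', 1): the list resolves to Int64, so 'a' must be cast to Int64.
    println!("{:?}", in_list_type(Ty::Utf8, &[Ty::Utf8, Ty::Int64]));
    // CASE str_col WHEN 1.0: resolves to Float64.
    println!("{:?}", comparison_type(Ty::Utf8, Ty::Float64));
}
```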

**Who is affected:**

- Queries that compare numeric values with string values
- Queries that use `IN` lists with mixed string and numeric types
- Queries that use `CASE expr WHEN` with mixed string and numeric types

**Behavioral changes (old → new):**

| Expression | Old behavior | New behavior |
|---|---|---|
| `int_col > '100'` | Lexicographic (incorrect) | Numeric (correct) |
| `float_col = '5'` | String `'5' != '5.0'` (incorrect) | Numeric `5.0 = 5.0` (correct) |
| `int_col = 'hello'` | String comparison, always false | Cast error |
| `str_col IN ('a', 1)` | Coerce to Utf8 | Cast error (`'a'` cannot be cast to Int64) |
| `float_col IN ('1')` | String `'1.0' != '1'` (incorrect) | Numeric `1.0 = 1.0` (correct) |
| `CASE str_col WHEN 1.0` | Coerce to Utf8 | Coerce to Float64 |
| `SELECT 1 UNION SELECT 'a'` | Coerce to Utf8 | Coerce to Utf8 (unchanged) |
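
At the value level, the "Cast error" rows follow from parsing the string literal as the column's numeric type. Below is a std-only sketch, assuming Rust's `str::parse` as a stand-in for Arrow's string-to-number cast (the two may differ on edge cases such as whitespace or exponent syntax). `cast_literal`, `int_gt`, `int_eq` and `float_eq` are hypothetical helpers, and the error text is not DataFusion's.

```rust
use std::str::FromStr;

/// Sketch: cast a string literal to the numeric type it is compared with.
/// `str::parse` stands in for Arrow's cast kernel here.
fn cast_literal<T: FromStr>(literal: &str, type_name: &str) -> Result<T, String> {
    literal
        .parse::<T>()
        .map_err(|_| format!("cannot cast '{literal}' to {type_name}"))
}

/// `int_col > 'literal'` under the new rule.
fn int_gt(col: i64, literal: &str) -> Result<bool, String> {
    Ok(col > cast_literal::<i64>(literal, "Int64")?)
}

/// `int_col = 'literal'` under the new rule.
fn int_eq(col: i64, literal: &str) -> Result<bool, String> {
    Ok(col == cast_literal::<i64>(literal, "Int64")?)
}

/// `float_col = 'literal'` under the new rule.
fn float_eq(col: f64, literal: &str) -> Result<bool, String> {
    Ok(col == cast_literal::<f64>(literal, "Float64")?)
}

fn main() {
    println!("325 > '100'  -> {:?}", int_gt(325, "100"));
    println!("5.0 = '5'    -> {:?}", float_eq(5.0, "5"));
    println!("1 = 'hello'  -> {:?}", int_eq(1, "hello"));
    println!("99 = '99.99' -> {:?}", int_eq(99, "99.99"));
}
```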

**Migration guide:**

Most queries need no changes and now return correct results. However,
queries that relied on the old string-comparison behavior may need
adjustment:

- **Queries comparing numeric columns with non-numeric strings** (e.g.,
  `int_col = 'hello'` or `int_col > text_col` where `text_col` contains
  non-numeric values) now produce a cast error instead of silently
  returning no rows.
- **Mixed-type `IN` lists** (e.g., `str_col IN ('a', 1)`) are now rejected. Use
  consistent types in the list, or add an explicit `CAST`.
- **Queries comparing integer columns with decimal strings** (e.g.,
  `int_col = '99.99'`) now produce a cast error because `'99.99'`
  cannot be cast to an integer. Cast the integer column to a floating-point
  type, or adjust the literal.
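
The explicit-`CAST` workarounds can be sketched the same way. The sketch assumes that `CAST(int_col AS VARCHAR) = 'hello'` leaves both sides as strings (restoring string comparison, with no cast error) and that `CAST(int_col AS DOUBLE) = '99.99'` only requires the literal to parse as `Float64`. `int_as_string_eq` and `int_as_float_eq` are hypothetical helpers, not DataFusion functions.

```rust
/// Sketch of `CAST(int_col AS VARCHAR) = 'literal'`: both sides are strings,
/// so a non-numeric literal simply fails to match instead of erroring.
fn int_as_string_eq(col: i64, literal: &str) -> bool {
    col.to_string() == literal
}

/// Sketch of `CAST(int_col AS DOUBLE) = 'literal'`: the literal only needs to
/// parse as a float, so decimal strings such as '99.99' are accepted.
fn int_as_float_eq(col: i64, literal: &str) -> Result<bool, String> {
    let rhs: f64 = literal
        .parse()
        .map_err(|_| format!("cannot cast '{literal}' to Float64"))?;
    Ok(col as f64 == rhs)
}

fn main() {
    println!("{}", int_as_string_eq(1, "hello"));
    println!("{:?}", int_as_float_eq(99, "99.99"));
    println!("{:?}", int_as_float_eq(100, "100.0"));
}
```

Exact `==` on floating-point values is used only to keep the illustration short; real queries comparing `Float64` values inherit the usual floating-point rounding caveats.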

See [#15161](https://github.com/apache/datafusion/issues/15161) and
[PR #20426](https://github.com/apache/datafusion/pull/20426) for details.

### `comparison_coercion_numeric` removed, replaced by `comparison_coercion`

The `comparison_coercion_numeric` function has been removed. Its behavior
(preferring numeric types for string/numeric comparisons) is now the default in
`comparison_coercion`. A new function, `type_union_coercion`, handles contexts
where string types are preferred (`UNION`, `CASE THEN/ELSE`, `NVL2`).
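
The split between the two functions can be illustrated with a toy model. `comparison_rule` and `union_rule` below are hypothetical stand-ins for `comparison_coercion` and `type_union_coercion`, not their real signatures; they only encode the preference described above: the numeric type wins for comparisons, while `Utf8` wins for `UNION`, `CASE THEN/ELSE` and `NVL2`.

```rust
/// Toy stand-in for Arrow's `DataType`, limited to three types.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Ty {
    Int64,
    Float64,
    Utf8,
}

fn is_numeric(t: Ty) -> bool {
    matches!(t, Ty::Int64 | Ty::Float64)
}

/// Numeric-preferring rule (stand-in for `comparison_coercion`).
fn comparison_rule(lhs: Ty, rhs: Ty) -> Option<Ty> {
    match (lhs, rhs) {
        (a, b) if a == b => Some(a),
        (Ty::Int64, Ty::Float64) | (Ty::Float64, Ty::Int64) => Some(Ty::Float64),
        (Ty::Utf8, n) | (n, Ty::Utf8) if is_numeric(n) => Some(n),
        _ => None,
    }
}

/// String-preferring rule (stand-in for `type_union_coercion`).
fn union_rule(lhs: Ty, rhs: Ty) -> Option<Ty> {
    match (lhs, rhs) {
        (a, b) if a == b => Some(a),
        (Ty::Int64, Ty::Float64) | (Ty::Float64, Ty::Int64) => Some(Ty::Float64),
        (Ty::Utf8, _) | (_, Ty::Utf8) => Some(Ty::Utf8),
        _ => None,
    }
}

fn main() {
    // SELECT 1 UNION SELECT 'a': the string type wins.
    println!("union:      {:?}", union_rule(Ty::Int64, Ty::Utf8));
    // int_col = '1': the numeric type wins.
    println!("comparison: {:?}", comparison_rule(Ty::Int64, Ty::Utf8));
}
```

In this model, a caller that used `comparison_coercion_numeric` gets the same answer from the comparison rule, while a caller coercing branches of a `UNION`-like construct needs the string-preferring rule instead.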

**Who is affected:**

- Crates that call `comparison_coercion_numeric` directly
- Crates that call `comparison_coercion` and relied on its old
  string-preferring behavior
- Crates that call `get_coerce_type_for_case_expression`

### `generate_series` and `range` table functions changed

The `generate_series` and `range` table functions now return an empty set when the interval is invalid, instead of an error.