Skip to content

Incorrect cast of integer columns to utf8 when comparing with utf8 constant #15161

@scsmithr

Description

@scsmithr

Describe the bug

A comparison like column1 < '10' (where column1 is an int64) will cast column1 to utf8 instead of casting the utf8 constant to an integer.

Typically string constants in a sql query are treated as unknown, and preference should be on casting the "unknown" value to a target type.

To Reproduce

DataFusion CLI v46.0.0
> create table t1 as (values (1), (2), (3));
0 row(s) fetched. 
Elapsed 0.026 seconds.

> select * from t1 where column1 < '10';
+---------+
| column1 |
+---------+
| 1       |
+---------+
1 row(s) fetched. 
Elapsed 0.015 seconds.

> select * from t1 where column1 < 'hello';
+---------+
| column1 |
+---------+
| 1       |
| 2       |
| 3       |
+---------+
3 row(s) fetched. 
Elapsed 0.007 seconds.

> select arrow_typeof(column1) from t1 limit 1;
+--------------------------+
| arrow_typeof(t1.column1) |
+--------------------------+
| Int64                    |
+--------------------------+
1 row(s) fetched. 
Elapsed 0.010 seconds.

> explain select * from t1 where column1 < '10';
+---------------+-------------------------------------------------------+
| plan_type     | plan                                                  |
+---------------+-------------------------------------------------------+
| logical_plan  | Filter: CAST(t1.column1 AS Utf8) < Utf8("10")         |
|               |   TableScan: t1 projection=[column1]                  |
| physical_plan | CoalesceBatchesExec: target_batch_size=8192           |
|               |   FilterExec: CAST(column1@0 AS Utf8) < 10            |
|               |     DataSourceExec: partitions=1, partition_sizes=[1] |
|               |                                                       |
+---------------+-------------------------------------------------------+
2 row(s) fetched. 
Elapsed 0.009 seconds.

Expected behavior

column1 not to be cast to a string.

Postgres output:

postgres=# create table t1 as (values (1), (2), (3));
SELECT 3
postgres=# select * from t1 where column1 < '10';
 column1 
---------
       1
       2
       3
(3 rows)

postgres=# select * from t1 where column1 < 'hello';
ERROR:  invalid input syntax for type integer: "hello"
LINE 1: select * from t1 where column1 < 'hello';

Additional context

No response

Metadata

Metadata

Assignees

Labels

bugSomething isn't working

Type

No type
No fields configured for issues without a type.

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions