[flink][spark] Partition pruning gives wrong results for non-STRING partition keys with range predicates #3292

@fresh-borzoni

Search before asking

  • I searched in the issues and found nothing similar.

Description

When pushing down partition filters, both the Spark and Flink connectors convert everything to strings, predicate literals and partition values alike, before evaluating the filter. Equality predicates still work, but range comparisons fall back to lexicographic string order.

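So with an INT partition column and partitions pt=2 and pt=10, the query WHERE pt > 2 is evaluated as the string comparison "10" > "2". That is false, because "10" sorts before "2" lexicographically, so pt=10 is pruned and its rows go missing from the result.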
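
The failure mode can be reproduced in isolation. The sketch below is plain Java, not connector code: the class and method names are hypothetical and only illustrate how string order and integer order disagree for the two partition values in this example.

```java
public class StringOrderDemo {

    // Hypothetical stand-in for evaluating `pt > literal` on stringified values.
    static boolean greaterThanAsString(String partitionValue, String literal) {
        return partitionValue.compareTo(literal) > 0;
    }

    // The same predicate evaluated after parsing both sides as INT.
    static boolean greaterThanAsInt(String partitionValue, String literal) {
        return Integer.parseInt(partitionValue) > Integer.parseInt(literal);
    }

    public static void main(String[] args) {
        // '1' sorts before '2', so the string comparison rejects pt=10.
        System.out.println(greaterThanAsString("10", "2")); // prints false
        System.out.println(greaterThanAsInt("10", "2"));    // prints true
    }
}
```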

We can build the partition row with typed values using the existing PartitionUtils.parseValueOfType and skip stringification in both the Flink and Spark connectors.
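
A rough sketch of what typed evaluation could look like is below. It is an illustration under assumptions, not the proposed patch: `parseTyped` is a hypothetical stand-in for PartitionUtils.parseValueOfType (whose real signature is not shown in this issue), and `ColumnType` and `pruneGreaterThan` are made up for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TypedPruningSketch {

    // Hypothetical, simplified set of partition column types.
    enum ColumnType { INT, BIGINT, STRING }

    // Hypothetical stand-in for PartitionUtils.parseValueOfType:
    // turns the raw partition string into a value of the column's type.
    static Comparable<?> parseTyped(String raw, ColumnType type) {
        switch (type) {
            case INT:
                return Integer.valueOf(raw);
            case BIGINT:
                return Long.valueOf(raw);
            default:
                return raw;
        }
    }

    // Keeps the partitions whose typed value is greater than the typed literal.
    @SuppressWarnings({"unchecked", "rawtypes"})
    static List<String> pruneGreaterThan(
            List<String> partitionValues, String literal, ColumnType type) {
        Comparable typedLiteral = parseTyped(literal, type);
        List<String> kept = new ArrayList<>();
        for (String raw : partitionValues) {
            Comparable typedValue = parseTyped(raw, type);
            if (typedValue.compareTo(typedLiteral) > 0) {
                kept.add(raw);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> partitions = Arrays.asList("2", "10");
        // Typed as INT, pt=10 survives WHERE pt > 2.
        System.out.println(pruneGreaterThan(partitions, "2", ColumnType.INT)); // prints [10]
    }
}
```

Passing `ColumnType.STRING` for the same inputs returns an empty list, which matches the behaviour reported above.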

Willingness to contribute

  • I'm willing to submit a PR!
