chore: Cast module refactor boolean module#3491
Conversation
|
@andygrove , @comphead , This is the first part of refactoring out cast ops by data type (and move common functions to utilities). I also went ahead and added tests and benchmarks. Subsequent PRs will follow a similar approach with refactor and all the cast ops are static dispatch with room for additional standardization in the future |
|
CI passed. I believe the PR is ready for your review @andygrove , @comphead |
|
Rebased with main and moved cast_boolean_to_decimal to boolean cast file |
|
Thanks @coderfender. Could you fill in the PR description to cover what is included in this PR, since it looks like a combination of functional changes, refactoring, tests, and benchmarks. I am wondering if would be better to just do the refactoring in this PR and then follow up with a separate PR for the functional changes? |
|
@andygrove , thank you for the review. Flow on main branch :
Feature
Let me create a new PR just for the functional changes / bug fixes to keep changes isolated. However, it might be easier to first merge the functional changes and then continue refactoring IMO |
19d5899 to
b9d75d5
Compare
|
@andygrove , thank you for removing dead cast code in comet. Given that we removed dead cast code, I believe this PR is ready for a review . I was wondering if you would prefer splitting the PR into smaller PRs (one for benchmarks, another for tests, and another for utils) or would you think we can go ahead with this one PR ? |
72ac5f6 to
aac9fc6
Compare
| ), | ||
| DataType::Int8 | DataType::Int16 | DataType::Int32 | DataType::Int64 => { | ||
| matches!( | ||
| is_identity_cast(from_type, to_type) |
There was a problem hiding this comment.
What is the reason for extracting from_type == to_type into a dedicated function that is only called here?
There was a problem hiding this comment.
This was largely due to epeated cast check (no longer an issue after your changes to remove deadcode are merged to main) . I will go ahead and remove this
9cfed4e to
a629848
Compare
a629848 to
05ece88
Compare
| | DataType::Float64 | ||
| | DataType::Utf8 | ||
| ), | ||
| DataType::Boolean => is_df_cast_from_bool_spark_compatible(to_type), |
There was a problem hiding this comment.
This is much easier to review now 👍
There was a problem hiding this comment.
Thank you . I will stick to only code refactoring and plan other cleanups / enhancements in different PRs
andygrove
left a comment
There was a problem hiding this comment.
LGTM pending CI. Thanks @coderfender
|
Thank you for the approval @andygrove |
Which issue does this PR close?
Part 1 of #3459
Rationale for this change
Cast module is currently 3700 LOC and the goal is to refactor this into separate files based on datatype
What changes are included in this PR?
Functional changes :
can_cast_to_booleanwould not support cast toUtf8but the matches macro did. This was mostly due to code duplication among various modules. Alsocan_cast_from_booleanto make it consistentUpdate : No functional changes are contained in this PR
Refactoring Changes
(From) Boolean refactor:
Code :
Unit tests:
Benchmarks:
Created new benchmarks to assess performance
Utils.rs
spark_cast_postprocess(), cast_overflow(), invalid_value()toutils.rsNo functional changes have been made and only code is moved around to the right modules
How are these changes tested?