refactor(shim): add new shim infra with enableIfVer macro to avoid shims duplication#4657
refactor(shim): add new shim infra with enableIfVer macro to avoid shims duplication#4657rluvaton wants to merge 5 commits into
enableIfVer macro to avoid shims duplication#4657Conversation
9f6e6c7 to
4e36d4f
Compare
|
@andygrove can you please review? I think this will improve the scala code quality greatly |
|
Maybe @mbutrovich can look at it as well? |
|
Thanks @rluvaton. I will take a look at this soon, but I am concerned that this is more complex than the current approach. How well does this approach work with IDEs, such as IntelliJ? |
|
+0 for this, I prefer the separated source directories approach; introducing potentially incompatible classes or methods into a single source file seems more difficult to maintain. |
I think this will: Reduce line countyou don't need to duplicate shims code when they are the same Make it easier to readWhen you look at specific shim directory like Adding shims from specific spark version and upfor example, with this pr you could write: @enableIfVer(spark = ">= 3.5.1")
def isHashPartitioningLike(partitioning: Partitioning): Boolean = {
import org.apache.spark.sql.catalyst.plans.physical.HashPartitioningLike
partitioning match {
case hashLike: HashPartitioningLike => true
case par => false
}
}
@enableIfVer(spark = "< 3.5.1")
def isHashPartitioningLike(partitioning: Partitioning): Boolean = {
import org.apache.spark.sql.catalyst.plans.physical.HashPartitioning
partitioning match {
case hash: HashPartitioning => true
case par => false
}
} |
you can use wildcard imports to get around it in most cases or scope the imports to the used code. |
Note
This does not remove all the shims in favor of the new infra as it will be very large PR
instead it moves some stuff to show how it is being used and the process of doing so
A small macro-annotation library that lets a single source file carry code for multiple
Spark versions. You annotate a member (
def/class/object) with a version range, and at *compile time* the macro keeps it, drops it, or tweaks it based on the version the build is
targeting.
The problem it solves
Version-specific code used to live in parallel source folders (
src/main/spark-3.4/…,spark-3.5/…,spark-4.0/…) - so a whole file got copied per version just because one method differed, or because a type likeWindowGroupLimitExeconly exists in Spark 3.5+.@enableIfVerlets those differences live inline, in one file: no duplication, and the version logic sits right next to the code it gates.How you use it
The targeted versions are supplied by the build per "dimension" (
spark). You write a semver rangeThe four behaviors
@enableIfVer@implementIfVer@enableOverrideIfVeroverride(for a member that only overrides on some versions)Ranges
Matched by semver4j: full
major.minor.patchversions with>>=<<==!=, space = AND,||= OR,A - Bhyphen ranges, x-ranges / partials and more (FYI, it supportspackage.jsonformat)What it gives us
version) is removed before type-checking, so each build only ever sees valid code.
"not enabled" state - all from one unified
@enableIfVernamespace.Caveats
@enableIfVerannotates definitions only - not statements (e.g. a baretest(...)call). Togate a statement, wrap it in a
defand annotate that.limitation). For whole top-level classes that are entirely version-specific, the version-specific
source folder is still the right tool.
Apache Auron (Blaze)
most of the annotation code taken from Blaze/auron in sparkver.scala but updated the syntax of matching versions and replace
@sparkverEnableMemberswith@implementIfVerthat remove the inheritance as well - useful for abstract classBlaze uses
/to split between supported versions so for example like this:This however use semver match syntax that is more widely known so the same code for us will be:
and if you want to automatically have this function for every spark version above 3.0 and not only until 4.1, you can do:
the syntax is matching the one used in
package.jsonalso, Auron name is
sparkver, I namedenableIfVerinstead so future shims (like iceberg/delta/etc) can use the same infra