refactor(shim): add new shim infra with `enableIfVer` macro to avoid shims duplication by rluvaton · Pull Request #4657 · apache/datafusion-comet

rluvaton · 2026-06-15T14:00:58Z

Note

This does not remove all the shims in favor of the new infra as it will be very large PR
instead it moves some stuff to show how it is being used and the process of doing so

A small macro-annotation library that lets a single source file carry code for multiple
Spark versions. You annotate a member (def / class / object) with a version range, and at *
compile time* the macro keeps it, drops it, or tweaks it based on the version the build is
targeting.

The problem it solves

Version-specific code used to live in parallel source folders (src/main/spark-3.4/…, spark-3.5/…, spark-4.0/…) - so a whole file got copied per version just because one method differed, or because a type like WindowGroupLimitExec only exists in Spark 3.5+. @enableIfVer lets those differences live inline, in one file: no duplication, and the version logic sits right next to the code it gates.

How you use it

The targeted versions are supplied by the build per "dimension" (spark). You write a semver range

import org.apache.comet.enableIfVer

// keep this method only on Spark 3.5+
@enableIfVer(spark = ">=3.5.0")
def onlyOn35Plus(): Unit =
...

// two same-named methods, one per version range - exactly one survives per build
@enableIfVer(spark = "<3.5.0") def isWGL(p: SparkPlan): Boolean = false
@enableIfVer(spark = ">=3.5.0") def isWGL(p: SparkPlan): Boolean = p.isInstanceOf[WindowGroupLimitExec]

The four behaviors

Annotation	On a non-matching version
`@enableIfVer`	drops the member entirely
`@implementIfVer`	keeps the class/object but empties its body and remove the inheritance (so a type that references 3.5-only symbols still exists on 3.4)
`@enableOverrideIfVer`	strips `override` (for a member that only overrides on some versions)

Ranges

Matched by semver4j: full major.minor.patch versions with > >= < <= = !=, space = AND, || = OR, A - B hyphen ranges, x-ranges / partials and more (FYI, it supports package.json format)

What it gives us

No more per-version source-folder duplication for small differences.
One file, version logic inline - easier to read and maintain.
Compile-time safety: non-matching code (and its references to types that don't exist in that
version) is removed before type-checking, so each build only ever sees valid code.
Extensible & general: any number of dimensions, combined with AND/OR, plus an explicit
"not enabled" state - all from one unified @enableIfVer namespace.

Caveats

@enableIfVer annotates definitions only - not statements (e.g. a bare test(...) call). To
gate a statement, wrap it in a def and annotate that.
It cannot be applied to a top-level class that has a companion object (a macro-paradise
limitation). For whole top-level classes that are entirely version-specific, the version-specific
source folder is still the right tool.

Apache Auron (Blaze)

most of the annotation code taken from Blaze/auron in sparkver.scala but updated the syntax of matching versions and replace @sparkverEnableMembers with @implementIfVer that remove the inheritance as well - useful for abstract class

Blaze uses / to split between supported versions so for example like this:

@sparkver("3.0 / 3.1 / 3.2 / 3.3 / 3.4 / 3.5 / 4.0 / 4.1")
override def simpleString(maxFields: Int): String =
  s"$nodeName (${basedHiveScan.simpleString(maxFields)})"

This however use semver match syntax that is more widely known so the same code for us will be:

@enableIfVer(spark = ">=3.0.0 <4.2.0")
override def simpleString(maxFields: Int): String =
  s"$nodeName (${basedHiveScan.simpleString(maxFields)})"

and if you want to automatically have this function for every spark version above 3.0 and not only until 4.1, you can do:

@enableIfVer(spark = ">=3.0.0")
override def simpleString(maxFields: Int): String =
  s"$nodeName (${basedHiveScan.simpleString(maxFields)})"

the syntax is matching the one used in package.json

also, Auron name is sparkver, I named enableIfVer instead so future shims (like iceberg/delta/etc) can use the same infra

…im duplication

rluvaton · 2026-06-16T13:43:29Z

@andygrove can you please review? I think this will improve the scala code quality greatly

rluvaton · 2026-06-16T15:55:11Z

Maybe @mbutrovich can look at it as well?

andygrove · 2026-06-16T18:50:50Z

Thanks @rluvaton. I will take a look at this soon, but I am concerned that this is more complex than the current approach. How well does this approach work with IDEs, such as IntelliJ?

wForget · 2026-06-17T02:29:59Z

+0 for this, I prefer the separated source directories approach; introducing potentially incompatible classes or methods into a single source file seems more difficult to maintain.

rluvaton · 2026-06-17T10:18:20Z

Thanks @rluvaton. I will take a look at this soon, but I am concerned that this is more complex than the current approach.

I think this will:

Reduce line count

you don't need to duplicate shims code when they are the same

Make it easier to read

When you look at specific shim directory like spark 3.5 you should also pay attention to the shim directory spark 3.x instead of just looking at the shim code and understand right away

Adding shims from specific spark version and up

for example, HashPartitioningLike was added in spark 4 and backported to spark 3.5 and spark 3.4 and released in spark 3.5.1
in the current approach, how would you create a function that check if spark partitioning is hash based if the HashPartitioningLike was not backported to spark 3.4 which you still have shims for?

with this pr you could write:

@enableIfVer(spark = ">= 3.5.1")
def isHashPartitioningLike(partitioning: Partitioning): Boolean = {
  import org.apache.spark.sql.catalyst.plans.physical.HashPartitioningLike
  
  partitioning match {
    case hashLike: HashPartitioningLike => true
    case par => false
  }
}

@enableIfVer(spark = "< 3.5.1")
def isHashPartitioningLike(partitioning: Partitioning): Boolean = {
  import org.apache.spark.sql.catalyst.plans.physical.HashPartitioning
  
  partitioning match {
    case hash: HashPartitioning => true
    case par => false
  }
}

rluvaton · 2026-06-17T10:19:51Z

+0 for this, I prefer the separated source directories approach; introducing potentially incompatible classes or methods into a single source file seems more difficult to maintain.

you can use wildcard imports to get around it in most cases or scope the imports to the used code.

refactor(shim): add new shim infra with enableIfVer macro to avoid sh…

4e36d4f

…im duplication

rluvaton force-pushed the add-annotation-based-shims branch from 9f6e6c7 to 4e36d4f Compare June 15, 2026 14:01

fix ci

66576cf

rluvaton requested review from andygrove and comphead and removed request for comphead June 15, 2026 15:22

rluvaton added 2 commits June 15, 2026 19:29

fix ci

f3c6e53

include pre-release

f917793

rluvaton requested review from comphead and mbutrovich June 15, 2026 17:13

fix ci

4720c2e

rluvaton mentioned this pull request Jun 16, 2026

suggestion to improve spark shims - use annotation like Apache auron to match different spark versions apache/iceberg#16835

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

refactor(shim): add new shim infra with `enableIfVer` macro to avoid shims duplication#4657

refactor(shim): add new shim infra with `enableIfVer` macro to avoid shims duplication#4657
rluvaton wants to merge 5 commits into
apache:mainfrom
rluvaton:add-annotation-based-shims

rluvaton commented Jun 15, 2026 •

edited

Loading

Uh oh!

rluvaton commented Jun 16, 2026

Uh oh!

rluvaton commented Jun 16, 2026

Uh oh!

andygrove commented Jun 16, 2026

Uh oh!

wForget commented Jun 17, 2026

Uh oh!

rluvaton commented Jun 17, 2026 •

edited

Loading

Uh oh!

rluvaton commented Jun 17, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

rluvaton commented Jun 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

The problem it solves

How you use it

The four behaviors

Ranges

What it gives us

Caveats

Apache Auron (Blaze)

Uh oh!

rluvaton commented Jun 16, 2026

Uh oh!

rluvaton commented Jun 16, 2026

Uh oh!

andygrove commented Jun 16, 2026

Uh oh!

wForget commented Jun 17, 2026

Uh oh!

rluvaton commented Jun 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reduce line count

Make it easier to read

Adding shims from specific spark version and up

Uh oh!

rluvaton commented Jun 17, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

rluvaton commented Jun 15, 2026 •

edited

Loading

rluvaton commented Jun 17, 2026 •

edited

Loading