Commit 4820e97
* Add Apache Spark 4.0 support (#787)
Adds an in-tree `frameless-*-spark40` module set targeting Spark 4.0.2,
cross-built for Scala 2.13 only (Spark 4 dropped 2.12) and requiring JDK 17.
No external shim dependency: version-divergent Catalyst access is isolated
behind FramelessInternals in a `src/main/spark-4` source overlay, mirroring
the existing spark-3 / spark-3.4+ pattern.
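
A minimal sketch of how a per-version source overlay could be selected. The helper name `overlayFor` and its selection rules are hypothetical; only the three directory names come from this commit, and the real build may choose them differently.

```scala
object SourceOverlay {
  // Hypothetical helper: map a Spark version string such as "4.0.2"
  // to the name of the overlay directory under src/main.
  def overlayFor(sparkVersion: String): String = {
    val parts = sparkVersion.split('.')
    val major = parts(0).toInt
    val minor = parts(1).toInt
    if (major >= 4) "spark-4"
    else if (major == 3 && minor >= 4) "spark-3.4+"
    else "spark-3"
  }
}

// In build.sbt the result could be added to the compile sources, e.g.:
//   Compile / unmanagedSourceDirectories +=
//     (Compile / sourceDirectory).value / SourceOverlay.overlayFor(sparkVersion)
// where `sparkVersion` is assumed to be a String defined elsewhere in the build.
```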
Key adaptations for Spark 4:
- Column no longer wraps a Catalyst Expression; bridge through
classic.ExpressionUtils.column and an eager ColumnNodeToExpressionConverter
(the lazy ColumnNodeExpression is Unevaluable and hides children, which broke
self-join disambiguation and codegen).
- Dataset/SparkSession split into abstract API + classic impl; internal
helpers downcast to classic for logicalPlan/sessionState/sqlContext.
- ExpressionEncoder now takes a leading AgnosticEncoder (SPARK-49025); supply a
metadata-only JavaBeanEncoder stand-in carrying the right ClassTag.
- AnalysisException is errorClass-based; MapGroups gets a spark-4 variant.
- joinCross re-encodes its result via TypedExpressionEncoder, consistent with
the other joins.
- Hide the new catalyst expressions.With from TypedColumn's wildcard import.
Test harness: disable ANSI mode (on by default in Spark 4) so the property
generators keep their wrap-around/null semantics, and strip field metadata in
SchemaTests. Both harness changes are no-ops on Spark 3.x.
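
To illustrate why the generators depend on non-ANSI semantics, here is a plain-Scala sketch (no Spark dependency) contrasting wrap-around addition with overflow-checked addition. The object and method names are illustrative only; in a Spark session the corresponding switch is the `spark.sql.ansi.enabled` configuration key.

```scala
object AnsiSemantics {
  // Non-ANSI behaviour: Int addition silently wraps around on overflow.
  def wrappingAdd(a: Int, b: Int): Int = a + b

  // ANSI-like behaviour: overflow is reported as an error instead of wrapping.
  def checkedAdd(a: Int, b: Int): Either[String, Int] =
    try Right(Math.addExact(a, b))
    catch { case e: ArithmeticException => Left(e.getMessage) }
}

// A test session would typically be built with ANSI mode switched off, e.g.:
//   SparkSession.builder().config("spark.sql.ansi.enabled", "false")
```

A property that expects `Int.MaxValue + 1` to equal `Int.MinValue` holds under the first definition and fails with an error under the second.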
CI: add a JDK 17 leg and pin root-spark40 to Scala 2.13 / JDK 17.
dataset-spark40 passes 414/414 tests; verified end-to-end on a 2-worker
standalone Spark 4.0.2 cluster (groupBy/agg, self-join, joinWith, executor
closures) to confirm cross-node serialization.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* Apply scalafmt formatting
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* Make docs/mdoc build on JDK 17 (site CI job)
Adding a JDK 17 CI leg for Spark 4 made sbt-typelevel run the Generate Site
job on JDK 17 (it picks the last configured Java). mdoc executes Spark code,
which needs the module --add-opens flags on JDK 17. Fork the docs run, pass the
flags through (extracted into sparkJava17Options, shared with the test config),
and pin the forked run's working directory to the repo root so docs keep
finding their relative data files (docs/iris.data).
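
A hedged build.sbt sketch of the approach described above. The flag list is an illustrative subset of the `--add-opens` options Spark needs on JDK 17, and the scoping of the keys (`run / ...`) is an assumption; the actual build may attach them to different scopes for mdoc.

```scala
// build.sbt fragment (sketch, not the project's actual build definition)

// Illustrative subset of the module flags Spark needs on JDK 17.
lazy val sparkJava17Options: Seq[String] = Seq(
  "--add-opens=java.base/java.lang=ALL-UNNAMED",
  "--add-opens=java.base/java.nio=ALL-UNNAMED",
  "--add-opens=java.base/java.util=ALL-UNNAMED",
  "--add-opens=java.base/sun.nio.ch=ALL-UNNAMED"
)

lazy val docs = project
  .settings(
    // JVM flags only take effect in a forked JVM.
    run / fork := true,
    run / javaOptions ++= sparkJava17Options,
    // Use the repo root as the forked JVM's working directory so that
    // relative paths such as docs/iris.data keep resolving.
    run / baseDirectory := (LocalRootProject / baseDirectory).value
  )
```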
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* Add MiMa filters for FramelessInternals compat-seam changes
The Spark 4 port reworked FramelessInternals (internal version-compat plumbing,
not intended public API): `column` is now the Expression->Column bridge and
`mkDataset` derives the session from the source Dataset instead of taking a
SQLContext. Both are binary-incompatible signature changes flagged by MiMa
against the 3.x baselines (0.14.0/0.14.1), so exclude them.
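
A minimal sketch of what such exclusions can look like in build.sbt. The fully qualified name `org.apache.spark.sql.FramelessInternals` is assumed from the file tree, and the catch-all `Problem` type is used here because the exact problem classes MiMa reported are not stated in this message.

```scala
// build.sbt fragment (sketch)
import com.typesafe.tools.mima.core._

mimaBinaryIssueFilters ++= Seq(
  // `Problem` matches any issue reported for the member; a narrower type
  // (as printed in MiMa's report) is preferable when it is known.
  ProblemFilters.exclude[Problem]("org.apache.spark.sql.FramelessInternals.column"),
  ProblemFilters.exclude[Problem]("org.apache.spark.sql.FramelessInternals.mkDataset")
)
```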
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* Fix Scala 2.12 scaladoc: use backticks instead of [[]] links
Scala 2.12's scaladoc fails (fatally) on [[Expression]] / [[Column]] /
[[ExpressionEncoder]] doc links in FramelessInternals because those Spark types
aren't resolvable in the doc scope. Use backticks (code spans) instead.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* Add value-level self-join regression test
The existing self-join tests only compare row counts. This test collects and
verifies the decoded (T, U) tuples through the colLeft/colRight disambiguation
path - a regression guard for the Spark 4 ColumnNode rework, which broke that
path. Count-only coverage would have missed a subtly wrong decode.
Passes unchanged on Spark 3.x.
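
The difference between count-level and value-level checks can be sketched with plain Scala collections standing in for the Dataset API. All names here are illustrative and this is not the test added by the commit: the "ambiguous" variant mimics a decode in which the right side is resolved against the left side's columns.

```scala
object SelfJoinCheck {
  final case class Node(id: Int, next: Int)

  // Correct self-join: pair each node with the node its `next` points to.
  def selfJoin(nodes: Seq[Node]): Seq[(Node, Node)] =
    for (l <- nodes; r <- nodes if l.next == r.id) yield (l, r)

  // Faulty variant: the right side is decoded from the left row,
  // as could happen when column references are ambiguous.
  def ambiguousSelfJoin(nodes: Seq[Node]): Seq[(Node, Node)] =
    for (l <- nodes; r <- nodes if l.next == r.id) yield (l, l)

  val nodes: Seq[Node] = Seq(Node(1, 2), Node(2, 3), Node(3, 1))
}
```

Both joins return three rows, so a count comparison passes for either one; only comparing the collected tuples tells them apart.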
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* Keep imports closer to source in TypedExpressionEncoder
Revert the opinionated merge of the standalone `import ...Encoder` into a
braced group; add FramelessInternals as a separate plain import instead.
scalafmt does not merge imports, so this stays linter-clean while staying
closer to the original source.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* Keep Spark 3.5 the default version, drop Spark 3.3 artifacts
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Grigory Pomadchin <grigory.pomadchin@disneystreaming.com>
1 parent 502bb54 commit 4820e97
19 files changed
Lines changed: 3834 additions & 2283 deletions
File tree
- .github/workflows
- dataset/src
  - main
    - scala
      - frameless
        - ops
      - org/apache/spark/sql
    - spark-3.4+/org/apache/spark/sql
    - spark-3/org/apache/spark/sql
    - spark-4
      - frameless
      - org/apache/spark/sql
  - test
    - scala/frameless
      - forward
    - spark-3.3+/frameless/sql/rules