
Commit 393507d

schenksj and claude committed
feat(phase-7): Spark 3.4/4.0 Delta version matrix + TPC-DS docs
Adds delta-spark test dependencies for all three Spark profiles:

- spark-3.4: delta-spark 2.4.0 (Delta 2.x, no DV support)
- spark-3.5: delta-spark 3.3.2 (Delta 3.x, full DV/column-mapping support)
- spark-4.0: delta-spark 4.0.0 (Delta 4.x, experimental)

The spark-3.4 and spark-4.0 additions each include the failureaccess companion dependency and the Spark/Hadoop exclusions, matching the spark-3.5 pattern. The CI workflow now tests all three Spark versions in its matrix.

Also adds TPC-DS plan stability fixture generation docs to the contributor guide (the procedure for adding q*.native_delta_compat golden files).

Documentation updated: Spark 3.4 and 4.0 are no longer listed as unsupported; they are now listed with their Delta version constraints.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
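The Spark-to-Delta pairing above can be summarized as a lookup table. The Python sketch below is an illustration only and is not part of this commit: the helper name `delta_spark_coordinate` is hypothetical, and it assumes the `io.delta:delta-spark_<scala-binary>:<version>` coordinate shape implied by the pom's `delta-spark_${scala.binary.version}` artifact ID.

```python
# Hypothetical helper: maps a Spark minor version to the delta-spark
# test dependency pinned for that profile in this commit.
DELTA_FOR_SPARK = {
    "3.4": "2.4.0",  # Delta 2.x
    "3.5": "3.3.2",  # Delta 3.x
    "4.0": "4.0.0",  # Delta 4.x (experimental)
}


def delta_spark_coordinate(spark_version: str, scala_binary: str) -> str:
    """Return a Maven coordinate such as io.delta:delta-spark_2.12:2.4.0."""
    # Reduce a full version such as "3.4.3" to its "major.minor" key.
    short = ".".join(spark_version.split(".")[:2])
    if short not in DELTA_FOR_SPARK:
        raise ValueError(f"no delta-spark version pinned for Spark {spark_version}")
    return f"io.delta:delta-spark_{scala_binary}:{DELTA_FOR_SPARK[short]}"


if __name__ == "__main__":
    # The Scala binary versions passed here are assumptions for illustration.
    print(delta_spark_coordinate("3.4.3", "2.12"))
    print(delta_spark_coordinate("4.0.1", "2.13"))
```

A coordinate built this way could, for example, be passed to `spark-submit --packages` to reproduce one profile's Delta setup outside Maven; the Scala binary version has to match the Spark build in use.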
1 parent 2461794 commit 393507d

4 files changed

Lines changed: 73 additions & 2 deletions


.github/workflows/delta_spark_test.yml

Lines changed: 4 additions & 1 deletion
@@ -103,7 +103,10 @@ jobs:
       matrix:
         os: [ubuntu-24.04]
         java-version: [17]
-        spark-version: [{short: '3.5', full: '3.5.8'}]
+        spark-version:
+          - {short: '3.4', full: '3.4.3'}
+          - {short: '3.5', full: '3.5.8'}
+          - {short: '4.0', full: '4.0.1'}
       fail-fast: false
     name: delta-native/${{ matrix.os }}/spark-${{ matrix.spark-version.full }}/java-${{ matrix.java-version }}
     runs-on: ${{ matrix.os }}
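GitHub Actions expands a matrix into the Cartesian product of its values, so the new matrix yields three jobs (one OS × one Java version × three Spark versions), and `fail-fast: false` keeps a failure on one Spark version from cancelling the others. The Python sketch below only imitates that expansion to preview the job names produced by the `name:` template; it is not how Actions itself is implemented.

```python
from itertools import product

# Matrix values copied from the updated workflow.
oses = ["ubuntu-24.04"]
java_versions = [17]
spark_versions = [
    {"short": "3.4", "full": "3.4.3"},
    {"short": "3.5", "full": "3.5.8"},
    {"short": "4.0", "full": "4.0.1"},
]

# Render every combination with the workflow's name template:
# delta-native/<os>/spark-<spark-version.full>/java-<java-version>
job_names = [
    f"delta-native/{os_name}/spark-{spark['full']}/java-{java}"
    for os_name, java, spark in product(oses, java_versions, spark_versions)
]

for name in job_names:
    print(name)
```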

docs/source/contributor-guide/delta-spark-tests.md

Lines changed: 21 additions & 0 deletions
@@ -102,3 +102,24 @@ python benchmarks/tpc/tpcbench.py \
 
 Engine configs are in `benchmarks/tpc/engines/comet-delta.toml` and
 `benchmarks/tpc/engines/comet-delta-hashjoin.toml`.
+
+## TPC-DS Plan Stability Fixtures
+
+To generate the `q*.native_delta_compat/` plan stability golden files:
+
+1. Add `CometConf.SCAN_NATIVE_DELTA_COMPAT` to the `scanImpls` list in
+   `CometPlanStabilitySuite.scala` (line 66).
+
+2. Ensure TPC-DS data exists as Delta tables (see `create-delta-tables.py`
+   in the Benchmarks section above).
+
+3. Generate golden files:
+
+   ```bash
+   SPARK_GENERATE_GOLDEN_FILES=1 ./mvnw -pl spark \
+     -Dsuites=org.apache.spark.sql.comet.CometTPCDSV1_4_PlanStabilitySuite \
+     test -Pspark-3.5 -Dmaven.gitcommitid.skip
+   ```
+
+4. Commit the generated `q*.native_delta_compat/extended.txt` files under
+   `spark/src/test/resources/tpcds-plan-stability/approved-plans-*`.
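For step 4, a quick way to see which golden files a generation run produced is to glob for the pattern the guide names. This is a hypothetical helper (`find_delta_golden_files` is not part of Comet) that assumes it is run from the repository root.

```python
from pathlib import Path

# Directory taken from step 4; assumes the current directory is the repo root.
PLAN_ROOT = Path("spark/src/test/resources/tpcds-plan-stability")


def find_delta_golden_files(root: Path = PLAN_ROOT) -> list[Path]:
    """Return generated native_delta_compat plan files, sorted for stable output."""
    pattern = "approved-plans-*/q*.native_delta_compat/extended.txt"
    return sorted(root.glob(pattern))


if __name__ == "__main__":
    files = find_delta_golden_files()
    for path in files:
        print(path)
    print(f"{len(files)} file(s) to commit")
```

Listing the files this way before `git add` makes it easy to spot a query whose plan was not regenerated.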

docs/source/user-guide/latest/delta.md

Lines changed: 2 additions & 1 deletion
@@ -126,4 +126,5 @@ The following scenarios will fall back to Spark's native Delta reader:
 - Tables with `rowTracking` enabled
 - Change Data Feed (`readChangeFeed`) queries
 - The `_metadata.row_index` virtual column
-- Spark 3.4 and 4.0 (currently tested on Spark 3.5 only)
+- Spark 3.4 uses Delta 2.4.x (DVs not supported in Delta 2.x; simpler feature set)
+- Spark 4.0 uses Delta 4.0.x (experimental)

spark/pom.xml

Lines changed: 46 additions & 0 deletions
@@ -218,6 +218,29 @@ under the License.
           <version>9.4.53.v20231009</version>
           <scope>test</scope>
         </dependency>
+        <!-- Delta 2.4.x targets Spark 3.4.x -->
+        <dependency>
+          <groupId>io.delta</groupId>
+          <artifactId>delta-spark_${scala.binary.version}</artifactId>
+          <version>2.4.0</version>
+          <scope>test</scope>
+          <exclusions>
+            <exclusion>
+              <groupId>org.apache.spark</groupId>
+              <artifactId>*</artifactId>
+            </exclusion>
+            <exclusion>
+              <groupId>org.apache.hadoop</groupId>
+              <artifactId>*</artifactId>
+            </exclusion>
+          </exclusions>
+        </dependency>
+        <dependency>
+          <groupId>com.google.guava</groupId>
+          <artifactId>failureaccess</artifactId>
+          <version>1.0.2</version>
+          <scope>test</scope>
+        </dependency>
       </dependencies>
     </profile>

@@ -299,6 +322,29 @@ under the License.
           <version>11.0.24</version>
           <scope>test</scope>
         </dependency>
+        <!-- Delta 4.0.x targets Spark 4.0.x -->
+        <dependency>
+          <groupId>io.delta</groupId>
+          <artifactId>delta-spark_${scala.binary.version}</artifactId>
+          <version>4.0.0</version>
+          <scope>test</scope>
+          <exclusions>
+            <exclusion>
+              <groupId>org.apache.spark</groupId>
+              <artifactId>*</artifactId>
+            </exclusion>
+            <exclusion>
+              <groupId>org.apache.hadoop</groupId>
+              <artifactId>*</artifactId>
+            </exclusion>
+          </exclusions>
+        </dependency>
+        <dependency>
+          <groupId>com.google.guava</groupId>
+          <artifactId>failureaccess</artifactId>
+          <version>1.0.2</version>
+          <scope>test</scope>
+        </dependency>
       </dependencies>
     </profile>
     <profile>
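Both new pom blocks follow one shape: a test-scoped `delta-spark` dependency that excludes every `org.apache.spark` and `org.apache.hadoop` artifact (presumably so Spark and Hadoop come from the profile rather than from Delta's transitive dependencies), followed by `failureaccess`. The sketch below checks the exclusion half of that pattern with Python's standard `xml.etree.ElementTree`; the inline pom fragment and the function `delta_deps_by_profile` are illustrative assumptions, not the real `spark/pom.xml` or an existing build check.

```python
import xml.etree.ElementTree as ET

# Abbreviated, illustrative pom fragment (not the real spark/pom.xml).
POM = """<project xmlns="http://maven.apache.org/POM/4.0.0">
  <profiles>
    <profile>
      <id>spark-3.4</id>
      <dependencies>
        <dependency>
          <groupId>io.delta</groupId>
          <artifactId>delta-spark_${scala.binary.version}</artifactId>
          <version>2.4.0</version>
          <scope>test</scope>
          <exclusions>
            <exclusion><groupId>org.apache.spark</groupId><artifactId>*</artifactId></exclusion>
            <exclusion><groupId>org.apache.hadoop</groupId><artifactId>*</artifactId></exclusion>
          </exclusions>
        </dependency>
        <dependency>
          <groupId>com.google.guava</groupId>
          <artifactId>failureaccess</artifactId>
          <version>1.0.2</version>
        </dependency>
      </dependencies>
    </profile>
  </profiles>
</project>"""

NS = {"m": "http://maven.apache.org/POM/4.0.0"}
REQUIRED_EXCLUSIONS = {"org.apache.spark", "org.apache.hadoop"}


def delta_deps_by_profile(pom_xml: str) -> dict[str, str]:
    """Map profile id -> delta-spark version, failing if an exclusion is missing."""
    root = ET.fromstring(pom_xml)
    result = {}
    for profile in root.findall("m:profiles/m:profile", NS):
        profile_id = profile.findtext("m:id", namespaces=NS)
        for dep in profile.findall("m:dependencies/m:dependency", NS):
            # Only io.delta dependencies are checked; others (e.g. failureaccess) are skipped.
            if dep.findtext("m:groupId", namespaces=NS) != "io.delta":
                continue
            excluded = {
                ex.findtext("m:groupId", namespaces=NS)
                for ex in dep.findall("m:exclusions/m:exclusion", NS)
            }
            missing = REQUIRED_EXCLUSIONS - excluded
            if missing:
                raise ValueError(f"{profile_id}: missing exclusions {sorted(missing)}")
            result[profile_id] = dep.findtext("m:version", namespaces=NS)
    return result


if __name__ == "__main__":
    print(delta_deps_by_profile(POM))
```

Pointed at the full `spark/pom.xml` contents instead of the inline fragment, the same function would report the delta-spark version pinned in every profile that declares one.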
