Commit 56e1477

Peter Liu and claude committed
Cache Spark download in CI to speed up builds
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 9cf152c commit 56e1477

1 file changed

Lines changed: 15 additions & 1 deletion

.github/workflows/base.yml
@@ -30,18 +30,32 @@ jobs:
           distribution: "corretto"
           java-version: "17"
 
+      - name: Cache Spark and Deequ JAR
+        id: cache-spark
+        uses: actions/cache@v4
+        with:
+          path: |
+            spark-3.5.0-bin-hadoop3
+            deequ_2.12-2.1.0b-spark-3.5.jar
+          key: spark-3.5.0-deequ-2.1.0b
+
       - name: Download Spark 3.5
+        if: steps.cache-spark.outputs.cache-hit != 'true'
         run: |
           curl -L -o spark-3.5.0-bin-hadoop3.tgz \
             https://archive.apache.org/dist/spark/spark-3.5.0/spark-3.5.0-bin-hadoop3.tgz
           tar -xzf spark-3.5.0-bin-hadoop3.tgz
-          echo "SPARK_HOME=$PWD/spark-3.5.0-bin-hadoop3" >> $GITHUB_ENV
+          rm spark-3.5.0-bin-hadoop3.tgz
 
       - name: Download Deequ JAR
+        if: steps.cache-spark.outputs.cache-hit != 'true'
         run: |
           curl -L -o deequ_2.12-2.1.0b-spark-3.5.jar \
             https://github.com/awslabs/python-deequ/releases/download/v2.0.0b1/deequ_2.12-2.1.0b-spark-3.5.jar
 
+      - name: Set SPARK_HOME
+        run: echo "SPARK_HOME=$PWD/spark-3.5.0-bin-hadoop3" >> $GITHUB_ENV
+
       - name: Install Python dependencies
         run: |
           uv pip install -e ".[dev]" --system
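
The restore-or-download flow in this diff can be approximated outside CI with a short shell sketch. It is only an illustration and not part of the commit. The function names `cache_hit`, `fetch_artifacts`, and `spark_home` are hypothetical, and checking for files on disk merely stands in for the `cache-hit` output of `actions/cache`, which reports an exact key match rather than file presence. File names and URLs are copied from the workflow.

```shell
#!/usr/bin/env bash
# Local approximation of the workflow's cache-or-download logic.
# Function names are hypothetical; names and URLs come from the diff.

SPARK_DIR="spark-3.5.0-bin-hadoop3"
DEEQU_JAR="deequ_2.12-2.1.0b-spark-3.5.jar"
SPARK_URL="https://archive.apache.org/dist/spark/spark-3.5.0/${SPARK_DIR}.tgz"
DEEQU_URL="https://github.com/awslabs/python-deequ/releases/download/v2.0.0b1/${DEEQU_JAR}"

# Succeeds when both artifacts already exist under directory $1.
# Assumption: file presence stands in for cache-hit == 'true'.
cache_hit() {
  local root="$1"
  [ -d "${root}/${SPARK_DIR}" ] && [ -f "${root}/${DEEQU_JAR}" ]
}

# Downloads and unpacks both artifacts into directory $1,
# skipping all network work when they are already present.
fetch_artifacts() {
  local root="$1"
  if cache_hit "$root"; then
    return 0
  fi
  (
    cd "$root" &&
    curl -L -o "${SPARK_DIR}.tgz" "$SPARK_URL" &&
    tar -xzf "${SPARK_DIR}.tgz" &&
    rm "${SPARK_DIR}.tgz" &&
    curl -L -o "$DEEQU_JAR" "$DEEQU_URL"
  )
}

# Prints the SPARK_HOME value for directory $1. This is computed
# unconditionally, like the separate "Set SPARK_HOME" step.
spark_home() {
  printf '%s\n' "$1/${SPARK_DIR}"
}
```

A possible invocation is `fetch_artifacts "$PWD" && export SPARK_HOME="$(spark_home "$PWD")"`. Keeping the `SPARK_HOME` assignment outside the download function mirrors why the commit moves the `echo ... >> $GITHUB_ENV` line into its own step: on a cache hit the download steps are skipped, so anything set inside them would be lost.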
