ci: split spark_sql_test workflow per Spark version #4408

Merged
andygrove merged 3 commits into apache:main from andygrove:split-spark-sql-workflows
May 22, 2026

Member

@andygrove andygrove commented May 22, 2026

Summary

Part of #4406

  • Splits the monolithic spark_sql_test.yml (a 7-module x 4-version matrix) into one workflow per Spark version.
  • Shared job logic lives in a new spark_sql_test_reusable.yml invoked via workflow_call; each per-version caller (spark_sql_test_3_4.yml, _3_5.yml, _4_0.yml, _4_1.yml) is ~70 lines and just supplies spark-short, spark-full, and java.
  • Spark 3.5 and 4.0 run on every PR and on main. Spark 3.4 and 4.1 run on main only, plus on-demand on a PR when a committer adds the label run-spark-3.4-tests or run-spark-4.1-tests (works for fork PRs too).
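As a rough illustration of the split described above, a reusable workflow that accepts these values could look like the sketch below. The input names (spark-short, spark-full, java, collect-fallback-logs) come from this PR; the input types, the job name, and the step contents are assumptions for illustration and are not copied from the actual spark_sql_test_reusable.yml.

```yaml
# Sketch of a reusable workflow, e.g. .github/workflows/spark_sql_test_reusable.yml.
# Input names match this PR; types, job name, and steps are hypothetical.
name: Spark SQL Tests (reusable)

on:
  workflow_call:
    inputs:
      spark-short:
        description: Short Spark version, for example '3.5'
        required: true
        type: string
      spark-full:
        description: Full Spark version, for example '3.5.8'
        required: true
        type: string
      java:
        description: JDK major version (assumed to be declared as a number)
        required: true
        type: number
      collect-fallback-logs:
        description: Whether to collect and upload fallback logs
        required: false
        type: boolean
        default: false

jobs:
  spark-sql-test:  # hypothetical job name
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-java@v4
        with:
          distribution: temurin
          java-version: ${{ inputs.java }}
      - name: Run Spark SQL tests (placeholder step)
        env:
          SPARK_SHORT: ${{ inputs.spark-short }}
          SPARK_FULL: ${{ inputs.spark-full }}
        run: echo "Would run Spark SQL tests for Spark $SPARK_FULL (profile $SPARK_SHORT)"
```

Each per-version caller then only needs a `uses:` line pointing at this file plus a `with:` block supplying the values for its Spark version.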

Each version is now its own check in PRs, which makes it easier to spot/re-run a single version. The build-native job runs once per workflow invocation (so up to 4x on a main push) but is fully cached.

Followups (not in this PR)

  • Create the run-spark-3.4-tests and run-spark-4.1-tests labels in this repo so the on-demand triggers can be applied. GitHub does not auto-create labels referenced from a workflow.

Test plan

  • actionlint passes on all 5 new workflow files (verified locally with actionlint -color --shellcheck=off).
  • After merge, the Spark 3.5 and Spark 4.0 workflows appear as checks on subsequent PRs.
  • After merge, all 4 workflows run on the next eligible push to main.
  • Adding the run-spark-3.4-tests label to a PR triggers the Spark 3.4 workflow against that PR; same for run-spark-4.1-tests.
  • Manually dispatch each per-version workflow with collect-fallback-logs=true and confirm the merged artifact uploads with the version-namespaced name.

andygrove added 2 commits May 22, 2026 11:27
Replace the single spark_sql_test.yml (which fanned out a 7-module x
4-version matrix) with one workflow per Spark version that delegates
to a reusable spark_sql_test_reusable.yml. Spark 3.5 and 4.0 run on
PRs and main; Spark 3.4 and 4.1 run on main only.

Add a `pull_request: types: [labeled]` trigger to the Spark 3.4 and
4.1 workflows, gated on the labels `run-spark-3.4-tests` and
`run-spark-4.1-tests` respectively. Lets a committer opt a PR into
the older / newer Spark coverage without pushing the contributor's
branch into apache/datafusion-comet.
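
The label-gated trigger from the second commit might be wired up roughly as follows for the Spark 3.4 caller. This is a sketch, not the merged file: the `if:` expression, the `uses:` path, and the `with:` values are quoted from this PR, while the workflow name, the job name, and the exact shape of the `on:` block (including the `workflow_dispatch` input definition) are assumptions.

```yaml
# Sketch of a per-version caller such as spark_sql_test_3_4.yml.
# The `on:` block and the names below are hypothetical.
name: Spark SQL Tests (Spark 3.4)

on:
  push:
    branches:
      - main
  # Fires only when a label is added to a PR, not on every push to the PR.
  pull_request:
    types: [labeled]
  workflow_dispatch:
    inputs:
      collect-fallback-logs:
        description: Collect and upload fallback logs
        type: boolean
        default: false

jobs:
  spark-sql-test:  # hypothetical job name
    # Run for push and manual dispatch; for PR events, run only when the
    # label that was just added is the opt-in label.
    if: github.event_name != 'pull_request' || github.event.label.name == 'run-spark-3.4-tests'
    uses: ./.github/workflows/spark_sql_test_reusable.yml
    with:
      spark-short: '3.4'
      spark-full: '3.4.3'
      java: 11
      # Values in github.event.inputs arrive as strings, hence the comparison to 'true'.
      collect-fallback-logs: ${{ github.event.inputs.collect-fallback-logs == 'true' }}
```

One consequence of listing only the `labeled` activity type is that later pushes to the PR branch do not re-run the workflow by themselves; under this sketch, the label would need to be removed and re-added to trigger another run.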
@andygrove andygrove marked this pull request as ready for review May 22, 2026 17:37
Comment on lines +61 to +67
    if: github.event_name != 'pull_request' || github.event.label.name == 'run-spark-3.4-tests'
    uses: ./.github/workflows/spark_sql_test_reusable.yml
    with:
      spark-short: '3.4'
      spark-full: '3.4.3'
      java: 11
      collect-fallback-logs: ${{ github.event.inputs.collect-fallback-logs == 'true' }}
Comment on lines +76 to +81
    uses: ./.github/workflows/spark_sql_test_reusable.yml
    with:
      spark-short: '3.5'
      spark-full: '3.5.8'
      java: 11
      collect-fallback-logs: ${{ github.event.inputs.collect-fallback-logs == 'true' }}
Comment on lines +76 to +81
    uses: ./.github/workflows/spark_sql_test_reusable.yml
    with:
      spark-short: '4.0'
      spark-full: '4.0.2'
      java: 21
      collect-fallback-logs: ${{ github.event.inputs.collect-fallback-logs == 'true' }}
Comment on lines +61 to +67
    if: github.event_name != 'pull_request' || github.event.label.name == 'run-spark-4.1-tests'
    uses: ./.github/workflows/spark_sql_test_reusable.yml
    with:
      spark-short: '4.1'
      spark-full: '4.1.1'
      java: 17
      collect-fallback-logs: ${{ github.event.inputs.collect-fallback-logs == 'true' }}
Contributor

@mbutrovich mbutrovich left a comment

Thanks @andygrove!

@andygrove andygrove merged commit 2da24da into apache:main May 22, 2026
94 checks passed
@andygrove andygrove deleted the split-spark-sql-workflows branch May 22, 2026 19:41