
ci: gate long-running jobs behind ubuntu-slim jobs #4494

Merged
andygrove merged 6 commits into apache:main from mbutrovich:gate_long_workflows
May 29, 2026

@mbutrovich
Contributor

@mbutrovich mbutrovich commented May 28, 2026

Which issue does this PR close?

Related to #4406.

Rationale for this change

Long-running jobs (PR builds, Spark and Iceberg test matrices, docs deploy) currently fire in parallel with the cheap correctness checks (RAT, prettier, missing-suites, actionlint). A formatting or license-header failure surfaces only after a runner has already been claimed for tests, which wastes capacity and delays signal. Gating the heavy jobs on the cheap ones short-circuits the pipeline on trivial failures.
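
The short-circuit described above can be sketched as a small dependency walk. This is an illustrative model only, assuming GitHub Actions' default behavior that a job with `needs` is skipped unless every job it depends on succeeded. The `NEEDS` graph and the `plan_run` helper are hypothetical names and are not part of the PR.

```python
# Hypothetical model of the three-phase pipeline: each job lists the jobs it needs.
NEEDS = {
    "preflight": [],
    "changes": ["preflight"],
    "pr_build_linux": ["changes"],
    "spark_4_0": ["changes"],
}


def plan_run(needs, failed):
    """Return each job's status, skipping any job whose dependency did not succeed.

    `failed` is the set of jobs assumed to fail if they get to run.
    """
    status = {}

    def resolve(job):
        if job in status:
            return status[job]
        deps_ok = all(resolve(dep) == "success" for dep in needs[job])
        if not deps_ok:
            status[job] = "skipped"
        elif job in failed:
            status[job] = "failure"
        else:
            status[job] = "success"
        return status[job]

    for job in needs:
        resolve(job)
    return status


if __name__ == "__main__":
    # A formatting failure in preflight means no heavy job claims a runner.
    for job, state in plan_run(NEEDS, failed={"preflight"}).items():
        print(f"{job}: {state}")
```

In this model a preflight failure leaves every downstream job skipped, which is the capacity saving the rationale is after.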

What changes are included in this PR?

  • New umbrella ci.yml with three phases: preflight (RAT, prettier, missing-suites, actionlint on ubuntu-slim), changes (path-filter compute, gated on preflight), and a fan-out of every long workflow gated on its changes output and event/label criteria.
  • New dev/ci/compute-changes.py replaces dorny/paths-filter (not on the apache org actions allow list). It reads a git diff --name-only listing and emits one <name>=true|false per heavy job, using the same picomatch-style include/exclude semantics as the prior YAML filters. Stdlib only.
  • Existing path filters, label gates (run-spark-3.4-tests, run-spark-4.1-tests), and main-only restrictions for Iceberg 1.8/1.9 preserved verbatim, just relocated into the umbrella and the compute-changes.py FILTERS dict.
  • pr_build_linux.yml, pr_build_macos.yml, pr_benchmark_check.yml, and docs.yaml converted to workflow_call reusables.
  • Eleven standalone short-check and per-version trigger workflows deleted.
  • pr_title_check.yml left standalone because it relies on pull_request.types: [edited], which the umbrella does not handle.
  • .github/workflows/README.md added with a flow diagram and a "what runs when" table.
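
As a rough illustration of the path-filter step described in the list above, the following stdlib-only sketch maps a list of changed files to `<name>=true|false` lines. It is not the actual `dev/ci/compute-changes.py`. The helper names, the reduced `FILTERS` subset, and the matching rule (a file counts if it matches at least one include pattern and no `!` exclude pattern, with `**` crossing directory boundaries) are assumptions made for the example and may differ from picomatch in edge cases.

```python
import re

# Hypothetical subset of a FILTERS dict; the patterns are taken from the PR's filters.
FILTERS = {
    "build_linux": ["native/**", "pom.xml", "**/pom.xml", "!native/core/benches/**"],
    "benchmark": ["native/core/benches/**"],
    "docs": ["docs/**", ".asf.yaml"],
}


def glob_to_regex(pattern):
    """Translate a small glob subset (**, *, ?) into a compiled regex."""
    out = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            out.append("(?:.*/)?")  # zero or more leading directories
            i += 3
        elif pattern.startswith("**", i):
            out.append(".*")  # anything, including '/'
            i += 2
        elif pattern[i] == "*":
            out.append("[^/]*")  # anything within one path segment
            i += 1
        elif pattern[i] == "?":
            out.append("[^/]")
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(out))


def matches(path, patterns):
    """True if path hits at least one include and no exclude (assumed semantics)."""
    includes = [p for p in patterns if not p.startswith("!")]
    excludes = [p[1:] for p in patterns if p.startswith("!")]
    if not any(glob_to_regex(p).fullmatch(path) for p in includes):
        return False
    return not any(glob_to_regex(p).fullmatch(path) for p in excludes)


def compute_changes(changed_files, filters):
    """Map each filter name to whether any changed file matches it."""
    return {
        name: any(matches(path, patterns) for path in changed_files)
        for name, patterns in filters.items()
    }


def format_outputs(results):
    """Render results as name=true|false lines."""
    return [f"{name}={'true' if hit else 'false'}" for name, hit in results.items()]


if __name__ == "__main__":
    # Stand-in for the output of `git diff --name-only`.
    changed = ["native/core/benches/a.rs", "README.md"]
    print("\n".join(format_outputs(compute_changes(changed, FILTERS))))
```

With this rule, a change that touches only benchmark sources turns on `benchmark` but leaves `build_linux` off, because the exclude pattern wins over `native/**`.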

How are these changes tested?

Existing tests.

Comment thread .github/workflows/ci.yml
Comment on lines +45 to +87
    name: Preflight
    runs-on: ubuntu-slim
    steps:
      - uses: actions/checkout@v6

      - name: Set up Java
        uses: actions/setup-java@v5
        with:
          distribution: temurin
          java-version: 11

      - name: Apache RAT license check
        run: ./mvnw -B -N apache-rat:check

      - name: Setup Node.js
        uses: actions/setup-node@v6
        with:
          node-version: '24'

      - name: Install prettier
        run: npm install -g prettier

      - name: Check markdown formatting
        run: prettier --check "**/*.md"

      - name: Check missing suites
        run: python3 dev/ci/check-suites.py

      - name: Install actionlint
        run: |
          curl -sSfL https://raw.githubusercontent.com/rhysd/actionlint/main/scripts/download-actionlint.bash | bash
          echo "$PWD" >> $GITHUB_PATH

      - name: Lint GitHub Actions workflows
        run: actionlint -color --shellcheck=off

  # ---------------------------------------------------------------------------
  # changes: compute which long jobs need to run for this event. Replaces the
  # per-workflow `on: paths:` filters that used to gate triggering. On
  # workflow_dispatch we force every output true so a manual run can exercise
  # any gated job.
  # ---------------------------------------------------------------------------
  changes:
Comment thread .github/workflows/ci.yml
Comment on lines +88 to +339
    name: Detect changes
    needs: preflight
    runs-on: ubuntu-slim
    outputs:
      build_linux: ${{ steps.compute.outputs.build_linux }}
      build_macos: ${{ steps.compute.outputs.build_macos }}
      benchmark: ${{ steps.compute.outputs.benchmark }}
      docs: ${{ steps.compute.outputs.docs }}
      spark_3_4: ${{ steps.compute.outputs.spark_3_4 }}
      spark_3_5: ${{ steps.compute.outputs.spark_3_5 }}
      spark_4_0: ${{ steps.compute.outputs.spark_4_0 }}
      spark_4_1: ${{ steps.compute.outputs.spark_4_1 }}
      iceberg_1_8: ${{ steps.compute.outputs.iceberg_1_8 }}
      iceberg_1_9: ${{ steps.compute.outputs.iceberg_1_9 }}
      iceberg_1_10: ${{ steps.compute.outputs.iceberg_1_10 }}
    steps:
      - uses: actions/checkout@v6

      - name: Run paths filter
        id: filter
        if: github.event_name != 'workflow_dispatch'
        uses: dorny/paths-filter@v3
        with:
          filters: |
            build_linux:
              - "native/**"
              - "common/**"
              - "spark/**"
              - "spark-integration/**"
              - "pom.xml"
              - "**/pom.xml"
              - ".mvn/**"
              - "mvnw"
              - "Makefile"
              - "rust-toolchain.toml"
              - "dev/ci/**"
              - ".github/workflows/ci.yml"
              - ".github/workflows/pr_build_linux.yml"
              - ".github/actions/setup-builder/**"
              - ".github/actions/java-test/**"
              - ".github/actions/rust-test/**"
              - "!**.md"
              - "!native/core/benches/**"
              - "!native/spark-expr/benches/**"
              - "!spark/src/test/scala/org/apache/spark/sql/benchmark/**"
              - "!spark/src/main/scala/org/apache/comet/GenerateDocs.scala"
            build_macos:
              - "native/**"
              - "common/**"
              - "spark/**"
              - "spark-integration/**"
              - "pom.xml"
              - "**/pom.xml"
              - ".mvn/**"
              - "mvnw"
              - "Makefile"
              - "rust-toolchain.toml"
              - "dev/ci/**"
              - ".github/workflows/ci.yml"
              - ".github/workflows/pr_build_macos.yml"
              - ".github/actions/setup-macos-builder/**"
              - ".github/actions/java-test/**"
              - "!**.md"
              - "!native/core/benches/**"
              - "!native/spark-expr/benches/**"
              - "!spark/src/test/scala/org/apache/spark/sql/benchmark/**"
              - "!spark/src/main/scala/org/apache/comet/GenerateDocs.scala"
            benchmark:
              - "native/core/benches/**"
              - "native/spark-expr/benches/**"
              - "spark/src/test/scala/org/apache/spark/sql/benchmark/**"
            docs:
              - ".asf.yaml"
              - ".github/workflows/ci.yml"
              - ".github/workflows/docs.yaml"
              - "docs/**"
            spark_3_4:
              - "native/**/src/**"
              - "native/**/Cargo.toml"
              - "native/Cargo.lock"
              - "!native/hdfs/**"
              - "!native/fs-hdfs/**"
              - "common/src/main/**"
              - "common/pom.xml"
              - "spark/src/main/**"
              - "!spark/src/main/spark-3.5/**"
              - "!spark/src/main/spark-4.0/**"
              - "!spark/src/main/spark-4.1/**"
              - "!spark/src/main/spark-4.2/**"
              - "!spark/src/main/spark-4.x/**"
              - "!spark/src/main/scala/org/apache/comet/GenerateDocs.scala"
              - "spark/pom.xml"
              - "dev/diffs/3.4.3.diff"
              - "pom.xml"
              - "rust-toolchain.toml"
              - ".github/workflows/ci.yml"
              - ".github/workflows/spark_sql_test_reusable.yml"
              - ".github/actions/setup-builder/**"
              - ".github/actions/setup-spark-builder/**"
            spark_3_5:
              - "native/**/src/**"
              - "native/**/Cargo.toml"
              - "native/Cargo.lock"
              - "!native/hdfs/**"
              - "!native/fs-hdfs/**"
              - "common/src/main/**"
              - "common/pom.xml"
              - "spark/src/main/**"
              - "!spark/src/main/spark-3.4/**"
              - "!spark/src/main/spark-4.0/**"
              - "!spark/src/main/spark-4.1/**"
              - "!spark/src/main/spark-4.2/**"
              - "!spark/src/main/spark-4.x/**"
              - "!spark/src/main/scala/org/apache/comet/GenerateDocs.scala"
              - "spark/pom.xml"
              - "dev/diffs/3.5.8.diff"
              - "pom.xml"
              - "rust-toolchain.toml"
              - ".github/workflows/ci.yml"
              - ".github/workflows/spark_sql_test_reusable.yml"
              - ".github/actions/setup-builder/**"
              - ".github/actions/setup-spark-builder/**"
            spark_4_0:
              - "native/**/src/**"
              - "native/**/Cargo.toml"
              - "native/Cargo.lock"
              - "!native/hdfs/**"
              - "!native/fs-hdfs/**"
              - "common/src/main/**"
              - "common/pom.xml"
              - "spark/src/main/**"
              - "!spark/src/main/spark-3.4/**"
              - "!spark/src/main/spark-3.5/**"
              - "!spark/src/main/spark-3.x/**"
              - "!spark/src/main/spark-4.1/**"
              - "!spark/src/main/spark-4.2/**"
              - "!spark/src/main/scala/org/apache/comet/GenerateDocs.scala"
              - "spark/pom.xml"
              - "dev/diffs/4.0.2.diff"
              - "pom.xml"
              - "rust-toolchain.toml"
              - ".github/workflows/ci.yml"
              - ".github/workflows/spark_sql_test_reusable.yml"
              - ".github/actions/setup-builder/**"
              - ".github/actions/setup-spark-builder/**"
            spark_4_1:
              - "native/**/src/**"
              - "native/**/Cargo.toml"
              - "native/Cargo.lock"
              - "!native/hdfs/**"
              - "!native/fs-hdfs/**"
              - "common/src/main/**"
              - "common/pom.xml"
              - "spark/src/main/**"
              - "!spark/src/main/spark-3.4/**"
              - "!spark/src/main/spark-3.5/**"
              - "!spark/src/main/spark-3.x/**"
              - "!spark/src/main/spark-4.0/**"
              - "!spark/src/main/spark-4.2/**"
              - "!spark/src/main/scala/org/apache/comet/GenerateDocs.scala"
              - "spark/pom.xml"
              - "dev/diffs/4.1.1.diff"
              - "pom.xml"
              - "rust-toolchain.toml"
              - ".github/workflows/ci.yml"
              - ".github/workflows/spark_sql_test_reusable.yml"
              - ".github/actions/setup-builder/**"
              - ".github/actions/setup-spark-builder/**"
            iceberg_1_8:
              - "native/**/src/**"
              - "native/**/Cargo.toml"
              - "native/Cargo.lock"
              - "!native/hdfs/**"
              - "!native/fs-hdfs/**"
              - "common/src/main/**"
              - "common/pom.xml"
              - "spark/src/main/**"
              - "!spark/src/main/scala/org/apache/comet/GenerateDocs.scala"
              - "spark/pom.xml"
              - "dev/diffs/iceberg/**"
              - "pom.xml"
              - "rust-toolchain.toml"
              - ".github/workflows/ci.yml"
              - ".github/workflows/iceberg_spark_test_reusable.yml"
              - ".github/actions/setup-builder/**"
              - ".github/actions/setup-iceberg-builder/**"
            iceberg_1_9:
              - "native/**/src/**"
              - "native/**/Cargo.toml"
              - "native/Cargo.lock"
              - "!native/hdfs/**"
              - "!native/fs-hdfs/**"
              - "common/src/main/**"
              - "common/pom.xml"
              - "spark/src/main/**"
              - "!spark/src/main/scala/org/apache/comet/GenerateDocs.scala"
              - "spark/pom.xml"
              - "dev/diffs/iceberg/**"
              - "pom.xml"
              - "rust-toolchain.toml"
              - ".github/workflows/ci.yml"
              - ".github/workflows/iceberg_spark_test_reusable.yml"
              - ".github/actions/setup-builder/**"
              - ".github/actions/setup-iceberg-builder/**"
            iceberg_1_10:
              - "native/**/src/**"
              - "native/**/Cargo.toml"
              - "native/Cargo.lock"
              - "!native/hdfs/**"
              - "!native/fs-hdfs/**"
              - "common/src/main/**"
              - "common/pom.xml"
              - "spark/src/main/**"
              - "!spark/src/main/scala/org/apache/comet/GenerateDocs.scala"
              - "spark/pom.xml"
              - "dev/diffs/iceberg/**"
              - "pom.xml"
              - "rust-toolchain.toml"
              - ".github/workflows/ci.yml"
              - ".github/workflows/iceberg_spark_test_reusable.yml"
              - ".github/actions/setup-builder/**"
              - ".github/actions/setup-iceberg-builder/**"

      - name: Compute outputs
        id: compute
        shell: bash
        run: |
          if [[ "${{ github.event_name }}" == "workflow_dispatch" ]]; then
            for key in build_linux build_macos benchmark docs spark_3_4 spark_3_5 spark_4_0 spark_4_1 iceberg_1_8 iceberg_1_9 iceberg_1_10; do
              echo "${key}=true" >> "$GITHUB_OUTPUT"
            done
          else
            echo "build_linux=${{ steps.filter.outputs.build_linux }}" >> "$GITHUB_OUTPUT"
            echo "build_macos=${{ steps.filter.outputs.build_macos }}" >> "$GITHUB_OUTPUT"
            echo "benchmark=${{ steps.filter.outputs.benchmark }}" >> "$GITHUB_OUTPUT"
            echo "docs=${{ steps.filter.outputs.docs }}" >> "$GITHUB_OUTPUT"
            echo "spark_3_4=${{ steps.filter.outputs.spark_3_4 }}" >> "$GITHUB_OUTPUT"
            echo "spark_3_5=${{ steps.filter.outputs.spark_3_5 }}" >> "$GITHUB_OUTPUT"
            echo "spark_4_0=${{ steps.filter.outputs.spark_4_0 }}" >> "$GITHUB_OUTPUT"
            echo "spark_4_1=${{ steps.filter.outputs.spark_4_1 }}" >> "$GITHUB_OUTPUT"
            echo "iceberg_1_8=${{ steps.filter.outputs.iceberg_1_8 }}" >> "$GITHUB_OUTPUT"
            echo "iceberg_1_9=${{ steps.filter.outputs.iceberg_1_9 }}" >> "$GITHUB_OUTPUT"
            echo "iceberg_1_10=${{ steps.filter.outputs.iceberg_1_10 }}" >> "$GITHUB_OUTPUT"
          fi
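
The logic of the "Compute outputs" step can be restated as a short Python sketch: a manual `workflow_dispatch` run forces every key to `true`, and any other event passes the filter results through. This is an illustration under assumptions, not code from the PR. The `compute_outputs` and `render` helpers are hypothetical, and defaulting a missing filter result to `false` is a choice made here (the bash version would write whatever the filter step produced, including an empty value).

```python
import os

KEYS = [
    "build_linux", "build_macos", "benchmark", "docs",
    "spark_3_4", "spark_3_5", "spark_4_0", "spark_4_1",
    "iceberg_1_8", "iceberg_1_9", "iceberg_1_10",
]


def compute_outputs(event_name, filter_outputs):
    """Return the key -> 'true'/'false' mapping the step would emit."""
    if event_name == "workflow_dispatch":
        # Manual runs exercise every gated job.
        return {key: "true" for key in KEYS}
    # Assumption: a key missing from the filter results is treated as 'false'.
    return {key: filter_outputs.get(key, "false") for key in KEYS}


def render(outputs):
    """Format outputs as the key=value lines GitHub reads from $GITHUB_OUTPUT."""
    return "".join(f"{key}={value}\n" for key, value in outputs.items())


if __name__ == "__main__":
    text = render(compute_outputs("pull_request", {"docs": "true"}))
    target = os.environ.get("GITHUB_OUTPUT")
    if target:
        # Inside a workflow run, append to the file GitHub provides.
        with open(target, "a", encoding="utf-8") as handle:
            handle.write(text)
    else:
        print(text, end="")
```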

  # ---------------------------------------------------------------------------
  # Heavy jobs: each is a thin caller of an existing reusable workflow. The
  # `if:` expressions encode the same event/label/path criteria the
  # standalone trigger workflows used to encode in their `on:` blocks.
  # ---------------------------------------------------------------------------

  pr_build_linux:
Comment thread .github/workflows/ci.yml
Comment on lines +340 to +349
    name: PR Build (Linux)
    needs: changes
    if: |
      needs.changes.outputs.build_linux == 'true' &&
      (github.event_name == 'push' ||
       github.event_name == 'workflow_dispatch' ||
       github.event_name == 'pull_request')
    uses: ./.github/workflows/pr_build_linux.yml

  pr_build_macos:
Comment thread .github/workflows/ci.yml
Comment on lines +350 to +359
    name: PR Build (macOS)
    needs: changes
    if: |
      needs.changes.outputs.build_macos == 'true' &&
      (github.event_name == 'push' ||
       github.event_name == 'workflow_dispatch' ||
       github.event_name == 'pull_request')
    uses: ./.github/workflows/pr_build_macos.yml

  pr_benchmark_check:
Comment thread .github/workflows/ci.yml
Comment on lines +360 to +369
    name: PR Benchmark Check
    needs: changes
    if: |
      needs.changes.outputs.benchmark == 'true' &&
      (github.event_name == 'push' ||
       github.event_name == 'workflow_dispatch' ||
       github.event_name == 'pull_request')
    uses: ./.github/workflows/pr_benchmark_check.yml

  docs:
Comment thread .github/workflows/ci.yml
Comment on lines +409 to +422
    name: Spark SQL Tests (Spark 4.0)
    needs: changes
    if: |
      needs.changes.outputs.spark_4_0 == 'true' &&
      (github.event_name == 'push' ||
       github.event_name == 'workflow_dispatch' ||
       github.event_name == 'pull_request')
    uses: ./.github/workflows/spark_sql_test_reusable.yml
    with:
      spark-short: '4.0'
      spark-full: '4.0.2'
      java: 17

  spark_4_1:
Comment thread .github/workflows/ci.yml
Comment on lines +423 to +438
    name: Spark SQL Tests (Spark 4.1)
    needs: changes
    # Main-only by default; PRs need the `run-spark-4.1-tests` label.
    if: |
      needs.changes.outputs.spark_4_1 == 'true' &&
      (github.event_name == 'push' ||
       github.event_name == 'workflow_dispatch' ||
       (github.event_name == 'pull_request' &&
        contains(github.event.pull_request.labels.*.name, 'run-spark-4.1-tests')))
    uses: ./.github/workflows/spark_sql_test_reusable.yml
    with:
      spark-short: '4.1'
      spark-full: '4.1.1'
      java: 17

  iceberg_1_8:
Comment thread .github/workflows/ci.yml
Comment on lines +439 to +453
    name: Iceberg Spark SQL Tests (Iceberg 1.8)
    needs: changes
    # Main-only; never runs on PR events.
    if: |
      needs.changes.outputs.iceberg_1_8 == 'true' &&
      (github.event_name == 'push' || github.event_name == 'workflow_dispatch')
    uses: ./.github/workflows/iceberg_spark_test_reusable.yml
    with:
      iceberg-short: '1.8'
      iceberg-full: '1.8.1'
      spark-short: '3.4'
      spark-full: '3.4.3'
      java: 11

  iceberg_1_9:
Comment thread .github/workflows/ci.yml
Comment on lines +454 to +468
    name: Iceberg Spark SQL Tests (Iceberg 1.9)
    needs: changes
    # Main-only; never runs on PR events.
    if: |
      needs.changes.outputs.iceberg_1_9 == 'true' &&
      (github.event_name == 'push' || github.event_name == 'workflow_dispatch')
    uses: ./.github/workflows/iceberg_spark_test_reusable.yml
    with:
      iceberg-short: '1.9'
      iceberg-full: '1.9.1'
      spark-short: '3.5'
      spark-full: '3.5.8'
      java: 17

  iceberg_1_10:
Comment thread .github/workflows/ci.yml
Comment on lines +469 to +482
    name: Iceberg Spark SQL Tests (Iceberg 1.10)
    needs: changes
    if: |
      needs.changes.outputs.iceberg_1_10 == 'true' &&
      (github.event_name == 'push' ||
       github.event_name == 'workflow_dispatch' ||
       github.event_name == 'pull_request')
    uses: ./.github/workflows/iceberg_spark_test_reusable.yml
    with:
      iceberg-short: '1.10'
      iceberg-full: '1.10.0'
      spark-short: '3.5'
      spark-full: '3.5.8'
      java: 17
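
The `if:` expressions in the heavy jobs above fall into three shapes: run on any of the three events, run on PRs only when a label is present, and never run on PRs. The sketch below restates them as one Python predicate so the differences are easy to compare. The `should_run` function and its parameters are hypothetical, and it uses exact string comparison for labels, whereas GitHub's `contains` ignores case.

```python
def should_run(changed, event_name, labels=(), required_label=None, main_only=False):
    """Mirror the shape of the job-level `if:` gates.

    changed        -- the job's path-filter output as a bool
    required_label -- label a PR must carry (None means no label gate)
    main_only      -- True for jobs that never run on pull_request events
    """
    if not changed:
        return False
    if event_name in ("push", "workflow_dispatch"):
        return True
    if event_name != "pull_request" or main_only:
        return False
    return required_label is None or required_label in labels


if __name__ == "__main__":
    # Ungated job (e.g. the Spark 4.0 shape): runs on a plain PR.
    print(should_run(True, "pull_request"))
    # Label-gated job (the Spark 4.1 shape): needs the label on PRs.
    print(should_run(True, "pull_request", required_label="run-spark-4.1-tests"))
    print(should_run(True, "pull_request", labels=["run-spark-4.1-tests"],
                     required_label="run-spark-4.1-tests"))
    # Main-only job (the Iceberg 1.8/1.9 shape): skipped on PRs, runs on push.
    print(should_run(True, "pull_request", main_only=True))
    print(should_run(True, "push", main_only=True))
```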
@mbutrovich mbutrovich marked this pull request as draft May 28, 2026 14:40
@mbutrovich mbutrovich marked this pull request as ready for review May 28, 2026 17:38
@mbutrovich
Contributor Author

I think this looks right:

[Image: Screenshot 2026-05-28 at 12 21 25 PM]

@mbutrovich mbutrovich added this to the 0.17.0 milestone May 28, 2026
Contributor

@comphead comphead left a comment

Thanks @mbutrovich. Even for long-running pipelines that run in parallel, we need to comply with:

https://infra.apache.org/github-actions-policy.html

  • All workflows MUST have a job concurrency level less than or equal to 20. This means a workflow cannot have more than 20 jobs running at the same time across all matrices.
  • All workflows SHOULD have a job concurrency level less than or equal to 15. Just because 20 is the max, doesn't mean you should strive for 20.

In the screenshot I can see 20 jobs, and if they run in parallel we probably need to break them down to comply with these expectations.
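
One way to reason about the limits quoted above is to count how many jobs a workflow could run at once after matrix expansion. The sketch below does that worst-case arithmetic. The job names and matrix values are invented for illustration and do not describe this repository's workflows, and the count ignores `max-parallel`, matrix `include`/`exclude`, and `needs` ordering.

```python
from math import prod

MUST_LIMIT = 20    # hard ceiling from the policy text quoted above
SHOULD_LIMIT = 15  # recommended ceiling from the same text


def expanded_jobs(matrix):
    """Number of jobs one matrix expands to (1 when the job has no matrix)."""
    return prod(len(values) for values in matrix.values())


def check_concurrency(jobs):
    """Return (total, verdict) assuming every expanded job runs concurrently."""
    total = sum(expanded_jobs(matrix) for matrix in jobs.values())
    if total > MUST_LIMIT:
        return total, "exceeds MUST limit"
    if total > SHOULD_LIMIT:
        return total, "exceeds SHOULD limit"
    return total, "ok"


if __name__ == "__main__":
    # Hypothetical workflow: one plain job plus a 2 x 2 x 3 test matrix.
    hypothetical = {
        "lint": {},
        "test": {"os": ["linux", "macos"], "java": [11, 17], "spark": ["3.4", "3.5", "4.0"]},
    }
    print(check_concurrency(hypothetical))
```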

@mbutrovich
Contributor Author

> On the pic I can see 20 jobs and if the run in parallel we prob need to break it down to comply the expectations

I'm not doing anything that we're not doing now, other than gating existing long-running jobs behind the short ones. We're not enforcing that invariant now, so I'll have to look into how to do that. I likely won't get to it until next week.

@mbutrovich
Contributor Author

Not trying to single anyone out, but situations like what just happened with #4487 are why we need this ASAP. That PR has a docs issue, prettier failed immediately, and it still fired up dozens of workflows that I had to manually cancel.

@mbutrovich mbutrovich requested a review from andygrove May 29, 2026 19:50
@comphead
Contributor

> Not trying to single anyone out, but situations like what just happened with #4487 are why we need this ASAP. That PR has a docs issue, prettier failed immediately, and it still fired up dozens of workflows that I had to manually cancel.

That makes sense to me. I have a feeling that before the CI optimizations we had exactly this approach: start the fast precondition jobs (formatting, RAT, etc.) first, and launch the heavyweight pipelines only if they succeed.

Contributor

@comphead comphead left a comment

Thanks @mbutrovich, it makes sense to me. Let's give it a try.

@andygrove andygrove merged commit 9d79c67 into apache:main May 29, 2026
68 checks passed