Commit a6744eb

Merge branch 'main' into opt_native_shuffle
2 parents 0b71de4 + 9d79c67 commit a6744eb

22 files changed

Lines changed: 848 additions & 832 deletions

.github/workflows/README.md

Lines changed: 113 additions & 0 deletions
@@ -0,0 +1,113 @@
# GitHub Workflows

GitHub Actions only loads `*.yml` / `*.yaml` files in this directory as
workflows. This README is ignored by the runner.

## Pipeline overview

A single umbrella workflow (`ci.yml`) orchestrates everything that runs on
pull requests and pushes to `main`. The umbrella runs cheap **preflight**
checks first, computes which heavy jobs are relevant to the change, and only
then fans out to the long-running test/build workflows. Each long workflow
is a reusable `workflow_call` workflow invoked from the umbrella.

```
               pull_request | push to main | workflow_dispatch
                                      |
                                      v
                          +-----------------------+
                          |       preflight       |   ubuntu-slim
                          |  (RAT, prettier,      |
                          |   missing-suites,     |
                          |   actionlint)         |
                          +-----------+-----------+
                                      | on success
                                      v
                          +-----------------------+
                          |        changes        |   ubuntu-slim
                          |  (compute-changes.py: |
                          |   one boolean per     |
                          |   heavy job)          |
                          +-----------+-----------+
                                      |
  +-----------+-----------+-----------+-----------+-----------+-----------+
  |           |           |           |           |           |           |
  v           v           v           v           v           v           v
pr_build_ pr_build_ pr_benchmark_   docs      spark_3_5   spark_4_0 iceberg_1_10
  linux     macos       check      (push)     (PR+push)   (PR+push)   (PR+push)
(PR+push) (PR+push)   (PR+push)
                                                  |           |           |
                                                  v           v           v
                                              spark_3_4 / spark_4_1  iceberg_1_8 / 1_9
                                              (push or PR + label)   (push only)

reusable workflows invoked via `uses:`:
  pr_build_linux.yml        spark_sql_test_reusable.yml
  pr_build_macos.yml        iceberg_spark_test_reusable.yml
  pr_benchmark_check.yml
  docs.yaml
```

## What runs when

| Job in `ci.yml`      | Triggered by                                     | Path filter source                  |
| -------------------- | ------------------------------------------------ | ----------------------------------- |
| `preflight`          | every PR / push to main / dispatch               | none (always runs)                  |
| `changes`            | every PR / push to main / dispatch               | runs `dev/ci/compute-changes.py`    |
| `pr_build_linux`     | PR or push, paths matched                        | `dev/ci/compute-changes.py`         |
| `pr_build_macos`     | PR or push, paths matched                        | `dev/ci/compute-changes.py`         |
| `pr_benchmark_check` | PR or push, paths matched                        | benchmark sources only              |
| `docs`               | push to main, paths matched                      | `.asf.yaml`, `docs/**`, `docs.yaml` |
| `spark_3_5`          | PR or push, paths matched                        | Spark 3.5 sources                   |
| `spark_4_0`          | PR or push, paths matched                        | Spark 4.0 sources                   |
| `spark_3_4`          | push, **or** PR with `run-spark-3.4-tests` label | Spark 3.4 sources                   |
| `spark_4_1`          | push, **or** PR with `run-spark-4.1-tests` label | Spark 4.1 sources                   |
| `iceberg_1_10`       | PR or push, paths matched                        | Iceberg sources                     |
| `iceberg_1_8`        | push only                                        | Iceberg sources                     |
| `iceberg_1_9`        | push only                                        | Iceberg sources                     |

A heavy job appears in the PR's checks list as a `skipped` entry whenever
its path filter or event criteria don't match. Skipped checks count as
passing for branch protection.
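
The "Triggered by" column can be read as a small decision function. The sketch below is illustrative only: `should_run`, `ALWAYS`, `PUSH_ONLY`, and `LABEL_GATED` are hypothetical names, and the real gating lives in the `if:` expressions of `ci.yml`, which this README does not reproduce. It assumes a heavy job runs only when both its event criteria and its path filter match, and it models only `pull_request` and `push` for heavy jobs because the table does not say how they behave on `workflow_dispatch`.

```python
# Hypothetical model of the "Triggered by" column; not the actual ci.yml logic.
ALWAYS = {"preflight", "changes"}
PUSH_ONLY = {"docs", "iceberg_1_8", "iceberg_1_9"}
LABEL_GATED = {
    "spark_3_4": "run-spark-3.4-tests",
    "spark_4_1": "run-spark-4.1-tests",
}


def should_run(job, event, paths_matched, labels=frozenset()):
    """Return True if `job` would run, False if it would show as skipped."""
    if job in ALWAYS:
        return True
    if event not in ("pull_request", "push"):
        # The table does not describe heavy jobs on other events.
        raise ValueError(f"event not modelled for heavy jobs: {event}")
    if job in PUSH_ONLY:
        event_ok = event == "push"
    elif job in LABEL_GATED:
        # Runs on every push, or on a PR that carries the opt-in label.
        event_ok = event == "push" or LABEL_GATED[job] in labels
    else:
        event_ok = True
    # Assumption: the path filter applies on top of the event criteria.
    return event_ok and paths_matched
```

For example, `should_run("spark_3_4", "pull_request", True)` is false until the PR carries the `run-spark-3.4-tests` label.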

## Standalone workflows (not under the umbrella)

These workflows have their own triggers because they fire on events the
umbrella doesn't watch, or operate independently of the rest of CI:

| File                   | Why standalone                                                                                       |
| ---------------------- | ---------------------------------------------------------------------------------------------------- |
| `pr_title_check.yml`   | Fires on `pull_request.types: [edited]` so it re-runs when a PR title is edited without a code push. |
| `codeql.yml`           | Security scanner; weekly schedule + on every push/PR.                                                |
| `miri.yml`             | Nightly Miri safety checks.                                                                          |
| `stale.yml`            | Daily stale-PR closer.                                                                               |
| `take.yml`             | Issue-comment trigger for `take` / `untake`.                                                         |
| `label_new_issues.yml` | Issue trigger to apply `requires-triage`.                                                            |

## Reusable workflows (called by `ci.yml`)

| File                              | Called from `ci.yml` job(s)                        |
| --------------------------------- | -------------------------------------------------- |
| `pr_build_linux.yml`              | `pr_build_linux`                                   |
| `pr_build_macos.yml`              | `pr_build_macos`                                   |
| `pr_benchmark_check.yml`          | `pr_benchmark_check`                               |
| `docs.yaml`                       | `docs`                                             |
| `spark_sql_test_reusable.yml`     | `spark_3_4`, `spark_3_5`, `spark_4_0`, `spark_4_1` |
| `iceberg_spark_test_reusable.yml` | `iceberg_1_8`, `iceberg_1_9`, `iceberg_1_10`       |

## Modifying path filters

Each long workflow's "what files trigger me" rules live in the `FILTERS`
dict at the top of `dev/ci/compute-changes.py`. The `changes` job in
`ci.yml` invokes that script and the gate `if:` on each long job consumes
`needs.changes.outputs.<name>`. When adding a new test suite or moving
sources, update the relevant filter entry there.
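
A minimal sketch of how a `FILTERS`-style mapping could turn a list of changed files into one boolean per job. The contents of `dev/ci/compute-changes.py` are not shown in this README, so everything here is an assumption for illustration: the `compute_outputs` and `format_outputs` helpers, the use of `fnmatchcase`, and every pattern in the dict (including the full path given for `docs.yaml` and the placeholder `pr_build_linux` entries).

```python
from fnmatch import fnmatchcase

# Hypothetical filter entries; the real FILTERS dict may use different
# keys, patterns, and matching rules.
FILTERS = {
    "docs": [".asf.yaml", "docs/*", ".github/workflows/docs.yaml"],
    "pr_build_linux": ["native/*", "spark/*", "pom.xml"],  # placeholders
}


def compute_outputs(changed_files, filters=FILTERS):
    """Map each job name to True if any changed file matches its patterns."""
    return {
        job: any(
            fnmatchcase(path, pattern)
            for path in changed_files
            for pattern in patterns
        )
        for job, patterns in filters.items()
    }


def format_outputs(outputs):
    """Render `name=true|false` lines, the shape GitHub step outputs use."""
    return "\n".join(
        f"{job}={'true' if flag else 'false'}" for job, flag in outputs.items()
    )
```

With these placeholder patterns, a change touching only `docs/source/index.md` sets `docs` to true and `pr_build_linux` to false. Note that `fnmatchcase` lets `*` match across `/`, which is why `docs/*` covers nested files here.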

## Branch protection

Required-check names changed when these workflows were consolidated. The
umbrella exposes per-job names like `CI / pr_build_linux / Lint`,
`CI / spark_3_5 / linux-test (...)`, etc. Update repository branch
protection rules to point at the new names; the old standalone workflow
names (`Spark SQL Tests (Spark 3.5)`, `PR Build (Linux)`, ...) no longer
exist as top-level workflows.
