Commit 7e766a7

Don't rely on external dependency for filtering changes.
1 parent 63bc4c2 commit 7e766a7

3 files changed

Lines changed: 323 additions & 227 deletions

.github/workflows/README.md

Lines changed: 9 additions & 8 deletions
@@ -25,7 +25,7 @@ is a `workflow_call` reusable invoked from the umbrella.
 v
 +-----------------------+
 | changes | ubuntu-slim
-| (dorny/paths-filter: |
+| (compute-changes.py: |
 | one boolean per |
 | heavy job) |
 +-----------+-----------+
@@ -53,9 +53,9 @@ is a `workflow_call` reusable invoked from the umbrella.
 | Job in `ci.yml` | Triggered by | Path filter source |
 |---------------------|-----------------------------------------|------------------------------------------|
 | `preflight` | every PR / push to main / dispatch | none (always runs) |
-| `changes` | every PR / push to main / dispatch | runs `dorny/paths-filter@v3` |
-| `pr_build_linux` | PR or push, paths matched | inlined in `changes` job |
-| `pr_build_macos` | PR or push, paths matched | inlined in `changes` job |
+| `changes` | every PR / push to main / dispatch | runs `dev/ci/compute-changes.py` |
+| `pr_build_linux` | PR or push, paths matched | `dev/ci/compute-changes.py` |
+| `pr_build_macos` | PR or push, paths matched | `dev/ci/compute-changes.py` |
 | `pr_benchmark_check`| PR or push, paths matched | benchmark sources only |
 | `docs` | push to main, paths matched | `.asf.yaml`, `docs/**`, `ci.yml`, `docs.yaml` |
 | `spark_3_5` | PR or push, paths matched | Spark 3.5 sources |
@@ -97,10 +97,11 @@ umbrella doesn't watch, or operate independently of the rest of CI:
 
 ## Modifying path filters
 
-Each long workflow's "what files trigger me" rules live in the `changes`
-job inside `ci.yml` (in the `dorny/paths-filter` block). When adding a new
-test suite or moving sources, update the filter for the affected job there;
-the gate `if:` on each job consumes `needs.changes.outputs.<name>`.
+Each long workflow's "what files trigger me" rules live in the `FILTERS`
+dict at the top of `dev/ci/compute-changes.py`. The `changes` job in
+`ci.yml` invokes that script and the gate `if:` on each long job consumes
+`needs.changes.outputs.<name>`. When adding a new test suite or moving
+sources, update the relevant filter entry there.
 
 ## Branch protection
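
The README hunk above says the rules now live in a `FILTERS` dict in `dev/ci/compute-changes.py`, but that script is not reproduced in this excerpt. The following is only a minimal sketch of how such a dict could turn a list of changed paths into `name=true` / `name=false` lines for `$GITHUB_OUTPUT`. The three entries are trimmed from the removed `dorny/paths-filter` block; the helper names `glob_to_regex`, `matches`, `compute_outputs`, and `format_outputs` are hypothetical, and the matching rule (at least one include pattern and no `!` exclude pattern) is an assumption, not the committed behaviour.

```python
import re

# Hypothetical, trimmed-down table. The real FILTERS dict is in
# dev/ci/compute-changes.py and is not shown in this diff.
FILTERS = {
    "benchmark": ["native/core/benches/**", "native/spark-expr/benches/**"],
    "docs": [".asf.yaml", "docs/**"],
    "build_linux": [
        "native/**",
        "pom.xml",
        "**/pom.xml",
        "!**.md",
        "!native/core/benches/**",
    ],
}


def glob_to_regex(pattern):
    """Translate a path glob into an anchored regex.

    `**/` matches zero or more directories, a bare `**` matches anything,
    and `*` / `?` never cross a `/`.
    """
    out = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            out.append("(?:.*/)?")
            i += 3
        elif pattern.startswith("**", i):
            out.append(".*")
            i += 2
        elif pattern[i] == "*":
            out.append("[^/]*")
            i += 1
        elif pattern[i] == "?":
            out.append("[^/]")
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(out) + r"\Z")


def matches(path, patterns):
    """Assumed rule: some include pattern matches and no `!` pattern does."""
    includes = [p for p in patterns if not p.startswith("!")]
    excludes = [p[1:] for p in patterns if p.startswith("!")]
    if not any(glob_to_regex(p).match(path) for p in includes):
        return False
    return not any(glob_to_regex(p).match(path) for p in excludes)


def compute_outputs(changed_files):
    """Return one boolean per filter name for the given changed paths."""
    return {
        name: any(matches(path, patterns) for path in changed_files)
        for name, patterns in FILTERS.items()
    }


def format_outputs(outputs):
    """Render booleans as the key=value lines that $GITHUB_OUTPUT expects."""
    return "\n".join(f"{k}={'true' if v else 'false'}" for k, v in outputs.items())
```

Under these assumptions, a change touching only `native/core/benches/foo.rs` would set `benchmark` and leave `build_linux` unset, mirroring how the old block paired the benchmark includes with matching `!` excludes on the build filters.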

.github/workflows/ci.yml

Lines changed: 29 additions & 219 deletions
@@ -80,9 +80,11 @@ jobs:
 
   # ---------------------------------------------------------------------------
   # changes: compute which long jobs need to run for this event. Replaces the
-  # per-workflow `on: paths:` filters that used to gate triggering. On
-  # workflow_dispatch we force every output true so a manual run can exercise
-  # any gated job.
+  # per-workflow `on: paths:` filters that used to gate triggering. Filter
+  # rules live in dev/ci/compute-changes.py, which is invoked here in lieu of
+  # dorny/paths-filter (not on the apache org actions allow list). On
+  # workflow_dispatch every output is forced true so a manual run can
+  # exercise any gated job.
   # ---------------------------------------------------------------------------
   changes:
     name: Detect changes
@@ -102,233 +104,41 @@ jobs:
       iceberg_1_10: ${{ steps.compute.outputs.iceberg_1_10 }}
     steps:
       - uses: actions/checkout@v6
-
-      - name: Run paths filter
-        id: filter
-        if: github.event_name != 'workflow_dispatch'
-        uses: dorny/paths-filter@v3
         with:
-          filters: |
-            build_linux:
-              - "native/**"
-              - "common/**"
-              - "spark/**"
-              - "spark-integration/**"
-              - "pom.xml"
-              - "**/pom.xml"
-              - ".mvn/**"
-              - "mvnw"
-              - "Makefile"
-              - "rust-toolchain.toml"
-              - "dev/ci/**"
-              - ".github/workflows/ci.yml"
-              - ".github/workflows/pr_build_linux.yml"
-              - ".github/actions/setup-builder/**"
-              - ".github/actions/java-test/**"
-              - ".github/actions/rust-test/**"
-              - "!**.md"
-              - "!native/core/benches/**"
-              - "!native/spark-expr/benches/**"
-              - "!spark/src/test/scala/org/apache/spark/sql/benchmark/**"
-              - "!spark/src/main/scala/org/apache/comet/GenerateDocs.scala"
-            build_macos:
-              - "native/**"
-              - "common/**"
-              - "spark/**"
-              - "spark-integration/**"
-              - "pom.xml"
-              - "**/pom.xml"
-              - ".mvn/**"
-              - "mvnw"
-              - "Makefile"
-              - "rust-toolchain.toml"
-              - "dev/ci/**"
-              - ".github/workflows/ci.yml"
-              - ".github/workflows/pr_build_macos.yml"
-              - ".github/actions/setup-macos-builder/**"
-              - ".github/actions/java-test/**"
-              - "!**.md"
-              - "!native/core/benches/**"
-              - "!native/spark-expr/benches/**"
-              - "!spark/src/test/scala/org/apache/spark/sql/benchmark/**"
-              - "!spark/src/main/scala/org/apache/comet/GenerateDocs.scala"
-            benchmark:
-              - "native/core/benches/**"
-              - "native/spark-expr/benches/**"
-              - "spark/src/test/scala/org/apache/spark/sql/benchmark/**"
-            docs:
-              - ".asf.yaml"
-              - ".github/workflows/ci.yml"
-              - ".github/workflows/docs.yaml"
-              - "docs/**"
-            spark_3_4:
-              - "native/**/src/**"
-              - "native/**/Cargo.toml"
-              - "native/Cargo.lock"
-              - "!native/hdfs/**"
-              - "!native/fs-hdfs/**"
-              - "common/src/main/**"
-              - "common/pom.xml"
-              - "spark/src/main/**"
-              - "!spark/src/main/spark-3.5/**"
-              - "!spark/src/main/spark-4.0/**"
-              - "!spark/src/main/spark-4.1/**"
-              - "!spark/src/main/spark-4.2/**"
-              - "!spark/src/main/spark-4.x/**"
-              - "!spark/src/main/scala/org/apache/comet/GenerateDocs.scala"
-              - "spark/pom.xml"
-              - "dev/diffs/3.4.3.diff"
-              - "pom.xml"
-              - "rust-toolchain.toml"
-              - ".github/workflows/ci.yml"
-              - ".github/workflows/spark_sql_test_reusable.yml"
-              - ".github/actions/setup-builder/**"
-              - ".github/actions/setup-spark-builder/**"
-            spark_3_5:
-              - "native/**/src/**"
-              - "native/**/Cargo.toml"
-              - "native/Cargo.lock"
-              - "!native/hdfs/**"
-              - "!native/fs-hdfs/**"
-              - "common/src/main/**"
-              - "common/pom.xml"
-              - "spark/src/main/**"
-              - "!spark/src/main/spark-3.4/**"
-              - "!spark/src/main/spark-4.0/**"
-              - "!spark/src/main/spark-4.1/**"
-              - "!spark/src/main/spark-4.2/**"
-              - "!spark/src/main/spark-4.x/**"
-              - "!spark/src/main/scala/org/apache/comet/GenerateDocs.scala"
-              - "spark/pom.xml"
-              - "dev/diffs/3.5.8.diff"
-              - "pom.xml"
-              - "rust-toolchain.toml"
-              - ".github/workflows/ci.yml"
-              - ".github/workflows/spark_sql_test_reusable.yml"
-              - ".github/actions/setup-builder/**"
-              - ".github/actions/setup-spark-builder/**"
-            spark_4_0:
-              - "native/**/src/**"
-              - "native/**/Cargo.toml"
-              - "native/Cargo.lock"
-              - "!native/hdfs/**"
-              - "!native/fs-hdfs/**"
-              - "common/src/main/**"
-              - "common/pom.xml"
-              - "spark/src/main/**"
-              - "!spark/src/main/spark-3.4/**"
-              - "!spark/src/main/spark-3.5/**"
-              - "!spark/src/main/spark-3.x/**"
-              - "!spark/src/main/spark-4.1/**"
-              - "!spark/src/main/spark-4.2/**"
-              - "!spark/src/main/scala/org/apache/comet/GenerateDocs.scala"
-              - "spark/pom.xml"
-              - "dev/diffs/4.0.2.diff"
-              - "pom.xml"
-              - "rust-toolchain.toml"
-              - ".github/workflows/ci.yml"
-              - ".github/workflows/spark_sql_test_reusable.yml"
-              - ".github/actions/setup-builder/**"
-              - ".github/actions/setup-spark-builder/**"
-            spark_4_1:
-              - "native/**/src/**"
-              - "native/**/Cargo.toml"
-              - "native/Cargo.lock"
-              - "!native/hdfs/**"
-              - "!native/fs-hdfs/**"
-              - "common/src/main/**"
-              - "common/pom.xml"
-              - "spark/src/main/**"
-              - "!spark/src/main/spark-3.4/**"
-              - "!spark/src/main/spark-3.5/**"
-              - "!spark/src/main/spark-3.x/**"
-              - "!spark/src/main/spark-4.0/**"
-              - "!spark/src/main/spark-4.2/**"
-              - "!spark/src/main/scala/org/apache/comet/GenerateDocs.scala"
-              - "spark/pom.xml"
-              - "dev/diffs/4.1.1.diff"
-              - "pom.xml"
-              - "rust-toolchain.toml"
-              - ".github/workflows/ci.yml"
-              - ".github/workflows/spark_sql_test_reusable.yml"
-              - ".github/actions/setup-builder/**"
-              - ".github/actions/setup-spark-builder/**"
-            iceberg_1_8:
-              - "native/**/src/**"
-              - "native/**/Cargo.toml"
-              - "native/Cargo.lock"
-              - "!native/hdfs/**"
-              - "!native/fs-hdfs/**"
-              - "common/src/main/**"
-              - "common/pom.xml"
-              - "spark/src/main/**"
-              - "!spark/src/main/scala/org/apache/comet/GenerateDocs.scala"
-              - "spark/pom.xml"
-              - "dev/diffs/iceberg/**"
-              - "pom.xml"
-              - "rust-toolchain.toml"
-              - ".github/workflows/ci.yml"
-              - ".github/workflows/iceberg_spark_test_reusable.yml"
-              - ".github/actions/setup-builder/**"
-              - ".github/actions/setup-iceberg-builder/**"
-            iceberg_1_9:
-              - "native/**/src/**"
-              - "native/**/Cargo.toml"
-              - "native/Cargo.lock"
-              - "!native/hdfs/**"
-              - "!native/fs-hdfs/**"
-              - "common/src/main/**"
-              - "common/pom.xml"
-              - "spark/src/main/**"
-              - "!spark/src/main/scala/org/apache/comet/GenerateDocs.scala"
-              - "spark/pom.xml"
-              - "dev/diffs/iceberg/**"
-              - "pom.xml"
-              - "rust-toolchain.toml"
-              - ".github/workflows/ci.yml"
-              - ".github/workflows/iceberg_spark_test_reusable.yml"
-              - ".github/actions/setup-builder/**"
-              - ".github/actions/setup-iceberg-builder/**"
-            iceberg_1_10:
-              - "native/**/src/**"
-              - "native/**/Cargo.toml"
-              - "native/Cargo.lock"
-              - "!native/hdfs/**"
-              - "!native/fs-hdfs/**"
-              - "common/src/main/**"
-              - "common/pom.xml"
-              - "spark/src/main/**"
-              - "!spark/src/main/scala/org/apache/comet/GenerateDocs.scala"
-              - "spark/pom.xml"
-              - "dev/diffs/iceberg/**"
-              - "pom.xml"
-              - "rust-toolchain.toml"
-              - ".github/workflows/ci.yml"
-              - ".github/workflows/iceberg_spark_test_reusable.yml"
-              - ".github/actions/setup-builder/**"
-              - ".github/actions/setup-iceberg-builder/**"
+          # Need both branches' history so we can diff base..head for PRs and
+          # before..after for pushes.
+          fetch-depth: 0
 
       - name: Compute outputs
         id: compute
         shell: bash
+        env:
+          EVENT_NAME: ${{ github.event_name }}
+          PR_BASE_SHA: ${{ github.event.pull_request.base.sha }}
+          PR_HEAD_SHA: ${{ github.event.pull_request.head.sha }}
+          PUSH_BEFORE: ${{ github.event.before }}
+          PUSH_AFTER: ${{ github.sha }}
         run: |
-          if [[ "${{ github.event_name }}" == "workflow_dispatch" ]]; then
+          set -euo pipefail
+          if [[ "$EVENT_NAME" == "workflow_dispatch" ]]; then
             for key in build_linux build_macos benchmark docs spark_3_4 spark_3_5 spark_4_0 spark_4_1 iceberg_1_8 iceberg_1_9 iceberg_1_10; do
               echo "${key}=true" >> "$GITHUB_OUTPUT"
             done
+            exit 0
+          fi
+          if [[ "$EVENT_NAME" == "pull_request" ]]; then
+            git diff --name-only "$PR_BASE_SHA"..."$PR_HEAD_SHA" > changed_files.txt
           else
-            echo "build_linux=${{ steps.filter.outputs.build_linux }}" >> "$GITHUB_OUTPUT"
-            echo "build_macos=${{ steps.filter.outputs.build_macos }}" >> "$GITHUB_OUTPUT"
-            echo "benchmark=${{ steps.filter.outputs.benchmark }}" >> "$GITHUB_OUTPUT"
-            echo "docs=${{ steps.filter.outputs.docs }}" >> "$GITHUB_OUTPUT"
-            echo "spark_3_4=${{ steps.filter.outputs.spark_3_4 }}" >> "$GITHUB_OUTPUT"
-            echo "spark_3_5=${{ steps.filter.outputs.spark_3_5 }}" >> "$GITHUB_OUTPUT"
-            echo "spark_4_0=${{ steps.filter.outputs.spark_4_0 }}" >> "$GITHUB_OUTPUT"
-            echo "spark_4_1=${{ steps.filter.outputs.spark_4_1 }}" >> "$GITHUB_OUTPUT"
-            echo "iceberg_1_8=${{ steps.filter.outputs.iceberg_1_8 }}" >> "$GITHUB_OUTPUT"
-            echo "iceberg_1_9=${{ steps.filter.outputs.iceberg_1_9 }}" >> "$GITHUB_OUTPUT"
-            echo "iceberg_1_10=${{ steps.filter.outputs.iceberg_1_10 }}" >> "$GITHUB_OUTPUT"
+            # push to main; first push to a branch has all-zero before sha
+            if [[ "$PUSH_BEFORE" =~ ^0+$ ]]; then
+              git ls-tree -r --name-only "$PUSH_AFTER" > changed_files.txt
+            else
+              git diff --name-only "$PUSH_BEFORE".."$PUSH_AFTER" > changed_files.txt
+            fi
           fi
+          echo "Changed files:"
+          cat changed_files.txt
+          python3 dev/ci/compute-changes.py changed_files.txt >> "$GITHUB_OUTPUT"
 
   # ---------------------------------------------------------------------------
   # Heavy jobs: each is a thin caller of an existing reusable workflow. The
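
The new `Compute outputs` step picks a different git command per event: a three-dot diff for pull requests, `git ls-tree` for the first push of a branch, and a two-dot diff for every other push. The sketch below restates that branching in Python so the three cases can be read and tested side by side. It is illustrative only: `changed_files_command` is a hypothetical helper that does not exist in the commit, and the `workflow_dispatch` case is left out because the step exits before reaching this logic.

```python
import re


def changed_files_command(event_name, pr_base="", pr_head="",
                          push_before="", push_after=""):
    """Return the git argv that lists changed files for one event.

    Hypothetical helper mirroring the bash branching in the workflow step.
    """
    if event_name == "pull_request":
        # Three dots: diff from the merge base of base and head up to head,
        # so commits that landed on the base branch after the fork are ignored.
        return ["git", "diff", "--name-only", f"{pr_base}...{pr_head}"]
    if re.fullmatch(r"0+", push_before):
        # First push of a branch: there is no previous commit to compare
        # against, so every tracked file is listed.
        return ["git", "ls-tree", "-r", "--name-only", push_after]
    # Ordinary push: two dots compare the two commits directly.
    return ["git", "diff", "--name-only", f"{push_before}..{push_after}"]
```

Either diff form needs both commits to be present locally, which is why the checkout step in this commit sets `fetch-depth: 0`.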
