[SPARK-57708][INFRA] Install zstd in CI container images used by the branch-4.1 scheduler #56911

Open
gaogaotiantian wants to merge 1 commit into
apache:branch-4.1 from
gaogaotiantian:SPARK-57708-zstd-branch-4.1

### What changes were proposed in this pull request?

Install `zstd` in the CI container image Dockerfiles that are actually built
by the `branch-4.1` scheduler (`branch41_scheduler.yml`): `dev/infra/Dockerfile`
and the `docs`, `lint`, `sparkr`, `python-311`, `python-314`, and `pypy-310`
images under `dev/spark-test-image/`.
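
As a rough illustration of the kind of install step this implies, here is a minimal shell sketch. It is not taken from the actual Dockerfiles: it assumes a Debian/Ubuntu-based image with `apt-get` available and root privileges (as in a Dockerfile `RUN` step), and `ensure_zstd` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only. Assumptions (not from the PR): the image is Debian/Ubuntu-based,
# apt-get is available, and this runs as root, as in a Dockerfile RUN step.
# ensure_zstd is a hypothetical helper name.
ensure_zstd() {
  # Nothing to do if a zstd binary is already on PATH.
  if command -v zstd >/dev/null 2>&1; then
    echo "zstd already present: $(command -v zstd)"
    return 0
  fi
  # Install the zstd package and drop the apt lists to keep the layer small.
  apt-get update &&
    apt-get install -y --no-install-recommends zstd &&
    rm -rf /var/lib/apt/lists/*
}
```

In a real Dockerfile the same effect is usually achieved by adding `zstd` to an existing `apt-get install` package list rather than calling a helper like this.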

### Why are the changes needed?

On `branch-4.1`, `build_and_test.yml` extracts the precompiled artifact with
`zstd -dc compile-artifact.tar.zst | tar -xf -`, but the container images do not
have `zstd` installed, so the step fails with `zstd: not found`. The same gap
also defeats `actions/cache`: its cache version embeds the compression method,
so container images without `zstd` fall back to `gzip`, compute a different
cache version, and never restore caches saved by host jobs.
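
To make the failure mode concrete, the following shell sketch wraps the same extraction pipeline in a guard that reports a missing `zstd` up front. It is an illustration rather than the workflow's actual step: `extract_artifact`, its optional destination argument, and the exit code 127 are hypothetical choices made here.

```shell
#!/bin/sh
# Sketch only. extract_artifact and its optional destination argument are
# hypothetical; the pipeline itself mirrors the one quoted in the PR text.
extract_artifact() {
  archive="$1"    # e.g. compile-artifact.tar.zst
  dest="${2:-.}"  # directory to extract into (defaults to the current one)

  # Fail early with a clear message instead of "zstd: not found" mid-pipeline.
  if ! command -v zstd >/dev/null 2>&1; then
    echo "zstd is not installed in this image; cannot extract $archive" >&2
    return 127
  fi

  # Decompress to stdout and let tar unpack the stream.
  zstd -dc "$archive" | tar -xf - -C "$dest"
}
```

A call such as `extract_artifact compile-artifact.tar.zst` would behave like the original one-liner when `zstd` is present and return a non-zero status with an explicit message when it is not.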

This is the `branch-4.1`-scoped counterpart of SPARK-57278, which fixed the same
problem on `master`. Only the images that the `branch-4.1` scheduler builds are
modified here; the other Dockerfiles under `dev/spark-test-image/` are not used
by any `branch-4.1` workflow.

### Does this PR introduce _any_ user-facing change?

No. CI-only.

### How was this patch tested?

CI on `branch-4.1`. The `Extract precompiled artifact` step now finds `zstd` in
the rebuilt images, and `actions/cache` restores the Coursier cache in the
container jobs.

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code (claude-opus-4-8)

Co-authored-by: Isaac