[SPARK-57782][INFRA][DOC] Make pages.yml reuse the doc image #56393
Draft
zhengruifeng wants to merge 3 commits into
…d_test Run the documentation job inside the prebuilt documentation image (apache-spark-github-action-image-docs-cache:master-static) that build_and_test.yml already uses, dropping the redundant inline setup of the Python docs dependencies, Ruby, and Pandoc now provided by the image.
Temporarily run the GitHub Pages workflow on the fork to validate the container-based doc build: trigger on push to this branch, drop the apache/spark job guard, and skip the Pages configure/deploy steps on the fork. To be reverted before merge.
… validation" This reverts commit c1f16f3.
What changes were proposed in this pull request?
Run the "GitHub Pages deployment" documentation job inside the prebuilt documentation container image ghcr.io/apache/spark/apache-spark-github-action-image-docs-cache:master-static, the same image that the documentation job in build_and_test.yml builds and runs in. That image is produced from dev/spark-test-image/docs/Dockerfile and published by build_infra_images_cache.yml.
As a result, the following steps now come from the image and are removed from pages.yml:
- Install Python 3.11 and Install Python dependencies (the pinned Sphinx/pandas/grpcio pip list)
- Install Ruby for documentation generation
- Install Pandoc
Companion changes required to build inside a container, mirroring the documentation job in build_and_test.yml:
- Set LC_ALL/LANG to C.UTF-8
- Add the git config --global --add safe.directory ${GITHUB_WORKSPACE} step (the doc build invokes git as root inside the container)
- Run dev/free_disk_space_container to reclaim runner disk now that the image also occupies it
- Add setup-java (Java 17) so JAVA_HOME is set for the Scala/SQL doc generation, and align the Bundler install with build_and_test.yml
Why are the changes needed?
pages.yml duplicated the documentation toolchain setup (a long pinned Python dependency list, Ruby, and Pandoc) that is already captured in dev/spark-test-image/docs/Dockerfile and published as a reusable image. Reusing that image keeps the documentation dependencies in a single source of truth, removes the duplicated install steps, and avoids reinstalling the toolchain on every run.
Does this PR introduce any user-facing change?
No.
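For reference, the container-based job described under "What changes were proposed in this pull request?" might look roughly like the sketch below. It is not the PR's actual diff: the job id, step names, step order, runner label, action versions, and the Java distribution are assumptions made for illustration. Only the image name, the locale settings, the git safe.directory command, dev/free_disk_space_container, Java 17, bundle install, and the jekyll build command come from the description.

```yaml
# Sketch only. Job id, step names, step order, action versions, and the
# Java distribution are assumptions, not taken from the PR.
jobs:
  docs:
    runs-on: ubuntu-latest
    if: github.repository == 'apache/spark'   # assumed form of the job guard
    # Run every step inside the prebuilt documentation image.
    container:
      image: ghcr.io/apache/spark/apache-spark-github-action-image-docs-cache:master-static
    env:
      LC_ALL: C.UTF-8
      LANG: C.UTF-8
    steps:
      - name: Checkout Spark repository
        uses: actions/checkout@v4
      - name: Mark the workspace as a safe git directory
        # The doc build invokes git as root inside the container.
        run: git config --global --add safe.directory ${GITHUB_WORKSPACE}
      - name: Free up disk space
        run: ./dev/free_disk_space_container
      - name: Install Java 17
        # Sets JAVA_HOME for the Scala/SQL doc generation.
        uses: actions/setup-java@v4
        with:
          distribution: zulu
          java-version: 17
      - name: Install Ruby gems for documentation generation
        # Python docs dependencies, Ruby, and Pandoc already come from the image.
        run: |
          cd docs
          bundle install
      - name: Run documentation build
        run: |
          cd docs
          SKIP_RDOC=1 bundle exec jekyll build
```

With the toolchain baked into the image, the only per-run setup left in this sketch is the image pull, Java, and the gem install.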
How was this patch tested?
Validated by running the updated workflow end-to-end on a fork. Since the workflow only triggers on push to master in apache/spark, it was temporarily enabled for the PR branch (commit c1f16f3, reverted in 821a563, so the final diff is unchanged), with only the two Pages deploy steps skipped because Pages is not enabled on the fork.
Successful run: https://github.com/zhengruifeng/spark/actions/runs/27413443557
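One way the temporary fork validation could be expressed is sketched below. This is an illustration, not the content of commit c1f16f3: the branch name is a hypothetical placeholder, the action versions and the docs/_site path are assumptions, and skipping the Pages steps with a per-step condition is only one possible approach.

```yaml
# Sketch only. Branch name, action versions, artifact path, and the
# per-step conditions are assumptions, not the reverted commit itself.
on:
  push:
    branches:
      - master
      - my-validation-branch   # hypothetical name for the PR branch on the fork

jobs:
  docs:
    # The job-level apache/spark guard is dropped so the job runs on the fork.
    runs-on: ubuntu-latest
    steps:
      # ... checkout and documentation build steps unchanged ...
      - name: Setup Pages
        if: github.repository == 'apache/spark'   # skipped on the fork
        uses: actions/configure-pages@v5
      - name: Upload artifact
        uses: actions/upload-pages-artifact@v3
        with:
          path: docs/_site
      - name: Deploy to GitHub Pages
        if: github.repository == 'apache/spark'   # skipped on the fork
        uses: actions/deploy-pages@v4
```

Leaving the upload step enabled still produces the github-pages artifact on the fork, so the built site can be inspected without Pages being enabled.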
The run pulled apache-spark-github-action-image-docs-cache:master-static, built the full documentation inside the container (SKIP_RDOC=1 bundle exec jekyll build against apache/spark@master sources, ~27 minutes), and uploaded the built site as a 114 MB github-pages artifact. Total run time (~30 minutes) is consistent with recent runs of the current workflow on apache/spark (~31-65 minutes), with the environment setup reduced to a ~50 second image pull plus a ~20 second bundle install. The skipped deploy steps (actions/configure-pages and actions/deploy-pages) are unchanged by this PR.
Was this patch authored or co-authored using generative AI tooling?
Generated-by: Claude Code (model: claude-opus-4-8)