ci: add caching to speed up tutor-based CI workflow #802
- Resolve edx-platform tip SHA early (full 40-char SHA used as cache key)
- Cache built dist/ packages keyed by source hash
- Cache pip packages for faster Tutor installation
- Shallow-clone edx-platform (--depth=1) and cache the clone dir by branch+SHA
- Add permissions: packages: write for ghcr.io push
- Login to ghcr.io conditioned on non-fork PRs and pushes
- Pull openedx-dev Docker image from ghcr.io cache; skip build on hit
- Push freshly-built Docker image to ghcr.io (non-fork only)
- Cache Tutor config directory; skip tutor dev launch entirely on hit

Agent-Logs-Url: https://github.com/mitodl/open-edx-plugins/sessions/383405df-dd1b-48c8-b7d7-82aaa58a1fb8
Co-authored-by: asadali145 <52656433+asadali145@users.noreply.github.com>
- Improve SHA resolution error message with more detail
- Add uv.lock to dist/ cache key hash
- Validate DOCKER_IMAGE_OPENEDX is non-empty before using it

Agent-Logs-Url: https://github.com/mitodl/open-edx-plugins/sessions/383405df-dd1b-48c8-b7d7-82aaa58a1fb8
Co-authored-by: asadali145 <52656433+asadali145@users.noreply.github.com>
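The shallow-clone step listed in these commits could look roughly like the shell sketch below. It is an illustration, not the workflow's actual step: the function name `clone_edx_platform` and its three arguments are hypothetical, and it assumes a restored cache can be recognized by an existing `.git` directory.

```shell
# Hypothetical helper: shallow-clone a branch unless a checkout already exists.
# Arguments (all illustrative): <branch> <repo-url> <destination-dir>
clone_edx_platform() {
  branch="$1"
  repo_url="$2"
  dest="$3"

  # Assumption: a restored cache leaves a .git directory behind, so skip cloning.
  if [ -d "$dest/.git" ]; then
    return 0
  fi

  # --depth=1 fetches only the tip commit of the requested branch.
  git clone --depth=1 --branch="$branch" "$repo_url" "$dest"
}
```

Skipping the clone when the directory exists mirrors what a cache hit does in the workflow, where the step is guarded by the cache action's result instead.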
for more information, see https://pre-commit.ci
Pull request overview
This PR speeds up the Tutor-based integration test workflow by adding multiple caches (Python build artifacts, pip cache, edx-platform checkout, Tutor config, and a Docker image cache in GHCR) so repeated CI runs don’t rebuild everything from scratch.
Changes:
- Resolve the edx-platform branch tip SHA up front and use it to key subsequent caches.
- Cache `dist/` artifacts, pip downloads, and the edx-platform checkout to reduce repeated network/build work.
- Add a GHCR-backed Docker image cache and cache Tutor config dirs to potentially skip image builds and `tutor dev launch`.
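As a rough illustration of the first change, the tip SHA can be taken from `git ls-remote` output and validated before it is used in a cache key. This is a minimal sketch under assumptions: `parse_tip_sha` is a hypothetical helper name, and `EDX_PLATFORM_REPO_URL` in the usage comment is a placeholder, not a variable from the workflow.

```shell
# Hypothetical helper: read `git ls-remote` output on stdin and print the SHA.
# ls-remote prints "<sha><TAB><ref>" per line; only the first line is used.
parse_tip_sha() {
  sha="$(head -n 1 | cut -f1)"

  # Fail loudly unless the value is a full 40-character lowercase hex SHA.
  if ! printf '%s' "$sha" | grep -Eq '^[0-9a-f]{40}$'; then
    echo "Could not resolve a 40-character SHA (got: '$sha')" >&2
    return 1
  fi

  printf '%s\n' "$sha"
}

# Possible use inside a workflow step (placeholder URL variable and branch):
#   sha="$(git ls-remote "$EDX_PLATFORM_REPO_URL" refs/heads/master | parse_tip_sha)"
#   echo "sha=$sha" >> "$GITHUB_OUTPUT"
```

Validating the value up front keeps an empty or malformed SHA from silently ending up in every downstream cache key.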
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
@copilot fix the failing integration tests in https://github.com/mitodl/open-edx-plugins/actions/runs/26108947537/job/76780542597, look for the reason and fix it.
On fork PRs, github.repository_owner resolves to the fork owner's namespace, so the cache image tag pointed at an empty registry. Introduce a workflow-level GHCR_CACHE_OWNER env var hardcoded to the upstream org (mitodl) so the cache tag always resolves correctly regardless of who opens the PR.

Agent-Logs-Url: https://github.com/mitodl/open-edx-plugins/sessions/c0dd75d2-1aba-4631-9f2a-71b8b496d86c
Co-authored-by: asadali145 <52656433+asadali145@users.noreply.github.com>
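To make the fork problem concrete, the sketch below builds the cache image tag from a fixed owner rather than from the repository owner. The tag layout (`openedx-dev-cache:<branch_slug>-<sha>`) follows the workflow excerpt in this PR; the function name `cache_image_tag` and the slug rule (replacing `/` with `-`) are assumptions for illustration.

```shell
# Upstream org that owns the cache package; fixed so fork PRs resolve the same tag.
GHCR_CACHE_OWNER="mitodl"

# Hypothetical helper: <branch> <sha> -> full cache image reference.
cache_image_tag() {
  branch="$1"
  sha="$2"

  # Assumption: Docker tags cannot contain "/", so slashes become dashes.
  slug="$(printf '%s' "$branch" | tr '/' '-')"

  printf 'ghcr.io/%s/openedx-dev-cache:%s-%s\n' "$GHCR_CACHE_OWNER" "$slug" "$sha"
}
```

Because the owner never depends on who opened the PR, a fork and the upstream repository compute the same image reference for the same branch and SHA.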
…g key
The previous code constructed the dev image name by appending "-dev" to
DOCKER_IMAGE_OPENEDX (which gives e.g. "overhangio/openedx:21.0.6-main-dev"),
but Tutor actually builds the dev image under a different name
("openedx-dev:21.0.6-main").
Switch to tutor's canonical DOCKER_IMAGE_OPENEDX_DEV config key so the
docker tag and docker push steps reference the image that Tutor actually
created, avoiding the "No such image" error seen in CI.
Agent-Logs-Url: https://github.com/mitodl/open-edx-plugins/sessions/219f72d7-6a13-4fa9-8fb7-2c8b1023be1d
Co-authored-by: asadali145 <52656433+asadali145@users.noreply.github.com>
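The fix described in this commit can be sketched as a push helper that asks Tutor for the dev image name instead of deriving it. `tutor config printvalue DOCKER_IMAGE_OPENEDX_DEV` is the lookup named above; the function name `push_dev_image_to_cache` and its single argument are hypothetical.

```shell
# Hypothetical helper: tag the locally built dev image as <cache-tag> and push it.
push_dev_image_to_cache() {
  cache_tag="$1"

  # Ask Tutor for the image name it actually built, rather than guessing it.
  dev_image="$(tutor config printvalue DOCKER_IMAGE_OPENEDX_DEV)"
  if [ -z "$dev_image" ]; then
    echo "DOCKER_IMAGE_OPENEDX_DEV resolved to an empty value" >&2
    return 1
  fi

  docker tag "$dev_image" "$cache_tag" &&
    docker push "$cache_tag"
}
```

Reading the name from Tutor's config avoids the "No such image" failure that came from appending `-dev` to a different image name.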
Fixed in 2895b45. The failure was in the "Push Docker image to registry cache" step: The workflow was constructing the dev image name by appending
@copilot CI checks are still taking about 25 minutes on master branch. This includes about 15m for the tutor launch step. Looks like there is still some issue? Look at the latest CI run and see what can be fixed/improved.
…launch

The Tutor config cache was never hitting because:

1. tutor dev launch -I --skip-build starts Docker containers (MongoDB, MySQL) which write data files owned by root. When the post-cache step tries to tar ~/.local/share/tutor[-main]/data/, it gets "Permission denied" errors and the cache is never saved, causing tutor dev launch to re-run every time (9-15 minutes).
2. The cache key included the edx-platform tip SHA, which changes many times per day for the master branch. Even if the cache had saved correctly, it would miss on every new master commit.

Fix:

- Replace `tutor dev launch -I --skip-build; tutor dev stop` with `tutor config save`, which generates the docker-compose files needed by the test runner in ~5 seconds without starting any containers (so no root-owned data files are created and the cache saves cleanly).
- Remove the edx-platform SHA from the Tutor config cache key. The generated config files (docker-compose templates, nginx configs, etc.) don't depend on the specific edx-platform SHA content, only on the Tutor version and branch. This makes the cache persistent across edx-platform tip changes.

Agent-Logs-Url: https://github.com/mitodl/open-edx-plugins/sessions/0fb67622-1262-4825-b046-161b9238b734
Co-authored-by: asadali145 <52656433+asadali145@users.noreply.github.com>
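One way to check for the first problem is to list files under the Tutor data directory that the current user does not own, since those are the files the cache's `tar` step cannot read. This is a diagnostic sketch only; `list_foreign_owned` is a hypothetical helper and is not part of the workflow.

```shell
# Hypothetical helper: print files under <dir> not owned by the current user.
# An empty result suggests the directory can be archived without permission errors.
list_foreign_owned() {
  dir="$1"

  # Nothing to report if the directory does not exist yet.
  [ -d "$dir" ] || return 0

  # `id -u` prints the numeric uid; `find ! -user` selects everything else.
  find "$dir" ! -user "$(id -u)" -print
}

# Possible use on a runner:
#   list_foreign_owned "$HOME/.local/share/tutor/data"
```

After switching to `tutor config save`, no containers run, so a check like this would be expected to print nothing.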
Investigated and fixed in 396604e. Two root causes were found: Root cause 1 — Tutor config cache never saved due to permission errors: Root cause 2 — Cache key included edx-platform SHA: Fixes:
On the next run after a cache miss, "Generate Tutor config" will take ~5 seconds and save cleanly. Subsequent runs will skip it entirely via cache hit.
@copilot fix the pre-commit checks
- Quote all $GITHUB_OUTPUT redirections to satisfy shellcheck SC2086
- Quote $DIRECTORY, $DEV, and $PWD variables in the docker compose command to satisfy shellcheck SC2086
- Expand docker compose command to multi-line for readability
- Accept yamlfmt reformatting of the long cache key line

Agent-Logs-Url: https://github.com/mitodl/open-edx-plugins/sessions/e3c0d127-a6fc-49de-9bdd-c9d74a37e82f
Co-authored-by: asadali145 <52656433+asadali145@users.noreply.github.com>
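The quoting change for SC2086 amounts to wrapping the redirection target in double quotes. A small sketch, where `set_output` is a hypothetical helper name and `GITHUB_OUTPUT` is the file path GitHub Actions provides to each step:

```shell
# Hypothetical helper: append a "name=value" line to the step output file.
set_output() {
  name="$1"
  value="$2"

  # Quoting "$GITHUB_OUTPUT" satisfies shellcheck SC2086 and keeps the path
  # a single word even if it contains spaces.
  echo "$name=$value" >> "$GITHUB_OUTPUT"
}
```

The unquoted form often works on hosted runners because the path has no spaces, which is why the issue only surfaced through the pre-commit hook.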
Fixed in b2f5f29. Two pre-commit hooks were failing:
@copilot now tests are failing with missing apps, I think tutor launch cannot be replaced with tutor config save. R&D on how tutor works with open edx and then make required changes |
… entry points

The mounted-directories.sh init script (run by tutor dev init) creates Open_edX.egg-info/ in the bind-mounted edx-platform. Without this, get_plugin_apps(ProjectType.LMS) cannot find content_libraries, causing the INSTALLED_APPS error. This step runs pip install --no-deps -e on the host runner to create the egg-info without needing container startup.

Agent-Logs-Url: https://github.com/mitodl/open-edx-plugins/sessions/1271879d-dd58-49da-b1bf-7433da499873
Co-authored-by: asadali145 <52656433+asadali145@users.noreply.github.com>
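The host-side egg-info step might be sketched as follows. The `pip install --no-deps -e` command and the `Open_edX.egg-info` directory name come from the commit message above; the function name `ensure_egg_info` and the skip-if-present check are assumptions added for illustration.

```shell
# Hypothetical helper: make sure <edx-platform-dir> contains Open_edX.egg-info.
ensure_egg_info() {
  platform_dir="$1"

  # Assumption: if the metadata is already there (for example restored together
  # with a cached checkout), there is nothing to regenerate.
  if [ -d "$platform_dir/Open_edX.egg-info" ]; then
    return 0
  fi

  # --no-deps avoids installing edx-platform's dependencies on the host;
  # -e (editable) writes the egg-info metadata into the source tree.
  pip install --no-deps -e "$platform_dir"
}
```

Since the egg-info lives inside the checkout, it is visible in the container through the bind mount without starting any container to create it.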
Fixed in 276723d. Root cause: Without this egg-info, Fix: Added a step after
runs-on: ubuntu-latest
permissions:
  contents: read
  packages: write
Fixed in 3509aa6.
GitHub Actions does not support conditional job-level permissions (permissions must be static YAML), so a full job split would require saving and uploading the Docker image as a tarball artifact (several GB per matrix combination) then downloading and loading it in a separate job — adding significant overhead.
Instead, the feedback is addressed by ensuring packages: write is never exercised during PR runs:
- The GHCR login step condition changed from `github.event_name != 'pull_request' || same-repo-pr` → `github.event_name == 'push'`: the GITHUB_TOKEN with write scope is never used during PRs (PRs pull from GHCR anonymously, which works for public packages)
- The Docker push step condition simplified to `github.event_name == 'push'`: push never runs during PRs
- A comment on `packages: write` documents this design and the GitHub Actions limitation
Also fixed the Tutor config cache key that was split across lines (reviewer comment at line 166-167).
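The workflow expresses this gate as an `if:` condition on the login and push steps. For readers who think in shell, an equivalent check inside a `run:` script could look like the sketch below; `is_push_event` is a hypothetical name, while `GITHUB_EVENT_NAME` is the default environment variable GitHub Actions sets for each run.

```shell
# Hypothetical helper: succeed only when the run was triggered by a push event.
is_push_event() {
  # ${VAR:-} keeps the test well-defined even when the variable is unset.
  [ "${GITHUB_EVENT_NAME:-}" = "push" ]
}

# Possible use: skip registry writes on pull_request runs.
#   if is_push_event; then docker push "$CACHE_IMAGE_TAG"; fi
```

A step-level `if:` is still preferable in the workflow itself, because a skipped step never receives the write-scoped token at all.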
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
for more information, see https://pre-commit.ci
…matting

- Change GHCR login condition from 'not PR or same-repo PR' to 'push only' so the GITHUB_TOKEN with write scope is never used during pull_request runs
- Change Docker push condition to 'push only' (same reasoning)
- Add a comment on packages:write explaining the GitHub Actions limitation (job-level permissions can't be conditional) and confirming the permission is never exercised during PR runs
- Fix Tutor config cache key that was split across lines

Agent-Logs-Url: https://github.com/mitodl/open-edx-plugins/sessions/e7bee42e-7e80-4928-b5d0-06d5cc77f3c4
Co-authored-by: asadali145 <52656433+asadali145@users.noreply.github.com>
Pull request overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated 2 comments.
Comments suppressed due to low confidence (1)
.github/workflows/ci.yml:167
- The cache key for the Tutor config cache is split across multiple lines. YAML will treat this as a multi-line plain scalar (folded), which is easy to accidentally break and makes it harder to reason about the exact cache key being generated. Keeping the full `${{ ... }}` expression on a single line would be more robust and consistent with the other cache keys.
key: tutor-config-${{ steps.tutor-version.outputs.version }}-${{ matrix.edx_branch
}}
- name: Add edx-platform tutor mounts
  run: |
    cd ${{ github.workspace }}/../edx-platform
    tutor mounts add .
  exit 1
fi
echo "openedx_dev_image=$OPENEDX_DEV_IMAGE" >> "$GITHUB_OUTPUT"
CACHE_IMAGE_TAG="ghcr.io/${{ env.GHCR_CACHE_OWNER }}/openedx-dev-cache:${{ steps.edx-sha.outputs.branch_slug }}-${{ steps.edx-sha.outputs.sha }}"
What are the relevant tickets?
N/A
Description (What does it do?)
Adds several caching layers to `.github/workflows/ci.yml` to avoid rebuilding the Tutor environment from scratch on every push. The biggest time sinks were the Docker image build (~15–20 min), the edx-platform clone (~5–10 min), and Tutor config generation (~9–15 min).

Changes:

- Resolve edx-platform tip SHA early: a cheap `git ls-remote` call (no download) gives the current HEAD SHA for the branch; it is used in the cache keys for the edx-platform clone and the Docker image cache.
- Cache built `dist/` packages: keyed by a hash of all files under `src/**`, `pyproject.toml`, and `uv.lock` (including static assets and templates); skips `uv build --all-packages` on hits.
- Cache pip packages: caches `~/.cache/pip` keyed by branch + OS; speeds up repeated `pip install tutor>=…` calls.
- Shallow-clone edx-platform + cache the clone dir: replaces the full two-step clone+checkout with a single `git clone --depth=1 --branch=…` and caches the directory keyed by branch+SHA; skips the clone entirely on hits.
- Docker image cache via ghcr.io: before building, tries to pull `ghcr.io/mitodl/openedx-dev-cache:<branch-sha>`. On hit, retags the pulled image and skips `tutor images build openedx-dev`. On miss, builds normally, then pushes to ghcr.io for future runs, using `tutor config printvalue DOCKER_IMAGE_OPENEDX_DEV` to resolve the correct image name. Push is conditioned on `push` events only (never on `pull_request` runs). The GHCR namespace is hardcoded via `env.GHCR_CACHE_OWNER: mitodl` so fork PRs always pull from the upstream org's cache.
- Cache Tutor config directory: caches `~/.local/share/tutor` and `~/.local/share/tutor-main` keyed by Tutor version + branch; skips `tutor config save` entirely on hits (the docker-compose env files are already present from the cache).
- Generate edx-platform egg-info on the host runner: runs `pip install --no-deps -e /path/to/edx-platform` after the edx-platform is cloned/restored. When edx-platform is bind-mounted into the Tutor container, the image's `Open_edX.egg-info/` directory is overwritten by the host checkout (which has no egg-info). Without the egg-info, `pkg_resources` cannot read the edx-platform's `lms.djangoapp` entry points, so `get_plugin_apps(ProjectType.LMS)` fails to discover apps like `content_libraries`, causing a `RuntimeError` at Django startup. Previously, `tutor dev init` (via `mounted-directories.sh`) regenerated this egg-info inside a running container; this step reproduces that regeneration in ~5 seconds on the host without starting any containers.

Estimated savings per run on cache hit:
How can this be tested?
`tutor config save`.
- `ghcr.io/mitodl/openedx-dev-cache` package is created in the org's packages after the first run on the main branch.

Additional Context
- `packages: write` permission is declared at the job level (GitHub Actions does not support conditional job-level permissions). However, it is never exercised during `pull_request` runs: both the GHCR login step and the Docker push step are conditioned on `github.event_name == 'push'`. PRs pull from GHCR using anonymous access (the cache package is public-readable).
- `tutor dev launch -I --skip-build` was replaced by `tutor config save` because the former starts Docker containers (MongoDB, MySQL, etc.) that write data files owned by root. Those root-owned files caused `tar: Permission denied` errors in the post-cache step, preventing the Tutor config cache from ever being saved. `tutor config save` generates all needed docker-compose files in ~5 seconds without starting any containers.
- `Open_edX.egg-info/` is generated on the host runner rather than inside a container. The egg-info directory is a portable text-format metadata directory; `pkg_resources` inside the container reads it correctly via the bind mount regardless of which Python generated it. The egg-info is also captured in the edx-platform directory cache, so on cache hits it is already present.
- `actions/cache@v4` (SHA-pinned) and `docker/login-action@v3` (SHA-pinned) are the only new actions added.
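The pull-or-build behaviour of the Docker image cache described above can be summarized in a short shell sketch. The `docker pull`, retag, and `tutor images build openedx-dev` steps come from the description; the function name `restore_or_build_dev_image`, its argument, and the `cache-hit` / `cache-miss` markers it prints are hypothetical.

```shell
# Hypothetical helper: try the registry cache first, fall back to a local build.
# Prints "cache-hit" or "cache-miss" as its last line so a caller can branch on it.
restore_or_build_dev_image() {
  cache_tag="$1"

  # Name under which Tutor expects to find the dev image locally.
  dev_image="$(tutor config printvalue DOCKER_IMAGE_OPENEDX_DEV)"

  if docker pull "$cache_tag" >/dev/null 2>&1; then
    # Hit: retag the pulled image so Tutor finds it and no build is needed.
    docker tag "$cache_tag" "$dev_image" && echo "cache-hit"
  else
    # Miss: build as usual; pushing to the cache is left to a later step.
    tutor images build openedx-dev && echo "cache-miss"
  fi
}
```

Keeping the push out of this helper matches the design above, where pulling is allowed on every event but pushing happens only on `push` events.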