ci: add caching to speed up tutor-based CI workflow by Copilot · Pull Request #802 · mitodl/open-edx-plugins

Copilot · 2026-05-19T15:56:03Z

What are the relevant tickets?

N/A

Description (What does it do?)

Adds several caching layers to .github/workflows/ci.yml to avoid rebuilding the Tutor environment from scratch on every push. The biggest time sinks were the Docker image build (~15–20 min), the edx-platform clone (~5–10 min), and Tutor config generation (~9–15 min).

Changes:

Resolve edx-platform tip SHA early: a cheap git ls-remote call (no download) returns the tip commit SHA of the branch, which is then used in the cache keys for the edx-platform clone and the Docker image.
Cache built dist/ packages — keyed by a hash of all files under src/**, pyproject.toml, and uv.lock (including static assets and templates); skips uv build --all-packages on hits.
Cache pip packages — caches ~/.cache/pip keyed by branch + OS; speeds up repeated pip install tutor>=… calls.
Shallow-clone edx-platform + cache the clone dir — replaces the full two-step clone+checkout with a single git clone --depth=1 --branch=… and caches the directory keyed by branch+SHA; skips the clone entirely on hits.
Docker image cache via ghcr.io: before building, tries to pull ghcr.io/mitodl/openedx-dev-cache:<branch-sha>. On a hit, it retags the pulled image and skips tutor images build openedx-dev. On a miss, it builds normally and then pushes the image to ghcr.io for future runs, using tutor config printvalue DOCKER_IMAGE_OPENEDX_DEV to resolve the correct local image name. The push runs only on push events (never on pull_request runs). The GHCR namespace is hardcoded via env.GHCR_CACHE_OWNER: mitodl so that fork PRs always pull from the upstream org's cache.
Cache Tutor config directory — caches ~/.local/share/tutor and ~/.local/share/tutor-main keyed by Tutor version + branch; skips tutor config save entirely on hits (the docker-compose env files are already present from the cache).
Generate edx-platform egg-info on the host runner — runs pip install --no-deps -e /path/to/edx-platform after the edx-platform is cloned/restored. When edx-platform is bind-mounted into the Tutor container, the image's Open_edX.egg-info/ directory is overwritten by the host checkout (which has no egg-info). Without the egg-info, pkg_resources cannot read the edx-platform's lms.djangoapp entry points, so get_plugin_apps(ProjectType.LMS) fails to discover apps like content_libraries, causing a RuntimeError at Django startup. Previously, tutor dev init (via mounted-directories.sh) regenerated this egg-info inside a running container; this step reproduces that regeneration in ~5 seconds on the host without starting any containers.
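A minimal sketch of how the first few caching steps above might look in ci.yml. The step ids, the matrix.edx_branch key, the edx-platform checkout path and the cache-key prefixes are illustrative assumptions rather than the PR's actual names, and the actions are referenced by tag instead of the pinned SHAs the PR uses.

```yaml
# Illustrative excerpt; ids, matrix key, paths and key prefixes are assumptions.
- name: Resolve edx-platform tip SHA
  id: edx-sha
  env:
    EDX_BRANCH: ${{ matrix.edx_branch }}
  run: |
    # ls-remote only lists refs; no repository objects are downloaded.
    sha="$(git ls-remote https://github.com/openedx/edx-platform.git "refs/heads/${EDX_BRANCH}" | cut -f1)"
    test -n "$sha"  # fail early if the branch does not exist
    echo "sha=${sha}" >> "$GITHUB_OUTPUT"

- name: Cache pip downloads
  uses: actions/cache@v4
  with:
    path: ~/.cache/pip
    key: pip-${{ runner.os }}-${{ matrix.edx_branch }}

- name: Cache built dist/ packages
  id: dist-cache
  uses: actions/cache@v4
  with:
    path: dist
    # Changes only when plugin sources, pyproject.toml or the lockfile change.
    key: dist-${{ hashFiles('src/**', 'pyproject.toml', 'uv.lock') }}

- name: Build plugin packages
  if: steps.dist-cache.outputs.cache-hit != 'true'
  run: uv build --all-packages

- name: Cache edx-platform checkout
  id: edx-cache
  uses: actions/cache@v4
  with:
    path: edx-platform
    key: edx-platform-${{ matrix.edx_branch }}-${{ steps.edx-sha.outputs.sha }}

- name: Shallow-clone edx-platform
  if: steps.edx-cache.outputs.cache-hit != 'true'
  env:
    EDX_BRANCH: ${{ matrix.edx_branch }}
  run: |
    git clone --depth=1 --branch="${EDX_BRANCH}" \
      https://github.com/openedx/edx-platform.git edx-platform
```

Only the clone cache (and, in the PR, the Docker image tag) depends on the resolved SHA; the dist/ and pip keys are independent of edx-platform commits.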

Estimated savings per run on cache hit:

Step	Before	After
Clone edx-platform	~5–10 min (full)	< 1 min (shallow) or skip
Build Docker image	~15–20 min	skip
Generate Tutor config	~9–15 min	~5 s or skip
Tutor pip install	~1–2 min	< 30 s

How can this be tested?

Open a PR and observe the CI run: the first run will populate all caches.
Push another commit to the same PR without changing edx-platform or the plugin source — subsequent runs should skip the Docker build, the edx-platform clone, and tutor config save.
Verify the ghcr.io/mitodl/openedx-dev-cache package is created in the org's packages after the first run on the main branch.

Additional Context

The packages: write permission is declared at the job level (GitHub Actions does not support conditional job-level permissions). However, it is never exercised during pull_request runs: both the GHCR login step and the Docker push step are conditioned on github.event_name == 'push'. PRs pull from GHCR using anonymous access (the cache package is public-readable).
tutor dev launch -I --skip-build was replaced by tutor config save because the former starts Docker containers (MongoDB, MySQL, etc.) that write data files owned by root. Those root-owned files caused tar: Permission denied errors in the post-cache step, preventing the Tutor config cache from ever being saved. tutor config save generates all needed docker-compose files in ~5 seconds without starting any containers.
The edx-platform's Open_edX.egg-info/ is generated on the host runner rather than inside a container. The egg-info directory is a portable text-format metadata directory; pkg_resources inside the container reads it correctly via the bind mount regardless of which Python generated it. The egg-info is also captured in the edx-platform directory cache, so on cache hits it is already present.
The Tutor config cache key omits the edx-platform SHA (unlike the Docker image and edx-platform clone caches). The generated config files depend only on the Tutor version and branch, not on the specific edx-platform commit, so tying the key to the SHA would cause unnecessary invalidations on every new master commit.
Cached images pushed to ghcr.io accumulate over time. Consider pruning old image tags periodically (for example, with a scheduled cleanup workflow).
actions/cache@v4 (SHA-pinned) and docker/login-action@v3 (SHA-pinned) are the only new actions added.

Copilot

Pull request overview

This PR speeds up the Tutor-based integration test workflow by adding multiple caches (Python build artifacts, pip cache, edx-platform checkout, Tutor config, and a Docker image cache in GHCR) so repeated CI runs don’t rebuild everything from scratch.

Changes:

Resolve the edx-platform branch tip SHA up front and use it to key subsequent caches.
Cache dist/ artifacts, pip downloads, and the edx-platform checkout to reduce repeated network/build work.
Add a GHCR-backed Docker image cache and cache Tutor config dirs to potentially skip image builds and tutor dev launch.


asadali145 · 2026-05-19T21:37:01Z

@copilot fix the failing integration tests in https://github.com/mitodl/open-edx-plugins/actions/runs/26108947537/job/76780542597, look for the reason and fix it.

Copilot · 2026-05-19T21:40:17Z


Fixed in 2895b45. The failure was in the "Push Docker image to registry cache" step:

Error response from daemon: No such image: overhangio/openedx:21.0.6-main-dev

The workflow was constructing the dev image name by appending -dev to DOCKER_IMAGE_OPENEDX (overhangio/openedx:21.0.6-main → overhangio/openedx:21.0.6-main-dev), but Tutor actually builds and names the dev image as openedx-dev:21.0.6-main — a completely different naming scheme. The fix replaces the manual construction with tutor config printvalue DOCKER_IMAGE_OPENEDX_DEV, which returns the correct local image name that Tutor actually created.
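A sketch of the pull / build / push sequence after this fix. It assumes the Tutor config already exists when these steps run (printvalue reads the saved config), that GHCR_CACHE_OWNER is set at job level, and that an earlier step with the hypothetical id edx-sha exposes the resolved SHA. Branch names such as release/ulmo contain a slash, which Docker tags do not allow, so the sketch replaces it with a dash.

```yaml
- name: Pull cached Docker image
  id: docker-cache
  env:
    EDX_BRANCH: ${{ matrix.edx_branch }}       # assumed matrix key
    EDX_SHA: ${{ steps.edx-sha.outputs.sha }}  # hypothetical step id
  run: |
    # Docker tags may not contain "/", so release/ulmo becomes release-ulmo.
    cache_tag="ghcr.io/${GHCR_CACHE_OWNER}/openedx-dev-cache:${EDX_BRANCH//\//-}-${EDX_SHA}"
    echo "tag=${cache_tag}" >> "$GITHUB_OUTPUT"
    # Ask Tutor for the local dev image name instead of deriving it by hand.
    dev_image="$(tutor config printvalue DOCKER_IMAGE_OPENEDX_DEV)"
    if docker pull "$cache_tag"; then
      docker tag "$cache_tag" "$dev_image"
      echo "cache_hit=true" >> "$GITHUB_OUTPUT"
    else
      echo "cache_hit=false" >> "$GITHUB_OUTPUT"
    fi

- name: Build Tutor images
  if: steps.docker-cache.outputs.cache_hit != 'true'
  run: tutor images build openedx-dev

- name: Push Docker image to registry cache
  if: github.event_name == 'push' && steps.docker-cache.outputs.cache_hit != 'true'
  env:
    CACHE_TAG: ${{ steps.docker-cache.outputs.tag }}
  run: |
    dev_image="$(tutor config printvalue DOCKER_IMAGE_OPENEDX_DEV)"
    docker tag "$dev_image" "$CACHE_TAG"
    docker push "$CACHE_TAG"
```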

asadali145 · 2026-05-20T07:52:32Z

@copilot CI checks are still taking about 25 minutes on the master branch, including about 15m for the tutor launch step. It looks like there is still some issue. Look at the latest CI run and see what can be fixed/improved.

Copilot · 2026-05-20T08:03:04Z


Investigated and fixed in 396604e. Two root causes were found:

Root cause 1 — Tutor config cache never saved due to permission errors:
tutor dev launch -I --skip-build starts Docker containers (MongoDB, MySQL, etc.) which write data files owned by root into ~/.local/share/tutor[-main]/data/. The Post-cache step's tar couldn't read those files (Permission denied), so the cache was never stored and tutor dev launch re-ran every time (9–15 min).

Root cause 2 — Cache key included edx-platform SHA:
The master branch gets new commits many times per day, changing the SHA and invalidating the cache key even if the save had worked.

Fixes:

Replaced tutor dev launch -I --skip-build followed by tutor dev stop with tutor config save, which generates the docker-compose files the test runner needs in ~5 seconds without starting any containers (so no root-owned data files are written and the cache saves cleanly).
Removed the edx-platform SHA from the Tutor config cache key. The generated config files (docker-compose templates, etc.) don't depend on edx-platform's specific commit — only on the Tutor version and branch. The cache now persists across edx-platform tip changes.

On the next run after a cache miss, "Generate Tutor config" will take ~5 seconds and save cleanly. Subsequent runs will skip it entirely via cache hit.
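A sketch of the Tutor config cache and the replacement step. It assumes an earlier step with the hypothetical id tutor-version exposes a version_tag output (the PR history mentions a version_tag value, but the step that produces it is not shown here).

```yaml
- name: Cache Tutor config
  id: tutor-config-cache
  uses: actions/cache@v4
  with:
    path: |
      ~/.local/share/tutor
      ~/.local/share/tutor-main
    # Tutor version + edx-platform branch only: no edx-platform SHA,
    # so new master commits do not invalidate this cache.
    key: tutor-config-${{ steps.tutor-version.outputs.version_tag }}-${{ matrix.edx_branch }}

- name: Generate Tutor config
  if: steps.tutor-config-cache.outputs.cache-hit != 'true'
  # Saves config and renders the env files without starting containers,
  # so no root-owned data files end up in the cached directories.
  run: tutor config save
```

As the thread later shows (missing apps), this step alone is not sufficient: it must be paired with the host-side egg-info generation for edx-platform.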

asadali145 · 2026-05-20T08:17:38Z

@copilot fix the pre-commit checks

Copilot · 2026-05-20T08:22:12Z


Fixed in b2f5f29. Two pre-commit hooks were failing:

yamlfmt — reformatted the long tutor-config cache key line (auto-fixed by the hook itself).
actionlint (shellcheck SC2086) — several unquoted shell variables:
- All >> $GITHUB_OUTPUT redirections → >> "$GITHUB_OUTPUT"
- The docker compose command's $DIRECTORY, $DEV, and $PWD variables → properly quoted; also expanded that long single-line command to multi-line for readability.
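The SC2086 fix in miniature, showing the same redirection before and after quoting. The step itself is hypothetical.

```yaml
- name: Record a step output  # hypothetical step
  id: example
  run: |
    # Flagged by shellcheck SC2086 (unquoted expansion):
    #   echo "hit=true" >> $GITHUB_OUTPUT
    # Quoted form: the path is never word-split or glob-expanded.
    echo "hit=true" >> "$GITHUB_OUTPUT"
```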

asadali145 · 2026-05-20T10:08:53Z

@copilot now tests are failing with missing apps. I think tutor launch cannot be replaced with tutor config save. R&D how Tutor works with Open edX and then make the required changes.

Copilot · 2026-05-20T10:34:23Z


Fixed in 276723d.

Root cause: tutor dev launch runs tutor dev init, which executes the mounted-directories.sh script inside the LMS container. This script detects that edx-platform is bind-mounted (by checking for the missing bindmount-canary file) and runs pip install -e . to regenerate Open_edX.egg-info/ in the mounted directory.

Without this egg-info, pkg_resources cannot find the edx-platform's entry points. The edx-platform registers content_libraries (and many other apps) as LMS Django apps via the lms.djangoapp entry point group. The get_plugin_apps(ProjectType.LMS) call in lms/envs/common.py reads from these entry points to populate INSTALLED_APPS. Without the egg-info, content_libraries is never added to INSTALLED_APPS, causing the RuntimeError.

Fix: Added a step after the "Add edx-platform tutor mounts" step that runs pip install --no-deps -e /path/to/edx-platform on the host runner. This creates the Open_edX.egg-info/ directory in the edx-platform checkout, which is then visible inside the container through the bind mount. No container startup is needed: this is equivalent to what mounted-directories.sh does, but it runs in ~5 seconds on the host instead of the ~9 minutes that tutor dev launch took.
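A sketch of the host-side egg-info step, assuming edx-platform is checked out at ./edx-platform relative to the workspace (a hypothetical path; the real one is given above only as /path/to/edx-platform). The second step is an optional, hypothetical sanity check, not part of the PR, that the regenerated metadata lists content_libraries under the lms.djangoapp entry-point group.

```yaml
- name: Generate edx-platform egg-info
  # Host-side equivalent of the pip install -e that mounted-directories.sh
  # runs inside the container; the bind mount exposes the result.
  run: pip install --no-deps -e ./edx-platform  # assumed checkout path

- name: Check lms.djangoapp entry points  # optional, hypothetical check
  run: |
    # entry_points.txt lists entries under [group] headers.
    grep -q '^\[lms.djangoapp\]' edx-platform/Open_edX.egg-info/entry_points.txt
    grep -q 'content_libraries' edx-platform/Open_edX.egg-info/entry_points.txt
```

Because the clone cache is saved at the end of the job, the generated egg-info is captured with the checkout, so on cache hits it should already be present.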

Copilot

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 3 comments.

asadali145 · 2026-05-20T12:46:30Z

    runs-on: ubuntu-latest
+    permissions:
+      contents: read
+      packages: write


@copilot incorporate the above feedback.

Fixed in 3509aa6.

GitHub Actions does not support conditional job-level permissions (permissions must be static YAML), so a full job split would require saving and uploading the Docker image as a tarball artifact (several GB per matrix combination) then downloading and loading it in a separate job — adding significant overhead.

Instead, the feedback is addressed by ensuring packages: write is never exercised during PR runs:

The GHCR login step condition changed from github.event_name != 'pull_request' || same-repo-pr → github.event_name == 'push': the GITHUB_TOKEN with write scope is never used during PRs (PRs pull from GHCR anonymously, which works for public packages)

The Docker push step condition simplified to github.event_name == 'push': push never runs during PRs

A comment on packages: write documents this design and the GitHub Actions limitation

Also fixed the Tutor config cache key that was split across lines (reviewer comment at lines 166-167).
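The permissions arrangement described above, sketched as a job header plus the GHCR login step. The job id is hypothetical, and docker/login-action is shown by tag rather than by the pinned SHA.

```yaml
jobs:
  integration-tests:  # hypothetical job id
    runs-on: ubuntu-latest
    # Job-level permissions are static: they cannot vary by event.
    # packages: write is only exercised by the push-gated steps.
    permissions:
      contents: read
      packages: write
    env:
      GHCR_CACHE_OWNER: mitodl  # fixed owner so fork PRs read the upstream cache
    steps:
      - name: Log in to GHCR
        if: github.event_name == 'push'  # PR runs pull anonymously instead
        uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      # Pull and build steps run for every event; the push step repeats
      # the github.event_name == 'push' condition.
```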

asadali145 · 2026-06-05T09:57:01Z

@copilot rebase this branch with main.

Copilot

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 1 comment.

Copilot

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 1 comment.

Copilot

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 3 comments.

Copilot

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 1 comment.


asadali145 · 2026-06-22T08:25:33Z

@copilot revisit all the changes in this PR and tell me why it took 9-15m initially after the optimizations, but as we did more iterations the time is now again over 20m.


Copilot · 2026-06-22T08:31:13Z

@copilot rebase this branch with main.

Done — merged origin/main into the branch to bring it up to date with the latest commits. (Force-push is not available from this agent, so a merge commit was used instead of a rebase.)

@copilot revisit all the changes in this PR and tell me why it took 9-15m initially after the optimizations, but as we did more iterations the time is now again over 20m.

The root cause is that the Docker image cache never works for PR runs.

Looking at the most recent successful CI run, the two dominant steps are:

Build Tutor images: ~13 min — always rebuilt on every PR run
Run tests: ~13 min — unavoidable

Total: ~26 min.

Why the Docker cache misses every time:

The Push Docker image to registry cache step is conditioned on github.event_name == 'push' (i.e., only on merges to main). PR runs use the pull_request event, so they can only pull from GHCR — they never write to it. The Pull cached Docker image step completes in ~0 seconds (the pull fails silently, setting cache_hit=false), and the build always runs.

Additionally, the GHCR cache tag encodes the edx-platform SHA: ...-<edx-sha>-<tutor-version>. On the master branch, edx-platform advances multiple times per day. Even when the image is pushed (on a main-branch merge), the next PR run resolves a newer SHA and misses the cache entirely.

Why "9-15m initially" → "20+ min now":

The "9-15m" was the time saved when tutor dev launch (9-15 min) was replaced with tutor config save (~5 sec). That saving only eliminated the Tutor config step; the Docker image rebuild (~13 min) was always there and is now the main avoidable bottleneck. Adding release/ulmo to the matrix (3 branches × ~26 min each) further increased the total CI time across jobs.

To actually get Docker cache hits on PR runs, one of the following is needed:

Also push the cached image on pull_request events (reverts the security trade-off discussed in the packages: write thread — though with the current guards it would only apply to same-repo PRs).
Key the cache tag only on Tutor version (dropping the edx-platform SHA), accepting that a cached image might serve a slightly different edx-platform commit.
Use Docker BuildKit's --cache-from/--cache-to with actions/cache — but Tutor invokes docker build internally, making this non-trivial to wire up.

Would you like me to implement one of these approaches?
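A sketch of option 2 above, in which the image cache tag is derived only from the edx-platform branch and the Tutor version. The tutor-version step id is hypothetical, as is the assumption that GHCR_CACHE_OWNER is set at job level. The trade-off stated above still applies: a cached image may have been built against an older edx-platform commit, even though the source itself comes from the bind-mounted checkout.

```yaml
- name: Compute image cache tag
  id: image-tag
  env:
    EDX_BRANCH: ${{ matrix.edx_branch }}                           # assumed matrix key
    TUTOR_VERSION: ${{ steps.tutor-version.outputs.version_tag }}  # hypothetical output
  run: |
    suffix="${EDX_BRANCH}-${TUTOR_VERSION}"
    # Replace anything Docker does not allow in a tag (e.g. "/" or "+") with "-".
    suffix="${suffix//[^A-Za-z0-9_.-]/-}"
    echo "tag=ghcr.io/${GHCR_CACHE_OWNER}/openedx-dev-cache:${suffix}" >> "$GITHUB_OUTPUT"
```

The pull and push steps would then use steps.image-tag.outputs.tag in place of the SHA-based tag.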

Copilot AI assigned Copilot and asadali145 May 19, 2026

Copilot created this pull request from a session on behalf of asadali145 May 19, 2026 15:56

Copilot finished work on behalf of asadali145 May 19, 2026 15:56

Copilot AI requested a review from asadali145 May 19, 2026 15:56

asadali145 requested a review from Copilot May 19, 2026 15:57

Copilot started reviewing on behalf of asadali145 May 19, 2026 15:58

Copilot AI reviewed May 19, 2026

Comment thread .github/workflows/ci.yml Outdated

Comment thread .github/workflows/ci.yml Outdated

Comment thread .github/workflows/ci.yml Outdated

Copilot started work on behalf of asadali145 May 19, 2026 21:36

Copilot finished work on behalf of asadali145 May 19, 2026 21:37

Copilot started work on behalf of asadali145 May 19, 2026 21:37

Copilot finished work on behalf of asadali145 May 19, 2026 21:40

Copilot started work on behalf of asadali145 May 20, 2026 07:52

Copilot finished work on behalf of asadali145 May 20, 2026 08:04

Copilot started work on behalf of asadali145 May 20, 2026 08:17

Copilot finished work on behalf of asadali145 May 20, 2026 08:22

Copilot started work on behalf of asadali145 May 20, 2026 10:09

Copilot finished work on behalf of asadali145 May 20, 2026 10:35

asadali145 requested a review from Copilot May 20, 2026 12:38

Copilot started reviewing on behalf of asadali145 May 20, 2026 12:39

Copilot AI reviewed May 20, 2026

Copilot started work on behalf of asadali145 May 20, 2026 12:46

Copilot finished work on behalf of asadali145 June 5, 2026 09:50

Copilot started work on behalf of asadali145 June 5, 2026 09:57

Copilot finished work on behalf of asadali145 June 5, 2026 10:01

asadali145 requested a review from Copilot June 5, 2026 11:49

Copilot started reviewing on behalf of asadali145 June 5, 2026 11:49

Copilot AI reviewed Jun 5, 2026

Comment thread .github/workflows/ci.yml Outdated

asadali145 requested a review from Copilot June 5, 2026 12:45

Copilot started reviewing on behalf of asadali145 June 5, 2026 12:45

Copilot AI reviewed Jun 5, 2026

Comment thread .github/workflows/ci.yml

asadali145 requested a review from Copilot June 5, 2026 12:57

Copilot started reviewing on behalf of asadali145 June 5, 2026 12:57

Copilot AI reviewed Jun 5, 2026

Comment thread .github/workflows/ci.yml

Comment thread .github/workflows/ci.yml Outdated

Comment thread .github/workflows/ci.yml

ci: add caching to speed up tutor-based CI workflow

5491e79

asadali145 force-pushed the copilot/improve-tutor-environment-steps branch from 0cd737e to 5491e79 June 5, 2026 13:19

asadali145 requested a review from Copilot June 5, 2026 13:19

Copilot started reviewing on behalf of asadali145 June 5, 2026 13:19

Copilot AI reviewed Jun 5, 2026

Comment thread .github/workflows/ci.yml

Potential fix for pull request finding

b29fc60

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

Copilot started work on behalf of asadali145 June 5, 2026 18:21

Include Tutor git SHA in version_tag when installed from git checkout

88503b5

Copilot finished work on behalf of asadali145 June 5, 2026 18:23

asadali145 force-pushed the copilot/improve-tutor-environment-steps branch from 4a73bd4 to 88503b5 June 5, 2026 18:32

asadali145 added 2 commits June 8, 2026 13:13

more changes

9bb2aea

revert uv pip install changes

7fe7e79

Copilot started work on behalf of asadali145 June 22, 2026 08:25

Merge remote-tracking branch 'origin/main' into copilot/improve-tutor-environment-steps

09208eb

Copilot finished work on behalf of asadali145 June 22, 2026 08:31
