Commit ef26798

ci(regression): build test Docker image once, share across shards (heygen-com#427)
* ci(regression): build test Docker image once, share across shards

Splits regression.yml into a `build-image` job plus the existing `regression-shards` matrix. The build job produces a Docker tarball via `docker/build-push-action` with `outputs: type=docker,dest=...`, uploads it as a GHA artifact (retention 1 day, gzip level 1), and each shard downloads and `docker load`s it instead of rebuilding.

Measured on PR heygen-com#419 regression runs before the change:

- Docker build step: ~234s per shard WITH a GHA layer-cache hit
- 11 shards × ~234s = ~43 min of runner time per PR spent only on redundant image builds

Cold-cache cases are much worse. This is happening right now on PR heygen-com#419: release commit b6f50ce bumped every `packages/*/package.json`, invalidating the COPY layer that feeds `bun install --frozen-lockfile`. All 10 shards are currently 25-30+ min into a parallel rebuild, thundering-herding the same npm packages from 10 runners.

After this change:

- 1× build (~4 min warm, ~15 min cold) + 11× (download + `docker load`)
- Expected ~15-20s overhead per shard for artifact download + load
- Net savings: ~30-40 min of runner time per PR run on a warm cache, substantially more on a cold cache

The build job doesn't check out LFS objects: Dockerfile.test only COPYs source and package manifests, never the golden baselines, so the image build never needed LFS. Shards still need LFS for the tests/**/output/output.mp4 baselines they validate against.

* ci(regression): add explicit least-privilege permissions

Addresses the CodeQL warning 'Workflow does not contain permissions'. Defaults the workflow GITHUB_TOKEN to `contents: read` only. The build-image job elevates to `actions: write` because `docker/build-push-action` with `cache-from/to: type=gha` uses the GitHub Actions cache API, which needs read+write on the actions scope.
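The runner-time estimate above can be checked with a small back-of-envelope calculation. This is a sketch, not part of the commit: the helper names `runner_minutes` and `break_even_shards` are hypothetical, the ~4 min warm build is taken as 240 s, and job start-up, checkout and LFS time are ignored on both sides. With the warm-cache figures quoted (11 shards, ~234 s per shard build, 15-20 s download + load), it gives about 43 runner-minutes before versus roughly 7-8 after, i.e. roughly 35-36 minutes saved, inside the quoted ~30-40 min range.

```python
def runner_minutes(
    shards: int,
    per_shard_build_s: float,
    shared_build_s: float,
    per_shard_load_s: float,
) -> tuple[float, float, float]:
    """Estimate (before, after, saved) runner-minutes for one regression run.

    before: every shard builds the image itself.
    after:  one shared build job, then each shard downloads + `docker load`s.
    Job start-up, checkout and LFS time are ignored on both sides.
    """
    before = shards * per_shard_build_s / 60
    after = (shared_build_s + shards * per_shard_load_s) / 60
    return before, after, before - after


def break_even_shards(
    per_shard_build_s: float, shared_build_s: float, per_shard_load_s: float
) -> float:
    """Shard count above which one shared build saves runner time."""
    per_shard_gain = per_shard_build_s - per_shard_load_s
    if per_shard_gain <= 0:
        raise ValueError("loading must be faster than building for sharing to pay off")
    return shared_build_s / per_shard_gain


if __name__ == "__main__":
    # Warm-cache figures from the commit message (assumed: 240 s shared build).
    for load_s in (15, 20):
        before, after, saved = runner_minutes(11, 234, 4 * 60, load_s)
        print(f"load {load_s:>2}s: before {before:.1f} min, "
              f"after {after:.1f} min, saved {saved:.1f} min")
    print(f"break-even: {break_even_shards(234, 4 * 60, 15):.2f} shards")
```

The break-even figure (about 1.1 shards here) shows why the split pays off for any matrix larger than one shard; cold-cache runs only widen the gap, since `per_shard_build_s` grows while the per-shard load cost stays roughly constant.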
1 parent 6accf09 commit ef26798

1 file changed

Lines changed: 61 additions & 12 deletions

.github/workflows/regression.yml

@@ -11,6 +11,12 @@ concurrency:
   group: regression-${{ github.ref }}
   cancel-in-progress: true
 
+# Least-privilege token: only reading code. Jobs that need more (e.g. GHA
+# cache reads/writes from docker/build-push-action with `type=gha`) elevate
+# their own permissions inline.
+permissions:
+  contents: read
+
 jobs:
   changes:
     name: Detect changes
@@ -30,10 +36,55 @@ jobs:
               - "packages/engine/**"
               - "Dockerfile*"
 
-  regression-shards:
+  # Build the regression Docker image once, export it as a tarball, and upload
+  # as an artifact. Each matrix shard then downloads + `docker load`s it instead
+  # of rebuilding from cache. Measured on PR #419: the Docker build step takes
+  # ~4 min per shard even with GHA cache, so 11 shards = ~44 min of redundant
+  # build time per run. This job replaces that with a single ~4 min build plus
+  # ~15s of artifact download per shard.
+  build-image:
+    name: Build regression test image
     needs: changes
     if: needs.changes.outputs.code == 'true'
     runs-on: ubuntu-latest
+    timeout-minutes: 20
+    permissions:
+      contents: read
+      actions: write # docker/build-push-action `type=gha` cache reads + writes
+    steps:
+      - name: Checkout
+        uses: actions/checkout@v4
+        # No LFS needed here — Dockerfile.test only copies source + package manifests,
+        # not the golden baselines under packages/producer/tests/**/output.
+
+      - name: Set up Docker Buildx
+        uses: docker/setup-buildx-action@v3
+
+      - name: Build test image to tarball
+        uses: docker/build-push-action@v6
+        with:
+          context: .
+          file: Dockerfile.test
+          tags: hyperframes-producer:test
+          cache-from: type=gha,scope=regression-test-image
+          cache-to: type=gha,mode=max,scope=regression-test-image
+          outputs: type=docker,dest=/tmp/regression-test-image.tar
+
+      - name: Report image size
+        run: ls -lh /tmp/regression-test-image.tar
+
+      - name: Upload image artifact
+        uses: actions/upload-artifact@v4
+        with:
+          name: regression-test-image
+          path: /tmp/regression-test-image.tar
+          retention-days: 1
+          compression-level: 1
+
+  regression-shards:
+    needs: [changes, build-image]
+    if: needs.changes.outputs.code == 'true'
+    runs-on: ubuntu-latest
     timeout-minutes: 40
     strategy:
       fail-fast: false
@@ -79,18 +130,16 @@ jobs:
             fi
           done
 
-      - name: Set up Docker Buildx
-        uses: docker/setup-buildx-action@v3
-
-      - name: Build test Docker image (cached)
-        uses: docker/build-push-action@v6
+      - name: Download test image artifact
+        uses: actions/download-artifact@v4
         with:
-          context: .
-          file: Dockerfile.test
-          load: true
-          tags: hyperframes-producer:test
-          cache-from: type=gha,scope=regression-test-image
-          cache-to: type=gha,mode=max,scope=regression-test-image
+          name: regression-test-image
+          path: /tmp
+
+      - name: Load test image
+        run: |
+          docker load -i /tmp/regression-test-image.tar
+          docker image ls hyperframes-producer:test
 
       - name: "Run regression shard: ${{ matrix.shard }}"
         run: |
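Each shard relies on the downloaded tarball carrying the `hyperframes-producer:test` tag, which the `Load test image` step confirms with `docker image ls` after loading. A minimal sketch of an equivalent offline check, assuming the `type=docker` export follows the `docker save` layout (a top-level `manifest.json` listing each image's `RepoTags`); the helper names `repo_tags_in_archive` and `check_archive` are hypothetical and not part of this workflow.

```python
import json
import tarfile


def repo_tags_in_archive(path: str) -> list[str]:
    """Return every RepoTag listed in a docker-save style tarball.

    Assumes the archive has a top-level manifest.json: a JSON array of
    objects, each with an optional "RepoTags" list.
    """
    with tarfile.open(path, "r:*") as tar:
        # extractfile raises KeyError if manifest.json is missing.
        member = tar.extractfile("manifest.json")
        if member is None:
            raise ValueError("manifest.json is not a regular file")
        manifest = json.load(member)
    tags: list[str] = []
    for entry in manifest:
        tags.extend(entry.get("RepoTags") or [])
    return tags


def check_archive(path: str, expected_tag: str) -> None:
    """Exit non-zero if expected_tag is not present in the archive."""
    tags = repo_tags_in_archive(path)
    if expected_tag not in tags:
        raise SystemExit(f"{expected_tag!r} not found in {path}; tags: {tags}")
    print(f"ok: {expected_tag} present in {path}")
```

Used as `check_archive("/tmp/regression-test-image.tar", "hyperframes-producer:test")`, this would fail before `docker load` is attempted; the workflow itself sticks with the simpler post-load `docker image ls`.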
