Commit a2a027f

circleci: implement hierarchical cache model with fork-specific caches
Implements a new cache-based pipeline that dramatically improves build times
through hierarchical caching with fork-specific caches.

Key changes:
- Add executor, commands, and job definitions for cache-based builds
- Add create_hashes job to generate cache digest files
- Add x86_blobs job for blob downloads with cache support
- Add x86_musl_cross_make job for toolchain builds with cache save
- Add x86_coreboot job per fork, each saves both modules and coreboot caches
- Add ppc64_musl_cross_make and ppc64_coreboot jobs (decoupled from single job)
- Add glossary documenting fan-in, workspace chain, cache layers

Fixes for cache invalidation issues:
- Remove .circleci/config.yml from cache key hashes (prevents cache
  invalidation on CI config changes - was causing full rebuilds on every
  pipeline)
- Fix musl-cross-make module to auto-detect existing crossgcc using wildcard
  check
- Exclude .circleci/config.yml from all_modules_and_patches.sha256sums and
  coreboot_musl-cross-make.sha256sums

Test results (multiple pipeline runs):

Pipeline 3789 (first run, cold cache):
- x86-musl-cross-make: 30 min
- ppc64-musl-cross-make: 16 min
- Result: cache saved

Pipeline 3790 (second run, cache hit):
- x86-musl-cross-make: 4.5 min (6.6x faster than first run)
- ppc64-musl-cross-make: 4.5 min (3.5x faster than first run)
- Result: beats baseline (14.5 min) by 3.2x

Pipeline 3791 (third run, cache hit):
- x86-musl-cross-make: ~6 min (27s Make Board + spin-up variance)
- ppc64-musl-cross-make: 4.5 min
- Result: still beats baseline

The wildcard fix for the musl-cross-make module detects existing crossgcc from
cache and skips the rebuild entirely (Make Board takes only 27s vs 26 min
cold).

Add tests/circle-ci-simulation/ for local cache behavior verification:
- test_cache_hash.sh, test_musl_skip.sh, simulate_cold_cache.sh, etc.

Signed-off-by: Thierry Laurion <insurgo@riseup.net>
1 parent 562f9e7 commit a2a027f

15 files changed

Lines changed: 977 additions & 161 deletions

.circleci/config.yml

Lines changed: 352 additions & 146 deletions
Large diffs are not rendered by default.

.gitignore

Lines changed: 1 addition & 0 deletions
```diff
@@ -25,3 +25,4 @@ crossgcc
 typescript*
 result
 .claude/
+tmpDir/
```

README.md

Lines changed: 17 additions & 11 deletions
```diff
@@ -34,6 +34,7 @@ Heads codebase. Start here:
 | [doc/ux-patterns.md](doc/ux-patterns.md) | GUI/UX conventions: whiptail wrappers, integrity report, error flows |
 | [doc/config.md](doc/config.md) | Board and user configuration system |
 | [doc/docker.md](doc/docker.md) | Reproducible build workflow using Docker |
+| [doc/circleci.md](doc/circleci.md) | CircleCI pipeline layout, workspace flow, and cache behavior |
 | [doc/qemu.md](doc/qemu.md) | QEMU board targets for development and testing |
 | [doc/wp-notes.md](doc/wp-notes.md) | Flash write-protection status per board |
 | [doc/BOARDS_AND_TESTERS.md](doc/BOARDS_AND_TESTERS.md) | Supported boards and their maintainers/testers |
@@ -65,9 +66,15 @@ provided Docker wrappers — no host-side QEMU or swtpm installation is needed.
 and QEMU runtime with software TPM (swtpm) and the bundled `canokey-qemu`
 virtual OpenPGP smartcard. Build and test entirely in software before flashing real hardware.
 
+Build targets are the directory names under `boards/`. For the current set of
+tested and maintained targets, see [doc/BOARDS_AND_TESTERS.md](doc/BOARDS_AND_TESTERS.md).
+
 For full details — wrapper scripts, Nix local dev, reproducibility verification, and
 maintainer workflow — see **[doc/docker.md](doc/docker.md)**.
 
+For CI cache/workspace behavior and the CircleCI job graph, see
+**[doc/circleci.md](doc/circleci.md)**.
+
 For QEMU board testing see **[doc/qemu.md](doc/qemu.md)**.
 
 For troubleshooting build issues see **[doc/faq.md](doc/faq.md)** and
@@ -97,13 +104,11 @@ enabled by most board configs include:
 
 * [musl-cross-make](https://github.com/richfelker/musl-cross-make) — cross-compiler toolchain
 * [coreboot](https://www.coreboot.org/) — minimal firmware replacing vendor BIOS/UEFI
-* [Linux](https://kernel.org) — minimal kernel payload (no built-in initramfs; boots with external initrd such as `initrd.cpio.xz`)
+* [Linux](https://kernel.org) — minimal kernel payload (no built-in initrd; boots with external initrd such as `initrd.cpio.xz`)
 * [busybox](https://busybox.net/) — core utilities
-* [kexec](https://wiki.archlinux.org/index.php/kexec) — boot OS from /boot
+* [kexec](https://wiki.archlinux.org/index.php/kexec) — Linux kernel executor (loads kernels from boot partition, USB, network)
+* [tpmtotp](https://github.com/osresearch/tpmtotp) — TPM-based TOTP/HOTP one-time password generator
 * [cryptsetup](https://gitlab.com/cryptsetup/cryptsetup) — LUKS disk encryption
-* [GPG](https://www.gnupg.org/) — /boot signature verification
-* [mbedtls](https://tls.mbed.org/) — cryptography for TPM operations
-* [tpmtotp](https://trmm.net/Tpmtotp) — TPM-based TOTP/HOTP attestation
 
 The full build also includes: lvm2, tpm2-tools, flashrom/flashprog, dropbear (SSH),
 fbwhiptail (GUI), qrencode, and many others. See individual `modules/*` files and
@@ -118,12 +123,13 @@ kernel.
 
 * Building coreboot's cross compilers can take a while. Luckily this is only done once.
 * Builds are finally reproducible! The [reproduciblebuilds tag](https://github.com/osresearch/heads/issues?q=is%3Aopen+is%3Aissue+milestone%3Areproduciblebuilds) tracks any regressions.
-* Currently only tested in QEMU, the Thinkpad x230, Librem series and the Chell Chromebook.
-** Xen does not work in QEMU. Signing, HOTP, and TOTP do work; see below.
-* Building for the Lenovo X220 requires binary blobs to be placed in the blobs/x220/ folder.
-  See the readme.md file in that folder
-* Building for the Librem 13 v2/v3 or Librem 15 v3/v4 requires binary blobs to be placed in
-  the blobs/librem_skl folder. See the readme.md file in that folder
+* Current tested and maintained boards are tracked in [doc/BOARDS_AND_TESTERS.md](doc/BOARDS_AND_TESTERS.md). Board targets themselves live under `boards/`.
+* Xen does not work in QEMU. Signing, HOTP, and TOTP do work; see below.
+* Blob requirements are board- or board-family-specific. Check the relevant documentation under `blobs/` for the target you are building.
+* Purism boards use Purism-managed coreboot blob paths from the Purism fork (for example `3rdparty/purism-blobs/...` via `CONFIG_IFD_BIN_PATH` and `CONFIG_ME_BIN_PATH` in `config/coreboot-librem_*.config`). Heads should not maintain those vendor blob payloads. Runtime firmware notes for Librem blob jail are in [blobs/librem_jail/README](blobs/librem_jail/README).
+* Lenovo xx20 boards such as X220 and X230 use the shared xx20 blob flow documented in [blobs/xx20/readme.md](blobs/xx20/readme.md). X220-specific notes are in [blobs/x220/readme.md](blobs/x220/readme.md).
+* Other boards can source blobs from board-family directories under `blobs/` (for example xx20/xx30/xx80, t420, t440p, w541) or from fork-specific paths configured in coreboot configs (for example Dasharo boards using `3rdparty/dasharo-blobs/...`). Vendor blob payloads remain maintained by their upstream vendors/forks.
+* T480 and T480s blob requirements are documented in [blobs/xx80/README.md](blobs/xx80/README.md). Other families have their own docs under `blobs/`, for example `t420/`, `t440p/`, and `w541/`.
 
 ### QEMU
 
```

doc/architecture.md

Lines changed: 2 additions & 0 deletions
```diff
@@ -98,6 +98,8 @@ The top-level `Makefile` orchestrates:
 - Final ROM image: coreboot ROM with Linux + initrd payload embedded
 
 Reproducible builds are achieved via Nix-pinned Docker images. See [docker.md](docker.md).
+The CI pipeline's workspace and cache behavior is documented in
+[circleci.md](circleci.md).
 
 ---
 
```

doc/circleci.md

Lines changed: 305 additions & 0 deletions
# CircleCI Pipeline and Cache Model

This document explains how the CircleCI pipeline in Heads is structured,
what the cache layers mean, and how each coreboot fork saves its own modules cache.

See also: [development.md](development.md), [docker.md](docker.md),
[architecture.md](architecture.md).

---

## Goals

The CircleCI pipeline is optimized for two constraints:

- Avoid CircleCI workspace fan-in errors.
- Reuse expensive build outputs across pipelines without delaying unrelated
  board builds more than necessary.

The current layout favors a linear x86 seed chain followed by parallel board
builds.

---
## Key concepts

### Workspace

A workspace is data passed from an upstream job to downstream jobs in the same
workflow run.

- Workspaces help sibling jobs in the current pipeline.
- Workspaces are downloaded fresh by downstream jobs.
- Persisting the same paths from multiple upstream jobs into one downstream job
  causes fan-in problems in CircleCI.

### Cache

A CircleCI cache is stored for reuse by later pipeline runs in the same
repository.

- Caches help future pipelines.
- Caches do not speed up sibling jobs in the same workflow run.
- Forks do not share caches with the upstream repository.
- Each x86_coreboot job saves both modules and coreboot caches for its fork.

---
## x86 pipeline shape

The x86 chain is intentionally linear until a seed board has produced a usable
workspace:

1. `create_hashes`
2. `x86_blobs`
3. `x86_musl_cross_make`
4. `x86_coreboot` seed jobs, one per coreboot fork
5. Downstream board builds for each fork, in parallel

For the coreboot 25.09 branch, the seed board is `EOL_t480-hotp-maximized`.
That job produces the workspace used by the other 25.09 boards in the same
workflow.

Other x86 forks follow the same pattern:

- `novacustom-nv4x_adl` seeds the `coreboot-dasharo_nv4x` fork
- `novacustom-v560tu` seeds the `coreboot-dasharo_v56` fork
- `librem_14` seeds the `coreboot-purism` fork
- `EOL_t480-hotp-maximized` seeds the `coreboot-25.09` fork
- `EOL_librem_l1um` seeds the `coreboot-4.11` fork
- `UNTESTED_msi_z690a_ddr4` seeds the `coreboot-dasharo_msi_z690` fork
- `UNTESTED_msi_z790p_ddr4` seeds the `coreboot-dasharo_msi_z790` fork

The downstream `build` jobs for each family consume the workspace from the
relevant seed job instead of rebuilding the fork toolchain from scratch.

The ppc64 chain mirrors x86:

1. `create_hashes`
2. `ppc64_musl_cross_make` builds the musl-cross-make toolchain and saves its cache
3. `ppc64_coreboot` builds the coreboot-talos_2 fork and saves its cache
4. No downstream boards: only one ppc64 board exists

---
## Cache layers

The x86 pipeline uses hierarchical cache layers:

1. **`{arch}-musl-cross-make-nix-docker-heads-{hash}`**
   - Base toolchain (GCC + musl, i.e. musl-cross-make)
   - Paths: `build/{arch}/musl-cross-make-*`, `crossgcc/{arch}`, `install/{arch}`, `packages/{arch}`

2. **`{arch}-coreboot-musl-cross-make-nix-docker-heads-{hash}-{coreboot_dir}`**
   - Includes musl plus the coreboot toolstack
   - Paths: `build/{arch}/{coreboot_dir}`, `build/{arch}/musl-cross-make-*`, `crossgcc/{arch}`, `install/{arch}`, `packages/{arch}`

3. **`{arch}-modules-coreboot-musl-cross-make-nix-docker-heads-{hash}-{coreboot_dir}`**
   - Includes coreboot, musl, and all built modules (the full build state)
   - Paths: `build/{arch}`, `install/{arch}`, `crossgcc/{arch}`, `packages/{arch}`

4. **`{arch}-blobs-nix-docker-heads`** caches are handled separately.

Cache key naming follows `{arch}-{layer}-nix-docker-heads-{hash}[-{fork}]` and
mirrors the dependency chain: each layer includes everything from the layers
below it.

Restore order (most complete to least):
```
1. {arch}-modules-coreboot-musl-cross-make-nix-docker-heads-{modules_hash}-{coreboot_dir}
2. {arch}-coreboot-musl-cross-make-nix-docker-heads-{coreboot_hash}-{coreboot_dir}
3. {arch}-musl-cross-make-nix-docker-heads-{musl_hash}
```

Each `x86_coreboot` job saves both:

- the modules cache (full build state)
- the coreboot cache (fork-specific toolstack)
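The "first matching key wins" restore rule can be modeled locally as a plain shell loop. This is only an illustrative sketch, not CircleCI's implementation; the key names and the `cache/` directory below are hypothetical:

```shell
#!/bin/sh
# Hypothetical local model of restore_cache: try keys in order, most
# complete first, and stop at the first one that exists.
set -eu

mkdir -p cache
# Pretend only the narrow musl-layer cache has been published so far.
touch cache/x86-musl-cross-make-nix-docker-heads-abc123

restored=""
for key in \
    x86-modules-coreboot-musl-cross-make-nix-docker-heads-def456-coreboot-25.09 \
    x86-coreboot-musl-cross-make-nix-docker-heads-789abc-coreboot-25.09 \
    x86-musl-cross-make-nix-docker-heads-abc123; do
    if [ -e "cache/$key" ]; then
        restored="$key"
        break   # only the first match is restored; later keys are ignored
    fi
done

echo "restored=$restored"
```

Listing the broadest key first is what makes the fallback hierarchy work: a narrow key listed first would mask a richer cache.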
---

## Current pipeline details

The current pipeline behavior is:

1. It uses explicit jobs for cache hashing, blob preparation, the x86 musl seed,
   x86 coreboot forks (each saves both modules and coreboot caches), generic board
   builds, and the single ppc64 Talos II build.
2. It uses a pinned `heads-docker` executor so the toolchain environment is
   stable across jobs.
3. It clears only `build/<arch>/log/*` before a build, not the restored build
   trees themselves.
4. It keeps x86 blob preparation separate from toolchain and firmware builds.
5. It keys x86 coreboot caches by fork so one fork cannot restore another
   fork's build tree.
6. It restores the largest valid cache first, because CircleCI stops at the
   first matching key.
7. It stores `install/<arch>` together with the compiler and package trees so a
   restored musl toolchain still has its sysroot.
8. It refreshes restored `.configured` and `.build` stamps before invoking
   `make`, so fresh checkout mtimes do not trigger a redundant rebuild of an
   already restored musl-cross-make tree.
9. It decouples ppc64 into musl-cross-make and coreboot jobs (like x86) so each
   saves its cache immediately rather than at the end of a long combined build.
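Point 3 amounts to a cleanup step along these lines (the exact CI command may differ; the paths below are illustrative):

```shell
#!/bin/sh
# Clear only per-run logs from a restored build tree, keeping build outputs.
set -eu

arch=x86
mkdir -p "build/$arch/log" "build/$arch/musl-cross-make-fake"
touch "build/$arch/log/coreboot.log"              # stale log from the cache
touch "build/$arch/musl-cross-make-fake/.build"   # restored build output

rm -rf "build/$arch/log/"*   # drop logs only; restored trees stay intact

logs=$(ls "build/$arch/log" | wc -l)
kept=no
if [ -e "build/$arch/musl-cross-make-fake/.build" ]; then
    kept=yes
fi
echo "logs=$logs kept=$kept"
```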
---

## Maintainer checklist

When changing `.circleci/config.yml`, update this document by answering these
questions in order:

1. Did the job graph change?
   Update the `x86 pipeline shape` section and the seed-board list.
2. Did a cache key, restore order, or saved path change?
   Update `Cache layers` and `Why musl could rebuild after a cache hit`.
3. Did the change alter current runtime behavior or restore/build semantics?
   Update `Current pipeline details`.
4. Did the change affect the maintenance workflow itself?
   Update this section too.

If you cannot summarize the change in one of those sections, the document is
missing a section and should be extended rather than worked around.
---

## Edit map

Use this map when modifying the pipeline:

- Add or remove a cache hash input:
  edit `create_hashes` in `.circleci/config.yml` and update `Cache layers` here.
- Add or remove x86 blob preparation:
  edit `x86_blobs` and update `x86 pipeline shape` plus `Cache layers`.
- Add or remove an x86 coreboot fork seed:
  edit the `x86_coreboot` workflow entries and update the seed-board list in
  `x86 pipeline shape`.
- Add or remove downstream boards for a fork:
  edit the `build` workflow entries and verify the seed dependency still points
  to the correct fork seed.
- Change what makes musl reusable:
  update the save/restore paths in `.circleci/config.yml` and re-check the
  explanation in `Why musl could rebuild after a cache hit`.
- Change ppc64 behavior:
  edit `ppc64_musl_cross_make` and/or `ppc64_coreboot` and re-check both
  `Cache layers` and the ppc64 chain description.

---
## Invariants

These are the current rules worth preserving absent a deliberate design change:

- Only one job at a time should persist a given workspace chain.
- Blob download is separate from x86 toolchain and coreboot builds.
- Each fork saves both modules and coreboot caches.
- x86 and ppc64 restore lists should prefer the largest valid cache first.
- Same-workflow cache misses are expected while the broad key is still being
  published during that workflow; reuse improves on the next pipeline.
- Musl reuse requires both `crossgcc/<arch>` and `install/<arch>`.
- Each coreboot fork has its own cache keyed by `{coreboot_dir}` to prevent cross-fork contamination.
- ppc64 now uses decoupled musl-cross-make and coreboot jobs, each saving its cache immediately.
## How each fork saves its cache

Each `x86_coreboot` job (the first board built for each coreboot fork) saves both:

1. **modules cache** - full build state including all built modules
2. **coreboot cache** - fork-specific coreboot toolstack

This means every fork is self-sufficient:

- The first board of a fork builds everything and saves both caches.
- Downstream boards in the same fork restore the full modules cache.
- No separate cache publication job is needed.

---
## Why musl could rebuild after a cache hit

**Original problem**: even when the cache was restored, musl-cross-make was
rebuilt, because the Makefile only checked whether the `CROSS` environment
variable was set, not whether the compiler actually existed on disk.

**Fix**: the musl-cross-make module now uses `wildcard` to auto-detect whether
`crossgcc/<arch>/bin/<triplet>-gcc` exists. If found, it sets `CROSS` and takes
the `--version` path (no rebuild). If not found, it builds from scratch.
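The same detection can be sketched in shell; the real module uses GNU Make's `$(wildcard ...)` function, and the path layout below is illustrative:

```shell
#!/bin/sh
# Illustrative shell equivalent of the module's wildcard check: if a
# cross-gcc already exists under crossgcc/<arch>/bin, reuse it.
set -eu

arch=x86
mkdir -p "crossgcc/$arch/bin"
touch "crossgcc/$arch/bin/x86_64-linux-musl-gcc"   # simulate a cache restore

cross=""
for gcc in "crossgcc/$arch/bin/"*-gcc; do
    if [ -e "$gcc" ]; then
        cross="$gcc"
        break
    fi
done

if [ -n "$cross" ]; then
    action="reuse"   # would run "$cross --version" and skip the build
else
    action="build"   # no toolchain found: build musl-cross-make from scratch
fi
echo "action=$action"
```

The `-e` test matters because an unmatched glob expands to the literal pattern, which must not be mistaken for a restored compiler.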
The build logic also requires both:

- the compiler binaries under `crossgcc/<arch>`
- the installed sysroot under `install/<arch>`

If the cache only restores the compiler tree but not the installed headers and
libraries, the generic module build rules still have missing outputs and musl
is rebuilt.

That is why the current branch stores `install/x86` and `install/ppc64` in the
musl and coreboot cache layers, not only in the broad modules cache.

There is a second reuse problem to watch for: restored stamp files can be older
than freshly checked-out source files in CI. When that happens, GNU Make can
decide that `.configured` and then `.build` are stale even though the restored
outputs are complete. The current CI job refreshes restored `.configured` and
`.build` timestamps before invoking `make`, so restored musl-cross-make trees are
reused instead of spending several minutes rebuilding for timestamp reasons
alone.
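A stamp refresh of this kind boils down to touching the markers after the cache restore and before `make` runs. A minimal sketch, with illustrative paths (`Makefile.sim` stands in for freshly checked-out sources):

```shell
#!/bin/sh
# Illustrative stamp refresh: after a cache restore, bump .configured and
# .build mtimes so GNU Make does not treat restored outputs as stale.
set -eu

mkdir -p build/x86/musl-cross-make-fake
# Simulate restored stamps that are older than the fresh checkout.
touch -d '2020-01-01' build/x86/musl-cross-make-fake/.configured
touch -d '2020-01-01' build/x86/musl-cross-make-fake/.build
touch -d '2021-01-01' Makefile.sim   # "fresh checkout" file

# Refresh every restored stamp to now, newer than any checkout file.
find build \( -name '.configured' -o -name '.build' \) -exec touch {} +

stamp_newer=no
if [ build/x86/musl-cross-make-fake/.build -nt Makefile.sim ]; then
    stamp_newer=yes
fi
echo "stamp_newer=$stamp_newer"
```

After the refresh, Make's mtime comparison sees the stamps as up to date and skips the rebuild.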
---

## Cold-cache behavior

Cold runs are still expensive because:

- Downstream jobs still download the upstream workspace chain.
- A fork starts with cold CircleCI caches, because caches are repository-scoped.
- CircleCI restores only the first matching key, so an unexpectedly narrow hit
  can still leave later work to do if the cache contents are incomplete.
- Saving a large cache still requires uploading the selected directories.
---

## When to change this design

Adjust the model only if one of these is true:

- The seed board is no longer representative of the fork workspace.
- The persisted workspace is too large and should be split further.
- The modules cache key is too broad and causes low reuse.
- CircleCI changes workspace or cache semantics.
## Design invariants

- Each coreboot fork saves both modules and coreboot caches, eliminating a
  single point of failure.
- Cache key naming shows the dependency chain: modules includes coreboot, which
  includes musl.
- Restore ordering must be explicit and largest-first. If two keys are valid,
  CircleCI uses the first match only.
- Restored build markers can be older than fresh checkout files. Without stamp refresh,
  Make can rebuild musl-cross-make even after a correct modules-cache restore.
- For ppc64, the middle `coreboot+musl` fallback improves reuse when the
  `modules` cache is absent but a richer cache than plain `musl` exists.
- Each x86 coreboot fork saves its own modules cache keyed by `{coreboot_dir}`.
  This prevents cross-fork contamination while enabling fork-specific reuse.
- x86 coreboot forks avoid generic cross-fork fallback keys to prevent
  restoring another fork's coreboot tree.
- ppc64 uses decoupled musl-cross-make and coreboot jobs. Each saves its cache
  immediately rather than at the end of a long combined build.
- The musl-cross-make module auto-detects an existing crossgcc via a wildcard
  check, skipping the rebuild when the compiler already exists from cache.
## First-run observations (pipeline 3789 on the circleci-cache-fix branch)

Cold-cache run on the new pipeline structure:

- x86-musl-cross-make: 30 min (vs. a 14.5 min baseline) - slower due to new overhead
- ppc64-musl-cross-make: 16 min (vs. an 18 min baseline) - slightly faster

The Make Board step takes longer in the new pipeline because it persists more
data after the build (`build/`, `install/`, `crossgcc/`, `packages/`). The real
test is the second run, once the cache exists: it verifies whether the wildcard
fix skips the rebuild.
## Cache hash inputs

Cache key hashes intentionally exclude `.circleci/config.yml`, to prevent cache
invalidation on CircleCI configuration changes. Add it back once the cache model
is stable (see the TODO in the `create_hashes` job in `.circleci/config.yml`).

Key files included in hashes:

- `all_modules_and_patches.sha256sums`: `./Makefile`, `./flake.lock`, `./patches/`, `./modules/`
- `coreboot_musl-cross-make.sha256sums`: `./flake.lock`, `./modules/coreboot`, `./modules/musl-cross-make*`, `./patches/coreboot*`
- `musl-cross-make.sha256sums`: `./flake.lock`, `./modules/musl-cross-make*`
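Generating such a digest can be sketched as follows. The file names and layout are illustrative, not the exact `create_hashes` commands; the point is that editing `.circleci/config.yml` leaves the final hash unchanged:

```shell
#!/bin/sh
# Illustrative digest generation: hash a file set that deliberately excludes
# .circleci/config.yml, then reduce it to one stable cache-key hash.
set -eu

mkdir -p modules patches .circleci
echo 'module def'   > modules/musl-cross-make
echo 'a patch'      > patches/coreboot.patch
echo 'ci config v1' > .circleci/config.yml

hash_inputs() {
    # Sort for a stable order, so the final hash does not depend on
    # filesystem traversal order.
    find modules patches -type f | sort | xargs sha256sum \
        > all_modules_and_patches.sha256sums
    sha256sum all_modules_and_patches.sha256sums | cut -d' ' -f1
}

before=$(hash_inputs)
echo 'ci config v2' > .circleci/config.yml   # CI-only change
after=$(hash_inputs)

[ "$before" = "$after" ] && echo "cache key unchanged by CI config edit"
```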
