
Commit 4af2250

ci: re-enable sccache on the manylinux2014 dockcross job (phase 2, job 1)
First dockcross job re-enabled after the phase-2 revert, now safe behind the build.sh probe. Forwards the Depot cache env into the container via DOCKCROSS_ARGS and enables SCCACHE_LOG=debug + SCCACHE_ERROR_LOG + RUST_BACKTRACE=full so this run captures the in-container panic root cause if it recurs (the probe keeps the build green either way). The CUDA, aarch64, Android, OpenCL-Android and Windows jobs stay uncached until this one is verified green in CI — one job at a time. Document the staged rollout and the probe in CLAUDE.md.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01LjWiKSyNzqqpobSKYRiew5
1 parent c643b20 commit 4af2250

2 files changed

Lines changed: 49 additions & 8 deletions

.github/workflows/publish.yml

Lines changed: 16 additions & 0 deletions
@@ -200,6 +200,22 @@ jobs:
     name: Cross-Compile manylinux2014 x86_64
     needs: [startgate, build-webui]
     runs-on: ubuntu-latest
+    # Phase 2 dockcross cache rollout — FIRST job (fastest plain-build.sh job, cleanest probe).
+    # build.sh now probe-compiles through sccache before trusting it as the launcher, so a
+    # present-but-crashing in-container sccache (the panic that stalled the first attempt) falls
+    # back to an uncached, green -O3 build instead of redding it. The diagnostic vars below are
+    # forwarded into the container so this run captures the root cause if the panic recurs; drop
+    # SCCACHE_LOG / SCCACHE_ERROR_LOG / RUST_BACKTRACE (and their -e passthroughs) once the cache
+    # is confirmed working here, then roll out to the next dockcross job. Inert without DEPOT_TOKEN
+    # (fork PRs) or with use_cache=false.
+    env:
+      USE_CACHE: ${{ github.event_name != 'workflow_dispatch' || inputs.use_cache }}
+      SCCACHE_WEBDAV_ENDPOINT: https://cache.depot.dev
+      SCCACHE_WEBDAV_TOKEN: ${{ secrets.DEPOT_TOKEN }}
+      SCCACHE_LOG: debug
+      SCCACHE_ERROR_LOG: /tmp/sccache_server.log
+      RUST_BACKTRACE: full
+      DOCKCROSS_ARGS: "-e SCCACHE_WEBDAV_ENDPOINT -e SCCACHE_WEBDAV_TOKEN -e USE_CACHE -e SCCACHE_LOG -e SCCACHE_ERROR_LOG -e RUST_BACKTRACE"
     steps:
       - uses: actions/checkout@v6
       - name: Download shared WebUI assets
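
A hedged illustration of the `DOCKCROSS_ARGS` passthrough in this hunk: `docker run -e NAME` with no `=value` copies the host's value of `NAME` into the container, and the dockcross wrapper is understood to pass `DOCKCROSS_ARGS` through to its `docker run` call. The helper below, `build_dockcross_args`, is a hypothetical name that is not part of this commit. It only sketches how the `-e` list could be derived from a list of variable names instead of being typed out by hand.

```shell
# Hypothetical helper (not in this commit): turn a list of variable names
# into the "-e NAME -e NAME ..." string that DOCKCROSS_ARGS expects.
build_dockcross_args() {
    args=""
    for name in "$@"; do
        # Add a separating space only when args is already non-empty.
        args="${args:+$args }-e $name"
    done
    printf '%s\n' "$args"
}

# Same variable names as the env: block above.
DOCKCROSS_ARGS=$(build_dockcross_args \
    SCCACHE_WEBDAV_ENDPOINT SCCACHE_WEBDAV_TOKEN USE_CACHE \
    SCCACHE_LOG SCCACHE_ERROR_LOG RUST_BACKTRACE)
export DOCKCROSS_ARGS
```

Dropping the three diagnostic names from the call yields the shorter per-job string that the CLAUDE.md recipe lists.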

CLAUDE.md

Lines changed: 33 additions & 8 deletions
@@ -197,14 +197,39 @@ stays `-O3` and is **bit-identical** to a clean build (release-safe).

 **Safety / transparency.** It is **inert** until `DEPOT_TOKEN` is configured and on **fork
 PRs** (secrets are hidden there) — those simply compile normally; the `Install sccache` step
-is `continue-on-error`; and `use_cache=false` forces a pristine, from-scratch build.
-
-**Rollout.** **Phase 1 (current): the 3 macOS build jobs** (slowest + OOM-prone) —
-`brew install sccache` + the env above + `BUILD_JOBS: 2`. **Phase 2 (TODO):** the dockcross
-Linux/Android/CUDA jobs (the `sccache` binary **and** `DEPOT_TOKEN` must be passed *into* the
-container), the Windows jobs (sccache supports MSVC), and the Linux-host `test-cpp` job. To
-extend a job: install `sccache`, set the two `SCCACHE_WEBDAV_*` env vars, and (for
-RAM-limited runners) `BUILD_JOBS`.
+is `continue-on-error`; and `use_cache=false` forces a pristine, from-scratch build. Crucially,
+`build.sh` runs a **probe-compile health-check** (`sccache_can_wrap_compiler`) before trusting
+sccache as the launcher: it compiles a trivial TU *through* sccache, and only sets
+`-DCMAKE_{C,CXX}_COMPILER_LAUNCHER=sccache` if that succeeds. So a sccache that is present but
+**crashes** (the in-container panic that stalled phase 2) also falls back to an uncached, green
+`-O3` build — it logs the Rust panic backtrace (and the detached server's `SCCACHE_ERROR_LOG`,
+when a job sets one) for diagnosis but never reds the build. This closes the gap the original
+absent-only guard left.
+
+**Rollout.** **Phase 1 — DONE & proven: the 3 macOS build jobs** (slowest + OOM-prone) —
+`brew install sccache` + the env above + `BUILD_JOBS: 2`. macOS build dropped **~40 min → ~6 min**
+with a warm cache. **Phase 2 — in progress: the dockcross cross-compiles**, enabled **one job at
+a time and verified green in CI before the next**. (The first attempt enabled all four at once
+and was reverted: the static-musl sccache panicked in-container and — pre-probe — redded the
+build. The probe above now makes that a safe fallback.) Order, each adding the env + a
+`DOCKCROSS_ARGS` passthrough:
+1. `crosscompile-linux-x86_64` (manylinux2014) — **enabled first**, with `SCCACHE_LOG=debug` +
+`SCCACHE_ERROR_LOG` + `RUST_BACKTRACE=full` so the run captures the panic root cause if it
+recurs. Once green with a cache hit in `sccache --show-stats`, drop the diagnostic vars.
+2. `crosscompile-linux-x86_64-cuda` (via `build_cuda_linux.sh`, which execs `build.sh`) — only
+the gcc C/C++ TUs cache (134 model files + ggml + httplib); the nvcc `.cu` kernels won't
+(limited sccache nvcc support) — still a large partial win on the ~70 min job.
+3. `crosscompile-linux-aarch64`, then 4. `crosscompile-android-aarch64`.
+5. `crosscompile-android-aarch64-opencl` **separate**, uses `build_opencl_android.sh` (not
+`build.sh`); needs its own probe/launcher wiring.
+
+Per-job recipe: add `env:` { `USE_CACHE`, `SCCACHE_WEBDAV_ENDPOINT`, `SCCACHE_WEBDAV_TOKEN` } and
+`DOCKCROSS_ARGS: "-e SCCACHE_WEBDAV_ENDPOINT -e SCCACHE_WEBDAV_TOKEN -e USE_CACHE"` — the
+dockcross wrapper only forwards host env it is explicitly told to via `-e`. The fetched sccache
+version is the `SCCACHE_DL_VERSION` knob in `build.sh` (default **0.15.0**; overridable per-job
+to try a different build against a container that crashed another). **Windows** (`build.bat` +
+MSVC) is separate and last: use `mozilla-actions/sccache-action` / sccache's MSVC support, not
+the `build.sh` musl fetch.

 **Cross-repo scope.** This Depot/sccache compiler cache makes sense only for java-llama.cpp —
 it is the only sibling repo with a native (C++/JNI) build. It does not apply to the pure-Maven
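
The probe-compile health-check described in the CLAUDE.md change above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the `sccache_can_wrap_compiler` function that `build.sh` actually contains. The function name `launcher_can_wrap_compiler`, its two-argument form, and the `CMAKE_ARGS` variable are hypothetical, chosen so the logic can be exercised without sccache installed. Only the `CMAKE_C_COMPILER_LAUNCHER` / `CMAKE_CXX_COMPILER_LAUNCHER` flags come from the text.

```shell
# Hypothetical sketch; the real sccache_can_wrap_compiler in build.sh may differ.
# Usage: launcher_can_wrap_compiler LAUNCHER COMPILER
# Returns 0 only if LAUNCHER exists and "LAUNCHER COMPILER -c probe.c" exits 0.
launcher_can_wrap_compiler() {
    launcher=$1
    compiler=$2
    # Absent launcher: the simple case, fall back immediately.
    command -v "$launcher" >/dev/null 2>&1 || return 1
    probe_dir=$(mktemp -d) || return 1
    printf 'int main(void) { return 0; }\n' > "$probe_dir/probe.c"
    # Present-but-crashing launcher: it fails here, on a trivial TU,
    # instead of failing the real build later.
    if "$launcher" "$compiler" -c "$probe_dir/probe.c" \
            -o "$probe_dir/probe.o" >"$probe_dir/probe.log" 2>&1; then
        status=0
    else
        status=1
        cat "$probe_dir/probe.log" >&2   # keep any panic output for diagnosis
    fi
    rm -rf "$probe_dir"
    return "$status"
}

# Add the launcher flags only when the probe passes; otherwise build uncached.
CMAKE_ARGS=""
if launcher_can_wrap_compiler sccache "${CC:-cc}"; then
    CMAKE_ARGS="-DCMAKE_C_COMPILER_LAUNCHER=sccache -DCMAKE_CXX_COMPILER_LAUNCHER=sccache"
fi
```

The design point this illustrates is that a failed probe changes only the launcher flags and never the exit status of the build script, which is what keeps the job green when sccache misbehaves.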
