Speed up build-all / cross-platform native build (especially on CI) #393

@manusa

Problem

make build-all (and therefore release.yml, snapshots.yml, and the Linux build.yml job) is slow, dominated by build-native-cross-platform:

xgo -image ghcr.io/techknowlogick/xgo:go-1.25.10 $(COMMON_BUILD_ARGS) \
    -out native/out/helm --targets */arm64,*/amd64 ./native

This compiles the full Helm Go SDK transitive graph (helm.sh/helm/v3, client-go, controller-runtime, distribution/v3 — ~500 modules per go.sum) six times sequentially in a Docker container: linux/{amd64,arm64}, darwin/{amd64,arm64}, windows/{amd64,arm64}.

Now also blocks Go 1.26.x bump (#394)

Bumping the Go toolchain to 1.26.x is currently blocked on this issue. Go 1.26's cmd/link adds a LIBRARY directive to the auto-generated export_file.def for -buildmode=c-shared on Windows, using filepath.Base(outopt) verbatim (golang/go#78238, fix milestone Go 1.27, no 1.26.x backport announced). GNU ld's DEF parser rejects names containing multiple dots and hyphens — exactly what xgo produces (helm-windows-4.0-amd64.dll).

Verified workarounds and why they don't fly:

  • -extldflags '-Wl,--export-all-symbols' bypasses the broken codepath but then exceeds PE's 16-bit export ordinal cap (Helm's transitive symbol table is ~107k symbols vs. the 65k limit).
  • xgo's -out flag only sets the prefix; the per-target -windows-$PLATFORM-$XGOARCH.dll suffix is hardcoded in its container's build.sh.
  • By contrast, a native cross-build (mingw-w64 on Linux) with -o helm.dll builds cleanly on Go 1.26.3 with no workaround, and the resulting DLL has a correct PE export table (verified locally).

So whichever improvement option below drops xgo from the Windows pipeline (B at minimum) unblocks the 1.26.x bump as a side effect, just by giving us control over the output filename. PR #394 is the holding-pattern bump to 1.25.10 in the meantime; the 1.26.x bump should follow as a separate PR once xgo is gone.

Why it's slow on CI specifically

  1. xgo image pull every run: ghcr.io/techknowlogick/xgo:go-1.25.10 is ~3 GB. No Docker layer caching is configured in any workflow.
  2. Go build cache is invisible to xgo: actions/setup-go@v6 caches ~/.cache/go-build and ~/go/pkg/mod on the host, but xgo runs in a container with its own ephemeral cache. So each of the 6 targets recompiles every dep from scratch.
  3. No cache between CI runs: nothing persists xgo's internal cache across runs.
  4. Strictly sequential: xgo builds targets one after another inside one container; no parallelism on the 4-core runner.
  5. Duplicated work on the Linux build job: make test-go compiles linux/amd64 with the host toolchain, then make build-all recompiles linux/amd64 inside xgo. Two full dep compiles for the same target.
  6. No Maven cache: ~/.m2/repository is re-downloaded every run in all three workflows.
  7. One extra target built that we don't ship: --targets */arm64,*/amd64 produces 6 binaries, but lib/ only ships 5 (no lib/windows-arm64). windows/arm64 is wasted work.

Improvement options (ranked by ROI)

High ROI — CI-only, low risk

A. Parallelize per-platform builds via job matrix.
Split build-all so each platform builds on its own runner, then a final job downloads artifacts (actions/upload-artifact, then actions/download-artifact) into native/out/ and runs mvn deploy. With 5 parallel jobs you go from ~6× the slowest-target time down to ~1×. Likely the single biggest win.

B. Drop xgo for Linux and Windows targets.
Use cross-compilers already available on ubuntu-latest:

  • linux/amd64: plain go build (host CGO).
  • linux/arm64: apt-get install gcc-aarch64-linux-gnu + CC=aarch64-linux-gnu-gcc GOARCH=arm64 go build.
  • windows/amd64: apt-get install gcc-mingw-w64 + CC=x86_64-w64-mingw32-gcc GOOS=windows go build.

Eliminates the 3 GB image pull and Docker overhead for 3 of 5 targets, and lets setup-go's host caches actually help. Required for the Go 1.26.x bump (see section above) — gives us control over the Windows DLL output filename.
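
A minimal sketch of what option B could look like as host-side shell helpers, assuming the two apt packages listed above are installed. The function names, the native/out/<os>-<arch>/ layout, and the helm.so file name are hypothetical and not taken from the repository; COMMON_BUILD_ARGS from the Makefile is left out. Only the CC values, -buildmode=c-shared, the ./native package, and the helm.dll name come from this issue.

```shell
# Sketch of option B: per-target settings for a host-side cgo cross-build.
# Assumptions (not from the repository): function names, the
# native/out/<os>-<arch>/ layout, and the helm.so file name.

# Print the environment assignments for one target.
# CGO_ENABLED=1 is set explicitly because cgo is off by default when cross-compiling.
cross_env() {
  case "$1" in
    linux/amd64)   echo "CGO_ENABLED=1 GOOS=linux GOARCH=amd64" ;;
    linux/arm64)   echo "CGO_ENABLED=1 GOOS=linux GOARCH=arm64 CC=aarch64-linux-gnu-gcc" ;;
    windows/amd64) echo "CGO_ENABLED=1 GOOS=windows GOARCH=amd64 CC=x86_64-w64-mingw32-gcc" ;;
    *) echo "unsupported target: $1" >&2; return 1 ;;
  esac
}

# Print the output path for one target. The platform goes into the directory
# name so the library's base name stays free of extra dots and hyphens.
cross_out() {
  os="${1%/*}"; arch="${1#*/}"
  case "$os" in
    windows) echo "native/out/$os-$arch/helm.dll" ;;
    *)       echo "native/out/$os-$arch/helm.so" ;;
  esac
}

# Print the full build command for one target (dry run only).
cross_cmd() {
  env_vars="$(cross_env "$1")" || return 1
  echo "env $env_vars go build -buildmode=c-shared -o $(cross_out "$1") ./native"
}
```

Running `cross_cmd windows/amd64` prints the command rather than executing it, which makes it easy to review in CI logs; piping the output to `sh` (after a `mkdir -p` of the output directory) would run the build. Keeping the platform in the directory instead of the file name is one way to get the filename control the Go 1.26.x bump needs.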

C. Build darwin targets on a macOS runner.
Native CGO, no cross-toolchain needed, no xgo. Combined with A+B this removes xgo from the critical path entirely.

D. Cache ~/.m2/repository in all three workflows via actions/cache keyed on pom.xml hashes.

Medium ROI

E. If keeping xgo, cache the image via docker save / docker load keyed on the image tag, and mount the host's Go module/build cache into the container (xgo supports -go-volume / volume flags) so reruns reuse compiled deps.
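
For the image half of option E, a rough sketch of the docker save / docker load round trip could look like the following. The cache directory argument and the function names are hypothetical, and the directory itself would still need to be persisted by a CI cache step keyed on the image tag. Whether loading a ~3 GB tarball actually beats a plain pull would need measuring.

```shell
# Sketch of option E (image part only): reuse a saved xgo image tarball.
# Assumptions: function names and the cache directory are hypothetical.

XGO_IMAGE="ghcr.io/techknowlogick/xgo:go-1.25.10"

# Derive a file-system-safe tarball path from the image reference.
# Usage: image_cache_file IMAGE CACHE_DIR
image_cache_file() {
  echo "$2/$(printf '%s' "$1" | tr '/:' '__').tar"
}

# Load the image from the cache if present; otherwise pull it and save it.
# Usage: restore_or_pull IMAGE CACHE_DIR
restore_or_pull() {
  tarball="$(image_cache_file "$1" "$2")"
  if [ -f "$tarball" ]; then
    docker load -i "$tarball"
  else
    docker pull "$1" && mkdir -p "$2" && docker save -o "$tarball" "$1"
  fi
}
```

A workflow step would call `restore_or_pull "$XGO_IMAGE" "$HOME/.cache/xgo-image"` before invoking xgo. Because the tarball name is derived from the full image reference, bumping the Go version in the tag naturally produces a new cache entry.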

F. Reuse the test-go compile on the Linux job: make test-go already produces linux/amd64 artifacts via the host toolchain; skip rebuilding linux/amd64 in build-all.

Low ROI / nits

G. Restrict --targets to the 5 we actually ship — drop windows/arm64. Make the list explicit: linux/amd64,linux/arm64,darwin/amd64,darwin/arm64,windows/amd64.

Variables to consider before choosing

These affect which combination is right and should be weighed before implementing:

  • Maintenance cost of native cross-toolchains vs. xgo's "one tool, all targets" simplicity. Option B means two more apt packages (gcc-aarch64-linux-gnu, gcc-mingw-w64) and two more CC= invocations to keep in sync with Go version bumps.
  • macOS runner minutes are billed at 10× Linux on private repos. manusa/helm-java is public so this is free, but worth knowing for cost modeling if that changes.
  • Reproducibility: xgo pins a specific cross-toolchain image; ad-hoc apt installs depend on whatever ubuntu-latest ships. Likely fine, but cross-compiling darwin targets from Linux via osxcross would not be reproducible without pinning.
  • Local-dev impact: make build-native-cross-platform is also runnable locally by maintainers. Any rework should keep a single-command local cross-build path, even if CI uses a different mechanism.
  • Release artifact integrity: the release workflow signs and deploys to Maven Central. If natives are built across multiple runners, the final job must assemble them deterministically before mvn -Prelease deploy runs.
  • Disk space: Ubuntu runners already need free-disk-space to fit xgo. Native cross-compile may remove that requirement (smaller working set).
  • Test coverage gap: the current build.yml matrix runs build-current-platform (which includes Java tests) on Windows and macOS. A parallel native-only matrix in release flows would not run tests; tests still come from build.yml. That's fine, but worth being explicit about.
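
On the release artifact integrity point, one possible guard is a small check in the final job that refuses to deploy unless every shipped native is present. This is only a sketch: the native/out/<os>-<arch>/ layout and the file names below are hypothetical placeholders, and the real list would have to match whatever lib/ actually packages.

```shell
# Sketch: fail fast if any shipped native is missing before deploy.
# Assumption: the per-platform paths below are hypothetical placeholders.

EXPECTED_NATIVES="linux-amd64/helm.so linux-arm64/helm.so darwin-amd64/helm.dylib darwin-arm64/helm.dylib windows-amd64/helm.dll"

# Return non-zero and report each missing or empty file under the given directory.
# Usage: check_natives OUT_DIR
check_natives() {
  missing=0
  for f in $EXPECTED_NATIVES; do
    if [ ! -s "$1/$f" ]; then
      echo "missing or empty: $1/$f" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

The final job would run something like `check_natives native/out && mvn -Prelease deploy`, so a failed or skipped matrix leg stops the release instead of publishing an incomplete set of natives.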

Suggested next step

Prototype A + B + C in a branch, time it against current main, and post the comparison here before deciding. If the gain is large enough (expectation: ~5× wall-clock), commit to it; otherwise fall back to E + D as lower-risk caching wins. Either way, B is the minimum needed to unblock the Go 1.26.x bump.
