
perf(snippets): batch slow validators to cut snippet-validation CI wall time#526

Merged
kinyoklion merged 6 commits into main from rlamb/snippets-validate-batch
Jun 30, 2026

kinyoklion commented Jun 29, 2026

Problem

Snippet-validation CI wall time was dominated by jobs validating ~100 syntax-only sdk-docs fragments serially, one harness invocation per snippet. The per-snippet work is tiny, but each invocation re-paid full environment setup. The whole run's wall time is set by its slowest job:

  job                            wall time (before)
  ios-client-sdk (sdk-docs)      ~2h2m
  cpp-server-sdk                 ~1h29m
  cpp-client-sdk                 ~1h23m
  android-client-sdk (sdk-docs)  ~1h6m
  flutter-client-sdk             ~44m
  rust-server-sdk                ~42m
  java-server-sdk                ~22m
  dotnet-server-sdk              ~21m
  haskell-server-sdk             ~18m

Approach

Add a batch mode (batch: true in runner.yaml). The Go runner resolves every matching snippet up front, groups them by (runtime, build-affecting env), builds each image once, and partitions each group across up to --jobs concurrent harness invocations (default NumCPU). Each invocation loops over a manifest of staged snippets in one warm workspace. Native (macOS) groups run single-shard, because they share the host and its brew, simulator and SPM caches can't be driven concurrently. Non-batch validators are untouched. A shared run_batch helper drives the manifest loop.
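
The grouping step can be pictured with a small Go sketch. It is an illustration under assumptions, not the runner's actual code: the `Unit` type, its fields, the `REDIS` env var and the string key format are all hypothetical. Sorting the env keys makes the group key independent of map iteration order, so identical envs share one group while version-pinned or redis variants get their own.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Unit is one (snippet, check) pair to validate. Field names are hypothetical.
type Unit struct {
	Snippet string
	Runtime string
	// BuildEnv holds only env vars that change the built image or workspace,
	// e.g. a pinned SDK version or a redis variant.
	BuildEnv map[string]string
}

// groupKey canonicalizes (runtime, build-affecting env): keys are sorted so
// map iteration order never splits one logical group into two.
func groupKey(u Unit) string {
	keys := make([]string, 0, len(u.BuildEnv))
	for k := range u.BuildEnv {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	var b strings.Builder
	b.WriteString(u.Runtime)
	for _, k := range keys {
		fmt.Fprintf(&b, "|%s=%s", k, u.BuildEnv[k])
	}
	return b.String()
}

// groupUnits buckets units by groupKey, keeping input order within a group.
// Each bucket would get one image build and its own warm workspaces.
func groupUnits(units []Unit) map[string][]Unit {
	groups := make(map[string][]Unit)
	for _, u := range units {
		k := groupKey(u)
		groups[k] = append(groups[k], u)
	}
	return groups
}

func main() {
	units := []Unit{
		{Snippet: "hello.cpp", Runtime: "cpp-server"},
		{Snippet: "flags.cpp", Runtime: "cpp-server"},
		{Snippet: "redis.cpp", Runtime: "cpp-server", BuildEnv: map[string]string{"REDIS": "1"}},
	}
	groups := groupUnits(units)
	keys := make([]string, 0, len(groups))
	for k := range groups {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Printf("%s: %d unit(s)\n", k, len(groups[k]))
	}
}
```

Running it prints one line per group, `cpp-server: 2 unit(s)` and `cpp-server|REDIS=1: 1 unit(s)`, so the redis variant never shares a workspace with the default build.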

Per-validator warm-workspace work:

  • rust — pre-bake the compiled dependency tree; recompile only the binary crate.
  • cpp (server/client + v2 variants) — pre-bake a configured CMake project so per-snippet skips the configure over the whole cpp-sdks tree.
  • flutter (current + v2/v3) — flutter build linux --debug (real front-end compile, no dart2js/browser) instead of flutter build web --release + headless Chromium. Snippets import only cross-platform APIs, so the linux target compiles identical code.
  • android — reset to the baseline scaffold between snippets and keep the gradle daemon warm.
  • ios — resolve the Swift Package once into shared DerivedData; xcodebuild build (compile, no simulator) for syntax-only, xcodebuild test for init.
  • java — pre-bake a warm maven project (deps + plugins in ~/.m2); run snippets via offline mvn -o compile + exec:java instead of mvn clean compile assembly:single from scratch.
  • dotnet — pre-bake a project with the package superset restored; per-snippet swaps Program.cs and rebuilds incrementally.
  • haskell — loop the existing warm cabal project in the batch harness.

Results: verified green on CI (run 28459913855, all 35 jobs succeeded)

Full-run wall time: ~2h+ → ~18 min, all green.

  validator                  before   after (CI job time)
  ios-client-sdk (sdk-docs)  ~2h      ~7m
  cpp-server / client        ~1.5h    ~5m
  android sdk-docs           ~1h6m    ~6m
  flutter (x3)               ~44m     ~7m
  rust-server                ~42m     <5m
  java-server                ~22m     ~6m
  dotnet-server              ~21m     ~6m
  haskell-server             ~18m     ~17m *

Full snippet coverage preserved (unit-selection logic unchanged — only dispatch differs). Also fixes a latent race in await_success_line (a process that printed the success line and exited before the next poll was read as a failure).

* haskell is now the long pole, and it's image-build-bound rather than per-snippet-bound: the cabal image compiles the Haskell SDK from source (now twice, for the v3 dependency tree), which batching can't shorten. The remaining 5–8 min jobs are likewise dominated by cold image builds with no registry cache.

Follow-up

Cross-run Docker-layer caching (GHA cache / GHCR) is the next lever: it would shorten the haskell long pole and the cold-build component of every docker job, taking the total wall time well under 10 min. Left as a separate PR.

The validator ran one harness invocation per snippet, serially. For the
syntax-only sdk-docs groups that dominate CI wall time (cpp, flutter,
android, rust), the per-snippet work is tiny but each invocation paid the
full environment setup again — re-resolving SDK packages, re-configuring
CMake over the whole cpp-sdks tree, cold-starting a JVM, recompiling the
dependency graph from scratch.

A validator opts into batch mode via `batch: true` in runner.yaml. The Go
runner then resolves every matching (snippet, check) unit up front, groups
them by (runtime, build-affecting env) so version-pinned / redis variants
don't share a workspace, builds each image once, and partitions the group
across up to --jobs concurrent harness invocations (default NumCPU). Each
invocation gets a manifest of staged snippets and loops over them inside a
single warm workspace. Non-batch validators keep the exact
one-invocation-per-snippet path.
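
Partitioning one group across the worker pool could look like the following Go sketch. It is hypothetical: the `partition` name and the round-robin split are assumptions (the runner may chunk contiguously instead), and the `native` flag models the single-shard rule for macOS groups that the PR adopts after the first CI run.

```go
package main

import (
	"fmt"
	"runtime"
)

// partition splits a group's units across up to jobs shards, round-robin.
// Native (host) groups get a single shard because concurrent shards would
// contend on shared host state; docker groups use the full pool. Shards are
// capped at the unit count so no harness invocation starts with nothing to do.
func partition[T any](units []T, jobs int, native bool) [][]T {
	if len(units) == 0 {
		return nil
	}
	n := jobs
	if native || n < 1 {
		n = 1
	}
	if n > len(units) {
		n = len(units)
	}
	shards := make([][]T, n)
	for i, u := range units {
		shards[i%n] = append(shards[i%n], u)
	}
	return shards
}

func main() {
	snippets := []string{"a", "b", "c", "d", "e"}
	fmt.Println(partition(snippets, runtime.NumCPU(), false)) // --jobs default
	fmt.Println(partition(snippets, 2, false))                // [[a c e] [b d]]
	fmt.Println(partition(snippets, 8, true))                 // [[a b c d e]]
}
```

Each inner slice would become the manifest for one harness invocation, so the number of warm workspaces per group is `min(jobs, len(units))`, or 1 for native groups.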

The shared `run_batch` helper in lib.sh drives the manifest loop, tallies
pass/fail, and continues past failures so one bad fragment doesn't hide the
rest. Also fixes a latent race in await_success_line: a process that printed
the success line and exited before the next poll was read as a failure
because the loop broke on the dead pid without a final grep; syntax-only
hellos that print-and-exit immediately hit this often under batch mode.

Opt the heaviest validators into batch mode and rework each harness to do
its expensive setup once per job, then loop the staged snippets in a warm
workspace. Measured locally (full per-SDK run):

  rust-server      ~42m   -> ~31s
  cpp-server       ~1h29m -> ~20s
  cpp-client       ~1h23m -> ~26s
  flutter (x3)     ~44m   -> ~2m
  android sdk-docs ~1h6m  -> ~6m

- rust: the Dockerfile pre-bakes the SDK + tokio + transport dependency tree,
  compiled once; each snippet only recompiles the binary crate.
- cpp (server/client + v2-c/v2-cpp variants): pre-bake a CONFIGURED CMake
  project (default + redis) so per-snippet skips the configure over the
  whole cpp-sdks tree and only runs an incremental `cmake --build`. The
  parse-only v2 stub validators just loop gcc/g++ in one container.
- flutter (current + v2/v3): validate with `flutter build linux --debug`
  instead of `flutter build web --release` + headless Chromium. Both run
  the same Dart front-end (catching every syntax/type error), but the
  linux debug build stops before dart2js/AOT and needs no browser, so it
  finishes in ~5-7s warm vs ~27s. The snippets import only flutter/material
  and the LD SDK, both of which are cross-platform, so the linux target
  compiles the identical code with no divergence.
- android: reset the package dir to the baseline scaffold between snippets
  and keep the gradle daemon warm across the loop, so only the first
  snippet pays JVM + gradle startup.

kinyoklion requested a review from a team as a code owner June 29, 2026 23:16

iOS was the longest pole (~2h): the native harness ran, per snippet,
xcodegen + `-resolvePackageDependencies` (which builds the LD SDK Swift
Package) + `xcodebuild test` (which boots a simulator) — even for the
syntax-only sdk-docs fragments whose body never runs.

Batch the ios-client harness: set up the project and resolve the Swift
Package ONCE into a shared DerivedData, then loop the staged snippets.
Dispatch on SNIPPET_CHECK:

  - parse (sdk-docs / experimentation): `xcodebuild build` against the
    iphonesimulator SDK — a compile/type-check with no simulator boot. The
    wrappee body lives in a never-instantiated function, so a clean compile
    is the signal; emit the canonical line. The swift-syntax-only scaffold
    now carries `env: SNIPPET_CHECK: parse` (mirroring the android
    syntax-only scaffolds) to select this path.
  - runtime (init): `xcodebuild test` as before, booting the simulator and
    grepping the captured log.

Cannot be exercised locally (needs macOS); verifying on CI.
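
The SNIPPET_CHECK dispatch amounts to choosing between two xcodebuild invocations. This Go sketch only builds the argument list; the scheme name, DerivedData path and simulator destination are hypothetical, and treating an unset SNIPPET_CHECK as the runtime path is an assumption based on "`xcodebuild test` as before". The flags used (`-scheme`, `-sdk`, `-destination`, `-derivedDataPath`) are standard xcodebuild options.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// xcodebuildArgs chooses the xcodebuild invocation for one snippet based on
// its SNIPPET_CHECK value.
func xcodebuildArgs(check, scheme, derivedData, destination string) ([]string, error) {
	// Shared DerivedData lets every snippet reuse the Swift Package resolved once.
	common := []string{"-scheme", scheme, "-derivedDataPath", derivedData}
	switch check {
	case "parse":
		// Compile/type-check against the simulator SDK; no simulator boot.
		return append([]string{"build", "-sdk", "iphonesimulator"}, common...), nil
	case "runtime", "":
		// Run the test action, which boots a simulator; the harness then greps the log.
		return append([]string{"test", "-destination", destination}, common...), nil
	default:
		return nil, fmt.Errorf("unknown SNIPPET_CHECK %q", check)
	}
}

func main() {
	args, err := xcodebuildArgs(os.Getenv("SNIPPET_CHECK"), "HelloApp", "/tmp/DerivedData",
		"platform=iOS Simulator,name=iPhone 15")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	fmt.Println("xcodebuild " + strings.Join(args, " "))
}
```

Rejecting unknown values, rather than silently falling back to the slow test path, keeps a typo in a scaffold's `env:` block from quietly reintroducing simulator boots.
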
…s line

First CI run surfaced two iOS-only failures (the docker validators and
android all passed):

- Concurrent shards each ran `brew install xcodegen`, colliding on
  Homebrew's download lock. Native validators run directly on the macOS
  host with no container isolation, so concurrent shards contend on shared
  state (the brew lock, the Simulator runtime, the SwiftPM/DerivedData
  caches). Run native groups single-shard; one warm workspace that resolves
  the Swift Package once is also the optimal shape there. Docker groups keep
  the worker pool, since each shard is an isolated container.

- The iOS runtime (init) path grepped the xcodebuild log for the success
  line but never re-emitted it to stdout, so the verify-hello-app wrapper's
  grep of the command output found nothing and failed the cell even though
  the snippet passed (`batch: 1/1 passed`). Re-emit the matched line, as the
  parse path already does.

The java, dotnet and haskell validators became the wall-time poles once the
bigger offenders were batched (~18-22m each). They have the same docker +
syntax-only shape, so they get the same treatment, and the batch worker pool
gives them ~NumCPU-way concurrency on top:

- java: the old harness ran `mvn clean compile assembly:single` from scratch
  per snippet (re-resolving plugins/deps, building a fat jar). Pre-bake a
  warm maven project (deps + plugins in ~/.m2, one compile done) and run
  each snippet via offline `mvn -o compile` + `exec:java`. Copy the whole
  staged source tree so multi-file snippets (the init runner + its Main
  companion) compile.
- dotnet: the old harness synthesized a csproj and ran `dotnet add package`
  + `dotnet restore` + `dotnet run` from scratch per snippet. Pre-bake a
  warm project with the package superset every snippet's requirements ask
  for (ServerSdk + Observability + Ai + Redis + Telemetry + Consul +
  DynamoDB) restored and built; per-snippet just swaps Program.cs and
  rebuilds incrementally. The ASP.NET Core init stages its own package-less
  Web .csproj, so for that one the harness adds its requirements + restores.
- haskell: the warm cabal project already existed; wrap the per-snippet
  build/run in the batch loop. haskell-server-v3's Dockerfile COPYs the
  shared haskell-server harness (now batch-aware), so it gets `batch: true`
  too — otherwise its snippets hit the batch harness in non-batch mode and
  fail on the missing SNIPPET_BATCH.
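
The dotnet "package superset" is the union of every snippet's requirements, restored once when the image is built. A small Go sketch of that computation, plus a sanity check that a given snippet is covered; `packageSuperset`, `missing` and the requirement map are hypothetical, and the short package names follow the list above rather than full NuGet IDs.

```go
package main

import (
	"fmt"
	"sort"
)

// packageSuperset unions every snippet's package requirements into one sorted
// list, so a warm project can restore them all once at image-build time.
func packageSuperset(reqs map[string][]string) []string {
	seen := make(map[string]bool)
	for _, pkgs := range reqs {
		for _, p := range pkgs {
			seen[p] = true
		}
	}
	out := make([]string, 0, len(seen))
	for p := range seen {
		out = append(out, p)
	}
	sort.Strings(out)
	return out
}

// missing lists the packages a snippet needs that the warm project lacks.
// When the superset is built from every snippet, this is empty for all of them.
func missing(want, have []string) []string {
	present := make(map[string]bool, len(have))
	for _, p := range have {
		present[p] = true
	}
	var out []string
	for _, p := range want {
		if !present[p] {
			out = append(out, p)
		}
	}
	return out
}

func main() {
	reqs := map[string][]string{
		"hello": {"ServerSdk"},
		"redis": {"ServerSdk", "Redis"},
		"ai":    {"ServerSdk", "Ai"},
	}
	warm := packageSuperset(reqs)
	fmt.Println(warm)                                           // [Ai Redis ServerSdk]
	fmt.Println(missing([]string{"ServerSdk", "Consul"}, warm)) // [Consul]
}
```

Restoring the superset once trades a slightly larger image for never running `dotnet add package` / `dotnet restore` per snippet.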

Local full-SDK runs: java ~22m->~2m, dotnet ~21m->~90s, haskell ~18m->fast.

cursor (bot) left a comment

Cursor Bugbot has reviewed your changes using default effort and found 1 potential issue.

There are 3 total unresolved issues (including 2 from previous reviews).

Reviewed by Cursor Bugbot for commit 524b2cb.

- iOS: reset Sources to the scaffold baseline before staging each snippet.
  The project compiles every file under Sources/, so a differently-named
  .swift file from an earlier fragment could otherwise linger; mirrors the
  baseline reset the android and dotnet harnesses already do.
- rust: surface a failed warm build on the version-pinned path instead of
  swallowing it with `|| true`, so a broken re-pin is reported up front
  rather than as a confusing per-snippet `cargo run` error.
- haskell: delete the dead languages/haskell-server-v3/harness/run.sh. The
  v3 Dockerfile COPYs the shared (batch-aware) haskell-server harness, so
  the v3-local file was never used — it only invited the misreading that
  `batch: true` on v3 would hit a SNIPPET_ENTRYPOINT-only harness. Add a
  Dockerfile comment making the shared-harness intent explicit.
kinyoklion merged commit 1e0cafd into main Jun 30, 2026
42 checks passed
kinyoklion deleted the rlamb/snippets-validate-batch branch June 30, 2026 16:49