perf(snippets): batch slow validators to cut snippet-validation CI wall time #526
Merged
The validator ran one harness invocation per snippet, serially. For the syntax-only sdk-docs groups that dominate CI wall time (cpp, flutter, android, rust), the per-snippet work is tiny but each invocation paid the full environment setup again: re-resolving SDK packages, re-configuring CMake over the whole cpp-sdks tree, cold-starting a JVM, recompiling the dependency graph from scratch.

Batch mode opts a validator in via `batch: true` in runner.yaml. The Go runner then resolves every matching (snippet, check) unit up front, groups them by (runtime, build-affecting env) so version-pinned / redis variants don't share a workspace, builds each image once, and partitions the group across up to --jobs concurrent harness invocations (default NumCPU). Each invocation gets a manifest of staged snippets and loops over them inside a single warm workspace. Non-batch validators keep the exact one-invocation-per-snippet path.

The shared `run_batch` helper in lib.sh drives the manifest loop, tallies pass/fail, and continues past failures so one bad fragment doesn't hide the rest.

Also fixes a latent race in await_success_line: a process that printed the success line and exited before the next poll was read as a failure because the loop broke on the dead pid without a final grep; syntax-only hellos that print-and-exit immediately hit this often under batch mode.
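The manifest loop described above can be sketched roughly as follows. This is an illustration only, not the real `run_batch` from lib.sh: the name `run_batch_sketch`, the one-path-per-line manifest format, and the `validate_nonempty` stand-in validator are all assumptions made for the example.

```shell
# Hypothetical sketch of a batch manifest loop (not the real run_batch).
# Assumption: the manifest lists one staged snippet path per line.
# The ( ... ) body runs in a subshell so the variables stay local.
run_batch_sketch() (
  manifest=$1    # path to the manifest file
  check_cmd=$2   # command or function that validates a single snippet
  total=0
  passed=0
  # Read the manifest on fd 3 so the validator can still use stdin.
  while IFS= read -r snippet <&3 || [ -n "$snippet" ]; do
    [ -n "$snippet" ] || continue   # skip blank lines
    total=$((total + 1))
    if "$check_cmd" "$snippet"; then
      passed=$((passed + 1))
    else
      echo "FAIL: $snippet" >&2     # report, then keep going
    fi
  done 3< "$manifest"
  echo "batch: $passed/$total passed"
  [ "$passed" -eq "$total" ]        # non-zero status if anything failed
)

# Stand-in validator for illustration: a snippet "passes" if it is non-empty.
validate_nonempty() { [ -s "$1" ]; }
```

Calling `run_batch_sketch manifest.txt validate_nonempty` visits every entry even after a failure, prints one summary line, and only then reports the overall status, which is the "continue past failures" behaviour the commit message describes.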
Opt the heaviest validators into batch mode and rework each harness to do its expensive setup once per job, then loop the staged snippets in a warm workspace.

Measured locally (full per-SDK run):
  rust-server        ~42m   -> ~31s
  cpp-server         ~1h29m -> ~20s
  cpp-client         ~1h23m -> ~26s
  flutter (x3)       ~44m   -> ~2m
  android sdk-docs   ~1h6m  -> ~6m

- rust: Dockerfile pre-bakes the SDK + tokio + transport dependency tree compiled once; per-snippet only recompiles the binary crate.
- cpp (server/client + v2-c/v2-cpp variants): pre-bake a CONFIGURED CMake project (default + redis) so per-snippet skips the configure over the whole cpp-sdks tree and only runs an incremental `cmake --build`. The parse-only v2 stub validators just loop gcc/g++ in one container.
- flutter (current + v2/v3): validate with `flutter build linux --debug` instead of `flutter build web --release` + headless Chromium. Both run the same Dart front-end (catching every syntax/type error), but the linux debug build stops before dart2js/AOT and needs no browser, so it finishes in ~5-7s warm vs ~27s. The snippets import only flutter/material and the LD SDK, which is cross-platform, so the linux target compiles the identical code with no divergence.
- android: reset the package dir to the baseline scaffold between snippets and keep the gradle daemon warm across the loop, so only the first snippet pays JVM + gradle startup.
iOS was the largest pole (~2h): the native harness ran, per snippet,
xcodegen + `-resolvePackageDependencies` (which builds the LD SDK Swift
Package) + `xcodebuild test` (which boots a simulator) — even for the
syntax-only sdk-docs fragments whose body never runs.
Batch the ios-client harness: set up the project and resolve the Swift
Package ONCE into a shared DerivedData, then loop the staged snippets.
Dispatch on SNIPPET_CHECK:
- parse (sdk-docs / experimentation): `xcodebuild build` against the
iphonesimulator SDK — a compile/type-check with no simulator boot. The
wrappee body lives in a never-instantiated function, so a clean compile
is the signal; emit the canonical line. The swift-syntax-only scaffold
now carries `env: SNIPPET_CHECK: parse` (mirroring the android
syntax-only scaffolds) to select this path.
- runtime (init): `xcodebuild test` as before, booting the simulator and
grepping the captured log.
Cannot be exercised locally (needs macOS); verifying on CI.
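
The SNIPPET_CHECK dispatch above might look something like the sketch below. The helper name `xcodebuild_action_for` is hypothetical, and treating an unset or empty SNIPPET_CHECK as the runtime path is an assumption, not something this message states. The sketch only selects the xcodebuild action and prints it; it does not invoke xcodebuild.

```shell
# Hypothetical helper: map a SNIPPET_CHECK value to an xcodebuild action.
xcodebuild_action_for() {
  case "${1:-runtime}" in    # assumption: unset/empty means the runtime path
    parse)   echo build ;;   # compile/type-check only, no simulator boot
    runtime) echo test ;;    # boots the simulator and runs the test target
    *)       echo "unknown SNIPPET_CHECK: $1" >&2; return 2 ;;
  esac
}

# Dry run: show which action the current environment would select.
action=$(xcodebuild_action_for "${SNIPPET_CHECK:-}") &&
  echo "would run: xcodebuild $action"
```

Rejecting unknown values rather than falling through to one of the two paths keeps a typo in a scaffold's `env:` block from silently selecting the wrong check.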
…s line

First CI run surfaced two iOS-only failures (the docker validators and android all passed):

- Concurrent shards each ran `brew install xcodegen`, colliding on Homebrew's download lock. Native validators run directly on the macOS host with no container isolation, so concurrent shards contend on shared state (the brew lock, the Simulator runtime, the SwiftPM/DerivedData caches). Run native groups single-shard; one warm workspace that resolves the Swift Package once is also the optimal shape there. Docker groups keep the worker pool, since each shard is an isolated container.
- The iOS runtime (init) path grepped the xcodebuild log for the success line but never re-emitted it to stdout, so the verify-hello-app wrapper's grep of the command output found nothing and failed the cell even though the snippet passed (`batch: 1/1 passed`). Re-emit the matched line, as the parse path already does.
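
The second fix (re-emitting the matched line) can be illustrated with a small sketch. The function name `reemit_success_line` and the pattern argument are hypothetical; the actual success-line text and log path used by the harness are not shown here.

```shell
# Hypothetical helper: find the success line in a captured log and print it
# to stdout so a caller grepping the command output can see it.
# The ( ... ) body runs in a subshell so the variables stay local.
reemit_success_line() (
  log=$1       # captured tool log (for example an xcodebuild log)
  pattern=$2   # fixed string identifying the success line (placeholder)
  match=$(grep -F -- "$pattern" "$log" | head -n 1) || true
  [ -n "$match" ] || exit 1     # nothing matched: report failure
  printf '%s\n' "$match"        # matched: surface the line on stdout
)
```

Finding the line in the log is not enough on its own; the wrapper only sees what the command writes to stdout, so the match has to be printed as well as detected.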
joker23
approved these changes
Jun 30, 2026
These became the wall-time poles once the bigger offenders were batched (~18-22m each). Same docker + syntax-only shape, so the same treatment, and the batch worker pool gives them ~NumCPU-way concurrency on top:

- java: the old harness ran `mvn clean compile assembly:single` from scratch per snippet (re-resolving plugins/deps, building a fat jar). Pre-bake a warm maven project (deps + plugins in ~/.m2, one compile done) and run each snippet via offline `mvn -o compile` + `exec:java`. Copy the whole staged source tree so multi-file snippets (the init runner + its Main companion) compile.
- dotnet: the old harness synthesized a csproj and ran `dotnet add package` + `dotnet restore` + `dotnet run` from scratch per snippet. Pre-bake a warm project with the package superset every snippet's requirements ask for (ServerSdk + Observability + Ai + Redis + Telemetry + Consul + DynamoDB) restored and built; per-snippet just swaps Program.cs and rebuilds incrementally. The ASP.NET Core init stages its own package-less Web .csproj, so for that one the harness adds its requirements + restores.
- haskell: the warm cabal project already existed; wrap the per-snippet build/run in the batch loop. haskell-server-v3's Dockerfile COPYs the shared haskell-server harness (now batch-aware), so it gets `batch: true` too; otherwise its snippets hit the batch harness in non-batch mode and fail on the missing SNIPPET_BATCH.

Local full-SDK runs: java ~22m->~2m, dotnet ~21m->~90s, haskell ~18m->fast.
Cursor Bugbot has reviewed your changes using default effort and found 1 potential issue.
There are 3 total unresolved issues (including 2 from previous reviews).
Reviewed by Cursor Bugbot for commit 524b2cb.
- iOS: reset Sources to the scaffold baseline before staging each snippet. The project compiles every file under Sources/, so a differently-named .swift file from an earlier fragment could otherwise linger; mirrors the baseline reset the android and dotnet harnesses already do.
- rust: surface a failed warm build on the version-pinned path instead of swallowing it with `|| true`, so a broken re-pin is reported up front rather than as a confusing per-snippet `cargo run` error.
- haskell: delete the dead languages/haskell-server-v3/harness/run.sh. The v3 Dockerfile COPYs the shared (batch-aware) haskell-server harness, so the v3-local file was never used; it only invited the misreading that `batch: true` on v3 would hit a SNIPPET_ENTRYPOINT-only harness. Add a Dockerfile comment making the shared-harness intent explicit.
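
The baseline reset in the first bullet can be sketched as below. The function name `reset_to_baseline` and the directory arguments are hypothetical; the real harness may keep its scaffold elsewhere or reset it differently.

```shell
# Hypothetical helper: restore a working directory to a pristine scaffold so
# files staged for an earlier snippet cannot leak into the next build.
# The ( ... ) body runs in a subshell so the variables stay local.
reset_to_baseline() (
  baseline=$1   # pristine scaffold directory
  workdir=$2    # directory the build compiles from (e.g. Sources)
  [ -d "$baseline" ] && [ -n "$workdir" ] || exit 1
  rm -rf "$workdir"                 # drop everything, including stale files
  mkdir -p "$workdir"
  cp -R "$baseline"/. "$workdir"/   # copy scaffold contents, dotfiles too
)
```

Wiping the directory rather than overwriting known filenames is what handles the differently-named leftover file: anything not in the scaffold is gone before the next snippet is staged.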

Problem
Snippet-validation CI wall time was dominated by jobs validating ~100 syntax-only `sdk-docs` fragments serially, one harness invocation per snippet. The per-snippet work is tiny, but each invocation re-paid full environment setup. The whole run's wall time is that of the slowest job.
Approach
Add a batch mode (`batch: true` in `runner.yaml`). The Go runner resolves every matching snippet up front, groups them by `(runtime, build-affecting env)`, builds each image once, and partitions across up to `--jobs` concurrent harness invocations (default `NumCPU`). Each invocation loops a manifest of staged snippets in one warm workspace. Native (macOS) groups run single-shard (shared host: brew/simulator/SPM caches can't be driven concurrently). Non-batch validators are untouched. A shared `run_batch` helper drives the manifest loop.
Per-validator warm-workspace work:
- … `cpp-sdks` tree.
- `flutter build linux --debug` (real front-end compile, no dart2js/browser) instead of `flutter build web --release` + headless Chromium. Snippets import only cross-platform APIs, so the linux target compiles identical code.
- `xcodebuild build` (compile, no simulator) for syntax-only, `xcodebuild test` for `init`.
- `mvn -o compile` + `exec:java` instead of `mvn clean compile assembly:single` from scratch.
Results (verified green on CI, run 28459913855: all 35 jobs success)
Full-run wall time: ~2h+ → ~18 min, all green.
Full snippet coverage preserved (unit-selection logic unchanged; only dispatch differs). Also fixes a latent race in `await_success_line` (a process that printed the success line and exited before the next poll was read as a failure).
* haskell is now the pole, and it's image-build-bound, not per-snippet-bound: the cabal image compiles the Haskell SDK from source (now twice, for the v3 dep tree), which batching can't shorten. The remaining 5–8 min jobs are likewise dominated by cold image builds with no registry cache.
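The `await_success_line` race mentioned above comes down to checking the log one last time after the process is found dead. A rough sketch of that polling pattern follows; the name `await_success_line_sketch`, its argument order, and the one-second poll interval are assumptions, and the real helper in lib.sh may be structured differently.

```shell
# Hypothetical re-creation of the polling pattern (not the real helper).
# The ( ... ) body runs in a subshell so the variables stay local.
await_success_line_sketch() (
  pid=$1          # process expected to print the success line
  log=$2          # file capturing that process's output
  pattern=$3      # fixed string identifying the success line
  tries=${4:-30}  # maximum number of polls (assumed default)
  i=0
  while [ "$i" -lt "$tries" ]; do
    if grep -qF -- "$pattern" "$log" 2>/dev/null; then
      exit 0
    fi
    if ! kill -0 "$pid" 2>/dev/null; then
      # The process is gone. It may have printed the line and exited since
      # the grep above, so check once more before declaring failure.
      grep -qF -- "$pattern" "$log" 2>/dev/null
      exit $?
    fi
    sleep 1
    i=$((i + 1))
  done
  exit 1          # still running but never printed the line
)
```

Without the final grep, a process that prints and exits between two polls is indistinguishable from one that crashed, which is the case the print-and-exit syntax-only hellos kept hitting.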
Follow-up
Cross-run Docker-layer caching (GHA cache / GHCR) is the next lever — it would cut the haskell pole and the cold-build component of every docker job, taking the wall well under 10 min. Left as a separate PR.