Skip to content

build pipeline Phase 1: trond build --artifact jar (stacked on #173 design)#174

Merged
kuny0707 merged 7 commits into
tronprotocol:developfrom
barbatos2011:feat/build-pipeline-phase1
May 17, 2026
Merged

build pipeline Phase 1: trond build --artifact jar (stacked on #173 design)#174
kuny0707 merged 7 commits into
tronprotocol:developfrom
barbatos2011:feat/build-pipeline-phase1

Conversation

@barbatos2011
Copy link
Copy Markdown

Stacked on #173 (the design spec). This PR's branch is based on the design PR's branch — when #173 merges, this PR's diff narrows automatically to just Phase 1.

Until then, the PR shows the design doc + Phase 1 implementation together. Reviewers can land #173 first, then this; or treat them as one review unit.

Summary

Phase 1 of the build pipeline from specs/002-trond-build-pipeline/. Implements the standalone trond build --source <path> --artifact jar path end-to-end:

  • New internal/build/ package — orchestrator + cache + lock + audit + git resolution + validators + bash-wrapper docker runner.
  • New cmd/build.go cobra command + SIGINT-aware context.
  • New internal/build/pins/ with go:embed-ed Eclipse Temurin digest pins (FR-024). Placeholder digests; make refresh-builder-pins (also new) resolves real sha256s.
  • schemas/output/build.schema.json + schema baseline regenerated; SchemaVersion 1.2.0 → 1.3.0.

Intent integration (Phase 2), --artifact image (Phase 3), SSH transfer (Phase 4), and build management commands (Phase 5) follow in separate PRs.

What's in the diff (Phase 1 only — design commits land via #173)

  • `internal/build/{builder,runner,cache,manifest,key,source,validate,imagetag,audit,hash,lock_posix,lock_windows}.go` + pins/ subpackage
  • `cmd/build.go` + schema lookup registration
  • `scripts/refresh-builder-pins.sh` + Makefile target
  • 26 test functions (unit + mock-runner + integration-tag-gated)

Security boundary (FR-022)

The build invokes a constant bash wrapper via argv:
```go
exec.CommandContext(ctx, "docker", "run", ..., imageRef,
"bash", "-c", dockerBuildScript, "--", req.GradleTask, req.GradleArgs...)
```

  • `dockerBuildScript` is a compile-time string — user input never gets interpolated.
  • gradle_task + gradle_args flow through `"$@"` argv expansion.
  • Output filename flows through `$OUT_NAME` env var.
  • `TestDockerBuildScript_NoUserInputInterpolation` is a static check that fails if anyone writes `${USER_INPUT}` into the script.
  • `ValidateGradleArgs` is a flag-name allowlist (not char regex — pass-2 of design review showed char regex was the wrong defense; `--init-script /tmp/evil.gradle` passed any sane char regex while `--projects=a,b,c` was rejected). Rejected: `--init-script`, `--include-build`, `--build-file`, `--settings-file`. Accepted: `--offline`, `--no-daemon`, `--parallel`, `--max-workers=N`, `--rerun-tasks`, `-D=`, `-P=`, `-q`/`-i`/`-d`.
  • `build.env` is a fixed key allowlist: `GRADLE_OPTS`, `JAVA_OPTS`, `GRADLE_USER_HOME`, `MAVEN_OPTS`, `ORG_GRADLE_PROJECT_*`. `LD_PRELOAD` etc. blocked.

Test plan

  • `go test ./...` green on darwin/amd64
  • `go vet -tags=integration ./internal/build/...` compiles
  • `./bin/trond build --help` shows documented flags
  • `./bin/trond build -o json` (no --source) returns a VALIDATION_ERROR envelope, exit code 2
  • CI: integration test against fresh Eclipse Temurin pull (gate behind workflow flag; not enabled by default)
  • Manual: `make refresh-builder-pins` writes real digests

Containerized Gradle-orchestrated build for the dev inner loop:
`trond build` standalone + intent-level `build:` block that `apply`
resolves before render/deploy. Pinned eclipse-temurin builder image,
content-addressed cache keyed by git revision (+ patch hash if dirty),
SSH targets get scp'd JARs (no source shipped), --builder host escape
hatch for developers with local Gradle.

Spec covers 5 user stories (P1: standalone build, P1: integrated apply,
P2: image artifact, P2: SSH transfer, P3: host builder), 14 functional
requirements, 4 success criteria. Plan splits implementation into 6
phases over ~7-8 working days, with risks (gradle task name drift,
pin-as-blocker, cache growth) and their mitigations.
Material design changes:
- Removed ambiguous `build.rebuild` enum (semantics overlapped confusingly
  with revision=HEAD + dirty detection).
- Fixed real bug: patch hash now combines `git diff` AND
  `git status --porcelain -uall` so untracked files invalidate cache.
- `pull_policy: never` is now explicit on locally-built docker images.
- Added FRs for: concurrent build flock (FR-015), SIGINT handling (FR-016),
  preflight integration for build inputs (FR-017), prune cross-ref with
  state.json::build_revision (FR-018), GRADLE/JAVA_OPTS + build.env
  passthrough (FR-019).
- US-1 gained acceptance scenarios for SIGINT (130) and concurrent
  serialization. US-4 gained scenarios for transfer progress and remote-scp
  preflight probe.
- MCP tool annotations made explicit (idempotent/readOnly/destructive).

Plan-side changes:
- Dropped go-git/v5 dep — shell out to git, matches docker/scp pattern.
- Phase 1 estimate 2d → 3d (signal handling + flock + patch-hash bug fix).
- Total 7-8d → 9-10d.
- Folded the separate host.go file back into builder.go (was over-stratified).
- Added cache prune TTL for dirty-build entries (7-day default).
Security hardening + correctness fixes from review pass 2.

Spec:
- FR-019 narrows build.env from "any KEY" to fixed allowlist (closes
  LD_PRELOAD-style hijacks via intent files).
- FR-022 (new) forbids shell-mediated gradle invocation; argv-form only;
  token regex "^[a-zA-Z0-9._:=/+-]+$" on gradle_task + gradle_args
  (closes command injection via user-supplied task name).
- FR-002 cache key now includes builder image digest so pin bumps
  invalidate stale artifacts. Naming: <sha>-b<digest6>[+dirty-<patchsha8>].
- FR-020 (new) cache hit MUST also stat the artifact file (manifest
  pointing at deleted JAR is now treated as miss).
- FR-021 (new) source-path relative resolution: CLI relative to CWD,
  intent relative to intent-file dir (matches docker-compose convention).
- FR-023 (new) fixes the build event JSON shape in the audit log.
- FR-024 (new) builder pins are go:embed-ed; override is escape hatch
  that participates in cache key.
- FR-025 (new) dirty-cache TTL configurable (default 7d, "never" allowed).
- FR-001 grows --gradle-arg repeatable flag.
- FR-016 SIGINT now also covers scp via .tmp + rename + remote cleanup.

Plan:
- New "Security boundary" section documenting argv-only execution +
  token regex with good/bad code examples.
- Cache key derivation updated to include BuilderImageDigest + GradleArgs.
- New "Artifact naming pattern" section shows the on-disk layout.
- Phase 1 deliverables grow internal/build/audit.go, pins.go, and an
  integration test runtime against a real eclipse-temurin image
  (build-tag gated).
- Phase 4 transfer details .tmp + rename + cancellation cleanup.
- Phase 5 prune logic now removes orphan manifests (FR-020 cleanup).

No estimate change (~9-10 working days). All additions slot into
existing phases.
Pass-3 review surfaced a portability bug, a defense-in-depth mismatch,
and several clarity / resilience issues. All 10 items applied.

Real bugs / design holes (4):
- Windows: syscall.Flock is POSIX-only. Split FR-015 impl across
  lock_posix.go (//go:build !windows) + lock_windows.go (in-process
  mutex fallback). Documented Windows caveat.
- Defense correction: pass-2's char-regex on gradle_args was too tight
  (rejected --projects=a,b,c) AND too loose (allowed --init-script
  /tmp/evil.gradle). Swapped to a flag-name allowlist. gradle_task
  keeps char regex (task names are inherently regular).
- image_tag format unvalidated. FR-005 grows reference-format check via
  github.com/distribution/reference.
- go:embed file location ambiguous. internal/build/pins/ package owns
  the JSON; root copy becomes a Makefile-generated symlink.

Clarifications (4):
- State field rename: build_revision → build_cache_key (the stored
  value is a full cache key, not just a sha).
- FR-024 grows pin-unreachable semantics: BUILDER_IMAGE_UNAVAILABLE
  error, suggestions for override + upgrade. No silent fallback.
- Phase 5 prune now docker image rm (not docker untag) so layers are
  actually freed.
- FR-023 audit lifecycle: append in_progress at start, atomic update
  to terminal state. Crashed builds leave forensic entries.

Tightening (2):
- Phase 1 estimate 3d → 4d; total 9-10d → 11-12d.
- Removed resolved open question (build.env allowlist; resolved in
  pass 2).
Implements the design from PR tronprotocol#173 / specs/002-trond-build-pipeline:
a content-addressed, container-orchestrated build pipeline for the
dev inner loop. This is Phase 1 only — `trond build --source <path>
--artifact jar -o json` end-to-end skeleton with validation, cache
lookup, and the docker-run path. Phase 2 (intent integration), 3
(image artifact), 4 (SSH transfer), 5 (mgmt commands) follow.

What landed:

- internal/build/pins/: go:embed-ed Eclipse Temurin pin file
  (FR-024). PLACEHOLDER digests for now; `make refresh-builder-pins`
  bumps them per release. Override participates in cache key.
- internal/build/lock_{posix,windows}.go: flock-based serialization
  (FR-015) split for cross-platform compilation.
- internal/build/imagetag.go: Docker reference-format validation
  (FR-005) for `build.image_tag`.
- internal/build/validate.go: gradle_task char regex (FR-022 strict);
  gradle_args flag-name allowlist (FR-022 — the pass-2 correction
  away from char regex); env-key allowlist (FR-019); JAR Main-Class
  check (FR-011).
- internal/build/source.go: git shell-out for revision resolution
  and dirty patch hash combining `git diff` + `git status` (FR-002
  including the regression guard for untracked files).
- internal/build/key.go: CacheKey type. On-disk naming
  `<sha7>-b<digest6>[+dirty-<patch8>][-x<extra6>]` (FR-002).
- internal/build/manifest.go: per-build JSON record (FR-004).
- internal/build/cache.go: lookup that also stats the artifact
  (FR-020), save, EnsureCacheDirs.
- internal/build/audit.go: two-phase lifecycle (in_progress at start,
  terminal on completion) per FR-023.
- internal/build/builder.go: Run() orchestrator; argv-only `docker
  run` invocation per FR-022; SIGINT-aware via ctx; .tmp+rename for
  atomic artifact promotion.
- internal/build/hash.go: file sha256 helper.
- cmd/build.go: cobra wiring + SIGINT context (FR-016).
- schemas/output/build.schema.json + internal/schema/files/build.schema.json.
- internal/schema/embed.go: SchemaVersion 1.2.0 → 1.3.0 + history note.
- internal/schema/version_baseline.json: regenerated.
- internal/schema/manifest.go + cmd/schema_coverage_test.go: register
  `trond build` in both schema lookup tables.

Tests:
- pins: hit/miss/override.
- key: naming shape, dirty suffix, builder-digest invalidation,
  gradle-args invalidation, override determinism.
- validate: task regex, args allowlist (incl. --init-script rejected,
  --projects=a,b,c accepted), env allowlist, image-tag format, JAR
  Main-Class.
- source: git resolve clean / dirty / untracked-only / nonexistent
  revision. The untracked-only case is the FR-002 regression guard.
- cache: empty / orphan-manifest cleanup (FR-020) / hit-with-artifact.
- lock_posix: serialization between two goroutines on same key.

NOT in Phase 1 (per plan):
- Intent `build:` block integration → Phase 2.
- `--artifact image` path → Phase 3 (Run() errors with
  NOT_IMPLEMENTED for now).
- SSH transfer → Phase 4.
- `build list/inspect/prune` mgmt commands → Phase 5.
- `--builder host` body → Phase 5.
- MCP tools → Phase 5.

Builder image digests are PLACEHOLDER sha256:0000... in this commit
so the embed file shape is testable. A real first build requires
either `make refresh-builder-pins` or
`--builder-image-override <ref@sha256:...>`.
Applies the 10-item second-pass review of the Phase 1 code (real bugs +
test gaps + code-quality polish). All 10 items addressed.

🔴 Real bugs:
- Container produced gradle output but trond looked in /out/<key>.tmp
  with no cp step — end-to-end build was unreachable. Fixed by routing
  through a constant bash wrapper script (`dockerBuildScript`) that
  runs ./gradlew via "$@" argv passthrough and cp's build/libs/*.jar
  to /out/$OUT_NAME. The script is a compile-time constant: gradle_task
  and gradle_args still arrive via argv (FR-022 safe), output filename
  via env var (no interpolation). /src is now mounted RW because gradle
  writes build/ and .gradle/ into the project tree just as a host
  invocation does; the read-only mount was safety theater.
- `make refresh-builder-pins` target + scripts/refresh-builder-pins.sh
  added (FR-012). Resolves Eclipse Temurin tags to current sha256
  digests and rewrites internal/build/pins/builder_image_digests.json.
  Requires docker + jq. Atomic temp-file write.
- JAR manifest continuation-line parsing fixed (FR-011 robustness).
  Long Main-Class FQNs wrap at 72 bytes with a leading-space continuation
  line per the JAR manifest spec; the old scanner took only the first
  line. Added a regression test with a wrapped fixture.

🟡 Test coverage:
- builder_test.go with dockerRunner mock seam: happy path, cache hit,
  build-failed → BUILD_FAILED envelope, INVALID_ARTIFACT path, argv
  contract assertion.
- TestRun_SIGINTReportsCancelled exercises FR-016 via a delaying mock
  + cancel(); asserts BUILD_CANCELLED + no .tmp left in cache dir.
- TestDockerBuildScript_NoUserInputInterpolation enforces the
  argv-not-interpolated FR-022 contract at the static-string level —
  adding ${USER_INPUT} into the script now fails the test.
- Integration test (`//go:build integration`) runs real
  eclipse-temurin:8-jdk-jammy against a 3-file gradle fixture
  (settings.gradle + build.gradle + FullNode.java) and asserts trond
  produces a structurally-valid JAR + a hit on the second invocation.
  Behind a build tag so unit-test CI doesn't pull 300 MB images.

🟢 Polish:
- Run() split into resolveBuild() + buildJAR() + small helpers; the
  200-line function is now 60 lines of orchestration.
- Removed bufioReaderAdapter; scanManifestMainClass takes io.Reader
  directly.
- Cache-key digest prefixes bumped 6 → 8 hex chars (24 bit → 32 bit
  cosmic-ray cushion); git sha prefix bumped 7 → 12 hex to match git's
  --abbrev=12 default. Schema regex + tests updated accordingly.
- runner.go separates dockerRunner interface from builder.go orchestration.
CI's golangci-lint flagged 7 issues. All addressed:

- gofmt drift in 4 files (struct-tag alignment): key.go, manifest.go,
  builder_test.go, validate_test.go. Plus a few pre-existing e2e
  test files the formatter touched along the way. `make fmt`-clean.
- errcheck: `defer r.Close()` on zip.OpenReader return — wrap in a
  func literal that discards the error (we're past success at that
  point; nothing to do with a close failure).
- deferInLoop: extracted the per-MANIFEST.MF `f.Open()` + scan into
  a helper (`readManifestMainClass`) so the defer is scoped to one
  call, not a loop iteration. Behaviour unchanged.
- exitAfterDefer in cmd/build.go: previous code called `os.Exit` after
  a `defer cancel()` — the cancel would never run. Removed the inline
  os.Exit; structured errors now flow through cobra RunE return value
  and `cmd/root.go::Execute` renders + sets exit code (the canonical
  path the rest of trond uses).

`make lint` is now clean. `go test ./...` still green.
@barbatos2011 barbatos2011 marked this pull request as ready for review May 17, 2026 14:13
@kuny0707 kuny0707 merged commit 2a2433c into tronprotocol:develop May 17, 2026
12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants