docs: RFC for Firecracker snapshots (instant bazel-diff starts)#376

Open
tinder-maxwellelliott wants to merge 1 commit into master from claude/musing-jang-a8e7d7

@tinder-maxwellelliott

Collaborator

Summary

Adds a design RFC (docs/firecracker-snapshots.md) for using Firecracker microVM snapshots to give bazel-diff near-instant starts.

The win: bazel-diff's own JVM CLI starts in <1s; the cost is the bazel query deps(//...) it shells out to, which pays for full Bazel server warmup, external-repo/bzlmod resolution, and Skyframe graph load on every cold start (minutes on a large monorepo). A snapshot captures that warm state once and restores it in under a second, so the PR-time path only re-analyzes changed packages.

Decided scope

  • CLI hooks in the Kotlin tool (warmup, fingerprint) + a Go orchestration tool (tools/firecracker/)
  • Captures a full warm Bazel server + repo cache
  • Targets self-hosted CI (we control host kernel + CPU model)

What the RFC covers

  • Record/consume lifecycle and the host-vs-CLI architectural split
  • New CLI surface: warmup (record entrypoint, clean exit = "safe to snapshot") and fingerprint (cache key); consume reuses existing generate-hashes + get-impacted-targets
  • Correctness (the linchpin): a fingerprint cache key over the Bazel version, MODULE.bazel.lock, .bazelrc, bazel-diff version, and flag set; a fail-safe fall-back to a cold start on any mismatch; the fact that SourceFileHasher already makes content correctness independent of server incrementality; and a CI canary
  • Firecracker self-hosted specifics (CPU pinning, COW overlay, UFFD, clock/net resync)
  • Snapshot store layout, Go tool UX, phasing, and open questions
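
To make the fingerprint and fall-back rule concrete, here is a minimal Go sketch of how the orchestration side might compare a recorded snapshot key against the current inputs. It is illustrative only: the Inputs struct, its field names, Fingerprint, and ShouldRestore are hypothetical and not taken from the RFC. It also assumes the two files are hashed before being passed in, and that the flag set is order-insensitive. If flag order turns out to matter, the sort should be dropped.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
	"strings"
)

// Inputs mirrors the items the cache key is described as covering.
// The struct and its field names are hypothetical, not the RFC's schema.
type Inputs struct {
	BazelVersion     string
	ModuleLockSHA    string   // hash of MODULE.bazel.lock contents (assumed precomputed)
	BazelrcSHA       string   // hash of .bazelrc contents (assumed precomputed)
	BazelDiffVersion string
	Flags            []string // flag set; treated as order-insensitive here
}

// Fingerprint returns a hex SHA-256 over all inputs.
func Fingerprint(in Inputs) string {
	// Copy before sorting so the caller's slice is not mutated.
	flags := append([]string(nil), in.Flags...)
	sort.Strings(flags)

	fields := []string{
		in.BazelVersion,
		in.ModuleLockSHA,
		in.BazelrcSHA,
		in.BazelDiffVersion,
		strings.Join(flags, "\x00"),
	}
	h := sha256.New()
	for _, f := range fields {
		// Length-prefix each field so that shifting bytes between
		// adjacent fields cannot produce the same digest input.
		fmt.Fprintf(h, "%d:%s;", len(f), f)
	}
	return hex.EncodeToString(h.Sum(nil))
}

// ShouldRestore reports whether a snapshot recorded under snapshotKey may be
// restored for the current inputs. Any mismatch, or a missing key, means
// falling back to a cold start.
func ShouldRestore(snapshotKey string, current Inputs) bool {
	return snapshotKey != "" && snapshotKey == Fingerprint(current)
}

func main() {
	// Placeholder values, not real versions or hashes.
	recorded := Inputs{
		BazelVersion:     "bazel-A",
		ModuleLockSHA:    "lock-hash-1",
		BazelrcSHA:       "rc-hash-1",
		BazelDiffVersion: "bazel-diff-A",
		Flags:            []string{"--flag-one", "--flag-two"},
	}
	key := Fingerprint(recorded)

	drifted := recorded
	drifted.BazelVersion = "bazel-B" // simulate a Bazel upgrade

	fmt.Println("same inputs, restore:", ShouldRestore(key, recorded))
	fmt.Println("bazel changed, restore:", ShouldRestore(key, drifted))
}
```

The point of the design is that the comparison is all-or-nothing. The sketch makes no attempt to judge whether a changed input is "safe", because a wrong affected set is worse than paying for a cold start.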

This PR is docs-only

No code changes. Intended as the review artifact before implementation. Phase 1 (the fingerprint + warmup subcommands) would follow in a separate PR.

🤖 Generated with Claude Code

Design doc for capturing a warm Bazel server in a Firecracker microVM
snapshot so PR-time bazel-diff runs restore in ~sub-second instead of
paying full server warmup + external-repo fetch on every cold start.

Scope: CLI hooks (warmup, fingerprint) + a Go orchestration tool,
full warm-server snapshots, self-hosted CI. Centers the correctness
story (fingerprint cache key + fail-safe fall-back-to-cold) since an
incorrect affected set is worse than none.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>