RFD 0191: bhyve live migration by nwilkens · Pull Request #168 · TritonDataCenter/rfd

nwilkens · 2026-04-16T12:26:17Z

Summary

- Architecture design for live migration of bhyve VMs in Triton
- Extends the existing VMAPI migration control-plane with bhyve-specific hypervisor hooks
- Covers CPU compatibility, memory transfer, vCPU state, device serialization, and network cutover

Renumbered from RFD 190 to 191 to avoid conflict with existing rfd-190 branch.

Replaces #167.

🤖 Generated with Claude Code

- Add a "Hard Requirements" section that enumerates, by state category (CPU, interrupt controllers, memory, devices, clocks, storage, network, guest-visible identity), what has to move for live migration to work and why, plus what explicitly does not move and what invariants the hypervisor must enforce.
- Add a "CPU And Platform Compatibility" section discussing how other hypervisors (VMware EVC, QEMU/libvirt, Hyper-V, Nutanix AHV, Oxide Propolis) handle the same problem, four candidate approaches for Triton, a recommended combination (per-VM baseline + preamble validation), and platform-image ABI compatibility.
- Restructure "The CN-Local Migration Data Plane" to state the component's requirements without committing to a deployment home; leave the packaging decision (dedicated service / cn-agent / something else) explicitly open for community input.
- Soften threat-model language: the data plane should be secured as a matter of principle rather than assuming a specific network posture.
- Drop the phased "Implementation Plan" in favor of a single paragraph noting that scheduling is a planning exercise, not an architectural decision.
- Reframe "Alternatives Considered" to explain why each alternative is less useful than the proposed direction, rather than labeling each as "rejected".

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
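As a rough illustration of the "per-VM baseline + preamble validation" combination named above, the destination-side check might look like the sketch below. The `Preamble` type, the `validate_preamble` function, and the feature-name strings are hypothetical and chosen only for illustration; the RFD text, not this sketch, defines the real preamble contents.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Preamble:
    """Hypothetical migration preamble carrying the VM's CPU baseline."""
    baseline_features: frozenset


def validate_preamble(preamble, dest_features):
    """Return (ok, missing): ok is True only when the destination host
    offers every feature in the VM's baseline."""
    missing = sorted(preamble.baseline_features - dest_features)
    return (not missing, missing)


# Usage: the destination would refuse the migration before any memory moves.
preamble = Preamble(frozenset({"sse4_2", "avx2"}))
ok, missing = validate_preamble(preamble, frozenset({"sse4_2"}))
# ok is False and missing is ["avx2"]
```

A subset check of this kind only works if the guest was started against the baseline rather than the source host's full feature set, which is the reason the two approaches are recommended together.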

Refinements to the "CPU and per-vCPU state" subsection of Hard Requirements, based on a careful read of real working implementations:

- Remove CR8 from the control register list. CR8's guest-visible value is the LAPIC Task Priority Register; it moves with the LAPIC, not the control registers.
- Add EFER explicitly to the control register group with a note that while it is architecturally an MSR, its role in gating long-mode means it must transfer alongside the control registers and be excluded from any enumeration-based MSR list to avoid double-restoration.
- Add IA32_DEBUGCTL to the debug registers group for the same architectural reason: it is an MSR, but belongs with the debug register state functionally.
- Replace the vague "XSAVE area of up to several kilobytes" with a hard requirement that implementations query the required buffer size at runtime (via VM_DESC_FPU_AREA or equivalent) rather than hardcode a size. The area grows with each new state component (AMX tiledata alone is 8 KiB) and static buffers silently truncate on future microarchitectures.
- Note that the MSR list is kernel-authoritative, not userspace-authoritative. Enumerating exhaustively from the kernel's MSR data class absorbs kernel additions without userspace ABI churn; hardcoded userspace lists silently drop new MSRs.
- Extend per-vCPU run state to call out the SIPI vector explicitly: run state and SIPI vector must migrate as a pair, or APs that received INIT-but-not-SIPI will never start.
- Split the old "pending injected events" bullet into two pieces: pending events (NMI, ExtInt, Exception, IntInfo via VDC_VMM_ARCH VAI_PEND_* fields) and the interrupt shadow (VM_REG_GUEST_INTR_SHADOW). Both are distinct and both must migrate to avoid dropping signals or double-delivering interrupts one instruction early.

Also remove a stray em-dash introduced in the pending-events rewrite, keeping the document em-dash-free.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
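The double-restoration concern for EFER can be sketched as a filter over whatever MSR set the kernel enumerates. The function name and the dictionary shape are assumptions made for this example; only the MSR indices (EFER at 0xC0000080, IA32_DEBUGCTL at 0x1D9, LSTAR at 0xC0000082) are architectural. Whether IA32_DEBUGCTL should also be filtered depends on how the debug register group is restored, so it is left to the caller here.

```python
MSR_IA32_EFER = 0xC0000080      # transferred with the control registers
MSR_IA32_DEBUGCTL = 0x1D9       # grouped with the debug registers
MSR_IA32_LSTAR = 0xC0000082     # an ordinary MSR, used as sample data


def msrs_for_generic_restore(kernel_msrs,
                             handled_elsewhere=frozenset({MSR_IA32_EFER})):
    """Return the kernel-enumerated MSRs minus those restored by another
    state group, so no MSR is written twice on the destination.

    kernel_msrs is a hypothetical {index: value} mapping built from the
    kernel's enumeration; userspace never hardcodes the full list.
    """
    return {idx: val for idx, val in kernel_msrs.items()
            if idx not in handled_elsewhere}


enumerated = {MSR_IA32_EFER: 0xD01, MSR_IA32_DEBUGCTL: 0x0,
              MSR_IA32_LSTAR: 0xFFFFFFFF81000000}
generic = msrs_for_generic_restore(enumerated)
# EFER is gone; DEBUGCTL and LSTAR remain unless the caller excludes more.
```

Keeping the exclusion set small and explicit means new MSRs added by the kernel still flow through the generic path without a userspace change.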

RFD 190 is already claimed by the "Deprecating NodeJS for Rust" RFD on the rfd-190 branch. Renumber our bhyve Live Migration Architecture RFD to 191 to avoid the conflict. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Add a new top-level "Memory Transfer And Convergence Policy" section as a structural peer to "Storage Transfer". The prior draft covered what RAM must move (§ Guest physical memory) and the hypervisor-side dirty-tracking requirement (§ 5. Dirty-page tracking) but left the policy layer unspecified: when does the agent decide pre-copy has done enough, what guarantee do operators get about the resulting downtime, and what happens when that guarantee can't be met.

Central recommendation: express the exit criterion as a target post-pause downtime in milliseconds. Each pass measures its own effective transfer rate; the exit threshold for pass N is `downtime_budget_ms × observed_bw_pages_per_ms`. Adapts to network conditions and guest behavior by construction.

Also covers:

- short fixed cooldown between passes (avoid trivial-convergence races);
- hard safety ceiling on pass count;
- SLA-miss event when the ceiling fires, so operators distinguish cleanly-converged from ran-to-ceiling migrations;
- best-effort default, with strict mode as a per-migration option driven from the VMAPI request (out of scope for the first release);
- per-deployment budget with per-migration override.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
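A minimal sketch of the exit criterion described above, assuming each pre-copy pass reports the pages it sent, how long it took, and how many pages were dirty afterwards. `PassResult`, `exit_threshold_pages`, and `run_precopy` are hypothetical names for this example, and the cooldown between passes is omitted for brevity.

```python
from dataclasses import dataclass


@dataclass
class PassResult:
    pages_sent: int             # pages transferred during this pass
    elapsed_ms: float           # wall-clock duration of this pass
    dirty_pages_remaining: int  # pages dirtied while the pass ran


def exit_threshold_pages(downtime_budget_ms, pages_sent, elapsed_ms):
    """downtime_budget_ms x observed_bw_pages_per_ms for one pass."""
    if elapsed_ms <= 0:
        return 0.0  # no usable bandwidth measurement; do not exit on it
    observed_bw_pages_per_ms = pages_sent / elapsed_ms
    return downtime_budget_ms * observed_bw_pages_per_ms


def run_precopy(passes, downtime_budget_ms, max_passes):
    """Return (passes_run, converged). converged is False when the hard
    pass ceiling fired, which is the case that would raise an SLA-miss."""
    n = 0
    for n, result in enumerate(passes, start=1):
        threshold = exit_threshold_pages(
            downtime_budget_ms, result.pages_sent, result.elapsed_ms)
        if result.dirty_pages_remaining <= threshold:
            return n, True
        if n >= max_passes:
            return n, False
    return n, False


# Usage: 100 pages/ms observed and a 100 ms budget give a 10000-page threshold.
history = [PassResult(100000, 1000.0, 50000),
           PassResult(50000, 500.0, 20000),
           PassResult(20000, 200.0, 8000)]
outcome = run_precopy(history, downtime_budget_ms=100, max_passes=10)
```

Because the threshold is recomputed from each pass's own measured rate, a slower link lowers the page count that counts as "close enough" without any retuning.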

Add an overlay-case bullet to the Network subsection under Hard Requirements, and a short callout on why the invariant matters for sub-second cutover. Describes the shape of the requirement (peer caches must point at the destination within the cutover budget) and why a guest-issued GARP cannot bootstrap its own invalidation on a fabric network. Stays above the mechanism: does not prescribe which component drives invalidation, what API it uses, or how propagation is measured. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

nwilkens and others added 8 commits April 15, 2026 12:14

rfd: add bhyve live migration architecture draft

e1413de

rfd: clarify proposed migration interfaces

921306a

rfd/0191: add Claude as co-author

98fc86c

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

nwilkens wants to merge 8 commits into master from rfd-0191
