feat: implement CAPI failure domain contract for zones#720
feat: implement CAPI failure domain contract for zones#720emanuelebosetti wants to merge 2 commits intoionos-cloud:mainfrom
Conversation
|
Thanks! |
Map existing CAPMOX zones to CAPI failure domains so that KubeadmControlPlane automatically distributes control plane nodes across Proxmox zones. API changes: - Add Nodes []string to ZoneConfigSpec for per-zone node lists - Add ControlPlane *bool to ZoneConfigSpec (defaults to true) - Add FailureDomains to ProxmoxClusterStatus (CAPI v1beta2 contract) - Add spec.failureDomain to ProxmoxMachineSpec (InfraMachine contract) - Add GetZoneNodes() helper on ProxmoxCluster - Add FailureDomainNotReady condition reason Controller changes: - Add reconcileFailureDomains in cluster controller (sorted by name) - Read Machine.Spec.FailureDomain in machine controller with retryable FailureDomainNotReady condition - Use effectiveZone/effectiveAllowedNodes on MachineScope to avoid spec mutation from the controller - Pass zone override to GetInClusterPools for IPAM pool selection - Set Zone on NodeLocation in createVM - Update scheduler to use GetEffectiveAllowedNodes() Conversion: - Restore Status.FailureDomains and Spec.FailureDomain in v1alpha1 conversion webhook - Add manual conversion stub for ProxmoxMachineSpec.FailureDomain Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2350e2b to
37265c7
Compare
|
I've amended the commit to include the |
|
Thanks. No, we don't have any other policies. The field moves fast and we just didn't get to writing things down yet. |
We're going through the same process internally — introducing AI across dev teams and figuring out contribution policies. Three references we found particularly useful: Could be useful input for #716 as well. |
|
Hi @wikkyk , do you think it would be possible to include these failure domain changes in the upcoming v0.9.0 release? Happy to address any remaining feedback if needed. Thanks! |
|
Sure. v0.9.0 is still a fair bit far off, though. This only adds a field so it could reasonably be released in the 0.8 series. However, as the linter points out, the type of the |
Address review feedback (ionos-cloud#720): the field is optional with no validation, so kube-api-linter requires a pointer. Switch ProxmoxMachineSpec.FailureDomain to *string in v1alpha2 and regen deepcopy. Move the v1alpha2-only restore of FailureDomain into restoreProxmoxMachineSpec so both ProxmoxMachine and ProxmoxMachineTemplate ConvertTo paths preserve the field across the hub→spoke→hub roundtrip. The pointer change shifted the symptom from "" to nil and exposed the pre-existing missing restore on the Template path (TestFuzzyConversion/for_ProxmoxMachineTemplate/hub-spoke-hub). Validated end-to-end on a 10-node Proxmox test cluster: KCP rolled three fresh control plane Machines and both MachineDeployments rolled their workers through the failureDomain path (Machine.Spec.FailureDomain → ApplyFailureDomainNodes → IPAM zone scope → createVM) without errors. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
@wikkyk pushed the pointer fix as a separate commit. Could you re-trigger the workflows? Thanks! |
Issue: #717
Description of changes:
Map existing CAPMOX zones to CAPI failure domains so that KubeadmControlPlane automatically distributes control plane nodes across Proxmox zones.
API changes (additive, backwards-compatible):
Nodes []stringtoZoneConfigSpecfor per-zone node listsControlPlane *booltoZoneConfigSpec(defaults totrue)FailureDomainstoProxmoxClusterStatus(CAPI v1beta2 contract)spec.failureDomaintoProxmoxMachineSpec(InfraMachine contract)GetZoneNodes()helper onProxmoxClusterFailureDomainNotReadycondition reasonController changes:
reconcileFailureDomainsin cluster controller (sorted by name to prevent reconcile loops)Machine.Spec.FailureDomainin machine controller with retryableFailureDomainNotReadyconditioneffectiveZone/effectiveAllowedNodesonMachineScopeto avoid spec mutation from the controllerGetInClusterPoolsfor IPAM pool selectionZoneonNodeLocationincreateVMGetEffectiveAllowedNodes()Conversion:
Status.FailureDomainsandSpec.FailureDomainin v1alpha1 conversion webhookProxmoxMachineSpec.FailureDomainTesting performed:
make lint test verifypassedzoneConfigwork as before