feat: upgrade cluster-api to v1.11.11 with v1beta2 contract#378
feat: upgrade cluster-api to v1.11.11 with v1beta2 contract#378schegi wants to merge 17 commits into
Conversation
- Bump sigs.k8s.io/cluster-api and cluster-api/test to v1.11.11, controller-runtime to v0.21.0, k8s.io/* to v0.33.3 - Move core CAPI imports to api/core/v1beta2, IPAM to api/ipam/v1beta2, addons/bootstrap/controlplane test imports to their v1beta2 packages - Adopt v1beta2 contract Stage 2 status layout on IonosCloudCluster and IonosCloudMachine: status.initialization.provisioned (*bool, omitzero, MinProperties=1 per contract conventions), top-level metav1.Conditions, deprecated v1beta1 fields under status.deprecated.v1beta1 - Add cluster.x-k8s.io/v1beta2 contract label to CRDs - Wire CRDMigrator with UseStatusForStorageVersionMigration and concurrency 1; update patch helper owned-conditions options - Update importas aliases for the new CAPI package layout - Regenerate deepcopy, CRD manifests and mocks Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…or CAPI v1.11 - Point e2e core/bootstrap/control-plane components at CAPI v1.11.11 and add the v1.11 shared clusterctl metadata - Add 0.7 release series (contract v1beta1, matching upstream CAPI v1.11 metadata) for the release carrying this upgrade - Extend the README compatibility matrix: v0.7 requires CAPI v1.11 (the v1beta2 Stage 2 status layout drops top-level status.ready, which CAPI <= v1.10 relies on) Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
- Raise the go directive to 1.26.0; bump builder image to golang:1.26.1 and lint/docs Go version references - Replace all ptr.To(...) call sites with the Go 1.26 extended new(expr) builtin and drop the now-unused ptr.To helper (ptr.Deref/ptr.Equal stay) - Apply gopls modernize fixes: errors.As -> errors.AsType (Go 1.26), strings.Split(...)[0] -> strings.Cut, redundant type arguments - Upgrade golangci-lint v1.64.8 -> v2.12.2 (v1 analyzers cannot parse new(expr) and reported false positives); migrate .golangci.yml to the v2 format, simplify lint-fix via the v2 fmt command, disable the new revive enforce-switch-style rule and exclude goconst in tests - Fix findings surfaced by v2's newer linters: preallocate slices, remove embedded-field selectors, fmt.Fprintf over WriteString(Sprintf), extract kind-name constants in e2e helpers Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The CAPI v1.11 upgrade changed IPAddressClaimSpec.PoolRef from corev1.TypedLocalObjectReference to ipamv1.IPPoolReference. The new manual mapping silently left APIGroup as empty string when poolRef.APIGroup was nil. IPPoolReference.APIGroup is +required with MinLength=1; an empty value causes the API server to reject the IPAddressClaim with a CEL validation error, permanently stalling IP allocation for the affected machine with no clear pointer to the root cause. Fix: return an explicit error when APIGroup is nil so the reconcile loop surfaces the problem instead of producing an invalid API object.
…s an error The CAPI v1.11 upgrade replaced conditions.SetSummary (void return) with conditions.SetSummaryCondition (returns error). The new code returned that error immediately, skipping patchHelper.Patch entirely. The comment below the error check explicitly states the patch must happen even when reconciliation is aborted. Skipping it means any in-memory state changes made before PatchObject is called — deleted pending request entries, updated network info — are never flushed to the API server. On the next reconcile the controller reads stale data and may enter the wrong branch. Fix: capture the SetSummaryCondition error, always attempt the patch, and return the first non-nil error (patch error takes precedence since it means state was not persisted; the summary condition error is returned only if patching succeeded).
…s an error The CAPI v1.11 upgrade replaced conditions.SetSummary (void return) with conditions.SetSummaryCondition (returns error). The new code returned that error immediately, skipping patchHelper.Patch entirely. Skipping the patch means in-memory state changes — ProviderID, MachineNetworkInfo, Initialization.Provisioned — are never flushed to the API server. The controller then proceeds with the reconcile loop using state the API server never received. In the worst case, it removes the finalizer on a machine whose status was never written. Fix: capture the SetSummaryCondition error, always attempt the patch, and return the first non-nil error (patch error takes precedence since it means state was not persisted; the summary condition error is returned only if patching succeeded).
…onditions No production code writes MachineProvisionedCondition as a v1beta1 condition. Since CAPI v1.11, FinalizeMachineProvisioning uses conditions.Set (v1beta2 API) exclusively for this type. Listing it in WithOwnedV1Beta1Conditions has no effect today, but it signals false ownership: the patch helper claims responsibility for a v1beta1 condition slot it never fills. The result is that stale v1beta1 MachineProvisioned conditions from machines provisioned before the CAPI v1.11 upgrade are never cleaned up by the patch helper, because the helper only removes owned entries when the controller writes a replacement value. Fix: remove MachineProvisionedCondition from WithOwnedV1Beta1Conditions. Only clusterv1.ReadyCondition is managed through the v1beta1 interface and belongs in that list.
…dition
FinalizeMachineProvisioning called conditions.Set with
Reason: string(infrav1.MachineProvisionedCondition), recycling the
condition Type "MachineProvisioned" as its own Reason.
The CAPI v1beta2 convention requires Reason to be a distinct word
that explains WHY the condition holds — not a restatement of the type.
When CAPI mirrors this condition into the parent Machine's
InfrastructureReady summary, operators see Reason="MachineProvisioned"
which is identical to the condition type and provides no diagnostic
value. Alert rules and tooling that filter by Reason cannot distinguish
"provisioned" from "still provisioning" using this value.
All sibling conditions in the controller already follow the pattern
(WaitingForClusterInfrastructureReason, WaitingForBootstrapDataReason).
Fix: add a MachineProvisionedReason constant ("Provisioned") to the
API types and use it — together with an explicit Message — in the
conditions.Set call.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…rReady condition The cluster controller called conditions.Set with Reason: string(infrav1.IonosCloudClusterReady), recycling the condition Type "ClusterReady" as its own Reason. The CAPI v1beta2 convention requires Reason to be a distinct word that explains WHY the condition holds. When CAPI mirrors this into the parent Cluster's InfrastructureReady summary, operators see Reason="ClusterReady" — identical to the type — which provides no diagnostic value for monitoring or alerting. Fix: add a dedicated reason constant to the API types and use it — together with an explicit Message — in the conditions.Set call, matching the convention already used for machine conditions. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The CAPI v1.11 upgrade replaced conditions.MarkTrue/MarkFalse (which required a message string) with conditions.Set (metav1.Condition), where the Message field is optional. All call sites omitted it, regressing from the enforced-message contract of the v1beta1 API. conditions.SetSummaryCondition propagates Message verbatim when aggregating into the parent Ready condition. Empty messages mean operators inspecting kubectl describe, dashboards, or alert output see blank diagnostic text. This is particularly harmful for the WaitingFor conditions (False cases) where operators are actively debugging why a machine or cluster has not progressed. Fix: add a concise, human-readable Message to every conditions.Set call in both controllers.
…nce conversion The CAPI v1.11 upgrade introduced a manual mapping from corev1.TypedLocalObjectReference to ipamv1.IPPoolReference. The same three-field struct conversion (Name, Kind, APIGroup pointer-deref) appeared identically in both CreateIPAddressClaim and the test helper toIPPoolReference in ipam_test.go. If a new field is added to IPPoolReference the conversion must be updated in two places; a missed update silently produces different objects in production vs. tests, masking divergence. Note: Finding #1 changed the production path to return an error on nil APIGroup, which diverges from the test helper's nil-ok semantics. If the two paths remain structurally similar after that fix, extract a shared helper; otherwise document that the duplication is resolved by the semantic split.
…eta1Conditions and WithOwnedConditions The CAPI v1.11 upgrade added patch.WithOwnedV1Beta1Conditions alongside the existing patch.WithOwnedConditions in both Cluster.PatchObject and Machine.PatchObject. The condition types in both lists are identical, requiring maintainers to keep two parallel slices in sync. Adding a new owned condition type requires updating two places. A condition added only to WithOwnedConditions but forgotten in WithOwnedV1Beta1Conditions silently leaves v1beta1 clients with stale ownership semantics. Fix: derive both lists from a single package-level var (or adjacent constants) so the sync requirement is structurally enforced rather than relying on convention.
FinalizeMachineProvisioning contained two inline nil-guard blocks to
initialize Status.Deprecated and Status.Deprecated.V1Beta1 before
setting V1Beta1.Ready = true. IonosCloudMachine already has
SetV1Beta1Conditions which performs the identical two-step nil-safe
initialization. The inline copy re-implemented the pattern, creating
two maintenance surfaces: if a field is added to
IonosCloudMachineDeprecatedStatus, the inline copy must also be found
and updated separately.
Additionally, allocating fresh structs via &IonosCloudMachineDeprecatedStatus{}
unconditionally overwrites any already-set fields in a concurrent or
re-entrant call path.
Fix: add SetV1Beta1Ready(bool) to IonosCloudMachine following the same
nil-safe pattern as SetV1Beta1Conditions, and use it in
FinalizeMachineProvisioning.
There was a problem hiding this comment.
Pull request overview
Upgrades CAPIC to Cluster API v1.11.11 and migrates the provider implementation to the v1beta2 API/conditions contract, including CRD/status schema updates and accompanying test/e2e tooling changes.
Changes:
- Bump Go/tooling and dependency versions (Go 1.26, CAPI v1.11.11, controller-runtime v0.21, golangci-lint v2 config).
- Migrate API usage to CAPI v1beta2 contract (metav1.Condition, Initialization.* status fields, deprecated v1beta1 bridge fields).
- Update e2e framework/config and helper assertions to use new CAPI APIs and metadata.
Reviewed changes
Copilot reviewed 47 out of 49 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| test/e2e/suite_test.go | Updates e2e suite imports to CAPI v1beta2 packages. |
| test/e2e/helpers/ownerreference.go | Updates owner reference assertions/imports for v1beta2; removes ptr.To usage. |
| test/e2e/helpers/finalizers.go | Updates addons API import and reuses shared kind constants; minor slice prealloc. |
| test/e2e/env_test.go | Uses fmt.Fprintf for writing GitHub Actions outputs. |
| test/e2e/data/shared/v1.11/metadata.yaml | Adds CAPI v1.11 metadata for e2e/clusterctl. |
| test/e2e/config/ionoscloud.yaml | Points e2e component URLs/metadata to CAPI v1.11.11. |
| test/e2e/capic_test.go | Updates e2e spec inputs for API changes and pointer helper removal. |
| scope/machine.go | Migrates machine scope to v1beta2 conditions/patch ownership; sets ProviderID via new helper. |
| scope/machine_test.go | Updates tests to v1beta2 core API import. |
| scope/cluster.go | Migrates cluster scope to v1beta2 conditions/patch ownership; centralizes owned condition types. |
| scope/cluster_test.go | Updates tests to v1beta2 core API import. |
| README.md | Updates documented Go/CAPI compatibility matrix for v0.7 + CAPI v1.11. |
| metadata.yaml | Adds release series 0.7 metadata entry. |
| Makefile | Updates golangci-lint to v2 and uses golangci-lint fmt in lint-fix. |
| internal/util/ptr/ptr.go | Removes generic ptr.To helper; keeps Deref. |
| internal/util/ptr/ptr_test.go | Removes ptr.To tests; simplifies Deref test generics. |
| internal/service/k8s/ipam.go | Migrates IPAM APIs to v1beta2; validates poolRef.APIGroup; adds toIPPoolRef helper. |
| internal/service/k8s/ipam_test.go | Updates IPAM tests for new API types and pointer helper removal. |
| internal/service/cloud/suite_test.go | Updates service test suite objects for v1beta2 core/IPAM types and pointer helper removal. |
| internal/service/cloud/service.go | Uses Go 1.26-style generic error type assertion in not-found detection. |
| internal/service/cloud/server.go | Migrates conditions to metav1.Condition and sets Initialization.Provisioned + deprecated ready bridge. |
| internal/service/cloud/server_test.go | Updates server tests to match pointer helper removal. |
| internal/service/cloud/request.go | Uses strings.Cut to strip query params more directly. |
| internal/service/cloud/request_test.go | Updates request tests to match pointer helper removal. |
| internal/service/cloud/network.go | Migrates core API import; replaces util.IsControlPlaneMachine with local label check helper. |
| internal/service/cloud/network_test.go | Updates network tests to v1beta2 types and pointer helper removal. |
| internal/service/cloud/ipblock.go | Migrates core API import to v1beta2. |
| internal/service/cloud/ipblock_test.go | Updates ipblock tests to v1beta2 types and pointer helper removal. |
| internal/service/cloud/image.go | Adjusts machine version access for v1beta2 Machine.Spec.Version type change. |
| internal/service/cloud/image_test.go | Updates image tests for v1beta2 Machine.Spec.Version type change and pointer helper removal. |
| internal/controller/ionoscloudmachine_controller.go | Migrates to v1beta2 core API + metav1.Condition and Initialization readiness gates. |
| internal/controller/ionoscloudcluster_controller.go | Migrates to v1beta2 core API + metav1.Condition and Initialization.Provisioned; tweaks predicate scheme usage. |
| go.mod | Bumps Go version and primary dependencies to CAPI v1.11.11 and k8s v0.33.3. |
| go.sum | Updates transitive dependency checksums for the version bumps. |
| Dockerfile | Updates builder image to Go 1.26.4. |
| config/crd/kustomization.yaml | Adds v1beta2 contract label alongside v1beta1. |
| config/crd/bases/infrastructure.cluster.x-k8s.io_ionoscloudmachinetemplates.yaml | CRD schema adjustments (e.g., minProperties) from regeneration. |
| config/crd/bases/infrastructure.cluster.x-k8s.io_ionoscloudmachines.yaml | CRD status schema updates for metav1.Condition + Initialization/Deprecated bridge fields. |
| config/crd/bases/infrastructure.cluster.x-k8s.io_ionoscloudclustertemplates.yaml | CRD schema updates from regeneration (incl. ControlPlaneEndpoint validation changes). |
| config/crd/bases/infrastructure.cluster.x-k8s.io_ionoscloudclusters.yaml | CRD status schema updates and ControlPlaneEndpoint schema/validation changes. |
| cmd/main.go | Updates scheme registrations and controller setup flow; configures CRDMigrator with new options. |
| api/v1alpha1/zz_generated.deepcopy.go | Regenerates deepcopy methods for new status structs/condition types. |
| api/v1alpha1/suite_test.go | Updates envtest scheme registration to v1beta2 IPAM API. |
| api/v1alpha1/ionoscloudmachinetemplate_types.go | Updates core API import to v1beta2. |
| api/v1alpha1/ionoscloudmachine_types.go | Introduces Initialization + Deprecated(V1Beta1) status bridge; migrates conditions to metav1.Condition slice. |
| api/v1alpha1/ionoscloudmachine_types_test.go | Updates tests to use deprecated v1beta1 conditions helper + new status fields. |
| api/v1alpha1/ionoscloudcluster_types.go | Introduces Initialization + Deprecated(V1Beta1) status bridge; migrates conditions to metav1.Condition slice. |
| api/v1alpha1/ionoscloudcluster_types_test.go | Updates tests to use deprecated v1beta1 conditions helper + new status fields. |
| .golangci.yml | Migrates config to golangci-lint v2 format; enables gci/gofumpt as formatters. |
Files not reviewed (1)
- api/v1alpha1/zz_generated.deepcopy.go: Generated file
Comments suppressed due to low confidence (2)
config/crd/bases/infrastructure.cluster.x-k8s.io_ionoscloudclusters.yaml:82
- The ControlPlaneEndpoint validation rule currently allows specifying a port without a host (e.g.
{port: 6443}), which results in an invalid/unusable endpoint but still passes schema validation. The rule should require host to be non-empty whenever port is set (and still allow both fields to be omitted).
x-kubernetes-validations:
- message: port must be within 1-65535
rule: size(self.host) == 0 && self.port == 0 || self.port > 0 &&
self.port < 65536
config/crd/bases/infrastructure.cluster.x-k8s.io_ionoscloudclustertemplates.yaml:76
- The ControlPlaneEndpoint validation rule currently allows specifying a port without a host (e.g.
{port: 6443}), which results in an invalid/unusable endpoint but still passes schema validation. The rule should require host to be non-empty whenever port is set (and still allow both fields to be omitted).
x-kubernetes-validations:
- message: port must be within 1-65535
rule: size(self.host) == 0 && self.port == 0 || self.port
> 0 && self.port < 65536
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Remove double blank line in ipam.go and trailing newline in ipam_test.go to satisfy gci formatter. Drop unused //nolint:staticcheck directives on SetV1Beta1Ready call sites since staticcheck no longer flags them. Update stale condition message in isInfrastructureReady to reference the v1beta2 field name Cluster.Status.Initialization.InfrastructureProvisioned. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Addresses SonarQube S1192: define a constant instead of duplicating "unable to create controller" across three call sites. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace kindIonosCloudCluster with infrav1.IonosCloudClusterKind and the "Cluster" literals with clusterv1.ClusterKind — both constants already exist in their respective API packages. Addresses the copilot review finding; no upstream constant exists for the remaining literals. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…Available CAPI v1.11 VerifyClusterAvailable requires Cluster.Available to be True with an EMPTY message. Non-empty messages on True conditions propagate through SetSummaryCondition into the "Ready" mirror and ultimately into Cluster.Available, causing the assertion to fail. Root cause: the summaryMessage function in CAPI's conditions package includes non-empty Info-priority (True) condition messages when the overall status is True. Removing the messages from our success-state conditions makes the summary message empty, satisfying the test. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
mcbenjemaa
left a comment
There was a problem hiding this comment.
First thing to mention is that the upgrade to v1beta2 should be done in steps, while introducing a new apiVersion in this case is v1alpha2
cause the new v1beta2 contract is a breaking change.
- First step is to add
v1beta2object to the status ofv1alpha1, but you can omit this, as we did with capmox. - Secondly, is adding the
v1alpha2version, while implementing thev1beta2contract. - Third, is to add conversion webhooks and make the controller reconcile the new
v1alpha2resource.
Here are the guidelines to implement the v1beta2:
https://cluster-api.sigs.k8s.io/developer/providers/migrations/v1.10-to-v1.11#how-to-implement-the-new-v1beta2-contract



Summary
This PR upgrades the provider to Cluster API v1.11.11, implementing the full v1beta2 contract migration and fixing all issues discovered during the upgrade code review.
Core upgrade (commits 1–3)
sigs.k8s.io/cluster-apiand all transitive dependencies; updates Go to 1.26.api/v1beta1toapi/core/v1beta2,api/ipam/v1beta2,api/addons/v1beta2,api/bootstrap/kubeadm/v1beta2,api/controlplane/kubeadm/v1beta2.conditions.MarkTrue/MarkFalse/SetSummary(v1beta1) withconditions.Set/SetSummaryCondition(v1beta2,metav1.Condition).Status.Initialization.Provisioned(v1beta2 contract) alongsideStatus.Deprecated.V1Beta1.Ready(migration bridge for old CRs until the next breaking release).IPAddressClaimSpec.PoolReffromcorev1.TypedLocalObjectReferencetoipamv1.IPPoolReference.ptr.Toremoval: replaces the internalptr.To[T]()helper with Go 1.26's built-innew(expr)throughout the codebase.errors.AsType: replaceserrors.Aswith the new Go 1.26errors.AsType[T]()generic helper inisNotFound.test/e2e/data/shared/v1.11/metadata.yaml.v1beta1 → v1beta2 migration strategy
The deprecated
Status.Deprecated.V1Beta1.*fields andGetV1Beta1Conditions/SetV1Beta1Conditionsinterfaces are intentionally retained as a two-release migration bridge:Initialization.ProvisionedandDeprecated.V1Beta1.Ready; CAPI reads the new field, old tooling reads the deprecated one.Bug fixes from code review (commits 4–13)
fix(ipam)poolRef.APIGroupis nil instead of silently producing an invalidIPAddressClaimwith empty APIGroup (which the API server rejects with a validation error)fix(scope)MachineProvisionedConditionfromWithOwnedV1Beta1Conditions— no production code writes it as a v1beta1 condition; listing it without a writer means stale pre-upgrade v1beta1 conditions are never cleaned upfix(scope)Cluster.PatchObject: captureSetSummaryConditionerror, always callpatchHelper.Patch; skipping the patch on summary-condition error left in-memory state changes (pending request entries, network info) unpersistedfix(scope)Machine.PatchObject; skipping the patch could result in ProviderID / Initialization.Provisioned never being written before finalizer removalfix(server)MachineProvisionedReason = "Provisioned"constant; replace tautologicalReason: string(MachineProvisionedCondition)inconditions.Setfix(cluster-ctrl)ClusterProvisionedReason = "Provisioned"constant; same tautology fix for the cluster ready conditionfix(controllers)Messagestrings to allconditions.Setcalls; the v1beta1MarkTrue/MarkFalseenforced a message argument — the v1beta2 migration silently dropped it, leaving operators with blank condition textrefactor(ipam)toIPPoolRefhelper; deduplicates theTypedLocalObjectReference → IPPoolReferenceconversion that existed in both production code and the test filerefactor(scope)ownedMachineConditions/ownedClusterConditionspackage vars as a single source of truth; eliminates the need to keepWithOwnedV1Beta1ConditionsandWithOwnedConditionslists manually in syncfix(server)SetV1Beta1Ready(bool)helper toIonosCloudMachine; replaces inline nil-guard duplication inFinalizeMachineProvisioningwith the same pattern already used bySetV1Beta1ConditionsCompatibility
v1beta1(unchanged — CAPI v1.11 continues to serve v1beta1 resources)Status.Deprecated.V1Beta1migration bridge0.7 / contract: v1beta1Test plan
make testkubectl describe ionoscloudmachineshows non-empty conditionMessageand a distinctReasonfieldValidateResourceVersionStablee2e check passes (no continuous reconciles at steady state)🤖 Generated with Claude Code