
Commit 960a0c2

scotwells and claude committed
feat: hub-side level-triggered companion GC (Component 6)
Add CompanionGCReconciler, a level-triggered backstop for referenced-data companions stranded by interrupted finalization.

When the ReferencedDataController's WD finalizer is interrupted mid-flight (pod restart, SIGKILL), a hub companion can be left with a referenced-by annotation pointing at WDs that no longer exist on the hub. Without this controller, the companion persists forever (leaking data, including Secret bytes), its ResourceBinding keeps a Work alive on the hub, and Karmada continuously re-creates the cell copy. This is exactly the cm-pristine/secret-pristine lab scenario.

Design:

- Watches referenced-data-labeled ConfigMaps and Secrets on the hub. The federationMgr cache for ConfigMaps and Secrets is restricted to objects carrying the ReferencedDataLabel via cache.Options.ByObject (cmd/main.go). This is the OOM guard: predicates filter events, not cache contents; without the cache-level scope the informer would list-and-watch every ConfigMap/Secret on the Karmada hub, the same unscoped-informer pattern that OOMKilled the cell CompanionGCReconciler. The label predicate on the controller is kept as belt-and-suspenders but is NOT the primary memory guard.
- On each reconcile, parses the referenced-by annotation and checks each referenced WD by name in the companion's own hub namespace (ns-{project-uid}) via HubClient.Get. No MCManager is needed because WDs are federated to the hub.
- If ALL referrers are absent: deletes the companion and its ResourceBinding via the downstreamCompanionWriter path, driving the full Karmada cascade (RB → Work → cell copy deleted permanently).
- Conservative safety: terminating WDs count as present; corrupt annotations and malformed WD keys are handled by skip-not-delete.
- A companionGCPeriodicSweep backstop fires every 5 minutes to catch companions stranded before the controller started.
- Wired in setupManagementControllers on the federationMgr alongside OrphanRBReconciler and InstanceProjector.
Unit tests cover:

- orphaned CM/Secret deleted + RB torn down
- live referrer preserved
- terminating referrer counts as present
- all-absent multi-referrer deleted
- partial multi-referrer preserved
- corrupt annotation preserved
- empty annotation preserved
- unlabeled object unaffected
- RB-already-gone tolerated
- periodic sweep drives reconciliation

Federated e2e (test/e2e/referenced-data-delete-cascade/):

- HAPPY-PATH CASCADE: create WD + ConfigMap/Secret → assert companions on hub + cell + RBs present → delete WD → assert hub companion, RBs, cell copies all deleted and stay deleted (30 s anti-thrash poll).
- STRANDED COMPANION BACKSTOP: inject a labeled companion with a dead WD key → assert CompanionGCReconciler reclaims it within the sweep interval.

The e2e test is correct and self-contained but requires the Kind+Karmada harness (task e2e:up), which is not runnable headless.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
(cherry picked from commit e419b54)
1 parent 0af1ecd commit 960a0c2

8 files changed

Lines changed: 1323 additions & 8 deletions


cmd/main.go

Lines changed: 38 additions & 4 deletions
@@ -16,13 +16,15 @@ import (
 	"golang.org/x/sync/errgroup"
 	_ "k8s.io/client-go/plugin/pkg/client/auth"

+	"k8s.io/apimachinery/pkg/labels"
 	"k8s.io/apimachinery/pkg/runtime"
 	"k8s.io/apimachinery/pkg/runtime/serializer"
 	utilruntime "k8s.io/apimachinery/pkg/util/runtime"
 	clientgoscheme "k8s.io/client-go/kubernetes/scheme"
 	"k8s.io/client-go/rest"
 	"k8s.io/client-go/tools/clientcmd"
 	ctrl "sigs.k8s.io/controller-runtime"
+	"sigs.k8s.io/controller-runtime/pkg/cache"
 	"sigs.k8s.io/controller-runtime/pkg/client"
 	"sigs.k8s.io/controller-runtime/pkg/cluster"
 	"sigs.k8s.io/controller-runtime/pkg/healthz"
@@ -495,14 +497,36 @@ func ignoreCanceled(err error) error {
 // InstanceProjector). Called only when management controllers are enabled and
 // a federation REST config is available.
 func setupManagementControllers(mgr mcmanager.Manager, federationClient client.Client) ([]manager.Runnable, error) {
+	// companionLabelSelector scopes the federation manager's ConfigMap and
+	// Secret informer cache to referenced-data companions only. Without this,
+	// For(&corev1.ConfigMap{}) in CompanionGCReconciler would establish a
+	// cluster-wide ConfigMap+Secret informer that caches every object on the
+	// Karmada hub — the same OOM pattern that killed the cell CompanionGCReconciler.
+	// The label CACHE scope (not the predicate) is the correct OOM guard:
+	// predicates filter events, not cache contents.
+	companionLabelSelector := labels.SelectorFromSet(labels.Set{
+		computev1alpha.ReferencedDataLabel: computev1alpha.ReferencedDataLabelValue,
+	})
+
 	// The federation manager provides a cached, watchable handle to the Karmada
-	// federation control plane. It backs the InstanceProjector's Instance watch
-	// and the WorkloadDeploymentFederator's downstream WorkloadDeployment status
-	// watch. A manager.Manager embeds a cluster.Cluster, so it can be passed
-	// directly anywhere a watchable federation cluster source is required.
+	// federation control plane. It backs the InstanceProjector's Instance watch,
+	// the WorkloadDeploymentFederator's downstream WorkloadDeployment status watch,
+	// and the CompanionGCReconciler. A manager.Manager embeds a cluster.Cluster, so
+	// it can be passed directly anywhere a watchable federation cluster source is
+	// required.
 	federationMgr, err := manager.New(federationRestConfig, manager.Options{
 		Scheme:  scheme,
 		Metrics: metricsserver.Options{BindAddress: "0"},
+		Cache: cache.Options{
+			// Scope ConfigMap and Secret informers to referenced-data companions.
+			// CompanionGCReconciler is the only consumer on federationMgr that
+			// reads these types; nothing else (InstanceProjector, OrphanRBReconciler)
+			// needs non-companion CMs or Secrets from the cache.
+			ByObject: map[client.Object]cache.ByObject{
+				&corev1.ConfigMap{}: {Label: companionLabelSelector},
+				&corev1.Secret{}:    {Label: companionLabelSelector},
+			},
+		},
 	})
 	if err != nil {
 		return nil, fmt.Errorf("federation manager: %w", err)
@@ -539,6 +563,16 @@ func setupManagementControllers(mgr mcmanager.Manager, federationClient client.C
 		return nil, fmt.Errorf("OrphanRBReconciler: %w", err)
 	}

+	// CompanionGCReconciler is a level-triggered backstop for stranded hub
+	// companions: labeled ConfigMaps/Secrets whose referenced-by annotation
+	// points at WDs that no longer exist on the hub. On each reconcile it
+	// checks all referrer WDs in the hub namespace; if all are absent the
+	// companion and its ResourceBinding are deleted, driving the Karmada
+	// cascade to clean up Works and cell copies permanently.
+	if err = controller.SetupCompanionGCWithManager(federationMgr, federationClient); err != nil {
+		return nil, fmt.Errorf("CompanionGCReconciler: %w", err)
+	}
+
 	return []manager.Runnable{federationMgr}, nil
 }
