Commit b1eef70

chore: backport (01/21/2026) (#1248)
2 parents 74761c7 + f0a270c commit b1eef70

18 files changed

Lines changed: 1024 additions & 57 deletions

File tree

.github/workflows/ci.yml

Lines changed: 2 additions & 0 deletions
@@ -14,6 +14,7 @@ on:

 env:
   GO_VERSION: '1.24.9'
+  CERT_MANAGER_VERSION: 'v1.16.2'

 jobs:
   detect-noop:
@@ -143,6 +144,7 @@ jobs:
           PROPERTY_PROVIDER: 'azure'
           RESOURCE_SNAPSHOT_CREATION_MINIMUM_INTERVAL: ${{ matrix.resource-snapshot-creation-minimum-interval }}
           RESOURCE_CHANGES_COLLECTION_DURATION: ${{ matrix.resource-changes-collection-duration }}
+          CERT_MANAGER_VERSION: ${{ env.CERT_MANAGER_VERSION }}

       - name: Collect logs
         if: always()

README.md

Lines changed: 2 additions & 2 deletions
@@ -10,9 +10,9 @@ Azure Fleet repo contains the code that the [Azure Kubernetes Fleet Manager](htt
 It follows the CNCF sandbox project [KubeFleet](https://github.com/kubefleet-dev/) and most of the development is done in the [KubeFleet](https://github.com/kubefleet-dev/).

 ## Get Involved
-For any questions, please see the [KubeFleet discussion board](https://github.com/kubefleet-dev/kubefleet/discussions).
+For any questions, please see the [KubeFleet discussion board](https://github.com/Azure/fleet/discussions).

-For any issues, please open an issue in the [KubeFleet](https://github.com/kubefleet-dev/kubefleet/issues)
+For any issues, please open an issue in the [KubeFleet](https://github.com/Azure/fleet/issues)


 ## Quickstart

SUPPORT.md

Lines changed: 1 addition & 1 deletion
@@ -8,7 +8,7 @@ feature request as a new Issue.

 For help and questions about using this project, please

-* start the conversation in the [GitHub Discussions](https://github.com/kubefleet-dev/kubefleet/discussions/).
+* start the conversation in the [GitHub Discussions](https://github.com/Azure/fleet/discussions/).

 We are actively exploring other means for developers, system admins, and anyone who has an interest
 in the multi-cluster domain to engage with us. Please stay tuned.

charts/hub-agent/README.md

Lines changed: 80 additions & 1 deletion
@@ -2,11 +2,33 @@

 ## Install Chart

+### Default Installation (Self-Signed Certificates)
+
 ```console
 # Helm install with fleet-system namespace already created
 helm install hub-agent ./charts/hub-agent/
 ```

+### Installation with cert-manager
+
+When using cert-manager for certificate management, install cert-manager as a prerequisite first:
+
+```console
+# Install cert-manager (omit --version to get latest, or specify a version like --version v1.16.2)
+# Note: See CERT_MANAGER_VERSION in .github/workflows/ci.yml for the version tested in CI
+helm repo add jetstack https://charts.jetstack.io
+helm repo update
+helm install cert-manager jetstack/cert-manager \
+  --namespace cert-manager \
+  --create-namespace \
+  --set crds.enabled=true
+
+# Then install hub-agent with cert-manager enabled
+helm install hub-agent ./charts/hub-agent --set useCertManager=true --set enableWorkload=true --set enableWebhook=true
+```
+
+This configures cert-manager to manage webhook certificates.
+
 ## Upgrade Chart

 ```console
@@ -32,6 +54,12 @@ _See [helm install](https://helm.sh/docs/helm/helm_install/) for command documen
 | `affinity` | Node affinity for hub-agent pods | `{}` |
 | `tolerations` | Tolerations for hub-agent pods | `[]` |
 | `logVerbosity` | Log level (klog V logs) | `5` |
+| `enableWebhook` | Enable webhook server | `true` |
+| `webhookServiceName` | Webhook service name | `fleetwebhook` |
+| `enableGuardRail` | Enable guard rail webhook configurations | `true` |
+| `webhookClientConnectionType` | Connection type for webhook client (service or url) | `service` |
+| `useCertManager` | Use cert-manager for webhook certificate management (requires `enableWorkload=true`) | `false` |
+| `webhookCertSecretName` | Name of the Secret where cert-manager stores the certificate | `fleet-webhook-server-cert` |
 | `enableV1Beta1APIs` | Watch for v1beta1 APIs | `true` |
 | `hubAPIQPS` | QPS for fleet-apiserver (not including events/node heartbeat) | `250` |
 | `hubAPIBurst` | Burst for fleet-apiserver (not including events/node heartbeat) | `1000` |
@@ -41,4 +69,55 @@ _See [helm install](https://helm.sh/docs/helm/helm_install/) for command documen
 | `MaxFleetSizeSupported` | Max number of member clusters supported | `100` |
 | `resourceSnapshotCreationMinimumInterval` | The minimum interval at which resource snapshots could be created. | `30s` |
 | `resourceChangesCollectionDuration` | The duration for collecting resource changes into one snapshot. | `15s` |
-| `enableWorkload` | Enable kubernetes builtin workload to run in hub cluster. | `false` |
+| `enableWorkload` | Enable kubernetes builtin workload to run in hub cluster. | `false` |
+
+## Certificate Management
+
+The hub-agent supports two modes for webhook certificate management:
+
+### Automatic Certificate Generation (Default)
+
+By default, the hub-agent generates certificates automatically at startup. This mode:
+- Requires no external dependencies
+- Works out of the box
+- Certificates are valid for 10 years
+- **Limitation: Only supports single replica deployment** (replicaCount must be 1)
+
+### cert-manager (Optional)
+
+When `useCertManager=true`, certificates are managed by cert-manager. This mode:
+- Requires cert-manager to be installed as a prerequisite
+- Requires `enableWorkload=true` to allow cert-manager pods to run in the hub cluster (without this, pod creation would be blocked by the webhook)
+- Requires `enableWebhook=true` because cert-manager is only used for webhook certificate management
+- Handles certificate rotation automatically (90-day certificates)
+- Follows industry-standard certificate management practices
+- **Supports high availability with multiple replicas** (replicaCount > 1)
+- Suitable for production environments
+
+To switch to cert-manager mode:
+```console
+# Install cert-manager first (omit --version to get latest, or specify a version like --version v1.16.2)
+# Note: See CERT_MANAGER_VERSION in .github/workflows/ci.yml for the version tested in CI
+helm repo add jetstack https://charts.jetstack.io
+helm repo update
+helm install cert-manager jetstack/cert-manager \
+  --namespace cert-manager \
+  --create-namespace \
+  --set crds.enabled=true
+
+# Then install hub-agent with cert-manager enabled
+helm install hub-agent ./charts/hub-agent --set useCertManager=true --set enableWorkload=true --set enableWebhook=true
+```
+
+The `webhookCertSecretName` parameter specifies the Secret name for the certificate:
+- Default: `fleet-webhook-server-cert`
+- When using cert-manager, this is where cert-manager stores the certificate
+- Must match the secret name referenced in the deployment volume mount
+
+Example with custom secret name:
+```console
+helm install hub-agent ./charts/hub-agent \
+  --set useCertManager=true \
+  --set enableWorkload=true \
+  --set webhookCertSecretName=my-webhook-secret
+```
Lines changed: 63 additions & 0 deletions
@@ -0,0 +1,63 @@
+{{- if and .Values.enableWebhook .Values.useCertManager }}
+---
+apiVersion: cert-manager.io/v1
+kind: Certificate
+metadata:
+  # This name must match FleetWebhookCertName in pkg/webhook/webhook.go
+  name: fleet-webhook-certificate
+  namespace: {{ .Values.namespace }}
+  labels:
+    {{- include "hub-agent.labels" . | nindent 4 }}
+spec:
+  # Secret name where cert-manager will store the certificate
+  secretName: {{ .Values.webhookCertSecretName }}
+
+  # Certificate duration (90 days is cert-manager's default and recommended)
+  duration: 2160h # 90 days
+
+  # Renew certificate 30 days before expiry
+  renewBefore: 720h # 30 days
+
+  # Subject configuration
+  subject:
+    organizations:
+      - KubeFleet
+
+  # Common name
+  commonName: fleet-webhook.{{ .Values.namespace }}.svc
+
+  # DNS names for the certificate
+  dnsNames:
+    - {{ .Values.webhookServiceName }}
+    - {{ .Values.webhookServiceName }}.{{ .Values.namespace }}
+    - {{ .Values.webhookServiceName }}.{{ .Values.namespace }}.svc
+    - {{ .Values.webhookServiceName }}.{{ .Values.namespace }}.svc.cluster.local
+
+  # Issuer reference - using self-signed issuer
+  issuerRef:
+    name: fleet-selfsigned-issuer
+    kind: Issuer
+    group: cert-manager.io
+
+  # Private key configuration
+  privateKey:
+    algorithm: ECDSA
+    size: 256
+
+  # Key usages
+  usages:
+    - digital signature
+    - key encipherment
+    - server auth
+---
+# Self-signed issuer for generating the certificate
+apiVersion: cert-manager.io/v1
+kind: Issuer
+metadata:
+  name: fleet-selfsigned-issuer
+  namespace: {{ .Values.namespace }}
+  labels:
+    {{- include "hub-agent.labels" . | nindent 4 }}
+spec:
+  selfSigned: {}
+{{- end }}

charts/hub-agent/templates/deployment.yaml

Lines changed: 21 additions & 0 deletions
@@ -1,3 +1,6 @@
+{{- if and (not .Values.useCertManager) (gt (.Values.replicaCount | int) 1) }}
+  {{- fail "ERROR: replicaCount > 1 requires useCertManager=true (self-signed certificates cannot be shared across replicas)" }}
+{{- end }}
 apiVersion: apps/v1
 kind: Deployment
 metadata:
@@ -6,6 +9,7 @@ metadata:
   labels:
     {{- include "hub-agent.labels" . | nindent 4 }}
 spec:
+  replicas: {{ .Values.replicaCount }}
   selector:
     matchLabels:
       {{- include "hub-agent.selectorLabels" . | nindent 6 }}
@@ -34,6 +38,7 @@ spec:
             - --webhook-service-name={{ .Values.webhookServiceName }}
             - --enable-guard-rail={{ .Values.enableGuardRail }}
             - --enable-workload={{ .Values.enableWorkload }}
+            - --use-cert-manager={{ .Values.useCertManager }}
             - --whitelisted-users=system:serviceaccount:fleet-system:hub-agent-sa
             - --webhook-client-connection-type={{.Values.webhookClientConnectionType}}
             - --v={{ .Values.logVerbosity }}
@@ -82,6 +87,22 @@ spec:
                   fieldPath: metadata.namespace
           resources:
             {{- toYaml .Values.resources | nindent 12 }}
+          {{- if .Values.useCertManager }}
+          volumeMounts:
+            - name: webhook-cert
+              # This path must match FleetWebhookCertDir in pkg/webhook/webhook.go
+              mountPath: /tmp/k8s-webhook-server/serving-certs
+              readOnly: true
+          {{- end }}
+      {{- if .Values.useCertManager }}
+      volumes:
+        - name: webhook-cert
+          secret:
+            secretName: {{ .Values.webhookCertSecretName }}
+            # defaultMode 0444 (read for all) allows the container process to read the certs
+            # regardless of the user/group it runs as
+            defaultMode: 0444
+      {{- end }}
       {{- with .Values.affinity }}
       affinity:
         {{- toYaml . | nindent 8 }}
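
The template guard at the top of this file rejects `replicaCount > 1` unless `useCertManager` is set. The same rule can be sketched as a plain Go function, for example to reason about which value combinations would render. `checkReplicaCount` is a hypothetical name used only for illustration; the commit enforces the rule in the Helm template, not in Go.

```go
package main

import (
	"errors"
	"fmt"
)

// checkReplicaCount is a hypothetical helper mirroring the Helm guard:
// more than one replica is only allowed when cert-manager manages the
// webhook certificate.
func checkReplicaCount(useCertManager bool, replicaCount int) error {
	if !useCertManager && replicaCount > 1 {
		return errors.New("replicaCount > 1 requires useCertManager=true")
	}
	return nil
}

func main() {
	fmt.Println(checkReplicaCount(false, 1)) // single replica, self-signed mode
	fmt.Println(checkReplicaCount(false, 3)) // rejected by the guard
	fmt.Println(checkReplicaCount(true, 3))  // allowed with cert-manager
}
```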

charts/hub-agent/values.yaml

Lines changed: 9 additions & 2 deletions
@@ -26,13 +26,20 @@ webhookServiceName: fleetwebhook
 enableGuardRail: true
 webhookClientConnectionType: service
 enableWorkload: false
+# useCertManager enables cert-manager for webhook certificate management
+# When enabled, cert-manager must be installed as a prerequisite (it is not installed automatically by this chart)
+# and a Certificate resource will be created
+useCertManager: false
+# webhookCertSecretName is ONLY used when useCertManager=true
+# It specifies the name of the Secret where cert-manager stores the certificate
+# webhookCertSecretName: fleet-webhook-server-cert
+
 forceDeleteWaitTime: 15m0s
 clusterUnhealthyThreshold: 3m0s
 resourceSnapshotCreationMinimumInterval: 30s
 resourceChangesCollectionDuration: 15s

-namespace:
-  fleet-system
+namespace: fleet-system

 resources:
   limits:

cmd/hubagent/main.go

Lines changed: 29 additions & 18 deletions
@@ -20,8 +20,8 @@ import (
 	"flag"
 	"fmt"
 	"math"
+	"net/http"
 	"os"
-	"strings"
 	"sync"

 	apiextensionsv1 "k8s.io/apiextensions-apiserver/pkg/apis/apiextensions/v1"
@@ -66,8 +66,7 @@ var (
 )

 const (
-	FleetWebhookCertDir = "/tmp/k8s-webhook-server/serving-certs"
-	FleetWebhookPort    = 9443
+	FleetWebhookPort = 9443
 )

 func init() {
@@ -122,7 +121,7 @@ func main() {
 		},
 		WebhookServer: ctrlwebhook.NewServer(ctrlwebhook.Options{
 			Port:    FleetWebhookPort,
-			CertDir: FleetWebhookCertDir,
+			CertDir: webhook.FleetWebhookCertDir,
 		}),
 	}
 	if opts.EnablePprof {
@@ -158,12 +157,31 @@ func main() {
 	}

 	if opts.EnableWebhook {
-		whiteListedUsers := strings.Split(opts.WhiteListedUsers, ",")
-		if err := SetupWebhook(mgr, options.WebhookClientConnectionType(opts.WebhookClientConnectionType), opts.WebhookServiceName, whiteListedUsers,
-			opts.EnableGuardRail, opts.EnableV1Beta1APIs, opts.DenyModifyMemberClusterLabels, opts.EnableWorkload, opts.NetworkingAgentsEnabled); err != nil {
+		// Generate webhook configuration with certificates
+		webhookConfig, err := webhook.NewWebhookConfigFromOptions(mgr, opts, FleetWebhookPort)
+		if err != nil {
+			klog.ErrorS(err, "unable to create webhook config")
+			exitWithErrorFunc()
+		}
+
+		// Setup webhooks with the manager
+		if err := SetupWebhook(mgr, webhookConfig); err != nil {
 			klog.ErrorS(err, "unable to set up webhook")
 			exitWithErrorFunc()
 		}
+
+		// When using cert-manager, add a readiness check to ensure CA bundles are injected before marking ready.
+		// This prevents the pod from accepting traffic before cert-manager has populated the webhook CA bundles,
+		// which would cause webhook calls to fail.
+		if opts.UseCertManager {
+			if err := mgr.AddReadyzCheck("cert-manager-ca-injection", func(req *http.Request) error {
+				return webhookConfig.CheckCAInjection(req.Context())
+			}); err != nil {
+				klog.ErrorS(err, "unable to set up cert-manager CA injection readiness check")
+				exitWithErrorFunc()
+			}
+			klog.V(2).InfoS("Added cert-manager CA injection readiness check")
+		}
 	}

 	ctx := ctrl.SetupSignalHandler()
@@ -213,20 +231,13 @@ func main() {
 	wg.Wait()
 }

-// SetupWebhook generates the webhook cert and then set up the webhook configurator.
-func SetupWebhook(mgr manager.Manager, webhookClientConnectionType options.WebhookClientConnectionType, webhookServiceName string,
-	whiteListedUsers []string, enableGuardRail, isFleetV1Beta1API bool, denyModifyMemberClusterLabels bool, enableWorkload bool, networkingAgentsEnabled bool) error {
-	// Generate self-signed key and crt files in FleetWebhookCertDir for the webhook server to start.
-	w, err := webhook.NewWebhookConfig(mgr, webhookServiceName, FleetWebhookPort, &webhookClientConnectionType, FleetWebhookCertDir, enableGuardRail, denyModifyMemberClusterLabels, enableWorkload)
-	if err != nil {
-		klog.ErrorS(err, "fail to generate WebhookConfig")
-		return err
-	}
-	if err = mgr.Add(w); err != nil {
+// SetupWebhook registers the webhook config and webhook handlers with the manager.
+func SetupWebhook(mgr manager.Manager, webhookConfig *webhook.Config) error {
+	if err := mgr.Add(webhookConfig); err != nil {
 		klog.ErrorS(err, "unable to add WebhookConfig")
 		return err
 	}
-	if err = webhook.AddToManager(mgr, whiteListedUsers, denyModifyMemberClusterLabels, networkingAgentsEnabled); err != nil {
+	if err := webhook.AddToManager(mgr, webhookConfig); err != nil {
 		klog.ErrorS(err, "unable to register webhooks to the manager")
 		return err
 	}

cmd/hubagent/options/options.go

Lines changed: 4 additions & 0 deletions
@@ -110,6 +110,9 @@ type Options struct {
 	// EnableWorkload enables workload resources (pods and replicasets) to be created in the hub cluster.
 	// When set to true, the pod and replicaset validating webhooks are disabled.
 	EnableWorkload bool
+	// UseCertManager indicates whether to use cert-manager for webhook certificate management.
+	// When enabled, webhook certificates are managed by cert-manager instead of self-signed generation.
+	UseCertManager bool
 	// ResourceSnapshotCreationMinimumInterval is the minimum interval at which resource snapshots could be created.
 	// Whether the resource snapshot is created or not depends on the both ResourceSnapshotCreationMinimumInterval and ResourceChangesCollectionDuration.
 	ResourceSnapshotCreationMinimumInterval time.Duration
@@ -187,6 +190,7 @@ func (o *Options) AddFlags(flags *flag.FlagSet) {
 	flags.IntVar(&o.PprofPort, "pprof-port", 6065, "The port for pprof profiling.")
 	flags.BoolVar(&o.DenyModifyMemberClusterLabels, "deny-modify-member-cluster-labels", false, "If set, users not in the system:masters cannot modify member cluster labels.")
 	flags.BoolVar(&o.EnableWorkload, "enable-workload", false, "If set, workloads (pods and replicasets) can be created in the hub cluster. This disables the pod and replicaset validating webhooks.")
+	flags.BoolVar(&o.UseCertManager, "use-cert-manager", false, "If set, cert-manager will be used for webhook certificate management instead of self-signed certificates.")
 	flags.DurationVar(&o.ResourceSnapshotCreationMinimumInterval, "resource-snapshot-creation-minimum-interval", 30*time.Second, "The minimum interval at which resource snapshots could be created.")
 	flags.DurationVar(&o.ResourceChangesCollectionDuration, "resource-changes-collection-duration", 15*time.Second,
 		"The duration for collecting resource changes into one snapshot. The default is 15 seconds, which means that the controller will collect resource changes for 15 seconds before creating a resource snapshot.")

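The new `--use-cert-manager` flag is what the chart passes as `--use-cert-manager={{ .Values.useCertManager }}`. A minimal sketch of how such boolean flags parse with Go's standard `flag` package follows. `sketchOptions` and `parseArgs` are hypothetical names and cover only two of the real options.

```go
package main

import (
	"flag"
	"fmt"
)

// sketchOptions is a hypothetical, trimmed-down stand-in for the hub agent's
// Options struct; only the two flags relevant here are included.
type sketchOptions struct {
	UseCertManager bool
	EnableWorkload bool
}

// parseArgs registers the two boolean flags on a fresh FlagSet and parses
// the given arguments, returning the populated options.
func parseArgs(args []string) (sketchOptions, error) {
	var o sketchOptions
	fs := flag.NewFlagSet("hubagent-sketch", flag.ContinueOnError)
	fs.BoolVar(&o.UseCertManager, "use-cert-manager", false, "use cert-manager for webhook certificates")
	fs.BoolVar(&o.EnableWorkload, "enable-workload", false, "allow workloads in the hub cluster")
	err := fs.Parse(args)
	return o, err
}

func main() {
	// Same form as the arguments rendered by the deployment template.
	o, err := parseArgs([]string{"--enable-workload=true", "--use-cert-manager=true"})
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Printf("%+v\n", o)
}
```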
cmd/hubagent/options/validation.go

Lines changed: 4 additions & 0 deletions
@@ -52,6 +52,10 @@ func (o *Options) Validate() field.ErrorList {
 		errs = append(errs, field.Invalid(newPath.Child("WebhookServiceName"), o.WebhookServiceName, "Webhook service name is required when webhook is enabled"))
 	}

+	if o.UseCertManager && !o.EnableWorkload {
+		errs = append(errs, field.Invalid(newPath.Child("UseCertManager"), o.UseCertManager, "UseCertManager requires EnableWorkload to be true (when EnableWorkload is false, a validating webhook blocks pod creation except for certain system pods; cert-manager controller pods must be allowed to run in the hub cluster)"))
+	}
+
 	connectionType := o.WebhookClientConnectionType
 	if _, err := parseWebhookClientConnectionString(connectionType); err != nil {
 		errs = append(errs, field.Invalid(newPath.Child("WebhookClientConnectionType"), o.WebhookClientConnectionType, err.Error()))
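
The added rule rejects `UseCertManager` without `EnableWorkload`, because the pod validating webhook would otherwise block the cert-manager pods. The sketch below restates that dependency using only the standard library. `validateCertManager` is a hypothetical name and returns plain errors instead of the `field.ErrorList` used in the commit.

```go
package main

import (
	"errors"
	"fmt"
)

// validateCertManager is a hypothetical, dependency-free restatement of the
// rule added to Validate: cert-manager mode needs workloads to be allowed
// in the hub cluster.
func validateCertManager(useCertManager, enableWorkload bool) []error {
	var errs []error
	if useCertManager && !enableWorkload {
		errs = append(errs, errors.New("UseCertManager requires EnableWorkload to be true"))
	}
	return errs
}

func main() {
	fmt.Println(len(validateCertManager(true, false))) // invalid combination
	fmt.Println(len(validateCertManager(true, true)))  // valid
	fmt.Println(len(validateCertManager(false, false))) // valid, self-signed mode
}
```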
