Commit e65e3ed

docs: add ztvp-certificates scenario documentation
Covers architecture, extraction phases, platform-specific handling (BareMetal/VSphere proxy CA, custom enterprise CAs, image pull trust), ACM Policy distribution, and automatic rollout strategies.

Signed-off-by: Min Zhang <minzhang@redhat.com>
1 parent 12243a0 commit e65e3ed

1 file changed

Lines changed: 325 additions & 0 deletions

docs/ztvp-certificates.md

@@ -0,0 +1,325 @@
1+
# ZTVP Certificates
2+
3+
The `ztvp-certificates` chart manages CA certificate extraction, validation,
4+
bundling, and distribution across the Zero Trust Validated Pattern. It runs as
5+
an ArgoCD-managed application in the `openshift-config` namespace at sync-wave
6+
**21**, ensuring certificates are available before any workload that needs TLS
7+
verification.
8+
9+
## Architecture
10+
11+
```text
IngressControllers         Service CA          Cluster trusted-ca-bundle
(openshift-ingress)    (openshift-config)     (openshift-config-managed)
        |                       |                          |
        +-----------+-----------+--------------------------+
                    |
         extract-certificates.sh  <-- runs as Job (initial) + CronJob (daily)
                    |
          validates & combines
                    |
                    v
       ConfigMap: ztvp-trusted-ca
           (openshift-config)
                    |
        +-----------+-----------+-----------------------+
        |                       |                       |
ACM Policy distributes   proxyCA patches         imagePullTrust
to target namespaces     proxy/cluster           patches image.config
(e.g. qtodo)             (all platforms)         (when enabled)
```
31+
32+
### Kubernetes Resources
33+
34+
| Resource | Purpose |
|---|---|
| **ServiceAccount / RBAC** | Grants the extraction Job read access to secrets, configmaps, ingresscontrollers, and proxy across namespaces |
| **ConfigMap (script)** | Holds the templated `extract-certificates.sh` script |
| **Job (initial)** | Runs once at first sync (sync-wave 23, `Prune=false`) to populate the CA bundle |
| **CronJob** | Runs on schedule (default daily at 02:00) for automatic rotation |
| **ACM Policy + Placement** | Distributes the `ztvp-trusted-ca` ConfigMap into target namespaces via ACM governance |
| **ManagedClusterSetBinding** | Binds the `default` ManagedClusterSet in `openshift-config` so the Placement can target `local-cluster` |
42+
43+
## Extraction Phases
44+
45+
The extraction script runs through a deterministic sequence of phases. Each
46+
phase except 7 and 8 (which always run) is independently gated by chart values, so the script adapts to the active
47+
configuration.
48+
49+
| Phase | Gate | What It Does |
|---|---|---|
| 1 -- Custom CA | `customCA.secretRef.enabled` | Reads a user-supplied secret and writes `custom-ca.crt` |
| 2 -- Ingress CA | `autoDetect` | Loops over every `IngressController`, extracts `tls.crt` from the referenced or default router secret |
| 3 -- Service CA | `autoDetect` | Reads `openshift-service-ca.crt` ConfigMap |
| 4 -- Cluster CA Bundle | `autoDetect` | Reads `trusted-ca-bundle` from `openshift-config-managed` (present when a corporate proxy injects CAs) |
| 5 -- Additional Certs | `customCA.additionalCertificates[]` | Reads each additional secret and writes a `.crt` file |
| 6 -- Validation | `validation.enabled` | Checks minimum size and `openssl x509` parse for every `.crt` |
| 7 -- Combine | always | Concatenates all `.crt` files into `tls-ca-bundle.pem`; fails if bundle < 100 bytes |
| 8 -- ConfigMap | always | `oc apply` the `ztvp-trusted-ca` ConfigMap with annotations recording extraction metadata |
| 8.5 -- Proxy CA | `proxyCA.enabled` | Creates a separate ConfigMap with ingress + service CAs only |
| 8.6 -- Proxy Patch | `proxyCA.enabled` | Patches `proxy/cluster` to set `trustedCA` (only if not already set to another value) |
| 9 -- Image Pull Trust | `imagePullTrust.enabled` | Creates a ConfigMap keyed by registry hostname and patches `image.config.openshift.io/cluster` |
| 10 -- Rollout | `rollout.enabled` | Restarts Deployments/StatefulSets that consume the certificate bundle |
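
Phases 6 and 7 can be sketched in a few lines of shell. This is an illustration under stated assumptions, not the chart's actual `extract-certificates.sh`: the function names, the 50-byte per-certificate floor, and the choice to skip invalid files rather than abort are hypothetical. Only the 100-byte bundle minimum comes from the table above.

```bash
# Assumed thresholds: MIN_CERT_BYTES is hypothetical; MIN_BUNDLE_BYTES
# mirrors the 100-byte minimum stated for phase 7.
MIN_CERT_BYTES=50
MIN_BUNDLE_BYTES=100

# Phase 6 (sketch): succeed only if the file meets the size floor and
# parses as an X.509 certificate.
validate_cert() {
  file="$1"
  size=$(wc -c < "$file" | tr -d ' ')
  if [ "$size" -lt "$MIN_CERT_BYTES" ]; then
    echo "WARN: $file is too small ($size bytes)" >&2
    return 1
  fi
  openssl x509 -in "$file" -noout >/dev/null 2>&1
}

# Phase 7 (sketch): concatenate every valid .crt in a directory into one
# bundle and fail if the result is below the minimum bundle size.
combine_bundle() {
  dir="$1"
  out="$2"
  : > "$out"
  for crt in "$dir"/*.crt; do
    [ -f "$crt" ] || continue
    if validate_cert "$crt"; then
      cat "$crt" >> "$out"
    fi
  done
  size=$(wc -c < "$out" | tr -d ' ')
  if [ "$size" -lt "$MIN_BUNDLE_BYTES" ]; then
    echo "ERROR: bundle is only $size bytes" >&2
    return 1
  fi
}
```

A call such as `combine_bundle /tmp/certs /tmp/certs/tls-ca-bundle.pem` (paths are placeholders) would leave a bundle ready for the phase 8 `oc apply`.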
63+
64+
## Scenario Handling
65+
66+
### Scenario 1: Cloud Cluster with Public Certificates (Default)
67+
68+
Applies to AWS, Azure, GCP, and any cluster whose ingress uses certificates
69+
signed by a public CA.
70+
71+
**Active settings:**
72+
73+
* `autoDetect: true`
74+
* `proxyCA.enabled: true` (default -- ensures ACS Central and other workloads
75+
that verify TLS on routes can trust the ingress CA without per-pod volume mounts)
76+
* `imagePullTrust.enabled: false`
77+
78+
**What happens:**
79+
80+
1. The Job auto-detects the ingress CA from each `IngressController`'s router
81+
secret in `openshift-ingress`.
82+
2. The service CA is read from `openshift-service-ca.crt`.
83+
3. If a cluster-wide proxy bundle exists, it is included.
84+
4. All certificates are combined into `ztvp-trusted-ca` and distributed via
85+
ACM Policy to target namespaces.
86+
5. A proxy CA ConfigMap (`ztvp-proxy-ca`) is created with ingress + service
87+
CAs and `proxy/cluster` is patched so the Cluster Network Operator injects
88+
these CAs into all workloads automatically.
89+
90+
No platform override file is needed. The chart's default `values.yaml` handles
91+
this scenario out of the box.
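
To confirm the outcome of the steps above on a live cluster, a couple of read-only checks are usually enough. The helper names below are hypothetical, and the ConfigMap data key `tls-ca-bundle.pem` is an assumption based on the bundle file name used in phase 7; adjust it if the chart stores the bundle under a different key.

```bash
# Succeeds when proxy/cluster references the expected proxy CA ConfigMap.
# Requires `oc` logged in with read access to cluster-scoped config.
check_proxy_ca() {
  expected="${1:-ztvp-proxy-ca}"
  current=$(oc get proxy/cluster -o jsonpath='{.spec.trustedCA.name}') || return 2
  if [ "$current" = "$expected" ]; then
    echo "proxy/cluster trusts $current"
  else
    echo "proxy/cluster trustedCA is '${current:-<unset>}', expected $expected" >&2
    return 1
  fi
}

# Prints how many PEM certificates the combined bundle holds.
# The data key name is an assumption (see the note above).
bundle_cert_count() {
  oc get configmap ztvp-trusted-ca -n openshift-config \
    -o jsonpath='{.data.tls-ca-bundle\.pem}' | grep -c 'BEGIN CERTIFICATE'
}
```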
92+
93+
### Scenario 2: Bare Metal with Self-Signed Ingress
94+
95+
Bare metal clusters typically use self-signed certificates for the default
96+
ingress. Since `proxyCA` is enabled by default (see Scenario 1), the ingress
97+
CA is automatically injected cluster-wide. Workloads that verify TLS on routes
98+
(e.g., ACS Central connecting to Keycloak) work without extra configuration.
99+
100+
**Platform override** (`overrides/values-ztvp-certificates-BareMetal.yaml`):
101+
102+
```yaml
proxyCA:
  enabled: true
```
106+
107+
> **Note:** This override is now redundant because the chart default is
108+
> `proxyCA.enabled: true`. It is retained for clarity and backward
109+
> compatibility with older chart versions.
110+
111+
**Behavior is identical to Scenario 1** -- Phases 8.5 and 8.6 run by default:
112+
113+
1. Phase 8.5 builds a proxy-specific bundle containing only the ingress and
114+
service CAs (the Cluster Network Operator merges these with system CAs).
115+
2. Phase 8.6 patches `proxy/cluster` to set `spec.trustedCA.name` to
116+
`ztvp-proxy-ca`.
117+
3. The CNO propagates the merged bundle to every node, making the ingress CA
118+
trusted system-wide for all pods without explicit volume mounts.
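
The guard in phase 8.6 (patch only when `trustedCA` is not already set to another value) might look roughly like the sketch below. The function name and the decision to report a skip without failing are assumptions; the real script may behave differently when it finds a foreign value.

```bash
# Sketch of the phase 8.6 guard: never overwrite a trustedCA that points
# at a different ConfigMap. Prints what it did.
patch_proxy_ca() {
  desired="${1:-ztvp-proxy-ca}"
  current=$(oc get proxy/cluster -o jsonpath='{.spec.trustedCA.name}') || return 2
  if [ -z "$current" ]; then
    # Nothing configured yet, so it is safe to set our ConfigMap.
    oc patch proxy/cluster --type=merge \
      -p "{\"spec\":{\"trustedCA\":{\"name\":\"$desired\"}}}" >/dev/null || return 2
    echo "patched"
  elif [ "$current" = "$desired" ]; then
    echo "unchanged"
  else
    # Assumed behavior: leave an administrator-provided value alone.
    echo "skipped: trustedCA already set to $current"
  fi
}
```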
119+
120+
### Scenario 3: vSphere with Self-Signed Ingress
121+
122+
Identical behavior to Bare Metal. vSphere clusters also typically use
123+
self-signed ingress certificates.
124+
125+
**Platform override** (`overrides/values-ztvp-certificates-VSphere.yaml`):
126+
127+
```yaml
proxyCA:
  enabled: true
```
131+
132+
> **Note:** This override is also redundant; the chart default already enables
133+
> `proxyCA`.
134+
135+
### Scenario 4: Enterprise Custom CA
136+
137+
When the organization uses a private PKI (e.g., a corporate root CA that
138+
signed the cluster's ingress certificate), the administrator creates a
139+
Kubernetes secret with the CA and enables `customCA.secretRef`.
140+
141+
**Setup:**
142+
143+
```bash
oc create secret generic custom-ca-bundle \
  --from-file=ca.crt=/path/to/corporate-root-ca.crt \
  -n openshift-config
```
148+
149+
**values-hub.yaml overrides:**
150+
151+
```yaml
- name: customCA.secretRef.enabled
  value: "true"
- name: customCA.secretRef.name
  value: custom-ca-bundle
- name: customCA.secretRef.namespace
  value: openshift-config
```
159+
160+
**What happens:**
161+
162+
1. Phase 1 extracts the custom CA from the referenced secret.
163+
2. Auto-detect (phases 2-4) still runs, so ingress and service CAs are
164+
included alongside the custom CA.
165+
3. The combined bundle contains both the custom CA and the auto-detected
166+
certificates.
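
Before enabling `customCA.secretRef`, it can help to check that the secret really holds the expected CA. The helper below is a hypothetical convenience, not part of the chart; it assumes the `ca.crt` key from the setup command above and shows only the first certificate stored under that key.

```bash
# Print subject and expiry for the CA stored in a secret's ca.crt key.
# Requires `oc` and `openssl`.
show_secret_ca() {
  name="$1"
  namespace="${2:-openshift-config}"
  oc extract "secret/$name" -n "$namespace" --keys=ca.crt --to=- 2>/dev/null \
    | openssl x509 -noout -subject -enddate
}
```

For the secret created above, the call would be `show_secret_ca custom-ca-bundle`.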
167+
168+
### Scenario 5: Multiple Additional CAs
169+
170+
When several external CAs are needed (e.g., a corporate root CA, a partner CA,
171+
and an intermediate CA), use `additionalCertificates` via the
172+
`extraValueFiles` mechanism.
173+
174+
**Configuration** (`overrides/values-ztvp-certificates.yaml`):
175+
176+
```yaml
customCA:
  additionalCertificates:
    - name: corporate-root-ca
      secretRef:
        name: corporate-root-ca
        namespace: openshift-config
        key: ca.crt
    - name: partner-ca
      secretRef:
        name: partner-ca
        namespace: openshift-config
        key: ca.crt
```
190+
191+
**What happens:**
192+
193+
1. Phase 5 iterates over each entry and extracts the certificate from its
194+
secret. Missing secrets produce a warning but do not fail the job.
195+
2. All additional certificates are combined with auto-detected and custom CAs
196+
in Phase 7.
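
The warn-and-continue behavior of phase 5 can be illustrated for a single entry as follows. The function name and argument order are hypothetical; the point of the sketch is only that a missing secret produces a warning and a zero exit status, so later entries are still processed.

```bash
# Sketch of phase 5 for one entry: write <name>.crt into outdir, or warn
# and carry on when the secret (or key) cannot be read.
extract_additional_cert() {
  name="$1"; secret="$2"; namespace="$3"; key="$4"; outdir="$5"
  if oc extract "secret/$secret" -n "$namespace" --keys="$key" --to=- \
      > "$outdir/$name.crt" 2>/dev/null; then
    echo "extracted $name"
  else
    # Remove the empty file so phase 7 does not pick it up.
    rm -f "$outdir/$name.crt"
    echo "WARN: secret $namespace/$secret not readable, skipping $name" >&2
  fi
  return 0
}
```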
197+
198+
### Scenario 6: Image Pull Trust for Built-In Registry
199+
200+
When an image registry (e.g., Quay or the embedded OpenShift registry) is
201+
exposed behind the cluster ingress with a self-signed or internal CA, kubelet
202+
image pulls fail with `x509: certificate signed by unknown authority`. The
203+
`imagePullTrust` feature solves this at the node level.
204+
205+
**values-hub.yaml overrides:**
206+
207+
```yaml
- name: imagePullTrust.enabled
  value: "true"
- name: imagePullTrust.registries[0]
  value: quay-registry-quay-quay-enterprise.apps.example.com
```
213+
214+
**What happens:**
215+
216+
1. Phase 9 combines all extracted ingress CAs into a single PEM.
217+
2. A ConfigMap (`ztvp-registry-cas`) is created in `openshift-config` with
218+
each registry hostname as a key and the ingress CA PEM as the value.
219+
3. `image.config.openshift.io/cluster` is patched to set
220+
`additionalTrustedCA.name` to that ConfigMap.
221+
4. The Machine Config Operator rolls the trust configuration out to all nodes.
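
The steps above correspond roughly to the commands in this sketch. It handles a single registry for brevity, and the function names are hypothetical. One detail worth knowing: when a registry uses an explicit port, the ConfigMap key must be written as `host..port`, because `:` is not allowed in ConfigMap keys.

```bash
# Convert "host:port" into the "host..port" key form; plain hostnames
# pass through unchanged.
registry_key() {
  printf '%s\n' "$1" | sed 's/:/../'
}

# Create or update the registry CA ConfigMap for ONE registry and point
# image.config at it. A real script would add one --from-file per host.
apply_registry_trust() {
  host="$1"; ca_file="$2"
  oc create configmap ztvp-registry-cas -n openshift-config \
    --from-file="$(registry_key "$host")=$ca_file" \
    --dry-run=client -o yaml | oc apply -f - || return 1
  oc patch image.config.openshift.io/cluster --type=merge \
    -p '{"spec":{"additionalTrustedCA":{"name":"ztvp-registry-cas"}}}'
}
```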
222+
223+
### Scenario 7: Custom Source Locations
224+
225+
In non-standard environments where the ingress CA or service CA is stored in
226+
different locations, `customSource` overrides the default auto-detection
227+
targets.
228+
229+
```yaml
customSource:
  ingressCA:
    secretName: my-ingress-ca
    secretNamespace: my-namespace
    secretKey: tls.crt
  serviceCA:
    configMapName: my-service-ca
    configMapNamespace: my-namespace
    configMapKey: service-ca.crt
```
240+
241+
Auto-detection will read from the specified locations instead of the standard
242+
OpenShift defaults.
243+
244+
## Distribution
245+
246+
Certificate distribution uses **ACM Governance Policies** to replicate the
247+
`ztvp-trusted-ca` ConfigMap from `openshift-config` into each target
248+
namespace.
249+
250+
```text
openshift-config/ztvp-trusted-ca ---ACM Policy---> qtodo/ztvp-trusted-ca
                                                   rhtpa/ztvp-trusted-ca
                                                   ...
```
255+
256+
The policy uses `fromConfigMap` hub templates so that the ConfigMap data is
257+
always sourced from the hub cluster's copy. Target namespaces are configured
258+
via `distribution.targetNamespaces`.
259+
260+
**Requirements:**
261+
262+
* ACM (Advanced Cluster Management) must be installed
263+
* A `ManagedClusterSetBinding` for the `default` cluster set is created
264+
automatically by the chart
265+
* The `Placement` targets clusters with `local-cluster: "true"`
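
One way to check that distribution has converged is to compare each namespace's copy with the source ConfigMap. The helper below is a hypothetical sketch; it assumes the policy copies the ConfigMap `data` unchanged, so a byte-for-byte comparison of `.data` is meaningful.

```bash
# Compare each distributed copy with the source in openshift-config.
# Prints one line per namespace; returns non-zero if any copy differs.
check_distribution() {
  src=$(oc get configmap ztvp-trusted-ca -n openshift-config \
    -o jsonpath='{.data}') || return 2
  rc=0
  for ns in "$@"; do
    copy=$(oc get configmap ztvp-trusted-ca -n "$ns" \
      -o jsonpath='{.data}' 2>/dev/null) || copy=""
    if [ "$copy" = "$src" ]; then
      echo "$ns: in sync"
    else
      echo "$ns: missing or out of date"
      rc=1
    fi
  done
  return $rc
}
```

Usage would be `check_distribution qtodo rhtpa`, listing whatever is configured in `distribution.targetNamespaces`.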
266+
267+
## Automatic Rollout
268+
269+
When certificates are updated, consuming workloads need to pick up the new
270+
bundle. The chart supports three rollout strategies:
271+
272+
| Strategy | Behavior |
|---|---|
| `labeled` (default) | Restarts Deployments/StatefulSets matching `ztvp.io/uses-certificates: "true"` in distribution target namespaces |
| `all` | Restarts all Deployments/StatefulSets in target namespaces |
| `specific` | Restarts only the named resources listed in `rollout.targets` |
277+
278+
To opt a workload into automatic restart, add the label:
279+
280+
```yaml
metadata:
  labels:
    ztvp.io/uses-certificates: "true"
```
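
For orientation, the `labeled` and `all` strategies map onto `oc rollout restart` roughly as sketched below. The helper is hypothetical and only prints the command it would run. The `specific` strategy is left out because the format of `rollout.targets` is not described here.

```bash
# Print the restart command for one namespace under a given strategy.
rollout_command() {
  strategy="$1"; namespace="$2"
  case "$strategy" in
    labeled)
      echo "oc rollout restart deployment,statefulset -n $namespace -l ztvp.io/uses-certificates=true" ;;
    all)
      echo "oc rollout restart deployment,statefulset -n $namespace" ;;
    *)
      echo "unsupported strategy in this sketch: $strategy" >&2
      return 1 ;;
  esac
}
```

Printing the command instead of executing it keeps the sketch safe to try; `rollout_command labeled qtodo` shows what would be restarted in the `qtodo` namespace.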
285+
286+
## Sync Wave Ordering
287+
288+
The chart's resources are ordered within the ArgoCD sync:
289+
290+
| Wave | Resources |
|---|---|
| 22 | ServiceAccount, RBAC (Role, RoleBinding, ClusterRole, ClusterRoleBinding) |
| 23 | Initial Job, CronJob, ConfigMap (script) |
| 25 | ManagedClusterSetBinding |
| 26 | ACM Policy, PlacementBinding, Placement |
296+
297+
The application itself sits at sync-wave **21** in `values-hub.yaml`, ensuring
298+
it deploys before operators and workloads that depend on the CA bundle.
299+
300+
## Configuration Reference
301+
302+
### Top-Level Values
303+
304+
| Value | Default | Description |
|---|---|---|
| `enabled` | `true` | Master toggle for all chart resources |
| `autoDetect` | `true` | Auto-detect ingress, service, and cluster CAs from OpenShift |
| `configMapName` | `ztvp-trusted-ca` | Name of the output ConfigMap |
| `proxyCA.enabled` | `true` | Create a proxy CA ConfigMap and patch `proxy/cluster` |
| `imagePullTrust.enabled` | `false` | Configure node-level registry trust via `image.config` |
| `rollout.enabled` | `true` | Restart consuming workloads after certificate updates |
| `rollout.strategy` | `labeled` | One of: `labeled`, `all`, `specific` |
| `distribution.enabled` | `true` | Distribute CA bundle via ACM Policy |
| `distribution.method` | `acm-policy` | Distribution mechanism |
| `cronJob.schedule` | `0 2 * * *` | Cron schedule for automatic re-extraction |
| `validation.enabled` | `true` | Validate certificate size and format |
| `debug.verbose` | `false` | Enable `set -x` in the extraction script |
318+
319+
### Platform Override Files
320+
321+
| File | When Applied | Effect |
|---|---|---|
| `overrides/values-ztvp-certificates.yaml` | Always | Additional CAs, rollout config |
| `overrides/values-ztvp-certificates-BareMetal.yaml` | `clusterPlatform == BareMetal` | Confirms `proxyCA` (redundant; default is already `true`) |
| `overrides/values-ztvp-certificates-VSphere.yaml` | `clusterPlatform == VSphere` | Confirms `proxyCA` (redundant; default is already `true`) |
