docs(networking): clusterDomain is pinned on tenants, drop stale 0.7.0 references (#534)
## What changed
The `Cluster DNS domain on tenants` section is rewritten. The previous
text described the platform-agnostic auto-detect path as the default, but
that path turned out to be the cause of the `/readyz` NXDOMAIN failures on
cozystack tenants, which is why cozystack PR #2596 pins
`controller.clusterDomain: cluster.local` in the tenant wrapper.
The pin is needed because the tenant kubelet injects the host platform's
clusterDomain (typically `cozy.local`) into pod `resolv.conf` as the
search domain, while Kamaji-managed tenant CoreDNS serves
`cluster.local` per `TenantControlPlane.networkProfile.clusterDomain`.
The auto-detect path composed `...svc.cozy.local.`, tenant CoreDNS did
not serve that name, and the proxy `/readyz` probe returned NXDOMAIN
indefinitely.
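
To make the mismatch concrete, here is a minimal Python sketch of the composition described above. It is not the controller's actual code. It assumes the auto-detect takes the cluster domain from the `svc.<domain>` entry of the `search` line that kubelet writes into `resolv.conf`; the `ouroboros` namespace and the nameserver address are hypothetical, and `ouroboros-proxy` is the Service name used in the docs.

```python
from typing import Optional


def cluster_domain_from_resolv_conf(text: str) -> Optional[str]:
    """Return the domain of the first `svc.<domain>` search entry, if any."""
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0] != "search":
            continue
        for entry in fields[1:]:
            entry = entry.rstrip(".")
            # kubelet writes: <namespace>.svc.<domain> svc.<domain> <domain>
            if entry.startswith("svc."):
                return entry[len("svc."):]
    return None


def compose_fqdn(service: str, namespace: str, cluster_domain: str) -> str:
    """Compose <service>.<namespace>.svc.<cluster-domain>. (trailing dot kept)."""
    return f"{service}.{namespace}.svc.{cluster_domain}."


# Hypothetical resolv.conf of a tenant pod: the search list carries the
# host platform's domain, not the one tenant CoreDNS serves.
RESOLV_CONF = """\
search ouroboros.svc.cozy.local svc.cozy.local cozy.local
nameserver 10.96.0.10
options ndots:5
"""

detected = cluster_domain_from_resolv_conf(RESOLV_CONF)
auto_fqdn = compose_fqdn("ouroboros-proxy", "ouroboros", detected)
pinned_fqdn = compose_fqdn("ouroboros-proxy", "ouroboros", "cluster.local")

print("auto-detect:", auto_fqdn)
print("pinned:     ", pinned_fqdn)
```

With `cozy.local` in the search list, the auto-detected name falls outside the zone that tenant CoreDNS serves, whereas the pinned value stays inside it.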
The override path
(`addons.ouroboros.valuesOverride.ouroboros.controller.clusterDomain`)
is now framed as the escape hatch for non-default tenant cluster-domains
rather than the recommended path.
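
The dotted override path maps to nested keys. The short Python sketch below expands it and prints the result as JSON (which is also valid YAML) to show the nesting. The value `k8s.example.com` is a hypothetical tenant cluster domain, and where this fragment sits inside the `Kubernetes` CR is not shown because the PR does not spell it out.

```python
import json


def expand_dotted(path: str, value):
    """Expand 'a.b.c' into {'a': {'b': {'c': value}}}."""
    nested = value
    for key in reversed(path.split(".")):
        nested = {key: nested}
    return nested


override = expand_dotted(
    "addons.ouroboros.valuesOverride.ouroboros.controller.clusterDomain",
    "k8s.example.com",  # hypothetical non-default tenant cluster domain
)
print(json.dumps(override, indent=2))
```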
This PR also drops stale references to the `0.7.0` chart version and to
hardcoded digests in the Air-gapped operators section. The Makefile and
`values.yaml` are the single source of truth; the mirror recipe points
operators at those files at the cozystack release they are mirroring
instead of citing a specific digest that drifts with every chart bump.
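
As an illustration of reading the pins from the source of truth rather than from the docs, here is a rough Python sketch that extracts `OUROBOROS_CHART_VERSION` and `OUROBOROS_CHART_DIGEST` from Makefile text and builds the chart reference to mirror. The plain `NAME=value` assignment form and the sample values are assumptions (placeholders, not real pins); in practice the text would be read from `packages/system/ouroboros/Makefile` at the release being mirrored.

```python
import re

# Matches `NAME=value`, `NAME := value`, `NAME ?= value`, optionally exported.
ASSIGNMENT = re.compile(
    r"^\s*(?:export\s+)?([A-Za-z_][A-Za-z0-9_]*)\s*[:?]?=\s*(.*?)\s*$"
)


def read_make_vars(makefile_text: str, names: set) -> dict:
    """Collect simple variable assignments for the requested names."""
    found = {}
    for line in makefile_text.splitlines():
        match = ASSIGNMENT.match(line)
        if match and match.group(1) in names:
            found[match.group(1)] = match.group(2)
    return found


# Placeholder Makefile content; the version and digest are NOT real values.
SAMPLE_MAKEFILE = (
    "OUROBOROS_CHART_VERSION=1.2.3\n"
    "OUROBOROS_CHART_DIGEST=sha256:" + "0" * 64 + "\n"
)

pins = read_make_vars(
    SAMPLE_MAKEFILE, {"OUROBOROS_CHART_VERSION", "OUROBOROS_CHART_DIGEST"}
)
chart_ref = "oci://ghcr.io/lexfrei/charts/ouroboros:" + pins["OUROBOROS_CHART_VERSION"]
chart_digest = pins["OUROBOROS_CHART_DIGEST"]

print(chart_ref)
print(chart_digest)
```

The image reference can be read from `values.yaml` in the same spirit; the point is that the mirror input comes from the checked-out release, so it cannot drift from what the platform actually deploys.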
## Companion PR
cozystack/cozystack#2596 (`fix/ouroboros-tenant-readyz`).
content/en/docs/next/networking/hairpin-proxy-protocol.md (6 additions & 6 deletions)
@@ -63,9 +63,9 @@ cozystack ships its own thin wrapper around the upstream `coredns/helm-charts` p
### Cluster DNS domain on tenants
-
The ouroboros chart 0.7.0+ is platform-agnostic by default. cozystack leaves `controller.clusterDomain` unset in the tenant addon wrapper, which tells the chart to emit `--proxy-service-name` + `--proxy-service-namespace` instead of a chart-baked `--proxy-fqdn`. The controller binary then composes the rewrite-target FQDN at startup as `<service>.<namespace>.svc.<cluster-domain>.` using the cluster-domain auto-detected from `/etc/resolv.conf` inside the pod — whatever the tenant's kube-dns actually serves (`cluster.local` on kubeadm-bootstrapped tenants, `cozy.local` on tenants with a non-default `kubelet --cluster-domain`, `k8s.example.com` on federations) flows through transparently.
+
The ouroboros chart 0.7.0+ supports two `clusterDomain` resolution paths: an explicit pin via `controller.clusterDomain`, or runtime auto-detect from `/etc/resolv.conf`. cozystack pins the tenant addon wrapper to `controller.clusterDomain: cluster.local` because the auto-detect path returns the wrong value on cozystack tenants. Tenant kubelet injects the host platform's clusterDomain (typically `cozy.local`) into pod `resolv.conf` as the search domain, while Kamaji-managed tenant CoreDNS serves the tenant's own cluster domain (`cluster.local` per `TenantControlPlane.networkProfile.clusterDomain`). The auto-detect would compose `<service>.<namespace>.svc.cozy.local.`, which tenant CoreDNS does not serve, and the proxy `/readyz` would NXDOMAIN forever. The pin matches what tenant CoreDNS actually serves, and the controller emits `--proxy-fqdn=<service>.<namespace>.svc.cluster.local.`.
-
Tenant operators that prefer to bake the FQDN at render time can still override via `addons.ouroboros.valuesOverride.ouroboros.controller.clusterDomain` in the `Kubernetes` CR — the chart then takes the explicit-override path and emits `--proxy-fqdn=<service>.<namespace>.svc.<cluster-domain>.`. The runtime auto-detect becomes a sanity-check WARN log in that path, comparing the operator-supplied value against `/etc/resolv.conf`. Setting `_cluster.cluster-domain` (the cozystack platform's host-side convention, often `cozy.local`) into the wrapper would be wrong: that value reflects the host platform's clusterDomain, not the tenant's actual kubelet `--cluster-domain`, and forcing it through would yield NXDOMAIN on tenants whose kubelet uses anything else.
+
Tenant operators on a non-default tenant cluster-domain (federations, custom Kamaji `TenantControlPlane.networkProfile.clusterDomain`, etc.) can override via `addons.ouroboros.valuesOverride.ouroboros.controller.clusterDomain` in the `Kubernetes` CR. The chart honours the override and emits `--proxy-fqdn=<service>.<namespace>.svc.<cluster-domain>.`. Setting `_cluster.cluster-domain` (the cozystack platform's host-side convention, often `cozy.local`) would be wrong here: that value reflects the host platform's clusterDomain, which is not what tenant CoreDNS serves.
## Enabling per tenant
@@ -116,7 +116,7 @@ Disabling has different shapes on the two layers, by design. The host path has a
- ingress-nginx side: the next reconcile re-emits `cozystack.ingress-application` without `use-proxy-protocol` / `real-ip-header` / `enable-real-ip` (cozystack deliberately does NOT set `use-forwarded-headers` or `compute-full-forwarded-for` on the host: those keys would let any upstream proxy spoof `X-Forwarded-For` without a paired `proxy-real-ip-cidr`). ingress-nginx stops accepting PROXY-protocol headers. **If the upstream L4 LB is still prepending PROXY frames at that moment, every external request to ingress-nginx breaks until the LB is also reconfigured.** Flip the LB OFF first, then flip this flag.
- ouroboros side: stops emitting the `cozystack.ouroboros` Package CR, but every Package CR carries `helm.sh/resource-policy: keep`. Helm leaves the existing Package on the cluster, ouroboros stays installed, and the live Corefile rewrite block keeps pointing at the still-running `ouroboros-proxy` Service. The flag flip alone does **not** uninstall ouroboros.
-
The platform refuses to render the bare flag flip when a `cozystack.ouroboros` Package CR is already on the cluster — `helm template` / `helm upgrade` fails fast with an error that points at `kubectl delete package.cozystack.io cozystack.ouroboros` (which triggers helm uninstall and the chart's pre-delete cleanup hook) and the acknowledgement field. The chart shipped at `0.7.0` carries a vendored pre-delete hook (`charts/ouroboros/templates/coredns-cleanup-hook.yaml`) that quiesces the controller and `sed`-strips the `# === BEGIN ouroboros … END ouroboros ===` block from `kube-system/coredns` automatically when helm actually uninstalls the chart — operators do **not** need to run the manual `sed` recipe in the normal disable path. The full host disable sequence is:
+
The platform refuses to render the bare flag flip when a `cozystack.ouroboros` Package CR is already on the cluster — `helm template` / `helm upgrade` fails fast with an error that points at `kubectl delete package.cozystack.io cozystack.ouroboros` (which triggers helm uninstall and the chart's pre-delete cleanup hook) and the acknowledgement field. The vendored chart carries a pre-delete hook (`charts/ouroboros/templates/coredns-cleanup-hook.yaml`) that quiesces the controller and `sed`-strips the `# === BEGIN ouroboros … END ouroboros ===` block from `kube-system/coredns` automatically when helm actually uninstalls the chart — operators do **not** need to run the manual `sed` recipe in the normal disable path. The full host disable sequence is:
1. Flip the upstream LB off PROXY-protocol injection (external traffic precondition).
2. Remove the Package CR with `kubectl delete package.cozystack.io cozystack.ouroboros` (or add it to `bundles.disabledPackages`). This triggers helm uninstall, which fires the chart's pre-delete hook and patches `kube-system/coredns` automatically.
@@ -126,7 +126,7 @@ If the operator has reason to flip `publishing.proxyProtocol: false` BEFORE dele
The host cleanup recipe below is the manual fallback for the rare case where the chart's pre-delete hook fails to land (controller pod stuck in CrashLoop blocking the quiesce step, ConfigMap RBAC drift, helm uninstall interrupted before the hook ran). Reach for it only when the automatic path failed and the BEGIN/END block is still in `kube-system/coredns` after the package was deleted.
-
**Known hole**: an operator who deletes the Package CR before flipping the flag bypasses the guard entirely (the `lookup` returns nil, the platform render proceeds). In `0.7.0+` that path is mostly safe — `kubectl delete package` triggers helm uninstall and the chart's pre-delete hook patches the Corefile on its way out. The hole is the rare case where the hook itself fails (controller stuck, RBAC drift, hook timeout) and the operator never noticed. The acknowledgement gate is defence-in-depth, not an airtight lock.
+
**Known hole**: an operator who deletes the Package CR before flipping the flag bypasses the guard entirely (the `lookup` returns nil, the platform render proceeds). That path is mostly safe — `kubectl delete package` triggers helm uninstall and the chart's pre-delete hook patches the Corefile on its way out. The hole is the rare case where the hook itself fails (controller stuck, RBAC drift, hook timeout) and the operator never noticed. The acknowledgement gate is defence-in-depth, not an airtight lock.
**Tenant scope.** Flipping `addons.ouroboros.enabled` from `true` to `false` stops the HelmRelease from rendering and Flux uninstalls the chart on the next tenant-side reconcile. The `helm.sh/resource-policy: keep` annotation referenced in the host scope above lives on the platform Package CR — it is **not** in play on the tenant side, so disabling there really does delete the workload, not just stop emitting it. helm uninstall fires the same chart pre-delete hook on the tenant: it nulls the `ouroboros.override` key in `kube-system/coredns-custom` automatically before the controller pod goes away. The full tenant disable sequence is one step:
The chart and image are pulled directly from the upstream `lexfrei/ouroboros` registry — they are not mirrored under `ghcr.io/cozystack/*`. Air-gapped operators have to mirror two additional locations:
-
- `oci://ghcr.io/lexfrei/charts/ouroboros:0.7.0` (the chart, digest-pinned in `packages/system/ouroboros/Makefile` as `OUROBOROS_CHART_DIGEST=sha256:f28bfac9fd7070b1f3357983ff5aad28c16115e7de1cfc16ee568a3b4cfc9d7e`);
-
- `ghcr.io/lexfrei/ouroboros:0.7.0@sha256:478665e05dd0ffc4c1f7764320ea24b52251781884ddfd668d93a03e39d9094c` (the image, digest-pinned in `packages/system/ouroboros/values.yaml` — feed this exact reference to `regsync` / `crane copy` / `skopeo copy`).
+
- `oci://ghcr.io/lexfrei/charts/ouroboros:<version>` (the chart, digest-pinned in `packages/system/ouroboros/Makefile` as `OUROBOROS_CHART_DIGEST=sha256:…` — read the exact `OUROBOROS_CHART_VERSION` and `OUROBOROS_CHART_DIGEST` values from the Makefile at the cozystack release you are mirroring);
+
- `ghcr.io/lexfrei/ouroboros:<version>@sha256:…` (the image, digest-pinned in `packages/system/ouroboros/values.yaml` under `image.tag` — feed this exact reference to `regsync` / `crane copy` / `skopeo copy`).
The cozystack image reference includes the `@sha256:…` digest above. Mirror tooling has to either preserve that digest end-to-end (the standard behaviour of `regsync`, `crane copy`, `skopeo copy`) or drop the `@sha256:…` pin from `values.yaml` post-mirror — otherwise the kubelet pull resolves against the upstream digest, fails to find it under the mirror's tags, and lands in `ErrImagePull`.