Commit 6ded60e

docs(networking): clusterDomain is pinned on tenants, drop stale 0.7.0 references (#534)
## What changed

The `Cluster DNS domain on tenants` section is rewritten. The previous text described the platform-agnostic auto-detect path as the default; that turned out to be the actual `/readyz` NXDOMAIN failure mode on cozystack tenants, which is why cozystack PR #2596 pins `controller.clusterDomain: cluster.local` in the tenant wrapper.

Reason for the pin: tenant kubelet injects the host platform's clusterDomain (typically `cozy.local`) into pod `resolv.conf` as the search domain, while Kamaji-managed tenant CoreDNS serves `cluster.local` per `TenantControlPlane.networkProfile.clusterDomain`. The auto-detect composed `...svc.cozy.local.`, tenant CoreDNS did not serve it, and the proxy `/readyz` NXDOMAIN'd forever.

The override path (`addons.ouroboros.valuesOverride.ouroboros.controller.clusterDomain`) is now framed as the escape hatch for non-default tenant cluster-domains rather than the recommended path.

Also drops stale references to the `0.7.0` chart version and hardcoded digests in the Air-gapped operators section. The Makefile and `values.yaml` are the single source of truth; the mirror recipe points operators at those files at the cozystack release they are mirroring instead of citing a specific digest that drifts with every chart bump.

## Companion PR

cozystack/cozystack#2596 (`fix/ouroboros-tenant-readyz`).
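The mismatch described above can be illustrated with a short sketch. It is not the controller's code: the real detection logic is not shown in this commit, so `detect_cluster_domain`, `compose_proxy_fqdn`, the sample `resolv.conf` contents, and the `ouroboros` namespace are all assumptions made for the example. Only the `ouroboros-proxy` Service name and the two cluster domains come from the text.

```python
from typing import Optional

# Hypothetical pod resolv.conf on a tenant whose kubelet injected the
# host platform's cluster domain (cozy.local) as the search domain.
SAMPLE_RESOLV_CONF = """\
search ouroboros.svc.cozy.local svc.cozy.local cozy.local
nameserver 10.96.0.10
options ndots:5
"""


def detect_cluster_domain(resolv_conf: str) -> Optional[str]:
    """Guess the cluster domain from the first search entry containing 'svc'.

    Assumption: the search list follows the usual
    <namespace>.svc.<cluster-domain> layout.
    """
    for line in resolv_conf.splitlines():
        parts = line.split()
        if not parts or parts[0] != "search":
            continue
        for entry in parts[1:]:
            labels = entry.rstrip(".").split(".")
            if "svc" in labels:
                domain = ".".join(labels[labels.index("svc") + 1:])
                if domain:
                    return domain
    return None


def compose_proxy_fqdn(service: str, namespace: str, cluster_domain: str) -> str:
    """Build <service>.<namespace>.svc.<cluster-domain>. with a trailing dot."""
    return f"{service}.{namespace}.svc.{cluster_domain.strip('.')}."


if __name__ == "__main__":
    detected = detect_cluster_domain(SAMPLE_RESOLV_CONF)
    # Auto-detect path: follows the search domain, which tenant CoreDNS does not serve.
    print(compose_proxy_fqdn("ouroboros-proxy", "ouroboros", detected or "cluster.local"))
    # Pinned path: matches what tenant CoreDNS serves.
    print(compose_proxy_fqdn("ouroboros-proxy", "ouroboros", "cluster.local"))
```

Under these assumptions the two paths produce different FQDNs from the same pod, which is why a pin that matches the tenant CoreDNS zone sidesteps the NXDOMAIN.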
2 parents 58b4a54 + af35532 commit 6ded60e

1 file changed

Lines changed: 6 additions & 6 deletions

content/en/docs/next/networking/hairpin-proxy-protocol.md

@@ -63,9 +63,9 @@ cozystack ships its own thin wrapper around the upstream `coredns/helm-charts` p
6363

6464
### Cluster DNS domain on tenants
6565

66-
The ouroboros chart 0.7.0+ is platform-agnostic by default. cozystack leaves `controller.clusterDomain` unset in the tenant addon wrapper, which tells the chart to emit `--proxy-service-name` + `--proxy-service-namespace` instead of a chart-baked `--proxy-fqdn`. The controller binary then composes the rewrite-target FQDN at startup as `<service>.<namespace>.svc.<cluster-domain>.` using the cluster-domain auto-detected from `/etc/resolv.conf` inside the pod — whatever the tenant's kube-dns actually serves (`cluster.local` on kubeadm-bootstrapped tenants, `cozy.local` on tenants with a non-default `kubelet --cluster-domain`, `k8s.example.com` on federations) flows through transparently.
66+
The ouroboros chart 0.7.0+ supports two `clusterDomain` resolution paths: an explicit pin via `controller.clusterDomain`, or runtime auto-detect from `/etc/resolv.conf`. cozystack pins the tenant addon wrapper to `controller.clusterDomain: cluster.local` because the auto-detect path returns the wrong value on cozystack tenants. Tenant kubelet injects the host platform's clusterDomain (typically `cozy.local`) into pod `resolv.conf` as the search domain, while Kamaji-managed tenant CoreDNS serves the tenant's own cluster domain (`cluster.local` per `TenantControlPlane.networkProfile.clusterDomain`). The auto-detect would compose `<service>.<namespace>.svc.cozy.local.`, which tenant CoreDNS does not serve, and the proxy `/readyz` would NXDOMAIN forever. The pin matches what tenant CoreDNS actually serves, and the controller emits `--proxy-fqdn=<service>.<namespace>.svc.cluster.local.`.
6767

68-
Tenant operators that prefer to bake the FQDN at render time can still override via `addons.ouroboros.valuesOverride.ouroboros.controller.clusterDomain` in the `Kubernetes` CR — the chart then takes the explicit-override path and emits `--proxy-fqdn=<service>.<namespace>.svc.<cluster-domain>.`. The runtime auto-detect becomes a sanity-check WARN log in that path, comparing the operator-supplied value against `/etc/resolv.conf`. Setting `_cluster.cluster-domain` (the cozystack platform's host-side convention, often `cozy.local`) into the wrapper would be wrong: that value reflects the host platform's clusterDomain, not the tenant's actual kubelet `--cluster-domain`, and forcing it through would yield NXDOMAIN on tenants whose kubelet uses anything else.
68+
Tenant operators on a non-default tenant cluster-domain (federations, custom Kamaji `TenantControlPlane.networkProfile.clusterDomain`, etc.) can override via `addons.ouroboros.valuesOverride.ouroboros.controller.clusterDomain` in the `Kubernetes` CR. The chart honours the override and emits `--proxy-fqdn=<service>.<namespace>.svc.<cluster-domain>.`. Setting `_cluster.cluster-domain` (the cozystack platform's host-side convention, often `cozy.local`) would be wrong here: that value reflects the host platform's clusterDomain, which is not what tenant CoreDNS serves.
6969

7070
## Enabling per tenant
7171

@@ -116,7 +116,7 @@ Disabling has different shapes on the two layers, by design. The host path has a
116116
- ingress-nginx side: the next reconcile re-emits `cozystack.ingress-application` without `use-proxy-protocol` / `real-ip-header` / `enable-real-ip` (cozystack deliberately does NOT set `use-forwarded-headers` or `compute-full-forwarded-for` on the host: those keys would let any upstream proxy spoof `X-Forwarded-For` without a paired `proxy-real-ip-cidr`). ingress-nginx stops accepting PROXY-protocol headers. **If the upstream L4 LB is still prepending PROXY frames at that moment, every external request to ingress-nginx breaks until the LB is also reconfigured.** Flip the LB OFF first, then flip this flag.
117117
- ouroboros side: stops emitting the `cozystack.ouroboros` Package CR, but every Package CR carries `helm.sh/resource-policy: keep`. Helm leaves the existing Package on the cluster, ouroboros stays installed, and the live Corefile rewrite block keeps pointing at the still-running `ouroboros-proxy` Service. The flag flip alone does **not** uninstall ouroboros.
118118

119-
The platform refuses to render the bare flag flip when a `cozystack.ouroboros` Package CR is already on the cluster — `helm template` / `helm upgrade` fails fast with an error that points at `kubectl delete package.cozystack.io cozystack.ouroboros` (which triggers helm uninstall and the chart's pre-delete cleanup hook) and the acknowledgement field. The chart shipped at `0.7.0` carries a vendored pre-delete hook (`charts/ouroboros/templates/coredns-cleanup-hook.yaml`) that quiesces the controller and `sed`-strips the `# === BEGIN ouroboros … END ouroboros ===` block from `kube-system/coredns` automatically when helm actually uninstalls the chart — operators do **not** need to run the manual `sed` recipe in the normal disable path. The full host disable sequence is:
119+
The platform refuses to render the bare flag flip when a `cozystack.ouroboros` Package CR is already on the cluster — `helm template` / `helm upgrade` fails fast with an error that points at `kubectl delete package.cozystack.io cozystack.ouroboros` (which triggers helm uninstall and the chart's pre-delete cleanup hook) and the acknowledgement field. The vendored chart carries a pre-delete hook (`charts/ouroboros/templates/coredns-cleanup-hook.yaml`) that quiesces the controller and `sed`-strips the `# === BEGIN ouroboros … END ouroboros ===` block from `kube-system/coredns` automatically when helm actually uninstalls the chart — operators do **not** need to run the manual `sed` recipe in the normal disable path. The full host disable sequence is:
120120

121121
1. Flip the upstream LB off PROXY-protocol injection (external traffic precondition).
122122
2. Remove the Package CR with `kubectl delete package.cozystack.io cozystack.ouroboros` (or add it to `bundles.disabledPackages`). This triggers helm uninstall, which fires the chart's pre-delete hook and patches `kube-system/coredns` automatically.
@@ -126,7 +126,7 @@ If the operator has reason to flip `publishing.proxyProtocol: false` BEFORE dele
126126

127127
The host cleanup recipe below is the manual fallback for the rare case where the chart's pre-delete hook fails to land (controller pod stuck in CrashLoop blocking the quiesce step, ConfigMap RBAC drift, helm uninstall interrupted before the hook ran). Reach for it only when the automatic path failed and the BEGIN/END block is still in `kube-system/coredns` after the package was deleted.
128128

129-
**Known hole**: an operator who deletes the Package CR before flipping the flag bypasses the guard entirely (the `lookup` returns nil, the platform render proceeds). In `0.7.0+` that path is mostly safe — `kubectl delete package` triggers helm uninstall and the chart's pre-delete hook patches the Corefile on its way out. The hole is the rare case where the hook itself fails (controller stuck, RBAC drift, hook timeout) and the operator never noticed. The acknowledgement gate is defence-in-depth, not an airtight lock.
129+
**Known hole**: an operator who deletes the Package CR before flipping the flag bypasses the guard entirely (the `lookup` returns nil, the platform render proceeds). That path is mostly safe — `kubectl delete package` triggers helm uninstall and the chart's pre-delete hook patches the Corefile on its way out. The hole is the rare case where the hook itself fails (controller stuck, RBAC drift, hook timeout) and the operator never noticed. The acknowledgement gate is defence-in-depth, not an airtight lock.
130130

131131
**Tenant scope.** Flipping `addons.ouroboros.enabled` from `true` to `false` stops the HelmRelease from rendering and Flux uninstalls the chart on the next tenant-side reconcile. The `helm.sh/resource-policy: keep` annotation referenced in the host scope above lives on the platform Package CR — it is **not** in play on the tenant side, so disabling there really does delete the workload, not just stop emitting it. helm uninstall fires the same chart pre-delete hook on the tenant: it nulls the `ouroboros.override` key in `kube-system/coredns-custom` automatically before the controller pod goes away. The full tenant disable sequence is one step:
132132

@@ -172,8 +172,8 @@ kubectl --kubeconfig <tenant-admin-kubeconfig> --namespace kube-system patch \
172172

173173
The chart and image are pulled directly from the upstream `lexfrei/ouroboros` registry — they are not mirrored under `ghcr.io/cozystack/*`. Air-gapped operators have to mirror two additional locations:
174174

175-
- `oci://ghcr.io/lexfrei/charts/ouroboros:0.7.0` (the chart, digest-pinned in `packages/system/ouroboros/Makefile` as `OUROBOROS_CHART_DIGEST=sha256:f28bfac9fd7070b1f3357983ff5aad28c16115e7de1cfc16ee568a3b4cfc9d7e`);
176-
- `ghcr.io/lexfrei/ouroboros:0.7.0@sha256:478665e05dd0ffc4c1f7764320ea24b52251781884ddfd668d93a03e39d9094c` (the image, digest-pinned in `packages/system/ouroboros/values.yaml` — feed this exact reference to `regsync` / `crane copy` / `skopeo copy`).
175+
- `oci://ghcr.io/lexfrei/charts/ouroboros:<version>` (the chart, digest-pinned in `packages/system/ouroboros/Makefile` as `OUROBOROS_CHART_DIGEST=sha256:…` — read the exact `OUROBOROS_CHART_VERSION` and `OUROBOROS_CHART_DIGEST` values from the Makefile at the cozystack release you are mirroring);
176+
- `ghcr.io/lexfrei/ouroboros:<version>@sha256:` (the image, digest-pinned in `packages/system/ouroboros/values.yaml` under `image.tag` — feed this exact reference to `regsync` / `crane copy` / `skopeo copy`).
177177

178178
The cozystack image reference includes the `@sha256:…` digest above. Mirror tooling has to either preserve that digest end-to-end (the standard behaviour of `regsync`, `crane copy`, `skopeo copy`) or drop the `@sha256:…` pin from `values.yaml` post-mirror — otherwise the kubelet pull resolves against the upstream digest, fails to find it under the mirror's tags, and lands in `ErrImagePull`.
179179
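The digest-preservation point in the last paragraph can be sketched as a small helper that rewrites only the registry host of an image reference and leaves the tag and `@sha256:…` pin untouched. This is an illustration, not part of cozystack or of any mirror tool: `mirror_reference`, the mirror hostname, the `1.2.3` tag, and the all-zero digest are placeholders invented for the example, and the helper assumes the first path component of the reference is the registry host (true for `ghcr.io/...` references).

```python
def mirror_reference(ref: str, mirror_registry: str) -> str:
    """Swap the registry host of an image reference, keeping tag and digest.

    Assumes the reference starts with an explicit registry host,
    e.g. ghcr.io/<owner>/<image>:<tag>@sha256:<digest>.
    """
    name, has_digest, digest = ref.partition("@")
    _registry, has_path, path = name.partition("/")
    if not has_path:
        raise ValueError(f"reference has no registry/path split: {ref!r}")
    mirrored = f"{mirror_registry.rstrip('/')}/{path}"
    # Re-attach the digest unchanged so the kubelet pull resolves the same content.
    return f"{mirrored}@{digest}" if has_digest else mirrored


if __name__ == "__main__":
    # Placeholder digest and tag: read the real values from values.yaml
    # at the cozystack release being mirrored.
    placeholder_digest = "sha256:" + "0" * 64
    upstream = f"ghcr.io/lexfrei/ouroboros:1.2.3@{placeholder_digest}"
    print(mirror_reference(upstream, "registry.example.internal"))
```

The helper only computes the target reference; copying the content still has to go through a tool that preserves digests, such as the `regsync`, `crane copy`, or `skopeo copy` options named above.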
