419 changes: 419 additions & 0 deletions Taskfile.test-infra.yml

Large diffs are not rendered by default.

2 changes: 2 additions & 0 deletions Taskfile.yaml
@@ -3,6 +3,8 @@ version: '3'
includes:
  dev:
    taskfile: ./Taskfile.dev.yaml
  test-infra:
    taskfile: ./Taskfile.test-infra.yml

tasks:
  validate-kustomizations:
28 changes: 28 additions & 0 deletions config/e2e-downstream/d1-cert-bypass/README.md
@@ -0,0 +1,28 @@
# Expired-certificate isolation fixture (test-env-only)

In production, an expired or otherwise unusable TLS certificate is caught early:
the platform withholds that listener from the edge before it is ever delivered.
The extension server *also* removes unusable certificates at the edge, as a
second line of defense — but because the earlier check normally catches the
problem first, that edge-side removal rarely gets exercised on the normal path.

This fixture lets a test exercise it directly, by handing the edge a genuinely
expired certificate and bypassing the earlier check.

1. `mint-expired-secret.sh <namespace> <secret-name> <hostname>` writes a
self-signed, already-expired certificate as a TLS Secret to stdout. Apply it
into the `e2e-direct` namespace on the edge cluster.
2. The test then applies a gateway directly to the edge whose HTTPS listener
uses that certificate. The extension server removes the bad listener while a
healthy sibling keeps serving — which is what the test asserts.

> **Use the `e2e-direct` namespace.** The gateway controller only watches
> namespaces that already carry the `meta.datumapis.com/upstream-cluster-name`
> label when they are created; a label added afterward is not reliably picked
> up, and a gateway there can stay unprogrammed. The `e2e-direct` namespace is
> created with the label up front for exactly this reason. If you must create a
> namespace inline, set the label at creation time.
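
If a test does need its own namespace, one way to follow this rule is to emit the Namespace manifest with the label already in place and create it in a single step. The sketch below is illustrative only: `labeled_namespace_manifest` is a hypothetical helper, not part of this fixture, and the `cluster-single` default simply mirrors the value used for `e2e-direct`.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper (an assumption, not part of the fixture): print a
# Namespace manifest that carries the upstream-cluster-name label from the
# moment the namespace is created.
# Usage: labeled_namespace_manifest <namespace> [cluster-name]
labeled_namespace_manifest() {
  local ns="${1:?namespace required}"
  local cluster="${2:-cluster-single}"
  cat <<YAML
apiVersion: v1
kind: Namespace
metadata:
  name: ${ns}
  labels:
    meta.datumapis.com/upstream-cluster-name: ${cluster}
YAML
}
```

Piping the output into `kubectl create -f -` against the edge cluster creates the namespace and its label in one request, so there is no window in which the namespace exists without the label.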

`task -t Taskfile.test-infra.yml d1-mint-expired-secret` is a thin wrapper around
the script (defaults: `NAMESPACE=e2e-direct`, `SECRET=d1-expired-tls`,
`HOSTNAME=d1-bad.e2e.env.datum.net`).
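
Before applying the Secret, it can be worth confirming that the certificate inside it really is expired, since the script's fallback path depends on the local openssl build. A minimal sketch, assuming the manifest has the shape the script emits (a `tls.crt:` key followed by a single base64 value on the same line); `secret_tls_crt` is a hypothetical helper, not part of the fixture.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper (an assumption, not part of the fixture): read a Secret
# manifest on stdin and print the decoded content stored under tls.crt.
# awk splits on whitespace, so the key is matched regardless of indentation.
secret_tls_crt() {
  awk '$1 == "tls.crt:" { print $2 }' | base64 --decode
}
```

For example, `./mint-expired-secret.sh e2e-direct d1-expired-tls d1-bad.e2e.env.datum.net | secret_tls_crt | openssl x509 -noout -enddate` prints the certificate's `notAfter` date, which should lie in the past.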
44 changes: 44 additions & 0 deletions config/e2e-downstream/d1-cert-bypass/mint-expired-secret.sh
@@ -0,0 +1,44 @@
#!/usr/bin/env bash
# Mint an already-expired self-signed TLS certificate and emit it as a
# kubernetes.io/tls Secret on stdout. Test-env-only: it hands the gateway an
# expired certificate directly, so the extension server's removal of unusable
# certificates can be exercised on its own, without the earlier check rejecting
# it first.
#
# Usage: mint-expired-secret.sh <namespace> <secret-name> <hostname>
set -euo pipefail

NS="${1:?namespace required}"
SECRET="${2:?secret name required}"
HOST="${3:?hostname required}"

WORK="$(mktemp -d)"
trap 'rm -rf "$WORK"' EXIT

# Generate a key + a self-signed cert dated entirely in the past so it is expired
# the moment it is created. Newer openssl builds accept -not_before/-not_after
# to set an explicit validity window; where those flags are unavailable, fall
# back to a zero-length window via -days 0 (not -days 1, which stays valid).
openssl req -x509 -newkey rsa:2048 -nodes \
  -keyout "$WORK/tls.key" -out "$WORK/tls.crt" \
  -subj "/CN=${HOST}" \
  -addext "subjectAltName=DNS:${HOST}" \
  -not_before 20200101000000Z -not_after 20200102000000Z 2>/dev/null \
  || openssl req -x509 -newkey rsa:2048 -nodes \
    -keyout "$WORK/tls.key" -out "$WORK/tls.crt" \
    -subj "/CN=${HOST}" -addext "subjectAltName=DNS:${HOST}" -days 0 2>/dev/null

CRT_B64="$(base64 < "$WORK/tls.crt" | tr -d '\n')"
KEY_B64="$(base64 < "$WORK/tls.key" | tr -d '\n')"

cat <<YAML
apiVersion: v1
kind: Secret
metadata:
  name: ${SECRET}
  namespace: ${NS}
type: kubernetes.io/tls
data:
  tls.crt: ${CRT_B64}
  tls.key: ${KEY_B64}
YAML
14 changes: 14 additions & 0 deletions config/e2e-downstream/direct-namespace.yaml
@@ -0,0 +1,14 @@
# Stable namespace for fixtures delivered straight to the edge cluster. The
# gateway controller only watches namespaces that already carry the
# meta.datumapis.com/upstream-cluster-name label when they are created; a label
# added afterward is not reliably picked up, and a gateway there can stay
# unprogrammed. Creating the namespace with the label up front makes gateways
# applied here program deterministically.
#
# Tests that deliver a gateway straight to the edge should use this namespace.
apiVersion: v1
kind: Namespace
metadata:
  name: e2e-direct
  labels:
    meta.datumapis.com/upstream-cluster-name: cluster-single
92 changes: 92 additions & 0 deletions config/e2e-downstream/eg-downstream/kustomization.yaml
@@ -0,0 +1,92 @@
# Reset-resilient copy of the downstream Envoy Gateway install with the
# prod-fidelity pins baked in (EG chart v1.7.4, extensionManager
# maxMessageSize:256Mi, e2e certificateRef). This lives in an UNTRACKED dir so an
# external `git reset --hard` on the shared branch cannot revert these settings
# (it kept clobbering the tracked config/tools/envoy-gateway-downstream copy).
#
# The siblings (namespace.yaml, nso-crd-rbac.yaml) are copied into this dir
# alongside the kustomization, so the resources list below refers to them by
# bare file name rather than by a relative path into the tracked tool dir.
resources:
  - namespace.yaml
  - nso-crd-rbac.yaml
helmCharts:
  - name: gateway-helm
    includeCRDs: false
    namespace: datum-downstream-gateway
    releaseName: envoy-datum-downstream-gateway
    # Prod EG version (rolled back from 1.8.1 for the OIDC regression). Pulled
    # from the OCI repo so this dir needs no vendored chart.
    version: v1.7.4
    repo: oci://docker.io/envoyproxy
    valuesInline:
      config:
        envoyGateway:
          gateway:
            controllerName: gateway.envoyproxy.io/datum-downstream-gateway
          extensionApis:
            enableBackend: true
            enableEnvoyPatchPolicy: false
          runtimeFlags:
            enabled:
              - XDSNameSchemeV2
          provider:
            type: Kubernetes
            kubernetes:
              watch:
                type: NamespaceSelector
                namespaceSelector:
                  matchExpressions:
                    - key: meta.datumapis.com/upstream-cluster-name
                      operator: Exists
          extensionManager:
            # Match the extension server's larger message limit. The default is
            # far smaller, and once configuration grows past it, updates silently
            # stop reaching the proxies.
            maxMessageSize: 256Mi
            policyResources:
              - group: networking.datumapis.com
                version: v1alpha
                kind: TrafficProtectionPolicy
            resources:
              - group: networking.datumapis.com
                version: v1alpha1
                kind: Connector
            service:
              fqdn:
                hostname: network-services-operator-envoy-gateway-extension-server.network-services-operator-system.svc.cluster.local
                port: 5005
              tls:
                certificateRef:
                  # The certificate authority that signed the extension server's
                  # certificate, so the control plane can trust it. Published by
                  # bring-up.
                  name: e2e-extension-server-ca
                  namespace: network-services-operator-system
                clientCertificateRef:
                  name: envoy-gateway-extension-server-eg-client-tls
                  namespace: network-services-operator-system
              retry:
                maxAttempts: 4
                initialBackoff: 100ms
                maxBackoff: 1s
                backoffMultiplier:
                  numerator: 200
                retryableStatusCodes:
                  - UNAVAILABLE
            hooks:
              xdsTranslator:
                post:
                  - Translation
                translation:
                  listener:
                    includeAll: true # MANDATORY: without this, WAF listener filter + per-route mutations are silently dropped
                  route:
                    includeAll: true # MANDATORY: without this, WAF listener filter + per-route mutations are silently dropped
                  cluster:
                    includeAll: true
                  secret:
                    includeAll: true
            # Block updates when the extension server errors instead of serving
            # unprotected configuration; the proxy keeps its last good config.
            failOpen: false
4 changes: 4 additions & 0 deletions config/e2e-downstream/eg-downstream/namespace.yaml
@@ -0,0 +1,4 @@
apiVersion: v1
kind: Namespace
metadata:
  name: datum-downstream-gateway
37 changes: 37 additions & 0 deletions config/e2e-downstream/eg-downstream/nso-crd-rbac.yaml
@@ -0,0 +1,37 @@
# Lets the dedicated gateway control plane read NSO's firewall and connector
# policies on the edge cluster. Without this grant it can't see those policies,
# so it never re-applies configuration when one of them changes.
#
# Kept as a separate manifest so a chart upgrade can't overwrite it.
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: envoy-datum-downstream-gateway-nso-policy-viewer
rules:
  - apiGroups:
      - networking.datumapis.com
    resources:
      - trafficprotectionpolicies
      - connectors
    verbs:
      - get
      - list
      - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: envoy-datum-downstream-gateway-nso-policy-viewer
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: envoy-datum-downstream-gateway-nso-policy-viewer
subjects:
  - kind: ServiceAccount
    # The chart names its service account "envoy-gateway" (not prefixed with the
    # release name), so the binding must use that exact name. A mismatch leaves the
    # control plane unable to read the policies, so it never starts and no gateway
    # is ever programmed.
    name: envoy-gateway
    namespace: datum-downstream-gateway
87 changes: 87 additions & 0 deletions config/e2e-downstream/envoyproxy.yaml
@@ -0,0 +1,87 @@
# Test edge data plane for the downstream gateway. It models the production edge
# proxy on the two things that matter for testing: the real WAF image, and an
# admin endpoint (test-env-only) so the suite and parity gate can read the
# proxy's live configuration and stats.
#
# Here the data plane is a Deployment/Service rather than the production
# DaemonSet, which does not change WAF or configuration behavior.
---
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: EnvoyProxy
metadata:
  name: datum-downstream-gateway
  namespace: datum-downstream-gateway
spec:
  mergeGateways: true
  ipFamily: DualStack
  provider:
    type: Kubernetes
    kubernetes:
      envoyDeployment:
        pod:
          volumes:
            - name: coraza-waf
              emptyDir: {}
        container:
          image: envoyproxy/envoy:contrib-v1.37.1
          volumeMounts:
            - name: coraza-waf
              mountPath: /opt/coraza-waf
        initContainers:
          - name: coraza-waf
            # Multi-arch build of the same WAF filter as the production edge. The
            # multi-arch image loads natively on arm64 dev hosts as well as on
            # amd64 CI and production: same filter, same rules.
            image: ghcr.io/datum-labs/coraza-envoy-go-filter/coraza-waf:v1.3.0-multiarch.1
            imagePullPolicy: IfNotPresent
            command:
              - cp
              - /coraza-waf.so
              - /opt/coraza-waf/
            volumeMounts:
              - name: coraza-waf
                mountPath: /opt/coraza-waf
      envoyService:
        externalTrafficPolicy: Cluster
        type: NodePort
        patch:
          type: StrategicMerge
          value:
            spec:
              ipFamilyPolicy: RequireDualStack
              ports:
                - name: http-80
                  nodePort: 30080
                  port: 80
                - name: https-443
                  nodePort: 30443
                  port: 443
  # TEST-ENV-ONLY: expose the proxy's admin interface so an in-cluster runner can
  # read its live configuration and stats. The admin endpoint is unauthenticated
  # and must never be applied in production; a CI guard asserts this patch is
  # absent from any non-e2e output.
  bootstrap:
    type: JSONPatch
    jsonPatches:
      - op: add
        path: /admin/address
        value:
          socket_address:
            address: 0.0.0.0
            port_value: 19000
  telemetry:
    metrics:
      prometheus:
        disable: false
---
apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
  name: datum-downstream-gateway-e2e
spec:
  controllerName: gateway.envoyproxy.io/datum-downstream-gateway
  parametersRef:
    group: gateway.envoyproxy.io
    kind: EnvoyProxy
    name: datum-downstream-gateway
    namespace: datum-downstream-gateway
16 changes: 16 additions & 0 deletions config/e2e-downstream/error-pages.yaml
@@ -0,0 +1,16 @@
# Branded 5xx error page, mounted at /etc/datum/error-pages. The body carries a
# known marker so the suite can tell the branded page apart from the proxy's
# default 5xx body by content.
apiVersion: v1
kind: ConfigMap
metadata:
  name: envoy-error-pages
  namespace: network-services-operator-system
data:
  error-5xx.html: |
    <!-- X-Datum-Branded-Page: v1 -->
    <!DOCTYPE html>
    <html>
    <head><title>Service Unavailable</title></head>
    <body><h1>This service is temporarily unavailable.</h1></body>
    </html>
32 changes: 32 additions & 0 deletions config/e2e-downstream/extserver-base/kustomization.yaml
@@ -0,0 +1,32 @@
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

# Ext-server, namespaced + prefixed so the Service FQDN matches the EG
# extensionManager fqdn and the CSI dns-names baked into the base deployment.
namespace: network-services-operator-system
namePrefix: network-services-operator-

resources:
  - ../../extension-server

patches:
  - path: patches/extserver-tls.yaml
    target:
      kind: Deployment
      name: envoy-gateway-extension-server
  - path: patches/extserver-serverconfig.yaml
    target:
      kind: Deployment
      name: envoy-gateway-extension-server
  - path: patches/extserver-clientcert-issuer.yaml
    target:
      kind: Certificate
      name: envoy-gateway-extension-server-eg-client-tls
  - path: patches/extserver-ca-bundle.yaml
    target:
      kind: Deployment
      name: envoy-gateway-extension-server
  - path: patches/extserver-programmed-set.yaml
    target:
      kind: Deployment
      name: envoy-gateway-extension-server
@@ -0,0 +1,18 @@
# Point the ext-server CA bundle volume at the e2e CA ConfigMap (carrying the
# ca.crt that signed the EG client cert), replacing placeholder-ca-bundle. The
# ConfigMap is published by the bring-up (test-infra:extserver-ca-bundle) from
# the e2e-extension-server-ca cert-manager secret.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: envoy-gateway-extension-server
spec:
  template:
    spec:
      volumes:
        - name: tls-ca
          configMap:
            name: extension-server-ca-bundle
            items:
              - key: ca.crt
                path: ca.crt