CLO-13: add in optional public endpoints / TLS support#164

Open
pH14 wants to merge 6 commits into MaterializeInc:main from pH14:public-tls-upstream

pH14 (Contributor) commented Mar 10, 2026

Human preamble: I am not sure of the best way to write / PR / test this change. What follows is a Claude generalization of the work in https://github.com/MaterializeInc/mz-context-graph/pull/2 on GCP to create stable public endpoints + certs for the console + balancer endpoints. I don't really know Terraform, nor have I touched golang in like 7 years 😅

The changes are best reviewed in commit order.

AI writeup ensues:


Add opt-in public TLS support across all three clouds

Adds end-to-end public TLS access to Materialize deployments behind an enable_public_tls flag (default: false). When enabled,
services get publicly trusted certificates via cert-manager ACME with cloud-native DNS01 solvers, static IPs (GCP/Azure), and
DNS records.

Zero-diff when enable_public_tls = false (the default).

Architecture

All three clouds follow the same pattern:

  1. Cloud IAM grants cert-manager access to the DNS provider
  2. ACME ClusterIssuer issues public certificates via DNS01 challenge
  3. DNS records point public hostnames to the load balancers
  4. Dual issuer refs — external certs use ACME, internal certs use self-signed

DNS strategy differs by cloud:

  • GCP/Azure: Reserve static IPs, pin load balancers, create DNS A records
  • AWS: Route53 ALIAS records point directly to the existing NLB's DNS name (no static IPs needed)

Default ACME CAs:

  • GCP: Google Trust Services (pki.goog)
  • AWS/Azure: Let's Encrypt

All configurable via acme_server.

New modules

| Cloud | Module | Purpose |
| --- | --- | --- |
| Shared | kubernetes/modules/acme-cluster-issuer | Generic ACME ClusterIssuer with pluggable DNS solver |
| GCP | gcp/modules/static_ips | Regional google_compute_address for balancerd + console |
| GCP | gcp/modules/dns_records | Cloud DNS A records |
| GCP | gcp/modules/cert_manager_wi | Workload Identity for cert-manager → Cloud DNS |
| AWS | aws/modules/route53_dns | Route53 ALIAS records to NLB |
| AWS | aws/modules/cert_manager_irsa | IRSA role for cert-manager → Route53 |
| Azure | azure/modules/static_ips | Standard/Static public IPs for balancerd + console |
| Azure | azure/modules/dns_records | Azure DNS A records |
| Azure | azure/modules/cert_manager_identity | Managed identity for cert-manager → Azure DNS |

Modified modules

| Module | Change |
| --- | --- |
| kubernetes/modules/materialize-instance | Configurable DNS names, internal_issuer_ref, system parameters |
| kubernetes/modules/cert-manager | service_account_annotations, pod_labels for cloud IAM |
| gcp/modules/load_balancers | Optional static IP on services |
| azure/modules/load_balancers | Optional static IP on services |
| aws/modules/nlb | Expose nlb_zone_id output for Route53 ALIAS |

Tests

New testPublicTLS stage for all three clouds, gated on DNS zone env vars (skipped by default). Validates DNS resolution,
TLS certificate SANs, and HTTPS connectivity. Shared helpers in test/utils/helpers/connectivity.go.
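As an illustration of the SAN validation described above, here is a minimal Go sketch. It is not the code in test/utils/helpers/connectivity.go; the function names `certCoversHost` and `selfSignedCert` and the example.com hostnames are hypothetical. It builds a throwaway self-signed certificate in memory so it runs without network access; a real test would presumably take the leaf certificate from a live TLS handshake instead (for example, `PeerCertificates[0]` from a `tls.Conn`'s `ConnectionState()`).

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"math/big"
	"time"
)

// certCoversHost reports whether the certificate's SANs cover host.
// (Hypothetical helper name.) x509.Certificate.VerifyHostname matches
// against the DNS SANs, including wildcard entries.
func certCoversHost(cert *x509.Certificate, host string) bool {
	return cert.VerifyHostname(host) == nil
}

// selfSignedCert builds a throwaway certificate carrying the given DNS SANs.
// It stands in for the ACME-issued certificate a deployed endpoint would serve.
func selfSignedCert(dnsNames ...string) (*x509.Certificate, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(1),
		Subject:      pkix.Name{CommonName: "example"},
		NotBefore:    time.Now().Add(-time.Hour),
		NotAfter:     time.Now().Add(time.Hour),
		DNSNames:     dnsNames,
	}
	// Self-signed: the template is its own parent.
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return nil, err
	}
	return x509.ParseCertificate(der)
}

func main() {
	cert, err := selfSignedCert("balancerd.example.com", "console.example.com")
	if err != nil {
		panic(err)
	}
	fmt.Println(certCoversHost(cert, "console.example.com")) // SAN present
	fmt.Println(certCoversHost(cert, "other.example.com"))   // SAN absent
}
```

Checking SANs through `VerifyHostname` rather than comparing strings against `DNSNames` keeps wildcard handling consistent with what a TLS client would accept.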

Commit structure

  1. Cross-cloud Kubernetes foundation
  2. GCP modules
  3. AWS modules
  4. Azure modules
  5. Wire into cloud examples
  6. Test infrastructure

pH14 added 6 commits March 10, 2026 16:32
Extend the shared Kubernetes modules to support opt-in public TLS:

- materialize-instance: add internal_issuer_ref (separate issuer for
  internal certs), balancerd_dns_names/console_dns_names (configurable
  certificate SANs), and system_parameters (ConfigMap support)
- cert-manager: add service_account_annotations and pod_labels for
  cloud-native identity federation (GCP WI, AWS IRSA, Azure WI)
- acme-cluster-issuer: new generic module that creates an ACME
  ClusterIssuer with a pluggable DNS01 solver config

All changes are additive with defaults that preserve existing behavior.

New modules for GCP public TLS infrastructure:

- static_ips: reserves regional external IPs for balancerd and console
- dns_records: creates Cloud DNS A records pointing to static IPs
- cert_manager_wi: sets up Workload Identity so cert-manager can solve
  DNS01 challenges via Cloud DNS (GSA + roles/dns.admin + WI binding)
- load_balancers: accept optional static IP addresses (null = ephemeral,
  preserving current behavior)

New modules for AWS public TLS infrastructure:

- route53_dns: creates Route53 ALIAS A records pointing to the NLB
  (no static IPs needed since NLBs have stable DNS names)
- cert_manager_irsa: creates an IRSA role granting cert-manager
  Route53 access for DNS01 challenge solving
- nlb: add nlb_zone_id output (required for Route53 ALIAS records)

New modules:
- azure/modules/static_ips: reserve Standard/Static public IPs for
  balancerd and console services
- azure/modules/dns_records: create Azure DNS A records for public
  hostnames
- azure/modules/cert_manager_identity: managed identity with federated
  credential for cert-manager to access Azure DNS (DNS Zone Contributor)

Modified:
- azure/modules/load_balancers: accept optional static IP addresses
  for balancerd and console services (null = ephemeral, preserving
  current behavior)

Add opt-in public TLS support (enable_public_tls = false by default)
to all three cloud examples with zero-diff when disabled.

GCP (examples/simple):
- static_ips, cert_manager_wi, acme_cluster_issuer, dns_records modules
- cert-manager gets GCP Workload Identity SA annotation
- Default ACME CA: Google Trust Services (pki.goog)

AWS (examples/simple):
- cert_manager_irsa, acme_cluster_issuer, route53_dns modules
- cert-manager gets IRSA role annotation
- Route53 ALIAS records point to existing NLB
- Default ACME CA: Let's Encrypt

Azure (examples/simple):
- static_ips, cert_manager_identity, acme_cluster_issuer, dns_records
- cert-manager gets Azure Workload Identity annotation + pod label
- Default ACME CA: Let's Encrypt

All clouds: materialize_instance gets public DNS names, dual issuer
refs (ACME external + self-signed internal), and load balancers get
optional static IPs when public TLS is enabled.

Test utilities:
- test/utils/helpers/connectivity.go: DNS resolution, TLS certificate,
  and HTTPS validation helpers
- test/utils/constants.go: add PublicTLSSuffix and MaterializePublicTLSDir

Test fixtures (all clouds):
- Add conditional public TLS module wiring matching the examples
- New variables: enable_public_tls, DNS zone, hostnames, ACME config
- New outputs: conditional balancerd/console hostnames

Test stages (all clouds):
- testPublicTLS stage gated on DNS_ZONE_NAME (GCP/Azure) or
  ROUTE53_HOSTED_ZONE_ID (AWS) environment variable
- Validates DNS resolution, TLS certificate SANs, and HTTPS connectivity
- Wired into TestFullDeployment as Stage 4 with cleanup in TearDownSuite
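
The environment-variable gating described in this commit could look roughly like the following Go sketch. The variable names DNS_ZONE_NAME and ROUTE53_HOSTED_ZONE_ID come from the commit message; the function `publicTLSGate`, its signature, and the cloud identifiers "aws", "gcp", and "azure" are assumptions for illustration, not the PR's actual code. The environment lookup is injected as a function so the logic can be exercised without touching the real process environment.

```go
package main

import (
	"fmt"
	"os"
)

// publicTLSGate returns the DNS zone identifier for the given cloud and
// whether the public TLS stage should run. (Hypothetical helper.)
// AWS is gated on ROUTE53_HOSTED_ZONE_ID; GCP and Azure on DNS_ZONE_NAME.
func publicTLSGate(cloud string, getenv func(string) string) (string, bool) {
	name := "DNS_ZONE_NAME"
	if cloud == "aws" {
		name = "ROUTE53_HOSTED_ZONE_ID"
	}
	zone := getenv(name)
	// An unset or empty variable means the stage is skipped (the default).
	return zone, zone != ""
}

func main() {
	for _, cloud := range []string{"gcp", "aws", "azure"} {
		if zone, ok := publicTLSGate(cloud, os.Getenv); ok {
			fmt.Printf("%s: run public TLS stage against zone %s\n", cloud, zone)
		} else {
			fmt.Printf("%s: skip public TLS stage\n", cloud)
		}
	}
}
```

Inside a test stage, the false branch would typically call `t.Skip` from the standard testing package rather than print.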
@pH14 pH14 changed the title Public tls upstream add in optional public endpoitns / TLS support Mar 10, 2026
@pH14 pH14 changed the title add in optional public endpoitns / TLS support add in optional public endpoints / TLS support Mar 10, 2026
@pH14 pH14 changed the title add in optional public endpoints / TLS support CLO-13: add in optional public endpoints / TLS support Mar 10, 2026
variable "acme_server" {
  description = "ACME server URL for certificate issuance"
  type        = string
  default     = "https://acme-v02.api.letsencrypt.org/directory"

Member

Always default to staging, since prod is rate limited and will block you if you mess up too many times!

Suggested change
- default = "https://acme-v02.api.letsencrypt.org/directory"
+ default = "https://acme-staging-v02.api.letsencrypt.org/directory"

  default = null
}

variable "balancerd_dns_names" {

Member

At time of writing, we now support balancerd_extra_dns_names and console_extra_dns_names. We can probably drop those variants.

}

variable "enable_public_tls" {
  description = "Enable public TLS with ACME certificates and Route53 DNS"

Member

In AWS, cert-manager also supports the aws-privateca-issuer, which may be desirable for some enterprises.

2 participants