CLO-13: add in optional public endpoints / TLS support #164
Open
pH14 wants to merge 6 commits into
Extend the shared Kubernetes modules to support opt-in public TLS:
- materialize-instance: add internal_issuer_ref (separate issuer for internal certs), balancerd_dns_names/console_dns_names (configurable certificate SANs), and system_parameters (ConfigMap support)
- cert-manager: add service_account_annotations and pod_labels for cloud-native identity federation (GCP WI, AWS IRSA, Azure WI)
- acme-cluster-issuer: new generic module that creates an ACME ClusterIssuer with a pluggable DNS01 solver config
All changes are additive with defaults that preserve existing behavior.
New modules for GCP public TLS infrastructure:
- static_ips: reserves regional external IPs for balancerd and console
- dns_records: creates Cloud DNS A records pointing to static IPs
- cert_manager_wi: sets up Workload Identity so cert-manager can solve DNS01 challenges via Cloud DNS (GSA + roles/dns.admin + WI binding)
- load_balancers: accept optional static IP addresses (null = ephemeral, preserving current behavior)
New modules for AWS public TLS infrastructure:
- route53_dns: creates Route53 ALIAS A records pointing to the NLB (no static IPs needed since NLBs have stable DNS names)
- cert_manager_irsa: creates an IRSA role granting cert-manager Route53 access for DNS01 challenge solving
- nlb: add nlb_zone_id output (required for Route53 ALIAS records)
New modules:
- azure/modules/static_ips: reserve Standard/Static public IPs for balancerd and console services
- azure/modules/dns_records: create Azure DNS A records for public hostnames
- azure/modules/cert_manager_identity: managed identity with federated credential for cert-manager to access Azure DNS (DNS Zone Contributor)
Modified:
- azure/modules/load_balancers: accept optional static IP addresses for balancerd and console services (null = ephemeral, preserving current behavior)
Add opt-in public TLS support (enable_public_tls = false by default) to all three cloud examples with zero-diff when disabled.
GCP (examples/simple):
- static_ips, cert_manager_wi, acme_cluster_issuer, dns_records modules
- cert-manager gets GCP Workload Identity SA annotation
- Default ACME CA: Google Trust Services (pki.goog)
AWS (examples/simple):
- cert_manager_irsa, acme_cluster_issuer, route53_dns modules
- cert-manager gets IRSA role annotation
- Route53 ALIAS records point to existing NLB
- Default ACME CA: Let's Encrypt
Azure (examples/simple):
- static_ips, cert_manager_identity, acme_cluster_issuer, dns_records
- cert-manager gets Azure Workload Identity annotation + pod label
- Default ACME CA: Let's Encrypt
All clouds: materialize_instance gets public DNS names, dual issuer refs (ACME external + self-signed internal), and load balancers get optional static IPs when public TLS is enabled.
Test utilities:
- test/utils/helpers/connectivity.go: DNS resolution, TLS certificate, and HTTPS validation helpers
- test/utils/constants.go: add PublicTLSSuffix and MaterializePublicTLSDir
Test fixtures (all clouds):
- Add conditional public TLS module wiring matching the examples
- New variables: enable_public_tls, DNS zone, hostnames, ACME config
- New outputs: conditional balancerd/console hostnames
Test stages (all clouds):
- testPublicTLS stage gated on DNS_ZONE_NAME (GCP/Azure) or ROUTE53_HOSTED_ZONE_ID (AWS) environment variable
- Validates DNS resolution, TLS certificate SANs, and HTTPS connectivity
- Wired into TestFullDeployment as Stage 4 with cleanup in TearDownSuite
Alphadelta14
reviewed
Jun 5, 2026
variable "acme_server" {
  description = "ACME server URL for certificate issuance"
  type        = string
  default     = "https://acme-v02.api.letsencrypt.org/directory"
Member
Always default to staging, since prod is rate limited and will block you if you mess up too many times!
Suggested change
-  default = "https://acme-v02.api.letsencrypt.org/directory"
+  default = "https://acme-staging-v02.api.letsencrypt.org/directory"
  default = null
}

variable "balancerd_dns_names" {
Member
at time of writing, we now support balancerd_extra_dns_names and console_extra_dns_names. we can probably drop those variants
}

variable "enable_public_tls" {
  description = "Enable public TLS with ACME certificates and Route53 DNS"
Member
in aws, cert-manager also supports the aws-privateca-issuer which may be desirable for some enterprises
Human preamble: I am not sure the best way to write / PR / test this change. What follows is a Claude generalization of the work in https://github.com/MaterializeInc/mz-context-graph/pull/2 on GCP to create stable public endpoints + certs for the console + balancer endpoints. I don't really know Terraform, nor have I touched golang in like 7 years 😅
The changes are best reviewed in commit order.
AI writeup ensues:
Add opt-in public TLS support across all three clouds
Adds end-to-end public TLS access to Materialize deployments behind an enable_public_tls = false flag. When enabled, services get publicly-trusted certificates via cert-manager ACME with cloud-native DNS01 solvers, static IPs (GCP/Azure), and DNS records.
Zero-diff when enable_public_tls = false (the default).
Architecture
All three clouds follow the same pattern:
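DNS strategy differs by cloud:
- GCP: reserved regional static IPs with Cloud DNS A records
- AWS: Route53 ALIAS A records pointing to the NLB (no static IPs needed)
- Azure: reserved Standard/Static public IPs with Azure DNS A records
Default ACME CAs:
- GCP: Google Trust Services (pki.goog)
- AWS: Let's Encrypt
- Azure: Let's Encrypt
All configurable via acme_server.
New modules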
- kubernetes/modules/acme-cluster-issuer
- gcp/modules/static_ips: google_compute_address for balancerd + console
- gcp/modules/dns_records
- gcp/modules/cert_manager_wi
- aws/modules/route53_dns
- aws/modules/cert_manager_irsa
- azure/modules/static_ips
- azure/modules/dns_records
- azure/modules/cert_manager_identity
Modified modules
- kubernetes/modules/materialize-instance: internal_issuer_ref, system parameters
- kubernetes/modules/cert-manager: service_account_annotations, pod_labels for cloud IAM
- gcp/modules/load_balancers
- azure/modules/load_balancers
- aws/modules/nlb: nlb_zone_id output for Route53 ALIAS
Tests
New testPublicTLS stage for all three clouds, gated on DNS zone env vars (skipped by default). Validates DNS resolution, TLS certificate SANs, and HTTPS connectivity. Shared helpers in test/utils/helpers/connectivity.go.
Commit structure