Add fully Terraform-managed EKS deployment mode #232

Closed

Erik Weathers (erikdw) wants to merge 1 commit into main from erikdw/add-eks-deployment-mode

Conversation

@erikdw


Introduces `create_eks_cluster = true`, which provisions an EKS cluster, the supporting AWS infrastructure, and the Braintrust Helm release end-to-end. Previously `use_deployment_mode_external_eks` assumed the cluster was managed outside Terraform; the new mode lets the module own the full lifecycle.

## Structure

Two new submodules under `modules/`, plus a thin root-level wiring file (`eks.tf`).

### `modules/eks-cluster/` — AWS infrastructure

- EKS cluster via `terraform-aws-modules/eks` v21
- OIDC provider and IRSA trust policies, OIDC-only and scoped to the `braintrust-api` and `brainstore` service accounts (a trust-policy sketch follows this list)
- Core addons: `vpc-cni`, `coredns`, `kube-proxy`
- Private-subnet tagging (`kubernetes.io/role/internal-elb`) for Load Balancer Controller discovery
- Pre-created internal NLB with a CloudFront-restricted security group. NLB security groups cannot be attached after creation, so the module creates the NLB itself and lets the LB Controller adopt it via the `service.beta.kubernetes.io/aws-load-balancer-name` annotation
- CloudFront VPC Origin and distribution: the default behavior routes to the API service running in EKS; AI-proxy paths route to `braintrustproxy.com`
- IAM role for the AWS Load Balancer Controller (IRSA)
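
For readers unfamiliar with IRSA, the trust policies above follow the standard `sts:AssumeRoleWithWebIdentity` pattern. A minimal sketch, assuming the OIDC provider resource is named `eks` and the service accounts live in a `braintrust` namespace (both assumptions; the module's actual identifiers may differ):

    # Sketch only: the resource name and "braintrust" namespace are assumptions.
    data "aws_iam_policy_document" "api_trust" {
      statement {
        actions = ["sts:AssumeRoleWithWebIdentity"]

        principals {
          type        = "Federated"
          identifiers = [aws_iam_openid_connect_provider.eks.arn]
        }

        # Scope the role to exactly one service account; OIDC-only means no
        # EC2 or Lambda principals appear in the statement.
        condition {
          test     = "StringEquals"
          variable = "${replace(aws_iam_openid_connect_provider.eks.url, "https://", "")}:sub"
          values   = ["system:serviceaccount:braintrust:braintrust-api"]
        }

        condition {
          test     = "StringEquals"
          variable = "${replace(aws_iam_openid_connect_provider.eks.url, "https://", "")}:aud"
          values   = ["sts.amazonaws.com"]
        }
      }
    }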

### `modules/eks-deploy/` — Kubernetes + Helm

- Kubernetes namespace
- Runtime `Secret` with keys `PG_URL`, `REDIS_URL`, `FUNCTION_SECRET_KEY`, `BRAINSTORE_LICENSE_KEY`. The name (`braintrust-secrets`) and keys are hardcoded by the chart.
- Helm release: AWS Load Balancer Controller
- Helm release: Braintrust chart, with a thin values template that sets only what the module owns (org name, namespace, `cloud`, S3 buckets, IRSA role ARNs, NLB adoption annotations, WAL / no-PG flags). Chart defaults handle everything else.
- Structured per-component overrides — `api_helm`, `brainstore_reader_helm`, `brainstore_fastreader_helm`, `brainstore_writer_helm` — each accepting optional `replicas` and `resources`. Raw-YAML `helm_chart_extra_values` as an escape hatch for anything the structured variables do not cover.
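
As a sketch of the structured-override shape (the exact `variables.tf` schema may differ, and the `resources` type here is a guess):

    variable "api_helm" {
      description = "Structured Helm overrides for the API component."
      type = object({
        replicas  = optional(number)
        resources = optional(map(map(string))) # e.g. { requests = { cpu = "500m" } }
      })
      default = {}
    }

A consumer would then pass something like `api_helm = { replicas = 3 }` and reach for `helm_chart_extra_values` only for settings without a structured knob.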

### Why two submodules (and a root-level wiring file)

`services_common` creates IAM roles shared with the non-EKS (Lambda / EC2) path, so it must sit at the root between `eks_cluster` (produces trust policies) and `eks_deploy` (consumes role ARNs). Wrapping both EKS submodules in a single parent would create a module-level dependency cycle through `services_common`: `eks_deploy` would need role ARNs from `services_common`, while `services_common` would need trust policies from `eks_cluster`, and Terraform treats module I/O atomically for cycle detection.
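
Concretely, the root wiring looks something like the following (argument and output names here are illustrative, not the module's actual interface):

    module "eks_cluster" {
      source = "./modules/eks-cluster"
      count  = var.create_eks_cluster ? 1 : 0
      # ...
    }

    module "services_common" {
      source = "./modules/services-common"

      # Trust policies flow from eks_cluster into services_common...
      override_api_trust_policy = var.create_eks_cluster ? module.eks_cluster[0].api_trust_policy : null
    }

    module "eks_deploy" {
      source = "./modules/eks-deploy"
      count  = var.create_eks_cluster ? 1 : 0

      # ...and the resulting role ARNs flow into eks_deploy. At the root each
      # edge is a plain reference, so no single module both feeds and consumes
      # services_common.
      api_role_arn = module.services_common.api_role_arn
    }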

## Changes to existing root files

- `main.tf`: `services_common` uses `eks_cluster`'s trust policies as `override_*_trust_policy` when `create_eks_cluster = true`; the `database` and `redis` `authorized_security_groups` include the EKS node security group in EKS mode.
- `outputs.tf`: `api_url` and `cloudfront_*` outputs resolve to the EKS CloudFront distribution in EKS mode. Adds EKS-specific outputs plus database, Redis, storage, and IAM outputs consumed by downstream integrations.
- `variables.tf`: new EKS knobs — `create_eks_cluster`, `eks_node_instance_type`, `eks_node_min_size`, `eks_node_max_size`, `eks_node_desired_size`, `eks_kubernetes_version`, `helm_chart_version`, and the four structured helm-override variables plus the raw-YAML escape hatch.
- `versions.tf`: notes that `kubernetes`, `helm`, and `random` are declared in `modules/eks-deploy`. Non-EKS consumers still need empty provider blocks at the root because Terraform aggregates provider requirements across all submodules regardless of `count`, but the underlying resources are never evaluated when `create_eks_cluster = false`.
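
For a non-EKS consumer, satisfying those aggregated requirements amounts to declaring empty blocks at the root, along the lines of:

    # Required only so Terraform can resolve provider requirements; never
    # configured or used when create_eks_cluster = false.
    provider "kubernetes" {}
    provider "helm" {}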

## Example

`examples/braintrust-data-plane-eks/` is a thin consumer — provider configuration plus a single module call — demonstrating the two-step apply workflow required on a fresh deployment:

    terraform apply -target=module.braintrust.module.eks_cluster[0]
    terraform apply

Step 1 creates the cluster so the `kubernetes` and `helm` providers in `provider.tf` can resolve its endpoint via `data.aws_eks_cluster` looked up by the known name `${deployment_name}-eks`. Step 2 deploys the Kubernetes namespace, `Secret`, and Helm releases.
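
A sketch of what that `provider.tf` wiring could look like, assuming exec-based auth and helm provider 2.x block syntax (both assumptions):

    data "aws_eks_cluster" "this" {
      name = "${var.deployment_name}-eks"
    }

    provider "kubernetes" {
      host                   = data.aws_eks_cluster.this.endpoint
      cluster_ca_certificate = base64decode(data.aws_eks_cluster.this.certificate_authority[0].data)

      # Fetch a short-lived token at plan/apply time instead of relying on a
      # static kubeconfig.
      exec {
        api_version = "client.authentication.k8s.io/v1beta1"
        command     = "aws"
        args        = ["eks", "get-token", "--cluster-name", data.aws_eks_cluster.this.name]
      }
    }

    provider "helm" {
      kubernetes {
        host                   = data.aws_eks_cluster.this.endpoint
        cluster_ca_certificate = base64decode(data.aws_eks_cluster.this.certificate_authority[0].data)

        exec {
          api_version = "client.authentication.k8s.io/v1beta1"
          command     = "aws"
          args        = ["eks", "get-token", "--cluster-name", data.aws_eks_cluster.this.name]
        }
      }
    }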

## Contract

`CONTRACT.md` documents the coupling surface between this module and `braintrustdata/helm`: service account names, `Secret` name and keys, API port `8000`, the helm-values schema this module writes, and the assumption that `brainstore.fastreader.replicas >= 1` — the chart's `api-configmap.yaml` unconditionally emits `BRAINSTORE_FAST_READER_URL`, so `replicas = 0` would leave the API pointing at an empty service. A mirror of `CONTRACT.md` lives in the helm repo.
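
One way to surface that constraint at plan time, rather than as a broken `BRAINSTORE_FAST_READER_URL` at runtime, would be a variable validation (a sketch of the pattern, not a claim that the module does this):

    variable "brainstore_fastreader_helm" {
      type = object({
        replicas  = optional(number)
        resources = optional(map(map(string)))
      })
      default = {}

      validation {
        condition     = coalesce(var.brainstore_fastreader_helm.replicas, 1) >= 1
        error_message = "brainstore.fastreader.replicas must be >= 1; the chart always emits BRAINSTORE_FAST_READER_URL."
      }
    }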

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@erikdw
Author

Superseded by: #233
