Add fully Terraform-managed EKS deployment mode #232
Closed
Erik Weathers (erikdw) wants to merge 1 commit into main
Introduces `create_eks_cluster = true`, which provisions an EKS cluster, the supporting AWS infrastructure, and the Braintrust Helm release end-to-end. Previously `use_deployment_mode_external_eks` assumed the cluster was managed outside Terraform; the new mode lets the module own the full lifecycle.
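From a consumer's side, opting in might look roughly like the sketch below. Only the `create_eks_cluster` and `eks_node_*` variable names come from this PR; the `source` path, the `deployment_name` value, and all sizing values are illustrative assumptions, and any other inputs the module requires are omitted.

```hcl
module "braintrust" {
  # Hypothetical relative path; point this at wherever the module lives.
  source = "../../"

  # Assumed existing input; the cluster name is derived from it.
  deployment_name = "braintrust"

  # New in this PR: let the module own the full cluster lifecycle.
  create_eks_cluster = true

  # Optional EKS knobs (placeholder values, not recommendations).
  eks_node_instance_type = "m6i.xlarge"
  eks_node_min_size      = 2
  eks_node_max_size      = 6
  eks_node_desired_size  = 3
}
```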
## Structure
Two new submodules under `modules/`, plus a thin root-level wiring file (`eks.tf`).
### `modules/eks-cluster/` — AWS infrastructure
- EKS cluster via `terraform-aws-modules/eks` v21
- OIDC provider and IRSA trust policies (OIDC-only), scoped to the `braintrust-api` and `brainstore` service accounts
- Core addons: `vpc-cni`, `coredns`, `kube-proxy`
- Private-subnet tagging (`kubernetes.io/role/internal-elb`) for Load Balancer Controller discovery
- Pre-created internal NLB with a CloudFront-restricted security group. NLB security groups cannot be attached after creation, so the module creates the NLB itself and lets the LB Controller adopt it via the `service.beta.kubernetes.io/aws-load-balancer-name` annotation
- CloudFront VPC Origin and distribution: default behavior routes to the EKS API; AI-proxy paths route to `braintrustproxy.com`
- IAM role for the AWS Load Balancer Controller (IRSA)
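To illustrate the IRSA scoping in the list above, here is a minimal sketch of an OIDC-only trust policy for the `braintrust-api` service account. The variable names and the `braintrust` namespace default are assumptions made for illustration; the module's actual policy may be structured differently.

```hcl
# Hypothetical inputs; in the module these would come from the EKS cluster outputs.
variable "oidc_provider_arn" {
  type = string
}

variable "oidc_issuer_host" {
  description = "OIDC issuer URL without the https:// prefix"
  type        = string
}

variable "namespace" {
  type    = string
  default = "braintrust" # assumed namespace name
}

data "aws_iam_policy_document" "api_irsa_trust" {
  statement {
    effect  = "Allow"
    actions = ["sts:AssumeRoleWithWebIdentity"]

    # Only the cluster's OIDC provider may assume the role.
    principals {
      type        = "Federated"
      identifiers = [var.oidc_provider_arn]
    }

    # Restrict to a single service account in a single namespace.
    condition {
      test     = "StringEquals"
      variable = "${var.oidc_issuer_host}:sub"
      values   = ["system:serviceaccount:${var.namespace}:braintrust-api"]
    }

    # Require the token audience to be STS.
    condition {
      test     = "StringEquals"
      variable = "${var.oidc_issuer_host}:aud"
      values   = ["sts.amazonaws.com"]
    }
  }
}
```

A second document with `brainstore` in the `sub` condition would cover the other service account.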
### `modules/eks-deploy/` — Kubernetes + Helm
- Kubernetes namespace
- Runtime `Secret` with keys `PG_URL`, `REDIS_URL`, `FUNCTION_SECRET_KEY`, `BRAINSTORE_LICENSE_KEY`. The name (`braintrust-secrets`) and keys are hardcoded by the chart.
- Helm release: AWS Load Balancer Controller
- Helm release: Braintrust chart, with a thin values template that sets only what the module owns (org name, namespace, `cloud`, S3 buckets, IRSA role ARNs, NLB adoption annotations, WAL / no-PG flags). Chart defaults handle everything else.
- Structured per-component overrides — `api_helm`, `brainstore_reader_helm`, `brainstore_fastreader_helm`, `brainstore_writer_helm` — each accepting optional `replicas` and `resources`. Raw-YAML `helm_chart_extra_values` as an escape hatch for anything the structured variables do not cover.
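One plausible shape for a structured override variable is sketched below. The variable name `api_helm` and its `replicas` and `resources` attributes come from this PR; the inner layout of `resources` (Kubernetes-style `requests` and `limits` maps) is an assumption. `optional()` in object types requires Terraform 1.3 or later.

```hcl
variable "api_helm" {
  description = "Optional overrides for the API component of the Braintrust chart."

  type = object({
    replicas = optional(number)
    # Assumed layout mirroring Kubernetes resource requirements.
    resources = optional(object({
      requests = optional(map(string))
      limits   = optional(map(string))
    }))
  })

  # An empty object leaves every attribute null, so chart defaults apply.
  default = {}
}
```

A caller would then set only the attributes it cares about, for example:

```hcl
api_helm = {
  replicas = 3
  resources = {
    requests = { cpu = "1", memory = "2Gi" }
  }
}
```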
### Why two submodules (and a root-level wiring file)
`services_common` creates IAM roles shared with the non-EKS (Lambda / EC2) path, so it must sit at the root between `eks_cluster` (produces trust policies) and `eks_deploy` (consumes role ARNs). Wrapping both EKS submodules in a single parent would create a module-level dependency cycle through `services_common`: `eks_deploy` would need role ARNs from `services_common`, while `services_common` would need trust policies from `eks_cluster`, and Terraform treats module I/O atomically for cycle detection.
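The data flow described above can be sketched as three sibling module calls at the root. Only the module names, `create_eks_cluster`, and the `override_*_trust_policy` naming pattern come from this PR; the `services_common` source path and every input and output name shown here are hypothetical.

```hcl
variable "create_eks_cluster" {
  type    = bool
  default = false
}

# Produces the IRSA trust policies.
module "eks_cluster" {
  count  = var.create_eks_cluster ? 1 : 0
  source = "./modules/eks-cluster"
}

# Shared with the non-EKS path, so it is always instantiated.
module "services_common" {
  source = "./modules/services_common" # hypothetical path

  # Hypothetical names: the trust policy is overridden only in EKS mode.
  override_api_trust_policy = var.create_eks_cluster ? module.eks_cluster[0].api_trust_policy_json : null
}

# Consumes the role ARNs created by services_common.
module "eks_deploy" {
  count  = var.create_eks_cluster ? 1 : 0
  source = "./modules/eks-deploy"

  api_role_arn = module.services_common.api_role_arn # hypothetical names
}
```

Keeping the three calls as siblings makes the chain `eks_cluster` → `services_common` → `eks_deploy` explicit at one level of the configuration.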
## Changes to existing root files
- `main.tf`: `services_common` uses `eks_cluster`'s trust policies as `override_*_trust_policy` when `create_eks_cluster = true`; the `database` and `redis` `authorized_security_groups` include the EKS node security group in EKS mode.
- `outputs.tf`: `api_url` and `cloudfront_*` outputs resolve to the EKS CloudFront distribution in EKS mode. Adds EKS-specific outputs plus database, Redis, storage, and IAM outputs consumed by downstream integrations.
- `variables.tf`: new EKS knobs — `create_eks_cluster`, `eks_node_instance_type`, `eks_node_min_size`, `eks_node_max_size`, `eks_node_desired_size`, `eks_kubernetes_version`, `helm_chart_version`, and the four structured helm-override variables plus the raw-YAML escape hatch.
- `versions.tf`: notes that `kubernetes`, `helm`, and `random` are declared in `modules/eks-deploy`. Non-EKS consumers still need empty provider blocks at the root because Terraform aggregates provider requirements across all submodules regardless of `count`, but the underlying resources are never evaluated when `create_eks_cluster = false`.
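For a consumer that leaves `create_eks_cluster = false`, the empty provider blocks mentioned in the `versions.tf` note would look like the sketch below. This follows the description in this PR; depending on the Terraform version, an implicit empty configuration may also be accepted.

```hcl
# Root configuration of a non-EKS consumer. No cluster exists, so the
# providers are declared without connection settings; no kubernetes or
# helm resources are evaluated in this mode.
provider "kubernetes" {}

provider "helm" {}
```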
## Example
`examples/braintrust-data-plane-eks/` is a thin consumer — provider configuration plus a single module call — demonstrating the two-step apply workflow required on a fresh deployment:
    terraform apply -target=module.braintrust.module.eks_cluster[0]
    terraform apply
Step 1 creates the cluster so the `kubernetes` and `helm` providers in `provider.tf` can resolve its endpoint via `data.aws_eks_cluster` looked up by the known name `${deployment_name}-eks`. Step 2 deploys the Kubernetes namespace, `Secret`, and Helm releases.
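A minimal sketch of a `provider.tf` that follows this pattern is shown below. The data source lookup by `${deployment_name}-eks` comes from the description; token authentication via `aws_eks_cluster_auth` and the resource labels are assumptions, and the example in this PR may authenticate differently (for instance with an `exec` plugin). The nested `kubernetes { ... }` block is the Helm provider 2.x syntax; 3.x expects `kubernetes = { ... }` instead.

```hcl
variable "deployment_name" {
  type = string
}

# Fails if the cluster does not exist yet, which is why step 1 must run first.
data "aws_eks_cluster" "this" {
  name = "${var.deployment_name}-eks"
}

# Assumed auth method: a short-lived token issued for the cluster.
data "aws_eks_cluster_auth" "this" {
  name = data.aws_eks_cluster.this.name
}

provider "kubernetes" {
  host                   = data.aws_eks_cluster.this.endpoint
  cluster_ca_certificate = base64decode(data.aws_eks_cluster.this.certificate_authority[0].data)
  token                  = data.aws_eks_cluster_auth.this.token
}

provider "helm" {
  kubernetes {
    host                   = data.aws_eks_cluster.this.endpoint
    cluster_ca_certificate = base64decode(data.aws_eks_cluster.this.certificate_authority[0].data)
    token                  = data.aws_eks_cluster_auth.this.token
  }
}
```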
## Contract
`CONTRACT.md` documents the coupling surface between this module and `braintrustdata/helm`: service account names, `Secret` name and keys, API port `8000`, the helm-values schema this module writes, and the assumption that `brainstore.fastreader.replicas >= 1` — the chart's `api-configmap.yaml` unconditionally emits `BRAINSTORE_FAST_READER_URL`, so `replicas = 0` would leave the API pointing at an empty service. A mirror of `CONTRACT.md` lives in the helm repo.
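The `replicas >= 1` assumption could also be enforced at plan time with a variable validation. The sketch below is one way to do that and is not something this PR is stated to include; the type is simplified to the `replicas` attribute only.

```hcl
variable "brainstore_fastreader_helm" {
  type = object({
    replicas = optional(number)
  })
  default = {}

  validation {
    # coalesce() treats an unset (null) value as 1, so chart defaults still pass.
    condition     = coalesce(var.brainstore_fastreader_helm.replicas, 1) >= 1
    error_message = "brainstore_fastreader_helm.replicas must be at least 1 because the chart always emits BRAINSTORE_FAST_READER_URL."
  }
}
```

Using `coalesce` avoids comparing `null` against a number, which Terraform would reject.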
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Superseded by: #233