Draft

23 commits
71bba6f
Add EKS Auto Mode deployment mode
erikdw Apr 23, 2026
76ff613
Bump tested Helm chart version to 6.1.0
erikdw Apr 23, 2026
f4e9a20
Rework EKS example as pure-config, matching repo convention
erikdw Apr 23, 2026
ad0085a
Derive EKS cluster name from a shared local in the example
erikdw Apr 23, 2026
77f2344
Fix EKS Auto Mode deploy issues surfaced in first end-to-end run
erikdw Apr 23, 2026
dda5219
Open cluster SG on container port 8000 for Auto Mode NLB ingress
erikdw Apr 23, 2026
ab625a6
Tag Auto-Mode-LB-Controller resources with BraintrustDeploymentName
erikdw Apr 23, 2026
26b419c
Add brainstore reader/fastreader helm override examples
erikdw Apr 23, 2026
8cd6521
Collapse EKS first deploy to a single terraform apply
erikdw Apr 23, 2026
158735a
Route LLM-proxy paths to the in-cluster API by default in EKS mode
erikdw Apr 23, 2026
28cf82f
Inline deployment_name in the EKS example, drop the now-unused local
erikdw Apr 23, 2026
fc11624
Zero target-group drain delay to prevent destroy-time finalizer hang
erikdw Apr 23, 2026
47e5739
Drop use_global_ai_proxy toggle and CloudflareProxy origin from EKS mode
erikdw Apr 23, 2026
1c350b2
Add force_destroy_data variable for sandbox/test teardowns
erikdw Apr 23, 2026
9e3dae6
Expose additional module outputs for downstream use
erikdw Apr 23, 2026
3f3b5e4
Drop password / secret-key outputs from the root module
erikdw Apr 23, 2026
398f997
Revert "Add force_destroy_data variable for sandbox/test teardowns"
erikdw Apr 23, 2026
0719738
Fix tflint warnings: drop unused data source and variable
erikdw Apr 23, 2026
b12d460
Simplify Helm value overrides: one YAML file path, drop structured vars
erikdw Apr 24, 2026
ebe5192
Add submodule READMEs and CONTRACT.md deployment-isolation note
erikdw Apr 24, 2026
6148220
Add EKS-mode signal to root README + TROUBLESHOOTING.md runbook
erikdw Apr 24, 2026
e6857ee
Audit doc pass + split out-of-band recovery into RECOVERY.md
erikdw Apr 24, 2026
4edc84e
Add prepare_for_destroy preflight (TBD: keep or drop in PR)
erikdw Apr 25, 2026
126 changes: 126 additions & 0 deletions CONTRACT.md
@@ -0,0 +1,126 @@
# Terraform ↔ Helm Contract

This Terraform module is paired with the Braintrust Helm chart in
[`braintrustdata/helm`](https://github.com/braintrustdata/helm). When
`create_eks_cluster = true`, the module provisions an EKS Auto Mode
cluster and related AWS infrastructure, then deploys the Helm chart on
top of it. Several names, ports, and keys are hardcoded on both sides;
this document enumerates them.

## Pinned chart compatibility

| Field | Value |
|---|---|
| Braintrust Helm chart | `oci://public.ecr.aws/braintrust/helm` |
| Tested chart version | `6.1.0` |
| Supported range | `6.x` |

`helm_chart_version` in the module has no default when `create_eks_cluster = true` — consumers must pin.
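For illustration, a minimal consumer pin might look like the following sketch; the module source path and variable set are assumptions modeled on the examples, not copied from them:

```
# Hypothetical consumer config: source and variable names are
# illustrative, not verbatim from examples/braintrust-data-plane-eks.
module "braintrust_data_plane" {
  source = "github.com/braintrustdata/terraform-braintrust-data-plane"

  deployment_name                  = "braintrust"
  use_deployment_mode_external_eks = true
  create_eks_cluster               = true

  # Required in EKS mode: no default, so the plan fails unless the
  # consumer pins a 6.x chart version explicitly.
  helm_chart_version = "6.1.0"
}
```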

## Coupling surfaces

### Names and identifiers

| Thing | TF location | Chart location | Failure mode |
|---|---|---|---|
| API service account name `braintrust-api` | `modules/eks-deploy/variables.tf` default `api_service_account_name`; used as the Pod Identity association's `service_account` in `modules/eks-deploy/main.tf` | `api.serviceAccount.name` default in chart `values.yaml`; referenced by `api-deployment.yaml` as `serviceAccountName` | **Silent runtime**: pod starts, but the Pod Identity lookup finds no association for that SA name, so AWS SDK calls return 403 |
| Brainstore service account name `brainstore` | `modules/eks-deploy/variables.tf` default `brainstore_service_account_name`; used as the Pod Identity association's `service_account` | `brainstore.serviceAccount.name` default in chart | Silent runtime (same as above) |
| K8s Secret name `braintrust-secrets` | `kubernetes_secret.braintrust` in `modules/eks-deploy/main.tf` | `api-deployment.yaml` and `brainstore-*-deployment.yaml` hardcode `secretKeyRef.name: braintrust-secrets` | Pod fails to start: `CreateContainerConfigError` |
| Secret keys `PG_URL`, `REDIS_URL`, `FUNCTION_SECRET_KEY`, `BRAINSTORE_LICENSE_KEY` | `data = { ... }` in `kubernetes_secret.braintrust` | Referenced by name in chart deployment templates | Pod start-time failure (missing env var) |
| Namespace | `var.eks_namespace` → `kubernetes_namespace.braintrust` in `modules/eks-deploy/main.tf` + passed as template `namespace` var | `global.namespace` (used in configmap to build `BRAINSTORE_*_URL`); runtime namespace resolved via `braintrust.namespace` helper to `.Release.Namespace` when `createNamespace: false` | Pods run in wrong namespace; intra-cluster DNS fails |
| Brainstore NodePool label `braintrust.dev/node-pool: brainstore` | `helm_release.brainstore_nodepool` in `modules/eks-deploy/main.tf`, which installs the local chart at `modules/eks-deploy/charts/brainstore-nodepool/` | `nodeSelector` on the three Brainstore components in `helm-values.yaml.tpl`; chart passes it through to the pod spec unchanged | Brainstore pods stay Pending (no node matches) |
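To make the secret coupling concrete, here is a sketch of what `kubernetes_secret.braintrust` plausibly looks like; the right-hand-side references (`local.*`, `var.*`) are placeholders, not the module's actual expressions:

```
# Sketch only: resource shape inferred from the table above.
resource "kubernetes_secret" "braintrust" {
  metadata {
    # The chart's secretKeyRef.name is hardcoded to this value; renaming
    # either side alone yields CreateContainerConfigError at pod start.
    name      = "braintrust-secrets"
    namespace = kubernetes_namespace.braintrust.metadata[0].name
  }

  data = {
    PG_URL                 = local.pg_url
    REDIS_URL              = local.redis_url
    FUNCTION_SECRET_KEY    = local.function_secret_key
    BRAINSTORE_LICENSE_KEY = var.brainstore_license_key
  }
}
```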

### Network / ports

| Thing | TF location | Chart location | Failure mode |
|---|---|---|---|
| API port `8000` | `aws_cloudfront_vpc_origin.api.http_port` in `modules/eks-cluster/cloudfront.tf`; `aws_vpc_security_group_ingress_rule.nodes_from_nlb` in `modules/eks-cluster/networking.tf` (NLB SG → cluster SG ingress on 8000) | `api.service.port` default `8000`; `api-deployment.yaml` containerPort | **Silent at deploy**: CloudFront → NLB → pod path dead or NLB target-group health checks fail |
| Pre-created NLB adopted via `service.beta.kubernetes.io/aws-load-balancer-name` | `aws_lb.api.name` in `modules/eks-cluster/networking.tf` (exposed as the root's `eks_nlb_name` output) | `api.annotations.service.*` — the Auto-Mode-managed LB Controller reads this annotation | If the chart stops passing annotations through or the controller renames `aws-load-balancer-name`, the controller creates a parallel NLB; CloudFront VPC Origin points at the orphan |
| NLB security group | `aws_security_group.nlb_cloudfront` in `modules/eks-cluster/networking.tf` (NLBs only accept SGs at creation time) | `service.beta.kubernetes.io/aws-load-balancer-security-groups` in `api.annotations.service` | Adopted NLB gets wrong SG; CloudFront can't reach it |
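The NLB adoption dance is the subtlest row above. A sketch of the Terraform half, with argument values assumed rather than copied from `modules/eks-cluster/networking.tf`:

```
# Sketch: pre-create the NLB so CloudFront's VPC Origin has a stable
# target, then let the in-cluster LB Controller adopt it by name via
# the service annotation. Subnet/SG wiring here is illustrative.
resource "aws_security_group" "nlb_cloudfront" {
  name   = "${var.deployment_name}-nlb-cloudfront"
  vpc_id = var.vpc_id
}

resource "aws_lb" "api" {
  name               = "${var.deployment_name}-api-nlb"
  load_balancer_type = "network"
  internal           = true
  subnets            = var.private_subnet_ids

  # NLBs only accept security groups at creation time, which is why the
  # SG must exist here and be echoed in the chart's service annotation.
  security_groups = [aws_security_group.nlb_cloudfront.id]
}
```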

### Helm values schema the module writes

Template at `modules/eks-deploy/assets/helm-values.yaml.tpl`. If any of
these keys moves or is renamed in the chart, the break is silent: the
template writes a dead key and the chart falls back to its default. A
sketch of the Terraform side follows the list.

- `global.orgName`
- `global.createNamespace` (set to `false`)
- `global.namespace`
- `cloud` (set to `"aws"`)
- `skipPgForBrainstoreObjects`
- `brainstoreWalFooterVersion`
- `objectStorage.aws.brainstoreBucket`
- `objectStorage.aws.responseBucket`
- `objectStorage.aws.codeBundleBucket`
- `api.service.type` (set to `LoadBalancer`)
- `api.annotations.service.*` (six NLB-related annotations: `aws-load-balancer-scheme`, `-type`, `-security-groups`, `-name`, `-additional-resource-tags`, `-target-group-attributes`)
- `api.serviceAccount.awsRoleArn`
- `brainstore.serviceAccount.awsRoleArn`
- `brainstore.{reader,fastreader,writer}.nodeSelector`
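A sketch of how the module plausibly renders and applies this template; the release name and the template-variable set are assumptions, and the real resource passes more variables than shown:

```
# Sketch only: the actual resource lives in modules/eks-deploy/main.tf.
resource "helm_release" "braintrust" {
  name       = "braintrust"
  namespace  = kubernetes_namespace.braintrust.metadata[0].name
  repository = "oci://public.ecr.aws/braintrust"
  chart      = "helm"
  version    = var.helm_chart_version

  values = [
    templatefile("${path.module}/assets/helm-values.yaml.tpl", {
      namespace = var.eks_namespace
      # ...the remaining template vars from the list above...
    })
  ]
}
```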

### Feature-flag value domains

TF validates allowed values at `terraform plan` time. The accepted values
for these fields must stay in sync between the module and the chart (a
validation sketch follows the list):

- `brainstoreWalFooterVersion`: TF allows `""`, `"v1"`, `"v2"`, `"v3"`. When the chart adds support for a new version, coordinate updating TF's validation.
- `skipPgForBrainstoreObjects`: TF allows `""`, `"all"`, `"include:…"`, `"exclude:…"`. Chart passes through unchanged.
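The TF side of that sync is a plain `validation` block; a sketch of the `brainstoreWalFooterVersion` case, with the variable name assumed:

```
variable "brainstore_wal_footer_version" {
  description = "Brainstore WAL footer version passed through to the chart."
  type        = string
  default     = ""

  validation {
    # Must track the set of versions the pinned chart understands;
    # update in lockstep when the chart adds a new version.
    condition     = contains(["", "v1", "v2", "v3"], var.brainstore_wal_footer_version)
    error_message = "Must be one of \"\", \"v1\", \"v2\", or \"v3\"."
  }
}
```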

### Pod Identity vs IRSA

This module uses **EKS Pod Identity** (not IRSA) to give the API and
Brainstore pods AWS credentials, because Auto Mode ships the Pod Identity
Agent built-in. Mechanics:

- `services_common` builds an IAM trust policy with `pods.eks.amazonaws.com` as the principal, scoped by session tags (`aws:RequestTag/eks-cluster-arn`, `aws:RequestTag/kubernetes-namespace`) to this specific cluster and namespace.
- `modules/eks-deploy/` creates `aws_eks_pod_identity_association` resources binding `(cluster, namespace, service-account)` to the IAM role.
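A sketch of both halves; resource labels, variable names, and the `aws_eks_cluster` reference are placeholders, since the real code is split across `services_common` and `modules/eks-deploy/`:

```
# Sketch: trust policy scoped by Pod Identity session tags, plus the
# association binding (cluster, namespace, service account) to a role.
data "aws_iam_policy_document" "pod_identity_trust" {
  statement {
    # Pod Identity needs both actions; session tags carry the scoping.
    actions = ["sts:AssumeRole", "sts:TagSession"]

    principals {
      type        = "Service"
      identifiers = ["pods.eks.amazonaws.com"]
    }

    condition {
      test     = "StringEquals"
      variable = "aws:RequestTag/eks-cluster-arn"
      values   = [aws_eks_cluster.this.arn]
    }

    condition {
      test     = "StringEquals"
      variable = "aws:RequestTag/kubernetes-namespace"
      values   = [var.eks_namespace]
    }
  }
}

resource "aws_eks_pod_identity_association" "api" {
  cluster_name = aws_eks_cluster.this.name
  namespace    = var.eks_namespace
  # Must match api.serviceAccount.name in the chart, or credentials
  # silently fail at runtime (first row of the names table above).
  service_account = var.api_service_account_name
  role_arn        = aws_iam_role.api.arn
}
```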

The chart's api/brainstore service-account templates still render an
`eks.amazonaws.com/role-arn: <awsRoleArn>` annotation (the IRSA path).
That's harmless here — AWS SDK credential resolution checks
`AWS_CONTAINER_CREDENTIALS_FULL_URI` (Pod Identity) before
`AWS_WEB_IDENTITY_TOKEN_FILE` (IRSA), so Pod Identity wins and IRSA is
never consulted.

### Assumptions baked into the contract

- **EKS mode assumes a fast reader is always deployed.** The chart defaults `brainstore.fastreader.replicas = 2` and unconditionally emits `BRAINSTORE_FAST_READER_URL` + `BRAINSTORE_FAST_READER_QUERY_SOURCES` from `api-configmap.yaml`, so the API always believes fast readers exist. Callers who override `brainstore.fastreader.replicas: 0` via `eks_helm_values_file` opt out of this contract and own the resulting query failures.
- **Brainstore nodes are NVMe-backed.** The custom NodePool constrains Karpenter to the `c8gd` / `c7gd` / `m7gd` families by default (configurable via `eks_brainstore_nodepool_instance_families`). Brainstore caches data to an `emptyDir` volume on node-local storage; an EBS-backed fallback would be functional but much slower.

## Checklists

### Changing this module

- If the change touches any row of a table above, open a matching issue/PR in `braintrustdata/helm`.
- If you rename a service-account name or the secret name, update both the Pod Identity association and the chart values / secret name in lockstep.

### Changing the Helm chart

- Renaming any `.Values.*` key listed in "Helm values schema" → file an issue here to update `helm-values.yaml.tpl`.
- Renaming `api.serviceAccount.name` or `brainstore.serviceAccount.name` defaults → Pod Identity associations in TF use these as the `service_account` selector; the binding silently breaks without a coordinated TF change.
- Changing the API service port default away from `8000` → CloudFront VPC Origin and NLB SG in TF expect 8000; add a TF variable for port first.
- Adding a new required secret key → TF must populate it in `kubernetes_secret.braintrust`. Coordinate.

### Bumping the chart version used in the example

- Diff `values.yaml` between the old and new chart versions; scan for any key in the "Helm values schema" list above.
- `helm template` the new chart with this module's rendered values and grep for the hardcoded names: `braintrust-api`, `brainstore`, `braintrust-secrets`, the four secret keys, `containerPort: 8000`.

## Deployment isolation: `deployment_name` must be unique per account+region

Several coupling surfaces assume exactly one dataplane per `(AWS account, region, deployment_name)` tuple. Two deployments sharing a `deployment_name` will collide on:

- EKS cluster name (`${deployment_name}-eks`) — AWS rejects the second create.
- NLB name (`${deployment_name}-api-nlb`) — same.
- RDS instance identifier (`${deployment_name}-main`) — same.
- IAM role names (`${deployment_name}-APIHandlerRole`, etc.) — same.
- S3 bucket `bucket_prefix` values — Terraform-generated suffix makes these unique per-apply, but re-apply against a different state would fail on the existing resources.

Two deployments with **distinct** `deployment_name` values in the same account+region coexist successfully; three have been validated simultaneously in a test account. The one remaining cosmetic overlap is the Kubernetes-owned TargetGroup names that the LB Controller auto-generates (`k8s-<ns-8>-<svc-8>-<hash>`). Because the namespace (`braintrust`) and service name (`braintrust-api`) are chart-fixed, every deployment gets an identically-prefixed TG name in the ELB console. The controller does not expose a TG-name override annotation. The `service.beta.kubernetes.io/aws-load-balancer-additional-resource-tags` annotation (set by this module to `BraintrustDeploymentName=${deployment_name}`) tags controller-created resources so operators can disambiguate via `tag:` filter instead of by name.
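As an example of tag-based disambiguation, a hedged sketch using the AWS provider's tagging-API data source (the output name is invented):

```
# Sketch: look up controller-created target groups by the
# BraintrustDeploymentName tag, since the auto-generated k8s-... names
# are identical across deployments.
data "aws_resourcegroupstaggingapi_resources" "deployment_tgs" {
  resource_type_filters = ["elasticloadbalancing:targetgroup"]

  tag_filter {
    key    = "BraintrustDeploymentName"
    values = [var.deployment_name]
  }
}

output "deployment_target_group_arns" {
  value = [for r in data.aws_resourcegroupstaggingapi_resources.deployment_tgs.resource_tag_mapping_list : r.resource_arn]
}
```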

## Future: mechanical drift detection

Manual safety net today. Planned: CI smoke test that renders `helm
template` with TF-shaped fixture values and asserts the contract, plus a
symmetric test in the helm repo.
6 changes: 6 additions & 0 deletions README.md
@@ -12,6 +12,10 @@ To use this module, **copy the [`examples/braintrust-data-plane`](examples/brain

The default configuration is a large production-sized deployment. Please consider that when testing and adjust the configuration to use smaller sized resources.

### EKS Auto Mode deployment

An alternative EKS-based deployment is available via `create_eks_cluster = true` (requires `use_deployment_mode_external_eks = true`). In that mode, the module provisions an EKS Auto Mode cluster and deploys Braintrust as pods via the [Braintrust Helm chart](https://github.com/braintrustdata/helm), replacing the Lambda-based ingress and EC2-based Brainstore paths. See [`examples/braintrust-data-plane-eks`](examples/braintrust-data-plane-eks) for the production example and [`examples/braintrust-data-plane-eks-sandbox`](examples/braintrust-data-plane-eks-sandbox) for a cheap disposable sandbox variant. See [`TROUBLESHOOTING.md`](TROUBLESHOOTING.md) for apply/destroy failure runbooks and [`RECOVERY.md`](RECOVERY.md) for disaster-recovery scenarios (e.g. out-of-band cluster deletion).

If you're using a brand new AWS account for your Braintrust data plane, you will need to run `./scripts/create-service-linked-roles.sh` once to ensure the IAM service-linked roles are created.

## Module Configuration
@@ -22,6 +26,8 @@ All module input variables and outputs are documented inline in the module's Ter
### dump-logs.sh
This script will dump the logs for the given deployment and services to the `logs-<deployment_name>` directory. This is useful for debugging issues with the data plane and sharing with the Braintrust team.

**Note:** this script covers the Lambda + EC2 Brainstore deployment mode only. In EKS mode (`create_eks_cluster = true`) the chart doesn't ship logs to CloudWatch, and this script has nothing to fetch; use `kubectl logs` instead. See [`TROUBLESHOOTING.md`](TROUBLESHOOTING.md) and [`RECOVERY.md`](RECOVERY.md) for EKS-mode runbooks.

```
# ./dump-logs.sh <deployment_name> [--minutes N] [--service <svc1,svc2,...|all>]

```
49 changes: 49 additions & 0 deletions RECOVERY.md
@@ -0,0 +1,49 @@
# Recovery

Disaster-recovery runbooks for the EKS Auto Mode deployment mode (`create_eks_cluster = true`). Scenarios here involve significant state mismatch between Terraform and AWS — recovery requires state-level intervention, not just a re-run of `terraform apply`.

Routine apply/destroy failures belong in [`TROUBLESHOOTING.md`](TROUBLESHOOTING.md) instead.

## Out-of-band cluster deletion

### Symptom

`terraform plan` or `terraform apply` fails at the refresh step with an error like:

```
Error: reading EKS Cluster (<deployment_name>-eks): couldn't find resource
```

The EKS cluster no longer exists in AWS, but Terraform state still references it (and many Kubernetes/Helm resources that depended on it).

### Cause

The EKS cluster was destroyed outside Terraform — AWS console, a stray `aws eks delete-cluster`, an account-cleanup script, etc. The module's kubernetes/helm provider configuration reads cluster endpoint + CA from module outputs that trace back to the `aws_eks_cluster` resource; with the cluster gone, those outputs become unreadable, so refresh fails before Terraform can plan or apply anything.

### Recovery

1. List the orphaned Kubernetes and Helm resources in state:

```
terraform state list | grep -E "kubernetes_|helm_release"
```

2. Remove each of them from Terraform state. They already don't exist in AWS/Kubernetes (the cluster is gone), so this is a pure state-cleanup operation:

```
terraform state rm '<address_1>' '<address_2>' ...
```

3. Re-run `terraform apply`. Terraform plans a fresh creation of the cluster, Pod Identity associations, namespace, secret, and Helm releases.

Expected runtime to recreate is similar to a fresh deploy (~15 minutes).

### When this runbook does NOT apply

`terraform destroy` handles in-band cluster deletion correctly — the dependency graph drains Kubernetes resources before destroying the cluster. This runbook is only needed when the cluster is destroyed out-of-band while in-cluster state still exists in Terraform.

### Why the module accepts this failure mode

The EKS module sources the kubernetes/helm provider configuration from module outputs rather than a `data.aws_eks_cluster` lookup. This is what enables single-apply bootstrap — on the first run, module outputs are "known after apply" and Terraform defers provider resolution until the cluster exists. A data source would've read at refresh (pre-plan) and failed the first `terraform plan`, requiring a two-step `-target`'d apply.
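A sketch of that wiring, with output names invented for illustration:

```
# Sketch: provider credentials flow from module outputs, so on the
# first apply they are "known after apply" and resolution is deferred
# until the cluster exists. The same wiring fails at refresh if the
# cluster is deleted out-of-band.
provider "kubernetes" {
  host                   = module.braintrust.eks_cluster_endpoint
  cluster_ca_certificate = base64decode(module.braintrust.eks_cluster_certificate_authority)

  exec {
    api_version = "client.authentication.k8s.io/v1beta1"
    command     = "aws"
    args        = ["eks", "get-token", "--cluster-name", module.braintrust.eks_cluster_name]
  }
}
```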

The tradeoff: if the cluster goes missing between applies (out-of-band deletion), the same mechanism that deferred provider resolution now fails to read the missing cluster. The recovery ritual above is the cost. For the target audience of sophisticated self-hosted-data-plane operators this is an acceptable trade; a broader-audience module might choose two-step apply instead.