
Add EKS Auto Mode deployment mode #233

Draft
Erik Weathers (erikdw) wants to merge 23 commits into main from erikdw/eks-auto-mode

Conversation


Erik Weathers (erikdw) commented Apr 23, 2026

Add EKS Auto Mode deployment mode

Summary

Adds a create_eks_cluster = true deployment mode that provisions a complete Braintrust dataplane on EKS Auto Mode instead of the existing Lambda + EC2 path. All Braintrust workloads (API, brainstore reader / fastreader / writer) run in-cluster as pods and are deployed via the Braintrust Helm chart.

When enabled, the module owns:

  • A new EKS Auto Mode cluster (${deployment_name}-eks)
  • All cluster + node IAM roles
  • Pod Identity associations binding Kubernetes service accounts to AWS IAM roles
  • A custom Karpenter NodePool/NodeClass constraining Brainstore to NVMe-backed instance families (delivered via a small in-repo Helm chart)
  • A pre-created NLB and CloudFront VPC Origin, adopted by the AWS Load Balancer Controller at Helm release time
  • The Braintrust Helm release itself, with values rendered from Terraform

Everything else (VPC, RDS, ElastiCache, S3, KMS, API/Brainstore IAM) is shared with the existing Lambda/EC2 path.

The Lambda, EC2 Brainstore, and Lambda-URL ingress submodules are disabled in this mode (gated by use_deployment_mode_external_eks = true, which create_eks_cluster = true requires).

Why EKS Auto Mode

Auto Mode lets AWS manage the control plane add-ons (VPC CNI, CoreDNS, kube-proxy, Pod Identity Agent, AWS Load Balancer Controller, EBS CSI driver) and node lifecycle (via a managed Karpenter). The same capabilities on self-managed EKS mean owning the install, IAM configuration, version-compatibility matrix, and upgrade choreography for each addon — plus whichever subset of {Karpenter, Pod Identity Agent, EBS CSI, metrics-server} matches your feature choices. None of it is individually hard; collectively it's real recurring work on every cluster upgrade. Auto Mode hands that coordination surface to AWS in exchange for the managed-mode premium and some lost flexibility. This module therefore uses Auto Mode exclusively rather than self-managed EKS or the terraform-aws-modules/eks community module.

Usage

Minimal example

See examples/braintrust-data-plane-eks/ for the production-sized canonical config, or examples/braintrust-data-plane-eks-sandbox/ for a cheap disposable sandbox variant (smaller RDS and Redis, plus a values.yaml alongside it that shrinks every chart component to 1 replica with tight CPU/memory limits). The shortest working invocation:

module "braintrust-data-plane" {
  source = "../../"

  deployment_name        = "my-deployment"
  braintrust_org_name    = "my-org"
  brainstore_license_key = var.brainstore_license_key

  use_deployment_mode_external_eks = true
  create_eks_cluster               = true

  helm_chart_version = "6.1.0"

  # ...plus the usual postgres_*, redis_*, etc.
}

Single-apply bootstrap

terraform apply. One command.

Cold first-deploy runtime is ~15 minutes (cluster ~8-10, then RDS + Redis + Helm release). Subsequent applies are incremental.

Two design choices make the one-command path work:

  1. Provider config is sourced from module outputs, not data.aws_eks_cluster. The example's provider.tf reads module.braintrust-data-plane.eks_cluster_endpoint, eks_cluster_ca_certificate_data, and eks_cluster_name directly off the module. Terraform treats these as "known after apply" on the first run and defers provider resolution until the cluster exists. A data source, by contrast, reads at refresh (pre-plan) and would fail on a fresh deploy — that was the reason the first iteration of this module required a -target'd two-step apply.
  2. NodeClass + NodePool are delivered via helm_release, not kubernetes_manifest. kubernetes_manifest reads CRD schemas from the live cluster at plan time to validate manifests, which fails on a fresh deploy; Helm renders templates locally and applies at apply time, with no plan-time cluster dependency. The CRDs live in an in-repo chart at modules/eks-deploy/charts/brainstore-nodepool/.

Architecture

Module layout

New submodules:

  • modules/eks-cluster/ — EKS cluster, cluster+node IAM roles, NLB pre-creation, CloudFront VPC Origin wiring, CloudFront distribution.
  • modules/eks-deploy/ — the Kubernetes / Helm layer: namespace, braintrust-secrets Secret, Pod Identity associations, the brainstore-nodepool helm release (NodeClass + NodePool), and the braintrust helm release itself.

New in-repo Helm chart:

  • modules/eks-deploy/charts/brainstore-nodepool/ — tiny chart with just two templates (NodeClass + NodePool). Not published anywhere; lives with the Terraform source so the module is self-contained.

Top-level eks.tf wires the three submodules together. Root-level main.tf is touched only lightly (for services_common to receive the EKS cluster ARN for Pod Identity trust scoping).

Module ordering

module.eks_cluster → module.services_common → module.eks_deploy
  1. eks_cluster provisions the cluster and exports its ARN.
  2. services_common builds the API + Brainstore IAM roles with Pod Identity trust policies scoped to (cluster_arn, namespace, service_account).
  3. eks_deploy creates the Pod Identity associations binding SAs to roles, plus the namespace / Secret / brainstore-nodepool chart / Braintrust helm release.

This is why the EKS layer is split into two submodules rather than one: services_common is shared with the non-EKS path, so it must stay at the root, and it sits in the middle of the chain (it consumes the cluster ARN from eks_cluster and produces the role ARNs consumed by eks_deploy), so folding both EKS submodules into a single parent module would create a dependency cycle through services_common.
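A hedged sketch of the root-level wiring (source paths, variable names, and output names here are illustrative, not necessarily the module's exact identifiers):

# Illustrative only: how the root might chain the three modules.
module "eks_cluster" {
  source = "./modules/eks-cluster"
  count  = var.create_eks_cluster ? 1 : 0
  # VPC, subnet, and NLB inputs elided
}

module "services_common" {
  source                  = "./modules/services-common" # hypothetical path
  enable_eks_pod_identity = var.create_eks_cluster
  eks_cluster_arn         = var.create_eks_cluster ? module.eks_cluster[0].cluster_arn : null
}

module "eks_deploy" {
  source                  = "./modules/eks-deploy"
  count                   = var.create_eks_cluster ? 1 : 0
  api_handler_role_arn    = module.services_common.api_handler_role_arn
  brainstore_iam_role_arn = module.services_common.brainstore_iam_role_arn
  # cluster endpoint/CA, secrets, and helm inputs elided
}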

Key design decisions

  • Pod Identity, not IRSA. Auto Mode ships the Pod Identity Agent preinstalled. Pod Identity uses simpler trust policies, supports session tags, and doesn't require an OIDC provider. The module creates aws_eks_pod_identity_association resources for both the braintrust-api and brainstore service accounts. The chart still writes an IRSA-style eks.amazonaws.com/role-arn annotation on the service accounts; this is harmless because Pod Identity intercepts AWS SDK credential resolution before IRSA is consulted.

  • Pre-created NLB adopted by the Load Balancer Controller. The CloudFront VPC Origin needs the NLB ARN at plan time, but the Load Balancer Controller normally creates NLBs on demand when a Service becomes type: LoadBalancer. The module pre-creates the NLB in Terraform (aws_lb.api), and the chart's Service uses service.beta.kubernetes.io/aws-load-balancer-name + aws-load-balancer-security-groups annotations to have the controller adopt the existing NLB rather than create a new one. Security groups can only be attached to an NLB at creation time, which is why the NLB SG is also owned by Terraform.

  • Custom Brainstore NodePool. Brainstore caches to local NVMe SSD via emptyDir, so its pods need NVMe-backed EC2 families (c8gd, c7gd, m7gd, etc.). Auto Mode's default general-purpose NodePool doesn't constrain to those families, so the module adds a custom NodeClass + NodePool that does, and Brainstore pods target it via the braintrust.dev/node-pool: brainstore nodeSelector injected into the Helm values.

  • NodePool delivered via helm_release, not kubernetes_manifest. kubernetes_manifest reads CRD schemas from the live cluster at plan time. That's incompatible with single-apply bootstrap because the cluster doesn't exist yet on the first plan. Wrapping the two manifests in a tiny local Helm chart moves the cluster contact to apply time. The rendered objects are structurally identical to what kubernetes_manifest produced — verified by rendering the chart and diffing field-by-field against the old values, including the tricky aws:eks:cluster-name colon-key.

  • Provider config from module outputs, not data.aws_eks_cluster. The example's provider.tf reads eks_cluster_endpoint, eks_cluster_ca_certificate_data, and eks_cluster_name directly off the module. Terraform treats module outputs that trace back to "known after apply" resource attributes as unknown at plan time and defers provider resolution until the cluster exists. A data source reads at refresh (pre-plan) and would fail on a fresh deploy.

  • Exec auth for the Kubernetes/Helm providers, not static tokens. The example's provider.tf uses exec { aws eks get-token } rather than the simpler aws_eks_cluster_auth data source. The static-token pattern expires after 15 minutes — short enough to fail if an apply sits at an approval prompt or if the operator walks away between terraform plan and terraform apply. Exec auth refreshes on every API call and requires only the AWS CLI on the runner (which consumers need anyway, for aws eks update-kubeconfig).
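A minimal sketch of that provider.tf pattern, assuming the helm provider 2.x nested kubernetes block syntax; the example file may differ in detail:

provider "kubernetes" {
  host                   = module.braintrust-data-plane.eks_cluster_endpoint
  cluster_ca_certificate = base64decode(module.braintrust-data-plane.eks_cluster_ca_certificate_data)

  exec {
    api_version = "client.authentication.k8s.io/v1beta1"
    command     = "aws"
    args        = ["eks", "get-token", "--cluster-name", module.braintrust-data-plane.eks_cluster_name]
  }
}

provider "helm" {
  kubernetes {
    host                   = module.braintrust-data-plane.eks_cluster_endpoint
    cluster_ca_certificate = base64decode(module.braintrust-data-plane.eks_cluster_ca_certificate_data)

    exec {
      api_version = "client.authentication.k8s.io/v1beta1"
      command     = "aws"
      args        = ["eks", "get-token", "--cluster-name", module.braintrust-data-plane.eks_cluster_name]
    }
  }
}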

Destroy choreography

TBD whether to ship this in the PR. The prepare_for_destroy mechanism below is implemented and validated, but it overlaps with the chart-level annotation already shipped in fc11624. If the chart annotation reliably propagates to the live TG on every supported chart version, this mechanism is belt-and-suspenders and could be cut to keep the surface area smaller. It earns its keep when the chart annotation doesn't propagate (older chart, manual overrides, broken-state cluster) — exactly the case that motivated the manual kubectl patch runbook in TROUBLESHOOTING.md and produced an orphan TG on a real sandbox teardown. Decide before merge.

Tearing down an EKS-mode deployment is a two-step apply→destroy:

  1. Set prepare_for_destroy = true and run terraform apply.
  2. Run terraform destroy.

The preflight resources live in modules/eks-deploy/main.tf behind count = var.prepare_for_destroy ? 1 : 0:

  • kubernetes_annotations.api_drain_zero patches service.beta.kubernetes.io/aws-load-balancer-target-group-attributes on the api Service to deregistration_delay.timeout_seconds=0. Belt-and-suspenders: the chart's helm-values.yaml.tpl already sets this, but this resource forces the live annotation back to the right value if it ever drifted (older chart, manual override, broken state).
  • terraform_data.api_tg_drain_zero calls aws elbv2 modify-target-group-attributes directly on every TargetGroup tagged BraintrustDeploymentName=<deployment_name>. Faster path than waiting for the LB Controller's reconcile loop to propagate the annotation, and works even on TGs created before the annotation was set.
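A condensed sketch of the gated pair (resource names from this PR; the exact Service name, namespace, and TargetGroup lookup in the module may differ):

resource "kubernetes_annotations" "api_drain_zero" {
  count       = var.prepare_for_destroy ? 1 : 0
  api_version = "v1"
  kind        = "Service"
  force       = true

  metadata {
    name      = "braintrust-api" # assumed Service name
    namespace = "braintrust"     # assumed namespace
  }

  annotations = {
    "service.beta.kubernetes.io/aws-load-balancer-target-group-attributes" = "deregistration_delay.timeout_seconds=0"
  }
}

resource "terraform_data" "api_tg_drain_zero" {
  count = var.prepare_for_destroy ? 1 : 0

  provisioner "local-exec" {
    # Hypothetical one-liner: find every TG tagged for this deployment and zero its drain wait.
    command = <<-EOT
      for tg in $(aws resourcegroupstaggingapi get-resources \
          --resource-type-filters elasticloadbalancing:targetgroup \
          --tag-filters Key=BraintrustDeploymentName,Values=${var.deployment_name} \
          --query 'ResourceTagMappingList[].ResourceARN' --output text); do
        aws elbv2 modify-target-group-attributes --target-group-arn "$tg" \
          --attributes Key=deregistration_delay.timeout_seconds,Value=0
      done
    EOT
  }
}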

With drain wait at zero, the LB Controller releases its service.eks.amazonaws.com/resources finalizer the moment helm uninstall deletes the api Service, finishes its own TG cleanup, and helm_release.braintrust returns in seconds. No kubectl patch workarounds, no orphan TargetGroups left in AWS.

Why this exists: in earlier sandbox tear-downs, terraform destroy froze for ~5 min on the helm_release while LBC waited out the default 300s drain timer. The manual workaround (kubectl -n braintrust patch svc braintrust-api --type merge -p '{"metadata":{"finalizers":null}}') unblocks the destroy but interrupts the controller mid-cleanup, leaving an orphan TG behind. prepare_for_destroy is the supported alternative.

Scope: service infra only. Data-bearing resources have separate, explicit knobs:

  • RDS — DANGER_disable_database_deletion_protection = true (existing) flips the RDS deletion_protection attribute. Required for terraform destroy to remove the database; intentionally not bundled into prepare_for_destroy.
  • S3 — buckets are deliberately not exposed as destroyable from Terraform. No force_destroy toggle, no DANGER_* flag. If you need to tear a sandbox down completely, empty the buckets manually first (aws s3 rm s3://<bucket> --recursive, then aws s3api delete-objects for non-current versions and delete-markers if versioning was enabled). The cost of a stray destroy hitting a data bucket is too high to mitigate with a flag.

New variables

All new variables have defaults except helm_chart_version, which is required when create_eks_cluster = true.

| Variable | Default | Purpose |
| --- | --- | --- |
| create_eks_cluster | false | Master switch for this mode. Requires use_deployment_mode_external_eks = true. |
| eks_kubernetes_version | "1.31" | Kubernetes version for the EKS cluster. |
| eks_brainstore_nodepool_instance_families | ["c8gd", "c7gd", "m7gd"] | EC2 families Karpenter may pick from for Brainstore nodes. Must be NVMe-backed. |
| helm_chart_version | null | Required when create_eks_cluster = true. No default so chart upgrades are always deliberate. |
| eks_helm_values_file | null | Path to a YAML file with Helm values overrides, merged in after the module's rendered defaults. Idiomatic usage: eks_helm_values_file = "${path.module}/values.yaml" so the file lives alongside your main.tf. Leave null to accept chart defaults. See the braintrust-data-plane-eks-sandbox example for sandbox-sized values. |
| prepare_for_destroy | false | Pre-flight before terraform destroy. Flip true, apply, then destroy. Zeroes deregistration_delay on the LB Controller's TargetGroup(s) so the finalizer doesn't hang helm_release.braintrust on destroy and the controller cleans up its own TGs (no orphans). EKS-mode only. See Destroy choreography above and TROUBLESHOOTING.md. |

New module outputs

The three starred outputs below are required by the example's provider.tf to configure the kubernetes/helm providers from module outputs instead of a data.aws_eks_cluster lookup — which is what enables single-apply bootstrap. The rest are broadly useful for downstream consumers wiring this module into larger deployments (IAM references for external Pod Identity associations, Postgres/Redis connection details for downstream Kubernetes Secret construction, S3 bucket names for downstream IAM policy templates, NLB identifiers, etc.).

| Output | Sensitive | Purpose |
| --- | --- | --- |
| eks_cluster_name ★ | no | Cluster name (used in the aws eks get-token exec arg in provider.tf). |
| eks_cluster_endpoint ★ | no | API server endpoint for the kubernetes/helm providers' host. |
| eks_cluster_ca_certificate_data ★ | yes | Base64-encoded cluster CA. Consumed by the kubernetes/helm providers' cluster_ca_certificate (after base64decode()). |
| eks_cluster_security_group_id | no | Cluster SG (attached to Auto Mode nodes). Useful for authoring additional inbound rules from external sources. |
| eks_nlb_arn | no | ARN of the pre-created internal NLB adopted by the LB Controller. |
| eks_nlb_name | no | NLB name (referenced by the chart's aws-load-balancer-name annotation). |
| nlb_security_group_id | no | Security group attached to the NLB. |
| code_bundle_bucket_id | no | S3 bucket for code bundles. |
| lambda_responses_bucket_id | no | S3 bucket for lambda responses. |
| postgres_database_address | no | Postgres hostname. |
| postgres_database_port | no | Postgres port. |
| redis_endpoint | no | Redis hostname. |
| redis_port | no | Redis port. |
| api_handler_role_arn | no | IAM role ARN for the braintrust-api service account. |
| brainstore_iam_role_arn | no | IAM role ARN for the brainstore service account (also the EC2 role on the EC2-Brainstore path). |

★ = required for the single-apply provider.tf pattern.

Module ↔ Helm chart contract

The module and chart are tightly coupled — several names, ports, keys, and paths have to match exactly on both sides. The full list is documented in CONTRACT.md (tested chart version 6.1.0, supported range 6.x). Highlights:

  • K8s Secret name braintrust-secrets and its keys (PG_URL, REDIS_URL, FUNCTION_SECRET_KEY, BRAINSTORE_LICENSE_KEY)
  • Service account names (braintrust-api, brainstore) used in both the Pod Identity associations and the chart
  • API container port 8000 — used by the CloudFront VPC Origin and by the cluster SG ingress rule that admits NLB traffic
  • The four aws-load-balancer-* service annotations the controller reads to adopt our pre-created NLB, plus aws-load-balancer-additional-resource-tags for deployment-scoped tagging of controller-created resources
  • Brainstore nodeSelector label braintrust.dev/node-pool: brainstore

Any of these moving or renaming on the chart side breaks us, often silently. Drift detection between the module and chart is a follow-up (below).
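As one illustration of that coupling, a hedged sketch of the Secret the module writes; the name and keys are fixed by the chart, while the local value names here are illustrative:

resource "kubernetes_secret" "braintrust_secrets" {
  metadata {
    name      = "braintrust-secrets" # hardcoded by the chart
    namespace = "braintrust"
  }

  data = {
    PG_URL                 = local.postgres_url                              # assumed local
    REDIS_URL              = local.redis_url                                 # assumed local
    FUNCTION_SECRET_KEY    = random_password.function_secret_key.result      # assumed resource
    BRAINSTORE_LICENSE_KEY = var.brainstore_license_key
  }
}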

Tradeoffs accepted with single-apply bootstrap

Single-apply is strictly an improvement if the target audience can handle the failure modes below. For Braintrust's self-hosted data plane audience (sophisticated operators), the judgment is that it is. For less-experienced consumers, two-step would have been safer. This PR accepts the tradeoff.

| # | Tradeoff | Likelihood | Severity | Recovery |
| --- | --- | --- | --- | --- |
| 1 | Out-of-band cluster deletion (AWS console, cleanup script, aws eks delete-cluster) breaks terraform plan at refresh — the provider can no longer read eks_cluster_endpoint. | Medium | Medium | terraform state rm the kubernetes_* and helm_release.* resources, then terraform apply to recreate. Full runbook in RECOVERY.md. |
| 2 | -targeted partial destroy of just the cluster orphans K8s state with no way to reach it. | Low (anti-pattern) | Medium | Same as #1. |
| 3 | "Known after apply" values in provider config generate warnings on first plan. Terraform ≥1.3 tolerates this, but future provider/TF versions could tighten the rules. | Low (fine today, future risk) | Low | Revert the provider config to a data.aws_eks_cluster lookup + -target two-step. Change is reversible. |
| 4 | Helm-managed NodeClass/NodePool. If the helm release secret for brainstore-nodepool gets corrupted or a customer kubectl deletes the CRs out of band, recovery is rougher than before. | Low | Low | helm uninstall brainstore-nodepool -n braintrust + terraform apply. |
| 5 | Local chart templates don't validate at plan time. Syntax errors in charts/brainstore-nodepool/templates/*.yaml fail at apply time, not plan. | Low | Low | Fix chart, re-apply. Could add helm lint to CI later. |
| 6 | CRD availability race: helm_release.brainstore_nodepool could theoretically try to create a NodeClass/NodePool before Auto Mode finishes installing the Karpenter CRDs. | Very low | Low | Retry apply. Not observed; Auto Mode CRDs are present by the time the cluster is kubectl-reachable. |

In-band terraform destroy still works correctly via the dependency graph — Terraform drains K8s resources first, then the cluster.

Known limitations / follow-ups

  • TG naming. Controller-generated TG name k8s-braintru-braintru-* still collides visually across deployments. The additional-resource-tags fix makes the tags disambiguating, but the name itself isn't configurable on the controller side. Low priority.
  • Drift detection between module and chart. CONTRACT.md enumerates the coupling surfaces, but there's no automated check. A CI smoke test that renders the chart against the module's template values and grep-asserts the known-good keys would prevent silent breakage on chart upgrades. Deferred.
  • Dedicated node per Brainstore pod. The EC2-Brainstore path runs each Brainstore component on its own instance; on EKS today multiple Brainstore pods can land on the same Karpenter-provisioned node (as we observed in testing — all 3 readers + the api pod on one c8gd.*). That's fine for sandbox throughput but defeats the isolation/headroom story of the EC2 path for production workloads. Fix lives in the Helm chart (pod anti-affinity on braintrust.dev/brainstore-role), not in this module. Deferred.
  • Explicit NodePool for the API component. Today the API pod has no nodeSelector and falls through to Auto Mode's default general-purpose NodePool, with opaque instance-family selection managed by AWS. Brainstore, by contrast, targets the module-owned brainstore NodePool pinned to NVMe-Graviton families. Parallel follow-up: add a second custom NodePool (api) in the local chart with its own eks_api_nodepool_instance_families variable (default non-NVMe Graviton compute: c8g/c7g/m7g). Brings the API pod under the same explicit control as Brainstore — predictable instance selection, consistent operational model, and enables per-pool tuning of disruption/consolidation policy (API pods tolerate more aggressive consolidation than Brainstore). Implementation is ~50 lines of chart YAML + one variable + one helm-values-template edit; guard the nodeSelector rendering on the pool being enabled to avoid a stuck-Pending footgun. Deferred.

Testing

End-to-end validated in a sandbox AWS account:

  • Fresh single-terraform apply succeeds from an empty AWS account.
  • All 4 pods (braintrust-api, brainstore-fastreader, brainstore-reader, brainstore-writer) reach Running 1/1.
  • Brainstore pods land on a Karpenter-provisioned node in the custom NodePool with an NVMe-backed instance type (c8gd.xlarge observed).
  • curl https://<cloudfront-domain>/ returns 200 OK + Hello World! from the API through CloudFront → NLB → pod.
  • Subsequent applies are idempotent and don't recreate the NLB or CloudFront.
  • Verified TG carries the BraintrustDeploymentName tag after the tagging commit.
  • Verified coexistence with a separate manual dataplane install in the same AWS account — no cross-deployment interference at the AWS resource layer (tags and cluster-scoping keep them isolated).
  • Verified the brainstore-nodepool chart renders to the exact same Kubernetes manifests as the previous kubernetes_manifest resources (field-by-field JSON diff, including the aws:eks:cluster-name colon-key YAML parse).
  • Ran smoke test from a real Braintrust org in the braintrust.dev UI:
    • Configured to use the CloudFront API URL
    • Created a service token
    • Successfully interacted with ChatGPT from the Playground

Erik Weathers (erikdw) and others added 21 commits April 22, 2026 18:38
Introduces `create_eks_cluster = true`, which provisions an EKS Auto Mode cluster and deploys the Braintrust Helm chart on it end-to-end. Uses raw AWS provider resources (no `terraform-aws-modules/eks` dependency) and EKS Pod Identity for pod-to-IAM binding.

## Why Auto Mode

Auto Mode collapses most of the yak shave for a production EKS deployment:

- **Node provisioning**: AWS runs Karpenter internally; no managed node group to define.
- **Core addons**: `vpc-cni`, `coredns`, `kube-proxy`, EBS CSI driver, and the AWS Load Balancer Controller come preinstalled — no `aws_eks_addon` resources, no LB Controller IAM role / Helm release.
- **Pod Identity**: the Pod Identity Agent ships built-in, enabling a simpler alternative to IRSA (no OIDC provider, no TLS-thumbprint wrangling, no `data.tls_certificate`).

This module therefore only has to own the cluster + node IAM roles, the VPC wiring, the pre-created NLB + CloudFront distribution, and the Braintrust-specific K8s objects.

## Structure

Two submodules under `modules/`, with a thin root-level wiring file (`eks.tf`). Both use only AWS provider primitives — no community module.

### `modules/eks-cluster/` — AWS infrastructure

- `aws_iam_role` for the cluster, with Auto Mode's five required managed policies attached: `AmazonEKSClusterPolicy`, `AmazonEKSComputePolicy`, `AmazonEKSBlockStoragePolicy`, `AmazonEKSLoadBalancingPolicy`, `AmazonEKSNetworkingPolicy`.
- `aws_iam_role` for Auto Mode nodes, with `AmazonEKSWorkerNodeMinimalPolicy` and `AmazonEC2ContainerRegistryPullOnly`.
- `aws_eks_cluster` with `compute_config`, `storage_config`, and `kubernetes_network_config.elastic_load_balancing` all enabled. `access_config.authentication_mode = "API"` uses EKS access entries (no aws-auth configmap). A condensed sketch follows this list.
- Pre-created internal NLB (`aws_lb`) with a CloudFront-prefix-list security group. NLB security groups cannot be attached after creation, so the module creates the NLB itself; the Auto-Mode-managed LB Controller adopts it later via the chart's `service.beta.kubernetes.io/aws-load-balancer-name` annotation.
- `aws_cloudfront_vpc_origin` wrapping the NLB, plus an `aws_cloudfront_distribution` whose default behavior routes to the EKS API and whose AI-proxy paths route to `braintrustproxy.com`.
- Private subnet tags (`kubernetes.io/role/internal-elb`) for LB Controller subnet auto-discovery.
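Condensed, hedged sketch of the `aws_eks_cluster` bullet above for Auto Mode (names and referenced roles/subnets are illustrative, not this module's exact code; assumes an AWS provider recent enough to carry the Auto Mode arguments):

resource "aws_eks_cluster" "this" {
  name     = "${var.deployment_name}-eks"
  version  = var.eks_kubernetes_version
  role_arn = aws_iam_role.cluster.arn

  bootstrap_self_managed_addons = false # Auto Mode rejects CreateCluster otherwise

  access_config {
    authentication_mode = "API" # access entries, no aws-auth configmap
  }

  compute_config {
    enabled       = true
    node_pools    = ["general-purpose", "system"]
    node_role_arn = aws_iam_role.node.arn
  }

  storage_config {
    block_storage {
      enabled = true
    }
  }

  kubernetes_network_config {
    elastic_load_balancing {
      enabled = true
    }
  }

  vpc_config {
    subnet_ids = var.private_subnet_ids # assumed variable name
  }
}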

### `modules/eks-deploy/` — Kubernetes + Helm

- `kubernetes_namespace` for Braintrust workloads.
- `kubernetes_secret` (`braintrust-secrets`) with `PG_URL`, `REDIS_URL`, `FUNCTION_SECRET_KEY`, `BRAINSTORE_LICENSE_KEY`. The name and keys are hardcoded by the chart.
- `aws_eks_pod_identity_association` resources for the `braintrust-api` and `brainstore` service accounts, binding each to its IAM role from `services_common`.
- `kubernetes_manifest` for a custom `NodeClass` (Auto Mode API: `eks.amazonaws.com/v1`) and `NodePool` (`karpenter.sh/v1`) that constrain Karpenter to NVMe-backed instance families (`c8gd`, `c7gd`, `m7gd` by default, configurable via `eks_brainstore_nodepool_instance_families`). Brainstore pods pin to this NodePool via a `braintrust.dev/node-pool: brainstore` nodeSelector in helm values.
- `helm_release` for the Braintrust chart, with a thin values template that sets only what this module owns and structured per-component overrides (`eks_api_helm`, `eks_brainstore_{reader,fastreader,writer}_helm`) plus a raw-YAML `eks_helm_chart_extra_values` escape hatch.

### Why two submodules

`services_common` creates IAM roles shared with the non-EKS (Lambda / EC2) deployment path, so it must sit at the root between `eks_cluster` (which provides the cluster ARN used to scope Pod Identity trust policies) and `eks_deploy` (which consumes the resulting role ARNs). Wrapping both EKS submodules in a single parent would create a module-level dependency cycle through `services_common`.

## Pod Identity (not IRSA)

Auto Mode's Pod Identity Agent intercepts AWS SDK credential resolution before IRSA is consulted, so pods authenticate via Pod Identity even though the chart still emits an `eks.amazonaws.com/role-arn` annotation (the IRSA path). The module:

- Sets `enable_eks_pod_identity = true` on `services_common` and passes it the cluster ARN. `services_common` builds a trust policy with the `pods.eks.amazonaws.com` principal, scoped via session tags (`aws:RequestTag/eks-cluster-arn`, `aws:RequestTag/kubernetes-namespace`) to this specific cluster and namespace.
- Creates an `aws_eks_pod_identity_association` for each service account, binding `(cluster, namespace, service-account)` to the IAM role.

No OIDC provider, no TLS cert thumbprint management.
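Minimal sketch of the two associations (variable wiring is illustrative):

resource "aws_eks_pod_identity_association" "api" {
  cluster_name    = var.eks_cluster_name # illustrative wiring from eks_cluster
  namespace       = var.eks_namespace    # "braintrust" by default
  service_account = "braintrust-api"
  role_arn        = var.api_handler_role_arn # from services_common
}

resource "aws_eks_pod_identity_association" "brainstore" {
  cluster_name    = var.eks_cluster_name
  namespace       = var.eks_namespace
  service_account = "brainstore"
  role_arn        = var.brainstore_iam_role_arn
}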

## Changes to existing root files

- `main.tf`: `services_common` gets `enable_eks_pod_identity = true` + the EKS cluster ARN when `create_eks_cluster = true`. `database` and `redis` `authorized_security_groups` include the cluster's primary security group (which Auto Mode attaches to all nodes) in EKS mode.
- `outputs.tf`: `api_url` and `cloudfront_*` outputs resolve to the EKS CloudFront distribution in EKS mode. No other outputs added — existing output contract is otherwise unchanged.
- `variables.tf`: new EKS knobs (`create_eks_cluster`, `eks_kubernetes_version`, `eks_brainstore_nodepool_instance_families`, `helm_chart_version`, and the four structured helm-override variables plus the raw-YAML escape hatch).
- `versions.tf`: notes that `kubernetes`, `helm`, and `random` are declared in `modules/eks-deploy`. Non-EKS consumers must still declare empty provider blocks at the root because Terraform aggregates provider requirements across all submodules regardless of `count`, but the underlying resources are never evaluated when `create_eks_cluster = false`.

## Example

`examples/braintrust-data-plane-eks/` is a thin consumer — provider configuration plus a single module call — demonstrating the two-step apply workflow required on a fresh deployment:

    terraform apply -target=module.braintrust.module.eks_cluster[0]
    terraform apply

Step 1 creates the cluster so the `data.aws_eks_cluster` lookup in `provider.tf` (keyed by the statically-knowable name `${deployment_name}-eks`) can resolve. Step 2 plans the kubernetes and helm resources, including the `kubernetes_manifest` NodeClass/NodePool which require Auto Mode's CRDs to exist on the cluster before plan time.

The kubernetes and helm providers use a static token from `data.aws_eks_cluster_auth`. Step 2's runtime is well under the 15-minute token TTL because Auto Mode's in-cluster setup is fast and `helm_release` defaults to a 5-minute wait timeout.

## Contract

`CONTRACT.md` documents the coupling surface between this module and `braintrustdata/helm`: service account names, `Secret` name and keys, API port `8000`, the helm-values schema this module writes, the Pod-Identity-over-IRSA precedence, and the assumption that `brainstore.fastreader.replicas >= 1`.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
`braintrustdata/helm` released `6.1.0` today. Audit of the 5.0.1 → 6.1.0 diff against this module's coupling surface (service account names, `braintrust-secrets` name and keys, API port `8000`, values-schema keys the template writes, `brainstore.{reader,fastreader,writer}.nodeSelector`) came back clean — nothing on the contract moved.

What actually changed in 6.x:

- Image tags bumped `v1.1.32` → `v2.0.0` (chart semver policy treats image major bumps as chart major bumps).
- `skipPgForBrainstoreObjects` and `brainstoreWalFooterVersion` are now top-level `values.yaml` defaults (this module was already writing them at top-level, so no template change needed).
- Chart now emits additional Brainstore env vars derived from existing values (`BRAINSTORE_RESPONSE_CACHE_URI`, `BRAINSTORE_CODE_BUNDLE_URI`, `BRAINSTORE_ASYNC_SCORING_OBJECTS`, `BRAINSTORE_LOG_AUTOMATIONS_OBJECTS`, `BRAINSTORE_WAL_USE_EFFICIENT_FORMAT`) and adds `checksum/config` annotations on deployments so pods restart when the configmap changes.

None of it requires a template, values-schema, or variable change on this side.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Aligns `examples/braintrust-data-plane-eks/` with the style of the other examples in this directory (`braintrust-data-plane/`, `braintrust-data-plane-sandbox/`): literal values in `main.tf`, `variables.tf` reduced to just the sensitive/per-deployment `brainstore_license_key`.

Before, the example had a variable for every knob it set on the module (`deployment_name`, `braintrust_org_name`, `helm_chart_version`, `eks_namespace`, `brainstore_wal_footer_version`, `skip_pg_for_brainstore_objects`), which meant a user copying the example had to wire up `.tfvars` or `-var` flags for all of them. Now the example ships with sensible defaults as literals, users edit the values directly in their copy of `main.tf`, and only the license key flows through a variable (consistent with the sandbox and production examples).

Also:

- Module block renamed `module "braintrust"` → `module "braintrust-data-plane"` to match the other examples' naming.
- `helm_chart_version = "6.1.0"` pinned as a literal.
- `eks_cluster_name` local in `provider.tf` hardcoded to `"braintrust-eks"` with a comment noting it must match `${deployment_name}-eks` from `main.tf` (the `var.deployment_name` reference was dropped along with the variable).
- Output references updated to the new module block name.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
`provider.tf` needs the EKS cluster name to configure the kubernetes and helm providers' `data.aws_eks_cluster` lookup. Previously this was a hardcoded literal (`"braintrust-eks"`) with a comment asking the user to keep it in sync with `deployment_name` in `main.tf`. That split-source-of-truth bit in practice: changing `deployment_name` in `main.tf` without also updating `provider.tf` silently points the providers at a nonexistent cluster, and step 2 of the two-step apply fails.

Move `deployment_name` into a `locals` block at the top of `main.tf`. Terraform merges locals across files in the same module, so `provider.tf` can compute `eks_cluster_name = "${local.deployment_name}-eks"` without duplicating the string. One place to edit.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Learnings from standing up the Auto Mode deployment for the first time:

- aws_eks_cluster: set bootstrap_self_managed_addons = false. Auto Mode
  rejects CreateCluster otherwise, since its built-in addons conflict
  with the self-managed bootstrap path.
- Brainstore NodeClass: scope subnetSelectorTerms to this deployment's
  VPC via BraintrustDeploymentName. `kubernetes.io/role/internal-elb`
  alone matches subnets in other VPCs (default VPC, other clusters in
  the same region), making Karpenter pick a subnet in the wrong VPC
  and fail RunInstances with a cross-VPC SG/subnet error.
- Brainstore NodeClass: drop custom tags. AmazonEKSComputePolicy gates
  ec2:CreateLaunchTemplate on a tag-key allowlist; any extra key fails
  the controller's IAM pre-check.
- Brainstore NodePool: switch instance-family requirement key from
  karpenter.k8s.aws/instance-family to eks.amazonaws.com/instance-family.
  Auto Mode restricts requirement domains and the karpenter.k8s.aws one
  isn't accepted.
- helm_release timeout: bump to 1200s. Cold first deploys take longer
  than the 300s default (Karpenter node provisioning + three large
  Brainstore image pulls + readiness).
- VPC private-subnet lifecycle: ignore_changes on the
  kubernetes.io/role/internal-elb tag so Terraform doesn't fight
  aws_ec2_tag (from modules/eks-cluster) on every apply.
- Example provider.tf: switch kubernetes/helm auth from the 15-min
  static aws_eks_cluster_auth token to exec { aws eks get-token } so
  long applies and extended approval-prompt pauses don't fail with
  expired-token errors.
- Example main.tf: expand the two-step-apply doc comment with the
  zsh-globbing caveat, and explain the 400 GB gp3 IOPS/throughput
  threshold that trips up smaller (sandbox) postgres_storage_size
  values.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Auto Mode's Load Balancer Controller uses `ip` target-type for NLBs,
which sends health checks and traffic directly to pod IPs on the
container port (8000) — not via the NodePort on a node IP. When the
NLB SG is pre-created and attached via the
`aws-load-balancer-security-groups` annotation, the controller only
opens the NodePort range on the cluster SG (the rule it would need for
`instance` target-type) and leaves the container port unreachable.
Result: TCP health checks time out, TG stays unhealthy, NLB has no
backends, CloudFront hangs.

Replace the NodePort-range rule (30000-32767) with a single TCP 8000
rule from the NLB SG to the cluster SG. NodePort wasn't being used by
the `ip` target-type path anyway, so removing it is safe and avoids
carrying a misleading rule.
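Hedged sketch of the replacement rule (security-group references are illustrative, not the module's exact identifiers):

resource "aws_vpc_security_group_ingress_rule" "nlb_to_api_pods" {
  security_group_id            = aws_eks_cluster.this.vpc_config[0].cluster_security_group_id # cluster SG attached to nodes/pods
  referenced_security_group_id = aws_security_group.nlb.id                                    # pre-created NLB SG
  from_port                    = 8000
  to_port                      = 8000
  ip_protocol                  = "tcp"
  description                  = "NLB to API pods on the container port (ip target-type)"
}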

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The LB Controller names TargetGroups auto-generated from the Service
as `k8s-<ns-8>-<svc-8>-<hash>`, and doesn't expose an override. For a
Braintrust dataplane the namespace and service names are fixed
(braintrust/braintrust-api), so every deployment in an AWS account
ends up with TGs named `k8s-braintru-braintru-*` — visually
indistinguishable in the console even though they're functionally
isolated by the controller's cluster-scoping tag.

Add the `aws-load-balancer-additional-resource-tags` annotation so the
controller tags its TGs (and listeners) with BraintrustDeploymentName,
matching the tag scheme we already use on Terraform-owned resources.
Now `tag:BraintrustDeploymentName` is a reliable way to identify all
AWS resources belonging to a specific dataplane deployment.

Wire deployment_name into the helm-values template to pass through.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The example only showed api and brainstore writer overrides; reader and
fastreader were undocumented even though they have the same structured
override variables. Add them so all four chart components have a
copy-pasteable sandbox sizing example.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two blockers made the initial EKS deploy require a -target'd two-step
apply:

1. The example's provider.tf looked up the cluster via
   data.aws_eks_cluster, which reads at refresh (pre-plan) and fails
   if the cluster doesn't exist yet. There's no way to "defer" a data
   source read through the initial plan.

2. The NodeClass and NodePool were delivered via kubernetes_manifest,
   which reads CRD schemas from the live cluster at plan time to
   validate the manifest. On a fresh deploy the cluster doesn't exist
   and the plan fails.

Neither has to be this way:

1. Expose the cluster endpoint, CA data, and name as root-module
   outputs. The example's provider.tf reads those instead of the data
   source. Terraform treats module outputs that trace back to unknown
   resource attributes as "known after apply" and defers provider
   resolution — no data source, no refresh-time failure.

2. Replace kubernetes_manifest for the NodeClass + NodePool with a
   helm_release pointing at a tiny local chart
   (modules/eks-deploy/charts/brainstore-nodepool/). Helm renders
   templates locally and applies at apply time, so there's no
   plan-time cluster contact.

Result: single `terraform apply` from an empty AWS account brings up
everything — VPC, cluster, RDS, Redis, S3, IAM, NodeClass/NodePool,
Braintrust Helm release — in one command.

Tradeoff we accepted: if the cluster is destroyed out of band while
Terraform state still references in-cluster resources, refresh will
fail because the cluster outputs become unreadable. Recovery is
`terraform state rm` of the kubernetes_*/helm_release resources
followed by `terraform apply`. In-band `terraform destroy` is handled
correctly by the dependency graph.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The EKS CloudFront distribution was unconditionally routing
/function/*, /v1/proxy*, and /v1/eval* to the CloudflareProxy origin
(braintrustproxy.com). For a self-hosted dataplane this is wrong on
two counts:

- Request payloads round-trip through Braintrust's hosted proxy rather
  than staying inside the customer's AWS account — defeating a core
  reason for self-hosting.
- The preflight OPTIONS that browsers send for these paths hits a
  Cloudflare 404 with no CORS headers, so the UI (braintrust.dev) fails
  every cross-origin request to these paths.

Fix: default `target_origin_id` for those path patterns to
`EKSAPIOrigin` (the in-cluster API pod via the NLB — standalone-api
serves these paths in Dataplane 2.0). Mirrors the Lambda ingress
module's default behavior, where paths route to the local AIProxy
Lambda unless `use_global_ai_proxy = true`.

Expose the same `use_global_ai_proxy` toggle so both modes have
identical semantics — opt-in to `braintrustproxy.com` if Braintrust
instructs, otherwise stay local.

The root-level `var.use_global_ai_proxy` already existed (shared with
the Lambda path); this wires it through to the EKS cluster submodule.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The example's `locals { deployment_name = ... }` block only existed so
provider.tf could derive the EKS cluster name from it without a
duplicate literal. Since provider.tf now reads the cluster name from
module outputs directly (`module.braintrust-data-plane.eks_cluster_name`),
the local has no remaining cross-file use. Fold the constant back into
the module call and carry the comment on the attribute instead.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Observed failure mode: `terraform destroy` freezes for ~5 minutes on
helm_release.braintrust because the LB Controller holds the
`service.eks.amazonaws.com/resources` finalizer on the api Service
while it waits for target-group drain to complete. The default
deregistration delay is 300s. In failure-mode states (cluster never
had nodes register, a failed helm install, pods never reached Ready)
the drain wait is spent on nothing — there are no targets to drain —
but LB Controller respects it anyway. To the operator, `terraform
destroy` looks hung; the hang resolves only after a manual
`kubectl patch svc ... --patch '{"metadata":{"finalizers":null}}'`.

Hit this class three times now: yesterday on the 2nd deployment, and
today on both a failed redux apply and the intentional destroy of the
2nd deployment.

Fix: annotate the api Service with
`aws-load-balancer-target-group-attributes: deregistration_delay.timeout_seconds=0`.
Zero drain wait means the NLB deregisters targets instantly, the
finalizer clears immediately, and `helm uninstall` (and therefore
`terraform destroy`) converges in seconds.

Safe for production: the drain delay exists to let in-flight
connections finish before a target is removed. For a stateless HTTP
API fronted by CloudFront (which retries on connection failure), a
few aborted connections on scale-in or destroy are acceptable.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The `use_global_ai_proxy` toggle was carried over from the Lambda
ingress module, where it exists to let Braintrust's own
multi-tenant SaaS deployment route through braintrustproxy.com
instead of the local AIProxy Lambda. For self-hosted customers
there's no reason to route through Braintrust's hosted proxy —
doing so defeats the point of self-hosting and requires
Braintrust-side registration of the customer's deployment to
work at all.

Hardcode the LLM-proxy path routing to the in-cluster API
(`EKSAPIOrigin`), remove the `CloudflareProxy` origin from the
distribution entirely, and drop the `use_global_ai_proxy` variable
from the EKS cluster submodule. Lambda mode is unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`terraform destroy` on a non-empty dataplane currently requires two
manual cleanup steps before it'll succeed:

1. RDS instance has deletion_protection=true by default. Destroy errors
   with `InvalidParameterCombination: Cannot delete protected DB
   Instance`. Fix today: `aws rds modify-db-instance
   --no-deletion-protection --apply-immediately` out of band.
2. S3 buckets are versioned and non-empty (especially the Brainstore
   bucket which accumulates WAL + cache). Destroy errors with
   `BucketNotEmpty: The bucket you tried to delete is not empty. You
   must delete all versions`. Fix today: write a loop against
   list-object-versions + delete-objects for every bucket.

For real customer deployments this safety is the right default — it
prevents accidental data loss on a typo'd destroy. For sandbox / CI /
throwaway deployments the friction is painful.

New root variable `force_destroy_data` (default: false). When true:

- Every S3 bucket gets `force_destroy = true`, so destroy empties the
  bucket (all versions + delete markers) before deleting it.
- RDS deletion_protection is disabled (OR'd with the existing
  `DANGER_disable_database_deletion_protection` toggle).
- RDS `skip_final_snapshot = true`, so destroy doesn't block on
  snapshot creation.

Default stays at false, so existing consumers are unaffected. Sandbox
users set `force_destroy_data = true` in their example main.tf and
subsequent destroys are a single command.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Restore a set of root-module outputs that were previously pruned.
They're broadly useful for consumers wiring this module into larger
deployments — IAM role ARNs (for Pod Identity / IRSA references),
Postgres and Redis connection details (for Kubernetes Secret
construction from the root module's state), S3 bucket names (for
downstream IAM policy templates), and EKS NLB identifiers.

Omitted three outputs that appeared in earlier iterations but don't
apply to Auto Mode:

- eks_oidc_provider_arn — Auto Mode uses Pod Identity; there's no OIDC
  provider resource.
- eks_node_security_group_id — we don't create a dedicated node SG;
  Auto Mode attaches the cluster SG to nodes. Expose
  eks_cluster_security_group_id instead.
- eks_lb_controller_role_arn — Auto Mode owns the LB Controller;
  there's no IAM role to expose.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Terraform state already stores these values; exposing them as root
outputs amplifies the blast radius:

- `terraform_remote_state` consumers pull values into a second state
  file.
- `terraform output -json` in CI pipelines writes them to stdout/logs
  unredacted (`sensitive = true` only suppresses plaintext at the CLI,
  it doesn't scrub downstream logging).

Removed:

- `postgres_database_password` — the database module already creates a
  Secrets Manager secret; consumers can resolve credentials via
  `postgres_database_secret_arn` (still exposed), which is the
  canonical path.
- `function_tools_secret_key` — Braintrust-internal encryption key
  used only by our own `kubernetes_secret.braintrust`. External
  consumers have no legitimate need for it.

`eks_cluster_ca_certificate_data` stays — it's marked sensitive
upstream but is a public CA cert by definition, and our own
provider.tf consumes it from module outputs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Too dangerous a footgun to keep in the module. A consumer who
accidentally flags `force_destroy_data = true` (or leaves it on after a
test destroy and then starts using the deployment for real) would have
no safety rails — all customer data evaporates on the next `destroy`
with no final snapshot and no S3 version retention.

Sandbox teardown friction is real but narrowly felt (just the TF
module authors); the risk is broadly felt (every consumer). Prefer
operators to run the same `aws s3api` version-delete loop and
`aws rds modify-db-instance --no-deletion-protection` ceremony we
ran during development — it's slower but impossible to trigger by
accident.

This reverts commit 1c350b2d; PR description updated separately.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Remove `data "aws_region" "current"` from modules/eks-deploy/main.tf;
  it was imported in the early days of the module and never actually
  referenced in any rendered value.
- Remove the `custom_tags` variable from modules/eks-deploy/variables.tf
  and its unused pass-through in eks.tf. The eks-deploy submodule
  doesn't own any AWS resources (only Kubernetes + helm_release), so
  custom AWS tags have no effect there.

Also incorporates the `terraform fmt -recursive` whitespace fixes on
main.tf.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace the four structured per-component variables
(`eks_api_helm`, `eks_brainstore_{reader,fastreader,writer}_helm`)
and the `eks_helm_chart_extra_values` heredoc string with a single
`eks_helm_values_file` variable pointing at a YAML file alongside
the caller's main.tf.

Why:

- The four structured variables only covered two specific fields
  (replicas, resources) on four specific components. Anyone tweaking
  anything else (annotations, probes, env, image pins, nodeSelector)
  was forced into the heredoc escape hatch. The structured-vars
  abstraction was a half-abstraction.
- Helm's native interface is "a list of values files." Collapsing to
  "module defaults + one caller-supplied values file" matches the
  mental model customers already have from `helm install -f values.yaml`.
- Heredocs in HCL are unwieldy — no YAML lint, no IDE support, not
  shareable between deployments. A separate `.yaml` file fixes all
  three.

Mechanics: the submodule accepts a filename (not a `file()` result).
Path is interpreted by `file()` inside the submodule — use
`${path.module}/values.yaml` or an absolute path on the caller side.
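A hedged sketch of the merge inside the submodule (identifiers illustrative; later entries in the values list win):

resource "helm_release" "braintrust" {
  name      = "braintrust"
  chart     = var.helm_chart # however the module points at braintrustdata/helm; illustrative
  version   = var.helm_chart_version
  namespace = var.eks_namespace
  timeout   = 1200

  values = compact([
    # Module-owned defaults rendered from Terraform variables/state:
    templatefile("${path.module}/helm-values.yaml.tpl", local.helm_template_vars), # assumed template path/vars
    # Caller-supplied values file merged last, so its keys take precedence:
    var.helm_values_file != null ? file(var.helm_values_file) : "",
  ])
}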

Also adds `examples/braintrust-data-plane-eks-sandbox/` — a cheap
disposable-sandbox variant of the existing EKS example. Smaller RDS
(`db.r8g.large` / 100GB / gp3 baseline), smaller Redis
(`cache.t4g.small`), and a `values.yaml` that shrinks every chart
component to 1 replica with tight CPU/memory so the whole dataplane
fits on a single small Karpenter-provisioned node. Matches the
existing `braintrust-data-plane` / `braintrust-data-plane-sandbox`
pattern elsewhere in the examples directory.

Removes `modules/eks-deploy/overrides.tf` (the locals that
synthesized YAML from the structured variables — no longer needed).

CONTRACT.md updated to point at `eks_helm_values_file` for the
fast-reader opt-out warning.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Before-GA doc coverage items that came up in review prep:

- `modules/eks-cluster/README.md`: terse "what this submodule owns"
  overview + outputs table + key variables, so readers browsing the
  module on github/registry have a landing page rather than raw .tf.
- `modules/eks-deploy/README.md`: same for the K8s/Helm layer, plus
  an explanation of the in-repo `brainstore-nodepool` chart (why we
  use helm_release instead of kubernetes_manifest for the NodeClass +
  NodePool) and the helm-values merge precedence.
- `CONTRACT.md` "Deployment isolation" section: explicit note that
  `deployment_name` must be unique per account+region. Enumerates the
  resources that'd collide, confirms multiple-deployments-with-distinct-names
  is supported and validated, and points at the cosmetic LB Controller
  TG-name overlap (disambiguated via BraintrustDeploymentName tag).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two discoverability gaps that make the EKS mode hard to find / recover
from, surfaced during PR review prep:

- Root README.md previously didn't mention EKS mode at all. Readers
  browsing the module on github or the registry would have no signal
  that the create_eks_cluster = true path exists. Added a one-paragraph
  subsection under "How to use this module" pointing at the prod + sandbox
  examples and the new TROUBLESHOOTING.md.
- TROUBLESHOOTING.md promotes the EKS-mode recovery ritual from the PR
  description (where it'd evaporate after merge) into a durable
  operator-facing doc. Covers the four failure modes we actually hit
  during development: out-of-band cluster deletion + state-rm recovery,
  helm_release destroy hanging on the Service finalizer, EIP quota
  exhaustion on fresh apply, and pods stuck Pending due to broken NAT.
  Also notes that the existing Lambda-mode dump-logs.sh script does not
  cover EKS mode (observability parity is a tracked follow-up).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Erik Weathers (erikdw) and others added 2 commits April 24, 2026 15:11
Four .md-audit-pass fixes and one structure change:

- CONTRACT.md: "the four NLB annotations" → explicit list of the six
  now-present (`-scheme`, `-type`, `-security-groups`, `-name`,
  `-additional-resource-tags`, `-target-group-attributes`). Matches the
  current helm-values.yaml.tpl.
- CONTRACT.md: deployment-isolation section dropped a dangling "See
  the 'TG naming' follow-up in the PR description" reference; the
  paragraph now explains the cosmetic collision + tag-disambiguation
  story inline, so it survives PR merge.
- TROUBLESHOOTING.md: dropped a dangling "See the PR description's
  'Remaining challenges' section" reference in the dump-logs.sh note;
  observability gap is now described inline.
- README.md (dump-logs.sh section): added a note that the script
  covers only the Lambda/EC2 deployment mode, pointing at
  TROUBLESHOOTING.md + RECOVERY.md for EKS-mode runbooks.

Plus: the out-of-band-cluster-deletion runbook is promoted from a
buried section in TROUBLESHOOTING.md to its own top-level
RECOVERY.md. It's a disaster-recovery scenario (state mismatch
requiring state-level intervention), distinct from the routine
apply/destroy failures that TROUBLESHOOTING.md collects. Cross-refs
between the two docs so readers landing on the wrong one get
redirected. RECOVERY.md also includes a "why the module accepts
this failure mode" note explaining the single-apply-bootstrap
tradeoff that makes this scenario possible.

README's EKS-mode signal now points at both TROUBLESHOOTING.md and
RECOVERY.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Status: implemented and validated end-to-end on the erikdw-sandbox-5
teardown, but unclear whether to ship in this PR. The chart-level
annotation already added in fc11624 covers the same drain-wait
finalizer hang for fresh deploys; this adds a redundant module-level
preflight that catches the failure when the chart annotation didn't
propagate (older chart, manual override, broken state). Decide before
merge whether the broader coverage is worth the extra surface area.

What it does, when var.prepare_for_destroy = true (default false):

- kubernetes_annotations.api_drain_zero forces the api Service
  annotation `service.beta.kubernetes.io/aws-load-balancer-target-group-attributes`
  to `deregistration_delay.timeout_seconds=0`. Same key the chart
  template (helm-values.yaml.tpl) already sets — this resource only
  matters if the live annotation drifted.

- terraform_data.api_tg_drain_zero loops over every TargetGroup
  tagged `BraintrustDeploymentName=<deployment_name>` and calls
  `aws elbv2 modify-target-group-attributes` to set the same
  attribute directly. Faster path than the LB Controller's reconcile
  loop, and covers the case where the controller created the TG
  before our annotation propagated.

With drain wait at zero, the LB Controller releases its
`service.eks.amazonaws.com/resources` finalizer the moment helm
uninstall deletes the Service, finishes its own TG cleanup, and
helm_release.braintrust returns in seconds. No kubectl-patch
workaround needed, and no orphan TGs left in AWS.

Why this exists: on the erikdw-sandbox-5 teardown, terraform destroy
hung on helm_release.braintrust for ~10 minutes (past the default
5-min drain timer, suggesting the chart annotation never made it to
the live TG on chart 6.1.0). Manual `kubectl patch svc braintrust-api
... finalizers:null` unblocks the destroy but interrupts the LBC
mid-cleanup, leaving an orphan TG (`k8s-braintru-braintru-*` tagged
with the deployment name). prepare_for_destroy avoids both problems.

Scope is service infra only. Data-bearing resources keep their
separate, explicit knobs:
- RDS: DANGER_disable_database_deletion_protection (existing)
- S3:  deliberately not destroyable from TF — emptying buckets is
       a manual operator step before destroy. No DANGER_* flag, no
       force_destroy var. Matches the prior 398f997 revert of
       `force_destroy_data`.

Files:
- variables.tf: add `prepare_for_destroy` (root)
- eks.tf: plumb through to module.eks_deploy
- modules/eks-deploy/variables.tf: declare the var
- modules/eks-deploy/main.tf: add `data.aws_region.current`,
  `kubernetes_annotations.api_drain_zero`, and
  `terraform_data.api_tg_drain_zero` (both gated by count)
- TROUBLESHOOTING.md: prepare_for_destroy is now the documented
  happy path; the manual kubectl-patch runbook stays as the
  in-flight recovery, with a tag-driven cleanup snippet for the
  orphan TG that workaround leaves behind.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>