Terraform modules and example configurations for deploying the Poolside platform on AWS EKS. A single terraform apply provisions the AWS infrastructure, pushes the Poolside container images to ECR, uploads model checkpoints to S3, and installs the Poolside Helm releases.
See docs/architecture.md for a detailed walkthrough of the architecture.
| Profile | GPU | Cognito | Use case |
|---|---|---|---|
| `examples/platform-only` | No | Optional | Core platform; models hosted elsewhere or via external API |
| `examples/full` | Yes | Optional | Platform + local GPU inference |
- Prepare prerequisites. AWS account with admin-equivalent credentials, the Poolside Helm bundle extracted on disk, model checkpoint tarballs (full profile), a public DNS hostname, and an ACM certificate covering that hostname in your target region. See docs/prerequisites.md.
- Copy the example that matches your profile out of this repo, then fill in `terraform.tfvars` from `terraform.tfvars.example`.
- The first `terraform apply` creates the VPC, EKS cluster, node groups, RDS, S3, ECR (with Poolside images pushed via skopeo), KMS keys, IAM roles, ALB controller, and External Secrets Operator, and uploads model checkpoints to S3. The Helm releases are gated off by default so the first apply exercises only infrastructure.
- Flip `install_poolside_deployment = true` (and `install_inference_stack = true` for the full profile) in `terraform.tfvars` and re-apply. Terraform installs the Poolside Helm releases against the cluster.
- Point your public hostname at the ALB. Create a Route 53 A (Alias) record. The ALB DNS name is on the ingress: `kubectl get ingress -n poolside`.
- Bind your identity provider. Visit `https://<your-hostname>` and follow the on-screen prompts. With Cognito, retrieve the issuer URL, client ID, and client secret from the Terraform outputs.
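As a minimal sketch of the second apply, the relevant part of `terraform.tfvars` for the full profile might look like the fragment below. Only the two flag names come from the steps above; the rest of the file (taken from `terraform.tfvars.example`) is omitted here.

```hcl
# terraform.tfvars (fragment, second apply)

# Install the core Poolside Helm release once the infrastructure exists.
install_poolside_deployment = true

# Full profile only: also install the inference stack on the GPU node group.
install_inference_stack = true
```

Leaving both flags unset on the first apply keeps that run limited to infrastructure, which makes failures easier to attribute.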
Quickstart: docs/quickstart.md.
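For the DNS step in the list above, the alias record can also be managed in Terraform instead of being created by hand. The following is a sketch under stated assumptions, not part of this repo: every variable name is hypothetical, and the ALB DNS name and hosted zone ID must be looked up after the ingress has been created.

```hcl
# Hypothetical inputs; none of these variable names are defined by this repo.
variable "hosted_zone_id" {
  type        = string
  description = "Route 53 hosted zone that owns the public hostname"
}

variable "hostname" {
  type        = string
  description = "Public hostname covered by the ACM certificate"
}

variable "alb_dns_name" {
  type        = string
  description = "ALB DNS name, as shown by kubectl get ingress -n poolside"
}

variable "alb_zone_id" {
  type        = string
  description = "Canonical hosted zone ID of the ALB (not your own zone)"
}

# A (Alias) record pointing the public hostname at the ALB.
resource "aws_route53_record" "poolside" {
  zone_id = var.hosted_zone_id
  name    = var.hostname
  type    = "A"

  alias {
    name                   = var.alb_dns_name
    zone_id                = var.alb_zone_id
    evaluate_target_health = true
  }
}
```

The ALB's canonical hosted zone ID is reported by `aws elbv2 describe-load-balancers` as `CanonicalHostedZoneId`.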
| Layer | Resources |
|---|---|
| Network | VPC, three-tier subnets (public, worker, control-plane), NAT gateways, S3 gateway endpoint |
| Compute | EKS cluster, CPU node group, optional GPU node group, managed addons |
| Data | RDS PostgreSQL (AWS-managed password), S3 buckets (data, access logs, models) |
| Security | KMS keys, least-privilege IRSA roles, permissions-boundary support |
| Registry | ECR repositories for all Poolside container images, populated from the bundle |
| Model checkpoints | Streaming upload of *.tar checkpoints into the models bucket (full profile) |
| Cluster setup | Kubernetes namespaces, gp3 StorageClass, ALB controller, External Secrets Operator, optional GPU Operator |
| Helm releases | poolside-deployment and (full profile) inference-stack, gated by capability flags |
| Auth | Optional Cognito user pool + client |
All modules live under modules/:
| Module | Purpose |
|---|---|
| `reference-stack` | Composition wrapper that wires all infra modules into a single deployable stack |
| `network` | VPC, subnets, NAT gateways, S3 endpoint |
| `eks` | EKS cluster, OIDC provider, managed addons, access entries |
| `eks-node-groups/cpu` | CPU (platform) node group |
| `eks-node-groups/gpu` | GPU (inference) node group with capacity-reservation support |
| `data-stores` | RDS PostgreSQL + S3 buckets |
| `ecr` | ECR repositories with bundle-driven image push |
| `iam` | Node group instance roles plus IRSA roles for cluster addons and Poolside workloads |
| `security` | KMS keys (EKS, RDS, S3, EBS, application encryption) |
| `cluster-bootstrap` | Namespaces, gp3 StorageClass, optional custom CA bundle |
| `ingress` | AWS Load Balancer Controller (Helm) |
| `gpu-operator` | NVIDIA GPU Operator (Helm) |
| `secrets-sync` | External Secrets Operator + RDS password sync |
| `cognito` | AWS Cognito user pool, client, domain |
| `model-checkpoints` | Streaming uploader for model checkpoint tarballs into S3 |
| `poolside-values` | Composes Helm values for the Poolside charts from infra outputs |
| `helm-wrapper` | Chart-agnostic `helm_release` wrapper used by the example roots |
This reference architecture is intentionally opinionated:
- Single `terraform apply` provisions infrastructure, pushes container images, uploads model checkpoints, and installs the Helm releases.
- IRSA only. EKS Pod Identity is not used.
- ALB only. No nginx ingress controller.
- Public EKS API endpoint with a mandatory CIDR allowlist; the private endpoint is also enabled.
- AWS-managed RDS password. Never stored in Terraform state.
- KMS encryption for application secrets. No static key option.
- p5e.48xlarge as the minimum GPU instance type.
- Terraform 1.5.7 or later. Works with OpenTofu 1.x.
See docs/customizing.md for permissions boundaries, custom CA bundles, AMI overrides, and other tunable knobs.
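To make a few of the choices above concrete, here is a sketch of how they map onto standard Terraform and AWS provider settings. It is illustrative only and is not the code in `modules/`: the resource names, variable names, instance class, and storage size are all assumptions.

```hcl
terraform {
  # Matches the stated minimum version.
  required_version = ">= 1.5.7"
}

# Hypothetical inputs; these names are not taken from this repo.
variable "cluster_role_arn" {
  type = string
}

variable "subnet_ids" {
  type = list(string)
}

variable "api_allowed_cidrs" {
  type        = list(string)
  description = "CIDRs allowed to reach the public EKS API endpoint"

  # One way to make the allowlist mandatory: reject an empty list.
  validation {
    condition     = length(var.api_allowed_cidrs) > 0
    error_message = "Provide at least one CIDR for the public EKS API endpoint."
  }
}

# Public endpoint restricted by CIDR, with the private endpoint also enabled.
resource "aws_eks_cluster" "example" {
  name     = "poolside-example"
  role_arn = var.cluster_role_arn

  vpc_config {
    subnet_ids              = var.subnet_ids
    endpoint_public_access  = true
    endpoint_private_access = true
    public_access_cidrs     = var.api_allowed_cidrs
  }
}

# AWS-managed master password: RDS creates and rotates the secret in
# Secrets Manager, so no password argument is set in Terraform.
resource "aws_db_instance" "example" {
  identifier                  = "poolside-example"
  engine                      = "postgres"
  instance_class              = "db.m6g.large" # assumed size
  allocated_storage           = 100            # assumed size, in GiB
  username                    = "poolside"
  manage_master_user_password = true
  skip_final_snapshot         = true
}
```

With `manage_master_user_password` enabled, Terraform tracks the ARN of the generated secret rather than the password itself, which is what keeps the credential out of state.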
- Architecture overview (covers both the platform-only and full profiles)
- Prerequisites
- Deployment guide
- Quickstart
- Authentication options
- Customizing your deployment (sizing, regulated environments, AMI overrides, BYO logging)
- Model checkpoints
