How to tune the reference architecture for your environment without forking it.
The reference architecture composes three modules in your operator root:
```
├── reference-stack   — infra (VPC, EKS, RDS, S3, IAM, ECR, model uploads)
├── poolside-values   — chart-specific glue (YAML values for Poolside Helm charts)
└── helm-wrapper (x2) — chart-agnostic helm_release runner
```
Most customizations happen via reference-stack variables (infra
sizing and toggles) or poolside-values inputs (chart-level
overrides). You rarely touch the helm-wrapper calls.
The examples/full and examples/platform-only roots expose a small
set of variables (deployment_name, region, public_hostname,
containers_dir, etc.). That's enough to stand up a deployment from a
tfvars file. Most reference-stack inputs fall back to their module
defaults, which you won't see in the example's variables.tf.
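For instance, a minimal terraform.tfvars for the full example might look like this; the variable names are the ones the example root declares, while the values are placeholders to replace with your own:

```hcl
# Illustrative values only — substitute your own.
deployment_name = "acme-poolside"
region          = "us-east-2"
public_hostname = "poolside.example.com"
containers_dir  = "./containers" # directory holding the bundle's containers/*.tar
```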
To customize a knob that isn't already a variable, copy the example root and pick one of two patterns:
- Edit `module "stack"` inline: add the input directly to the block. Good for a one-shot deployment.
- Add a passthrough variable: declare a new variable in `variables.tf`, wire it into `module "stack"`, and surface it in `terraform.tfvars`. Good when multiple deployments share a root (sketched below).
The full example already has commented-out GPU-sizing lines in
`module "stack"` showing the inline pattern.
The example surfaces a single inference_models variable mirroring the
chart's inference.models.<key> schema. To override a model's GPU
count or other fields, declare the full set of models you want and pass
whatever per-model fields the chart accepts:
```hcl
inference_models = {
  malibu      = { model = "s3://...", gpus = 4 }
  point       = { model = "s3://...", gpus = 2 }
  "laguna-xs" = { model = "s3://...", gpus = 1 }
}
```

Keys become chart Deployment names verbatim. For well-known aliases
(`malibu`, `point`, `laguna-m`, `laguna-xs`, `laguna`), missing
optional fields (`modelName`, `modelType`, `gpus`) fall back to built-in
defaults. For unknown keys, the operator must supply all required fields.
Other fields you can pass:
```hcl
inference_models = {
  malibu = {
    model     = "s3://..."
    gpus      = 4
    modelName = "MyOrg-Malibu-Variant"
    modelType = "agent"
    modelExtraArgs = {
      "distributed-executor-backend" = "mp"
    }
  }
}
```

If `inference_models` is left null (the example's default), the example
auto-derives one entry per uploaded tarball using the first-hyphen rule
described in model-checkpoints.md.
```hcl
cpu_instance_type       = "m5.8xlarge" # default m5.4xlarge
cpu_desired_size        = 5
cpu_max_size            = 10
cpu_ebs_volume_size_gib = 200
```

```hcl
# Still creates node group + GPU Operator, but desired size = 0
gpu_desired_size = 0

# Or disable GPU provisioning entirely
enable_gpu_node_group = false
```

Note: `enable_gpu_node_group = false` means no inference workloads
will schedule even if tarballs exist. For platform-only (no
inference) deployments, use the platform-only profile instead. It's
cheaper and skips several other GPU-adjacent resources.

```hcl
gpu_instance_type           = "p5.48xlarge"   # default p5e.48xlarge
gpu_capacity_reservation_id = "cr-xxxxxxxxxx" # pin to a reservation
```

p5e.48xlarge is the recommended minimum. Smaller GPUs (p4, g5) may not have enough memory for the Malibu/Point models.
```hcl
cluster_endpoint_public_access_cidrs = [
  "203.0.113.0/24",  # your office VPN
  "198.51.100.5/32", # your bastion
]
```

The list must be non-empty (an empty list fails plan). To open the
API to the entire internet, pass `["0.0.0.0/0"]` explicitly. This is
discouraged for any deployment with production workloads.
```hcl
admin_principal_arns = [
  data.aws_iam_session_context.current.issuer_arn,
  "arn:aws:iam::<acct>:role/AWSReservedSSO_AdministratorAccess_*",
]
```

These get EKS cluster-admin access via Access Entries (not the legacy
aws-auth ConfigMap).
```hcl
allow_locked_out_cluster = false # default
```

When false (the default), plan fails if the running principal isn't in
`admin_principal_arns`. This prevents you from creating a cluster you
can't access. Set it to true only if you're explicitly handing over
control to another role.
If your environment intercepts TLS (corporate proxy, private PKI):
```hcl
custom_ca_bundle_pem = file("./certs/ca-chain.pem")
```

Terraform creates a ConfigMap (`custom-ca-bundle` in the `poolside`
namespace), and the chart mounts it into pods at
`/etc/ssl/certs/custom-ca-bundle.crt`.
```hcl
permissions_boundary_arn = "arn:aws:iam::<acct>:policy/my-boundary"
```

Every IAM role created by the reference architecture gets this
boundary attached: the direct roles in `modules/iam/` (CPU/GPU node
group instance roles, core-api pod, inference pod, External Secrets
Operator), the module-managed addon roles (ALB controller, VPC CNI,
EBS CSI driver, via terraform-aws-modules/iam/aws), and the EKS
cluster role created by the upstream terraform-aws-modules/eks/aws
module. Common in PubSec / FedRAMP environments where SCPs mandate a
boundary on every new role. The default (empty) attaches no boundary.
```hcl
iam_name_prefix = "MyOrg-"
```

Prepends the prefix to every IAM role and policy name.
The reference architecture is HTTPS-only: the ALB terminates TLS and the ACM cert
is mandatory. Both example roots look up the cert by
public_hostname:
```hcl
public_hostname = "poolside.example.com"
```

`public_hostname` is required (no default, no empty value allowed)
and must be a valid DNS hostname. An ACM cert covering it must be
issued in `var.region` before `terraform apply`. The
`data "aws_acm_certificate"` lookup fails plan otherwise. There is
no HTTP-only fallback.
After apply, point the public hostname at the ALB by creating a
Route 53 A (Alias) record. The ALB DNS name is on the ingress
resource: `kubectl get ingress -n poolside`.
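If that zone lives in Route 53 and you manage it with Terraform, the alias record could look like the sketch below; the zone ID, hostname, and ALB DNS name / hosted-zone ID are placeholders to fill in from your own account, and the reference architecture does not create this record for you:

```hcl
resource "aws_route53_record" "poolside" {
  zone_id = "Z0123456789ABCDEFGHIJ" # your hosted zone
  name    = "poolside.example.com"  # var.public_hostname
  type    = "A"

  alias {
    name                   = "k8s-poolside-abc123.us-east-2.elb.amazonaws.com" # from kubectl get ingress
    zone_id                = "ZXXXXXXXXXXXXX" # the ALB's hosted zone ID for your region
    evaluate_target_health = true
  }
}
```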
See model-checkpoints.md Mode B.
Not supported. The reference architecture is opinionated: ECR only. If you need a different registry, the pattern is:
- Fork the reference architecture, replace `modules/ecr` with your own pushing module
- Keep everything else
- Point `poolside-values`'s `ecr` input at your registry's outputs
Non-trivial. Open an issue if this matters to you.
The defaults `install_poolside_deployment = false` and
`install_inference_stack = false` are development-workflow niceties.
For routine redeployments where the code is stable, just set both to
true in your tfvars.
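In tfvars form:

```hcl
install_poolside_deployment = true
install_inference_stack     = true
```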
```hcl
database_instance_class        = "db.m7g.xlarge" # default
database_multi_az              = true            # default true
database_allocated_storage_gib = 200             # default 64
```

The default db.m7g.xlarge Multi-AZ instance is sized for production
use. Override only if you need to scale further up, or down for
non-production environments. `database_multi_az = true` is strongly
recommended for any deployment that has an SLA.
```hcl
use_s3_transfer_acceleration = true
```

Enables the transfer-acceleration endpoint on the models bucket. Worth it when the operator's machine is geographically distant from the models bucket's region (e.g. Europe ↔ us-east-2). Costs a fraction of a cent per GB; negligible next to the bandwidth it saves.
Default: AWS-managed EKS-optimized AL2023 AMIs (AL2023_x86_64_STANDARD
for CPU nodes, AL2023_x86_64_NVIDIA for GPU nodes). Two override
patterns, in order of preference:
```hcl
# Pin to a specific managed AMI release (still managed, version-locked)
cpu_ami_release_version = "1.32.0-20251120"
gpu_ami_release_version = "1.32.0-20251120"

# OR pick a different managed AMI family (e.g. Bottlerocket, Graviton)
cpu_ami_type = "BOTTLEROCKET_x86_64" # default AL2023_x86_64_STANDARD
```

Bring-your-own AMI is also supported, but you own the lifecycle:
```hcl
cpu_custom_ami_id    = "ami-0123456789abcdef0"
cpu_custom_user_data = file("./userdata/cpu-bootstrap.sh")
```

When `cpu_custom_ami_id` is set, AWS-managed AMI updates are
disabled. You're responsible for kubelet, containerd, CNI binaries,
SSM agent, and any custom-CA injection. `cpu_custom_user_data` is
required (pass "" only if your AMI self-bootstraps, which is rare).
The same four knobs exist with a gpu_ prefix. A custom GPU AMI
must include working NVIDIA drivers, nvidia-container-toolkit, and
the runtime setup the device plugin expects. The AL2023 NVIDIA
AMI ships all of this; a barebones custom AMI will boot but pods
will fail to schedule because the GPUs aren't visible.
Stick with managed AMIs unless you have a hard requirement (FIPS, custom hardening baseline, internal-build-promotion policy). Custom AMIs make every EKS upgrade your problem to keep working.
The reference architecture does not install a cluster-wide log shipper or metrics stack. Pick whatever fits your existing observability tooling. Two data sources are already wired and ready to consume:
- EKS control-plane logs go to CloudWatch via the upstream EKS module's `cluster_enabled_log_types` / `create_cloudwatch_log_group`. See `control_plane_log_types` in `modules/eks/variables.tf` to pick which streams (api, audit, authenticator, controllerManager, scheduler) get enabled.
- RDS logs export to CloudWatch via `enabled_cloudwatch_logs_exports` in `modules/data-stores`.
For pod and node logs, layer on whichever shipper your team already runs:
- CloudWatch Container Insights / Fluent Bit: AWS-native, easiest if you're already in CloudWatch.
- Managed offering (Datadog, New Relic, Splunk): drop in their agent DaemonSet against the same EKS cluster.
- Self-hosted (Vector, Fluent Bit, Promtail → Loki / OpenSearch / S3): install via separate Helm chart.
Poolside workloads log to stdout/stderr; any of the above will pick them up. The reference architecture deliberately doesn't pick one for you because operators almost always have an organizational standard already.
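As one illustration, a CloudWatch-bound Fluent Bit DaemonSet could be layered on from your own Terraform root with an ordinary helm_release. This is a sketch only, not something the reference architecture installs, and the chart value keys shown should be verified against the aws-for-fluent-bit chart version you actually pull:

```hcl
resource "helm_release" "fluent_bit" {
  name             = "aws-for-fluent-bit"
  repository       = "https://aws.github.io/eks-charts"
  chart            = "aws-for-fluent-bit"
  namespace        = "logging"
  create_namespace = true

  # Ship container logs to CloudWatch; key names follow the chart's values,
  # check them for your chart version.
  set {
    name  = "cloudWatchLogs.enabled"
    value = "true"
  }
  set {
    name  = "cloudWatchLogs.region"
    value = "us-east-2" # match var.region
  }
}
```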
- Chart-internal values like pod resource requests, probe configs, env vars, autoscaling on inference-envoy/inference-extproc, and anything outside `inference.models.<key>`. Those are chart defaults. If you need to override one, use a custom Helm values overlay at install time, but doing so bypasses this reference architecture's `poolside-values` module. (`inference.models.<key>` itself is fully operator-controlled via `inference_models`; see above.)
- Bundle layout. `containers/*.tar` and `charts/<chart>/` are the expected structure; re-extract your bundle if this looks different.
- GPU cost control: `gpu_desired_size = 0` (above) plus the capacity-reservation knobs (`gpu_capacity_reservation_id`, `gpu_capacity_reservation_resource_group_arn`, `gpu_use_capacity_block`) in `modules/reference-stack/variables.tf`.
- Regulated environments: `permissions_boundary_arn` + `iam_name_prefix` + `allow_locked_out_cluster = false` are the core knobs (above).
- Observability: see Logging / observability (BYO) above.
- Want to configure something not listed here? Open an issue. This reference architecture's variable surface is intentionally small; we'd rather add a specific variable than point people at a generic escape hatch.