How Poolside model weights get from your disk to S3 to GPU pods.
For the full profile, the reference architecture ships a streaming uploader that
reads `*.tar` files from a local directory and uploads each tarball's
members directly to an S3 bucket it creates for you. The
`inference-stack` Helm values reference the S3 URIs automatically.
The alternative, Mode B (operator-managed, bring-your-own buckets), is also supported.
```hcl
# terraform.tfvars
enable_model_s3_upload = true  # default
checkpoints_dir        = "/home/ops/poolside/models"
```

What happens:
- The reference architecture creates a `<deployment>-models` S3 bucket (SSE-KMS, HTTPS-only, BucketOwnerEnforced).
- The `model-checkpoints` module scans `checkpoints_dir` for `*.tar` files.
- For each tarball, it streams members directly into S3 at `s3://<bucket>/models/checkpoints/<tarball_stem>/<member>`. No local extraction, no scratch disk.
- On success, a zero-byte `.checkpoint-complete` marker is written with the source tarball's SHA-256 in its S3 metadata. Future applies HEAD this marker and short-circuit if the SHA matches (see the sketch after this list).
- The inference IAM role gets scoped to this bucket.
- `inference-stack` Helm values point each inference subchart's `model:` field at the right S3 path.
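Conceptually, the short-circuit check looks like this. This is a minimal Python sketch, not the module's actual code; in particular, the `sha256` metadata key name is an assumption:

```python
import hashlib

import boto3
from botocore.exceptions import ClientError

s3 = boto3.client("s3")


def sha256_of(path: str) -> str:
    """Hash the tarball in 1 MiB chunks so large files never need to fit in RAM."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def upload_needed(bucket: str, prefix: str, tarball: str) -> bool:
    """HEAD the marker; re-upload only if it's missing or its SHA differs."""
    try:
        head = s3.head_object(Bucket=bucket, Key=f"{prefix}/.checkpoint-complete")
    except ClientError:
        return True  # no marker yet: first upload of this checkpoint
    # "sha256" is a hypothetical metadata key; the real module may name it differently.
    return head["Metadata"].get("sha256") != sha256_of(tarball)
```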
```hcl
# terraform.tfvars
enable_model_s3_upload = false
external_models_bucket = "your-existing-models-bucket"
# You populate the bucket yourself, out-of-band.
```

What happens:
- The reference architecture does NOT create a bucket or run any uploads.
- The inference IAM role gets `s3:GetObject` + `s3:ListBucket` on `external_models_bucket`.
- `inference-stack` Helm values don't auto-derive S3 URIs from tarballs. You supply them yourself by passing `inference_models` directly to the `poolside-values` module (one entry per model, with the `s3://` URI in the `model` field). See customizing.md.
Cross-account buckets are not supported: the bucket must live in the same AWS account as the deployment. Cross-account access would require a bucket policy on the source side granting the `inference_pod` role, plus cross-account grants on the KMS key; both are out of scope for this reference architecture.
The reference architecture splits each `<file>.tar` stem on the first hyphen to
yield a model alias and a version:
| Tarball filename | Alias | Version |
|---|---|---|
| `malibu-v2.20251021.tar` | `malibu` | `v2.20251021` |
| `malibu-v2.20251021_int4.tar` | `malibu` | `v2.20251021_int4` |
| `point-v2.20250403.tar` | `point` | `v2.20250403` |
| `laguna-v1.0.tar` | `laguna` | `v1.0` |
The alias identifies which `inference-stack` subchart this
checkpoint feeds (e.g. `inference-malibu`, `inference-point`). The
version (everything after the first hyphen) is carried verbatim in
the S3 path:
```text
s3://<deployment>-models/models/checkpoints/malibu-v2.20251021_int4/
  config.yaml
  model.safetensors
  pipeline_config.json
  recipe.yaml
  tokenizer/
    tokenizer.json
    chat_template.jinja
    ...
  .checkpoint-complete   ← zero-byte marker; source SHA in metadata
```
Keeping the full stem as the S3 directory name makes checkpoints addressable by version: swapping tarballs doesn't silently replace the S3 contents.
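The split itself is a single partition on the first hyphen; a minimal Python sketch (function name hypothetical):

```python
def split_stem(tarball_name: str) -> tuple[str, str]:
    """'malibu-v2.20251021_int4.tar' -> ('malibu', 'v2.20251021_int4')"""
    stem = tarball_name.removesuffix(".tar")
    alias, _, version = stem.partition("-")
    return alias, version
```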
`examples/full` calls `inference_models_from_uploads` whenever you
leave `inference_models` unset. That helper keys the map by the
segment before the first hyphen of each tarball stem, so multiple
tarballs whose stems share a first segment (e.g. `laguna-12341a.tar`
and `laguna-1234513.tar`) collapse to the same key; only one survives,
and which one is unpredictable.
If you need to deploy multiple checkpoints under what the helper
would treat as the same alias, either rename the tarballs so each
produces a distinct first segment, or set `inference_models`
explicitly in your root (see customizing.md)
rather than using the helper.
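To catch collisions before an apply, a quick pre-flight script (hypothetical, not part of the reference architecture) can flag stems that share a first segment:

```python
from collections import defaultdict
from pathlib import Path


def alias_collisions(checkpoints_dir: str) -> dict[str, list[str]]:
    """Group tarballs by first-hyphen alias; return any alias claimed by more than one."""
    by_alias: dict[str, list[str]] = defaultdict(list)
    for tar in sorted(Path(checkpoints_dir).expanduser().glob("*.tar")):
        by_alias[tar.stem.partition("-")[0]].append(tar.name)
    return {alias: names for alias, names in by_alias.items() if len(names) > 1}


print(alias_collisions("~/poolside/models"))
# e.g. {'laguna': ['laguna-12341a.tar', 'laguna-1234513.tar']}
```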
Every `*.tar` at the top level of `checkpoints_dir` gets uploaded, so
pointing it at your full model library uploads everything. To deploy
only a subset, point `checkpoints_dir` at a directory of symlinks:

```sh
mkdir -p ~/poolside/models-for-this-deployment
ln -s ~/poolside/models/malibu-v2.20251021_int4.tar ~/poolside/models-for-this-deployment/
ln -s ~/poolside/models/point-v2.20250403.tar ~/poolside/models-for-this-deployment/
```

Then set `checkpoints_dir = "~/poolside/models-for-this-deployment"`.
A model deploys whenever its key appears in the `inference_models` map
passed to `modules/poolside-values`. Each key becomes a separate
`inference-<key>` Deployment/Service.
The `examples/full` root, when `var.inference_models` is left null,
auto-derives the map from `module.stack.inference_models_from_uploads`,
producing one entry per uploaded tarball, keyed by first-hyphen alias.
`poolside-values` fills in `modelName`/`modelType`/`gpus` from a
defaults table for well-known alias keys (`malibu`, `point`,
`laguna-m`, `laguna-xs`, `laguna`). Unknown keys pass through
untouched; you supply the chart fields they need yourself.
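The merge behaves roughly like a dict union where your explicit fields win. A sketch with illustrative values only; the actual per-alias defaults live in the `poolside-values` module and will differ:

```python
# Illustrative values only: the real defaults table is defined by poolside-values.
WELL_KNOWN_DEFAULTS = {
    "malibu": {"modelName": "malibu", "modelType": "chat", "gpus": 8},
    "point":  {"modelName": "point",  "modelType": "completion", "gpus": 2},
}


def resolve_model(alias: str, entry: dict) -> dict:
    # Defaults (if any) first, then user-supplied fields override them.
    # Unknown aliases have no defaults and pass through untouched.
    return {**WELL_KNOWN_DEFAULTS.get(alias, {}), **entry}
```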
To see which models will deploy and what values they'll receive:

```sh
terraform output -json inference_models_resolved
```

Terraform's change trigger on each upload is keyed on the tarball filename and the destination S3 path, not the file's contents. That means:
- Add / remove / rename a tarball → the trigger fires and Terraform plans the upload (or teardown).
- In-place replace (same filename, new content) → the trigger does NOT fire. The Python uploader's SHA check against the marker object catches the mismatch and re-uploads, but plan/apply output won't reflect it until the provisioner runs.
The recommended pattern is to give every distinct version a distinct
filename. The version is part of the filename convention anyway
(for example, `malibu-v2.20251021_int4.tar` vs
`malibu-v2.20260101.tar`), so the in-place-replace case rarely comes
up in practice.
`aws s3 cp --recursive` would work, but requires extracting each
tarball to disk first: 75 GB of scratch per large model, written
twice (extract, then upload). This reference architecture's uploader
uses Python's `tarfile` module to read the archive's member index,
then pipes each member directly through boto3's `upload_fileobj` to
S3 with no local extraction.
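A stripped-down sketch of that approach (the real uploader also writes the marker object and handles retries and errors; names below are placeholders):

```python
import tarfile

import boto3


def stream_tarball_to_s3(tar_path: str, bucket: str, prefix: str) -> None:
    """Upload every regular file in the tarball without extracting to disk."""
    s3 = boto3.client("s3")
    with tarfile.open(tar_path, "r") as tar:
        for member in tar:
            if not member.isfile():
                continue  # skip directories, symlinks, etc.
            fileobj = tar.extractfile(member)  # file-like view into the archive
            # upload_fileobj streams the member straight to S3; no scratch disk.
            s3.upload_fileobj(fileobj, bucket, f"{prefix}/{member.name}")


stream_tarball_to_s3(
    "/home/ops/poolside/models/point-v2.20250403.tar",
    "<deployment>-models",
    "models/checkpoints/point-v2.20250403",
)
```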
Tradeoff: you need a `python3` that can import boto3. See
prerequisites.md for the requirement statement
and the install-method notes. This reference architecture doesn't
prescribe an install method; a system package, a venv, `pip --user`, uv,
and similar are all fine as long as `import boto3` succeeds for
whichever `python3` is first on PATH.
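A quick sanity check for whichever `python3` is first on PATH:

```sh
python3 -c "import boto3; print(boto3.__version__)"
```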
If the Python dependency is a dealbreaker for your environment, use
Mode B (BYO bucket) and populate the bucket with whatever tool you
already have (`aws s3 sync`, rclone, a CI pipeline, etc.).
Checkpoints persist across `terraform apply` runs (that's the point
of the marker). To clear them:

```sh
# Clear one checkpoint
aws s3 rm --recursive s3://<deployment>-models/models/checkpoints/<stem>/

# Clear everything (e.g. before destroy)
aws s3 rm --recursive s3://<deployment>-models/
```

On full `terraform destroy`, `s3_force_destroy_buckets = true`
deletes the bucket even if it still has objects. It defaults to `false`:
flip it for POCs, leave it `false` for production.