feat(exposure): CloudFront mode for domain-less deployments #680

Draft
allamand wants to merge 19 commits into feature/agent-platform from feature/cloudfront-exposure

Conversation

@allamand

Contributor

Summary

Implements conditional ingress rendering based on exposure.mode config:

  • domain (default): HTTPS:443, host-based routing, ACM cert required
  • cloudfront: HTTP:80, no host, CloudFront terminates TLS — no custom domain needed
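The difference between the two modes can be sketched as a small function that builds an Ingress manifest. This is only an illustration of the branching, not the chart templates from this PR: the function name `build_ingress`, the backend port, and the manifest layout are assumptions. The annotation keys are the standard AWS Load Balancer Controller ones, and in `domain` mode the controller is assumed to discover the ACM certificate from the host.

```python
import json


def build_ingress(mode, service, path, host=None):
    """Return a minimal Ingress manifest (as a dict) for the given exposure mode.

    Hypothetical helper for illustration; the real PR does this in Helm templates.
    """
    if mode not in ("domain", "cloudfront"):
        raise ValueError(f"unknown exposure mode: {mode!r}")

    # Shared by both modes: all ingresses join the same ALB group.
    annotations = {"alb.ingress.kubernetes.io/group.name": "platform"}
    rule = {
        "http": {
            "paths": [
                {
                    "path": path,
                    "pathType": "Prefix",
                    # Backend port 80 is an assumption for this sketch.
                    "backend": {"service": {"name": service, "port": {"number": 80}}},
                }
            ]
        }
    }

    if mode == "domain":
        if not host:
            raise ValueError("domain mode needs a host so a certificate can be matched")
        # HTTPS listener plus host-based routing.
        annotations["alb.ingress.kubernetes.io/listen-ports"] = json.dumps([{"HTTPS": 443}])
        rule["host"] = host
    else:
        # cloudfront mode: plain HTTP listener, no host, TLS ends at CloudFront.
        annotations["alb.ingress.kubernetes.io/listen-ports"] = json.dumps([{"HTTP": 80}])

    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "Ingress",
        "metadata": {"name": service, "annotations": annotations},
        "spec": {"ingressClassName": "platform", "rules": [rule]},
    }
```

Calling `build_ingress("cloudfront", "keycloak", "/keycloak")` yields a manifest with no `host` key in its rule, which is the property that avoids the certificate lookup.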

Problem

In Workshop Studio environments there is no custom domain or Route53 hosted zone. The ALB controller fails with "no certificate found for host" because the ingresses specify HTTPS with a host that has no matching ACM certificate.

Changes

  • Add exposure.mode to config.yaml schema and template
  • Update ingress templates (keycloak, argo-workflows, grafana, jupyterhub, kubeflow) with conditional HTTP/HTTPS rendering
  • Pass exposure_mode annotation through registry valuesObject to addon charts
  • Add hub:cloudfront Taskfile task (creates ALB + CloudFront distribution, updates Secrets Manager)
  • Update fleet-secret chart to propagate exposure_mode annotation

Testing

Verified on live hub cluster:

  • ALB provisioned successfully with HTTP:80 listeners
  • Keycloak responding at /keycloak via ALB
  • All ingresses sharing the platform group without cert errors

Closes #677

@allamand

Contributor Author

Companion issue for the external agent-platform repo charts (agent-gateway, langfuse ingresses): aws-samples/sample-open-agentic-platform#11

@allamand allamand force-pushed the feature/cloudfront-exposure branch from 446ae43 to d3d7ed9 Compare May 20, 2026 21:04
@allamand allamand marked this pull request as draft May 20, 2026 21:09
@allamand allamand force-pushed the feature/cloudfront-exposure branch from d3d7ed9 to 7d0d5ce Compare May 20, 2026 21:11
Implements conditional ingress rendering:
- exposure.mode: 'domain' (default) — HTTPS:443, host-based routing, TLS
- exposure.mode: 'cloudfront' — HTTP:80, no host, CloudFront terminates TLS

Changes:
- Add exposure.mode to config schema and template
- Update ingress templates (keycloak, argo-workflows, grafana, jupyterhub, kubeflow)
- Pass exposure_mode annotation through registry valuesObject
- Add hub:cloudfront Taskfile task (creates ALB + CloudFront distribution)
- Update fleet-secret chart to propagate exposure_mode annotation

Closes #677
@allamand allamand force-pushed the feature/cloudfront-exposure branch from 7d0d5ce to e9bd3b3 Compare May 20, 2026 21:34
@shapirov103 shapirov103 requested a review from hmuthusamy May 20, 2026 21:42
@hmuthusamy

Collaborator

Review: feature/cloudfront-exposure

Overall approach is solid — the pre-create ALB → CloudFront → use CF domain pattern is correct. A few gaps need addressing before this will work reliably with Keycloak and SSE (Agent Gateway MCP):

Issues

| # | Issue | Detail | Fix |
|---|-------|--------|-----|
| 1 | No OriginReadTimeout set | CloudFront config JSON doesn't specify OriginReadTimeout, so it defaults to 30s. The Terraform reference (platform/infra/terraform/common/cloudfront.tf) uses 60s. Agent Gateway SSE requires the origin to send data within this window or CloudFront drops the connection. | Add "OriginReadTimeout": 60 to CustomOriginConfig in the distribution JSON |
| 2 | No OriginKeepaliveTimeout set | Defaults to 5s. Terraform reference uses 30s. Short keepalive means CloudFront opens new TCP connections frequently, adding latency. | Add "OriginKeepaliveTimeout": 30 to CustomOriginConfig |
| 3 | Missing X-Forwarded-Proto / X-Forwarded-Port custom headers | Terraform adds X-Forwarded-Proto: https and X-Forwarded-Port: 443 as custom origin headers. Without these, Keycloak generates redirect URIs with http:// instead of https:// (it sees the ALB connection as HTTP). | Add CustomHeaders to the origin config with these two headers |
| 4 | No separate cache behavior for /keycloak/* | Terraform has an ordered_cache_behavior for Keycloak with TTL=0 and all headers/cookies forwarded. The branch uses a single default behavior. Keycloak requires all cookies and headers for session management; the AllViewer origin request policy should cover this, but explicit TTL=0 prevents stale auth responses. | Add an ordered_cache_behavior for /keycloak/* with MinTTL=0, DefaultTTL=0, MaxTTL=0 |
| 5 | No destroy cleanup for CloudFront/ALB | The destroy task doesn't delete the CloudFront distribution, the pre-created ALB, or the dedicated security group. These will be orphaned on teardown. | Add CloudFront disable+delete, ALB delete, and SG delete to the destroy task (CloudFront requires disabling first, then waiting, then deleting) |
| 6 | CloudFront deployment propagation delay | CloudFront distributions take 5-15 minutes to deploy. wait_for_deployment = false in Terraform skips this, but the Taskfile should either wait or warn that the domain won't be reachable immediately. | Add a wait loop or print a warning after hub:cloudfront |
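
Items 1 to 3 all concern the origin entry of the distribution config. As a rough sketch of what that entry could look like once the suggested fields are added: the field names follow the CloudFront distribution config JSON, the values are the ones cited in this review, and the function name, origin ID, and ALB DNS name are hypothetical. It assumes the origin is reached over HTTP on port 80, as described in the PR summary.

```python
import json


def build_alb_origin(alb_dns_name, origin_id="hub-alb"):
    """Build the CloudFront origin entry for the pre-created ALB.

    Hypothetical helper; origin_id and the HTTP-only policy are assumptions.
    """
    # Headers the review asks for, so Keycloak builds https:// redirect URIs.
    headers = [
        {"HeaderName": "X-Forwarded-Proto", "HeaderValue": "https"},
        {"HeaderName": "X-Forwarded-Port", "HeaderValue": "443"},
    ]
    return {
        "Id": origin_id,
        "DomainName": alb_dns_name,
        # Quantity is derived from the list so the two cannot drift apart.
        "CustomHeaders": {"Quantity": len(headers), "Items": headers},
        "CustomOriginConfig": {
            "HTTPPort": 80,
            "HTTPSPort": 443,
            "OriginProtocolPolicy": "http-only",  # ALB listens on HTTP:80 in cloudfront mode
            "OriginReadTimeout": 60,       # review item 1 (default would be 30s)
            "OriginKeepaliveTimeout": 30,  # review item 2 (default would be 5s)
        },
    }


if __name__ == "__main__":
    # Placeholder DNS name, for illustration only.
    print(json.dumps(build_alb_origin("my-alb.example.elb.amazonaws.com"), indent=2))
```

The resulting dict would sit in the `Origins.Items` list of the distribution config. Item 4 (a dedicated `/keycloak/*` cache behavior) is not covered by this sketch.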

allamand pushed a commit to allamand/appmod-blueprints that referenced this pull request May 24, 2026
Replace legacy nginx ingress (class=nginx, controller inert on hub) with an
AWS Load Balancer Controller ingress on class=platform, sharing the
peeks-hub-ingress ALB with keycloak/argo/etc.

Uses LBC v2.14+ ALB url-rewrite transforms (annotation
alb.ingress.kubernetes.io/transforms.backstage) to strip the /backstage
prefix before forwarding to the backend, replacing the legacy nginx
rewrite-target: /$2 behaviour. Stickiness omitted for now.

Cherry-picked pattern from PR aws-samples#680 (feature/cloudfront-exposure).
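
The effect of the prefix strip described in this commit can be illustrated in plain Python. This only simulates the intended path rewrite; the regular expression is an assumption chosen to mirror the legacy `rewrite-target: /$2` behaviour, and the actual JSON value of the transforms annotation is not shown in this conversation.

```python
import re

# Assumed pattern: match "/backstage" followed by "/" or end of path,
# and capture whatever comes after the prefix.
_BACKSTAGE_PREFIX = re.compile(r"^/backstage(/|$)(.*)")


def strip_backstage_prefix(path):
    """Return the path the backend would see after the prefix is removed."""
    match = _BACKSTAGE_PREFIX.match(path)
    if match is None:
        # Paths outside /backstage are left untouched.
        return path
    return "/" + match.group(2)
```

With this sketch, `/backstage/catalog` becomes `/catalog`, both `/backstage` and `/backstage/` become `/`, and an unrelated path such as `/keycloak` passes through unchanged.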
…Hub catalog, EKS capabilities

- Add ALB transforms annotation for argo-workflows path rewrite (LBC v2.14.0+)
- Fix Keycloak SAML update to use CloudFront domain and wait for pod readiness
- Add idc:configure task: seeds SSM creds, runs configure_identity_center.py for SCIM
- Switch Backstage catalog from GitLab to GitHub with branch-aware URL
- Add static system-info entity with env var substitution (no hardcoded values)
- Add adminRoleName and modelS3Bucket to config.yaml and secrets-manager:seed
- Create KRO + ACK capabilities on hub (create-capability.yaml Job)
- Add KRO + ACK capability provisioning to spoke Crossplane composition
- Update .kiro steering files with current architecture knowledge

Refs: #690
Workshop User added 4 commits May 27, 2026 06:17
- Pre-create GitLab NLB and CloudFront distribution early in bootstrap
- Conditional SSH port (enabled in domain mode, disabled in cloudfront mode)
- Git protocol switches to HTTP in cloudfront mode (PAT-based auth)
- GitLab gets its own CloudFront domain (stored in private/gitlab-cloudfront-domain)
- gitlab_domain_name annotation propagated through secrets-manager:seed
- Enable gitlab addon in control-plane environment

Refs: #698, #699
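
The protocol switch listed in this commit could be sketched as follows. This is an illustration only: the function name, URL layout, and default SSH port are hypothetical, and the sketch assumes that in cloudfront mode clients reach GitLab over `https://` on the CloudFront domain (TLS ends at CloudFront), with the personal access token supplied separately by the Git credential mechanism.

```python
def gitlab_remote_url(mode, domain, repo_path, ssh_port=22):
    """Pick a Git remote URL for the given exposure mode.

    Hypothetical helper; not taken from the PR's Taskfile or charts.
    """
    repo_path = repo_path.strip("/")
    if mode == "domain":
        # SSH is enabled in domain mode.
        return f"ssh://git@{domain}:{ssh_port}/{repo_path}.git"
    if mode == "cloudfront":
        # SSH is disabled; use HTTP(S) with PAT-based auth instead.
        return f"https://{domain}/{repo_path}.git"
    raise ValueError(f"unknown exposure mode: {mode!r}")
```

Keeping the token out of the URL is a deliberate choice in this sketch, so that remotes can be logged or stored without leaking credentials.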
Prevents cascade deletion of all child ApplicationSets when the hub
cluster secret is momentarily reset by ExternalSecret reconciliation.
@allamand allamand force-pushed the feature/cloudfront-exposure branch from 6e7e038 to 5d96585 Compare May 27, 2026 13:08
Workshop User added 2 commits May 27, 2026 13:46
Fleet member secrets should only be created after spoke clusters are
provisioned by Crossplane. They'll be re-added when spokes are ready.
2 participants