feat(app-exposure): enable addon on hub + path-mode routing for CloudFront workshop-studio #686

Draft
allamand wants to merge 9 commits into aws-samples:feature/platform-cluster-kro-ack from allamand:feat/peeks-rebase-on-platform-cluster-kro-ack

Summary

Enables the app-exposure addon on the hub control-plane and extends the
AppExposure RGD with a path-only routing mode required for CloudFront
single-domain edges (Workshop Studio).

Context

PR #671 introduced AppExposure as a host-header-based RGD. On Workshop Studio
the hub is fronted by a single-domain CloudFront distribution, so every app
shares the same Host header and host-header listener rules can never match.
Apps must be differentiated by URL path only.

This PR makes AppExposure usable in that mode and turns it on by default on
the hub.

Changes

  • feat(kind-kro-ack): make hub:seed-secret idempotent and
    label-preserving (don't blow away enable_* labels on re-run).
  • feat(hub): enable app-exposure addon on the control-plane
    (enabled-addons.yaml, registry/platform.yaml).
  • feat(multi-acct): grant the KRO capability the ACK ELBv2 perms it
    needs to reconcile TargetGroup / Rule, and add the STS subject for
    pod-identity assumption.
  • feat(app-exposure): add routingMode field (default "host", enum
    "host"|"path"). Splits the rule into listenerRuleHost (host+path,
    unchanged behaviour) and listenerRulePath (path-only, no host
    condition) gated by includeWhen. New status fields
    ruleARNHost/ruleARNPath; legacy ruleARN preserved for CRD
    backward-compat.
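
The routingMode split can be sketched as a small model of the includeWhen gating. This is an illustration only, not the RGD itself: the `Claim` class, its `host` field, and the condition dictionaries are hypothetical stand-ins, and the path is used verbatim as the match pattern for simplicity.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Claim:
    """Hypothetical stand-in for an AppExposure instance spec."""
    path: str
    priority: int
    host: Optional[str] = None      # assumed field, for illustration only
    routing_mode: str = "host"      # default keeps the pre-existing behaviour


def rendered_rules(claim: Claim) -> dict:
    """Return the listener rule that would be included for this claim.

    Exactly one of listenerRuleHost / listenerRulePath is emitted,
    depending on routingMode.
    """
    if claim.routing_mode not in ("host", "path"):
        raise ValueError(f"unsupported routingMode: {claim.routing_mode!r}")

    path_condition = {"field": "path-pattern", "values": [claim.path]}
    if claim.routing_mode == "host":
        host_condition = {"field": "host-header", "values": [claim.host]}
        return {"listenerRuleHost": {"priority": claim.priority,
                                     "conditions": [host_condition, path_condition]}}
    # Path mode: no host condition at all, so a shared CloudFront Host header
    # cannot prevent the rule from matching.
    return {"listenerRulePath": {"priority": claim.priority,
                                 "conditions": [path_condition]}}
```

With the values used in the smoke test (`routing_mode="path"`, `path="/smoke"`, `priority=999`), only `listenerRulePath` is produced; a claim that omits `routing_mode` still yields `listenerRuleHost`.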

Testing

End-to-end validated on a hub cluster (KRO capability, eu-west-1):

  • AppExposure claim with routingMode=path, path=/smoke, priority=999
  • KRO reconciles 3 resources: ACK TargetGroup, ACK Rule (path-only),
    AWS LBC TargetGroupBinding
  • ALB target reports healthy
  • HTTPS request via the CloudFront edge reaches the backend pod — chain
    CF -> ALB -> TG -> TGB -> pod proven working

Notes

  • Backward-compatible: existing AppExposure instances without routingMode
    default to "host" and keep their previous behaviour.
  • Complementary with #680 (feat(exposure): CloudFront mode for domain-less
    deployments), which solves the same CloudFront single-domain problem for
    plain Helm Ingress charts. The two approaches can coexist on the same hub.

Related

allamand and others added 9 commits May 20, 2026 09:57
When the hub cluster Secret already exists, patch only the fields owned
by this task (gitops-bridge annotations + a small set of structural
labels) using kubectl annotate/label --overwrite. This preserves
labels/annotations added by other controllers — notably the enable_*
labels projected by the KRO eks-cluster RGD (rg-eks.yaml argocdSecret
resource) — instead of clobbering them via a blanket apply.

First-run path (secret absent) is unchanged: full manifest apply.

Rationale: re-running hub:seed-secret on an established hub used to
strip ~30 enable_<addon>=true labels, silently disabling addons until
the next KRO reconcile. The new flow makes the task safely re-runnable
and clarifies ownership boundaries between this task and KRO RGDs.
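
The difference between the two flows can be modelled on plain dictionaries. This is a sketch of the label semantics described above, not the task's actual implementation; the function names and the `environment` and `enable_argocd` keys are hypothetical.

```python
def blanket_apply(existing: dict, desired: dict) -> dict:
    """Old flow as described above: the task's label set replaces
    whatever was on the Secret, so foreign labels are lost."""
    return dict(desired)


def patch_owned(existing: dict, desired: dict, owned_keys: set) -> dict:
    """New flow: write only the keys this task owns (the effect of
    `kubectl label --overwrite` on those keys) and leave the rest alone."""
    merged = dict(existing)
    for key in owned_keys:
        if key in desired:
            merged[key] = desired[key]
    return merged


# Labels on an established hub Secret; the enable_* entries come from KRO.
existing = {"enable_app_exposure": "true", "enable_argocd": "true",
            "environment": "old"}
desired = {"environment": "control-plane"}   # hypothetical structural label

clobbered = blanket_apply(existing, desired)
preserved = patch_owned(existing, desired, owned_keys={"environment"})
```

Here `clobbered` no longer contains any `enable_*` label, while `preserved` keeps them and only updates `environment`. Applying `patch_owned` a second time returns the same dictionary, which is the idempotence property the commit aims for.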

Adds app_exposure: true to the hub's enabled-addons map so the
fleet-secret chart projects enable_app_exposure=true onto the hub
cluster Secret. The matching ApplicationSet entry already exists in
gitops/addons/registry/platform.yaml (line 181) and will deploy the
KRO ResourceGraphDefinition appexposure.peeks.io plus the
app-exposure-edge-config ConfigMap.

Required exposure annotations (alb_listener_arn, ingress_domain_name,
aws_vpc_id, aws_region, exposure_mode) are seeded by the updated
hub:seed-secret task.

The eks-capabilities-kro ClusterRole was missing the ACK ELBv2 API groups,
which blocked the AppExposure RGD from creating TargetGroup/Rule resources.

The ClusterRoleBinding only listed the legacy 'capabilities.eks.amazonaws.com'
username, but the EKS access entry maps the KRO capability IAM role to an
STS assumed-role principal (arn:aws:sts::<account>:assumed-role/<role>/KRO),
so the binding never matched at runtime.

Changes:
- Add elbv2.services.k8s.aws/* + elbv2.k8s.aws/targetgroupbindings rules
- Add STS assumed-role subject, templated from gitops-bridge annotations
  (aws_account_id, aws_cluster_name, resource_prefix)
- Plumb global.accountId + global.resourcePrefix through the AppSet registry
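
Why the binding never matched can be shown with a short sketch. RBAC compares a User subject's name with the authenticated username as an exact string, so the subject has to be the full assumed-role ARN. The helper names and the role name below are hypothetical; how the chart derives the role name from aws_cluster_name and resource_prefix is not shown here.

```python
def assumed_role_subject(account_id: str, role_name: str,
                         session_name: str = "KRO") -> str:
    """Build the STS assumed-role principal in the format quoted above."""
    return f"arn:aws:sts::{account_id}:assumed-role/{role_name}/{session_name}"


def binding_matches(subject_names: list, authenticated_username: str) -> bool:
    """User subjects match only on an exact username comparison."""
    return authenticated_username in subject_names


# Hypothetical account ID and role name, for illustration only.
runtime_user = assumed_role_subject("111122223333", "example-kro-role")

legacy_only = ["capabilities.eks.amazonaws.com"]
with_sts_subject = legacy_only + [runtime_user]
```

`binding_matches(legacy_only, runtime_user)` is false, which corresponds to the failure described above; adding the templated STS subject makes the match succeed.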

- Schema: new field routingMode (default="host", enum="host,path")
- Resources: split listenerRule into listenerRuleHost (host+path match) and
  listenerRulePath (path-only, no host condition) gated by includeWhen
- Status: new ruleARNHost/ruleARNPath fields; legacy ruleARN kept for CRD compat

Unblocks AppExposure for workshop-studio mode where all apps share the same
CloudFront single-domain distribution and routing must be differentiated by
URL path only (host-header match impossible with shared CF domain).

Validated end-to-end on smoke-nginx via CloudFront: chain CF -> ALB -> TG ->
TGB -> pod working, target healthy.

Replace legacy nginx ingress (class=nginx, controller inert on hub) with an
AWS Load Balancer Controller ingress on class=platform, sharing the
peeks-hub-ingress ALB with keycloak/argo/etc.

Uses LBC v2.14+ ALB url-rewrite transforms (annotation
alb.ingress.kubernetes.io/transforms.backstage) to strip the /backstage
prefix before forwarding to the backend, replacing the legacy nginx
rewrite-target: /$2 behaviour. Stickiness omitted for now.

Cherry-picked pattern from PR aws-samples#680 (feature/cloudfront-exposure).
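
The intended effect of the rewrite, stripping the /backstage prefix before the request reaches the backend, can be sketched as a pure function. The regular expression below is an assumption: it follows the common ingress-nginx convention of a `/backstage(/|$)(.*)` path paired with rewrite-target: /$2, and it is not the value of the ALB transforms annotation.

```python
import re

# Assumed pattern: group 2 captures everything after the prefix, mirroring
# how rewrite-target: /$2 is typically paired with the path regex.
_PREFIX = re.compile(r"^/backstage(/|$)(.*)")


def strip_backstage_prefix(path: str) -> str:
    """Return the path the backend should see once the prefix is removed."""
    match = _PREFIX.match(path)
    if match is None:
        return path               # not under /backstage: leave untouched
    return "/" + match.group(2)   # equivalent of rewrite-target /$2
```

For example, `/backstage/catalog` becomes `/catalog` and a bare `/backstage` becomes `/`, while `/backstagex` is left unchanged because the prefix must end at a path boundary.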

…yaml

Refactor the chart to template a list of IngressClass / IngressClassParams
pairs from `.Values.classes`, instead of hardcoding a single `platform`
class. This unblocks group isolation for the agent platform: the upstream
`IngressClassParams platform` enforces `spec.group.name=platform`, which
overrides any `alb.ingress.kubernetes.io/group.name` annotation on
downstream Ingress resources (e.g. agentgateway).

Default values now provision two classes:
  - platform / group=platform / scheme=internet-facing
  - agent    / group=agent    / scheme=internet-facing

Backward-compatible: rendered output for the `platform` class is
identical to the previous single-class template (auto / oss modes).

Companion change in agent-platform repo will switch the agentgateway
chart's `ingress.className` from `platform` to `agent`.

The Backstage chart uses .Values.global.ingress_name for
alb.ingress.kubernetes.io/load-balancer-name. The addon registry was
not propagating it, so Backstage joined the ALB 'platform' instead
of 'peeks-hub-ingress', causing a 'conflicting load balancer name' error on
the platform group.

…notations

IngressClassParams was hard-coded to group=platform / scheme=internet-facing,
which forced every Ingress chart to override via annotations and broke the
LBC group when a pre-created ALB used a different name (e.g. peeks-hub-ingress
in cloudfront-alb mode).

Now both classes ('platform' and 'agent') derive their config from cluster
secret annotations:
  - group.name <- ingress_name (default 'platform')
  - scheme     <- internal if exposure_mode=cloudfront-alb, else internet-facing

This unlocks two provisioner modes:
  1. taskfile: hub:ingress pre-creates the ALB; LBC adopts it via group match.
  2. lbc:      LBC creates the ALB lazily on first Ingress reconciliation.

Both work with cloudfront-alb (internal) and tls (internet-facing) exposure.
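
The derivation rule above is compact enough to express as one function. This is a sketch of the mapping only; the function name and the returned dictionary shape are assumptions, and the real logic lives in the chart templates.

```python
def ingress_class_params(annotations: dict) -> dict:
    """Derive group name and scheme from cluster secret annotations."""
    # group.name <- ingress_name, falling back to 'platform'
    group_name = annotations.get("ingress_name") or "platform"
    # scheme <- internal only when the ALB sits behind CloudFront
    if annotations.get("exposure_mode") == "cloudfront-alb":
        scheme = "internal"
    else:
        scheme = "internet-facing"
    return {"group": {"name": group_name}, "scheme": scheme}
```

With `ingress_name=peeks-hub-ingress` and `exposure_mode=cloudfront-alb` this yields an internal ALB in the peeks-hub-ingress group; with no annotations it falls back to the previous hard-coded values (platform, internet-facing).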

The status: precondition (kubectl get secret aws-credentials) caused
credentials:setup to silently skip the refresh when the secret already
existed, leaving stale STS tokens in long-lived clusters.

Symptoms observed:
- KRO claim eksclusterwithvpcs hub/hub stuck IN_PROGRESS for 14d
- ACK ELBv2 controller in CrashLoopBackOff (~2774 restarts) with 403
  AccessDenied on AssumeRole / DescribeLoadBalancers

Fix: drop the status: guard so credentials:setup always delegates to
credentials:refresh, which re-issues fresh STS creds via aws sts
assume-role on every task install run.
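
The failure mode can be reproduced with a toy model of a task runner's status check: when the check succeeds, the task counts as up to date and its commands are skipped. The `run_task` helper is a hypothetical model, not Taskfile code.

```python
from typing import Callable, Optional


def run_task(action: Callable[[], None],
             status: Optional[Callable[[], bool]] = None) -> bool:
    """Run `action` unless a status check reports the task as up to date.

    Returns True when the action actually ran.
    """
    if status is not None and status():
        return False          # considered up to date: commands skipped
    action()
    return True


refresh_calls = []


def refresh() -> None:
    refresh_calls.append("credentials:refresh")


def secret_exists() -> bool:
    return True               # long-lived cluster: the secret is already there


ran_with_guard = run_task(refresh, status=secret_exists)   # old behaviour
ran_without_guard = run_task(refresh)                       # after the fix
```

With the guard in place the refresh never runs once the secret exists, regardless of how old the credentials inside it are. Without the guard it runs on every invocation.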