feat(app-exposure): enable addon on hub + path-mode routing for CloudFront workshop-studio#686
Draft
allamand wants to merge 9 commits into
When the hub cluster Secret already exists, patch only the fields owned by this task (gitops-bridge annotations + a small set of structural labels) using kubectl annotate/label --overwrite. This preserves labels/annotations added by other controllers — notably the enable_* labels projected by the KRO eks-cluster RGD (rg-eks.yaml argocdSecret resource) — instead of clobbering them via a blanket apply. First-run path (secret absent) is unchanged: full manifest apply. Rationale: re-running hub:seed-secret on an established hub used to strip ~30 enable_<addon>=true labels, silently disabling addons until the next KRO reconcile. The new flow makes the task safely re-runnable and clarifies ownership boundaries between this task and KRO RGDs.
Adds app_exposure: true to the hub's enabled-addons map so the fleet-secret chart projects enable_app_exposure=true onto the hub cluster Secret. The matching ApplicationSet entry already exists in gitops/addons/registry/platform.yaml (l.181) and will deploy the KRO ResourceGraphDefinition appexposure.peeks.io plus the app-exposure-edge-config ConfigMap. Required exposure annotations (alb_listener_arn, ingress_domain_name, aws_vpc_id, aws_region, exposure_mode) are seeded by the updated hub:seed-secret task.
The eks-capabilities-kro ClusterRole was missing the ACK ELBv2 API groups, which blocks the AppExposure RGD from creating TargetGroup/Rule resources. The ClusterRoleBinding only listed the legacy 'capabilities.eks.amazonaws.com' username, but the EKS access-entry maps the KRO capability IAM role to an STS assumed-role principal (arn:aws:sts::<account>:assumed-role/<role>/KRO), so the binding never matched at runtime.
Changes:
- Add elbv2.services.k8s.aws/* + elbv2.k8s.aws/targetgroupbindings rules
- Add STS assumed-role subject, templated from gitops-bridge annotations (aws_account_id, aws_cluster_name, resource_prefix)
- Plumb global.accountId + global.resourcePrefix through the AppSet registry
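A minimal sketch of what the RBAC change above could look like once rendered. The API groups, the legacy username and the STS principal format come from the commit message; the binding name and the verbs are assumptions for illustration, and the `<account>` / `<role>` placeholders are left unfilled as in the original.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: eks-capabilities-kro
rules:
  # ACK ELBv2 resources (TargetGroup, Rule, ...) reconciled by the AppExposure RGD
  - apiGroups: ["elbv2.services.k8s.aws"]
    resources: ["*"]
    verbs: ["*"]            # assumption: the actual verb list may be narrower
  # AWS Load Balancer Controller TargetGroupBinding
  - apiGroups: ["elbv2.k8s.aws"]
    resources: ["targetgroupbindings"]
    verbs: ["*"]            # assumption
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: eks-capabilities-kro   # assumption: binding name not given in the source
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: eks-capabilities-kro
subjects:
  # Legacy username, kept as-is
  - kind: User
    apiGroup: rbac.authorization.k8s.io
    name: capabilities.eks.amazonaws.com
  # STS assumed-role principal that the EKS access-entry actually maps to
  - kind: User
    apiGroup: rbac.authorization.k8s.io
    name: "arn:aws:sts::<account>:assumed-role/<role>/KRO"
```

RBAC subjects are matched on the exact username string, so the second subject has to reproduce the assumed-role ARN character for character, including the session name suffix.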
- Schema: new field routingMode (default="host", enum="host,path")
- Resources: split listenerRule into listenerRuleHost (host+path match) and listenerRulePath (path-only, no host condition) gated by includeWhen
- Status: new ruleARNHost/ruleARNPath fields; legacy ruleARN kept for CRD compat

Unblocks AppExposure for workshop-studio mode where all apps share the same CloudFront single-domain distribution and routing must be differentiated by URL path only (host-header match impossible with shared CF domain).
Validated end-to-end on smoke-nginx via CloudFront: chain CF -> ALB -> TG -> TGB -> pod working, target healthy.
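The schema and resource split described above might look roughly like the following KRO ResourceGraphDefinition fragment. This is a sketch only: the field name, default, enum and resource ids come from the commit message, while the metadata name, the exact `includeWhen` expressions and the elided templates are assumptions, not the contents of the actual RGD.

```yaml
apiVersion: kro.run/v1alpha1
kind: ResourceGraphDefinition
metadata:
  name: appexposure            # assumption: real RGD name may differ
spec:
  schema:
    apiVersion: v1alpha1
    kind: AppExposure
    spec:
      # New field: defaults to host-based routing, so existing claims are unaffected
      routingMode: string | default="host" enum="host,path"
  resources:
    # Host + path match (previous behaviour)
    - id: listenerRuleHost
      includeWhen:
        - ${schema.spec.routingMode == "host"}
      template: {}             # ACK Rule with host-header and path conditions (elided)
    # Path-only match, no host condition (CloudFront single-domain edges)
    - id: listenerRulePath
      includeWhen:
        - ${schema.spec.routingMode == "path"}
      template: {}             # ACK Rule with a path condition only (elided)
```

Because the two `includeWhen` conditions are mutually exclusive, only one of the two rules is created per claim, which is presumably why the status exposes separate ruleARNHost and ruleARNPath fields.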
Replace legacy nginx ingress (class=nginx, controller inert on hub) with an AWS Load Balancer Controller ingress on class=platform, sharing the peeks-hub-ingress ALB with keycloak/argo/etc. Uses LBC v2.14+ ALB url-rewrite transforms (annotation alb.ingress.kubernetes.io/transforms.backstage) to strip the /backstage prefix before forwarding to the backend, replacing the legacy nginx rewrite-target: /$2 behaviour. Stickiness omitted for now. Cherry-picked pattern from PR aws-samples#680 (feature/cloudfront-exposure).
…yaml
Refactor the chart to template a list of IngressClass / IngressClassParams pairs from `.Values.classes`, instead of hardcoding a single `platform` class.
This unblocks group isolation for the agent platform: the upstream `IngressClassParams platform` enforces `spec.group.name=platform`, which overrides any `alb.ingress.kubernetes.io/group.name` annotation on downstream Ingress resources (e.g. agentgateway).
Default values now provision two classes:
- platform / group=platform / scheme=internet-facing
- agent / group=agent / scheme=internet-facing
Backward-compatible: rendered output for the `platform` class is identical to the previous single-class template (auto / oss modes).
Companion change in agent-platform repo will switch the agentgateway chart's `ingress.className` from `platform` to `agent`.
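As an illustration, the default values described above could take a shape like the one below. Only the top-level `classes` key and the name/group/scheme values are taken from the commit message; the per-entry key names (`name`, `group`, `scheme`) are hypothetical.

```yaml
# values.yaml (sketch; per-entry key names are assumptions)
classes:
  - name: platform
    group: platform
    scheme: internet-facing
  - name: agent
    group: agent
    scheme: internet-facing
```

An Ingress that sets `ingressClassName: agent` would then be placed in the `agent` group by its IngressClassParams, rather than being forced into `platform`.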
Backstage chart uses .Values.global.ingress_name for alb.ingress.kubernetes.io/load-balancer-name. The addon registry was not propagating it, so backstage was joining ALB 'platform' instead of 'peeks-hub-ingress' — causing 'conflicting load balancer name' on the platform group.
…notations
IngressClassParams was hard-coded to group=platform / scheme=internet-facing,
which forced every Ingress chart to override via annotations and broke the
LBC group when a pre-created ALB used a different name (e.g. peeks-hub-ingress
in cloudfront-alb mode).
Now both classes ('platform' and 'agent') derive their config from cluster
secret annotations:
- group.name <- ingress_name (default 'platform')
- scheme <- internal if exposure_mode=cloudfront-alb, else internet-facing
This unlocks two provisioner modes:
1. taskfile: hub:ingress pre-creates the ALB; LBC adopts it via group match.
2. lbc: LBC creates the ALB lazily on first Ingress reconciliation.
Both work with cloudfront-alb (internal) and tls (internet-facing) exposure.
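The derivation described above can be sketched as a Helm template for a single class. `spec.group.name` and `spec.scheme` are existing IngressClassParams fields; the values paths are assumptions (`.Values.global.ingress_name` is used elsewhere in this PR, `.Values.global.exposure_mode` is hypothetical), and the real chart may wire the annotations differently.

```yaml
# templates/ingressclassparams.yaml (sketch for the 'platform' class only)
apiVersion: elbv2.k8s.aws/v1beta1
kind: IngressClassParams
metadata:
  name: platform
spec:
  group:
    # group.name <- ingress_name, falling back to 'platform'
    name: {{ .Values.global.ingress_name | default "platform" }}
  # scheme <- internal behind CloudFront, internet-facing otherwise
  scheme: {{ if eq (.Values.global.exposure_mode | default "") "cloudfront-alb" }}internal{{ else }}internet-facing{{ end }}
```

With `ingress_name=peeks-hub-ingress` and `exposure_mode=cloudfront-alb`, this would render `group.name: peeks-hub-ingress` and `scheme: internal`, matching the pre-created ALB in the taskfile provisioner mode.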
The status: precondition (kubectl get secret aws-credentials) caused credentials:setup to silently skip the refresh when the secret already existed, leaving stale STS tokens in long-lived clusters.
Symptoms observed:
- KRO claim eksclusterwithvpcs hub/hub stuck IN_PROGRESS for 14d
- ACK ELBv2 controller in CrashLoopBackOff (~2774 restarts) with 403 AccessDenied on AssumeRole / DescribeLoadBalancers
Fix: drop the status: guard so credentials:setup always delegates to credentials:refresh, which re-issues fresh STS creds via aws sts assume-role on every task install run.
Summary
Enables the `app-exposure` addon on the hub control-plane and extends the
AppExposure RGD with a path-only routing mode required for CloudFront
single-domain edges (Workshop Studio).
Context
PR #671 introduced AppExposure as a host-header-based RGD. On Workshop Studio
the hub is fronted by a single-domain CloudFront distribution, so every app
shares the same Host header and host-header listener rules can never match.
Apps must be differentiated by URL path only.
This PR makes AppExposure usable in that mode and turns it on by default on
the hub.
Changes
- feat(kind-kro-ack): make `hub:seed-secret` idempotent and label-preserving (don't blow away `enable_*` labels on re-run).
- feat(hub): enable `app-exposure` addon on the control-plane (`enabled-addons.yaml`, `registry/platform.yaml`).
- feat(multi-acct): grant the KRO capability the ACK ELBv2 perms it needs to reconcile `TargetGroup` / `Rule`, and add the STS subject for pod-identity assumption.
- feat(app-exposure): add `routingMode` field (default `"host"`, enum `"host"|"path"`). Splits the rule into `listenerRuleHost` (host+path, unchanged behaviour) and `listenerRulePath` (path-only, no host condition) gated by `includeWhen`. New status fields `ruleARNHost` / `ruleARNPath`; legacy `ruleARN` preserved for CRD backward-compat.
Testing
End-to-end validated on a hub cluster (KRO capability, eu-west-1):
- `AppExposure` claim with `routingMode=path`, `path=/smoke`, `priority=999`
- `TargetGroup`, ACK `Rule` (path-only), AWS LBC `TargetGroupBinding`
- CF -> ALB -> TG -> TGB -> pod proven working
Notes
- … `routingMode` default to `"host"` and keep their previous behaviour.
- … problem for plain Helm Ingress charts). The two approaches can coexist on
  the same hub.
Related