fix: EKS cluster template and kro RGD issues #744
- Add missing enableAwsEfsCsiDriver parameter to Backstage template
- Fix readyWhen condition: use clusterState instead of state in rg-eks-vpc.yaml
- Fix duplicate argocd.argoproj.io/sync-options annotation (YAML key collision)
- Fix Backstage component link: add gitops/ prefix and use dynamic tenant param
- Fix spec.owner: use user:guest instead of bare guests
The ACK secretsmanager controller running as an EKS Capability needs ClusterRole permissions to manage secretsmanager.services.k8s.aws CRDs. Without this, the controller cannot resolve secretString references to k8s secrets, causing clusterConfigSecret to never sync.
…conciliation
- Add critical section in eks-best-practices.md to prevent deleting K8s resources to force reconciliation
- Add troubleshooting best practices in aws-env-best-practices.md emphasizing patience and non-destructive debugging
- These rules prevent breaking status propagation chains in Kro resource graphs and ACK controllers
- Recommend 15-30 minute wait time for controller reconciliation after config changes
Missing:
- Merge duplicate sync-options annotations
- Enable observability by default (cni_metrics, prometheus_node_exporter, kube_state_metrics)
- Fix readyWhen: use eks.status.clusterState instead of eks.status.state
- Add secretsmanager RBAC in eks-capabilities-rbac
shapirov103
left a comment
Very cool, looks great, please see my comment.
# Get admin role name from current AWS context
ADMIN_ROLE_NAME=$(aws sts get-caller-identity --query 'Arn' --output text | sed 's|.*assumed-role/||' | sed 's|/.*||')
# Use WSParticipantRole (Workshop Studio standard role). Fallback to caller identity for self-paced deployments.
ADMIN_ROLE_NAME=$(aws iam list-roles --query 'Roles[?contains(RoleName,`WSParticipantRole`)].RoleName' --output text 2>/dev/null)
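The fallback described in the comment above might be structured roughly as follows. This is only a sketch: the function names `role_name_from_arn` and `pick_admin_role` are hypothetical and do not come from utils.sh; only the `sed` pipeline and the two `aws` queries are taken from the diff. The helpers take their inputs as arguments so they can be exercised without AWS credentials.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; names are illustrative, not part of utils.sh.

# Extract the role name from an assumed-role ARN such as
# arn:aws:sts::111122223333:assumed-role/MyRole/session-name
role_name_from_arn() {
  printf '%s\n' "$1" | sed 's|.*assumed-role/||' | sed 's|/.*||'
}

# Prefer the workshop role when the lookup found one;
# otherwise fall back to the role of the current caller.
#   $1: output of the WSParticipantRole lookup (may be empty)
#   $2: caller ARN from `aws sts get-caller-identity`
pick_admin_role() {
  local ws_role="$1" caller_arn="$2"
  if [ -n "$ws_role" ] && [ "$ws_role" != "None" ]; then
    # The lookup may return several whitespace-separated names; keep the first.
    printf '%s\n' "$ws_role" | awk '{print $1}'
  else
    role_name_from_arn "$caller_arn"
  fi
}

# Possible wiring (left commented out because it needs AWS credentials):
# WS_ROLE=$(aws iam list-roles --query 'Roles[?contains(RoleName,`WSParticipantRole`)].RoleName' --output text 2>/dev/null)
# CALLER_ARN=$(aws sts get-caller-identity --query 'Arn' --output text)
# ADMIN_ROLE_NAME=$(pick_admin_role "$WS_ROLE" "$CALLER_ARN")
```

Keeping the workshop-specific lookup behind one function would also make it easier to move out of the repo later, as discussed in the review below.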
WSParticipant role is specific to the workshop. Should go into the gitlab repo. What is the plan to start moving this out of the repo? Otherwise the refactoring effort for agent-platform will break the workshop and will be hard to merge back to main.
We don't use utils.sh in the agentic branch (pr-709), but we will need to double-check there that we don't reference it outside the workshop pattern.
If we don't use this in the agentic branch, but the agentic branch is going to target main soon, what will happen with this change? Does it mean someone will have to replicate it in the workshop? If so, why not do it now and save on the merge? I am afraid it may be lost otherwise after the merge.
Yes, if the plan is to merge the agentic work into main quickly, then we can save effort. But for now, main is broken: the optional module that creates the EKS staging cluster is not working. So everything depends on the timeline we agree on for merging the agentic work to main, and I think there is still some work to do before that.
Sounds good, we can apply the fix now. The sh change is one of the things in the PR; the other changes seem to be applicable regardless of the workshop.
shapirov103
left a comment
LGTM (utils.sh change to move out of the repo, deferred)
Summary
Fixes multiple issues in the EKS cluster Backstage template and kro ResourceGraphDefinition.
Closes #743
Changes
| File | Change |
| --- | --- |
| platform/backstage/templates/eks-cluster-template/template.yaml | Add missing `enableAwsEfsCsiDriver` parameter definition |
| gitops/addons/charts/kro/resource-groups/manifests/eks/rg-eks-vpc.yaml | Fix `readyWhen`: `state` → `clusterState` to match EksCluster RGD status schema |
| rg-eks-vpc.yaml | Merge duplicate `argocd.argoproj.io/sync-options` by combining values |
| template.yaml | Fix component link: add `gitops/` prefix, use dynamic `${{ parameters.tenant }}` |
| template.yaml | Fix `spec.owner`: `guests` → `user:guest` |

Testing
- `readyWhen` condition resolves correctly with an active EKS cluster
- `enableAwsEfsCsiDriver` parameter correctly
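A duplicated annotation key such as the `argocd.argoproj.io/sync-options` collision fixed here is easy to miss in review, because most YAML parsers silently keep only one of the values. The following is a rough sketch of a check that could catch it. The function name `find_duplicate_annotations` is hypothetical, and the script assumes space-indented manifests whose annotation values each fit on a single line; it is not a YAML parser.

```shell
#!/usr/bin/env bash
# Hypothetical check: report keys repeated inside any `annotations:` block.
# Assumes space indentation and single-line annotation values.
find_duplicate_annotations() {
  awk '
    # Number of leading spaces on a line
    function indent(s) { match(s, /^ */); return RLENGTH }

    # Start of an annotations block: remember its indent, reset seen keys
    /^[[:space:]]*annotations:[[:space:]]*$/ {
      in_block = 1; base = indent($0); split("", seen); next
    }

    in_block {
      if ($0 ~ /^[[:space:]]*$/) next                  # skip blank lines
      if (indent($0) <= base) { in_block = 0; next }   # block ended
      key = $0
      sub(/^ */, "", key)                              # strip indentation
      sub(/:.*/, "", key)                              # keep text before the first colon
      if (seen[key]++) {
        print FILENAME ":" NR ": duplicate annotation " key
        dup = 1
      }
    }

    END { exit (dup ? 1 : 0) }
  ' "$1"
}
```

Run against a manifest, for example `find_duplicate_annotations rg-eks-vpc.yaml`, it prints one line per repeated key and returns a non-zero status, so it could be wired into a pre-commit hook or CI step.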