Problem
Currently, the AWS provider attempts to deploy/destroy infrastructure without validating credentials or permissions upfront. This can lead to failures deep into the operation with unclear error messages.
Proposed Solution
Implement credential validation that runs at the start of all operations (Deploy, Destroy, Reconcile, and their dry-run equivalents), ideally immediately after creating the AWS clients.
Validation Steps
- Credential Validity: Verify that AWS credentials are valid and can authenticate
  - Use sts:GetCallerIdentity to validate that the credentials work
  - Return a clear error if credentials are invalid or expired
  - Display the identity (user/role ARN) and account ID
- Permission Check: Verify the credentials have all required IAM permissions
  - Use the IAM Policy Simulator API where possible
  - Provide clear, actionable error messages listing missing permissions
- Fail Fast: If validation fails, return an informative error before attempting any infrastructure operations
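The ordering of the three steps above can be sketched as follows. This is a minimal illustration, not the provider's implementation: identityFunc, simulateFunc, validate, and the stub helpers are hypothetical names introduced here so the control flow can run without AWS access. In real code the two function types would wrap the STS and IAM Policy Simulator clients.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"strings"
)

// callerIdentity holds the two values the issue asks to display.
type callerIdentity struct {
	ARN     string
	Account string
}

// identityFunc and simulateFunc are hypothetical seams standing in for the
// real STS and IAM Policy Simulator calls.
type identityFunc func(ctx context.Context) (callerIdentity, error)

// simulateFunc returns the subset of actions the identity may NOT perform.
type simulateFunc func(ctx context.Context, arn string, actions []string) ([]string, error)

// validate runs the steps in order and stops at the first failure.
func validate(ctx context.Context, whoAmI identityFunc, simulate simulateFunc, actions []string) (callerIdentity, error) {
	// Step 1: credential validity.
	id, err := whoAmI(ctx)
	if err != nil {
		return callerIdentity{}, fmt.Errorf("AWS credentials are invalid or expired: %w", err)
	}
	// Step 2: permission check.
	missing, err := simulate(ctx, id.ARN, actions)
	if err != nil {
		return id, fmt.Errorf("could not check permissions for %s: %w", id.ARN, err)
	}
	// Step 3: fail fast, before any infrastructure operation starts.
	if len(missing) > 0 {
		return id, fmt.Errorf("identity %s (account %s) is missing permissions: %s",
			id.ARN, id.Account, strings.Join(missing, ", "))
	}
	return id, nil
}

// errStubExpired simulates an expired-token failure from STS.
var errStubExpired = errors.New("ExpiredToken")

// stubIdentity is a test double, not an AWS call.
func stubIdentity(id callerIdentity, err error) identityFunc {
	return func(context.Context) (callerIdentity, error) { return id, err }
}

// stubSimulator is a test double that denies exactly the listed actions.
func stubSimulator(denied ...string) simulateFunc {
	return func(_ context.Context, _ string, actions []string) ([]string, error) {
		var missing []string
		for _, a := range actions {
			for _, d := range denied {
				if a == d {
					missing = append(missing, a)
				}
			}
		}
		return missing, nil
	}
}

func main() {
	ctx := context.Background()
	me := callerIdentity{ARN: "arn:aws:iam::123456789012:user/my-user", Account: "123456789012"}
	actions := []string{"ec2:CreateVpc", "eks:CreateCluster"}

	_, err := validate(ctx, stubIdentity(me, nil), stubSimulator("eks:CreateCluster"), actions)
	fmt.Println(err)

	_, err = validate(ctx, stubIdentity(callerIdentity{}, errStubExpired), stubSimulator(), actions)
	fmt.Println(err)
}
```

Passing the two AWS calls in as functions is only one possible design. It keeps the fail-fast logic testable without credentials, which may or may not match how the provider's Clients type is structured.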
Required AWS IAM Permissions
Based on analysis of the AWS provider implementation (pkg/provider/aws/) and Terraform modules, the following permissions are required:
STS (Security Token Service)
sts:GetCallerIdentity - Validate credentials
S3 (State Bucket Management)
These permissions are required by the Go CLI for Terraform state bucket lifecycle management:
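s3:ListBucket - Check if state bucket exists (this action authorizes the HeadBucket API call; there is no s3:HeadBucket IAM action)
s3:CreateBucket - Create state bucket
s3:PutBucketVersioning - Enable versioning on state bucket
s3:PutBucketPublicAccessBlock - Block public access to state bucket (authorizes the PutPublicAccessBlock API call)
s3:ListBucketVersions - List object versions before deletion (authorizes the ListObjectVersions API call) (destroy)
s3:DeleteObject - Delete objects in bucket (destroy)
s3:DeleteObjectVersion - Delete specific object versions, needed if the versioned bucket is emptied by version ID (destroy)
s3:DeleteBucket - Delete state bucket (destroy)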
EC2 (VPC, Subnets, Network)
ec2:CreateVpc
ec2:DeleteVpc
ec2:DescribeVpcs
ec2:ModifyVpcAttribute (enable DNS)
ec2:CreateSubnet
ec2:DeleteSubnet
ec2:DescribeSubnets
ec2:CreateInternetGateway
ec2:DeleteInternetGateway
ec2:AttachInternetGateway
ec2:DetachInternetGateway
ec2:DescribeInternetGateways
ec2:AllocateAddress (Elastic IPs for NAT Gateways)
ec2:ReleaseAddress
ec2:DescribeAddresses
ec2:CreateNatGateway
ec2:DeleteNatGateway
ec2:DescribeNatGateways
ec2:CreateRouteTable
ec2:DeleteRouteTable
ec2:DescribeRouteTables
ec2:CreateRoute
ec2:AssociateRouteTable
ec2:DisassociateRouteTable
ec2:CreateSecurityGroup
ec2:DeleteSecurityGroup
ec2:DescribeSecurityGroups
ec2:AuthorizeSecurityGroupIngress
ec2:AuthorizeSecurityGroupEgress
ec2:CreateVpcEndpoint
ec2:DeleteVpcEndpoints
ec2:DescribeVpcEndpoints
ec2:DescribeNetworkInterfaces (wait for ENI cleanup)
ec2:DescribeAvailabilityZones
ec2:CreateTags
ec2:DeleteTags
EKS (Elastic Kubernetes Service)
eks:CreateCluster
eks:DeleteCluster
eks:DescribeCluster
eks:UpdateClusterVersion
eks:UpdateClusterConfig
eks:CreateNodegroup
eks:DeleteNodegroup
eks:DescribeNodegroup
eks:ListNodegroups
eks:UpdateNodegroupConfig
eks:TagResource
eks:UntagResource
IAM (Identity and Access Management)
iam:CreateRole
iam:DeleteRole
iam:GetRole
iam:AttachRolePolicy
iam:DetachRolePolicy
iam:ListAttachedRolePolicies
iam:PassRole (required to pass the cluster/node roles to EKS so it can assume them)
iam:TagRole
EFS (Elastic File System) - Optional, only if EFS is enabled in config
elasticfilesystem:CreateFileSystem
elasticfilesystem:DeleteFileSystem
elasticfilesystem:DescribeFileSystems
elasticfilesystem:CreateMountTarget
elasticfilesystem:DeleteMountTarget
elasticfilesystem:DescribeMountTargets
elasticfilesystem:TagResource
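One way to hold the list above in code is a table grouped by service, with the EFS group marked optional. The sketch below is an assumption about layout, not existing provider code: permissionGroup, groups, and requiredActions are hypothetical names, and the action lists are deliberately abbreviated to two entries per service.

```go
package main

import "fmt"

// permissionGroup ties a set of IAM actions to the service that needs them.
type permissionGroup struct {
	Service  string
	Actions  []string
	Optional bool // true when only needed for an opt-in feature (here: EFS)
}

// groups is abbreviated for illustration; the full action lists are the
// ones enumerated in this issue.
var groups = []permissionGroup{
	{Service: "sts", Actions: []string{"sts:GetCallerIdentity"}},
	{Service: "ec2", Actions: []string{"ec2:CreateVpc", "ec2:CreateSubnet"}},
	{Service: "eks", Actions: []string{"eks:CreateCluster", "eks:CreateNodegroup"}},
	{Service: "iam", Actions: []string{"iam:CreateRole", "iam:PassRole"}},
	{
		Service:  "efs",
		Actions:  []string{"elasticfilesystem:CreateFileSystem", "elasticfilesystem:CreateMountTarget"},
		Optional: true,
	},
}

// requiredActions flattens the groups, skipping optional ones unless EFS
// is enabled in the configuration.
func requiredActions(efsEnabled bool) []string {
	var out []string
	for _, g := range groups {
		if g.Optional && !efsEnabled {
			continue
		}
		out = append(out, g.Actions...)
	}
	return out
}

func main() {
	fmt.Println(len(requiredActions(false)), "actions without EFS")
	fmt.Println(len(requiredActions(true)), "actions with EFS")
}
```

Keeping the list as data means the same table can drive both the permission check and any generated documentation of the minimum IAM policy.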
Implementation Location
Add validation in pkg/provider/aws/provider.go:
- Create a validateCredentials(ctx context.Context, clients *Clients, cfg *config.NebariConfig) error method
- Call it from:
  - Deploy() - at the start, after client creation
  - Reconcile() - at the start, after client creation
  - Destroy() - at the start, after client creation
  - dryRunDeploy() - at the start, after client creation
  - dryRunDestroy() - at the start, after client creation
- Use sts:GetCallerIdentity as a basic credential check
- Check EFS permissions only if cfg.AWS.EFS.Enabled == true
- Provide clear error messages listing any missing permissions
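To produce the list of missing permissions, the per-action results of a policy simulation need to be reduced to the actions that were not allowed. The IAM SimulatePrincipalPolicy API reports a decision of allowed, explicitDeny, or implicitDeny for each action. The sketch below assumes a simplified evalResult struct as a stand-in for the SDK's result type; both it and missingPermissions are hypothetical names. Note that the caller also needs iam:SimulatePrincipalPolicy itself, so a real implementation would likely need a fallback when the simulation call is denied.

```go
package main

import (
	"fmt"
	"sort"
)

// evalResult is a simplified, hypothetical stand-in for one entry of a
// SimulatePrincipalPolicy response: an action name plus its decision.
type evalResult struct {
	Action   string
	Decision string // "allowed", "explicitDeny", or "implicitDeny"
}

// missingPermissions returns, in sorted order, every action whose
// decision is anything other than "allowed".
func missingPermissions(results []evalResult) []string {
	var missing []string
	for _, r := range results {
		if r.Decision != "allowed" {
			missing = append(missing, r.Action)
		}
	}
	sort.Strings(missing)
	return missing
}

func main() {
	results := []evalResult{
		{Action: "sts:GetCallerIdentity", Decision: "allowed"},
		{Action: "iam:PassRole", Decision: "implicitDeny"},
		{Action: "ec2:CreateVpc", Decision: "explicitDeny"},
		{Action: "eks:DescribeCluster", Decision: "allowed"},
	}
	fmt.Println(missingPermissions(results)) // prints [ec2:CreateVpc iam:PassRole]
}
```

Sorting the result keeps the error output stable between runs, which makes the message easier to compare in logs and tests.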
Example Error Message
Error: AWS credentials validation failed
Identity: arn:aws:iam::123456789012:user/my-user
Account: 123456789012
The provided AWS credentials are missing required permissions:
- ec2:CreateVpc
- ec2:CreateSubnet
- eks:CreateCluster
- iam:CreateRole
- iam:PassRole
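A message in the layout above could be rendered by a small helper such as the following. This is a sketch only; formatValidationError is a hypothetical name, and the wording is copied from the example rather than from any existing code.

```go
package main

import (
	"fmt"
	"strings"
)

// formatValidationError renders the identity, the account, and one line
// per missing permission, following the example message layout.
func formatValidationError(arn, account string, missing []string) string {
	var b strings.Builder
	b.WriteString("Error: AWS credentials validation failed\n")
	fmt.Fprintf(&b, "Identity: %s\n", arn)
	fmt.Fprintf(&b, "Account: %s\n", account)
	b.WriteString("The provided AWS credentials are missing required permissions:\n")
	for _, p := range missing {
		fmt.Fprintf(&b, "- %s\n", p)
	}
	// Drop the trailing newline so callers control final line endings.
	return strings.TrimRight(b.String(), "\n")
}

func main() {
	fmt.Println(formatValidationError(
		"arn:aws:iam::123456789012:user/my-user",
		"123456789012",
		[]string{"ec2:CreateVpc", "ec2:CreateSubnet", "eks:CreateCluster", "iam:CreateRole", "iam:PassRole"},
	))
}
```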