fix: add BackoffCredentialsProvider to mitigate STS throttling across all plugins #6637
… all plugins Wrap StsAssumeRoleCredentialsProvider with BackoffCredentialsProvider in CredentialsProviderFactory. When credential resolution fails (e.g. role deleted or trust policy misconfigured), the wrapper caches the failure and applies exponential backoff (10s to 10min) before retrying STS, preventing excessive AssumeRole calls that cause STS throttling. This protects all plugins that use CredentialsProviderFactory including S3, OpenSearch, Lambda, SQS, and most AWS-integrated sources and sinks. Signed-off-by: Dinu John <86094133+dinujoh@users.noreply.github.com>
c105c4c to 1803a7f
graytaylor0
left a comment
Thanks, this will be a big improvement in STS call optimization.
dlvenable
left a comment
This is a great approach. I have one comment about making it configurable.
class BackoffCredentialsProvider implements AwsCredentialsProvider {
    private static final Logger LOG = LoggerFactory.getLogger(BackoffCredentialsProvider.class);

    static final Duration INITIAL_BACKOFF = Duration.ofSeconds(10);
There was a problem hiding this comment.
I think we should make these configurable in the aws plugin: not per credential, but at the top level.
aws:
  max_backoff: 10m
  configurations:
    default:
      sts_role_arn: arn:aws:iam::123456789012:role/MyRole
      region: us-east-2
There was a problem hiding this comment.
Should max_backoff be under aws -> configurations?
aws:
  configurations:
    max_backoff: 10m
    default:
      sts_role_arn: ...
There was a problem hiding this comment.
I view configurations as essentially a map of named configurations, where default is a special name. See #2570 for more detail.
So max_backoff should be directly under aws.
There was a problem hiding this comment.
aws can have other plugins, correct? For example, AwsSecretPlugin?
dlvenable
left a comment
We can make this configurable in a follow-on.
Description
Wrap StsAssumeRoleCredentialsProvider with BackoffCredentialsProvider in CredentialsProviderFactory. When credential resolution fails (e.g. role deleted or trust policy misconfigured), the wrapper caches the failure and applies exponential backoff (10s to 10min) before retrying STS, preventing excessive AssumeRole calls that cause STS throttling.
This protects all plugins that use CredentialsProviderFactory including S3, OpenSearch, Lambda, SQS, and most AWS-integrated sources and sinks.
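For readers who want to see the caching-plus-backoff idea in isolation, here is a minimal sketch. It is not the code in this PR: `BackoffSupplier`, its injectable clock, and the doubling schedule are assumptions made for illustration, and it wraps a plain `Supplier` rather than an `AwsCredentialsProvider` so it runs without the AWS SDK. Only the 10s initial and 10min maximum values come from the description above.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.function.Supplier;

/** Hypothetical illustration only; not the PR's BackoffCredentialsProvider. */
final class BackoffSupplier<T> implements Supplier<T> {
    static final Duration INITIAL_BACKOFF = Duration.ofSeconds(10); // value from the PR
    static final Duration MAX_BACKOFF = Duration.ofMinutes(10);     // value from the PR description

    private final Supplier<T> delegate;     // stands in for the STS-backed provider
    private final Supplier<Instant> clock;  // injectable so tests can control time

    private RuntimeException cachedFailure; // last failure, replayed while backing off
    private Instant retryAfter = Instant.MIN;
    private int consecutiveFailures;

    BackoffSupplier(final Supplier<T> delegate, final Supplier<Instant> clock) {
        this.delegate = delegate;
        this.clock = clock;
    }

    @Override
    public synchronized T get() {
        final Instant now = clock.get();
        if (cachedFailure != null && now.isBefore(retryAfter)) {
            // Still inside the backoff window: fail fast without calling the delegate.
            throw cachedFailure;
        }
        try {
            final T value = delegate.get();
            cachedFailure = null;      // success resets the backoff state
            consecutiveFailures = 0;
            return value;
        } catch (final RuntimeException e) {
            cachedFailure = e;
            retryAfter = now.plus(backoffFor(consecutiveFailures));
            consecutiveFailures++;
            throw e;
        }
    }

    /** Assumed schedule: 10s doubled per prior consecutive failure, capped at 10 minutes. */
    static Duration backoffFor(final int priorFailures) {
        final int shift = Math.min(priorFailures, 20); // bound the shift to avoid overflow
        final Duration candidate = INITIAL_BACKOFF.multipliedBy(1L << shift);
        return candidate.compareTo(MAX_BACKOFF) > 0 ? MAX_BACKOFF : candidate;
    }
}
```

With this schedule a permanently failing delegate is called at most once per window (10s, 20s, 40s, and so on up to 10 minutes), however often callers ask for credentials in between.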
Issues Resolved
Resolves #[Issue number to be closed when this PR is merged]
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.