Add metrics in SecretsRefreshJob for SecretsManager exceptions#6252
Conversation
…eption - Currently adding these two metrics as a starting point - Reference for exceptions: https://github.com/aws/api-models-aws/blob/f622f1d5f018e4617cdc14766c3113b6a960703a/models/secrets-manager/service/2017-10-17/secrets-manager-2017-10-17.json - Tested through unit tests Signed-off-by: Nishita Nishita <bninishi@amazon.com>
| SECRETS_REFRESH_FAILURE, SECRET_CONFIG_ID_TAG, secretConfigId); | ||
| this.secretsRefreshTimer = pluginMetrics.timerWithTags( | ||
| SECRETS_REFRESH_DURATION, SECRET_CONFIG_ID_TAG, secretConfigId); | ||
| this.secretsManagerResourceNotFoundCounter = pluginMetrics.counter( |
There was a problem hiding this comment.
i'm not using counterWithTags because we wanna keep the schema for these new metrics similar to the rest of the metrics, for example S3: #6245 , so, skipping the extra SecretConfigId dimension here.
There was a problem hiding this comment.
I am not positive we really want to do this here. I think here we want to emit the exact same way the rest of the metric.
There was a problem hiding this comment.
We should look for consistency within this plugin over consistency with other plugins. I think the tags are ideal here.
There was a problem hiding this comment.
Noted, i'll update to use counterWithTags
| * | ||
| * @param e the Secrets Manager exception to record metrics for | ||
| */ | ||
| private void recordSecretsManagerException(final SecretsManagerException e) { |
There was a problem hiding this comment.
Do you want to check for AccessDeniedException as well? And the list may go on. If you are looking for a metric to track the secrets manager access failure, may be one counter for any kind of failure could do the job? Logs will have details about the kind of failure anyway.
There was a problem hiding this comment.
responded on slack
| SECRETS_REFRESH_FAILURE, SECRET_CONFIG_ID_TAG, secretConfigId); | ||
| this.secretsRefreshTimer = pluginMetrics.timerWithTags( | ||
| SECRETS_REFRESH_DURATION, SECRET_CONFIG_ID_TAG, secretConfigId); | ||
| this.secretsManagerResourceNotFoundCounter = pluginMetrics.counter( |
There was a problem hiding this comment.
We should look for consistency within this plugin over consistency with other plugins. I think the tags are ideal here.
| secretsRefreshSuccessCounter.increment(); | ||
| pluginConfigPublisher.notifyAllPluginConfigObservable(); | ||
| } catch(Exception e) { | ||
| if (e instanceof SecretsManagerException) { |
There was a problem hiding this comment.
It is a little odd and confusing that metrics recording is split into two places. We should probably do it all in the same place. Maybe include secretsRefreshFailureCounter.increment(); in the new method and move this conditional in there as well.
There was a problem hiding this comment.
Updated in rev-2, to keep the logic inside the catch block itself
| static final String SECRETS_REFRESH_SUCCESS = "secretsRefreshSuccess"; | ||
| static final String SECRETS_REFRESH_FAILURE = "secretsRefreshFailure"; | ||
| static final String SECRETS_REFRESH_DURATION = "secretsRefreshDuration"; | ||
| static final String SECRETS_MANAGER_RESOURCE_NOT_FOUND = "secretsManagerResourceNotFound"; |
There was a problem hiding this comment.
We currently use the prefix of secrets so let's stick with that. Also, resourceNotFound seems a little generic. Maybe these would be better: secretsSecretNotFound and secretsLimitExceeded.
There was a problem hiding this comment.
i'll update to add secretsSecretNotFound, secretsLimitExceeded, and secretsAccessDenied
Signed-off-by: Nishita Nishita <bninishi@amazon.com>
san81
left a comment
There was a problem hiding this comment.
Thank you for the contribution.
Updates in Rev-2:
Description
Add metrics in SecretsRefreshJob for the following exceptions thrown by SecretsManager
Reference for SecretsManagerException: https://github.com/aws/api-models-aws/blob/f622f1d5f018e4617cdc14766c3113b6a960703a/models/secrets-manager/service/2017-10-17/secrets-manager-2017-10-17.json
Added Unit Tests
Issues Resolved
n/a
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.