chore: Log and expose aws services error internal details#769
Conversation
WalkthroughThis PR adds AWS SDK error classification and enriched error logging throughout the relayer. A new ChangesAWS Error Classification and Enriched Logging
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 inconclusive)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Tip 💬 Introducing Slack Agent: The best way for teams to turn conversations into code.Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.
Built for teams:
One agent for your entire SDLC. Right inside Slack. Comment |
There was a problem hiding this comment.
Pull request overview
This PR improves observability for AWS SDK failures by introducing shared utilities to classify SdkError variants and to render full nested error causes, then wiring those into AWS KMS and SQS call sites.
Changes:
- Added
utils::aws_errorwithclassify_sdk_errorand a re-export ofDisplayErrorContext, including unit tests. - Updated AWS KMS and SQS integrations to emit structured
tracinglogs with low-cardinality error kind tags plus full error-chain details. - Added direct dependencies on
aws-smithy-runtime-apiandaws-smithy-typesto support the new helpers.
Reviewed changes
Copilot reviewed 5 out of 6 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| src/utils/mod.rs | Exposes the new AWS error helper module via crate::utils. |
| src/utils/aws_error.rs | Adds shared AWS SDK error classification + full error-chain rendering utilities (with tests). |
| src/services/aws_kms/mod.rs | Logs structured AWS KMS failures with error kind + detailed context; updates returned error strings. |
| src/queues/sqs/backend.rs | Logs structured SQS send failures with error kind + detailed context; updates returned error strings. |
| Cargo.toml | Adds smithy runtime/type crates as direct dependencies. |
| Cargo.lock | Updates lockfile for the added dependencies. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| warn!( | ||
| error.kind = classify_sdk_error(&e), | ||
| error.detail = %DisplayErrorContext(&e), | ||
| kms_key_id = %key_id, | ||
| operation = "get_public_key_secp256k1", | ||
| "AWS KMS get_public_key failed" | ||
| ); | ||
| AwsKmsError::GetError(format!( | ||
| "Failed to get secp256k1 public key for key '{key_id}': {e:?}" | ||
| "Failed to get secp256k1 public key for key '{key_id}': {}", | ||
| DisplayErrorContext(&e) | ||
| )) |
| operation = "sign_digest_secp256k1", | ||
| "AWS KMS sign failed" | ||
| ); | ||
| AwsKmsError::PermissionError(DisplayErrorContext(&e).to_string()) |
| "Failed to get Ed25519 public key for key '{key_id}': {}", | ||
| DisplayErrorContext(&e) |
| operation = "sign_ed25519", | ||
| "AWS KMS sign failed" | ||
| ); | ||
| AwsKmsError::SignError(DisplayErrorContext(&e).to_string()) |
| queue_url = %queue_url, | ||
| "Failed to send message to SQS" | ||
| ); | ||
| QueueBackendError::SqsError(format!("SendMessage failed: {}", DisplayErrorContext(&e))) |
| //! the full `std::error::Error::source()` chain so the underlying cause | ||
| //! (e.g., `connect timed out`, `dns error: failed to lookup address`) | ||
| //! appears in the log instead of just the top-level wrapper. | ||
| //! |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/services/aws_kms/mod.rs`:
- Around line 327-336: The AWS KMS sign path currently maps all SDK errors to
AwsKmsError::PermissionError inside the map_err closure in
sign_digest_secp256k1; change that to return AwsKmsError::SignError (using the
same DisplayErrorContext(&e).to_string() message) so KMS sign() failures are
classified as signing errors rather than permission errors while keeping the
existing warn(...) call (which uses classify_sdk_error(&e) and
DisplayErrorContext(&e)) intact.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 50300784-48de-48d2-81b8-71a56cbc8c89
⛔ Files ignored due to path filters (1)
Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (5)
Cargo.tomlsrc/queues/sqs/backend.rssrc/services/aws_kms/mod.rssrc/utils/aws_error.rssrc/utils/mod.rs
| .map_err(|e| { | ||
| warn!( | ||
| error.kind = classify_sdk_error(&e), | ||
| error.detail = %DisplayErrorContext(&e), | ||
| kms_key_id = %key_id, | ||
| operation = "sign_digest_secp256k1", | ||
| "AWS KMS sign failed" | ||
| ); | ||
| AwsKmsError::PermissionError(DisplayErrorContext(&e).to_string()) | ||
| })? |
There was a problem hiding this comment.
Map KMS sign() failures to SignError instead of PermissionError.
On Line 335, every SDK failure path is converted to AwsKmsError::PermissionError, including non-permission failures (timeouts, dispatch, response parsing). This misclassifies failures and can cause incorrect upstream handling.
Suggested fix
- AwsKmsError::PermissionError(DisplayErrorContext(&e).to_string())
+ AwsKmsError::SignError(format!(
+ "Failed to sign secp256k1 digest for key '{key_id}': {}",
+ DisplayErrorContext(&e)
+ ))🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/services/aws_kms/mod.rs` around lines 327 - 336, The AWS KMS sign path
currently maps all SDK errors to AwsKmsError::PermissionError inside the map_err
closure in sign_digest_secp256k1; change that to return AwsKmsError::SignError
(using the same DisplayErrorContext(&e).to_string() message) so KMS sign()
failures are classified as signing errors rather than permission errors while
keeping the existing warn(...) call (which uses classify_sdk_error(&e) and
DisplayErrorContext(&e)) intact.
Codecov Report❌ Patch coverage is
Additional details and impacted files
Flags with carried forward coverage won't be shown. Click here to find out more. @@ Coverage Diff @@
## main #769 +/- ##
==========================================
- Coverage 90.31% 90.29% -0.02%
==========================================
Files 291 292 +1
Lines 124510 124588 +78
==========================================
+ Hits 112448 112495 +47
- Misses 12062 12093 +31
🚀 New features to boost your workflow:
|
Summary
This PR adds logging to expose more details about AWS service errors, improving debugging.
Testing Process
Checklist
Note
If you are using Relayer in your stack, consider adding your team or organization to our list of Relayer Users in the Wild!
Summary by CodeRabbit