chore: Log and expose aws services error internal details#769

Open
zeljkoX wants to merge 1 commit into main from aws-error-details

Conversation

@zeljkoX
Collaborator

@zeljkoX zeljkoX commented May 8, 2026

Summary

This PR adds logging to expose more details about AWS service errors, improving debugging.

Testing Process

Checklist

  • Add a reference to related issues in the PR description.
  • Add unit tests if applicable.

Note

If you are using Relayer in your stack, consider adding your team or organization to our list of Relayer Users in the Wild!

Summary by CodeRabbit

  • Chores
    • Enhanced error diagnostics and logging for AWS SQS and KMS operations with richer error context and classification to improve troubleshooting and visibility into operation failures

@zeljkoX zeljkoX requested a review from a team as a code owner May 8, 2026 12:14
@zeljkoX zeljkoX requested a review from Copilot May 8, 2026 12:14
@coderabbitai

coderabbitai Bot commented May 8, 2026

Review Change Stack

Walkthrough

This PR adds AWS SDK error classification and enriched error logging throughout the relayer. A new aws_error utility module provides stable, low-cardinality error categorization and full error source chain rendering. SQS and KMS backends integrate these utilities into their failure paths to emit detailed logs and user-facing error messages.

Changes

AWS Error Classification and Enriched Logging

| Layer | File(s) | Summary |
| --- | --- | --- |
| Dependencies | Cargo.toml | Adds aws-smithy-types and aws-smithy-runtime-api v1 to support error introspection and formatting. |
| Error Utilities | src/utils/aws_error.rs, src/utils/mod.rs | New module exports classify_sdk_error (maps SdkError to stable &'static str tags) and re-exports DisplayErrorContext (renders full error source chain). Comprehensive unit tests cover construction, timeout, dispatch, and service error paths. |
| SQS Backend Integration | src/queues/sqs/backend.rs | Updates send_message_to_sqs error handling to classify errors and emit warn logs with error kind, detail, and context; returns QueueBackendError message via DisplayErrorContext. |
| KMS Module Integration | src/services/aws_kms/mod.rs | Updates error handling across secp256k1 and Ed25519 public key retrieval and signing operations to log classified warnings with operation labels and format AwsKmsError variants using DisplayErrorContext. |
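
To make the "stable, low-cardinality" classification concrete, here is a minimal sketch of the idea. It does not reproduce the PR's `classify_sdk_error`: `SdkFailure` is a hypothetical stand-in that only mirrors the variant names of the AWS SDK's `SdkError`, and the tag strings and the `detail` helper are assumptions made for illustration.

```rust
/// Hypothetical stand-in mirroring the variant names of the AWS SDK's
/// `SdkError`; the real type is generic and carries per-variant context.
#[derive(Debug)]
pub enum SdkFailure {
    ConstructionFailure,
    TimeoutError,
    DispatchFailure,
    ResponseError,
    ServiceError { code: String },
}

/// Maps a failure to a fixed tag. The tag strings are illustrative and are
/// not taken from the PR. Dynamic data such as the service error code is
/// deliberately left out of the tag so the set of possible values stays small.
pub fn classify(err: &SdkFailure) -> &'static str {
    match err {
        SdkFailure::ConstructionFailure => "construction_failure",
        SdkFailure::TimeoutError => "timeout",
        SdkFailure::DispatchFailure => "dispatch_failure",
        SdkFailure::ResponseError => "response_error",
        SdkFailure::ServiceError { .. } => "service_error",
    }
}

/// Free-form detail for the log line; this is where high-cardinality data
/// (here, the service error code) belongs.
pub fn detail(err: &SdkFailure) -> String {
    match err {
        SdkFailure::ServiceError { code } => format!("service error: {}", code),
        other => format!("{:?}", other),
    }
}
```

Keeping the tag separate from the detail means the tag can be used as a metric label or log filter without creating one series per distinct error message.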

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 AWS errors now wear their source chains bright,
Classification tags keep cardinality light,
From KMS keys to SQS queues they flow,
Each failure logged with details we should know!
Enriched diagnostics hop along the way,
Making troubleshooting brighter every day! 🌟

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 inconclusive)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Description check | ❓ Inconclusive | The PR description follows the template structure with a summary section, testing process section, and checklist. However, both required checklist items (issue references and unit tests) are left unchecked, and the testing process section is empty. | Complete the testing process section with details of how the changes were tested, and consider filling out the checklist items regarding issue references and unit tests if applicable to this change. |
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The PR title clearly and concisely describes the main change: adding logging to expose AWS service error details for improved debugging. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

Contributor

Copilot AI left a comment

Pull request overview

This PR improves observability for AWS SDK failures by introducing shared utilities to classify SdkError variants and to render full nested error causes, then wiring those into AWS KMS and SQS call sites.

Changes:

  • Added utils::aws_error with classify_sdk_error and a re-export of DisplayErrorContext, including unit tests.
  • Updated AWS KMS and SQS integrations to emit structured tracing logs with low-cardinality error kind tags plus full error-chain details.
  • Added direct dependencies on aws-smithy-runtime-api and aws-smithy-types to support the new helpers.

Reviewed changes

Copilot reviewed 5 out of 6 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| src/utils/mod.rs | Exposes the new AWS error helper module via crate::utils. |
| src/utils/aws_error.rs | Adds shared AWS SDK error classification + full error-chain rendering utilities (with tests). |
| src/services/aws_kms/mod.rs | Logs structured AWS KMS failures with error kind + detailed context; updates returned error strings. |
| src/queues/sqs/backend.rs | Logs structured SQS send failures with error kind + detailed context; updates returned error strings. |
| Cargo.toml | Adds smithy runtime/type crates as direct dependencies. |
| Cargo.lock | Updates lockfile for the added dependencies. |

Comment on lines +283 to 293
 warn!(
     error.kind = classify_sdk_error(&e),
     error.detail = %DisplayErrorContext(&e),
     kms_key_id = %key_id,
     operation = "get_public_key_secp256k1",
     "AWS KMS get_public_key failed"
 );
 AwsKmsError::GetError(format!(
-    "Failed to get secp256k1 public key for key '{key_id}': {e:?}"
+    "Failed to get secp256k1 public key for key '{key_id}': {}",
+    DisplayErrorContext(&e)
 ))

     operation = "sign_digest_secp256k1",
     "AWS KMS sign failed"
 );
 AwsKmsError::PermissionError(DisplayErrorContext(&e).to_string())
Comment on lines +375 to +376
     "Failed to get Ed25519 public key for key '{key_id}': {}",
     DisplayErrorContext(&e)

     operation = "sign_ed25519",
     "AWS KMS sign failed"
 );
 AwsKmsError::SignError(DisplayErrorContext(&e).to_string())
Comment thread src/queues/sqs/backend.rs
queue_url = %queue_url,
"Failed to send message to SQS"
);
QueueBackendError::SqsError(format!("SendMessage failed: {}", DisplayErrorContext(&e)))
Comment thread src/utils/aws_error.rs
//! the full `std::error::Error::source()` chain so the underlying cause
//! (e.g., `connect timed out`, `dns error: failed to lookup address`)
//! appears in the log instead of just the top-level wrapper.
//!
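
For readers unfamiliar with the `source()` chain the module docs refer to, the following standard-library-only sketch shows how walking it surfaces the underlying cause. `ErrorChain` and `Layer` are hypothetical names invented here; `ErrorChain` is only similar in spirit to `DisplayErrorContext`, whose exact output format is not reproduced.

```rust
use std::error::Error;
use std::fmt;

/// Hypothetical wrapper that prints an error followed by every cause
/// reachable through `Error::source()`, separated by ": ".
pub struct ErrorChain<'a>(pub &'a (dyn Error + 'static));

impl fmt::Display for ErrorChain<'_> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Top-level message first.
        write!(f, "{}", self.0)?;
        // Then each nested cause, outermost to innermost.
        let mut source = self.0.source();
        while let Some(cause) = source {
            write!(f, ": {}", cause)?;
            source = cause.source();
        }
        Ok(())
    }
}

/// Toy error type with an optional cause, used only to build a chain.
#[derive(Debug)]
pub struct Layer {
    pub msg: &'static str,
    pub cause: Option<Box<Layer>>,
}

impl fmt::Display for Layer {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Display intentionally shows only this layer's own message.
        write!(f, "{}", self.msg)
    }
}

impl Error for Layer {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        self.cause.as_deref().map(|c| c as &(dyn Error + 'static))
    }
}

/// Builds a two-level chain: a wrapper error around a connection timeout.
pub fn demo_error() -> Layer {
    Layer {
        msg: "dispatch failure",
        cause: Some(Box::new(Layer { msg: "connect timed out", cause: None })),
    }
}
```

Formatting `demo_error()` with plain `{}` prints only the wrapper message, while `ErrorChain(&demo_error())` also includes the inner cause, which is the difference the doc comment describes.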

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/services/aws_kms/mod.rs`:
- Around line 327-336: The AWS KMS sign path currently maps all SDK errors to
AwsKmsError::PermissionError inside the map_err closure in
sign_digest_secp256k1; change that to return AwsKmsError::SignError (using the
same DisplayErrorContext(&e).to_string() message) so KMS sign() failures are
classified as signing errors rather than permission errors while keeping the
existing warn(...) call (which uses classify_sdk_error(&e) and
DisplayErrorContext(&e)) intact.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 50300784-48de-48d2-81b8-71a56cbc8c89

📥 Commits

Reviewing files that changed from the base of the PR and between 8194752 and b6762e6.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (5)
  • Cargo.toml
  • src/queues/sqs/backend.rs
  • src/services/aws_kms/mod.rs
  • src/utils/aws_error.rs
  • src/utils/mod.rs

Comment on lines +327 to +336
            .map_err(|e| {
                warn!(
                    error.kind = classify_sdk_error(&e),
                    error.detail = %DisplayErrorContext(&e),
                    kms_key_id = %key_id,
                    operation = "sign_digest_secp256k1",
                    "AWS KMS sign failed"
                );
                AwsKmsError::PermissionError(DisplayErrorContext(&e).to_string())
            })?

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Map KMS sign() failures to SignError instead of PermissionError.

On Line 335, every SDK failure path is converted to AwsKmsError::PermissionError, including non-permission failures (timeouts, dispatch, response parsing). This misclassifies failures and can cause incorrect upstream handling.

Suggested fix
-                AwsKmsError::PermissionError(DisplayErrorContext(&e).to_string())
+                AwsKmsError::SignError(format!(
+                    "Failed to sign secp256k1 digest for key '{key_id}': {}",
+                    DisplayErrorContext(&e)
+                ))
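
As a rough illustration of why the variant matters, the sketch below maps every SDK-level failure on the sign path to a signing error, so a timeout is no longer reported upstream as a permission problem. `SdkFailure`, `KmsError`, and `map_sign_result` are hypothetical stand-ins written for this example and are not the repository's types.

```rust
use std::fmt;

/// Hypothetical SDK-level failures that can occur while calling sign().
#[derive(Debug)]
pub enum SdkFailure {
    Timeout,
    AccessDenied,
}

impl fmt::Display for SdkFailure {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            SdkFailure::Timeout => write!(f, "request timed out"),
            SdkFailure::AccessDenied => write!(f, "access denied"),
        }
    }
}

/// Hypothetical stand-in for the two relevant `AwsKmsError` variants.
#[derive(Debug, PartialEq)]
pub enum KmsError {
    PermissionError(String),
    SignError(String),
}

/// Maps any failure of the sign call to `SignError`, keeping the key id and
/// the underlying cause in the message.
pub fn map_sign_result(
    key_id: &str,
    outcome: Result<Vec<u8>, SdkFailure>,
) -> Result<Vec<u8>, KmsError> {
    outcome.map_err(|e| {
        KmsError::SignError(format!(
            "Failed to sign secp256k1 digest for key '{}': {}",
            key_id, e
        ))
    })
}
```

With this shape, callers that branch on the error variant see a signing failure for timeouts and dispatch problems, and the specific cause remains available in the message.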

@codecov

codecov Bot commented May 8, 2026

Codecov Report

❌ Patch coverage is 57.14286% with 36 lines in your changes missing coverage. Please review.
✅ Project coverage is 90.29%. Comparing base (8194752) to head (b6762e6).

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| src/services/aws_kms/mod.rs | 0.00% | 26 Missing ⚠️ |
| src/queues/sqs/backend.rs | 0.00% | 5 Missing ⚠️ |
| src/utils/aws_error.rs | 90.56% | 5 Missing ⚠️ |
Additional details and impacted files
| Flag | Coverage Δ |
| --- | --- |
| ai | 0.00% <0.00%> (ø) |
| dev | 90.29% <57.14%> (-0.02%) ⬇️ |
| properties | 0.01% <0.00%> (-0.01%) ⬇️ |

Flags with carried forward coverage won't be shown. Click here to find out more.

@@            Coverage Diff             @@
##             main     #769      +/-   ##
==========================================
- Coverage   90.31%   90.29%   -0.02%     
==========================================
  Files         291      292       +1     
  Lines      124510   124588      +78     
==========================================
+ Hits       112448   112495      +47     
- Misses      12062    12093      +31     
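
The headline project percentages can be rechecked from the Hits and Lines rows of the diff above. This is a small sketch using only the figures in the report; the function names are made up for the example.

```rust
/// Coverage as a percentage of covered lines.
pub fn coverage_pct(hits: u64, lines: u64) -> f64 {
    hits as f64 / lines as f64 * 100.0
}

/// Lines that are not covered.
pub fn misses(hits: u64, lines: u64) -> u64 {
    lines - hits
}

/// Change in coverage between base and head, in percentage points.
pub fn coverage_delta(base: (u64, u64), head: (u64, u64)) -> f64 {
    coverage_pct(head.0, head.1) - coverage_pct(base.0, base.1)
}
```

Calling `coverage_pct(112448, 124510)` for main and `coverage_pct(112495, 124588)` for this PR gives values that format to 90.31 and 90.29 at two decimal places, and their difference formats to -0.02, matching the report.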
| Files with missing lines | Coverage Δ |
| --- | --- |
| src/queues/sqs/backend.rs | 56.22% <0.00%> (-0.21%) ⬇️ |
| src/utils/aws_error.rs | 90.56% <90.56%> (ø) |
| src/services/aws_kms/mod.rs | 85.06% <0.00%> (-4.46%) ⬇️ |

... and 2 files with indirect coverage changes
