fix(core): honor AWS_ENDPOINT_URL_STS in IAM credentials-provider loader #5968
Open
edwardpark97 wants to merge 1 commit into
In AWS partitions where a separate FIPS STS hostname is not published (e.g. us-gov-west-1, where the standard sts.us-gov-west-1.amazonaws.com is itself FIPS-validated and is the only STS endpoint in the region), the default credential provider's internal STS client otherwise constructs a non-existent sts-fips.<region>.amazonaws.com whenever AWS_USE_FIPS_ENDPOINT=true is set. DNS resolution fails, credential acquisition hangs, and ElastiCache/MemoryDB IAM authentication times out with 'Connection error: Timeout'.

Honor an explicit AWS_ENDPOINT_URL_STS override on the SDK config loader when building the credentials provider. The Python SDK (boto3) already threads AWS_ENDPOINT_URL_STS into both direct STS calls and the credentials-provider STS client; this mirrors that behavior for the Rust SDK loader.

Also explicitly disable FIPS on this loader: even when the override is provided, the AWS SDK's endpoint resolver fails fast when a FIPS partition is requested but the resolved (or user-provided) endpoint is not on the SDK's hard-coded FIPS endpoint list. Disabling FIPS here is safe because the override is scoped to credential acquisition; SigV4 presigning of the actual ElastiCache/MemoryDB connect request happens separately via aws-sigv4 and is unaffected. The user remains responsible for pointing AWS_ENDPOINT_URL_STS at a FIPS-validated endpoint where required.

Adds a regression test exercising the new code path with both a populated and an empty AWS_ENDPOINT_URL_STS.

Signed-off-by: Edward Park <edwardpark97@gmail.com>
xShinnRyuu
reviewed
May 26, 2026
    // the SDK endpoint resolver rejects user URLs not on its FIPS list. Scoped
    // to this loader; SigV4 presigning is unaffected. See valkey-io/valkey-glide#5967.
    if let Ok(sts_endpoint) = std::env::var("AWS_ENDPOINT_URL_STS")
        && !sts_endpoint.is_empty()
Collaborator
Do we need to also check for whitespace-only values here?
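One possible way to cover the whitespace-only case is sketched below. It assumes that trimming the value is acceptable for this env var; `sts_endpoint_override` is a hypothetical helper name that does not appear in the PR, and the function takes the result of `std::env::var` as an argument so it can be exercised without mutating the process environment.

```rust
use std::env::VarError;

/// Hypothetical helper (not part of the PR): returns the STS endpoint
/// override, treating unset, empty, and whitespace-only values as "no override".
fn sts_endpoint_override(raw: Result<String, VarError>) -> Option<String> {
    // An unset (or non-Unicode) variable yields Err, which becomes None here.
    let value = raw.ok()?;
    // trim() removes leading and trailing whitespace, so "  \t" becomes "".
    let trimmed = value.trim();
    if trimmed.is_empty() {
        None
    } else {
        Some(trimmed.to_owned())
    }
}
```

A call site would then look like `sts_endpoint_override(std::env::var("AWS_ENDPOINT_URL_STS"))`, with the `Some(..)` branch feeding the existing `loader.use_fips(false).endpoint_url(..)` call.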
    if let Ok(sts_endpoint) = std::env::var("AWS_ENDPOINT_URL_STS")
        && !sts_endpoint.is_empty()
    {
        loader = loader.use_fips(false).endpoint_url(sts_endpoint);
Collaborator
The endpoint URL is passed directly without validating it uses HTTPS. An http:// URL would send credential requests over plaintext, potentially exposing STS session tokens.
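A minimal sketch of the scheme check described above, assuming a plain prefix test is enough and that no URL-parsing crate is pulled in for it. `is_https_endpoint` is a hypothetical name, and whether a non-HTTPS value should be rejected or only logged is left open here.

```rust
/// Hypothetical helper (not part of the PR): returns true when `url` starts
/// with the `https://` scheme (ASCII case-insensitive) and has something after it.
fn is_https_endpoint(url: &str) -> bool {
    const SCHEME: &str = "https://";
    // `get(..n)` returns None if the string is shorter than the scheme or the
    // cut would fall inside a multi-byte character, so this never panics.
    match url.get(..SCHEME.len()) {
        Some(prefix) => prefix.eq_ignore_ascii_case(SCHEME) && url.len() > SCHEME.len(),
        None => false,
    }
}
```

With a check like this, an `http://` override could be skipped before the loader is touched, leaving the default endpoint resolution in place.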
    ## Pending 2.4

    #### Fixes
    * CORE: Honor `AWS_ENDPOINT_URL_STS` in the IAM credentials-provider loader so ElastiCache/MemoryDB IAM auth works in AWS partitions that do not publish a separate FIPS STS hostname (e.g. `us-gov-west-1`). Previously, setting `AWS_USE_FIPS_ENDPOINT=true` made the SDK construct a non-existent `sts-fips.<region>.amazonaws.com`, causing credential acquisition to hang. Matches `boto3` behavior. ([#5967](https://github.com/valkey-io/valkey-glide/issues/5967))
Collaborator
Since we have already released version 2.4.0 of GLIDE, there is a merge conflict here. Can you please move this entry under Pending 2.5?
Summary
Fixes ElastiCache/MemoryDB IAM authentication in AWS partitions that do not publish a separate FIPS STS hostname (most notably `us-gov-west-1`, where the standard `sts.us-gov-west-1.amazonaws.com` endpoint is itself FIPS-validated and is the only STS endpoint in the region). Previously, setting `AWS_USE_FIPS_ENDPOINT=true` made the default credential provider's internal STS client synthesize a non-existent `sts-fips.<region>.amazonaws.com` hostname, causing credential acquisition to hang and IAM auth to time out with `Connection error: Timeout`. Setting `AWS_ENDPOINT_URL_STS` did not help because the Rust SDK's credentials-provider STS client doesn't honor that env var (unlike `boto3`, which threads it through).
Issue link
This Pull Request is linked to issue: core: ElastiCache IAM auth hangs in us-gov-west-1 when AWS_USE_FIPS_ENDPOINT=true
Closes #5967
Features / Behaviour Changes
- `glide-core/src/iam/mod.rs::get_signing_identity()` now honors an explicit `AWS_ENDPOINT_URL_STS` environment variable when building the credentials-provider config loader, and disables FIPS on that scoped loader so the SDK's endpoint resolver does not reject the override. This mirrors `boto3`'s long-standing behavior.
- When `AWS_ENDPOINT_URL_STS` is unset or empty, the loader path is identical to before.
- `glide-core` for IAM auth.
Implementation
The change is a single ~30-line addition inside `get_signing_identity()`:
Two reviewer hot-spots:
- Why `.use_fips(false)` is needed alongside `.endpoint_url()`: the SDK's endpoint resolver fails fast when a FIPS partition is requested but the resolved (or user-provided) endpoint isn't on the SDK's hard-coded FIPS endpoint list. Without `.use_fips(false)`, the override is rejected before any DNS lookup is attempted, producing `"an error occurred while loading credentials"` within ~100ms (not a DNS timeout). Disabling FIPS on this scoped loader is safe because (a) the override is local to credential acquisition only and does not affect SigV4 presigning of the actual ElastiCache/MemoryDB connect request (which runs through `aws-sigv4` independently), and (b) the user remains responsible for pointing `AWS_ENDPOINT_URL_STS` at a FIPS-validated endpoint where compliance requires it.
- The `!sts_endpoint.is_empty()` guard: `std::env::var()` returns `Ok("")` when the env var is set but blank (common in Kubernetes manifests that templatize values). Treating blank as unset avoids passing an empty string to `endpoint_url()`, which would fail with a confusing error later.

A short comment in the source flags both points and links back to issue #5967 for the full writeup.
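The two points above can be modeled without the AWS SDK, as a rough sketch. `LoaderSettings` and `apply_sts_override` are hypothetical stand-ins introduced only for illustration; the actual change calls `use_fips(false)` and `endpoint_url(..)` on the SDK config loader, as shown in the diff.

```rust
use std::env::VarError;

/// Hypothetical stand-in for the two SDK loader settings this PR touches.
/// `None` means "leave the SDK's default resolution alone".
#[derive(Debug, Clone, PartialEq, Default)]
struct LoaderSettings {
    use_fips: Option<bool>,
    endpoint_url: Option<String>,
}

/// Applies the override only when the env var is set and non-empty.
/// FIPS is disabled together with the endpoint, never on its own.
fn apply_sts_override(
    mut settings: LoaderSettings,
    sts_endpoint: Result<String, VarError>,
) -> LoaderSettings {
    if let Ok(endpoint) = sts_endpoint {
        // std::env::var returns Ok("") for a variable that is set but blank;
        // treat that the same as unset.
        if !endpoint.is_empty() {
            settings.use_fips = Some(false);
            settings.endpoint_url = Some(endpoint);
        }
    }
    settings
}
```

Pairing the two settings in one branch reflects the reasoning above: an override without `use_fips(false)` is rejected by the resolver, and disabling FIPS without an override would change behavior for users who never set `AWS_ENDPOINT_URL_STS`.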
Limitations
Only `AWS_ENDPOINT_URL_STS` is honored, not the more general `AWS_ENDPOINT_URL`. The narrower env var is sufficient for the GovCloud use case and matches what we observed `boto3` honoring in the same code path.
Testing
- `test_get_signing_identity_honors_aws_endpoint_url_sts` in `glide-core/src/iam/mod.rs` covering both a populated and an empty `AWS_ENDPOINT_URL_STS` value. The test exercises the new loader path with static credentials supplied via `AWS_ACCESS_KEY_ID`, so no actual STS call is made, but it shows the code is plumbed through without breaking the happy path. Runs in the existing `#[serial]` test group to avoid env-var races with other tests.
- `us-gov-west-1`) ElastiCache Serverless with a real IRSA setup. Before the fix: credential acquisition hangs and IAM auth times out with `Connection error: Timeout`. After the fix: PING/PONG completes in ~2s, and KV/publisher/subscriber clients all connect cleanly.
- `cargo fmt --check` (from `glide-core/`): clean
- `cargo clippy --all-targets -- -D warnings` (from `glide-core/`): exit 0, no warnings
- `cargo test --lib iam::` (from `glide-core/`): 9 passed, 0 failed (8 pre-existing + the new regression test)
Checklist
- `cargo fmt --check`, `cargo clippy --all-targets -- -D warnings`, and `cargo test --lib iam::` all pass from `glide-core/`. (No Prettier-relevant files changed.)
- main.