chore: add support to use ManagedIdentityCredential when AZURE_CLIENT_ID env is specified#146
chore: add support to use ManagedIdentityCredential when AZURE_CLIENT_ID env is specified#146mittachaitu wants to merge 2 commits into
Conversation
Greptile SummaryThis PR adds
Confidence Score: 3/5The core auth fallback logic is architecturally sound, but the call_once failure-retry semantics cause a serial 5-second timeout chain for every concurrent client when DefaultAzureCredential consistently fails. When AZURE_CLIENT_ID is set and DefaultAzureCredential is expected to fail, N concurrent workers serialize through N separate 5-second GetToken() timeouts before each falls back to ManagedIdentityCredential, directly undermining startup reliability. cpp/azure/client/client.cc — specifically GetSharedDefaultAzureCredential and the call_once failure path Important Files Changed
Reviews (3): Last reviewed commit: "chore: address bot comments" | Re-trigger Greptile |
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Enterprise Run ID: 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
WalkthroughAzure client adds an optional ChangesManaged Identity authentication support
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Tip 💬 Introducing Slack Agent: The best way for teams to turn conversations into code.Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.
Built for teams:
One agent for your entire SDLC. Right inside Slack. Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@cpp/azure/client_configuration/client_configuration.cc`:
- Around line 40-43: The code currently logs the raw AZURE_CLIENT_ID value
(variable cid) before assigning it to client_id; update the logging in the
client configuration and client code paths (where cid/client_id are used and
where LOG(DEBUG) prints the id) to avoid emitting the full identifier — either
log a generic message like "Using AZURE_CLIENT_ID for managed identity
(present)" or log a masked form (e.g., only last 4 chars) instead; locate usages
around the getenv<std::string>("AZURE_CLIENT_ID", "") call and the LOG(DEBUG)
statements in client_configuration.cc and client.cc and replace the direct cid
print with the non-sensitive alternative.
In `@cpp/azure/client/client.cc`:
- Around line 66-69: verify_credentials currently doesn't compare the newly
parsed _client_id, so an override can be ignored and an old client reused;
update the verify_credentials function to include _client_id in its matching
logic (alongside other credential fields it checks) so that any change to
client_id forces a mismatch and triggers client refresh; locate
verify_credentials and add comparison against the _client_id member (and ensure
any cache/key used to identify the active client incorporates client_id as
well).
- Around line 113-131: The current try/catch only surrounds BlobServiceClient
construction but Azure auth is deferred, so replace the approach by attempting
to acquire a token from DefaultAzureCredential before committing to the
credential: instantiate DefaultAzureCredential, call its token acquisition
method (e.g., GetToken / AcquireToken equivalent) for the storage scope and if
that call succeeds use that credential to construct BlobServiceClient; if the
token acquisition throws or fails and _client_id.has_value() create a
ManagedIdentityCredential, attempt token acquisition with it, and only then
construct BlobServiceClient; ensure exceptions are logged with e.what() and
rethrow if both token acquisitions fail; target symbols: DefaultAzureCredential,
ManagedIdentityCredential, BlobServiceClient, and the credential
GetToken/AcquireToken call.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Enterprise
Run ID: 36de2ac3-453b-4286-a75d-71807b65fab7
📒 Files selected for processing (4)
cpp/azure/client/client.cccpp/azure/client/client.hcpp/azure/client_configuration/client_configuration.cccpp/azure/client_configuration/client_configuration.h
…configured This commit add supports for ManagedIdentityCredential when AZURE_CLIENT_ID is configured Signed-off-by: Mitta Sai Chaithanya <mittas@microsoft.com>
dca2f1b to
87eb066
Compare
| auto default_cred = std::make_shared<Azure::Identity::DefaultAzureCredential>(); | ||
| Azure::Core::Credentials::TokenRequestContext token_ctx; | ||
| token_ctx.Scopes = {"https://storage.azure.com/.default"}; | ||
| Azure::Core::Context ctx; | ||
| (void)default_cred->GetToken(token_ctx, ctx); // force auth now | ||
| credential = default_cred; | ||
| LOG(DEBUG) << "Azure client initialized with DefaultAzureCredential for account: " << _account_name.value(); |
There was a problem hiding this comment.
Token caching regression: removing
static exposes IMDS throttling risk
The original code used a static DefaultAzureCredential instance specifically to share token caching across all clients and avoid overwhelming IMDS — the comment even warned about "fatal throttling errors." This PR removes static (correctly, to allow the fallback to work), but does not replace it with any caching mechanism. Combined with the new GetToken() call on line 121, every AzureClient construction now independently acquires a fresh token from IMDS/Azure AD. Under load (e.g., many concurrent model-streamer workers, each constructing its own AzureClient), this will hit IMDS with one token request per client, exactly the scenario the original author explicitly guarded against.
Consider hoisting the DefaultAzureCredential to a process-level singleton (e.g., a static local in a free function or a static class member) that is separate from the fallback path. When GetToken() succeeds on that shared instance, use it; when it throws, fall back to a per-client ManagedIdentityCredential.
| Azure::Core::Credentials::TokenRequestContext token_ctx; | ||
| token_ctx.Scopes = {"https://storage.azure.com/.default"}; | ||
| Azure::Core::Context ctx; | ||
| (void)default_cred->GetToken(token_ctx, ctx); // force auth now |
There was a problem hiding this comment.
Unbounded block in constructor due to missing cancellation deadline
DefaultAzureCredential probes every credential source in its chain — EnvironmentCredential, WorkloadIdentityCredential, ManagedIdentityCredential (via IMDS), AzureCliCredential, etc. — sequentially. The Azure::Core::Context ctx here is default-constructed with no deadline, so if the host is outside Azure or IMDS is throttled/unreachable, the constructor blocks for the full SDK timeout per credential type before finally throwing. In practice this can be tens of seconds, stalling whichever thread is constructing the client.
Pass a context with a deadline so the probe fails fast, e.g. Azure::Core::Context::ApplicationContext.WithDeadline(std::chrono::system_clock::now() + std::chrono::seconds(5)).
Signed-off-by: Mitta Sai Chaithanya <mittas@microsoft.com>
| std::shared_ptr<Azure::Identity::DefaultAzureCredential> GetSharedDefaultAzureCredential() { | ||
| static std::shared_ptr<Azure::Identity::DefaultAzureCredential> shared_credential; | ||
| static std::once_flag init_flag; | ||
| std::call_once(init_flag, []() { | ||
| auto cred = std::make_shared<Azure::Identity::DefaultAzureCredential>(); | ||
| // Validate credential chain early via GetToken() as fail-fast mechanism. | ||
| Azure::Core::Credentials::TokenRequestContext token_ctx; | ||
| token_ctx.Scopes = {"https://storage.azure.com/.default"}; | ||
| Azure::Core::Context ctx = Azure::Core::Context::ApplicationContext.WithDeadline( | ||
| std::chrono::system_clock::now() + std::chrono::seconds(5)); | ||
| (void)cred->GetToken(token_ctx, ctx); // throws if credential chain fails | ||
| shared_credential = cred; // only assign if GetToken() succeeds | ||
| }); | ||
| return shared_credential; | ||
| } |
There was a problem hiding this comment.
Serial retry storm when DefaultAzureCredential consistently fails
Per the C++ standard, when the callable passed to std::call_once exits via exception, the once_flag is not marked "done" and the next call will retry the callable. In the exact scenario this PR targets — AZURE_CLIENT_ID set, DefaultAzureCredential expected to fail — every concurrent AzureClient construction will serially queue through call_once and each wait up to 5 seconds before throwing. With N concurrent workers this produces O(N × 5 s) startup latency, which can be 40+ seconds for an 8-worker deployment.
A simple fix is to track the failure state in a static std::atomic<bool> so that after the first failed probe the function throws immediately without retrying the 5-second GetToken() timeout each time.
|
@mittachaitu thanks for the PR. It is unfortunate that it looks like the default credential C++ SDK does not respect the In general, it would be interesting to see if we could just add support for it in the identity C++ SDK itself especially given this is a parity gap with the Python SDK. I'd be happy to help review that with Azure SDK with you if you have the bandwidth to send a PR up. In terms of unblocking us now, I'm a little hesitant to try to try default credentials and then assume if it fails and I think for now the safest thing for us to do until the upstream C++ SDK adds support, would be to define our own credential chain that forks the default azure credential chain with the added logic that it will inject a Let me know what you think. |
What This PR Does:
This PR add support to authenticate azure-blob-storage through ManagedIdentityCredential when AZURE_CLIENT_ID env is configured and DefaultAzureCredential is failed.
Summary by CodeRabbit
New Features
Bug Fixes