Skip to content

Fix Azure CLI authentication for federated token service principals#1274

Merged
rauchy merged 4 commits into
mainfrom
es-1557323
Aug 29, 2025
Merged

Fix Azure CLI authentication for federated token service principals#1274
rauchy merged 4 commits into
mainfrom
es-1557323

Conversation

@rauchy

@rauchy rauchy commented Aug 26, 2025

Copy link
Copy Markdown
Contributor

What changes are proposed in this pull request?

WHAT: This PR enhances the Azure CLI authentication logic in the Databricks Go SDK to properly handle federated token service principals used in AKS workload identity scenarios.

The specific changes include:

  • Enhanced MSI detection logic in config/auth_azure_cli.go to recognize service principals with GUID-like names as federated token authentication
  • Added isGuidLike() helper function to detect GUID patterns (8-4-4-4-12 character format)
  • Updated the authentication flow to skip tenant ID parameters for federated token service principals (treating them like MSI)
  • Added comprehensive test coverage in config/auth_azure_cli_federated_token_test.go

WHY: The existing MSI detection logic only recognized system/user assigned identities by their specific names (systemAssignedIdentity or userAssignedIdentity). However, when using AKS with workload identity, service principals authenticate using federated tokens and show their client ID as the name (e.g., 5817e630-86b3-4f67-a38e-a63e6a1a401c).

This caused the SDK to incorrectly treat federated token service principals as regular service principals, leading to:

  1. SDK passing --tenant <tenant_id> to az account get-access-token
  2. Azure CLI rejecting the request because federated tokens don't work with explicit tenant parameters
  3. Complete authentication failure in AKS environments

The decision to use GUID pattern detection was made because:

  • Federated token service principals consistently show client IDs (GUIDs) as their names
  • This approach is more efficient than a fallback mechanism (no retry needed)
  • It matches the authentication flow observed in working environments where no tenant parameter is used from the start
  • It preserves backward compatibility for all existing authentication methods

How is this tested?

Unit Tests:

  • Added TestAzureCliCredentials_FederatedTokenServicePrincipal which simulates the exact federated token scenario using the client ID pattern from the reported issue
  • Test uses FAIL_IF_TENANT_ID_SET=true environment variable to ensure the fix correctly skips tenant ID usage (test would fail if --tenant parameter was passed)
  • All existing Azure CLI authentication tests continue to pass, ensuring no regressions

Test Coverage Validation:

  • Federated token service principals: Correctly detected and skip tenant ID ✅
  • Traditional MSI (system/user assigned identities): Behavior unchanged ✅
  • Regular service principals: Continue to use tenant ID as before ✅
  • Edge cases: GUID detection handles malformed strings appropriately ✅

The test uses the existing mock infrastructure (testdata/az) rather than custom mocks, ensuring consistency with other authentication tests.

@rauchy rauchy requested a review from hectorcast-db August 26, 2025 13:37
@rauchy rauchy temporarily deployed to test-trigger-is August 26, 2025 13:37 — with GitHub Actions Inactive
@rauchy rauchy temporarily deployed to test-trigger-is August 26, 2025 13:38 — with GitHub Actions Inactive
@rauchy rauchy temporarily deployed to test-trigger-is August 26, 2025 13:50 — with GitHub Actions Inactive
@rauchy rauchy temporarily deployed to test-trigger-is August 26, 2025 13:51 — with GitHub Actions Inactive
Comment thread config/auth_azure_cli.go Outdated
if !isMsi {

// 2. Service principal with client ID as name (likely federated token)
// When user.name is a GUID and user.type is servicePrincipal, it's often federated token auth

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's often federated token auth

What happens when it is not? Could we be introducing a breaking change for existing users?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you talking about regular service principals that use GUID client IDs but sign in with a client secret (not federated tokens)? Since both look like GUIDs, this logic can’t really tell them apart.

If that’s the issue, I can change the flow: first try with --tenant (same as today), and if that fails for service principals, fall back to trying without it. WDYT?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you talking about regular service principals that use GUID client IDs but sign in with a client secret (not federated tokens)? Since both look like GUIDs, this logic can’t really tell them apart.

Yes. I don't know if it can happen, but I am concerned of breaking any workflow which currently works.

If that’s the issue, I can change the flow: first try with --tenant (same as today), and if that fails for service principals, fall back to trying without it. WDYT?

That makes sense to me.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rauchy rauchy requested a review from hectorcast-db August 27, 2025 07:52
@rauchy rauchy temporarily deployed to test-trigger-is August 27, 2025 12:21 — with GitHub Actions Inactive
@rauchy rauchy temporarily deployed to test-trigger-is August 27, 2025 12:21 — with GitHub Actions Inactive
rauchy and others added 4 commits August 29, 2025 09:30
Reproduces issue where service principals authenticated via federated
tokens are incorrectly identified as regular service principals, causing
authentication failures when tenant ID is passed to az account get-access-token.

This happens in AKS environments with workload identity where service
principals have client IDs as names rather than systemAssignedIdentity
or userAssignedIdentity.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
Enhances MSI detection logic to recognize service principals with GUID-like
names as federated token authentication, which should not use tenant ID
parameters. This resolves authentication failures in AKS workload identity
environments.

The fix detects federated tokens by checking if account.User.Name matches
the GUID pattern (8-4-4-4-12 format), treating them like MSI for token
retrieval. This avoids tenant ID usage that causes auth failures.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
Instead of detecting federated tokens via GUID patterns, use a more robust
fallback mechanism: try with tenant ID first (preserving existing behavior),
then retry without tenant ID if it fails for service principals.

This addresses the concern that GUID detection cannot distinguish between
regular service principals and federated token service principals, as both
use client IDs as names.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
@github-actions

Copy link
Copy Markdown

If integration tests don't run automatically, an authorized user can run them manually by following the instructions below:

Trigger:
go/deco-tests-run/sdk-go

Inputs:

  • PR number: 1274
  • Commit SHA: da953ae23cd3b7496ef61f163579359b690a6e4d

Checks will be approved automatically on success.

@rauchy rauchy temporarily deployed to test-trigger-is August 29, 2025 09:32 — with GitHub Actions Inactive
@rauchy rauchy added this pull request to the merge queue Aug 29, 2025
Merged via the queue into main with commit 1020902 Aug 29, 2025
15 checks passed
@rauchy rauchy deleted the es-1557323 branch August 29, 2025 13:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants