Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
324 changes: 324 additions & 0 deletions docs/instance-discovery-rules.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,324 @@
# Instance Discovery Rules for MSAL

## Purpose

Instance discovery resolves an authority host (e.g., `login.microsoftonline.com`) to its **metadata**: preferred network host, preferred cache host, and a list of aliases. This metadata enables SSO across aliased environments and ensures tokens are sent to the correct endpoint.

This document describes the rules implemented in MSAL.NET to aid reimplementation in other MSAL libraries.

---

## 1. Core Data Model

Instance discovery produces a single entry per authority environment:

```
InstanceDiscoveryMetadataEntry:
preferred_network: string # host to use for token requests
preferred_cache: string # host to use as cache key
aliases: string[] # all equivalent hosts (used for SSO, cache lookups)
```

A successful network response returns an array of these entries:

```json
{
"tenant_discovery_endpoint": "https://login.microsoftonline.com/{tenant}/.well-known/openid-configuration",
"metadata": [
{
"preferred_network": "login.microsoftonline.com",
"preferred_cache": "login.windows.net",
"aliases": ["login.microsoftonline.com", "login.windows.net", "login.microsoft.com", "sts.windows.net"]
}
]
}
```

---

## 2. Applicability

| Authority Type | Instance Discovery Supported |
|---|---|
| AAD (Entra ID) | ✅ Yes |
| B2C | ❌ No — use self-entry |
| ADFS | ❌ No — use self-entry |
| CIAM / dSTS | ❌ No — use self-entry |
Comment thread
Avery-Dunn marked this conversation as resolved.

A **self-entry** means: `preferred_network = preferred_cache = aliases = [configured_authority_host]`.

**Rule:** If the authority type does not support instance discovery, skip all providers and immediately return a self-entry.

---

## 3. Known Environments (Hardcoded)

MSAL maintains a static, hardcoded list of known cloud environments and their metadata. This avoids network calls for common clouds.

| Cloud | Aliases | Preferred Network | Preferred Cache |
|---|---|---|---|
| **Public** | `login.microsoftonline.com`, `login.windows.net`, `login.microsoft.com`, `sts.windows.net` | `login.microsoftonline.com` | `login.windows.net` |
| **China** | `login.partner.microsoftonline.cn`, `login.chinacloudapi.cn` | `login.partner.microsoftonline.cn` | `login.partner.microsoftonline.cn` |
| **US Gov** | `login.microsoftonline.us`, `login.usgovcloudapi.net` | `login.microsoftonline.us` | `login.microsoftonline.us` |
| **Germany (legacy)** | `login.microsoftonline.de` | `login.microsoftonline.de` | `login.microsoftonline.de` |
| **US (login-us)** | `login-us.microsoftonline.com` | `login-us.microsoftonline.com` | `login-us.microsoftonline.com` |
| **PPE** | `login.windows-ppe.net`, `sts.windows-ppe.net`, `login.microsoft-ppe.com` | `login.windows-ppe.net` | `login.windows-ppe.net` |
| **Bleu (FR)** | `login.sovcloud-identity.fr` | `login.sovcloud-identity.fr` | `login.sovcloud-identity.fr` |
| **Delos (DE)** | `login.sovcloud-identity.de` | `login.sovcloud-identity.de` | `login.sovcloud-identity.de` |
| **GovSG** | `login.sovcloud-identity.sg` | `login.sovcloud-identity.sg` | `login.sovcloud-identity.sg` |

**Known environment check for the known metadata provider**: The known metadata provider is only usable when **all** environments already present in the token cache are themselves known. If any cached environment is unknown, the known provider must be skipped (because the network may return richer alias data).

---

## 4. Discovery Endpoint Selection

When a network call is needed, MSAL must choose which host to call:

| Authority Host | Discovery Endpoint Host |
|---|---|
| Known environment (e.g., `login.microsoftonline.com`) | Same host: `https://{authority_host}/common/discovery/instance` |
| Unknown environment (e.g., `login.microsoft.new`) | Fallback to default trusted host: `https://login.microsoftonline.com/common/discovery/instance` |

The query parameters are:
```
GET https://{discovery_host}/common/discovery/instance
?api-version=1.1
&authorization_endpoint=https://{authority_host}/{tenant}/oauth2/v2.0/authorize
```

---

## 5. Provider Resolution Order

### 5.1 Full Flow (GetMetadataEntryAsync — used during token acquisition)

Providers are consulted in strict priority order. The first non-null result wins.

```
1. [If instance discovery is disabled (WithInstanceDiscovery(false))]
→ Check region discovery provider (regions are not affected by the disable flag)
→ If still null, return self-entry. STOP.

2. Region discovery provider

3. Network call (FetchNetworkMetadataOrFallback)
→ On success: cache all entries by alias in the network cache
→ On "invalid_instance" error: see §6
→ On any other error (404, 502, network failure, etc.): see §7

4. [If still null] Log warning, create self-entry, cache it in network cache
```

### 5.2 Cache-Preferring Flow (GetMetadataEntryTryAvoidNetworkAsync — used during token acquisition with cache check)

This flow tries to avoid network calls when possible:

```
1. Region discovery provider

2. [If instance discovery is disabled] → return self-entry

3. Network cache (static, populated from prior network calls)

4. Known metadata provider (hardcoded, but only if all cached environments are known)

5. Full flow (§5.1)

6. [If still null] Return self-entry
```

### 5.3 Offline Flow (GetMetadataEntryAvoidNetwork — used for GetAccounts/AcquireTokenSilent)

No network calls are ever made:

```
1. [If instance discovery is enabled]:
a. Network cache
b. Known metadata provider (with null existing environments → always usable)

2. [If still null] Return self-entry
```

---

## 6. Error Handling: `invalid_instance`

When the discovery endpoint returns an `invalid_instance` error (the AAD server-side error `AADSTS50049`), this means the authority genuinely does not exist.

| `ValidateAuthority` setting | Behavior |
|---|---|
| `true` (default) | **Throw** `MsalServiceException` with error code `invalid_instance`. Do NOT cache. |
| `false` | Return a self-entry. Continue without validation. |

---

## 7. Error Handling: All Other Errors (404, 502, network failures, etc.)

When instance discovery fails with any error other than `invalid_instance`:

1. **Try the known metadata provider** for the authority host (with empty `existingEnvironmentsInCache` to force lookup).
2. If the known provider has no entry, **create a self-entry**.
3. **Cache the result** in the network cache so that subsequent calls do NOT retry the network call.
4. Return the entry.

**Critical rule**: The fallback entry MUST be cached. Without caching, every token request retries the failing network call, causing performance degradation and unnecessary traffic. (See [issue #5804](https://github.com/AzureAD/microsoft-authentication-library-for-dotnet/issues/5804).)

---

## 8. Caching Rules

### 8.1 Network Cache

- **Scope**: Static / process-wide (shared across all app instances in the same process).
- **Key**: Authority host string (case-sensitive).
- **Population**:
- On successful network response: cache each entry keyed by each of its aliases.
- On network failure (non-`invalid_instance`): cache the fallback entry keyed by the authority host.
- **Eviction**: None. Entries persist for the process lifetime.
- **Thread safety**: Must be thread-safe (concurrent dictionary or equivalent).

### 8.2 Known Metadata Cache

- **Scope**: Static / compiled into the library.
- **Immutable**: Never modified at runtime.
- **Guard**: Only usable when all environments in the token cache are themselves known environments.

---

## 9. Interaction with Regions

- Regional discovery (e.g., `centralus.login.microsoft.com`) runs independently of instance discovery.
- Even when instance discovery is disabled (`WithInstanceDiscovery(false)`), region discovery still runs.
- Regional metadata takes precedence when available (checked before the network call).

---

## 10. Interaction with Authority Validation

Authority validation is a separate step that runs **after** instance discovery:

1. Instance discovery resolves metadata (preferred network, aliases).
2. If `ValidateAuthority` is true AND instance discovery is enabled:
- Validate the authority against the OIDC endpoint.
- Cache successful validations by environment.

---

## 11. Validation Test Matrix

The following test scenarios should be implemented in any MSAL that supports instance discovery. Tests use HTTP mocking (no real network calls).

### T1: Known Cloud — Discovery Happens on Same Host

**Setup**: Authority = `https://login.microsoftonline.com/tenant` (or any known cloud host).
**Expected**:
- Instance discovery GET request goes to `https://login.microsoftonline.com/common/discovery/instance`.
- Token request goes to `https://login.microsoftonline.com/tenant/oauth2/v2.0/token`.
- Token is acquired successfully.

**Test hosts to cover**: `login.microsoftonline.com`, `login.microsoftonline.us`, `login.microsoftonline.de`, `login.partner.microsoftonline.cn`, `login.sovcloud-identity.fr`, `login.sovcloud-identity.de`, `login.sovcloud-identity.sg`.

### T2: Instance Discovery Disabled — No Network Discovery Call

**Setup**: Authority = any (known or unknown), `WithInstanceDiscovery(false)`.
**Expected**:
- No GET request to the discovery endpoint.
- Token request goes directly to the configured authority.
- Token is acquired successfully.

### T3: Unknown Cloud — Discovery Falls Back to Default Trusted Host

**Setup**: Authority = `https://unknown.host.example/tenant` (not in known list).
**Expected**:
- Instance discovery GET request goes to `https://login.microsoftonline.com/common/discovery/instance` (the default trusted host).
- If discovery succeeds: metadata is cached, token request uses the resolved preferred network.
- If discovery fails (e.g., 404): fallback entry is created and cached, token request goes to the original authority.

### T4: Discovery Failure (404/502) — Fallback Is Cached, No Retry

**Setup**: Authority = unknown host. Discovery endpoint returns 404 or 502.
**Expected**:
- First `AcquireTokenForClient`: discovery fails, fallback entry created and cached, token acquired from IdP.
- Second `AcquireTokenForClient` (same scope): token served from cache, no network calls.
- Third `AcquireTokenForClient` (different scope): token acquired from IdP, NO discovery call (fallback is cached).

**HTTP mocks (in order)**:
1. GET discovery → 404 (or 502)
2. POST token → 200 (success)
3. POST token → 200 (success) — for the different-scope call

If the SDK makes an unexpected discovery call, the mock framework should fail.

### T5: Discovery Failure (Network Error / HttpRequestException) — Fallback, No Retry

**Setup**: Authority = unknown host. Discovery endpoint throws a network-level exception.
**Expected**: Same as T4 — fallback is cached, subsequent calls don't retry.

### T6: Discovery Failure with `invalid_instance` — Throw (ValidateAuthority=true)

**Setup**: Authority = unknown host. Discovery endpoint returns `invalid_instance` error. `ValidateAuthority` = true (default).
**Expected**:
- `MsalServiceException` is thrown with error code `invalid_instance`.
- The known metadata provider is NOT consulted as a fallback.

### T7: Discovery Failure with `invalid_instance` — Proceed (ValidateAuthority=false)

**Setup**: Authority = unknown host. Discovery endpoint returns `invalid_instance`. `ValidateAuthority` = false.
**Expected**:
- No exception thrown.
- A self-entry is returned (authority used as-is).
- Token is acquired successfully.

### T8: Known Metadata — Used When All Cache Environments Are Known

**Setup**: Authority = `login.microsoftonline.com`. Token cache contains entries only for known environments.
**Expected**:
- Known metadata provider returns the hardcoded entry.
- No network discovery call.

### T9: Known Metadata Bypassed — Unknown Environment in Cache

**Setup**: Authority = `login.microsoftonline.com`. Token cache contains entries for both known and unknown environments.
**Expected**:
- Known metadata provider is bypassed (returns null).
- Network discovery call is made.

### T10: B2C / ADFS — Instance Discovery Skipped

**Setup**: Authority = B2C or ADFS authority.
**Expected**:
- No instance discovery call.
- Self-entry is returned.

### T11: Airgapped Cloud with Regions — Discovery Disabled

**Setup**: Authority = unknown host. `WithInstanceDiscovery(false)`. `WithAzureRegion("centralus")`.
**Expected**:
- No instance discovery call.
- Region discovery still runs.
- Token request goes to the regionalized endpoint.

### T12: Airgapped Cloud with Regions — Discovery Enabled but Fails

**Setup**: Authority = unknown host. `WithAzureRegion("centralus")`. Instance discovery call throws (e.g., `HttpRequestException`).
**Expected**:
- Instance discovery failure is swallowed.
- Token request still succeeds using the regionalized endpoint.
- No retry of instance discovery on subsequent calls.

---

## 12. Implementation Checklist

- [ ] Hardcode the known cloud metadata table (§3).
- [ ] Implement the three resolution flows (§5.1, §5.2, §5.3).
- [ ] Implement discovery endpoint host selection (§4).
- [ ] Handle `invalid_instance` separately from other errors (§6 vs §7).
- [ ] Cache fallback entries on non-`invalid_instance` failures (§7 — critical).
- [ ] Use a process-wide static cache for network results (§8.1).
- [ ] Guard known metadata usage by checking all cached environments are known (§8.2).
- [ ] Support `WithInstanceDiscovery(false)` to disable network discovery.
- [ ] Ensure region discovery is independent of instance discovery toggle (§9).
- [ ] Self-entry: `preferred_network = preferred_cache = aliases = [authority_host]` (§2).
- [ ] Implement all tests T1–T12 in §11.
Original file line number Diff line number Diff line change
Expand Up @@ -8,6 +8,13 @@ namespace Microsoft.Identity.Client.Instance.Discovery
internal interface INetworkCacheMetadataProvider
{
void AddMetadata(string environment, InstanceDiscoveryMetadataEntry entry);

/// <summary>
/// Caches <paramref name="entry"/> under each host in <paramref name="entry"/>.Aliases.
/// If the entry has no aliases, falls back to <paramref name="fallbackEnvironment"/>.
/// </summary>
void AddMetadataWithAliases(InstanceDiscoveryMetadataEntry entry, string fallbackEnvironment);

InstanceDiscoveryMetadataEntry GetMetadata(string environment, ILoggerAdapter logger);
}
}
Original file line number Diff line number Diff line change
Expand Up @@ -202,15 +202,19 @@ private async Task<InstanceDiscoveryMetadataEntry> FetchNetworkMetadataOrFallbac
requestContext.Logger.Error($"[Instance Discovery] Instance discovery failed - invalid instance! ");
throw;
}
catch (Exception e)
catch (Exception e) when (!requestContext.UserCancellationToken.IsCancellationRequested)
{
requestContext.Logger.Warning(
$"[Instance Discovery] Instance Discovery failed. MSAL will continue without network instance metadata. \n\r" +
$"[Instance Discovery] Instance Discovery failed. MSAL will continue without instance metadata. \n\r" +
$" Exception: {e} ");
return

var fallbackEntry =
_knownMetadataProvider.GetMetadata(authorityUri.Host, Enumerable.Empty<string>(), requestContext.Logger)
?? CreateEntryForSingleAuthority(authorityUri);
?? CreateEntryForSingleAuthority(authorityUri);

_networkCacheMetadataProvider.AddMetadataWithAliases(fallbackEntry, authorityUri.Host);

return fallbackEntry;
}
}

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -26,6 +26,23 @@ public void AddMetadata(string environment, InstanceDiscoveryMetadataEntry entry
s_cache.AddOrUpdate(environment, entry, (_, _) => entry);
}

public void AddMetadataWithAliases(InstanceDiscoveryMetadataEntry entry, string fallbackEnvironment)
{
string[] aliases = entry?.Aliases;
if (aliases != null && aliases.Length > 0)
{
foreach (string alias in aliases)
{
if (!string.IsNullOrWhiteSpace(alias))
s_cache.AddOrUpdate(alias, entry, (_, _) => entry);
}
}
else if (!string.IsNullOrWhiteSpace(fallbackEnvironment))
{
s_cache.AddOrUpdate(fallbackEnvironment, entry, (_, _) => entry);
}
}
Comment thread
Avery-Dunn marked this conversation as resolved.

internal static void ResetStaticCacheForTest()
{
s_cache.Clear();
Expand Down
Loading
Loading