| 1 | +# Instance Discovery Rules for MSAL |
| 2 | + |
| 3 | +## Purpose |
| 4 | + |
| 5 | +Instance discovery resolves an authority host (e.g., `login.microsoftonline.com`) to its **metadata**: preferred network host, preferred cache host, and a list of aliases. This metadata enables SSO across aliased environments and ensures tokens are sent to the correct endpoint. |
| 6 | + |
| 7 | +This document describes the rules implemented in MSAL.NET to aid reimplementation in other MSAL libraries. |
| 8 | + |
| 9 | +--- |
| 10 | + |
| 11 | +## 1. Core Data Model |
| 12 | + |
| 13 | +Instance discovery produces a single entry per authority environment: |
| 14 | + |
| 15 | +``` |
| 16 | +InstanceDiscoveryMetadataEntry: |
| 17 | + preferred_network: string # host to use for token requests |
| 18 | + preferred_cache: string # host to use as cache key |
| 19 | + aliases: string[] # all equivalent hosts (used for SSO, cache lookups) |
| 20 | +``` |
| 21 | + |
| 22 | +A successful network response contains a `metadata` array of these entries: |
| 23 | + |
| 24 | +```json |
| 25 | +{ |
| 26 | + "tenant_discovery_endpoint": "https://login.microsoftonline.com/{tenant}/.well-known/openid-configuration", |
| 27 | + "metadata": [ |
| 28 | + { |
| 29 | + "preferred_network": "login.microsoftonline.com", |
| 30 | + "preferred_cache": "login.windows.net", |
| 31 | + "aliases": ["login.microsoftonline.com", "login.windows.net", "login.microsoft.com", "sts.windows.net"] |
| 32 | + } |
| 33 | + ] |
| 34 | +} |
| 35 | +``` |
| 36 | + |
| 37 | +--- |
| 38 | + |
| 39 | +## 2. Applicability |
| 40 | + |
| 41 | +| Authority Type | Instance Discovery Supported | |
| 42 | +|---|---| |
| 43 | +| AAD (Entra ID) | ✅ Yes | |
| 44 | +| B2C | ❌ No — use self-entry | |
| 45 | +| ADFS | ❌ No — use self-entry | |
| 46 | +| CIAM / dSTS | ❌ No — use self-entry | |
| 47 | + |
| 48 | +A **self-entry** sets every field to the configured authority host: `preferred_network = preferred_cache = configured_authority_host` and `aliases = [configured_authority_host]`. |
| 49 | + |
| 50 | +**Rule:** If the authority type does not support instance discovery, skip all providers and immediately return a self-entry. |
| 51 | + |
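The data model from §1 and the self-entry rule above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not MSAL code: the names `AuthorityType`, `MetadataEntry`, `self_entry`, and `entry_if_unsupported` are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class AuthorityType(Enum):
    AAD = "aad"
    B2C = "b2c"
    ADFS = "adfs"
    CIAM = "ciam"
    DSTS = "dsts"


@dataclass(frozen=True)
class MetadataEntry:
    preferred_network: str      # host to use for token requests
    preferred_cache: str        # host to use as cache key
    aliases: Tuple[str, ...]    # all equivalent hosts


def supports_instance_discovery(authority_type: AuthorityType) -> bool:
    # Per the applicability table, only AAD (Entra ID) authorities qualify.
    return authority_type is AuthorityType.AAD


def self_entry(authority_host: str) -> MetadataEntry:
    # Every field points back at the configured host.
    return MetadataEntry(authority_host, authority_host, (authority_host,))


def entry_if_unsupported(authority_type: AuthorityType,
                         authority_host: str) -> Optional[MetadataEntry]:
    # Returns a self-entry when discovery must be skipped, else None
    # (meaning: continue with the provider flows in §5).
    if not supports_instance_discovery(authority_type):
        return self_entry(authority_host)
    return None
```

A caller would check `entry_if_unsupported(...)` first and only consult the providers when it returns `None`.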
| 52 | +--- |
| 53 | + |
| 54 | +## 3. Known Environments (Hardcoded) |
| 55 | + |
| 56 | +MSAL maintains a static, hardcoded list of known cloud environments and their metadata. This avoids network calls for common clouds. |
| 57 | + |
| 58 | +| Cloud | Aliases | Preferred Network | Preferred Cache | |
| 59 | +|---|---|---|---| |
| 60 | +| **Public** | `login.microsoftonline.com`, `login.windows.net`, `login.microsoft.com`, `sts.windows.net` | `login.microsoftonline.com` | `login.windows.net` | |
| 61 | +| **China** | `login.partner.microsoftonline.cn`, `login.chinacloudapi.cn` | `login.partner.microsoftonline.cn` | `login.partner.microsoftonline.cn` | |
| 62 | +| **US Gov** | `login.microsoftonline.us`, `login.usgovcloudapi.net` | `login.microsoftonline.us` | `login.microsoftonline.us` | |
| 63 | +| **Germany (legacy)** | `login.microsoftonline.de` | `login.microsoftonline.de` | `login.microsoftonline.de` | |
| 64 | +| **US (login-us)** | `login-us.microsoftonline.com` | `login-us.microsoftonline.com` | `login-us.microsoftonline.com` | |
| 65 | +| **PPE** | `login.windows-ppe.net`, `sts.windows-ppe.net`, `login.microsoft-ppe.com` | `login.windows-ppe.net` | `login.windows-ppe.net` | |
| 66 | +| **Bleu (FR)** | `login.sovcloud-identity.fr` | `login.sovcloud-identity.fr` | `login.sovcloud-identity.fr` | |
| 67 | +| **Delos (DE)** | `login.sovcloud-identity.de` | `login.sovcloud-identity.de` | `login.sovcloud-identity.de` | |
| 68 | +| **GovSG** | `login.sovcloud-identity.sg` | `login.sovcloud-identity.sg` | `login.sovcloud-identity.sg` | |
| 69 | + |
| 70 | +**Known-environment guard**: The known metadata provider is usable only when **all** environments already present in the token cache are themselves known. If any cached environment is unknown, skip the known provider, because the network may return richer alias data for that environment. |
| 71 | + |
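A minimal Python sketch of the known metadata provider and its guard might look like this. Only two rows of the table are reproduced to keep it short, and the names `known_metadata` and `is_known_environment` are hypothetical. Treating a `None` or empty environment list as "no guard" is an assumption drawn from how §5.3 and §7 describe calling the provider.

```python
from typing import Dict, Iterable, Optional, Tuple

# (preferred_network, preferred_cache, aliases)
Entry = Tuple[str, str, Tuple[str, ...]]

# Illustrative subset of the hardcoded table: Public and US Gov only.
_KNOWN: Tuple[Entry, ...] = (
    ("login.microsoftonline.com", "login.windows.net",
     ("login.microsoftonline.com", "login.windows.net",
      "login.microsoft.com", "sts.windows.net")),
    ("login.microsoftonline.us", "login.microsoftonline.us",
     ("login.microsoftonline.us", "login.usgovcloudapi.net")),
)

# Index every alias to its entry so lookups work for any equivalent host.
_BY_ALIAS: Dict[str, Entry] = {alias: e for e in _KNOWN for alias in e[2]}


def is_known_environment(host: str) -> bool:
    return host in _BY_ALIAS


def known_metadata(host: str,
                   cached_environments: Optional[Iterable[str]]) -> Optional[Entry]:
    # None or empty means there is nothing to guard against.
    for env in cached_environments or ():
        if not is_known_environment(env):
            return None  # an unknown cached environment disables this provider
    return _BY_ALIAS.get(host)
```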
| 72 | +--- |
| 73 | + |
| 74 | +## 4. Discovery Endpoint Selection |
| 75 | + |
| 76 | +When a network call is needed, MSAL must choose which host to call: |
| 77 | + |
| 78 | +| Authority Host | Discovery Endpoint Host | |
| 79 | +|---|---| |
| 80 | +| Known environment (e.g., `login.microsoftonline.com`) | Same host: `https://{authority_host}/common/discovery/instance` | |
| 81 | +| Unknown environment (e.g., `login.microsoft.new`) | Fallback to default trusted host: `https://login.microsoftonline.com/common/discovery/instance` | |
| 82 | + |
| 83 | +The full request, including query parameters, is: |
| 84 | +``` |
| 85 | +GET https://{discovery_host}/common/discovery/instance |
| 86 | + ?api-version=1.1 |
| 87 | + &authorization_endpoint=https://{authority_host}/{tenant}/oauth2/v2.0/authorize |
| 88 | +``` |
| 89 | + |
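The host selection and request URL can be sketched in Python as below. `discovery_url` and `KNOWN_HOSTS` are hypothetical names, and the host set is only a subset of §3. Note that `urlencode` percent-encodes the `authorization_endpoint` value, whereas the request above shows it unencoded for readability.

```python
from urllib.parse import urlencode

DEFAULT_TRUSTED_HOST = "login.microsoftonline.com"

# Subset of the known hosts from §3, for illustration only.
KNOWN_HOSTS = frozenset({
    "login.microsoftonline.com", "login.windows.net",
    "login.microsoftonline.us", "login.partner.microsoftonline.cn",
})


def discovery_url(authority_host: str, tenant: str) -> str:
    # Known hosts are queried directly; unknown hosts go to the trusted default.
    if authority_host in KNOWN_HOSTS:
        discovery_host = authority_host
    else:
        discovery_host = DEFAULT_TRUSTED_HOST
    query = urlencode({
        "api-version": "1.1",
        "authorization_endpoint":
            f"https://{authority_host}/{tenant}/oauth2/v2.0/authorize",
    })
    return f"https://{discovery_host}/common/discovery/instance?{query}"
```

The `authorization_endpoint` always carries the configured authority host, even when the request itself is sent to the default trusted host.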
| 90 | +--- |
| 91 | + |
| 92 | +## 5. Provider Resolution Order |
| 93 | + |
| 94 | +### 5.1 Full Flow (GetMetadataEntryAsync — used during token acquisition) |
| 95 | + |
| 96 | +Providers are consulted in strict priority order. The first non-null result wins. |
| 97 | + |
| 98 | +``` |
| 99 | +1. [If instance discovery is disabled (WithInstanceDiscovery(false))] |
| 100 | + → Check region discovery provider (regions are not affected by the disable flag) |
| 101 | + → If still null, return self-entry. STOP. |
| 102 | + |
| 103 | +2. Region discovery provider |
| 104 | + |
| 105 | +3. Network call (FetchNetworkMetadataOrFallback) |
| 106 | + → On success: cache all entries by alias in the network cache |
| 107 | + → On "invalid_instance" error: see §6 |
| 108 | + → On any other error (404, 502, network failure, etc.): see §7 |
| 109 | + |
| 110 | +4. [If still null] Log warning, create self-entry, cache it in network cache |
| 111 | +``` |
| 112 | + |
| 113 | +### 5.2 Cache-Preferring Flow (GetMetadataEntryTryAvoidNetworkAsync — used during token acquisition with cache check) |
| 114 | + |
| 115 | +This flow tries to avoid network calls when possible: |
| 116 | + |
| 117 | +``` |
| 118 | +1. Region discovery provider |
| 119 | + |
| 120 | +2. [If instance discovery is disabled] → return self-entry |
| 121 | + |
| 122 | +3. Network cache (static, populated from prior network calls) |
| 123 | + |
| 124 | +4. Known metadata provider (hardcoded, but only if all cached environments are known) |
| 125 | + |
| 126 | +5. Full flow (§5.1) |
| 127 | + |
| 128 | +6. [If still null] Return self-entry |
| 129 | +``` |
| 130 | + |
| 131 | +### 5.3 Offline Flow (GetMetadataEntryAvoidNetwork — used for GetAccounts/AcquireTokenSilent) |
| 132 | + |
| 133 | +No network calls are ever made: |
| 134 | + |
| 135 | +``` |
| 136 | +1. [If instance discovery is enabled]: |
| 137 | + a. Network cache |
| 138 | + b. Known metadata provider (with null existing environments → always usable) |
| 139 | + |
| 140 | +2. [If still null] Return self-entry |
| 141 | +``` |
| 142 | + |
| 143 | +--- |
| 144 | + |
| 145 | +## 6. Error Handling: `invalid_instance` |
| 146 | + |
| 147 | +An `invalid_instance` error from the discovery endpoint (the AAD server-side error `AADSTS50049`) means the authority genuinely does not exist. |
| 148 | + |
| 149 | +| `ValidateAuthority` setting | Behavior | |
| 150 | +|---|---| |
| 151 | +| `true` (default) | **Throw** `MsalServiceException` with error code `invalid_instance`. Do NOT cache. | |
| 152 | +| `false` | Return a self-entry. Continue without validation. | |
| 153 | + |
| 154 | +--- |
| 155 | + |
| 156 | +## 7. Error Handling: All Other Errors (404, 502, network failures, etc.) |
| 157 | + |
| 158 | +When instance discovery fails with any error other than `invalid_instance`: |
| 159 | + |
| 160 | +1. **Try the known metadata provider** for the authority host (with empty `existingEnvironmentsInCache` to force lookup). |
| 161 | +2. If the known provider has no entry, **create a self-entry**. |
| 162 | +3. **Cache the result** in the network cache so that subsequent calls do NOT retry the network call. |
| 163 | +4. Return the entry. |
| 164 | + |
| 165 | +**Critical rule**: The fallback entry MUST be cached. Without caching, every token request retries the failing network call, causing performance degradation and unnecessary traffic. (See [issue #5804](https://github.com/AzureAD/microsoft-authentication-library-for-dotnet/issues/5804).) |
| 166 | + |
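The two error paths (§6 and the steps above) can be sketched together in Python. This is an illustration under several assumptions: `InvalidInstanceError` is a hypothetical stand-in for the service exception, `fetch` is an injected callable that returns only the entry for the requested host (a real response carries a list of entries), and the `ValidateAuthority = false` branch does not cache because §6 does not specify caching there.

```python
from typing import Callable, Dict, Optional, Tuple

# (preferred_network, preferred_cache, aliases)
Entry = Tuple[str, str, Tuple[str, ...]]


class InvalidInstanceError(Exception):
    """Hypothetical stand-in for a service error with code invalid_instance."""


def fetch_or_fallback(host: str,
                      validate_authority: bool,
                      fetch: Callable[[str], Entry],
                      known_lookup: Callable[[str], Optional[Entry]],
                      network_cache: Dict[str, Entry]) -> Entry:
    self_entry = (host, host, (host,))
    try:
        entry = fetch(host)
    except InvalidInstanceError:
        if validate_authority:
            raise            # §6: surface the error, cache nothing
        return self_entry    # §6: continue without validation
    except Exception:
        # §7: any other failure; prefer known metadata, else a self-entry.
        entry = known_lookup(host) or self_entry
        network_cache[host] = entry   # cached so the call is not retried
        return entry
    for alias in entry[2]:
        network_cache[alias] = entry  # success: cache under every alias
    return entry
```

A caller that checks `network_cache` before invoking this function will skip the failing network call on every later request, which is the behavior the critical rule requires.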
| 167 | +--- |
| 168 | + |
| 169 | +## 8. Caching Rules |
| 170 | + |
| 171 | +### 8.1 Network Cache |
| 172 | + |
| 173 | +- **Scope**: Static / process-wide (shared across all app instances in the same process). |
| 174 | +- **Key**: Authority host string (case-sensitive). |
| 175 | +- **Population**: |
| 176 | + - On successful network response: cache each entry keyed by each of its aliases. |
| 177 | + - On network failure (non-`invalid_instance`): cache the fallback entry keyed by the authority host. |
| 178 | +- **Eviction**: None. Entries persist for the process lifetime. |
| 179 | +- **Thread safety**: Must be thread-safe (concurrent dictionary or equivalent). |
| 180 | + |
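One way to satisfy these rules in Python is a class-level dictionary guarded by a lock. The class and method names are hypothetical; this is a sketch of the listed properties (process-wide scope, case-sensitive keys, no eviction, thread safety), not a port of any MSAL implementation.

```python
import threading
from typing import Dict, Iterable, Optional, Tuple

# (preferred_network, preferred_cache, aliases)
Entry = Tuple[str, str, Tuple[str, ...]]


class NetworkMetadataCache:
    """Process-wide store keyed by authority host; entries are never evicted."""

    _lock = threading.Lock()
    _entries: Dict[str, Entry] = {}   # class-level, so shared by all instances

    @classmethod
    def get(cls, host: str) -> Optional[Entry]:
        with cls._lock:
            return cls._entries.get(host)   # exact, case-sensitive match

    @classmethod
    def add_all(cls, entries: Iterable[Entry]) -> None:
        # Successful response: index each entry under each of its aliases.
        with cls._lock:
            for entry in entries:
                for alias in entry[2]:
                    cls._entries[alias] = entry

    @classmethod
    def add_fallback(cls, host: str, entry: Entry) -> None:
        # Failed discovery: remember the fallback under the requested host only.
        with cls._lock:
            cls._entries[host] = entry
```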
| 181 | +### 8.2 Known Metadata Cache |
| 182 | + |
| 183 | +- **Scope**: Static / compiled into the library. |
| 184 | +- **Immutable**: Never modified at runtime. |
| 185 | +- **Guard**: Only usable when all environments in the token cache are themselves known environments. |
| 186 | + |
| 187 | +--- |
| 188 | + |
| 189 | +## 9. Interaction with Regions |
| 190 | + |
| 191 | +- Regional discovery (e.g., `centralus.login.microsoft.com`) runs independently of instance discovery. |
| 192 | +- Even when instance discovery is disabled (`WithInstanceDiscovery(false)`), region discovery still runs. |
| 193 | +- Regional metadata takes precedence when available (checked before the network call). |
| 194 | + |
| 195 | +--- |
| 196 | + |
| 197 | +## 10. Interaction with Authority Validation |
| 198 | + |
| 199 | +Authority validation is a separate step that runs **after** instance discovery: |
| 200 | + |
| 201 | +1. Instance discovery resolves metadata (preferred network, aliases). |
| 202 | +2. If `ValidateAuthority` is true AND instance discovery is enabled: |
| 203 | + - Validate the authority against the OIDC endpoint. |
| 204 | + - Cache successful validations by environment. |
| 205 | + |
| 206 | +--- |
| 207 | + |
| 208 | +## 11. Validation Test Matrix |
| 209 | + |
| 210 | +The following test scenarios should be implemented in any MSAL that supports instance discovery. Tests use HTTP mocking (no real network calls). |
| 211 | + |
| 212 | +### T1: Known Cloud — Discovery Happens on Same Host |
| 213 | + |
| 214 | +**Setup**: Authority = `https://login.microsoftonline.com/tenant` (or any known cloud host). |
| 215 | +**Expected**: |
| 216 | +- Instance discovery GET request goes to `https://login.microsoftonline.com/common/discovery/instance`. |
| 217 | +- Token request goes to `https://login.microsoftonline.com/tenant/oauth2/v2.0/token`. |
| 218 | +- Token is acquired successfully. |
| 219 | + |
| 220 | +**Test hosts to cover**: `login.microsoftonline.com`, `login.microsoftonline.us`, `login.microsoftonline.de`, `login.partner.microsoftonline.cn`, `login.sovcloud-identity.fr`, `login.sovcloud-identity.de`, `login.sovcloud-identity.sg`. |
| 221 | + |
| 222 | +### T2: Instance Discovery Disabled — No Network Discovery Call |
| 223 | + |
| 224 | +**Setup**: Authority = any (known or unknown), `WithInstanceDiscovery(false)`. |
| 225 | +**Expected**: |
| 226 | +- No GET request to the discovery endpoint. |
| 227 | +- Token request goes directly to the configured authority. |
| 228 | +- Token is acquired successfully. |
| 229 | + |
| 230 | +### T3: Unknown Cloud — Discovery Falls Back to Default Trusted Host |
| 231 | + |
| 232 | +**Setup**: Authority = `https://unknown.host.example/tenant` (not in known list). |
| 233 | +**Expected**: |
| 234 | +- Instance discovery GET request goes to `https://login.microsoftonline.com/common/discovery/instance` (the default trusted host). |
| 235 | +- If discovery succeeds: metadata is cached, token request uses the resolved preferred network. |
| 236 | +- If discovery fails (e.g., 404): fallback entry is created and cached, token request goes to the original authority. |
| 237 | + |
| 238 | +### T4: Discovery Failure (404/502) — Fallback Is Cached, No Retry |
| 239 | + |
| 240 | +**Setup**: Authority = unknown host. Discovery endpoint returns 404 or 502. |
| 241 | +**Expected**: |
| 242 | +- First `AcquireTokenForClient`: discovery fails, fallback entry created and cached, token acquired from IdP. |
| 243 | +- Second `AcquireTokenForClient` (same scope): token served from cache, no network calls. |
| 244 | +- Third `AcquireTokenForClient` (different scope): token acquired from IdP, NO discovery call (fallback is cached). |
| 245 | + |
| 246 | +**HTTP mocks (in order)**: |
| 247 | +1. GET discovery → 404 (or 502) |
| 248 | +2. POST token → 200 (success) |
| 249 | +3. POST token → 200 (success) — for the different-scope call |
| 250 | + |
| 251 | +If the SDK makes an unexpected discovery call, the mock framework should fail. |
| 252 | + |
| 253 | +### T5: Discovery Failure (Network Error / HttpRequestException) — Fallback, No Retry |
| 254 | + |
| 255 | +**Setup**: Authority = unknown host. Discovery endpoint throws a network-level exception. |
| 256 | +**Expected**: Same as T4 — fallback is cached, subsequent calls don't retry. |
| 257 | + |
| 258 | +### T6: Discovery Failure with `invalid_instance` — Throw (ValidateAuthority=true) |
| 259 | + |
| 260 | +**Setup**: Authority = unknown host. Discovery endpoint returns `invalid_instance` error. `ValidateAuthority` = true (default). |
| 261 | +**Expected**: |
| 262 | +- `MsalServiceException` is thrown with error code `invalid_instance`. |
| 263 | +- The known metadata provider is NOT consulted as a fallback. |
| 264 | + |
| 265 | +### T7: Discovery Failure with `invalid_instance` — Proceed (ValidateAuthority=false) |
| 266 | + |
| 267 | +**Setup**: Authority = unknown host. Discovery endpoint returns `invalid_instance`. `ValidateAuthority` = false. |
| 268 | +**Expected**: |
| 269 | +- No exception thrown. |
| 270 | +- A self-entry is returned (authority used as-is). |
| 271 | +- Token is acquired successfully. |
| 272 | + |
| 273 | +### T8: Known Metadata — Used When All Cache Environments Are Known |
| 274 | + |
| 275 | +**Setup**: Authority = `login.microsoftonline.com`. Token cache contains entries only for known environments. |
| 276 | +**Expected**: |
| 277 | +- Known metadata provider returns the hardcoded entry. |
| 278 | +- No network discovery call. |
| 279 | + |
| 280 | +### T9: Known Metadata Bypassed — Unknown Environment in Cache |
| 281 | + |
| 282 | +**Setup**: Authority = `login.microsoftonline.com`. Token cache contains entries for both known and unknown environments. |
| 283 | +**Expected**: |
| 284 | +- Known metadata provider is bypassed (returns null). |
| 285 | +- Network discovery call is made. |
| 286 | + |
| 287 | +### T10: B2C / ADFS — Instance Discovery Skipped |
| 288 | + |
| 289 | +**Setup**: Authority = B2C or ADFS authority. |
| 290 | +**Expected**: |
| 291 | +- No instance discovery call. |
| 292 | +- Self-entry is returned. |
| 293 | + |
| 294 | +### T11: Airgapped Cloud with Regions — Discovery Disabled |
| 295 | + |
| 296 | +**Setup**: Authority = unknown host. `WithInstanceDiscovery(false)`. `WithAzureRegion("centralus")`. |
| 297 | +**Expected**: |
| 298 | +- No instance discovery call. |
| 299 | +- Region discovery still runs. |
| 300 | +- Token request goes to the regionalized endpoint. |
| 301 | + |
| 302 | +### T12: Airgapped Cloud with Regions — Discovery Enabled but Fails |
| 303 | + |
| 304 | +**Setup**: Authority = unknown host. `WithAzureRegion("centralus")`. Instance discovery call throws (e.g., `HttpRequestException`). |
| 305 | +**Expected**: |
| 306 | +- Instance discovery failure is swallowed. |
| 307 | +- Token request still succeeds using the regionalized endpoint. |
| 308 | +- No retry of instance discovery on subsequent calls. |
| 309 | + |
| 310 | +--- |
| 311 | + |
| 312 | +## 12. Implementation Checklist |
| 313 | + |
| 314 | +- [ ] Hardcode the known cloud metadata table (§3). |
| 315 | +- [ ] Implement the three resolution flows (§5.1, §5.2, §5.3). |
| 316 | +- [ ] Implement discovery endpoint host selection (§4). |
| 317 | +- [ ] Handle `invalid_instance` separately from other errors (§6 vs §7). |
| 318 | +- [ ] Cache fallback entries on non-`invalid_instance` failures (§7 — critical). |
| 319 | +- [ ] Use a process-wide static cache for network results (§8.1). |
| 320 | +- [ ] Guard known metadata usage by checking all cached environments are known (§8.2). |
| 321 | +- [ ] Support `WithInstanceDiscovery(false)` to disable network discovery. |
| 322 | +- [ ] Ensure region discovery is independent of instance discovery toggle (§9). |
| 323 | +- [ ] Self-entry: `preferred_network = preferred_cache = authority_host` and `aliases = [authority_host]` (§2). |
| 324 | +- [ ] Implement all tests T1–T12 in §11. |