Commit b2582de

Fix for #2804
1 parent 86aff9b commit b2582de

3 files changed

Lines changed: 388 additions & 4 deletions

docs/instance-discovery-rules.md

Lines changed: 324 additions & 0 deletions
@@ -0,0 +1,324 @@
# Instance Discovery Rules for MSAL

## Purpose

Instance discovery resolves an authority host (e.g., `login.microsoftonline.com`) to its **metadata**: preferred network host, preferred cache host, and a list of aliases. This metadata enables SSO across aliased environments and ensures tokens are sent to the correct endpoint.

This document describes the rules implemented in MSAL.NET to aid reimplementation in other MSAL libraries.

---

## 1. Core Data Model

Instance discovery produces a single entry per authority environment:

```
InstanceDiscoveryMetadataEntry:
  preferred_network: string    # host to use for token requests
  preferred_cache:   string    # host to use as cache key
  aliases:           string[]  # all equivalent hosts (used for SSO, cache lookups)
```

A successful network response returns an array of these entries:

```json
{
  "tenant_discovery_endpoint": "https://login.microsoftonline.com/{tenant}/.well-known/openid-configuration",
  "metadata": [
    {
      "preferred_network": "login.microsoftonline.com",
      "preferred_cache": "login.windows.net",
      "aliases": ["login.microsoftonline.com", "login.windows.net", "login.microsoft.com", "sts.windows.net"]
    }
  ]
}
```
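
As an illustration for a port, the response above could be parsed into entries indexed by alias. This is a minimal Python sketch, not MSAL code: the class mirrors the data model, and `parse_discovery_response` is a hypothetical helper name.

```python
import json
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class InstanceDiscoveryMetadataEntry:
    preferred_network: str      # host to use for token requests
    preferred_cache: str        # host to use as cache key
    aliases: Tuple[str, ...]    # all equivalent hosts


def parse_discovery_response(body: str) -> Dict[str, InstanceDiscoveryMetadataEntry]:
    """Parse a discovery response body into a mapping of alias -> entry."""
    payload = json.loads(body)
    by_alias: Dict[str, InstanceDiscoveryMetadataEntry] = {}
    for raw in payload.get("metadata", []):
        entry = InstanceDiscoveryMetadataEntry(
            preferred_network=raw["preferred_network"],
            preferred_cache=raw["preferred_cache"],
            aliases=tuple(raw["aliases"]),
        )
        # Index the same entry under every alias so any equivalent host finds it.
        for alias in entry.aliases:
            by_alias[alias] = entry
    return by_alias
```

Indexing by alias up front means a later lookup by any equivalent host is a single dictionary read.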

---

## 2. Applicability

| Authority Type | Instance Discovery Supported |
|---|---|
| AAD (Entra ID) | ✅ Yes |
| B2C | ❌ No — use self-entry |
| ADFS | ❌ No — use self-entry |
| CIAM / dSTS | ❌ No — use self-entry |

A **self-entry** means: `preferred_network = preferred_cache = configured_authority_host` and `aliases = [configured_authority_host]`.

**Rule:** If the authority type does not support instance discovery, skip all providers and immediately return a self-entry.
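
The rule can be sketched in Python as follows. The authority-type labels, `Entry`, `self_entry`, and `short_circuit` are hypothetical names chosen for illustration; treating any unlisted type as unsupported is also an assumption.

```python
from typing import NamedTuple, Optional, Tuple


class Entry(NamedTuple):
    preferred_network: str
    preferred_cache: str
    aliases: Tuple[str, ...]


# Hypothetical labels for the authority types in the table above.
TYPES_WITH_INSTANCE_DISCOVERY = frozenset({"aad"})


def self_entry(authority_host: str) -> Entry:
    """Every field points back at the configured authority host."""
    return Entry(authority_host, authority_host, (authority_host,))


def short_circuit(authority_type: str, authority_host: str) -> Optional[Entry]:
    """Return a self-entry when discovery is unsupported; None means 'consult the providers'."""
    if authority_type.lower() not in TYPES_WITH_INSTANCE_DISCOVERY:
        return self_entry(authority_host)
    return None
```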

---

## 3. Known Environments (Hardcoded)

MSAL maintains a static, hardcoded list of known cloud environments and their metadata. This avoids network calls for common clouds.

| Cloud | Aliases | Preferred Network | Preferred Cache |
|---|---|---|---|
| **Public** | `login.microsoftonline.com`, `login.windows.net`, `login.microsoft.com`, `sts.windows.net` | `login.microsoftonline.com` | `login.windows.net` |
| **China** | `login.partner.microsoftonline.cn`, `login.chinacloudapi.cn` | `login.partner.microsoftonline.cn` | `login.partner.microsoftonline.cn` |
| **US Gov** | `login.microsoftonline.us`, `login.usgovcloudapi.net` | `login.microsoftonline.us` | `login.microsoftonline.us` |
| **Germany (legacy)** | `login.microsoftonline.de` | `login.microsoftonline.de` | `login.microsoftonline.de` |
| **US (login-us)** | `login-us.microsoftonline.com` | `login-us.microsoftonline.com` | `login-us.microsoftonline.com` |
| **PPE** | `login.windows-ppe.net`, `sts.windows-ppe.net`, `login.microsoft-ppe.com` | `login.windows-ppe.net` | `login.windows-ppe.net` |
| **Bleu (FR)** | `login.sovcloud-identity.fr` | `login.sovcloud-identity.fr` | `login.sovcloud-identity.fr` |
| **Delos (DE)** | `login.sovcloud-identity.de` | `login.sovcloud-identity.de` | `login.sovcloud-identity.de` |
| **GovSG** | `login.sovcloud-identity.sg` | `login.sovcloud-identity.sg` | `login.sovcloud-identity.sg` |

**Guard on the known metadata provider**: The known metadata provider is usable only when **all** environments already present in the token cache are themselves known. If any cached environment is unknown, the known provider must be skipped, because the network may return richer alias data.
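
A possible shape for the known metadata provider and its guard, as a Python sketch. Only three rows of the table are included to keep it short, and `get_known_metadata` is a hypothetical name. Passing `None` for the cached environments skips the guard, which matches the "null existing environments" case in §5.3; an empty list passes the guard trivially, as §7 relies on.

```python
from typing import Iterable, NamedTuple, Optional, Tuple


class Entry(NamedTuple):
    preferred_network: str
    preferred_cache: str
    aliases: Tuple[str, ...]


# Three rows copied from the table above; a full port would carry every row.
_KNOWN_ENTRIES = (
    Entry("login.microsoftonline.com", "login.windows.net",
          ("login.microsoftonline.com", "login.windows.net",
           "login.microsoft.com", "sts.windows.net")),
    Entry("login.partner.microsoftonline.cn", "login.partner.microsoftonline.cn",
          ("login.partner.microsoftonline.cn", "login.chinacloudapi.cn")),
    Entry("login.microsoftonline.us", "login.microsoftonline.us",
          ("login.microsoftonline.us", "login.usgovcloudapi.net")),
)
KNOWN_BY_ALIAS = {alias: entry for entry in _KNOWN_ENTRIES for alias in entry.aliases}


def get_known_metadata(
    host: str, existing_environments: Optional[Iterable[str]]
) -> Optional[Entry]:
    """Look up hardcoded metadata, honoring the 'all cached environments are known' guard."""
    if existing_environments is not None:
        # One unknown environment in the token cache disables the provider.
        if any(env not in KNOWN_BY_ALIAS for env in existing_environments):
            return None
    return KNOWN_BY_ALIAS.get(host)
```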

---

## 4. Discovery Endpoint Selection

When a network call is needed, MSAL must choose which host to call:

| Authority Host | Discovery Endpoint Host |
|---|---|
| Known environment (e.g., `login.microsoftonline.com`) | Same host: `https://{authority_host}/common/discovery/instance` |
| Unknown environment (e.g., `login.microsoft.new`) | Fallback to default trusted host: `https://login.microsoftonline.com/common/discovery/instance` |

The query parameters are:
```
GET https://{discovery_host}/common/discovery/instance
    ?api-version=1.1
    &authorization_endpoint=https://{authority_host}/{tenant}/oauth2/v2.0/authorize
```
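
The host selection and query string above could be assembled like this. It is a Python sketch under two assumptions: `KNOWN_HOSTS` holds only a subset of the §3 table, and the `authorization_endpoint` value is percent-encoded by `urlencode`, whereas the template above shows it unencoded. `build_discovery_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode

DEFAULT_TRUSTED_HOST = "login.microsoftonline.com"

# Subset of the known hosts from section 3, for illustration only.
KNOWN_HOSTS = frozenset({
    "login.microsoftonline.com",
    "login.windows.net",
    "login.microsoftonline.us",
    "login.partner.microsoftonline.cn",
})


def build_discovery_url(authority_host: str, tenant: str, known_hosts=KNOWN_HOSTS) -> str:
    """Pick the discovery host, then append the two query parameters."""
    # Known hosts answer for themselves; unknown hosts go to the default trusted host.
    discovery_host = authority_host if authority_host in known_hosts else DEFAULT_TRUSTED_HOST
    authorization_endpoint = f"https://{authority_host}/{tenant}/oauth2/v2.0/authorize"
    query = urlencode({
        "api-version": "1.1",
        "authorization_endpoint": authorization_endpoint,
    })
    return f"https://{discovery_host}/common/discovery/instance?{query}"
```

Note that the authority host still appears inside `authorization_endpoint` even when the request itself is sent to the default trusted host.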

---

## 5. Provider Resolution Order

### 5.1 Full Flow (GetMetadataEntryAsync — used during token acquisition)

Providers are consulted in strict priority order. The first non-null result wins.

```
1. [If instance discovery is disabled (WithInstanceDiscovery(false))]
   → Check region discovery provider (regions are not affected by the disable flag)
   → If still null, return self-entry. STOP.

2. Region discovery provider

3. Network call (FetchNetworkMetadataOrFallback)
   → On success: cache all entries by alias in the network cache
   → On "invalid_instance" error: see §6
   → On any other error (404, 502, network failure, etc.): see §7

4. [If still null] Log warning, create self-entry, cache it in network cache
```
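
The four steps can be sketched in Python with the providers passed in as plain callables. All names here are hypothetical; `fetch_network_or_fallback` stands in for the network step and is expected to raise on `invalid_instance` when validation is on (§6), and the warning log in step 4 is omitted.

```python
from typing import Callable, NamedTuple, Optional, Tuple


class Entry(NamedTuple):
    preferred_network: str
    preferred_cache: str
    aliases: Tuple[str, ...]


Provider = Callable[[str], Optional[Entry]]


def self_entry(host: str) -> Entry:
    return Entry(host, host, (host,))


def get_metadata_entry(
    host: str,
    discovery_enabled: bool,
    region_provider: Provider,
    fetch_network_or_fallback: Provider,
    cache_entry: Callable[[str, Entry], None],
) -> Entry:
    # Step 1: discovery disabled -> only the region provider is consulted.
    if not discovery_enabled:
        entry = region_provider(host)
        return entry if entry is not None else self_entry(host)

    # Step 2: region discovery provider.
    entry = region_provider(host)
    if entry is not None:
        return entry

    # Step 3: network call (handles its own success and failure caching).
    entry = fetch_network_or_fallback(host)
    if entry is not None:
        return entry

    # Step 4: nothing matched -> create a self-entry and cache it.
    entry = self_entry(host)
    cache_entry(host, entry)
    return entry
```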

### 5.2 Cache-Preferring Flow (GetMetadataEntryTryAvoidNetworkAsync — used during token acquisition with cache check)

This flow tries to avoid network calls when possible:

```
1. Region discovery provider

2. [If instance discovery is disabled] → return self-entry

3. Network cache (static, populated from prior network calls)

4. Known metadata provider (hardcoded, but only if all cached environments are known)

5. Full flow (§5.1)

6. [If still null] Return self-entry
```
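
One way to express this ordering is a provider chain where the first non-null result wins. The Python sketch below uses hypothetical names throughout; each provider is a callable taking the host, with the known-environment guard assumed to live inside `known_provider`.

```python
from typing import Callable, NamedTuple, Optional, Tuple


class Entry(NamedTuple):
    preferred_network: str
    preferred_cache: str
    aliases: Tuple[str, ...]


Provider = Callable[[str], Optional[Entry]]


def self_entry(host: str) -> Entry:
    return Entry(host, host, (host,))


def get_metadata_try_avoid_network(
    host: str,
    discovery_enabled: bool,
    region_provider: Provider,
    network_cache: Provider,
    known_provider: Provider,
    full_flow: Provider,
) -> Entry:
    entry = region_provider(host)              # step 1
    if entry is not None:
        return entry
    if not discovery_enabled:                  # step 2
        return self_entry(host)
    # Steps 3-5: cheapest source first; the first non-None result wins.
    for provider in (network_cache, known_provider, full_flow):
        entry = provider(host)
        if entry is not None:
            return entry
    return self_entry(host)                    # step 6
```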

### 5.3 Offline Flow (GetMetadataEntryAvoidNetwork — used for GetAccounts/AcquireTokenSilent)

No network calls are ever made:

```
1. [If instance discovery is enabled]:
   a. Network cache
   b. Known metadata provider (with null existing environments → always usable)

2. [If still null] Return self-entry
```
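
Because both sources here are in-memory lookups, the offline flow reduces to two dictionary reads. A Python sketch with hypothetical names, assuming both the network cache and the known metadata are exposed as mappings keyed by host:

```python
from typing import Mapping, NamedTuple, Tuple


class Entry(NamedTuple):
    preferred_network: str
    preferred_cache: str
    aliases: Tuple[str, ...]


def self_entry(host: str) -> Entry:
    return Entry(host, host, (host,))


def get_metadata_avoid_network(
    host: str,
    discovery_enabled: bool,
    network_cache: Mapping[str, Entry],
    known_by_alias: Mapping[str, Entry],
) -> Entry:
    if discovery_enabled:
        # Step 1a then 1b; no guard on the known metadata in this flow.
        for source in (network_cache, known_by_alias):
            entry = source.get(host)
            if entry is not None:
                return entry
    # Step 2: nothing found, or discovery disabled.
    return self_entry(host)
```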

---

## 6. Error Handling: `invalid_instance`

An `invalid_instance` error from the discovery endpoint (the AAD server-side error `AADSTS50049`) means the authority genuinely does not exist.

| `ValidateAuthority` setting | Behavior |
|---|---|
| `true` (default) | **Throw** `MsalServiceException` with error code `invalid_instance`. Do NOT cache. |
| `false` | Return a self-entry. Continue without validation. |

---

## 7. Error Handling: All Other Errors (404, 502, network failures, etc.)

When instance discovery fails with any error other than `invalid_instance`:

1. **Try the known metadata provider** for the authority host (with empty `existingEnvironmentsInCache` to force lookup).
2. If the known provider has no entry, **create a self-entry**.
3. **Cache the result** in the network cache so that subsequent calls do NOT retry the network call.
4. Return the entry.

**Critical rule**: The fallback entry MUST be cached. Without caching, every token request retries the failing network call, causing performance degradation and unnecessary traffic. (See [issue #5804](https://github.com/AzureAD/microsoft-authentication-library-for-dotnet/issues/5804).)
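
The split between §6 and §7 can be sketched in Python as below. `InvalidInstanceError` is a hypothetical stand-in for `MsalServiceException` with error code `invalid_instance`, `fetch` and `known_lookup` are caller-supplied callables, and the cache is a plain dict. The source does not say whether the self-entry returned when `ValidateAuthority` is false gets cached; this sketch leaves it uncached.

```python
from typing import Callable, Dict, NamedTuple, Optional, Tuple


class Entry(NamedTuple):
    preferred_network: str
    preferred_cache: str
    aliases: Tuple[str, ...]


class InvalidInstanceError(Exception):
    """Hypothetical stand-in for the service error with code invalid_instance."""


def self_entry(host: str) -> Entry:
    return Entry(host, host, (host,))


def fetch_network_metadata_or_fallback(
    host: str,
    fetch: Callable[[str], Dict[str, Entry]],
    known_lookup: Callable[[str], Optional[Entry]],
    network_cache: Dict[str, Entry],
    validate_authority: bool = True,
) -> Optional[Entry]:
    try:
        entries = fetch(host)  # alias -> entry, may raise
    except InvalidInstanceError:
        if validate_authority:
            raise  # section 6: surface the error, cache nothing
        return self_entry(host)
    except Exception:
        # Section 7: known metadata first, then a self-entry.
        fallback = known_lookup(host)
        if fallback is None:
            fallback = self_entry(host)
        # Critical rule: cache the fallback so later calls skip the network.
        network_cache[host] = fallback
        return fallback
    network_cache.update(entries)
    return entries.get(host)
```

A caller that checks `network_cache` before calling this function will find the fallback on the next request and never reach `fetch` again for that host.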

---

## 8. Caching Rules

### 8.1 Network Cache

- **Scope**: Static / process-wide (shared across all app instances in the same process).
- **Key**: Authority host string (case-sensitive).
- **Population**:
  - On successful network response: cache each entry keyed by each of its aliases.
  - On network failure (non-`invalid_instance`): cache the fallback entry keyed by the authority host.
- **Eviction**: None. Entries persist for the process lifetime.
- **Thread safety**: Must be thread-safe (concurrent dictionary or equivalent).
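
In a language without a built-in concurrent dictionary, a lock around a plain dictionary is one way to meet these rules. The Python sketch below is an illustration with hypothetical names (`NetworkMetadataCache`, `NETWORK_CACHE`); keys are stored exactly as given, so lookups are case-sensitive, and nothing is ever evicted.

```python
import threading
from typing import Dict, Iterable, Optional, Tuple

# An entry here is (preferred_network, preferred_cache, aliases).
Entry = Tuple[str, str, Tuple[str, ...]]


class NetworkMetadataCache:
    """Lock-guarded host -> entry map with no eviction."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._entries: Dict[str, Entry] = {}

    def get(self, host: str) -> Optional[Entry]:
        with self._lock:
            return self._entries.get(host)

    def add(self, host: str, entry: Entry) -> None:
        """Used for fallback entries, keyed by the authority host."""
        with self._lock:
            self._entries[host] = entry

    def add_for_aliases(self, aliases: Iterable[str], entry: Entry) -> None:
        """Used after a successful response: one key per alias."""
        with self._lock:
            for alias in aliases:
                self._entries[alias] = entry


# Module-level instance, shared by every client in the process.
NETWORK_CACHE = NetworkMetadataCache()
```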

### 8.2 Known Metadata Cache

- **Scope**: Static / compiled into the library.
- **Immutable**: Never modified at runtime.
- **Guard**: Only usable when all environments in the token cache are themselves known environments.

---

## 9. Interaction with Regions

- Regional discovery (e.g., `centralus.login.microsoft.com`) runs independently of instance discovery.
- Even when instance discovery is disabled (`WithInstanceDiscovery(false)`), region discovery still runs.
- Regional metadata takes precedence when available (checked before the network call).

---

## 10. Interaction with Authority Validation

Authority validation is a separate step that runs **after** instance discovery:

1. Instance discovery resolves metadata (preferred network, aliases).
2. If `ValidateAuthority` is true AND instance discovery is enabled:
   - Validate the authority against the OIDC endpoint.
   - Cache successful validations by environment.

---

## 11. Validation Test Matrix

The following test scenarios should be implemented in any MSAL that supports instance discovery. Tests use HTTP mocking (no real network calls).

### T1: Known Cloud — Discovery Happens on Same Host

**Setup**: Authority = `https://login.microsoftonline.com/tenant` (or any known cloud host).
**Expected**:
- Instance discovery GET request goes to `https://login.microsoftonline.com/common/discovery/instance`.
- Token request goes to `https://login.microsoftonline.com/tenant/oauth2/v2.0/token`.
- Token is acquired successfully.

**Test hosts to cover**: `login.microsoftonline.com`, `login.microsoftonline.us`, `login.microsoftonline.de`, `login.partner.microsoftonline.cn`, `login.sovcloud-identity.fr`, `login.sovcloud-identity.de`, `login.sovcloud-identity.sg`.

### T2: Instance Discovery Disabled — No Network Discovery Call

**Setup**: Authority = any (known or unknown), `WithInstanceDiscovery(false)`.
**Expected**:
- No GET request to the discovery endpoint.
- Token request goes directly to the configured authority.
- Token is acquired successfully.

### T3: Unknown Cloud — Discovery Falls Back to Default Trusted Host

**Setup**: Authority = `https://unknown.host.example/tenant` (not in known list).
**Expected**:
- Instance discovery GET request goes to `https://login.microsoftonline.com/common/discovery/instance` (the default trusted host).
- If discovery succeeds: metadata is cached, token request uses the resolved preferred network.
- If discovery fails (e.g., 404): fallback entry is created and cached, token request goes to the original authority.

### T4: Discovery Failure (404/502) — Fallback Is Cached, No Retry

**Setup**: Authority = unknown host. Discovery endpoint returns 404 or 502.
**Expected**:
- First `AcquireTokenForClient`: discovery fails, fallback entry created and cached, token acquired from IdP.
- Second `AcquireTokenForClient` (same scope): token served from cache, no network calls.
- Third `AcquireTokenForClient` (different scope): token acquired from IdP, NO discovery call (fallback is cached).

**HTTP mocks (in order)**:
1. GET discovery → 404 (or 502)
2. POST token → 200 (success)
3. POST token → 200 (success) — for the different-scope call

If the SDK makes an unexpected discovery call, the mock framework should fail.
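
The shape of T4 can be sketched in Python with an ordered mock queue that fails on any unexpected call. `MockHttp`, `ToyClient`, and `run_t4` are hypothetical; `ToyClient` is a toy stand-in for the SDK under test, not an MSAL class, and it ignores response bodies.

```python
from collections import deque


class MockHttp:
    """Serves queued responses in order and fails on anything unexpected."""

    def __init__(self, expected):
        self._expected = deque(expected)  # items: (method, kind, status)

    def send(self, method, kind):
        if not self._expected:
            raise AssertionError(f"unexpected call: {method} {kind}")
        exp_method, exp_kind, status = self._expected.popleft()
        if (method, kind) != (exp_method, exp_kind):
            raise AssertionError(f"expected {exp_method} {exp_kind}, got {method} {kind}")
        return status

    def assert_all_consumed(self):
        assert not self._expected, f"unused mocks: {list(self._expected)}"


class ToyClient:
    """Toy stand-in for the SDK under test."""

    def __init__(self, http, authority_host):
        self._http = http
        self._host = authority_host
        self._metadata_cache = {}  # host -> preferred network host
        self._token_cache = {}     # scope -> token

    def acquire_token_for_client(self, scope):
        if scope in self._token_cache:
            return self._token_cache[scope]
        if self._host not in self._metadata_cache:
            self._http.send("GET", "discovery")
            # Whatever the status, cache the host itself (the fallback self-entry).
            self._metadata_cache[self._host] = self._host
        if self._http.send("POST", "token") != 200:
            raise RuntimeError("token request failed")
        token = f"token-for-{scope}"
        self._token_cache[scope] = token
        return token


def run_t4(discovery_status=404):
    http = MockHttp([
        ("GET", "discovery", discovery_status),
        ("POST", "token", 200),
        ("POST", "token", 200),
    ])
    client = ToyClient(http, "unknown.host.example")
    first = client.acquire_token_for_client("scope-a")   # discovery fails, token fetched
    second = client.acquire_token_for_client("scope-a")  # served from the token cache
    third = client.acquire_token_for_client("scope-b")   # token fetched, no discovery
    http.assert_all_consumed()
    return first, second, third
```

If the client retried discovery on the third call, the mock would see a `GET` where it expects a `POST` and raise, which is the failure mode T4 is designed to catch.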

### T5: Discovery Failure (Network Error / HttpRequestException) — Fallback, No Retry

**Setup**: Authority = unknown host. Discovery endpoint throws a network-level exception.
**Expected**: Same as T4 — fallback is cached, subsequent calls don't retry.

### T6: Discovery Failure with `invalid_instance` — Throw (ValidateAuthority=true)

**Setup**: Authority = unknown host. Discovery endpoint returns `invalid_instance` error. `ValidateAuthority` = true (default).
**Expected**:
- `MsalServiceException` is thrown with error code `invalid_instance`.
- The known metadata provider is NOT consulted as a fallback.

### T7: Discovery Failure with `invalid_instance` — Proceed (ValidateAuthority=false)

**Setup**: Authority = unknown host. Discovery endpoint returns `invalid_instance`. `ValidateAuthority` = false.
**Expected**:
- No exception thrown.
- A self-entry is returned (authority used as-is).
- Token is acquired successfully.

### T8: Known Metadata — Used When All Cache Environments Are Known

**Setup**: Authority = `login.microsoftonline.com`. Token cache contains entries only for known environments.
**Expected**:
- Known metadata provider returns the hardcoded entry.
- No network discovery call.

### T9: Known Metadata Bypassed — Unknown Environment in Cache

**Setup**: Authority = `login.microsoftonline.com`. Token cache contains entries for both known and unknown environments.
**Expected**:
- Known metadata provider is bypassed (returns null).
- Network discovery call is made.

### T10: B2C / ADFS — Instance Discovery Skipped

**Setup**: Authority = B2C or ADFS authority.
**Expected**:
- No instance discovery call.
- Self-entry is returned.

### T11: Airgapped Cloud with Regions — Discovery Disabled

**Setup**: Authority = unknown host. `WithInstanceDiscovery(false)`. `WithAzureRegion("centralus")`.
**Expected**:
- No instance discovery call.
- Region discovery still runs.
- Token request goes to the regionalized endpoint.

### T12: Airgapped Cloud with Regions — Discovery Enabled but Fails

**Setup**: Authority = unknown host. `WithAzureRegion("centralus")`. Instance discovery call throws (e.g., `HttpRequestException`).
**Expected**:
- Instance discovery failure is swallowed.
- Token request still succeeds using the regionalized endpoint.
- No retry of instance discovery on subsequent calls.

---

## 12. Implementation Checklist

- [ ] Hardcode the known cloud metadata table (§3).
- [ ] Implement the three resolution flows (§5.1, §5.2, §5.3).
- [ ] Implement discovery endpoint host selection (§4).
- [ ] Handle `invalid_instance` separately from other errors (§6 vs §7).
- [ ] Cache fallback entries on non-`invalid_instance` failures (§7 — critical).
- [ ] Use a process-wide static cache for network results (§8.1).
- [ ] Guard known metadata usage by checking all cached environments are known (§8.2).
- [ ] Support `WithInstanceDiscovery(false)` to disable network discovery.
- [ ] Ensure region discovery is independent of instance discovery toggle (§9).
- [ ] Self-entry: `preferred_network = preferred_cache = authority_host`, `aliases = [authority_host]` (§2).
- [ ] Implement all tests T1–T12 in §11.

src/client/Microsoft.Identity.Client/Instance/Discovery/InstanceDiscoveryManager.cs

Lines changed: 8 additions & 4 deletions
@@ -205,12 +205,16 @@ private async Task<InstanceDiscoveryMetadataEntry> FetchNetworkMetadataOrFallbac
                 catch (Exception e)
                 {
                     requestContext.Logger.Warning(
-                        $"[Instance Discovery] Instance Discovery failed. MSAL will continue without network instance metadata. \n\r" +
+                        $"[Instance Discovery] Instance Discovery failed. MSAL will continue without instance metadata. \n\r" +
                         $" Exception: {e} ");
-
-                    return
+
+                    var fallbackEntry =
                         _knownMetadataProvider.GetMetadata(authorityUri.Host, Enumerable.Empty<string>(), requestContext.Logger)
-                            ?? CreateEntryForSingleAuthority(authorityUri);
+                        ?? CreateEntryForSingleAuthority(authorityUri);
+
+                    _networkCacheMetadataProvider.AddMetadata(authorityUri.Host, fallbackEntry);
+
+                    return fallbackEntry;
                 }
             }

