Skip to content

Detect orphaned KG certs via public key modulus comparison#6020

Open
Robbie-Microsoft wants to merge 5 commits into
mainfrom
rginsburg/mtls_restart_orphan_check
Open

Detect orphaned KG certs via public key modulus comparison#6020
Robbie-Microsoft wants to merge 5 commits into
mainfrom
rginsburg/mtls_restart_orphan_check

Conversation

@Robbie-Microsoft
Copy link
Copy Markdown
Contributor

@Robbie-Microsoft Robbie-Microsoft commented May 20, 2026

Problem

After a VM or app restart, KeyGuard regenerates the per-boot private key in the same CNG container while the old cert remains on disk. The cert appears valid (time-wise, HasPrivateKey=true) but the key material has changed. MSAL loads it from the persistent store, attempts an mTLS handshake, and fails.

Fix

Compare the cert's embedded public key (modulus) against the public key currently in the CNG container. A mismatch conclusively means the container was regenerated — the cert is orphaned. The check runs inside WindowsPersistentCertificateCache.Read during candidate selection.

Orphaned certs are removed from the store on detection (not just skipped) so they don't accumulate and don't require the modulus check on every subsequent Read.

Design notes

  • Non-CNG keys (software CSP) are accepted on faith — KG regeneration does not apply
  • GetRSAPrivateKey() returning null for an RSA cert is treated as orphaned (inaccessible = unusable); non-RSA certs (ECDSA, etc.) are accepted on faith
  • CryptographicException during key load is treated as orphaned (inaccessible = unusable)
  • The CNG modified-time heuristic (Check 3 from Dragos's proposal) is intentionally omitted — the modulus comparison is definitive and covers the same case without the false-negative window
  • The internal Read overload accepts an injectable isOrphaned delegate for testability; IPersistentCertificateCache interface is unchanged

Testing

6 new unit tests in PersistentCertificateStoreUnitTests.cs:

  • IsCertKeyOrphaned_ReturnsTrue_For_NullCert
  • IsCertKeyOrphaned_ReturnsFalse_For_ValidCert
  • IsCertKeyOrphaned_ReturnsTrue_For_RsaCert_WithNoPrivateKey
  • PublicKeyMatchesCert_ReturnsTrue_When_KeyMatchesCert
  • PublicKeyMatchesCert_ReturnsFalse_When_ModulusMismatch (passes a different RSACng to simulate post-reboot key replacement)
  • Read_RemovesOrphanedCert_FromStore (injects always-true orphan check, verifies cert is deleted from store and Read returns false)

Companion to #6017 (token_not_after OID eviction — different failure mode).

After a VM or app restart, KeyGuard regenerates the per-boot private key in
the same CNG container while the old cert remains on disk. The cert appears
valid (time-wise, HasPrivateKey=true) but the key material has changed.

Fix: compare the cert's embedded public key modulus against the public key
currently exported from the CNG container. A mismatch conclusively means the
container was regenerated and the cert is orphaned. The check runs inside
WindowsPersistentCertificateCache.Read during candidate selection, so the
loop continues looking for a valid cert rather than returning an unusable one.

Non-CNG keys (software CSP) are accepted on faith since KeyGuard-style
regeneration does not apply to them. CryptographicException during key load
is treated as orphaned (key inaccessible = unusable).

The CNG container modified-time heuristic (Check 3 from the original proposal)
is intentionally omitted. Both it and the modulus comparison detect the same
event, but the modulus check is definitive: independently generated RSA keys
sharing a modulus is computationally infeasible. The modified-time check is a
heuristic with a known false-negative window and adds no additional coverage.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings May 20, 2026 22:27
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a Windows KeyGuard-specific “orphaned cert” detection to MSAL’s persistent mTLS certificate selection logic by validating that the certificate’s embedded RSA public key matches the current RSA public key in the associated CNG container. This prevents MSAL from reusing a persisted cert whose key container was regenerated after reboot, avoiding repeated SCHANNEL handshake failures.

Changes:

  • Add modulus/exponent comparison helpers (IsCertKeyOrphaned, PublicKeyMatchesCert) to detect CNG-container/key mismatches.
  • Update WindowsPersistentCertificateCache.Read to skip orphaned candidates and continue searching for a usable cert.
  • Add unit tests covering the new helper methods.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

File Description
tests/Microsoft.Identity.Test.Unit/ManagedIdentityTests/PersistentCertificateStoreUnitTests.cs Adds tests for orphan detection and RSA public key matching logic.
src/client/Microsoft.Identity.Client/ManagedIdentity/V2/WindowsPersistentCertificateCache.cs Skips persisted cert candidates whose CNG container key no longer matches the cert’s embedded public key.
src/client/Microsoft.Identity.Client/ManagedIdentity/V2/MtlsCertificateCache.cs Introduces orphan-detection helpers based on RSA modulus/exponent comparison with logging on crypto failures.
Comments suppressed due to low confidence (1)

tests/Microsoft.Identity.Test.Unit/ManagedIdentityTests/PersistentCertificateStoreUnitTests.cs:912

  • Same issue as the KeyMatches test: on net48, RSA.Create(2048) commonly returns RSACryptoServiceProvider, so key2 as RSACng will be null and the test will fail. Create key2 as an RSACng explicitly (or gate the test on key2 is RSACng) to keep the multi-targeted test suite passing.
            using var key1 = RSA.Create(2048);
            using var key2 = RSA.Create(2048);
            var rsaCng2 = key2 as RSACng;
            Assert.IsNotNull(rsaCng2, "Expected RSACng on Windows.");

RSA.Create(2048) does not return RSACng on ARM64 Windows (.NET 8).
Use new RSACng(2048) explicitly so the cast is always valid.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@Robbie-Microsoft Robbie-Microsoft marked this pull request as ready for review May 20, 2026 22:43
@Robbie-Microsoft Robbie-Microsoft requested a review from a team as a code owner May 20, 2026 22:43
Copilot AI review requested due to automatic review settings May 20, 2026 22:43
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

Robbie-Microsoft and others added 2 commits May 20, 2026 18:57
When IsCertKeyOrphaned detects a cert whose CNG container key no longer
matches the cert's public key (post-reboot KeyGuard regeneration), the
cert is now removed from the Windows cert store instead of just skipped.

Previously the orphaned cert stayed in the store indefinitely, causing
a redundant modulus check on every Read call.

Changes:
- WindowsPersistentCertificateCache.Read: collect orphaned cert thumbprints
  during the scan loop; after the loop, open ReadWrite store and remove them
  (best-effort, wrapped in try/catch)
- Added internal Read overload accepting Func<X509Certificate2, ILoggerAdapter, bool>
  isOrphaned delegate for testability; IPersistentCertificateCache interface unchanged
- Added unit test: Read_RemovesOrphanedCert_FromStore

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
GetRSAPrivateKey() returns null for both non-RSA certs (ECDSA, DSA) and
for RSA certs with an inaccessible private key. Previously both fell into
the 'accept on faith' branch, which could allow a useless RSA cert to pass
the orphan check.

Fix: check GetRSAPublicKey() to distinguish RSA certs from non-RSA certs.
An RSA cert with no accessible private key is treated as orphaned.
Non-RSA certs continue to be accepted on faith (KG regeneration does not apply).

Add unit test: IsCertKeyOrphaned_ReturnsTrue_For_RsaCert_WithNoPrivateKey

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings May 20, 2026 23:09
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

Comments suppressed due to low confidence (1)

src/client/Microsoft.Identity.Client/ManagedIdentity/V2/WindowsPersistentCertificateCache.cs:190

  • RemoveByThumbprints uses List<string> + thumbprints.Contains(...) inside a loop over all certificates, making removal O(storeSize * orphanCount). Since this can run on the read path after a reboot, consider using a HashSet<string> (e.g., OrdinalIgnoreCase) for thumbprints (and/or X509Certificate2Collection.Find by thumbprint) to keep the removal cost predictable.
                int removed = 0;
                foreach (var cert in items)
                {
                    try
                    {
                        if (thumbprints.Contains(cert.Thumbprint))
                        {
                            store.Remove(cert);
                            removed++;

…ock orphan removal

- Rename internal Read overload to public TryRead with out param last
  (public Read delegates to TryRead with the default IsCertKeyOrphaned check)
- Extract SnapshotCertificates(store, logger) helper to eliminate 5 duplicate
  try/catch CopyTo fallback blocks across Read, Write, DeleteAllForAlias,
  PruneExpiredForAlias, and RemoveByThumbprints
- Wrap RemoveByThumbprints in InterprocessLock.TryWithAliasLock (300ms,
  best-effort) — same pattern as Write/Delete/DeleteAllForAlias to prevent
  concurrent store writes during orphan cleanup
- Update orphan removal test to call TryRead with the new signature

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants