Fix template engine test parallelism issue with GetVisualStudioInstances #55045
…M API

The VS Setup Configuration COM API (ISetupConfiguration2) has a known concurrency bug that causes failures (exit code 57005/0xDEAD) when multiple processes enumerate VS instances simultaneously. This hits template engine integration tests that run dotnet new in parallel.

The previous fix (PR #44930) added an in-process lock (s_guard) but that doesn't help when multiple test processes call the API concurrently.

This fix adds:
- A named system mutex (Global\DotNetSdk_VSSetupConfiguration) to serialize cross-process access to the COM API
- Retry logic with exponential backoff (3 attempts) as a safety net for cases where external processes also call the API without our mutex

Fixes #44878

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

@joeloff - are you familiar with this space? Looking for an SME to validate this is a reasonable approach. This is being hit consistently in the SDK tests.
Pull request overview
This PR mitigates intermittent failures in template engine / dotnet new integration tests caused by concurrent enumeration of Visual Studio instances via the VS Setup Configuration COM API (ISetupConfiguration2::EnumInstances). It does so by serializing access across processes and adding a retry safety net around the COM enumeration.
Changes:
- Add a named system-wide mutex (`Global\DotNetSdk_VSSetupConfiguration`) to serialize COM access across concurrently running `dotnet` processes.
- Add retry logic with exponential backoff around the Visual Studio instance enumeration call path.
- Refactor the existing workload enumeration into a new `GetInstalledWorkloadsCore` helper so it can be invoked within the mutex/retry wrapper.
baronfel left a comment
This looks reasonable - nitpick idea: any care for jitter to prevent repeated collisions?
9dd6c43 to 5b85111
- Remove unused s_guard field (would cause CS0414 warning/build failure)
- Add random jitter (0-50ms) to retry delays to prevent thundering-herd collisions between processes retrying simultaneously
- Clarify MaxRetryAttempts doc to distinguish retries from total attempts

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
5b85111 to a4882ff
Great feedback - Copilot addressed it by adding a random offset.
Summary
The VS Setup Configuration COM API (`ISetupConfiguration2`) has a known concurrency bug that causes failures (exit code 57005/0xDEAD) when multiple processes enumerate VS instances simultaneously. This intermittently hits template engine integration tests that run `dotnet new` in parallel on CI.

Root Cause

When multiple `dotnet new` processes run concurrently (as happens in parallel test execution), each process calls `VisualStudioWorkloads.GetInstalledWorkloads` → COM `ISetupConfiguration2::EnumInstances()`. The COM API has an unintended concurrent access issue in a critical section that wasn't locked (tracked internally at https://dev.azure.com/devdiv/DevDiv/_workitems/edit/2241752).

The previous fix (PR #44930) added an in-process lock (`s_guard`), but that doesn't protect against cross-process concurrent access.

Fix
This PR adds two layers of protection:
- Named system mutex (`Global\DotNetSdk_VSSetupConfiguration`): serializes cross-process access to the COM API. All `dotnet` processes on the same machine will take turns calling the VS enumeration API.
- Retry with exponential backoff (3 attempts, 100ms/200ms/400ms delays): safety net for cases where external processes (not using our mutex) also call the API concurrently.
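The retry timing described above, combined with the 0-50ms jitter added in the follow-up commit, can be sketched roughly as follows. This is a language-neutral illustration in Python, not the C# code in this PR. The names `retry_delay_ms` and `call_with_retry` are hypothetical, and the sketch assumes a 100ms base delay that doubles on each retry, with three retries after the initial attempt.

```python
import random
import time

BASE_DELAY_MS = 100  # assumed from the 100ms/200ms/400ms delays in the description
MAX_RETRIES = 3      # assumed: retries after the first attempt, not total attempts
MAX_JITTER_MS = 50   # from the 0-50ms jitter mentioned in the follow-up commit


def retry_delay_ms(retry_index, rng=random):
    """Delay before retry `retry_index` (0-based): base * 2^n plus random jitter."""
    return BASE_DELAY_MS * (2 ** retry_index) + rng.uniform(0, MAX_JITTER_MS)


def call_with_retry(operation, sleep=time.sleep):
    """Run `operation`; on failure, wait and retry. Re-raise after the last retry."""
    for attempt in range(MAX_RETRIES + 1):
        try:
            return operation()
        except Exception:
            if attempt == MAX_RETRIES:
                raise
            # Jitter spreads out processes that failed at the same moment,
            # so they are less likely to collide again on the next try.
            sleep(retry_delay_ms(attempt) / 1000.0)
```

In the design described here, the retried operation would be the VS instance enumeration, and the named mutex would normally make retries unnecessary. The retry only matters when a process that does not take the mutex calls the COM API at the same time.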
Testing
The fix was verified to compile successfully. The intermittent nature of this issue means it can only be validated by observing reduced failure rates on CI over time.
Fixes #44878