
chore: Mitigate iOS device test timeouts#5063

Closed
jamescrosswell wants to merge 4 commits into main from ios-devicetests-timeout

@jamescrosswell (Collaborator) commented Mar 25, 2026

Problem

The iOS device tests workflow is frequently timing out and failing in CI. ... e.g. this run:

 Installing application 'Sentry.Maui.Device.TestApp' on 'iPhone 17'
dbug: Installing '/Users/runner/work/sentry-dotnet/sentry-dotnet/test/Sentry.Maui.Device.TestApp/bin/Release/net10.0-ios/iossimulator-arm64/Sentry.Maui.Device.TestApp.app' to 'iPhone 17' (38.71 MB)
dbug: 
dbug: Running /Users/runner/.dotnet/tools/.store/microsoft.dotnet.xharness.cli/11.0.0-prerelease.26117.1/microsoft.dotnet.xharness.cli/11.0.0-prerelease.26117.1/tools/net10.0/any/../../../runtimes/any/native/mlaunch/bin/mlaunch
dbug: Using Xcode 26.2 found in /Applications/Xcode_26.2.app
dbug: xcrun simctl list --json --json-output /var/folders/t5/f77_gwnj6p95qxy9py3fckx00000gn/T/tmpwuEbrW.tmp
dbug: Xamarin.Hosting: No need to boot (already booted): iPhone 17
dbug: Xamarin.Hosting: Installing on iPhone 17 (E9565FE2-ACE0-4805-8655-C0AE78642343) by executing 'xcrun simctl install E9565FE2-ACE0-4805-8655-C0AE78642343 /Users/runner/work/sentry-dotnet/sentry-dotnet/test/Sentry.Maui.Device.TestApp/bin/Release/net10.0-ios/iossimulator-arm64/Sentry.Maui.Device.TestApp.app'
fail: Cancelling the run after 600 seconds as application failed to launch in time
dbug: Killing process 73803 as it was cancelled
dbug: Process mlaunch exited with 137
dbug: Process 57725 already exited or busy: No process is associated with this object.
dbug: Killing process 57725 as it was cancelled
dbug: System.OperationCanceledException: The operation was canceled.
         at System.Threading.CancellationToken.ThrowOperationCanceledException()
         at System.Threading.CancellationToken.ThrowIfCancellationRequested()
         at Microsoft.DotNet.XHarness.Apple.BaseOrchestrator.InstallApp(AppBundleInformation appBundleInfo, IDevice device, TestTargetOs target, CancellationToken cancellationToken) in /_/src/Microsoft.DotNet.XHarness.Apple/Orchestration/BaseOrchestrator.cs:line 393
         at Microsoft.DotNet.XHarness.Apple.BaseOrchestrator.OrchestrateOperationInternal(TestTargetOs target, String deviceName, Boolean includeWirelessDevices, Boolean resetSimulator, Boolean enableLldb, GetAppBundleInfoFunc getAppBundle, ExecuteMacCatalystAppFunc executeMacCatalystApp, ExecuteAppFunc executeApp, CancellationToken cancellationToken) in /_/src/Microsoft.DotNet.XHarness.Apple/Orchestration/BaseOrchestrator.cs:line 291
         at Microsoft.DotNet.XHarness.Apple.BaseOrchestrator.OrchestrateOperation(TestTargetOs target, String deviceName, Boolean includeWirelessDevices, Boolean resetSimulator, Boolean enableLldb, GetAppBundleInfoFunc getAppBundle, ExecuteMacCatalystAppFunc executeMacCatalystApp, ExecuteAppFunc executeApp, CancellationToken cancellationToken) in /_/src/Microsoft.DotNet.XHarness.Apple/Orchestration/BaseOrchestrator.cs:line 96
XHarness exit code: 90 (APP_LAUNCH_TIMEOUT)
Exception: /Users/runner/work/sentry-dotnet/sentry-dotnet/scripts/device-test.ps1:106
Line |
 106 |                  throw 'xharness run failed with non-zero exit code'
     |                  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
     | xharness run failed with non-zero exit code
Error: Process completed with exit code 1.

After one or two manual reruns, it usually succeeds.

Attempted solution

This PR attempts to mitigate this by doing a couple of things:

  1. Pre-boot the simulator with xcrun simctl bootstatus -b, which waits until the simulator is truly operational (not just in the "Booted" state). The logs show the simulator as already "booted", yet xcrun simctl install still hangs, so the simulator may have been only partially initialized when these tests ran previously.
  2. Move the retry logic (with a simulator reset between attempts) from YAML into device-test.ps1, so the YAML needs just one step instead of a continue-on-error + duplicate retry pattern.

The retry count has also been increased from 2 to 3. To account for this, and for the fact that the retries are now consolidated into the ps1 script, the timeout has been increased from 40 minutes (for an individual attempt) to 120 minutes (for all 3 attempts combined). That's a massive timeout, but previously, if nobody was monitoring the CI runs, it could be as much as 24 hours before someone manually reran the iOS device tests (something I've been doing personally many times a day across multiple branches, with no small amount of swearing accompanying the chore).
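
For readers who want to picture the retry behaviour described above, here is a minimal sketch. It is not the code from this PR: Invoke-WithRetry and its parameters are hypothetical names, and the action is assumed to emit nothing but an exit code. It runs an action up to three times and invokes a reset step between failed attempts.

```powershell
# Hypothetical helper, not taken from device-test.ps1.
function Invoke-WithRetry {
    param(
        # Script block that runs one attempt and emits only its exit code.
        [Parameter(Mandatory)] [scriptblock] $Action,
        # Script block run between failed attempts, e.g. a simulator reset.
        [scriptblock] $BetweenAttempts = {},
        [int] $MaxAttempts = 3
    )

    for ($attempt = 1; $attempt -le $MaxAttempts; $attempt++) {
        $exitCode = & $Action
        if ($exitCode -eq 0) {
            # Report which attempt succeeded.
            return $attempt
        }

        Write-Host "Attempt $attempt of $MaxAttempts failed with exit code $exitCode"

        # Only reset when another attempt will follow.
        if ($attempt -lt $MaxAttempts) {
            & $BetweenAttempts | Out-Null
        }
    }

    throw "All $MaxAttempts attempts failed"
}

# Usage sketch (macOS only, not executed here). Invoke-DeviceTests and
# Reset-Simulator are placeholders for whatever the script actually calls:
# xcrun simctl bootstatus $udid -b   # boot if needed and wait until ready
# Invoke-WithRetry -Action { Invoke-DeviceTests } -BetweenAttempts { Reset-Simulator }
```

Keeping the loop in one function means the overall timeout has to cover every attempt, which is why a single 120 minute budget replaces the per-attempt 40 minutes.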

In any case - 🤞🏻

#skip-changelog

@jamescrosswell jamescrosswell changed the title Mitigate iOS device test timeouts chore: Mitigate iOS device test timeouts Mar 25, 2026

github-actions Bot commented Mar 25, 2026

Semver Impact of This PR

None (no version bump detected)

📋 Changelog Preview


This PR will not appear in the changelog.



codecov Bot commented Mar 25, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 74.00%. Comparing base (bcc6476) to head (ffbfcd0).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #5063      +/-   ##
==========================================
- Coverage   74.01%   74.00%   -0.01%     
==========================================
  Files         499      499              
  Lines       18065    18065              
  Branches     3518     3518              
==========================================
- Hits        13370    13369       -1     
- Misses       3836     3841       +5     
+ Partials      859      855       -4     


Comment on lines +157 to +162
return [PSCustomObject]@{
    Udid = $selected.Device.udid
    Name = $selected.Device.name
    DeviceType = $selected.Device.deviceTypeIdentifier
    Runtime = $runtimeKey
}

Breaking change: Get-IosSimulatorUdid return type change breaks ios.Tests.ps1

The function Get-IosSimulatorUdid now returns a PSCustomObject instead of a string UDID. However, integration-test/ios.Tests.ps1 calls this function at line 11 and uses $simulator directly with xcrun simctl install, xcrun simctl uninstall, xcrun simctl spawn, and xcrun simctl launch commands which expect a string UDID. This will cause runtime failures in the iOS integration tests because PowerShell will coerce the object to a string representation instead of the actual UDID value.

Verification

Read scripts/device-test-utils.ps1 to see the return type change from string to PSCustomObject. Read integration-test/ios.Tests.ps1 which calls Get-IosSimulatorUdid at line 11 and uses $simulator directly at lines 43, 53-54, 68-69, 70-74 with xcrun simctl commands. Confirmed ios.Tests.ps1 is not in the list of files modified in this PR, meaning it won't be updated to handle the new return type.

Identified by Warden code-review · 5YQ-FP3
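
To illustrate the coercion problem this comment describes, here is a small sketch. The object below is a hypothetical stand-in for what the changed function returns, with values copied from the log in the PR description; $appPath in the trailing comment is also a placeholder.

```powershell
# Hypothetical stand-in for the object returned by the changed function.
$simulator = [PSCustomObject]@{
    Udid = 'E9565FE2-ACE0-4805-8655-C0AE78642343'
    Name = 'iPhone 17'
}

# Converting the whole object to a string renders all of its properties,
# not the UDID on its own, so it is useless as a simctl device argument.
$coerced = "$simulator"
Write-Host "Whole object as string: $coerced"

# Callers that need the UDID have to read the property explicitly.
$udid = $simulator.Udid
Write-Host "UDID property:          $udid"

# On macOS a fixed call site would then look like this (not executed here):
# xcrun simctl install $simulator.Udid $appPath
```

Under that assumption, a call site such as the one in ios.Tests.ps1 would need to switch from the bare variable to its Udid property, or the function would need to keep returning a plain string.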

Start-Sleep -Seconds 5 # give the daemon time to re-initialise

# Create a brand-new simulator with the same device type & runtime
$newUdid = (& xcrun simctl create $SimInfo.Name $SimInfo.DeviceType $SimInfo.Runtime).Trim()

Missing error check on xcrun simctl create can cause cascading failures with invalid UDID

Line 23 captures the output of xcrun simctl create but does not check $LASTEXITCODE. If the create command fails (e.g., invalid DeviceType/Runtime, or resource exhaustion), the error message text becomes the $newUdid value. Subsequent operations at lines 27-28 will attempt to boot this garbage string as a UDID, and line 32 returns it to the caller, causing hard-to-diagnose CI failures.

Verification

Read Reset-IosSimulator function completely. Traced flow from xcrun simctl create at line 23 through simctl boot at line 27 and return at line 32. Confirmed no $LASTEXITCODE check or try/catch around the create operation. Compared with other functions in the file which also lack such checks.

Identified by Warden find-bugs · 4YX-2ML
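
One possible shape for the missing check is sketched below. Invoke-NativeChecked is a hypothetical helper, not something in this PR or in device-test-utils.ps1. It runs a native command, inspects $LASTEXITCODE, and throws instead of handing error text back to the caller.

```powershell
# Hypothetical helper, not part of the PR.
function Invoke-NativeChecked {
    param(
        # Script block that invokes exactly one native command.
        [Parameter(Mandatory)] [scriptblock] $Command,
        [string] $Description = 'native command'
    )

    # Clear any stale exit code left behind by an earlier command.
    $global:LASTEXITCODE = 0
    $output = & $Command

    if ($LASTEXITCODE -ne 0) {
        throw "$Description failed with exit code $LASTEXITCODE. Output: $output"
    }

    # Join multi-line output and trim the trailing newline.
    return ($output | Out-String).Trim()
}

# Usage sketch (macOS only, not executed here), mirroring the reviewed line:
# $newUdid = Invoke-NativeChecked -Description 'simctl create' -Command {
#     xcrun simctl create $SimInfo.Name $SimInfo.DeviceType $SimInfo.Runtime
# }
# $parsed = [guid]::Empty
# if (-not [guid]::TryParse($newUdid, [ref] $parsed)) {
#     throw "Unexpected simctl create output: $newUdid"
# }
```

The extra GUID parse in the usage comment is an optional second guard, on the assumption that a simulator UDID always has GUID form, as the one in the log above does.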

@jamescrosswell (Collaborator, Author)

Admitting defeat...
