Commit 36df374
[ci] Retry on-device install, fail fast, diagnose failures, always capture logcat (#11831)
## Description
The on-device package-test template `apk-instrumentation.yaml` builds and installs the app with `-t:Install` and then runs `dotnet test`. When the emulator dropped off ADB mid-install (`error XAGCPU7000: Mono.AndroidTools.AdbException: device offline`), the previous behavior was doubly bad:
1. The install ran with `continueOnError: true`, so the failure was swallowed and the pipeline still ran `dotnet test` against a device where the app was **never installed**, producing a misleading `INSTRUMENTATION_FAILED` and only going red at the final `fail if any issues occurred` gate — after wasting time.
2. A single transient ADB blip failed the whole lane with no attempt to recover, and we captured nothing to explain *why* the install failed.
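The failure mode described above can be sketched as a plain Azure Pipelines step sequence. This is an illustration only, not the contents of `apk-instrumentation.yaml`: the project and test paths are hypothetical, and the final gate's `Agent.JobStatus` condition is an assumption about how such a gate is typically written.

```yaml
steps:
# Old shape (sketch): an install failure is downgraded to a warning.
- script: dotnet build path/to/App.csproj -t:Install   # hypothetical project path
  displayName: install (old behavior, sketch)
  continueOnError: true    # a failure marks the job SucceededWithIssues and continues

# succeeded() is still true for SucceededWithIssues, so the tests run
# against a device that has no app installed.
- script: dotnet test path/to/Tests.dll                # hypothetical test assembly
  displayName: run (old behavior, sketch)

# Assumed shape of the final gate: the accumulated issue only becomes
# a red job here, after the test step has already run.
- script: exit 1
  displayName: fail if any issues occurred
  condition: eq(variables['Agent.JobStatus'], 'SucceededWithIssues')
```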
## Changes
- **Retry the install** via the `run-dotnet-preview` template's existing `retryCountOnTaskFailure: 3`, so a transient `device offline` blip can recover on a retry instead of failing the lane.
- **Fail fast** (`continueOnError: false`) once retries are exhausted: the lane fails immediately and the `run` step (whose `condition` defaults to `succeeded()`) is skipped — no point running tests against a device with no app installed.
- **Diagnose install failures**: a new `failed()`-gated, time-bounded step right after install snapshots device state so we can *classify* the failure next time instead of guessing — connectivity (`adb devices -l` / `get-state`), disk pressure (`df`, `dumpsys diskstats`), storage-service readiness (`dumpsys storaged` — the `StorageStatsManager` NPE seen during `install-create`), boot completion, and accumulated test apps (`pm list packages -3`).
- **Never lose logcat**: the `capture logcat` step's condition is changed from the default `succeeded()` to `always()`, so the best-effort `adb logcat -d` runs on success, on a failed step (e.g. fail-fast install), **and** on job cancellation/timeout (e.g. a hung test). The capture is best-effort (`continueOnError: true` + `|| echo`) and tolerates an offline device. This matches the step's "Always capture full device logcat" intent and addresses review feedback.
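A minimal sketch of the first three changes, written as plain `script` steps rather than through the `run-dotnet-preview` template. The project path, display names, the 5-minute bound, and the use of `getprop sys.boot_completed` for the boot-completion check are assumptions made for illustration; the actual step bodies in the commit may differ.

```yaml
steps:
# Retry, then fail fast: the step is re-run on failure and, once the
# retries are used up, the job fails at this step.
- script: dotnet build path/to/App.csproj -t:Install   # hypothetical project path
  displayName: install (sketch)
  retryCountOnTaskFailure: 3
  continueOnError: false

# Diagnose: runs only when an earlier step failed. Every probe is
# best-effort so an offline device cannot fail the diagnostics themselves.
- script: |
    adb devices -l                        || echo "adb devices failed"
    adb get-state                         || echo "adb get-state failed"
    adb shell df                          || echo "df failed"
    adb shell dumpsys diskstats           || echo "dumpsys diskstats failed"
    adb shell dumpsys storaged            || echo "dumpsys storaged failed"
    adb shell getprop sys.boot_completed  || echo "getprop failed"
    adb shell pm list packages -3         || echo "pm list packages failed"
  displayName: diagnose install failure (sketch)
  condition: failed()
  timeoutInMinutes: 5       # assumed bound; keeps a wedged adb from stalling the job
  continueOnError: true
```

Keeping each probe on its own line with its own fallback means one hung or failing command does not hide the output of the others.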
## Why `always()` (logcat) and not just removing the condition
Omitting `condition:` is **not** neutral — an Azure Pipelines step with no condition defaults to `succeeded()`, i.e. it is skipped as soon as any prior step fails. That is exactly the old, buggy behavior. `succeededOrFailed()` fixes the failed-step case but still skips on cancellation (a job-level timeout counts as cancellation), which is precisely when a hung test's logcat is most valuable — so `always()` is used.
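The three options can be compared on a single step. This is a sketch under stated assumptions: the output path under `$(Build.ArtifactStagingDirectory)` and the display name are illustrative, not taken from the template.

```yaml
steps:
- script: adb logcat -d > "$(Build.ArtifactStagingDirectory)/logcat.txt" || echo "logcat capture failed"
  displayName: capture logcat (sketch)
  continueOnError: true
  # No condition at all   -> implicit succeeded(): skipped after any failed step.
  # succeededOrFailed()   -> runs after a failed step, skipped when the job is canceled.
  # always()              -> runs after success, failure, and cancellation.
  condition: always()
```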
## Behavior change
- A genuinely unrecoverable install failure now fails the job at the install step (after retries) instead of the final gate.
- Because the flavors (Debug / aab / NoAab / CoreCLR / …) share one job and downstream steps are gated on `succeeded()`, a hard install failure in one flavor will now skip the remaining flavors in that job. This is acceptable: a device offline across all retries is very unlikely to install the next flavor either, and fail-fast gives a clear signal.
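The skip behavior follows directly from the implicit `succeeded()` condition on every later step in the job. A rough sketch, in which the helper scripts and flavor names used as arguments are hypothetical placeholders for the real templated steps:

```yaml
steps:
- script: ./install-flavor.sh Debug     # hypothetical helper
  displayName: install Debug
  retryCountOnTaskFailure: 3
- script: ./run-tests.sh Debug          # hypothetical helper
  displayName: run Debug                # implicit succeeded()

# If "install Debug" fails after its retries, both steps below are skipped,
# because their implicit condition is succeeded().
- script: ./install-flavor.sh CoreCLR
  displayName: install CoreCLR
  retryCountOnTaskFailure: 3
- script: ./run-tests.sh CoreCLR
  displayName: run CoreCLR

# Only steps that opt in with always() still run after the hard failure.
- script: adb logcat -d || echo "logcat capture failed"
  displayName: capture logcat
  condition: always()
  continueOnError: true
```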
## Context
Tracking issue for the underlying emulator/ADB install flakiness (`device offline` and the `StorageStatsManager` NPE): #11830
Observed on build [1488505](https://dev.azure.com/dnceng-public/public/_build/results?buildId=1488505) (`Package Tests > macOS > Tests > APKs 1`).
A companion PR adds an equivalent device-state snapshot to the `DeviceTest` on-failure teardown for the MSBuildDeviceIntegration `DeployToDevice`/`InstallAndRun` tests.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

1 parent 4f7efd5 commit 36df374
1 file changed
Lines changed: 38 additions & 2 deletions
| Original file lines | Diff lines | Change |
|---|---|---|
| none (after 15) | 16-22 | 7 lines added |
| 27 | 34-57 | 1 line replaced by 24 lines |
| none (after 65) | 96-101 | 6 lines added |
| 72 | 108 | 1 line replaced by 1 line |