Commit a1fc916
Additional diagnostics for DML failure path (#28495)
### Description
In DmlGraphFusionHelper::ExecuteReusableCommandList, after
ExecuteCommandList fails:
* Broaden the failure branch from just DXGI_ERROR_DEVICE_REMOVED to also
catch DEVICE_HUNG, DEVICE_RESET, and
DRIVER_INTERNAL_ERROR.
* Query GetDeviceRemovedReason on both the DML and D3D12 devices
(matching the pattern in DmlCommandRecorder.cpp).
* Throw via ORT_THROW_HR_MSG with a clear message that names the failure
as a TDR / device-removal event and includes all three HRESULTs for
triage. The existing DEVICE_REMOVED path still throws the same HRESULT
as before.
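
The branch described above can be sketched as follows. This is a minimal, platform-independent illustration and not the code from this commit: `HResult`, `MockDevice`, `HResultError`, `Hex`, and `ThrowIfExecuteFailed` are hypothetical stand-ins for `HRESULT`, the DML/D3D12 device interfaces, and `ORT_THROW_HR_MSG`. The sketch assumes that "all three HRESULTs" means the ExecuteCommandList result plus the two device-removed reasons, and it always throws the ExecuteCommandList HRESULT, which may differ from what the real DEVICE_REMOVED path throws. Only the four DXGI error values are taken from `winerror.h`.

```cpp
#include <cstdint>
#include <cstdio>
#include <stdexcept>
#include <string>

// Stand-in for HRESULT so the sketch builds without Windows SDK headers.
using HResult = std::uint32_t;

// Values as defined in winerror.h.
constexpr HResult kDeviceRemoved       = 0x887A0005u;  // DXGI_ERROR_DEVICE_REMOVED
constexpr HResult kDeviceHung          = 0x887A0006u;  // DXGI_ERROR_DEVICE_HUNG
constexpr HResult kDeviceReset         = 0x887A0007u;  // DXGI_ERROR_DEVICE_RESET
constexpr HResult kDriverInternalError = 0x887A0020u;  // DXGI_ERROR_DRIVER_INTERNAL_ERROR

// Mirrors the FAILED() macro: the top bit marks a failure code.
inline bool Failed(HResult hr) { return (hr & 0x80000000u) != 0; }

// True for the four HRESULTs the broadened branch treats as device loss.
inline bool IsDeviceLost(HResult hr) {
    return hr == kDeviceRemoved || hr == kDeviceHung ||
           hr == kDeviceReset || hr == kDriverInternalError;
}

// Hypothetical mock exposing the one method the sketch needs from a device.
struct MockDevice {
    HResult removed_reason;
    HResult GetDeviceRemovedReason() const { return removed_reason; }
};

// Hypothetical exception type that carries the HRESULT next to the message.
struct HResultError : std::runtime_error {
    HResult hr;
    HResultError(HResult h, const std::string& msg) : std::runtime_error(msg), hr(h) {}
};

inline std::string Hex(HResult hr) {
    char buf[11];  // "0x" + 8 hex digits + NUL
    std::snprintf(buf, sizeof(buf), "0x%08X", static_cast<unsigned>(hr));
    return std::string(buf);
}

// Called with the result of executing the command list.
inline void ThrowIfExecuteFailed(HResult execute_hr,
                                 const MockDevice& dml_device,
                                 const MockDevice& d3d12_device) {
    if (!Failed(execute_hr)) return;
    if (IsDeviceLost(execute_hr)) {
        // Ask both devices why they were removed and report everything together.
        const HResult dml_reason = dml_device.GetDeviceRemovedReason();
        const HResult d3d12_reason = d3d12_device.GetDeviceRemovedReason();
        throw HResultError(execute_hr,
            "GPU device lost (TDR / device removal). ExecuteCommandList: " + Hex(execute_hr) +
            ", DML GetDeviceRemovedReason: " + Hex(dml_reason) +
            ", D3D12 GetDeviceRemovedReason: " + Hex(d3d12_reason));
    }
    // Any other failure keeps the plain HRESULT report.
    throw HResultError(execute_hr, "ExecuteCommandList failed: " + Hex(execute_hr));
}
```

Calling `ThrowIfExecuteFailed(kDeviceHung, dml, d3d12)` with two mock devices throws an `HResultError` whose message contains all three values in hex, so a single log line is enough to tell a hang from a removal during triage.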
### Motivation and Context
While investigating a WebNN sample failure on Chrome running Stable
Diffusion 1.5 on an AMD Radeon 860M iGPU, ORT 1.23.4 surfaced this
error:
`DmlGraphFusionHelper.cpp(1078) ... 887A0006 The GPU will not respond to
more commands, most likely because of an invalid command passed by the
calling application.`
0x887A0006 is DXGI_ERROR_DEVICE_HUNG. The text "...invalid command
passed by the calling application" seems to be the FormatMessage string
for that HRESULT.
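
As a quick check on that reading, an HRESULT can be split into its severity bit, facility, and code. The sketch below uses the same bit layout as the `HRESULT_FACILITY` and `HRESULT_CODE` macros in `winerror.h`. `HResultParts` and `Decode` are hypothetical names introduced for illustration, and the value is held in a plain `std::uint32_t` so the snippet builds without Windows headers.

```cpp
#include <cstdint>

// Hypothetical helper type: the three fields packed into an HRESULT.
struct HResultParts {
    bool failure;            // bit 31: set for failure codes
    std::uint32_t facility;  // bits 16-28: which subsystem produced the code
    std::uint32_t code;      // bits 0-15: subsystem-specific error number
};

constexpr HResultParts Decode(std::uint32_t hr) {
    return HResultParts{(hr >> 31) != 0, (hr >> 16) & 0x1FFFu, hr & 0xFFFFu};
}

// FACILITY_DXGI as defined in winerror.h.
constexpr std::uint32_t kFacilityDxgi = 0x87Au;

// 0x887A0006: failure bit set, DXGI facility, code 6 (DEVICE_HUNG).
static_assert(Decode(0x887A0006u).failure, "severity bit should be set");
static_assert(Decode(0x887A0006u).facility == kFacilityDxgi, "facility should be DXGI");
static_assert(Decode(0x887A0006u).code == 0x0006u, "code should be 6");
```
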
The pre-existing code in
DmlGraphFusionHelper::ExecuteReusableCommandList only special-cased
DXGI_ERROR_DEVICE_REMOVED, so for DEVICE_HUNG / DEVICE_RESET /
DRIVER_INTERNAL_ERROR HRESULTs the user just got the raw message. I
wanted to add a little more diagnostic information to this.
Co-authored-by: Aditya Rastogi <adityar@ntdev.microsoft.com>

1 parent d464b2a commit a1fc916
1 file changed
Lines changed: 42 additions & 4 deletions
| Original file lines | New file lines | Change |
|---|---|---|
| 1094-1096 | 1094-1096 | context |
| 1097 | | removed |
| | 1097-1111 | added |
| 1098 | 1112 | context |
| 1099-1101 | | removed |
| | 1113-1139 | added |
| 1102-1104 | 1140-1142 | context |