|
| 1 | +# Fix #225 (fix-forward): D3D12 Chunk Name, Depth, and Exact-ID Matching |
| 2 | + |
| 3 | +## Summary |
| 4 | + |
| 5 | +PR #226 (squash 3735f2b) shipped the D3D12 adapter-matching logic but the |
| 6 | +structured-data guard never fires on real hardware. Hardware verification by |
| 7 | +issue reporter @YunHsiao (renderdoc 1.42, Windows multi-GPU D3D12 box) confirmed |
| 8 | +that the user-facing crash is resolved by the vendor-priority fallback introduced |
| 9 | +in #226, but the structured-data path — the primary, deterministic signal — is |
| 10 | +silently skipped on every real capture. Three distinct defects prevent it from |
| 11 | +working. This change is a fix-forward on the merged commit; issue #225 remains |
| 12 | +open until hardware re-verification passes. |
| 13 | + |
| 14 | +## Problem |
| 15 | + |
| 16 | +### Canonical dump (ground truth) |
| 17 | + |
| 18 | +Reporter provided the structured-data tree from renderdoc 1.42's bundled Python |
| 19 | +module on a real D3D12 capture. The ONLY chunk carrying adapter identity is |
| 20 | +`chunk[0]`, whose name is literally: |
| 21 | + |
| 22 | +``` |
| 23 | +Internal::Driver Initialisation Parameters |
| 24 | +``` |
| 25 | + |
| 26 | +Its subtree: |
| 27 | + |
| 28 | +``` |
| 29 | +Internal::Driver Initialisation Parameters [Chunk] children=1 |
| 30 | + InitParams [D3D12InitParams] children=7 |
| 31 | + MinimumFeatureLevel [D3D_FEATURE_LEVEL] |
| 32 | + AdapterDesc [DXGI_ADAPTER_DESC] children=9 |
| 33 | + Description [string] = 'NVIDIA RTX 4500 Ada Generation' |
| 34 | + VendorId [uint32_t] |
| 35 | + DeviceId [uint32_t] |
| 36 | + SubSysId [uint32_t] |
| 37 | + Revision [uint32_t] |
| 38 | + DedicatedVideoMemory [uint64_t] |
| 39 | + DedicatedSystemMemory [uint64_t] |
| 40 | + SharedSystemMemory [uint64_t] |
| 41 | + AdapterLuid [LUID] |
| 42 | + usedDXIL, VendorExtensions, VendorUAV, VendorUAVSpace, SDKVersion |
| 43 | +``` |
| 44 | + |
| 45 | +All remaining chunks are `ID3D12Device::*` / `ID3D12Resource::SetName` API calls |
| 46 | +and 4834× `Internal::Initial Contents`. There is no `EnumAdapters`, no |
| 47 | +`CreateDXGIFactory`, no `DriverInit` chunk anywhere in the capture. |
| 48 | + |
| 49 | +Available GPUs on reporter's machine: |
| 50 | +- `AMD Radeon(TM) Graphics` vendor=2 deviceID=5056 |
| 51 | +- `NVIDIA RTX 4500 Ada Generation` vendor=6 deviceID=10161 |
| 52 | +- `WARP Rasterizer` vendor=9 deviceID=0 |
| 53 | + |
| 54 | +### Defect 1 — outer chunk guard never matches |
| 55 | + |
| 56 | +`d3d_markers = ("DriverInit", "EnumAdapters", "CreateDXGIFactory")` |
| 57 | + |
| 58 | +None of the three substrings occur in `"Internal::Driver Initialisation Parameters"`. |
| 59 | +The space between "Driver" and "Initialisation" defeats `"DriverInit"`. The guard |
| 60 | +falls through on every chunk and `_find_adapter_description` is never called. |
| 61 | + |
| 62 | +### Defect 2 — `_find_adapter_description` only walks 2 levels |
| 63 | + |
| 64 | +The real `Description` field is at depth 3 relative to the chunk root: |
| 65 | + |
| 66 | +``` |
| 67 | +chunk (depth 0) |
| 68 | + InitParams (depth 1) |
| 69 | + AdapterDesc (depth 2) |
| 70 | + Description (depth 3) ← required |
| 71 | +``` |
| 72 | + |
| 73 | +The current walker checks direct children (depth 1) for `Description`/`DeviceName` |
| 74 | +and direct children of `AdapterDesc`/`pAdapter`/`adapter` (depth 2 for the |
| 75 | +container, depth 3 only if the container is at depth 1). Because `AdapterDesc` is |
| 76 | +at depth 2 (a grandchild), the current inner loop is never reached. |
| 77 | + |
| 78 | +### Defect 3 — name-substring match is fragile for same-vendor multi-GPU |
| 79 | + |
| 80 | +The name-substring comparison `desc_l in name_l or name_l in desc_l` is the |
| 81 | +only disambiguation signal once a Description is found. On a system with two |
| 82 | +NVIDIA cards whose names differ only by suffix, substring matching can produce a |
| 83 | +false-positive match. The `AdapterDesc` already exposes `DeviceId` (PCI Device |
| 84 | +ID, uint32) and `VendorId` (PCI Vendor ID, uint32). The reporter explicitly |
| 85 | +recommended hardening to exact numeric match. |
| 86 | + |
| 87 | +`GPUDevice.deviceID` in the renderdoc Python API carries the PCI Device ID, which |
| 88 | +is what `DXGI_ADAPTER_DESC.DeviceId` encodes. An exact `DeviceId` match is |
| 89 | +unambiguous and free of string-normalization risk. |
| 90 | + |
| 91 | +## Design |
| 92 | + |
| 93 | +### Part 1 — extend chunk name markers |
| 94 | + |
| 95 | +Add `"Driver Initialisation Parameters"` to `d3d_markers`. Retain the three |
| 96 | +existing substrings (`"DriverInit"`, `"EnumAdapters"`, `"CreateDXGIFactory"`) for |
| 97 | +compatibility with other renderdoc versions or API paths that may use them. |
| 98 | + |
| 99 | +```python |
| 100 | +d3d_markers = ( |
| 101 | + "Driver Initialisation Parameters", # renderdoc 1.42 D3D12 |
| 102 | + "DriverInit", |
| 103 | + "EnumAdapters", |
| 104 | + "CreateDXGIFactory", |
| 105 | +) |
| 106 | +``` |
| 107 | + |
| 108 | +### Part 2 — bounded recursive descent in `_find_adapter_description` |
| 109 | + |
| 110 | +Replace the 2-level manual walk with a depth-bounded recursive descent (cap at |
| 111 | +depth 4 or 5). At each node: |
| 112 | + |
| 113 | +1. If the node name is `Description` or `DeviceName`, return its string value and |
| 114 | + a paired `DeviceId` if available at the same level. |
| 115 | +2. If the node name is `AdapterDesc`, `pAdapter`, or `adapter`, recurse into it |
| 116 | + with depth decremented — this finds `Description` at depth 3 regardless of |
| 117 | + intermediate wrapper levels. |
| 118 | +3. Otherwise recurse into all children with depth decremented. |
| 119 | + |
| 120 | +Return type changes to a small named tuple carrying |
| 121 | +`(description, device_id)` so the caller can use the numeric `DeviceId` for the |
| 122 | +primary match. `VendorId` is deliberately NOT parsed (see Risks — it cannot be |
| 123 | +compared against the renderdoc `GPUVendor` enum, so carrying it would be dead). |
| 124 | + |
| 125 | +`DeviceId` is a `uint32_t` structured-data scalar. The renderdoc `SDObject` |
| 126 | +interface used everywhere else in this code reads strings via `AsString()`; the |
| 127 | +integer-valued counterpart is `AsInt()` (there is no `AsUInt()` on `SDObject`). |
| 128 | +The implementation MUST read `DeviceId` via `child.AsInt()`, consistent with the |
| 129 | +existing `AsString()` usage on the same interface. There is no top-level integer |
| 130 | +`.data` attribute to read directly. |
| 131 | + |
| 132 | +### Part 3 — exact numeric DeviceId match as primary signal |
| 133 | + |
| 134 | +In `_match_capture_gpu`, after obtaining the result from |
| 135 | +`_find_adapter_description`, apply a three-tier selection: |
| 136 | + |
| 137 | +1. **Primary**: exact `DeviceId` match — `match.device_id == g.deviceID` for |
| 138 | + each GPU `g`. The first matching GPU wins. A `DeviceId` of `0` is treated as |
| 139 | + a software/WARP sentinel (software, WARP and virtualized adapters report |
| 140 | + `DXGI_ADAPTER_DESC.DeviceId == 0`): the tier-1 gate is truthiness, not |
| 141 | + `is not None`, so a present-but-zero id falls through to tier 2; and any |
| 142 | + candidate GPU whose `deviceID == 0` is skipped inside the exact-match loop so |
| 143 | + a real non-zero parsed id can never coincidentally bind the WARP adapter. |
| 144 | +2. **Secondary**: name-substring match (existing logic) — used when no GPU's |
| 145 | + `deviceID` matches the parsed value, when `DeviceId` is absent, or when the |
| 146 | + parsed `DeviceId` is the `0` sentinel. |
| 147 | +3. **Tertiary**: vendor-priority fallback (existing logic, unchanged). |
| 148 | + |
| 149 | +### Known limitations / risks |
| 150 | + |
| 151 | +**Why VendorId is not parsed.** `DXGI_ADAPTER_DESC.VendorId` is the raw PCI |
| 152 | +vendor ID (e.g., NVIDIA = 0x10DE = 4318). The renderdoc `GPUVendor` enum uses |
| 153 | +ordinals (nVidia = 6). These values are in unrelated spaces, so `VendorId` |
| 154 | +cannot be compared against `g.vendor` and there is no other consumer for it. |
| 155 | +The field is therefore deliberately NOT read from the chunk (carrying it would |
| 156 | +be dead state). `DeviceId` → `g.deviceID` is the only exact key used. |
| 157 | + |
| 158 | +**(RISK-A) `deviceID` vs PCI `DeviceId` same-space assumption.** Tier-1 assumes |
| 159 | +the renderdoc `GPUDevice.deviceID` is the same integer as the PCI |
| 160 | +`DXGI_ADAPTER_DESC.DeviceId`. This holds on the D3D12 path per the reporter |
| 161 | +dump but is unverifiable on Linux/Vulkan from this repo. A mismatch silently |
| 162 | +demotes tier-1 to name-substring (correctness preserved, determinism lost). |
| 163 | +*Reporter verification question:* "On the #225 capture, print |
| 164 | +`GetAvailableGPUs()[i].deviceID` for the NVIDIA RTX 4500 entry AND |
| 165 | +`AdapterDesc.DeviceId` parsed from the same capture; confirm they are |
| 166 | +bitwise-equal (expected 10161)." |
| 167 | + |
| 168 | +**(RISK-B) Chunk-name version pinning.** The substring |
| 169 | +`"Driver Initialisation Parameters"` is renderdoc-1.42-specific (British |
| 170 | +spelling). A future renderdoc renaming it makes the guard miss every chunk and |
| 171 | +silently reverts to vendor-priority — exactly the #225 failure class. Mitigated |
| 172 | +by the new FIX-3 diagnostic `_log.warning` ("no recognized adapter chunk … — |
| 173 | +renderdoc version may have changed chunk naming") which fires only on the |
| 174 | +multi-GPU + structured-data + no-marker-chunk path, making the regression |
| 175 | +loud and diagnosable instead of silent. *Reporter verification question:* "After |
| 176 | +a renderdoc upgrade, if multi-GPU replay picks the wrong card, grep the daemon |
| 177 | +log for `renderdoc version may have changed chunk naming`; its presence confirms |
| 178 | +the chunk was renamed and the new name must be added to `d3d_markers`." |
| 179 | + |
| 180 | +**Bounded-recursion depth.** A depth cap of 4 is sufficient for the observed tree |
| 181 | +(Description at depth 3). Setting the cap to 5 provides one level of headroom. |
| 182 | +Setting it higher risks traversing large subtrees (4834× `Initial Contents` chunks |
| 183 | +are siblings at the top level, not descendants, so they are not a concern; the |
| 184 | +guard in Part 1 prevents entering them). |
| 185 | + |
| 186 | +**Chunk-name variance across renderdoc versions.** The real chunk name contains |
| 187 | +"Initialisation" (British spelling). Future renderdoc versions may rename it. |
| 188 | +Adding the new marker as an additional entry (not replacing the old ones) hedges |
| 189 | +against this. If the chunk name is ever confirmed to differ, the new entry should |
| 190 | +be updated in a follow-up. |
| 191 | + |
| 192 | +**`GPUDevice` field names.** The field names `g.deviceID` and `g.name` are |
| 193 | +inferred from the existing code and the reported dump. The implementer must |
| 194 | +confirm these names against the bundled `renderdoc.pyi` or equivalent before |
| 195 | +finalising the implementation. |
| 196 | + |
| 197 | +## Spec Delta |
| 198 | + |
| 199 | +The existing scenario "Multi-GPU capture replay" under **Requirement: Replay |
| 200 | +lifecycle** in `openspec/specs/daemon/spec.md` (added by #226) requires the |
| 201 | +following amendment: |
| 202 | + |
| 203 | +The target scenario in `openspec/specs/daemon/spec.md` (HEAD lines 38–45) uses |
| 204 | +bold `**WHEN**` / `**AND**` / `**THEN**` markers. The delta below reproduces |
| 205 | +those bytes verbatim so it applies cleanly: |
| 206 | + |
| 207 | +```diff |
| 208 | + #### Scenario: Multi-GPU capture replay |
| 209 | + - **WHEN** the capture was taken on a multi-GPU system |
| 210 | + - **AND** multiple GPUs are available for replay |
| 211 | +-- **THEN** the daemon selects the GPU whose name matches structured-data adapter |
| 212 | +- metadata extracted from the capture |
| 213 | ++- **THEN** the daemon walks structured-data chunks whose name contains any of |
| 214 | ++ "Driver Initialisation Parameters", "DriverInit", "EnumAdapters", or |
| 215 | ++ "CreateDXGIFactory" |
| 216 | ++- **AND** within each such chunk descends recursively (depth ≤ 5) to locate an |
| 217 | ++ AdapterDesc/pAdapter/adapter subtree |
| 218 | ++- **AND** selects the GPU whose PCI DeviceId exactly matches |
| 219 | ++ DXGI_ADAPTER_DESC.DeviceId when that field is present |
| 220 | ++- **AND** falls back to case-insensitive name-substring match when no exact |
| 221 | ++ DeviceId match is found |
| 222 | + - **AND** if no structured-data match exists, Software/WARP adapters are excluded |
| 223 | + and the highest-ranked discrete GPU is preferred (nVidia > AMD > Intel) |
| 224 | + - **AND** a single available GPU is always returned directly without inspection |
| 225 | +``` |
| 226 | + |
| 227 | +This delta has been applied verbatim to `openspec/specs/daemon/spec.md` as |
| 228 | +part of this change. |
| 229 | + |
| 230 | +## Files Changed |
| 231 | + |
| 232 | +- `src/rdc/daemon_server.py` — `d3d_markers` tuple; `_find_adapter_description` |
| 233 | + return type and walk algorithm; `_match_capture_gpu` selection logic |
| 234 | +- `openspec/specs/daemon/spec.md` — scenario amendment (see Spec Delta above) |
| 235 | +- `tests/unit/test_daemon_server_unit.py` — new unit tests (see `test-plan.md`) |
0 commit comments