[DX] Add ray tracing state object, SBT, and DispatchRays bring-up #1275
Merged
7315abc to 06f0ce9
This was referenced Jun 3, 2026
MarijnS95 added a commit to Traverse-Research/offload-test-suite that referenced this pull request on Jun 3, 2026
…ldRay

Three small PSO RT tests stacked on llvm#1275. Each one isolates a single shader-observable closest-hit system value from llvm#1268's 👍 list. They follow the same shape as the prior batch in llvm#1277: one .test file per behavior, a single-purpose shader, and an exact buffer comparison.

- `closest-hit-barycentrics.test`: 3-lane dispatch. Each lane fires at a clearly interior point of the single triangle, so the closest-hit shader reports a known `BuiltInTriangleIntersectionAttributes::barycentrics` (u, v). The points come from the inside of the triangle to avoid the watertight-traversal edge-rule lottery you hit at edge midpoints and vertices. The first cut of this test used midpoint(v0, v1), and one lane silently missed on both backends.
- `closest-hit-primitive-index.test`: three triangles tiled at x = -3, 0, +3 in a single BLAS. A 3-lane dispatch fires straight down at each triangle's centroid. The closest-hit shader reports `PrimitiveIndex()`, which must match the lane index 0..2.
- `closest-hit-world-ray.test`: 2-lane dispatch with rays starting at different z heights (1.0 and 2.0). The closest-hit shader packs `WorldRayOrigin().z`, `WorldRayDirection().z`, and `RayTCurrent()` into the payload. Raygen flattens the float3 into a 6-element Float32 buffer. This verifies that the system values match the raygen-side `RayDesc` and that traversal computes t correctly.

All three are `# REQUIRES: raytracing-pipeline` with `# XFAIL: Clang`, because `clang-dxc` doesn't yet lower `[shader(…)]` entry points. With the Metal RT bring-up rebased on top, all three pass natively on Apple Silicon, and Metal is dropped from the XFAIL list.

Locally verified end-to-end on the user's Linux box against an NVIDIA RTX 3060:
- All three pass on Vulkan via the native offloader.
- All three pass on D3D12 via Wine + vkd3d-proton + the cross-compiled `offloader.exe`.

On macOS 15 / metal-irconverter 3.1.1 via the native offloader, all three PASS as well.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
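The barycentric test relies on DXR's convention: `barycentrics.x` weights vertex 1, `barycentrics.y` weights vertex 2, and vertex 0 gets `1 - x - y`. The sketch below is a hypothetical host-side helper, not code from the test. It assumes the triangle lies in a constant-z plane and rays travel along z. It computes the expected (u, v) and flags points that sit on an edge, where the watertight tie-break rules decide the hit.

```cpp
#include <cassert>
#include <cmath>

struct Vec2 { float X, Y; };

// DXR convention: P = V0 + U * (V1 - V0) + V * (V2 - V0), so
// barycentrics.x (U) weights V1, barycentrics.y (V) weights V2,
// and V0 gets 1 - U - V.
struct Bary { float U, V; };

// Solves the 2x2 system for (U, V) with Cramer's rule. Assumes the triangle
// lies in a z = const plane and the ray travels along z, so only X/Y matter.
Bary barycentrics(Vec2 P, Vec2 V0, Vec2 V1, Vec2 V2) {
  const float E1X = V1.X - V0.X, E1Y = V1.Y - V0.Y;
  const float E2X = V2.X - V0.X, E2Y = V2.Y - V0.Y;
  const float PX = P.X - V0.X, PY = P.Y - V0.Y;
  const float Det = E1X * E2Y - E2X * E1Y;
  assert(std::fabs(Det) > 0.0f && "degenerate triangle");
  return {(PX * E2Y - E2X * PY) / Det, (E1X * PY - PX * E1Y) / Det};
}

// "Clearly interior": all three weights stay more than Margin away from 0,
// so whether the ray hits never depends on edge/vertex tie-breaking.
bool clearlyInterior(Bary B, float Margin) {
  const float W0 = 1.0f - B.U - B.V;
  return B.U > Margin && B.V > Margin && W0 > Margin;
}
```

Take a hypothetical triangle V0 = (0, 0), V1 = (1, 0), V2 = (0, 1). The point (0.25, 0.25) gives (U, V) = (0.25, 0.25). By contrast, midpoint(V0, V1) = (0.5, 0) gives V = 0 exactly, which puts it on a shared edge.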
MarijnS95 added a commit to Traverse-Research/offload-test-suite that referenced this pull request on Jun 3, 2026
Three tests stacked on llvm#1275. They cover features that inline RT (RayQuery in compute) physically can't express, because they're only reachable through a DispatchRays-driven RT pipeline.

- `trace-ray-recursion.test`: the closest-hit shader fires a secondary TraceRay from an origin above the triangle.
  - The first-level CH sees payload = 0, bumps it to 0x1, and calls TraceRay.
  - The second-level CH sees payload != 0 and writes 0x10.
  - On unwind, the first-level CH ORs in 0x100. The final payload is 0x110 (272 decimal).

  `RayTracingPipelineConfig.MaxTraceRecursionDepth: 2`, so both TraceRay calls are within budget.
- `ray-flag-skip-closest-hit.test`: two lanes fire identical rays at the same triangle. Lane 0 uses RAY_FLAG_NONE, so CH runs and writes 0xBEEF. Lane 1 uses RAY_FLAG_SKIP_CLOSEST_HIT_SHADER, which is PSO-only (inline RT has no equivalent). CH is skipped and the payload keeps its initial 0xAAAA. Output: [0xBEEF, 0xAAAA].
- `callable-shader.test`: two callable shaders writing distinct sentinels (0xAAAA / 0xBBBB). Each lane calls `CallShader(Idx, ...)`. This exercises the SBT callable region's per-record routing independently of the hit-group / miss routing already covered in llvm#1277. Callable shaders themselves don't exist in inline RT.

  This test stays `# XFAIL: Clang, Vulkan` because of a DXC issue. DXC's `-spirv` backend lists every callable's IncomingCallableDataKHR variable in every callable entry point's interface. That violates VUID-StandaloneSpirv-IncomingCallableDataKHR-04706, and vkCreateShaderModule rejects the module. The framework's Vulkan SBT / callable path is correct: running `spirv-opt --remove-unused-interface-variables` on the DXC output cleans up the SPIR-V, and the test then passes natively. Track upstream.

All three pass on Metal once the bring-up PR ahead of this commit sets the raygen pipeline's `setMaxCallStackDepth(MaxTraceRecursionDepth)`, so nested TraceRay actually unwinds. With the default of 1, the second TraceRay was silently dropped and the recursion test produced 0x1 instead of 0x110.

Locally verified on the user's Linux box:
- Vulkan via the native offloader: recursion + skip-CH PASS. Callable PASSes after spirv-opt cleanup but XFAILs from raw DXC SPIR-V, as documented above.
- D3D12 via Wine + vkd3d-proton + cross-compiled offloader.exe: all three PASS.

And on macOS 15 / metal-irconverter 3.1.1 via the native offloader:
- All three PASS (recursion + skip-CH + callable).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
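The payload arithmetic in `trace-ray-recursion.test` is easy to check on the CPU. The sketch below is a hypothetical host-side model, not the test's HLSL. `traceRay` and `closestHit` are illustrative names, and every trace is assumed to hit the triangle. `Depth` counts active TraceRay calls the way `MaxTraceRecursionDepth` counts them, so raygen's own call is depth 1.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical CPU model of the test's shaders. Every ray is assumed to hit
// the triangle, so "TraceRay" just invokes the closest-hit model.
void closestHit(uint32_t &Payload, unsigned Depth, unsigned MaxDepth);

// Depth is the number of active TraceRay calls, counted the way
// MaxTraceRecursionDepth counts them (the raygen shader's call is depth 1).
void traceRay(uint32_t &Payload, unsigned Depth, unsigned MaxDepth) {
  // DXR and Vulkan do not permit exceeding the declared recursion depth,
  // so the model refuses to do it.
  assert(Depth <= MaxDepth && "TraceRay exceeds MaxTraceRecursionDepth");
  closestHit(Payload, Depth, MaxDepth);
}

void closestHit(uint32_t &Payload, unsigned Depth, unsigned MaxDepth) {
  if (Payload == 0) {
    // First level: mark the visit and fire the secondary ray...
    Payload = 0x1;
    traceRay(Payload, Depth + 1, MaxDepth);
    // ...then OR in 0x100 once the nested trace has unwound.
    Payload |= 0x100;
  } else {
    // Second level: overwrite (not OR) the payload with 0x10.
    Payload = 0x10;
  }
}
```

Starting from `Payload = 0` and calling `traceRay(Payload, 1, 2)` ends at 0x110 (272). The second level uses a plain assignment, which is why the first level's 0x1 marker does not survive into the final value.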
MarijnS95 added a commit to Traverse-Research/offload-test-suite that referenced this pull request on Jun 3, 2026
Four small tests stacked on top of llvm#1275. Each one isolates a single shader-observable PSO raytracing surface. They follow the same shape as the inline-RT batch already in llvm#1271 / llvm#1272 / llvm#1274 / llvm#1276: one .test file per behavior, a single-purpose shader, and an exact buffer comparison.

- `dispatch-rays-index.test`: 4x1x1 dispatch. Raygen writes `DispatchRaysIndex().x` into `Output[index]`. This confirms the dispatch grid plumbs through to the per-lane system value with no BLAS / TLAS / hit groups in play (RT-pipeline-only, no AS binding).
- `dispatch-rays-dimensions.test`: 2x3x1 dispatch. Raygen packs the constant `DispatchRaysDimensions()` into one uint per lane. This confirms every lane sees the host-side `{W, H, D}` even when only one dimension > 1.
- `miss-shader-index.test`: two miss shaders writing distinct sentinels (0xAA / 0xBB). A 2-lane dispatch picks `MissShaderIndex` 0 and 1 respectively. Rays start far enough from the geometry that every ray misses. This verifies the SBT miss region's per-record routing.
- `ray-contribution-to-hit-group-index.test`: two hit groups with distinct closest-hit shaders (0xA1 / 0xB2). A 2-lane dispatch picks `RayContributionToHitGroupIndex` 0 and 1, and every ray hits the same triangle. This verifies the SBT hit-group region's per-record routing.

The first two have no AS / Miss / HitGroup in their pipeline at all, just a raygen + a UAV. That exercises the minimum viable RT pipeline shape: one raygen group and zero-sized miss / hit / callable SBT regions. The latter two reuse the single-triangle BLAS/TLAS from `raygen-roundtrip.test`.

All four tests are `# REQUIRES: raytracing-pipeline` with `# XFAIL: Clang`. Clang (`clang-dxc`) doesn't yet lower `[shader("…")]` entry points to either DXIL libraries or SPIR-V. With the Metal RT bring-up rebased on top, all four pass natively on Apple Silicon, and Metal is dropped from the XFAIL list.

Locally verified end-to-end on the user's Linux box:
- Vulkan via the native offloader against an NVIDIA RTX 3060: all four tests PASS.
- D3D12 via Wine + vkd3d-proton + the cross-compiled offloader.exe on the same GPU: all four tests PASS.

And on macOS 15 / metal-irconverter 3.1.1:
- Metal via the native offloader: all four tests PASS.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
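Both routing tests exercise DXR's shader-table indexing rules. Per the DXR specification, `MissShaderIndex` alone selects a miss record. A hit-group record index has three terms:
- the ray's `RayContributionToHitGroupIndex`;
- a per-geometry term, `MultiplierForGeometryContributionToHitGroupIndex * GeometryContributionToHitGroupIndex`;
- the instance's `InstanceContributionToHitGroupIndex`.

Below is a minimal sketch of that address arithmetic. The struct mirrors the fields of `D3D12_GPU_VIRTUAL_ADDRESS_RANGE_AND_STRIDE`. The function names and bounds checks are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Same fields as D3D12_GPU_VIRTUAL_ADDRESS_RANGE_AND_STRIDE.
struct RangeAndStride {
  uint64_t StartAddress;
  uint64_t SizeInBytes;
  uint64_t StrideInBytes;
};

// Miss record: Start + Stride * MissShaderIndex.
uint64_t missRecordAddress(const RangeAndStride &Miss, uint32_t MissShaderIndex) {
  const uint64_t Addr = Miss.StartAddress + Miss.StrideInBytes * MissShaderIndex;
  // Hypothetical sanity check: the selected record must lie inside the table.
  assert(Addr + Miss.StrideInBytes <= Miss.StartAddress + Miss.SizeInBytes);
  return Addr;
}

// Hit-group record: Start + Stride * (RayContribution
//   + Multiplier * GeometryIndex + InstanceContribution).
uint64_t hitGroupRecordAddress(const RangeAndStride &HitGroups,
                               uint32_t RayContribution, uint32_t Multiplier,
                               uint32_t GeometryIndex,
                               uint32_t InstanceContribution) {
  const uint64_t Index = uint64_t(RayContribution) +
                         uint64_t(Multiplier) * GeometryIndex +
                         InstanceContribution;
  const uint64_t Addr = HitGroups.StartAddress + HitGroups.StrideInBytes * Index;
  assert(Addr + HitGroups.StrideInBytes <=
         HitGroups.StartAddress + HitGroups.SizeInBytes);
  return Addr;
}
```

Assume one instance and one geometry, with the instance and geometry contributions both 0 (an assumption about the tests' TLAS, not something stated above). Then the two lanes of the hit-group test land on `StartAddress` and `StartAddress + StrideInBytes`. Likewise, miss indices 0 and 1 select consecutive miss records.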
MarijnS95 added a commit to Traverse-Research/offload-test-suite that referenced this pull request on Jun 8, 2026
MarijnS95 added a commit to Traverse-Research/offload-test-suite that referenced this pull request on Jun 8, 2026
MarijnS95 added a commit to Traverse-Research/offload-test-suite that referenced this pull request on Jun 8, 2026
Second per-backend bring-up in the PSO raytracing series (llvm#1268). Mirrors PR llvm#1273 for D3D12. It does four things:
- builds an ID3D12StateObject from the YAML schema;
- hands out shader identifiers via ID3D12StateObjectProperties;
- lays out the SBT in an upload heap;
- routes DispatchRays through ID3D12GraphicsCommandList4, the same query path the AS build already uses.

DXRayTracingPipelineState derives from DXPipelineState, with an IsRayTracing flag on the base for classof. This matches the VulkanPipelineState pattern. It carries the ID3D12StateObject, a cached ID3D12StateObjectProperties, and a StringMap<const void *>. The map resolves each shader EntryPoint or hit-group Name to its 32-byte shader identifier blob. The identifiers are driver-owned and stay alive for the lifetime of the Properties COM object, so the PSO keeps Properties alive.

DXShaderBindingTable holds a single upload-heap buffer plus four pre-built D3D12_DISPATCH_RAYS_DESC ranges (raygen, miss, hit-group, callable). Raygen uses `RANGE`, since it is always one record; the others use `RANGE_AND_STRIDE`.

createPipelineRT builds a CD3DX12_STATE_OBJECT_DESC with these subobjects:
- the DXIL library, with one export per Shader entry;
- one subobject per hit group, with closest-hit / any-hit / intersection imports;
- the pipeline shader config (max payload + max attribute bytes);
- the pipeline config (max recursion depth);
- a global root signature.

The root signature comes from the library's embedded RTS0 part when present. Otherwise it falls back to the BindingsDesc path, matching the existing compute / raster pipeline behaviour. Wide strings for the subobject exports live in a SmallVector that outlives the SODesc, since the helper classes store pointers into the strings rather than copying them.

createShaderBindingTable lays out each entry as [identifier][LocalRootData][padding-to-stride]. For each region:
- stride = align(D3D12_SHADER_IDENTIFIER_SIZE_IN_BYTES + max-LocalRootData-in-region, D3D12_RAYTRACING_SHADER_RECORD_BYTE_ALIGNMENT);
- size = align(count * stride, D3D12_RAYTRACING_SHADER_TABLE_BYTE_ALIGNMENT).

The buffer lives in an upload heap with D3D12_RESOURCE_STATE_GENERIC_READ, a PR3 simplification. A staging copy into a default heap is a follow-up.

dispatchRays queries the underlying CommandListX for ID3D12GraphicsCommandList4, matching the AS-build path. It then:
1. binds the global root signature via SetComputeRootSignature;
2. calls SetPipelineState1 with the state object;
3. issues DispatchRays with a D3D12_DISPATCH_RAYS_DESC populated from the SBT's four ranges plus the dispatch dimensions.

The existing createComputeCommands helper sets up the descriptor heap and descriptor-table bindings before the encoder is created.

createComputeCommands grows an isRayTracing branch at the dispatch point, so it calls dispatchRays instead of dispatch. This reuses all of the descriptor-heap and root-signature wiring. InvocationState carries a ShaderBindingTable unique_ptr that is only populated for RT pipelines.

executeProgram's isRayTracing branch builds a RayTracingPipelineCreateDesc from Pipeline.Shaders / HitGroups / RTConfig. It calls createPipelineRT and then createShaderBindingTable. It then re-enters createComputeCommands, which dispatches via the new RT path.

raygen-roundtrip.test's XFAIL becomes `Clang, Metal`. DirectX should PASS via this implementation on Windows CI, and via Wine + vkd3d-proton locally on Linux. The Clang token still catches the compile failure on clang-dxc, since [shader("raygeneration")] doesn't yet lower to either DXIL libraries or SPIR-V on that path.

Locally verified by cross-compiling lib/API/DX/Device.cpp via `clang++ --target=x86_64-pc-windows-msvc` against the xwin Windows SDK headers and the project's bundled DirectX-Headers. Runtime verification is left to Windows CI.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
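The `createShaderBindingTable` layout rule can be sketched in plain C++. This is a hypothetical illustration, not the PR's code:
- The constants carry the values of the D3D12 defines named in the comments.
- `Record`, `RegionLayout`, `layoutRegion`, and `writeRegion` are made-up names.
- An ordinary byte vector stands in for the mapped upload heap, and its base is assumed to be 64-byte aligned.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Values of the D3D12 constants named in the comments.
constexpr uint64_t ShaderIdentifierSize = 32; // D3D12_SHADER_IDENTIFIER_SIZE_IN_BYTES
constexpr uint64_t RecordAlignment = 32; // D3D12_RAYTRACING_SHADER_RECORD_BYTE_ALIGNMENT
constexpr uint64_t TableAlignment = 64;  // D3D12_RAYTRACING_SHADER_TABLE_BYTE_ALIGNMENT

constexpr uint64_t alignUp(uint64_t Value, uint64_t Align) {
  return (Value + Align - 1) / Align * Align;
}

// Hypothetical SBT entry: a 32-byte identifier plus optional local root data.
struct Record {
  const uint8_t *Identifier;
  std::vector<uint8_t> LocalRootData;
};

struct RegionLayout {
  uint64_t Offset; // byte offset of the region inside the SBT buffer
  uint64_t Stride; // bytes per record
  uint64_t Size;   // bytes reserved for the whole region
};

// stride = align(identifier + largest local root data, 32)
// size   = align(count * stride, 64); an empty region gets size 0.
RegionLayout layoutRegion(const std::vector<Record> &Records, uint64_t Offset) {
  uint64_t MaxLocal = 0;
  for (const Record &R : Records)
    MaxLocal = std::max<uint64_t>(MaxLocal, R.LocalRootData.size());
  const uint64_t Stride = alignUp(ShaderIdentifierSize + MaxLocal, RecordAlignment);
  return {Offset, Stride, alignUp(Records.size() * Stride, TableAlignment)};
}

// Writes each record as [identifier][LocalRootData][zero padding to stride].
void writeRegion(std::vector<uint8_t> &Buffer, const RegionLayout &Layout,
                 const std::vector<Record> &Records) {
  // Table start addresses must be 64-byte aligned (buffer base assumed aligned).
  assert(Layout.Offset % TableAlignment == 0);
  assert(Layout.Offset + Layout.Size <= Buffer.size());
  for (std::size_t I = 0; I < Records.size(); ++I) {
    uint8_t *Dst = Buffer.data() + Layout.Offset + I * Layout.Stride;
    std::memset(Dst, 0, Layout.Stride);
    std::memcpy(Dst, Records[I].Identifier, ShaderIdentifierSize);
    if (!Records[I].LocalRootData.empty())
      std::memcpy(Dst + ShaderIdentifierSize, Records[I].LocalRootData.data(),
                  Records[I].LocalRootData.size());
  }
}
```

Some example results:
- Two miss records with no local root data give a 32-byte stride and a 64-byte region.
- One hit group carrying 8 bytes of local root data rounds up to a 64-byte stride.
- An unused region, as in the raygen-only tests, comes out with size 0.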
RayTracing pipelines compile every entry point (raygen, miss, closest-hit, any-hit, intersection, callable) into a single DXIL library via `dxc -T lib_6_x` / `clang-dxc -T lib_6_x`. That's the shape every real DXR app ships, for two reasons. D3D12's CreateStateObject requires a DXIL-library subobject anyway. And the driver fuses entry points across the whole library at link time. So writing one .hlsl file and compiling it once is both idiomatic and the path the framework's `%dxc_target_lib` substitution emits.

Compute and raster pipelines stay one-to-one; the existing position-based mapping handles VS+PS, AS+MS+PS, etc. RT pipelines, however, currently need N positional arguments even though one library blob holds every entry point. The foundational `raygen-roundtrip.test` runs straight into this: 3 Shaders[] entries vs. 1 input file fails the count check before any GPU work happens.

The fix: detect the RT-pipeline-with-one-input shape and copy the library blob into every `Shaders[].Shader` slot via `MemoryBuffer::getMemBufferCopy`. Each entry owns its own buffer copy. DXIL libraries are only KBs in size, so there's no real memory pressure, and the existing `unique_ptr<MemoryBuffer>` ownership model stays intact. Non-RT pipelines still go through the positional path and still enforce the count check.

Verified by re-running `raygen-roundtrip.test`'s pipeline.yaml + the DXIL library via Wine + vkd3d-proton with a single .o argument. It produced the same 0xBEEF result as the prior three-argument invocation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
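The single-library fan-out can be sketched without LLVM types. This is a hypothetical illustration:
- `std::string` stands in for `MemoryBuffer`.
- `ShaderSlot` and `assignShaderBlobs` are made-up names.
- Routing multi-input RT pipelines through the positional path is an assumption.

The point is the branch structure. The copy path only triggers for an RT pipeline with exactly one input, and every slot receives its own owned copy.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for a Shaders[] entry; std::string plays the role of
// the owned MemoryBuffer holding the compiled blob.
struct ShaderSlot {
  std::string EntryPoint;
  std::unique_ptr<std::string> Shader;
};

// Returns false when the positional count check fails.
bool assignShaderBlobs(std::vector<ShaderSlot> &Shaders,
                       const std::vector<std::string> &Inputs,
                       bool IsRayTracing) {
  assert(!Inputs.empty() && "at least one compiled input is expected");
  if (IsRayTracing && Inputs.size() == 1) {
    // One DXIL library holds every entry point: give each slot its own deep
    // copy so every slot keeps exactly one owner.
    for (ShaderSlot &S : Shaders)
      S.Shader = std::make_unique<std::string>(Inputs.front());
    return true;
  }
  // Compute / raster (and, assumed here, multi-input RT): one input per entry.
  if (Inputs.size() != Shaders.size())
    return false;
  for (std::size_t I = 0; I < Shaders.size(); ++I)
    Shaders[I].Shader = std::make_unique<std::string>(Inputs[I]);
  return true;
}
```

Copying keeps the ownership model simple. Sharing one buffer across slots would need a non-owning or reference-counted handle instead.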
Comment on lines +1597 to +1601
```cpp
static std::wstring widen(llvm::StringRef S) {
  // Entry-point names and hit-group names are ASCII; a straight 1:1 widen
  // is sufficient.
  return std::wstring(S.begin(), S.end());
}
```
Contributor
This can be a free function.
Contributor
Perhaps we fix this in a follow-up PR where we also refactor out all the std::wstring calls to this helper.
manon-traverse approved these changes on Jun 19, 2026
manon-traverse left a comment
Contributor
Just needs to be clang-tidied and pass CI; other than that, this looks fine to me.
EmilioLaiso approved these changes on Jun 22, 2026
EmilioLaiso added a commit that referenced this pull request on Jun 24, 2026
Depends on #1275

## Summary

Last backend in the PSO RT bring-up stack. DXR-style ray tracing reaches Metal through `metal_irconverter`, which lowers each RT entry point from DXIL to a Metal IR function:
- Raygen is emitted as a kernel (`IRRayGenerationCompilationKernel`) so it can be dispatched directly.
- Miss / closest-hit / any-hit / intersection / callable functions are emitted as visible functions and pulled into a `MTLVisibleFunctionTable`.

Fills in the three virtuals the foundation PR left stubbed on Metal:

- `MTLDevice::createPipelineRT` compiles every `Shaders[]` entry against a single `IRRayTracingPipelineConfiguration`, with the max attribute / recursion budget taken from the YAML `RTConfig`. It builds one `MTL::Library` per entry, hands the raygen function to the compute pipeline as the kernel, and registers the rest as `LinkedFunctions`. The freshly built pipeline then mints a `MTLVisibleFunctionTable` and resolves each callable function's handle into a slot index that the SBT builder reuses. It also sets `setMaxCallStackDepth(MaxTraceRecursionDepth)` so nested `TraceRay` actually unwinds; the default of 1 silently drops the second trace.
- `MTLDevice::createShaderBindingTable` lays out the four SBT regions via the shared `computeSBTLayout` helper, sized for `IRShaderIdentifier` records. It looks up each region entry's `ShaderName` in the pipeline's name → `IRShaderIdentifier` map and `memcpy`s the records into a shared-storage `MTL::Buffer`, which the runtime dereferences at dispatch.
- `MTLComputeEncoder::dispatchRays` binds the raygen pipeline and runs `dispatchThreads(Width, Height, Depth)` on the encoder.

The caller (`createRayTracingCommands` in `MTLDevice`) builds the per-dispatch `IRDispatchRaysArgument` struct. The struct holds:
- the SBT region addresses + sizes;
- the GRS / `ResDescHeap` GPU pointers;
- the visible / intersection function table `resourceID`s.

The caller parks the struct in a shared `MTL::Buffer`, kept alive on the command buffer's KeepAlive list, and binds it at `kIRRayDispatchArgumentsBindPoint`. Callees reached via `TraceRay()` then inherit the same dispatch state through that pointer.

Plumbs the existing `executeProgram` RT branch on Metal the same way the VK / DX backends already do: validate `Shaders` / `SBT` / `RTConfig`, build `RayTracingPipelineCreateDesc` from the YAML pipeline, create the PSO, build the SBT, and record commands. Also adds the `raytracing-pipeline` lit feature on Metal, so `test/Feature/RT/raygen-roundtrip.test` drops `Metal` from its `XFAIL` list and passes natively on Apple Silicon.

This bring-up only handles Triangle hit groups whose only member is a `ClosestHit` shader. Any-hit / intersection / procedural / local root signatures land in follow-ups. For those shapes, `createPipelineRT` returns a clear unsupported error instead of silently producing wrong output.

## Test plan

Local on an NVIDIA RTX 3060:

- [ ] Linux Vulkan (native `offloader`)
- [ ] Linux D3D12 (Wine + vkd3d-proton + cross-compiled `offloader.exe`)
- [ ] Windows Vulkan (native `offloader.exe`)
- [ ] Windows D3D12 (native `offloader.exe`)

CI (RT-capable runners):

- [ ] windows-nvidia D3D12 (`RaytracingTier 1.2`)
- [ ] windows-intel VK (`VK_KHR_ray_tracing_pipeline`)
- [x] macOS Metal (`supportsRaytracing`)

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: EmilioLaiso <emilio@traverseresearch.nl>
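The scope limit in the Metal bring-up amounts to a validation pass that runs before any `metal_irconverter` work. Below is a hedged sketch of such a check:
- `HitGroupDesc`, `HitGroupType`, and `checkMetalHitGroups` are hypothetical names, not the offloader's actual YAML types or functions.
- Local root signatures are not modelled.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

enum class HitGroupType { Triangle, Procedural };

// Hypothetical mirror of a YAML hit group.
struct HitGroupDesc {
  std::string Name;
  HitGroupType Type = HitGroupType::Triangle;
  std::optional<std::string> ClosestHit;
  std::optional<std::string> AnyHit;
  std::optional<std::string> Intersection;
};

// Returns an error for hit-group shapes this bring-up does not handle, or
// std::nullopt if every group is a triangle group whose only member is a
// closest-hit shader.
std::optional<std::string>
checkMetalHitGroups(const std::vector<HitGroupDesc> &Groups) {
  for (const HitGroupDesc &G : Groups) {
    // Hit groups are resolved to shader identifiers by name.
    assert(!G.Name.empty() && "hit group without a name");
    if (G.Type != HitGroupType::Triangle)
      return "hit group '" + G.Name + "': procedural hit groups are unsupported";
    if (G.AnyHit || G.Intersection)
      return "hit group '" + G.Name + "': only a ClosestHit member is supported";
    if (!G.ClosestHit)
      return "hit group '" + G.Name + "': missing ClosestHit shader";
  }
  return std::nullopt;
}
```

Failing early with a named hit group makes the unsupported case visible in the test log. The alternative, where unsupported shapes reach the converter or the GPU, could produce silently wrong output.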
EmilioLaiso pushed a commit to Traverse-Research/offload-test-suite that referenced this pull request on Jun 24, 2026
EmilioLaiso pushed a commit to Traverse-Research/offload-test-suite that referenced this pull request on Jun 24, 2026
EmilioLaiso pushed a commit to Traverse-Research/offload-test-suite that referenced this pull request on Jun 24, 2026
EmilioLaiso pushed a commit to Traverse-Research/offload-test-suite that referenced this pull request on Jun 25, 2026
Four small tests stacked on top of llvm#1275, each isolating one shader-observable PSO raytracing surface. They follow the same shape as the inline-RT batch already in llvm#1271 / llvm#1272 / llvm#1274 / llvm#1276 — one .test file per behavior, single-purpose shader, exact buffer comparison. - `dispatch-rays-index.test` — 4x1x1 dispatch, raygen writes `DispatchRaysIndex().x` into `Output[index]`. Confirms the dispatch grid plumbs through to the per-lane system value with no BLAS / TLAS / hit groups in play (RT-pipeline-only, no AS binding). - `dispatch-rays-dimensions.test` — 2x3x1 dispatch, raygen packs the constant `DispatchRaysDimensions()` into one uint per lane. Confirms every lane sees the host-side `{W, H, D}` even when only one dimension > 1. - `miss-shader-index.test` — two miss shaders writing distinct sentinels (0xAA / 0xBB). 2-lane dispatch picks `MissShaderIndex` 0 and 1 respectively; rays start far enough from the geometry that every ray misses. Verifies the SBT miss region's per-record routing. - `ray-contribution-to-hit-group-index.test` — two hit groups with distinct closest-hit shaders (0xA1 / 0xB2). 2-lane dispatch picks `RayContributionToHitGroupIndex` 0 and 1, every ray hits the same triangle. Verifies the SBT hit-group region's per-record routing. The first two have no AS / Miss / HitGroup in their pipeline at all — just a raygen + a UAV — which exercises the minimum viable RT pipeline shape (one raygen group, zero-sized miss / hit / callable SBT regions). The latter two reuse the single-triangle BLAS/TLAS from `raygen-roundtrip.test`. All four tests are `# REQUIRES: raytracing-pipeline` with `# XFAIL: Clang` — Clang (`clang-dxc`) doesn't yet lower `[shader("…")]` entry points to either DXIL libraries or SPIR-V. With the Metal RT bring-up rebased on top, all four pass natively on Apple Silicon and Metal is dropped from the XFAIL list. 
Locally verified end-to-end on the user's Linux box:

- Vulkan via the native offloader against an NVIDIA RTX 3060: all four tests PASS.
- D3D12 via Wine + vkd3d-proton + the cross-compiled offloader.exe on the same GPU: all four tests PASS.

And on macOS 15 / metal-irconverter 3.1.1:

- Metal via the native offloader: all four tests PASS.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
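For illustration only, the expected `Output` buffers for the two grid tests can be computed on the CPU. The PR does not say how `dispatch-rays-dimensions.test` packs `{W, H, D}` into one uint, so the 8-bits-per-dimension packing in the hypothetical `packDims` below is an assumption, and so is the x-fastest lane linearization in `linearLane`; neither helper exists in the test suite.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical packing, 8 bits per dimension: W | H << 8 | D << 16.
// The PR does not specify the real test's packing.
constexpr uint32_t packDims(uint32_t W, uint32_t H, uint32_t D) {
  return (W & 0xFFu) | ((H & 0xFFu) << 8) | ((D & 0xFFu) << 16);
}

// x-fastest linear lane index for a W x H x D dispatch (assumed layout).
constexpr uint32_t linearLane(uint32_t X, uint32_t Y, uint32_t Z, uint32_t W,
                              uint32_t H) {
  return X + Y * W + Z * W * H;
}

// Expected Output[] for the dimensions test: every lane stores the same word.
std::vector<uint32_t> expectedDimsBuffer(uint32_t W, uint32_t H, uint32_t D) {
  std::vector<uint32_t> Out(W * H * D, 0);
  for (uint32_t Z = 0; Z < D; ++Z)
    for (uint32_t Y = 0; Y < H; ++Y)
      for (uint32_t X = 0; X < W; ++X)
        Out[linearLane(X, Y, Z, W, H)] = packDims(W, H, D);
  return Out;
}

static_assert(packDims(2, 3, 1) == 0x010302u, "2x3x1 packs to 0x010302");
```

Under these assumptions a 2x3x1 dispatch yields six identical words of `0x010302`, so any lane that observed a wrong dimension would show up as a single mismatched element, while the 4x1x1 index test reduces to `Output[x] == x`.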
EmilioLaiso pushed a commit that referenced this pull request on Jun 26, 2026.
Depends on #1281

## Summary

Four small PSO raytracing tests stacked on top of #1275, each isolating one shader-observable surface from the 👍 list in #1268. Same shape as the inline-RT batch already in flight in #1271 / #1272 / #1274 / #1276: one `.test` file per behavior, single-purpose shader, exact buffer comparison.

- `dispatch-rays-index.test`: 4x1x1 dispatch; raygen writes `DispatchRaysIndex().x` into `Output[index]`. Confirms the dispatch grid plumbs through to the per-lane system value with no BLAS / TLAS / hit groups in play (RT-pipeline-only, no AS binding).
- `dispatch-rays-dimensions.test`: 2x3x1 dispatch; raygen packs the constant `DispatchRaysDimensions()` into one uint per lane. Confirms every lane sees the host-side `{W, H, D}` (here `{2, 3, 1}`).
- `miss-shader-index.test`: two miss shaders writing distinct sentinels (0xAA / 0xBB). The 2-lane dispatch picks `MissShaderIndex` 0 and 1 respectively; rays start far enough from the geometry that every ray misses. Verifies the SBT miss region's per-record routing.
- `ray-contribution-to-hit-group-index.test`: two hit groups with distinct closest-hit shaders (0xA1 / 0xB2). The 2-lane dispatch picks `RayContributionToHitGroupIndex` 0 and 1, and every ray hits the same triangle. Verifies the SBT hit-group region's per-record routing.

The first two have no AS / Miss / HitGroup in their pipeline at all (just a raygen and a UAV), which doubles as a regression check for the minimum viable RT pipeline shape (one raygen group, zero-sized miss / hit / callable SBT regions). The latter two reuse the single-triangle BLAS / TLAS from `raygen-roundtrip.test`.

All four tests are `# REQUIRES: raytracing-pipeline` with `# XFAIL: Clang`, since `clang-dxc` doesn't yet lower `[shader("…")]` entry points to either DXIL libraries or SPIR-V. With the Metal RT bring-up in #1281 rebased underneath this branch, all four pass natively on Apple Silicon and `Metal` is dropped from the XFAIL list.
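The two routing tests lean on the DXR shader-table indexing rule: the miss record sits at `StartAddress + StrideInBytes * MissShaderIndex`, and the hit-group record index is `RayContributionToHitGroupIndex + MultiplierForGeometryContributionToHitGroupIndex * GeometryContributionToHitGroupIndex + InstanceContributionToHitGroupIndex`. Below is a minimal CPU-side sketch of that arithmetic; `SbtRegion` and the function names are hypothetical stand-ins (shaped like `D3D12_GPU_VIRTUAL_ADDRESS_RANGE_AND_STRIDE`), not code from this repository. Vulkan's `sbtRecordOffset` / `sbtRecordStride` / `instanceShaderBindingTableRecordOffset` indexing follows the same shape.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative stand-in shaped like D3D12_GPU_VIRTUAL_ADDRESS_RANGE_AND_STRIDE.
struct SbtRegion {
  uint64_t StartAddress;
  uint64_t SizeInBytes;
  uint64_t StrideInBytes;
};

// Miss record chosen by TraceRay's MissShaderIndex (only the low 16 bits count).
uint64_t missRecordAddress(const SbtRegion &Miss, uint32_t MissShaderIndex) {
  return Miss.StartAddress + Miss.StrideInBytes * (MissShaderIndex & 0xFFFFu);
}

// Hit-group record chosen by the DXR indexing rule.
uint64_t hitGroupRecordAddress(const SbtRegion &HitGroups,
                               uint32_t RayContribution,    // TraceRay arg, low 4 bits
                               uint32_t GeometryMultiplier, // TraceRay arg, low 4 bits
                               uint32_t GeometryIndex,      // geometry index in the BLAS
                               uint32_t InstanceContribution) { // TLAS instance, 24 bits
  uint64_t Index = (RayContribution & 0xFu) +
                   uint64_t(GeometryMultiplier & 0xFu) * GeometryIndex +
                   (InstanceContribution & 0xFFFFFFu);
  return HitGroups.StartAddress + HitGroups.StrideInBytes * Index;
}

// Callable record chosen by CallShader(ShaderIndex, ...).
uint64_t callableRecordAddress(const SbtRegion &Callables, uint32_t ShaderIndex) {
  return Callables.StartAddress + Callables.StrideInBytes * ShaderIndex;
}
```

Assuming the hit-group test leaves the geometry multiplier and instance contribution at 0, lanes 0 and 1 land on records 0 and 1, i.e. the 0xA1 and 0xB2 closest-hit shaders. A callable lookup indexes its region the same way as the miss lookup.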
## Test plan

Local on an NVIDIA RTX 3060:

- [x] Linux Vulkan (native `offloader`)
- [x] Linux D3D12 (Wine + vkd3d-proton + cross-compiled `offloader.exe`)
- [ ] Windows Vulkan (native `offloader.exe`)
- [ ] Windows D3D12 (native `offloader.exe`)

CI (RT-capable runners):

- [ ] windows-nvidia D3D12 (`RaytracingTier 1.2`)
- [ ] windows-intel VK (`VK_KHR_ray_tracing_pipeline`)
- [x] macOS Metal (`supportsRaytracing`)

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
EmilioLaiso pushed a commit to Traverse-Research/offload-test-suite that referenced this pull request on Jun 26, 2026.
…ldRay

Three small PSO RT tests stacked on llvm#1275, each isolating one shader-observable closest-hit system value from llvm#1268's 👍 list. Same shape as the prior batch in llvm#1277: one .test file per behavior, single-purpose shader, exact buffer comparison.

- `closest-hit-barycentrics.test`: 3-lane dispatch; each lane fires at a clearly-interior point of the single triangle so the closest-hit shader reports a known `BuiltInTriangleIntersectionAttributes::barycentrics` (u, v). Points are picked from the inside of the triangle to avoid the watertight-traversal edge-rule lottery you hit at edge midpoints / vertices (the first cut of this test used midpoint(v0, v1) and one lane silently missed on both backends).
- `closest-hit-primitive-index.test`: three triangles tiled at x = -3, 0, +3 in a single BLAS. The 3-lane dispatch fires straight down at each triangle's centroid; the closest-hit reports `PrimitiveIndex()`, which must match the lane index 0..2.
- `closest-hit-world-ray.test`: 2-lane dispatch with rays from different z heights (1.0 and 2.0). Closest-hit packs `WorldRayOrigin().z`, `WorldRayDirection().z`, and `RayTCurrent()` through the payload; raygen flattens the float3 into a 6-element Float32 buffer. Verifies the system values match the raygen-side `RayDesc` and that t is correctly computed by the traversal.

All three are `# REQUIRES: raytracing-pipeline` with `# XFAIL: Clang`, since `clang-dxc` doesn't yet lower `[shader(…)]` entry points. With the Metal RT bring-up rebased on top, all three pass natively on Apple Silicon and Metal is dropped from the XFAIL list.

Locally verified end-to-end on the user's Linux box: all three pass on Vulkan via the native offloader, and on D3D12 via Wine + vkd3d-proton + the cross-compiled `offloader.exe`, against an NVIDIA RTX 3060. And on macOS 15 / metal-irconverter 3.1.1 via the native offloader: all three PASS.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
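To see why interior points matter, here is a minimal sketch of the barycentric solve for a ray fired straight down onto a triangle, assuming the triangle can be treated in the xy plane. It follows the DXR convention that `barycentrics.x` weights v1, `barycentrics.y` weights v2, and v0 gets `1 - x - y`; `Vec2`, `Bary`, `barycentrics`, `clearlyInterior` and the `0.1` margin are hypothetical names and values chosen for the sketch.

```cpp
#include <cassert>
#include <cmath>
#include <optional>

struct Vec2 { double x, y; };
struct Bary { double u, v; };  // u weights v1, v weights v2 (DXR convention)

// Solve P = v0 + u*(v1 - v0) + v*(v2 - v0) in the xy plane (Cramer's rule).
std::optional<Bary> barycentrics(Vec2 P, Vec2 V0, Vec2 V1, Vec2 V2) {
  double E1x = V1.x - V0.x, E1y = V1.y - V0.y;
  double E2x = V2.x - V0.x, E2y = V2.y - V0.y;
  double Px = P.x - V0.x, Py = P.y - V0.y;
  double Det = E1x * E2y - E2x * E1y;
  if (std::fabs(Det) < 1e-12)
    return std::nullopt;  // degenerate triangle
  double U = (Px * E2y - E2x * Py) / Det;
  double V = (E1x * Py - Px * E1y) / Det;
  return Bary{U, V};
}

// "Clearly interior": all three weights clear a margin, keeping the point away
// from edges and vertices where watertight tie-breaking decides hit vs. miss.
bool clearlyInterior(const Bary &B, double Margin = 0.1) {
  return B.u > Margin && B.v > Margin && (1.0 - B.u - B.v) > Margin;
}
```

An edge midpoint such as `midpoint(v0, v1)` solves to `(0.5, 0)`, which sits exactly on an edge; a centroid solves to `(1/3, 1/3)`, which is also why aiming each lane at a triangle's centroid in the primitive-index test is a safe choice.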
EmilioLaiso pushed a commit that referenced this pull request on Jun 26, 2026.
…ldRay (#1278)

Depends on #1281

## Summary

Three small PSO raytracing tests stacked on #1275, each isolating one shader-observable closest-hit system value from #1268's 👍 list. Same shape as the prior batch in #1277: one `.test` file per behavior, single-purpose shader, exact buffer comparison.

- `closest-hit-barycentrics.test`: 3-lane dispatch; each lane fires at a clearly-interior point of the single triangle so the closest-hit shader reports a known `BuiltInTriangleIntersectionAttributes::barycentrics` (u, v). Points are picked from the inside of the triangle to avoid the watertight-traversal edge-rule lottery you hit at edge midpoints / vertices (the first cut of this test used `midpoint(v0, v1)` and one lane silently missed on both backends).
- `closest-hit-primitive-index.test`: three triangles tiled at x = -3, 0, +3 in a single BLAS. The 3-lane dispatch fires straight down at each triangle's centroid; the closest-hit reports `PrimitiveIndex()`, which must match the lane index 0..2.
- `closest-hit-world-ray.test`: 2-lane dispatch with rays from different z heights (1.0 and 2.0). Closest-hit packs `WorldRayOrigin().z`, `WorldRayDirection().z`, and `RayTCurrent()` through the payload; raygen flattens the float3 into a 6-element Float32 buffer. Verifies the system values match the raygen-side `RayDesc` and that t is correctly computed by the traversal.

All three are `# REQUIRES: raytracing-pipeline` with `# XFAIL: Clang`, since `clang-dxc` doesn't yet lower `[shader("…")]` entry points. With the Metal RT bring-up in #1281 rebased underneath this branch, all three pass natively on Apple Silicon and `Metal` is dropped from the XFAIL list.
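A minimal sketch of the expected `closest-hit-world-ray.test` buffer, assuming the triangle lies in the z = 0 plane, both rays travel along (0, 0, -1), and each lane stores `{origin.z, direction.z, t}` in that order; the PR states none of these details explicitly, and `planeHitT` / `expectedWorldRayBuffer` are hypothetical helpers.

```cpp
#include <cassert>
#include <vector>

// Parametric hit distance for a ray reaching the plane z = PlaneZ.
// t is measured in units of the direction vector as given, so an
// unnormalized direction scales t.
double planeHitT(double OriginZ, double DirZ, double PlaneZ) {
  return (PlaneZ - OriginZ) / DirZ;
}

// Expected 6-float Output[] for two lanes: {origin.z, dir.z, t} per lane,
// assuming the triangle lies in z = 0 and rays fire along (0, 0, -1).
std::vector<float> expectedWorldRayBuffer() {
  const double OriginZs[2] = {1.0, 2.0};
  const double DirZ = -1.0, PlaneZ = 0.0;
  std::vector<float> Out;
  for (double Oz : OriginZs) {
    Out.push_back(float(Oz));
    Out.push_back(float(DirZ));
    Out.push_back(float(planeHitT(Oz, DirZ, PlaneZ)));
  }
  return Out;
}
```

Because `RayTCurrent()` is a parametric distance along the direction as supplied in `RayDesc`, halving the direction's length would double t; keeping the direction unit-length makes t equal to each origin's height above the triangle.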
## Test plan

Local on an NVIDIA RTX 3060:

- [x] Linux Vulkan (native `offloader`)
- [x] Linux D3D12 (Wine + vkd3d-proton + cross-compiled `offloader.exe`)
- [ ] Windows Vulkan (native `offloader.exe`)
- [ ] Windows D3D12 (native `offloader.exe`)

CI (RT-capable runners):

- [ ] windows-nvidia D3D12 (`RaytracingTier 1.2`)
- [ ] windows-intel VK (`VK_KHR_ray_tracing_pipeline`)
- [x] macOS Metal (`supportsRaytracing`)

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
EmilioLaiso pushed a commit to Traverse-Research/offload-test-suite that referenced this pull request on Jun 26, 2026.
Three tests stacked on llvm#1275 covering features that inline RT (RayQuery in compute) physically can't express; they're only reachable through a DispatchRays-driven RT pipeline.

- `trace-ray-recursion.test`: closest-hit fires a secondary TraceRay from an above-the-triangle origin. The first-level CH sees payload=0 → bumps it to 0x1 → calls TraceRay. The second-level CH sees payload!=0 → writes 0x10. On unwind, the first-level CH ORs in 0x100. Final payload: 0x110 (272 decimal). `RayTracingPipelineConfig.MaxTraceRecursionDepth: 2` keeps both TraceRay calls within budget.
- `ray-flag-skip-closest-hit.test`: two lanes fire identical rays at the same triangle. Lane 0 uses RAY_FLAG_NONE, so CH runs and writes 0xBEEF. Lane 1 uses RAY_FLAG_SKIP_CLOSEST_HIT_SHADER (PSO-only; inline RT has no equivalent), so CH is skipped and the payload keeps its initial 0xAAAA. Output: [0xBEEF, 0xAAAA].
- `callable-shader.test`: two callable shaders writing distinct sentinels (0xAAAA / 0xBBBB). Each lane calls `CallShader(Idx, ...)`, so the SBT callable region's per-record routing is exercised independently of the hit-group / miss routing already covered in llvm#1277. Callable shaders themselves don't exist in inline RT. This test stays `# XFAIL: Clang, Vulkan` because DXC's `-spirv` backend lists every callable's IncomingCallableDataKHR variable in every callable entry point's interface, violating VUID-StandaloneSpirv-IncomingCallableDataKHR-04706 and getting rejected by vkCreateShaderModule. The framework's Vulkan SBT / callable path is correct: running `spirv-opt --remove-unused-interface-variables` on the DXC output cleans the SPIR-V and the test passes natively. Track upstream.

All three pass on Metal once the bring-up PR ahead of this commit sets the raygen pipeline's `setMaxCallStackDepth(MaxTraceRecursionDepth)` so nested TraceRay actually unwinds (with the default of 1, the second TraceRay was silently dropped and the recursion test produced 0x1 instead of 0x110).
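The payload arithmetic in the recursion and skip-closest-hit tests can be mirrored on the CPU. This is a hypothetical model of the logic described above, not the HLSL from the tests: `traceRay` / `closestHit` / `skipChLane` stand in for the shader stages, every ray is assumed to hit, no any-hit shader is assumed, and `RAY_FLAG_SKIP_CLOSEST_HIT_SHADER` uses its HLSL value of 0x08. It does not attempt to reproduce the Metal `maxCallStackDepth` failure mode.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t RAY_FLAG_NONE = 0x00;
constexpr uint32_t RAY_FLAG_SKIP_CLOSEST_HIT_SHADER = 0x08;  // HLSL RAY_FLAG value

void closestHit(uint32_t &Payload, uint32_t Depth, uint32_t MaxDepth);

// Depth counts TraceRay calls currently on the stack (1 = the raygen's call).
void traceRay(uint32_t &Payload, uint32_t Depth, uint32_t MaxDepth) {
  assert(Depth <= MaxDepth && "exceeded MaxTraceRecursionDepth");
  closestHit(Payload, Depth, MaxDepth);  // every ray in the model hits
}

void closestHit(uint32_t &Payload, uint32_t Depth, uint32_t MaxDepth) {
  if (Payload == 0) {
    Payload = 0x1;                           // first level: mark entry
    traceRay(Payload, Depth + 1, MaxDepth);  // secondary ray
    Payload |= 0x100;                        // on unwind: OR in 0x100
  } else {
    Payload = 0x10;                          // second level: overwrite
  }
}

// One lane of the skip-CH test: the ray hits, and the flag decides whether
// the closest-hit body runs (no any-hit shader assumed).
uint32_t skipChLane(uint32_t Flags) {
  uint32_t Payload = 0xAAAA;  // initial payload set by raygen
  if (!(Flags & RAY_FLAG_SKIP_CLOSEST_HIT_SHADER))
    Payload = 0xBEEF;         // closest-hit body
  return Payload;
}
```

With `MaxDepth = 2` the nested call stays within budget and the model ends at 0x110; with `MaxDepth = 1` the `assert` in `traceRay` fires, which is how the sketch flags a recursion budget that is too small.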
Locally verified on the user's Linux box:

- Vulkan via the native offloader: recursion + skip-CH PASS; callable PASSes after spirv-opt cleanup (XFAILs from raw DXC SPIR-V as documented above).
- D3D12 via Wine + vkd3d-proton + cross-compiled offloader.exe: all three PASS.

And on macOS 15 / metal-irconverter 3.1.1 via the native offloader:

- All three PASS (recursion + skip-CH + callable).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Depends on #1273
## Summary
Second per-backend bring-up in the PSO raytracing series (#1268). Stacks on top of #1273 (the Vulkan bring-up + the API surface). Mirrors that PR for D3D12: builds an `ID3D12StateObject` from the YAML schema, hands out shader identifiers via `ID3D12StateObjectProperties`, lays out the SBT in an upload heap, and routes `DispatchRays` through `ID3D12GraphicsCommandList4` (same query path the AS-build already uses).

D3D12 implementation:
- `DXRayTracingPipelineState` derives from `DXPipelineState` with an `IsRayTracing` flag on the base for `classof`, matching the `VulkanPipelineState` pattern. It carries the `ID3D12StateObject`, a cached `ID3D12StateObjectProperties`, and a `StringMap<const void *>` resolving each raygen / miss / callable shader's `EntryPoint` and each hit-group's `Name` to its 32-byte shader identifier blob. The identifiers are driver-owned and stay alive for the `Properties` COM lifetime, so the PSO keeps `Properties` alive. Closest-hit / any-hit / intersection shaders are not directly addressable via `GetShaderIdentifier` (they're imported through a hit-group subobject), so the eager-cache loop skips them.
- `DXShaderBindingTable` holds a single upload-heap buffer plus four pre-built `D3D12_DISPATCH_RAYS_DESC` ranges (raygen, miss, hit-group, callable): `RANGE` for raygen since it's always one record, and `RANGE_AND_STRIDE` for the others.
- `createPipelineRT` builds a `CD3DX12_STATE_OBJECT_DESC` with subobjects for the DXIL library (one export per `Shader` entry), per-hit-group subobjects with closest-hit / any-hit / intersection imports, the pipeline shader config (max payload + max attribute bytes), pipeline config (max recursion depth), and a global root signature subobject. The root signature comes from the library's embedded `RTS0` part when present, falling back to the `BindingsDesc` path (matching the existing compute / raster pipeline behaviour). Wide strings for the subobject exports live in a `SmallVector` that outlives the `SODesc`, since the helper classes store pointers into the strings rather than copying.
- `createShaderBindingTable` lays out each entry as `[identifier][LocalRootData][padding-to-stride]` with per-region stride = `align(D3D12_SHADER_IDENTIFIER_SIZE_IN_BYTES + max-LocalRootData-in-region, D3D12_RAYTRACING_SHADER_RECORD_BYTE_ALIGNMENT)` and per-region size = `align(count * stride, D3D12_RAYTRACING_SHADER_TABLE_BYTE_ALIGNMENT)`.
  The buffer lives in an upload heap with `D3D12_RESOURCE_STATE_GENERIC_READ` (a PR3 simplification; a staging copy into a default heap is a follow-up).
- `dispatchRays` queries the underlying `CommandListX` for `ID3D12GraphicsCommandList4` (matching the AS-build path), binds the global root signature via `SetComputeRootSignature`, calls `SetPipelineState1` with the state object, and issues `DispatchRays` with a `D3D12_DISPATCH_RAYS_DESC` populated from the SBT's four ranges plus the dispatch dimensions. The descriptor heap + descriptor-table bindings are set up by the existing `createComputeCommands` helper before the encoder is created.
- `createComputeCommands` grows an `isRayTracing` branch at the dispatch point so it calls `dispatchRays` instead of `dispatch`, reusing all of the descriptor-heap and root-signature wiring.
- `InvocationState` carries a `ShaderBindingTable` `unique_ptr` that's only populated for RT pipelines.
- `executeProgram`'s `isRayTracing` branch builds a `RayTracingPipelineCreateDesc` from `Pipeline.Shaders` / `HitGroups` / `RTConfig`, calls `createPipelineRT` then `createShaderBindingTable`, then re-enters `createComputeCommands`, which dispatches via the new RT path.

Test side:
`raygen-roundtrip.test`'s `XFAIL` becomes `Clang, Metal`: DirectX should PASS via this implementation on Windows CI (and via Wine + vkd3d-proton locally on Linux). The `Clang` token still catches the compile failure on `clang-dxc`, since `[shader("raygeneration")]` doesn't yet lower to either DXIL libraries or SPIR-V on that path.

## Test plan
Local on an NVIDIA RTX 3060:
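- Linux Vulkan (native `offloader`)
- Linux D3D12 (Wine + vkd3d-proton + cross-compiled `offloader.exe`)
- Windows Vulkan (native `offloader.exe`)
- Windows D3D12 (native `offloader.exe`)

CI (RT-capable runners):

- windows-nvidia D3D12 (`RaytracingTier 1.2`)
- windows-intel VK (`VK_KHR_ray_tracing_pipeline`)
- macOS Metal (`supportsRaytracing`)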
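The per-region stride and size formulas from the `createShaderBindingTable` description can be checked with a small sketch. It restates the three d3d12.h constants (32-byte shader identifiers, 32-byte record alignment, 64-byte table alignment) so it builds without the Windows SDK; `RegionLayout`, `layoutRegion` and `layoutSbt` are hypothetical names, and packing the regions back to back in raygen, miss, hit-group, callable order is an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// d3d12.h values, restated so the sketch builds without the Windows SDK.
constexpr uint64_t kShaderIdentifierSize = 32;  // D3D12_SHADER_IDENTIFIER_SIZE_IN_BYTES
constexpr uint64_t kRecordAlignment = 32;       // D3D12_RAYTRACING_SHADER_RECORD_BYTE_ALIGNMENT
constexpr uint64_t kTableAlignment = 64;        // D3D12_RAYTRACING_SHADER_TABLE_BYTE_ALIGNMENT

constexpr uint64_t alignUp(uint64_t V, uint64_t A) { return (V + A - 1) / A * A; }

struct RegionLayout {
  uint64_t Offset;  // byte offset of the region inside the SBT buffer
  uint64_t Stride;  // bytes per record
  uint64_t Size;    // bytes reserved for the region
};

// LocalRootDataSizes holds one entry per record in the region.
RegionLayout layoutRegion(const std::vector<uint64_t> &LocalRootDataSizes,
                          uint64_t Offset) {
  uint64_t MaxLocal = 0;
  for (uint64_t S : LocalRootDataSizes)
    MaxLocal = S > MaxLocal ? S : MaxLocal;
  // stride = align(identifier + largest local root data, record alignment)
  uint64_t Stride = alignUp(kShaderIdentifierSize + MaxLocal, kRecordAlignment);
  // size = align(count * stride, table alignment); an empty region is 0 bytes
  uint64_t Size = alignUp(LocalRootDataSizes.size() * Stride, kTableAlignment);
  return {Offset, Stride, Size};
}

// Lays out the regions back to back, each starting where the previous ended.
std::vector<RegionLayout>
layoutSbt(const std::vector<std::vector<uint64_t>> &Regions) {
  std::vector<RegionLayout> Out;
  uint64_t Offset = 0;
  for (const auto &R : Regions) {
    Out.push_back(layoutRegion(R, Offset));
    Offset += Out.back().Size;
  }
  return Out;
}
```

For the minimum viable pipeline (one raygen record, no local root data, empty miss / hit / callable regions) this yields a 64-byte raygen region followed by three zero-sized regions. Because every region size is rounded up to 64 bytes, each region's start stays table-aligned as long as the buffer's own base address is 64-byte aligned.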