
[VK] Add ray tracing pipeline, SBT, and DispatchRays bring-up #1273

Merged
manon-traverse merged 3 commits into llvm:main from Traverse-Research:rt-pso-vulkan
Jun 18, 2026

Conversation

@MarijnS95 MarijnS95 commented Jun 3, 2026

Collaborator

Summary

First per-backend bring-up in the PSO raytracing series (#1268). Stacks on top of #1270 (foundational schema + lit infrastructure + XFAILed test). Adds the API surface needed by the upcoming D3D12 and Metal PRs plus the Vulkan implementation behind it.

API surface:

  • ComputeEncoder::dispatchRays(PSO, SBT, W, H, D) virtual on the existing compute encoder (no separate RayTracingEncoder).
  • Device::createPipelineRT + Device::createShaderBindingTable virtuals with a new RayTracingPipelineCreateDesc carrying the DXIL library blob, the shader entry points (Stage + EntryPoint), the hit-group list, and the RayTracingPipelineConfig.
  • include/API/ShaderBindingTable.h holding the abstract runtime base; backend SBT classes derive from it with LLVM-style classof / cast<>.
  • Rename: the YAML struct from PR #1270 (Add RayTracing pipeline kind, shader stages, and YAML schema) changes from ShaderBindingTable to ShaderBindingTableDesc so the bare name is free for the runtime class (parallel to BLASDesc / TLASDesc vs AccelerationStructure). The YAML key stays ShaderBindingTable:.
  • D3D12 and Metal stub the new methods with not-yet-supported errors; their bring-up lands in follow-up PRs.
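
The LLVM-style classof / cast<> arrangement mentioned for the SBT classes can be sketched without LLVM headers. This is a minimal stand-in, not the PR's code: the GPUAPI enumerators, the members of the two derived classes, and the dynCastTo helper are assumptions for illustration. Real code would rely on llvm::isa<> / llvm::cast<> from llvm/Support/Casting.h, which look for the same static classof hook.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical backend tag; the project has its own GPUAPI enum.
enum class GPUAPI { DirectX, Vulkan, Metal };

// Abstract runtime base, in the spirit of include/API/ShaderBindingTable.h.
class ShaderBindingTable {
  GPUAPI API;

protected:
  explicit ShaderBindingTable(GPUAPI A) : API(A) {}

public:
  virtual ~ShaderBindingTable() = default;
  GPUAPI getAPI() const { return API; }
};

// Hypothetical Vulkan-side SBT; the field is a placeholder.
class VKShaderBindingTable : public ShaderBindingTable {
public:
  uint64_t BufferAddress = 0;
  VKShaderBindingTable() : ShaderBindingTable(GPUAPI::Vulkan) {}
  // classof answers "is this base pointer really one of us?".
  static bool classof(const ShaderBindingTable *S) {
    return S->getAPI() == GPUAPI::Vulkan;
  }
};

// Hypothetical D3D12-side SBT, present only to show a failed cast.
class DXShaderBindingTable : public ShaderBindingTable {
public:
  DXShaderBindingTable() : ShaderBindingTable(GPUAPI::DirectX) {}
  static bool classof(const ShaderBindingTable *S) {
    return S->getAPI() == GPUAPI::DirectX;
  }
};

// Stand-in for llvm::dyn_cast<>: returns nullptr on a kind mismatch.
template <typename To, typename From> To *dynCastTo(From *F) {
  return To::classof(F) ? static_cast<To *>(F) : nullptr;
}
```

A backend that receives a `ShaderBindingTable *` can then recover its own concrete type, and a mismatched table yields a null pointer instead of undefined behaviour.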

Vulkan implementation:

  • The pre-existing RaytracingFunctions RT struct lumped AS and RT-pipeline entry points together; they split into ASFunctions AS + RTPipelineFunctions RT so the names match the actual feature-gate split (AS + ray-query is a complete configuration; RT pipeline layers on top). HasRayTracingSupport renames to HasASSupport; HasRTPipelineSupport tracks the new extension.
  • VK_KHR_ray_tracing_pipeline is requested when reported, with VkPhysicalDeviceRayTracingPipelineFeaturesKHR chained pre-query and the gating rayTracingPipeline bool checked post-query (matches the AS / BDA pattern from #1232, Add RT acceleration structure abstraction with size queries and resource allocation). Sub-features the tests don't exercise (capture-replay / indirect-trace / traversal-primitive-culling) are cleared.
  • Function pointers vkCreateRayTracingPipelinesKHR, vkGetRayTracingShaderGroupHandlesKHR, vkCmdTraceRaysKHR resolve once at device creation. VkPhysicalDeviceRayTracingPipelinePropertiesKHR is cached at the same time for SBT handle size / alignment / base alignment.
  • VKRayTracingPipelineState derives from VulkanPipelineState; an IsRayTracing flag on the base lets the existing Vulkan cast<> path stay polymorphic without adding a new GPUAPI value. The derived class also carries a StringMap<uint32_t> resolving each shader EntryPoint or hit-group Name to its index in the pipeline's group array, plus per-bucket counts so the SBT builder can slice the contiguous handle blob into raygen / miss / hit / callable regions.
  • createPipelineRT builds a single VkShaderModule (the DXIL library compiles to one SPIR-V module with multiple OpEntryPoints), one VkPipelineShaderStageCreateInfo per Shader entry, and one VkRayTracingShaderGroupCreateInfoKHR per general shader / hit group. Pipeline layout uses the same createPipelineLayout helper as the compute path, gated on all six RT stage flags so any binding can be consumed from any RT shader.
  • createShaderBindingTable allocates a host-visible coherent buffer big enough for four regions, then lays out each entry as [handle bytes][LocalRootData bytes][padding-to-stride]. Per-region stride = align(handleSize + max-LocalRootData-in-region, handleAlignment); per-region size = align(count * stride, baseAlignment). LocalRootData support comes for free from the SBT schema in PR #1270 (Add RayTracing pipeline kind, shader stages, and YAML schema); the test doesn't exercise it yet. Each region's VkStridedDeviceAddressRegionKHR derives from the buffer's vkGetBufferDeviceAddress.
  • dispatchRays binds the pipeline at VK_PIPELINE_BIND_POINT_RAY_TRACING_KHR, emits a pre-barrier with ACCELERATION_STRUCTURE_READ_BIT_KHR | SHADER_READ_BIT | SHADER_WRITE_BIT dst access into RAY_TRACING_SHADER_BIT_KHR, then calls vkCmdTraceRaysKHR with the SBT's four region structs.
  • createCommands picks the new bind point for RT pipelines so vkCmdBindDescriptorSets binds to the right point. executeProgram's isRayTracing branch builds a RayTracingPipelineCreateDesc from the Pipeline, calls createPipelineRT then createShaderBindingTable, and keeps both on InvocationState for the dispatch.
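
The per-region stride and size formulas from the createShaderBindingTable bullet can be written out as plain arithmetic. The sketch below is an illustration under assumptions, not the PR's code: SBTLimits, SBTRegion, alignUp and layoutRegion are hypothetical names, and the three limits stand for shaderGroupHandleSize, shaderGroupHandleAlignment and shaderGroupBaseAlignment from VkPhysicalDeviceRayTracingPipelinePropertiesKHR.

```cpp
#include <cassert>
#include <cstdint>

// Round Value up to the next multiple of Alignment (Alignment > 0).
constexpr uint64_t alignUp(uint64_t Value, uint64_t Alignment) {
  return ((Value + Alignment - 1) / Alignment) * Alignment;
}

// Hypothetical subset of the cached ray tracing pipeline properties.
struct SBTLimits {
  uint32_t HandleSize;      // shaderGroupHandleSize
  uint32_t HandleAlignment; // shaderGroupHandleAlignment
  uint32_t BaseAlignment;   // shaderGroupBaseAlignment
};

struct SBTRegion {
  uint64_t Offset; // byte offset of the region inside the SBT buffer
  uint64_t Stride; // bytes per record
  uint64_t Size;   // bytes reserved for the whole region
};

// Lay out one region (raygen, miss, hit or callable) starting at or after
// Offset. MaxLocalRootData is the largest LocalRootData payload in the region.
SBTRegion layoutRegion(const SBTLimits &L, uint64_t Offset, uint32_t Count,
                       uint32_t MaxLocalRootData) {
  SBTRegion R{};
  R.Offset = alignUp(Offset, L.BaseAlignment);
  R.Stride = alignUp(uint64_t(L.HandleSize) + MaxLocalRootData,
                     L.HandleAlignment);
  R.Size = alignUp(uint64_t(Count) * R.Stride, L.BaseAlignment);
  return R;
}
```

With illustrative limits of 32 / 32 / 64 bytes, a single record without local data gets a 32-byte stride in a 64-byte region, and three records carrying 8 bytes of local data each get a 64-byte stride in a 192-byte region. Adding each region's Offset to the buffer's device address would give the deviceAddress of the matching VkStridedDeviceAddressRegionKHR.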

Test side: raygen-roundtrip.test's XFAIL becomes Clang, DirectX, Metal. On a DXC + Vulkan combo with the device reporting VK_KHR_ray_tracing_pipeline this should PASS; the Clang token still catches the compile failure on the Linux + clang-dxc loop where [shader("raygeneration")] doesn't yet lower to SPIR-V.

Test plan

Local on an NVIDIA RTX 3060:

  • Linux Vulkan (native offloader)
  • Linux D3D12 (Wine + vkd3d-proton + cross-compiled offloader.exe)
  • Windows Vulkan (native offloader.exe)
  • Windows D3D12 (native offloader.exe)

CI (RT-capable runners):

  • windows-nvidia D3D12 (RaytracingTier 1.2)
  • windows-intel VK (VK_KHR_ray_tracing_pipeline)
  • macOS Metal (supportsRaytracing)

MarijnS95 added a commit to Traverse-Research/offload-test-suite that referenced this pull request Jun 3, 2026
Second per-backend bring-up in the PSO raytracing series (llvm#1268).
Mirrors PR llvm#1273 for D3D12: builds an ID3D12StateObject from the YAML
schema, hands out shader identifiers via ID3D12StateObjectProperties,
lays out the SBT in an upload heap, and routes DispatchRays through
ID3D12GraphicsCommandList4 (same query path the AS build already uses).

DXRayTracingPipelineState derives from DXPipelineState with an
IsRayTracing flag on the base for classof — matching the
VulkanPipelineState pattern. It carries the ID3D12StateObject + a
cached ID3D12StateObjectProperties + a StringMap<const void *> that
resolves each shader EntryPoint or hit-group Name to its 32-byte shader
identifier blob. The identifiers are driver-owned and stay alive for
the Properties COM lifetime, so the PSO keeps Properties alive.

DXShaderBindingTable holds a single upload-heap buffer plus four
pre-built D3D12_DISPATCH_RAYS_DESC ranges (raygen, miss, hit-group,
callable) — `RANGE` for raygen since it's always one record, and
`RANGE_AND_STRIDE` for the others.

createPipelineRT builds a CD3DX12_STATE_OBJECT_DESC with subobjects
for the DXIL library (one export per Shader entry), per-hit-group
subobjects with closest-hit / any-hit / intersection imports, the
pipeline shader config (max payload + max attribute bytes), pipeline
config (max recursion depth), and a global root signature subobject.
The root signature comes from the library's embedded RTS0 part when
present, falling back to the BindingsDesc path (matching the existing
compute / raster pipeline behaviour). Wide strings for the subobject
exports live in a SmallVector that outlives the SODesc, since the
helper classes store pointers into the strings rather than copying.
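
The lifetime concern in the last sentence can be illustrated in isolation. This is a sketch under assumptions, not the commit's code: ExportRef, widen and buildExports are hypothetical, and std::deque is swapped in for the SmallVector the message mentions, both to avoid LLVM headers and because std::deque::push_back leaves existing elements in place.

```cpp
#include <cassert>
#include <deque>
#include <string>
#include <vector>

// Hypothetical stand-in for a state-object export: like the D3D12 helper
// classes described above, it keeps a pointer to the name, not a copy.
struct ExportRef {
  const wchar_t *Name;
};

// Widen an ASCII entry-point name (non-ASCII input is out of scope here).
std::wstring widen(const std::string &S) {
  return std::wstring(S.begin(), S.end());
}

// Names is owned by the caller and must outlive the returned ExportRefs.
// std::deque::push_back never invalidates references to existing elements,
// so earlier c_str() pointers stay valid while more names are appended.
std::vector<ExportRef> buildExports(const std::vector<std::string> &EntryPoints,
                                    std::deque<std::wstring> &Names) {
  std::vector<ExportRef> Exports;
  for (const std::string &EP : EntryPoints) {
    Names.push_back(widen(EP));
    Exports.push_back({Names.back().c_str()});
  }
  return Exports;
}
```

With a contiguous container such as a vector, the same guarantee would require filling the container completely before taking any pointers, since growth can move the strings.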

createShaderBindingTable lays out each entry as
[identifier][LocalRootData][padding-to-stride] with per-region
stride = align(D3D12_SHADER_IDENTIFIER_SIZE_IN_BYTES +
max-LocalRootData-in-region,
D3D12_RAYTRACING_SHADER_RECORD_BYTE_ALIGNMENT) and per-region
size = align(count * stride,
D3D12_RAYTRACING_SHADER_TABLE_BYTE_ALIGNMENT). The buffer lives in an
upload heap with D3D12_RESOURCE_STATE_GENERIC_READ, a PR3
simplification; a staging copy into a default heap is a follow-up.

dispatchRays queries the underlying CommandListX for
ID3D12GraphicsCommandList4 (matching the AS-build path), binds the
global root signature via SetComputeRootSignature, calls
SetPipelineState1 with the state object, and issues DispatchRays with
a D3D12_DISPATCH_RAYS_DESC populated from the SBT's four ranges plus
the dispatch dimensions. The descriptor heap + descriptor-table bindings
are set up by the existing createComputeCommands helper before the
encoder is created.

createComputeCommands grows an isRayTracing branch at the dispatch
point so it calls dispatchRays instead of dispatch, reusing all of the
descriptor-heap and root-signature wiring. InvocationState carries a
ShaderBindingTable unique_ptr that's only populated for RT pipelines.

executeProgram's isRayTracing branch builds a
RayTracingPipelineCreateDesc from Pipeline.Shaders / HitGroups /
RTConfig, calls createPipelineRT then createShaderBindingTable, then
re-enters createComputeCommands, which dispatches via the new RT path.

raygen-roundtrip.test's XFAIL becomes Clang, Metal — DirectX should
PASS via this implementation on Windows CI (and via Wine + vkd3d-proton
locally on Linux). The Clang token still catches the compile failure
on clang-dxc since [shader("raygeneration")] doesn't yet lower to
either DXIL libraries or SPIR-V on that path.

Locally verified by cross-compiling lib/API/DX/Device.cpp via
`clang++ --target=x86_64-pc-windows-msvc` against the xwin Windows SDK
headers and the project's bundled DirectX-Headers. Runtime verification
is left to Windows CI.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
First per-backend bring-up in the PSO raytracing series (llvm#1268). Adds
the API surface (ComputeEncoder::dispatchRays, Device::createPipelineRT,
Device::createShaderBindingTable, RayTracingPipelineCreateDesc) plus the
Vulkan implementation behind it. D3D12 and Metal stub the new methods
with not-yet-supported errors; their bring-up lands in follow-up PRs.

The pre-existing YAML schema struct from PR llvm#1270 is renamed
ShaderBindingTable -> ShaderBindingTableDesc so the bare name is free
for the runtime resource class (parallel to BLASDesc / TLASDesc vs
AccelerationStructure). A new include/API/ShaderBindingTable.h holds
the abstract runtime base; concrete backend SBT classes derive from it
with LLVM-style classof / cast<>.

The VulkanDevice's prior `RaytracingFunctions RT` lumped AS and RT
pipeline entry points together. They split into two structs —
`ASFunctions AS` and `RTPipelineFunctions RT` — matching the actual
feature-gate split (AS+ray-query is a complete configuration on its
own, RT pipeline is layered on top). `HasRayTracingSupport` renames
to `HasASSupport`, and a separate `HasRTPipelineSupport` tracks the
new VK_KHR_ray_tracing_pipeline extension.

Vulkan bring-up:
  - Extension: VK_KHR_ray_tracing_pipeline is requested when reported,
    with VkPhysicalDeviceRayTracingPipelineFeaturesKHR chained into the
    pre-create feature query. After the query the gating
    rayTracingPipeline bool is checked; capture-replay / trace-rays-
    indirect / traversal-primitive-culling sub-features are cleared
    since the tests don't exercise them.
  - Function pointers: vkCreateRayTracingPipelinesKHR,
    vkGetRayTracingShaderGroupHandlesKHR, vkCmdTraceRaysKHR.
  - Properties: VkPhysicalDeviceRayTracingPipelinePropertiesKHR is
    cached at device-create time for SBT handle size / alignment /
    base-alignment.
  - VKRayTracingPipelineState derives from VulkanPipelineState; an
    IsRayTracing flag on the base lets the existing Vulkan cast<>
    path stay polymorphic without adding a new GPUAPI value.
    classof tests both the API and the flag. The derived class also
    carries a StringMap<uint32_t> resolving each shader EntryPoint or
    HitGroup Name to its index in the pipeline's group array, plus
    per-bucket counts so the SBT builder can slice the contiguous
    handle blob into raygen / miss / hit / callable regions.
  - createPipelineRT builds a single VkShaderModule (the DXIL library
    compiles to one SPIR-V module with multiple OpEntryPoints), then
    one VkPipelineShaderStageCreateInfo per Shader entry and one
    VkRayTracingShaderGroupCreateInfoKHR per general shader / hit
    group. Pipeline layout is shared with the compute path via
    createPipelineLayout, gated on all six RT stage flags so any
    binding can be consumed from any RT shader.
  - createShaderBindingTable allocates a host-visible coherent buffer
    big enough for four regions and lays out each entry as
    [handle bytes][localRootData bytes][padding-to-stride]. Per-region
    stride = align(handleSize + max-local-root-data-in-region,
    handleAlignment); per-region size = align(count * stride,
    baseAlignment). LocalRootData support comes free from the PR1 SBT
    schema; the test doesn't exercise it yet. Each region's
    VkStridedDeviceAddressRegionKHR derives from the buffer's
    vkGetBufferDeviceAddress.
  - dispatchRays binds the pipeline at
    VK_PIPELINE_BIND_POINT_RAY_TRACING_KHR, emits a pre-barrier with
    AS_READ + SHADER_READ/WRITE dst access into
    RAY_TRACING_SHADER_BIT_KHR, then calls vkCmdTraceRaysKHR with the
    SBT's four region structs.
  - createCommands picks the new bind point for RT pipelines so
    vkCmdBindDescriptorSets binds to the right point. executeProgram's
    isRayTracing branch builds a RayTracingPipelineCreateDesc from the
    YAML, calls createPipelineRT then createShaderBindingTable, and
    keeps both on InvocationState for the dispatch.
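
The name-to-group-index lookup and the copy from the contiguous handle blob into an SBT record can be sketched as follows. Everything here is an assumption for illustration: writeRecord is a hypothetical helper, std::map stands in for the StringMap<uint32_t> named above, and Handles plays the role of the blob that vkGetRayTracingShaderGroupHandlesKHR would fill (HandleSize bytes per group, in group order).

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <map>
#include <string>
#include <vector>

// Write one SBT record: [handle bytes][LocalRootData bytes][zero padding].
// Returns false if the name is unknown or the data does not fit the stride.
bool writeRecord(const std::map<std::string, uint32_t> &GroupIndex,
                 const std::vector<uint8_t> &Handles, uint32_t HandleSize,
                 const std::string &Name,
                 const std::vector<uint8_t> &LocalRootData, uint8_t *Record,
                 uint64_t Stride) {
  auto It = GroupIndex.find(Name);
  if (It == GroupIndex.end())
    return false;
  // Group N's handle starts at byte N * HandleSize of the blob.
  const uint64_t Src = uint64_t(It->second) * HandleSize;
  if (Src + HandleSize > Handles.size())
    return false;
  if (HandleSize + LocalRootData.size() > Stride)
    return false;
  std::memset(Record, 0, Stride); // padding-to-stride
  std::memcpy(Record, Handles.data() + Src, HandleSize);
  if (!LocalRootData.empty())
    std::memcpy(Record + HandleSize, LocalRootData.data(),
                LocalRootData.size());
  return true;
}
```

A caller would invoke this once per entry in each region, advancing Record by that region's stride; the per-bucket counts mentioned above decide where one region ends and the next begins.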

raygen-roundtrip.test now expects DirectX/Metal/Clang to XFAIL; on a
DXC + Vulkan combo with VK_KHR_ray_tracing_pipeline supported the test
should PASS via this implementation. On the Linux + clang-dxc loop the
test still XFAILs because clang-dxc doesn't yet lower
[shader("raygeneration")] entry points to SPIR-V, so the Clang XFAIL
token catches the compile failure. CI on a working DXC install will
exercise the runtime path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@EmilioLaiso EmilioLaiso marked this pull request as ready for review June 17, 2026 09:59
The offloader assumed one compiled object file per pipeline shader stage
and bound them 1:1 to Pipeline::Shaders. That holds for compute, raster,
and mesh pipelines, but ray tracing compiles every entry point (raygen,
miss, closest-hit, ...) into a single lib_6_5 library blob.
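
The difference between the two binding schemes can be made concrete with a small lookup. The commit message does not show how the offloader was changed, so this is only a sketch of the distinction it describes; objectIndexFor and its parameters are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>

// Hypothetical helper: which compiled object holds shader ShaderIdx?
// Returns std::nullopt when the object count does not fit the pipeline kind.
std::optional<std::size_t> objectIndexFor(bool IsRayTracing,
                                          std::size_t ShaderIdx,
                                          std::size_t NumShaders,
                                          std::size_t NumObjects) {
  if (ShaderIdx >= NumShaders)
    return std::nullopt;
  if (IsRayTracing) {
    // Every entry point lives in the one library blob.
    if (NumObjects != 1)
      return std::nullopt;
    return std::size_t{0};
  }
  // Compute / raster / mesh: one object per stage, bound 1:1.
  if (NumObjects != NumShaders)
    return std::nullopt;
  return ShaderIdx;
}
```

Under this scheme a ray tracing pipeline with three entry points and one library blob maps every shader to object 0, while a three-stage raster pipeline maps shader 2 to object 2.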

@manon-traverse manon-traverse left a comment

Contributor

Just one question but that might just be because of a badly written comment.

Comment thread include/API/Device.h Outdated
Comment thread tools/offloader/offloader.cpp
@manon-traverse

Contributor

Only irrelevant tests are failing and two people approved it, so let's merge it.

@manon-traverse manon-traverse merged commit bad853c into llvm:main Jun 18, 2026
22 of 26 checks passed
EmilioLaiso pushed a commit to Traverse-Research/offload-test-suite that referenced this pull request Jun 19, 2026
Second per-backend bring-up in the PSO raytracing series (llvm#1268).
Mirrors PR llvm#1273 for D3D12: builds an ID3D12StateObject from the YAML
schema, hands out shader identifiers via ID3D12StateObjectProperties,
lays out the SBT in an upload heap, and routes DispatchRays through
ID3D12GraphicsCommandList4 (same query path the AS build already uses).

DXRayTracingPipelineState derives from DXPipelineState with an
IsRayTracing flag on the base for classof — matching the
VulkanPipelineState pattern. It carries the ID3D12StateObject + a
cached ID3D12StateObjectProperties + a StringMap<const void *> that
resolves each shader EntryPoint or hit-group Name to its 32-byte shader
identifier blob. The identifiers are driver-owned and stay alive for
the Properties COM lifetime, so the PSO keeps Properties alive.

DXShaderBindingTable holds a single upload-heap buffer plus four
pre-built D3D12_DISPATCH_RAYS_DESC ranges (raygen, miss, hit-group,
callable) — `RANGE` for raygen since it's always one record, and
`RANGE_AND_STRIDE` for the others.

createPipelineRT builds a CD3DX12_STATE_OBJECT_DESC with subobjects
for the DXIL library (one export per Shader entry), per-hit-group
subobjects with closest-hit / any-hit / intersection imports, the
pipeline shader config (max payload + max attribute bytes), pipeline
config (max recursion depth), and a global root signature subobject.
The root signature comes from the library's embedded RTS0 part when
present, falling back to the BindingsDesc path (matching the existing
compute / raster pipeline behaviour). Wide strings for the subobject
exports live in a SmallVector that outlives the SODesc, since the
helper classes store pointers into the strings rather than copying.
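
The lifetime concern in the last sentence can be sketched as follows. This is only an illustration using std::vector and a hypothetical ExportSketch struct in place of SmallVector and the CD3DX12 helper classes; the up-front reserve is an assumption about one way to keep the stored pointers stable, not a description of the actual code.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a helper that keeps the pointer it is given
// instead of copying the string.
struct ExportSketch {
  const wchar_t *Name;
};

// ASCII-only widening, enough for shader entry-point names in this sketch.
inline std::wstring widen(const std::string &S) {
  return std::wstring(S.begin(), S.end());
}

struct ExportTableSketch {
  std::vector<std::wstring> Storage; // owns the characters
  std::vector<ExportSketch> Exports; // points into Storage
};

inline ExportTableSketch
buildExports(const std::vector<std::string> &EntryPoints) {
  ExportTableSketch T;
  // Reserve first: growing the vector later would move the strings and could
  // invalidate pointers already handed to ExportSketch.
  T.Storage.reserve(EntryPoints.size());
  for (const std::string &E : EntryPoints) {
    T.Storage.push_back(widen(E));
    T.Exports.push_back({T.Storage.back().c_str()});
  }
  return T;
}
```

The returned table must be moved, not copied, and must stay alive until the state object has been created, since the exports only borrow the characters.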

createShaderBindingTable lays out each entry as
[identifier][LocalRootData][padding-to-stride] with per-region
stride = align(D3D12_SHADER_IDENTIFIER_SIZE_IN_BYTES + max-LocalRoot-
Data-in-region, D3D12_RAYTRACING_SHADER_RECORD_BYTE_ALIGNMENT) and
per-region size = align(count * stride,
D3D12_RAYTRACING_SHADER_TABLE_BYTE_ALIGNMENT). The buffer lives in an
upload heap with D3D12_RESOURCE_STATE_GENERIC_READ — PR3 simplification;
a staging copy into a default heap is a follow-up.
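
The stride and size arithmetic above can be sketched like this. The three constants carry the values d3d12.h defines for the named macros and are redeclared so the snippet builds without the SDK; alignUp, RegionLayout, and computeRegionLayout are hypothetical names chosen for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

constexpr uint64_t kShaderIdentifierSize = 32; // D3D12_SHADER_IDENTIFIER_SIZE_IN_BYTES
constexpr uint64_t kRecordAlignment = 32; // D3D12_RAYTRACING_SHADER_RECORD_BYTE_ALIGNMENT
constexpr uint64_t kTableAlignment = 64;  // D3D12_RAYTRACING_SHADER_TABLE_BYTE_ALIGNMENT

// Round Value up to the next multiple of Alignment (must be a power of two).
constexpr uint64_t alignUp(uint64_t Value, uint64_t Alignment) {
  return (Value + Alignment - 1) & ~(Alignment - 1);
}

struct RegionLayout {
  uint64_t Stride; // bytes between consecutive records in the region
  uint64_t Size;   // total bytes reserved for the region
};

// LocalRootDataSizes holds one entry per record in the region (0 if none).
inline RegionLayout
computeRegionLayout(const std::vector<uint64_t> &LocalRootDataSizes) {
  uint64_t MaxLocal = 0;
  for (uint64_t S : LocalRootDataSizes)
    MaxLocal = std::max(MaxLocal, S);
  const uint64_t Stride =
      alignUp(kShaderIdentifierSize + MaxLocal, kRecordAlignment);
  assert(Stride % kRecordAlignment == 0);
  const uint64_t Size =
      alignUp(LocalRootDataSizes.size() * Stride, kTableAlignment);
  return {Stride, Size};
}
```

For example, a region with three records whose largest local root data is 16 bytes gets a 64-byte stride (48 rounded up to 32-byte alignment) and a 192-byte region.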

dispatchRays queries the underlying CommandListX for
ID3D12GraphicsCommandList4 (matching the AS-build path), binds the
global root signature via SetComputeRootSignature, calls
SetPipelineState1 with the state object, and issues DispatchRays with
a D3D12_DISPATCH_RAYS_DESC populated from the SBT's four ranges plus
the dispatch dimensions. The descriptor heap + descriptor-table bindings
are set up by the existing createComputeCommands helper before the
encoder is created.
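
A sketch of how the four SBT ranges might feed the dispatch descriptor. The structs below mirror the field names of D3D12_GPU_VIRTUAL_ADDRESS_RANGE, D3D12_GPU_VIRTUAL_ADDRESS_RANGE_AND_STRIDE, and D3D12_DISPATCH_RAYS_DESC so the snippet builds without the SDK; Region, SBTSketch, and makeDispatchDesc are hypothetical, and leaving unused tables zeroed is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstdint>

struct AddressRange {
  uint64_t StartAddress;
  uint64_t SizeInBytes;
};
struct AddressRangeAndStride {
  uint64_t StartAddress;
  uint64_t SizeInBytes;
  uint64_t StrideInBytes;
};
struct DispatchRaysDescSketch {
  AddressRange RayGenerationShaderRecord;
  AddressRangeAndStride MissShaderTable;
  AddressRangeAndStride HitGroupTable;
  AddressRangeAndStride CallableShaderTable;
  uint32_t Width, Height, Depth;
};

// One region of the single SBT buffer (hypothetical bookkeeping type).
struct Region {
  uint64_t Offset, Size, Stride;
};
struct SBTSketch {
  uint64_t BufferAddress; // GPU virtual address of the upload-heap buffer
  Region RayGen, Miss, HitGroup, Callable;
};

// Empty regions become a zeroed table instead of a dangling address.
inline AddressRangeAndStride toTable(uint64_t Base, const Region &R) {
  if (R.Size == 0)
    return {0, 0, 0};
  return {Base + R.Offset, R.Size, R.Stride};
}

inline DispatchRaysDescSketch makeDispatchDesc(const SBTSketch &SBT,
                                               uint32_t W, uint32_t H,
                                               uint32_t D) {
  assert(SBT.RayGen.Size != 0 && "a dispatch needs a raygen record");
  DispatchRaysDescSketch Desc{};
  Desc.RayGenerationShaderRecord = {SBT.BufferAddress + SBT.RayGen.Offset,
                                    SBT.RayGen.Size};
  Desc.MissShaderTable = toTable(SBT.BufferAddress, SBT.Miss);
  Desc.HitGroupTable = toTable(SBT.BufferAddress, SBT.HitGroup);
  Desc.CallableShaderTable = toTable(SBT.BufferAddress, SBT.Callable);
  Desc.Width = W;
  Desc.Height = H;
  Desc.Depth = D;
  return Desc;
}
```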

createComputeCommands grows an isRayTracing branch at the dispatch
point so it calls dispatchRays instead of dispatch, reusing all of the
descriptor-heap and root-signature wiring. InvocationState carries a
ShaderBindingTable unique_ptr that's only populated for RT pipelines.

executeProgram's isRayTracing branch builds a RayTracingPipelineCreate-
Desc from Pipeline.Shaders / HitGroups / RTConfig, calls
createPipelineRT then createShaderBindingTable, then re-enters
createComputeCommands which dispatches via the new RT path.

raygen-roundtrip.test's XFAIL becomes Clang, Metal — DirectX should
PASS via this implementation on Windows CI (and via Wine + vkd3d-proton
locally on Linux). The Clang token still catches the compile failure
on clang-dxc since [shader("raygeneration")] doesn't yet lower to
either DXIL libraries or SPIR-V on that path.

Locally verified by cross-compiling lib/API/DX/Device.cpp via
`clang++ --target=x86_64-pc-windows-msvc` against the xwin Windows SDK
headers and the project's bundled DirectX-Headers. Runtime verification
is left to Windows CI.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
EmilioLaiso added a commit that referenced this pull request Jun 22, 2026

Depends on #1273

## Summary

Second per-backend bring-up in the PSO raytracing series (#1268). Stacks
on top of #1273 (the Vulkan bring-up + the API surface). Mirrors that PR
for D3D12: builds an `ID3D12StateObject` from the YAML schema, hands out
shader identifiers via `ID3D12StateObjectProperties`, lays out the SBT
in an upload heap, and routes `DispatchRays` through
`ID3D12GraphicsCommandList4` (same query path the AS-build already
uses).

D3D12 implementation:

- `DXRayTracingPipelineState` derives from `DXPipelineState` with an
`IsRayTracing` flag on the base for `classof` — matching the
`VulkanPipelineState` pattern. It carries the `ID3D12StateObject`, a
cached `ID3D12StateObjectProperties`, and a `StringMap<const void *>`
resolving each raygen / miss / callable shader's `EntryPoint` and each
hit-group's `Name` to its 32-byte shader identifier blob. The
identifiers are driver-owned and stay alive for the `Properties` COM
lifetime, so the PSO keeps `Properties` alive. Closest-hit / any-hit /
intersection shaders are *not* directly addressable via
`GetShaderIdentifier` — they're imported through a hit-group subobject —
so the eager-cache loop skips them.
- `DXShaderBindingTable` holds a single upload-heap buffer plus four
pre-built `D3D12_DISPATCH_RAYS_DESC` ranges (raygen, miss, hit-group,
callable): a `D3D12_GPU_VIRTUAL_ADDRESS_RANGE` for raygen, since it's
always one record, and a `D3D12_GPU_VIRTUAL_ADDRESS_RANGE_AND_STRIDE`
for each of the others.
- `createPipelineRT` builds a `CD3DX12_STATE_OBJECT_DESC` with
subobjects for the DXIL library (one export per `Shader` entry),
per-hit-group subobjects with closest-hit / any-hit / intersection
imports, the pipeline shader config (max payload + max attribute bytes),
pipeline config (max recursion depth), and a global root signature
subobject. The root signature comes from the library's embedded `RTS0`
part when present, falling back to the `BindingsDesc` path (matching the
existing compute / raster pipeline behaviour). Wide strings for the
subobject exports live in a `SmallVector` that outlives the `SODesc`,
since the helper classes store pointers into the strings rather than
copying.
- `createShaderBindingTable` lays out each entry as
`[identifier][LocalRootData][padding-to-stride]` with per-region stride
= `align(D3D12_SHADER_IDENTIFIER_SIZE_IN_BYTES +
max-LocalRootData-in-region,
D3D12_RAYTRACING_SHADER_RECORD_BYTE_ALIGNMENT)` and per-region size =
`align(count * stride, D3D12_RAYTRACING_SHADER_TABLE_BYTE_ALIGNMENT)`.
The buffer lives in an upload heap with
`D3D12_RESOURCE_STATE_GENERIC_READ` — PR3 simplification; a staging copy
into a default heap is a follow-up.
- `dispatchRays` queries the underlying `CommandListX` for
`ID3D12GraphicsCommandList4` (matching the AS-build path), binds the
global root signature via `SetComputeRootSignature`, calls
`SetPipelineState1` with the state object, and issues `DispatchRays`
with a `D3D12_DISPATCH_RAYS_DESC` populated from the SBT's four ranges
plus the dispatch dimensions. The descriptor heap + descriptor-table
bindings are set up by the existing `createComputeCommands` helper
before the encoder is created.
- `createComputeCommands` grows an `isRayTracing` branch at the dispatch
point so it calls `dispatchRays` instead of `dispatch`, reusing all of
the descriptor-heap and root-signature wiring. `InvocationState` carries
a `ShaderBindingTable` `unique_ptr` that's only populated for RT
pipelines.
- `executeProgram`'s `isRayTracing` branch builds a
`RayTracingPipelineCreateDesc` from `Pipeline.Shaders` / `HitGroups` /
`RTConfig`, calls `createPipelineRT` then `createShaderBindingTable`,
then re-enters `createComputeCommands` which dispatches via the new RT
path.
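
To make the `[identifier][LocalRootData][padding-to-stride]` record layout from the `createShaderBindingTable` bullet concrete, here is a minimal sketch that writes one record into a CPU-side byte buffer. `writeShaderRecord` and its parameters are hypothetical; the real code writes into a mapped upload-heap resource, which this sketch replaces with a `std::vector` so it builds without the SDK.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

constexpr size_t kShaderIdentifierSize = 32; // D3D12_SHADER_IDENTIFIER_SIZE_IN_BYTES

// Writes record number Index of the region that starts at RegionOffset.
// Identifier must point at kShaderIdentifierSize readable bytes.
inline void writeShaderRecord(std::vector<uint8_t> &Buffer,
                              size_t RegionOffset, size_t Stride,
                              size_t Index, const void *Identifier,
                              const std::vector<uint8_t> &LocalRootData) {
  assert(kShaderIdentifierSize + LocalRootData.size() <= Stride);
  assert(RegionOffset + (Index + 1) * Stride <= Buffer.size());
  uint8_t *Record = Buffer.data() + RegionOffset + Index * Stride;
  // Zero the whole slot first so the padding bytes are deterministic.
  std::memset(Record, 0, Stride);
  std::memcpy(Record, Identifier, kShaderIdentifierSize);
  if (!LocalRootData.empty())
    std::memcpy(Record + kShaderIdentifierSize, LocalRootData.data(),
                LocalRootData.size());
}
```

Zeroing the slot before copying is a choice made for this sketch; it keeps the padding reproducible when the buffer contents are inspected in a test.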

Test side: `raygen-roundtrip.test`'s `XFAIL` becomes `Clang, Metal` —
DirectX should PASS via this implementation on Windows CI (and via Wine
+ vkd3d-proton locally on Linux). The Clang token still catches the
compile failure on `clang-dxc` since `[shader("raygeneration")]` doesn't
yet lower to either DXIL libraries or SPIR-V on that path.

## Test plan

Local on an NVIDIA RTX 3060:
- [ ] Linux Vulkan (native `offloader`)
- [x] Linux D3D12 (Wine + vkd3d-proton + cross-compiled `offloader.exe`)
- [ ] Windows Vulkan (native `offloader.exe`)
- [ ] Windows D3D12 (native `offloader.exe`)

CI (RT-capable runners):
- [ ] windows-nvidia D3D12 (`RaytracingTier 1.2`)
- [ ] windows-intel VK (`VK_KHR_ray_tracing_pipeline`)
- [x] macOS Metal (`supportsRaytracing`)

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: EmilioLaiso <emilio@traverseresearch.nl>