[DX] Add ray tracing state object, SBT, and DispatchRays bring-up #1275

Merged
EmilioLaiso merged 3 commits into llvm:main from Traverse-Research:rt-pso-dx on Jun 22, 2026
@MarijnS95 MarijnS95 commented Jun 3, 2026

Collaborator

Depends on #1273

Summary

Second per-backend bring-up in the PSO raytracing series (#1268). Stacks on top of #1273 (the Vulkan bring-up + the API surface). Mirrors that PR for D3D12: builds an ID3D12StateObject from the YAML schema, hands out shader identifiers via ID3D12StateObjectProperties, lays out the SBT in an upload heap, and routes DispatchRays through ID3D12GraphicsCommandList4 (same query path the AS-build already uses).

D3D12 implementation:

  • DXRayTracingPipelineState derives from DXPipelineState with an IsRayTracing flag on the base for classof — matching the VulkanPipelineState pattern. It carries the ID3D12StateObject, a cached ID3D12StateObjectProperties, and a StringMap<const void *> resolving each raygen / miss / callable shader's EntryPoint and each hit-group's Name to its 32-byte shader identifier blob. The identifiers are driver-owned and stay alive for the Properties COM lifetime, so the PSO keeps Properties alive. Closest-hit / any-hit / intersection shaders are not directly addressable via GetShaderIdentifier — they're imported through a hit-group subobject — so the eager-cache loop skips them.
  • DXShaderBindingTable holds a single upload-heap buffer plus the four pre-built ranges that feed D3D12_DISPATCH_RAYS_DESC (raygen, miss, hit-group, callable): a D3D12_GPU_VIRTUAL_ADDRESS_RANGE for raygen, since it is always one record, and a D3D12_GPU_VIRTUAL_ADDRESS_RANGE_AND_STRIDE for each of the others.
  • createPipelineRT builds a CD3DX12_STATE_OBJECT_DESC with subobjects for the DXIL library (one export per Shader entry), per-hit-group subobjects with closest-hit / any-hit / intersection imports, the pipeline shader config (max payload + max attribute bytes), pipeline config (max recursion depth), and a global root signature subobject. The root signature comes from the library's embedded RTS0 part when present, falling back to the BindingsDesc path (matching the existing compute / raster pipeline behaviour). Wide strings for the subobject exports live in a SmallVector that outlives the SODesc, since the helper classes store pointers into the strings rather than copying.
  • createShaderBindingTable lays out each entry as [identifier][LocalRootData][padding-to-stride] with per-region stride = align(D3D12_SHADER_IDENTIFIER_SIZE_IN_BYTES + max-LocalRootData-in-region, D3D12_RAYTRACING_SHADER_RECORD_BYTE_ALIGNMENT) and per-region size = align(count * stride, D3D12_RAYTRACING_SHADER_TABLE_BYTE_ALIGNMENT). The buffer lives in an upload heap with D3D12_RESOURCE_STATE_GENERIC_READ — PR3 simplification; a staging copy into a default heap is a follow-up.
  • dispatchRays queries the underlying CommandListX for ID3D12GraphicsCommandList4 (matching the AS-build path), binds the global root signature via SetComputeRootSignature, calls SetPipelineState1 with the state object, and issues DispatchRays with a D3D12_DISPATCH_RAYS_DESC populated from the SBT's four ranges plus the dispatch dimensions. The descriptor heap + descriptor-table bindings are set up by the existing createComputeCommands helper before the encoder is created.
  • createComputeCommands grows an isRayTracing branch at the dispatch point so it calls dispatchRays instead of dispatch, reusing all of the descriptor-heap and root-signature wiring. InvocationState carries a ShaderBindingTable unique_ptr that's only populated for RT pipelines.
  • executeProgram's isRayTracing branch builds a RayTracingPipelineCreateDesc from Pipeline.Shaders / HitGroups / RTConfig, calls createPipelineRT then createShaderBindingTable, then re-enters createComputeCommands which dispatches via the new RT path.
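The stride and size rules in the createShaderBindingTable bullet can be sketched as plain arithmetic. This is a minimal illustration, not the code in this PR: `RegionLayout`, `layoutRegion`, and `alignUp` are hypothetical names, the three constants restate the values of the D3D12 constants so the block builds without `d3d12.h`, and aligning each region's start offset to the table alignment is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Values of the D3D12 constants named in the text, restated so the sketch
// compiles without d3d12.h.
constexpr uint64_t ShaderIdentifierSize = 32; // D3D12_SHADER_IDENTIFIER_SIZE_IN_BYTES
constexpr uint64_t RecordAlignment = 32; // D3D12_RAYTRACING_SHADER_RECORD_BYTE_ALIGNMENT
constexpr uint64_t TableAlignment = 64;  // D3D12_RAYTRACING_SHADER_TABLE_BYTE_ALIGNMENT

// Rounds Value up to the next multiple of Alignment.
inline uint64_t alignUp(uint64_t Value, uint64_t Alignment) {
  assert(Alignment != 0 && "alignment must be non-zero");
  return (Value + Alignment - 1) / Alignment * Alignment;
}

// Hypothetical result type: where one SBT region sits in the buffer.
struct RegionLayout {
  uint64_t Offset = 0;
  uint64_t Stride = 0;
  uint64_t Size = 0;
};

// LocalRootSizes holds the LocalRootData byte count of every record in the
// region; an empty vector yields a zero-sized region.
inline RegionLayout layoutRegion(uint64_t StartOffset,
                                 const std::vector<uint64_t> &LocalRootSizes) {
  RegionLayout R;
  // Assumption: each region starts on a table-aligned offset.
  R.Offset = alignUp(StartOffset, TableAlignment);
  if (LocalRootSizes.empty())
    return R;
  uint64_t MaxLocal =
      *std::max_element(LocalRootSizes.begin(), LocalRootSizes.end());
  // stride = align(identifier + largest LocalRootData, record alignment)
  R.Stride = alignUp(ShaderIdentifierSize + MaxLocal, RecordAlignment);
  // size = align(count * stride, table alignment)
  R.Size = alignUp(LocalRootSizes.size() * R.Stride, TableAlignment);
  return R;
}
```

Under these rules a lone raygen record with no local root data gets a 32-byte stride in a 64-byte region, and a two-record region whose largest LocalRootData is 8 bytes gets a 64-byte stride and a 128-byte size.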

Test side: raygen-roundtrip.test's XFAIL becomes Clang, Metal — DirectX should PASS via this implementation on Windows CI (and via Wine + vkd3d-proton locally on Linux). The Clang token still catches the compile failure on clang-dxc since [shader("raygeneration")] doesn't yet lower to either DXIL libraries or SPIR-V on that path.

Test plan

Local on an NVIDIA RTX 3060:

  • Linux Vulkan (native offloader)
  • Linux D3D12 (Wine + vkd3d-proton + cross-compiled offloader.exe)
  • Windows Vulkan (native offloader.exe)
  • Windows D3D12 (native offloader.exe)

CI (RT-capable runners):

  • windows-nvidia D3D12 (RaytracingTier 1.2)
  • windows-intel VK (VK_KHR_ray_tracing_pipeline)
  • macOS Metal (supportsRaytracing)

MarijnS95 force-pushed the rt-pso-dx branch 2 times, most recently from 7315abc to 06f0ce9 on June 3, 2026 09:59
MarijnS95 added a commit to Traverse-Research/offload-test-suite that referenced this pull request Jun 3, 2026
…ldRay

Three small PSO RT tests stacked on llvm#1275, each isolating one shader-
observable closest-hit system value from llvm#1268's 👍 list. Same shape as
the prior batch in llvm#1277 — one .test file per behavior, single-purpose
shader, exact buffer comparison.

  - `closest-hit-barycentrics.test` — 3-lane dispatch, each lane fires
    at a clearly interior point of the single triangle so the
    closest-hit shader reports a known
    `BuiltInTriangleIntersectionAttributes::barycentrics` (u, v).
    Points are picked from the inside of the
    triangle to avoid the watertight-traversal edge-rule lottery you
    hit at edge midpoints / vertices (the first cut of this test used
    midpoint(v0, v1) and one lane silently missed on both backends).
  - `closest-hit-primitive-index.test` — three triangles tiled at x =
    -3, 0, +3 in a single BLAS. 3-lane dispatch fires straight down at
    each triangle's centroid; the closest-hit reports `PrimitiveIndex()`
    and must match the lane index 0..2.
  - `closest-hit-world-ray.test` — 2-lane dispatch with rays from
    different z heights (1.0 and 2.0). Closest-hit packs
    `WorldRayOrigin().z`, `WorldRayDirection().z`, and `RayTCurrent()`
    through the payload; raygen flattens the float3 into a 6-element
    Float32 buffer. Verifies the system values match the raygen-side
    `RayDesc` and that t is correctly computed by the traversal.
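For reference, the expected (u, v) and t values these tests compare against can be derived on the host. The sketch below uses the Möller–Trumbore ray/triangle test; it is only an illustration of where the numbers come from, not the traversal any driver implements. `Vec3`, `Hit`, and `intersect` are hypothetical names, and the triangle coordinates in the usage note are made up rather than taken from the tests.

```cpp
#include <array>
#include <cassert>

using Vec3 = std::array<float, 3>;

inline Vec3 sub(const Vec3 &A, const Vec3 &B) {
  return {A[0] - B[0], A[1] - B[1], A[2] - B[2]};
}
inline Vec3 cross(const Vec3 &A, const Vec3 &B) {
  return {A[1] * B[2] - A[2] * B[1], A[2] * B[0] - A[0] * B[2],
          A[0] * B[1] - A[1] * B[0]};
}
inline float dot(const Vec3 &A, const Vec3 &B) {
  return A[0] * B[0] + A[1] * B[1] + A[2] * B[2];
}

// U and V are the weights of V1 and V2, which is how DXR defines
// BuiltInTriangleIntersectionAttributes::barycentrics; T is the ray parameter.
struct Hit {
  float T;
  float U;
  float V;
};

// Returns true and fills Out when the ray hits the triangle in front of Origin.
inline bool intersect(const Vec3 &Origin, const Vec3 &Dir, const Vec3 &V0,
                      const Vec3 &V1, const Vec3 &V2, Hit &Out) {
  assert(dot(Dir, Dir) > 0.0f && "ray direction must be non-zero");
  const float Eps = 1e-7f;
  Vec3 E1 = sub(V1, V0), E2 = sub(V2, V0);
  Vec3 P = cross(Dir, E2);
  float Det = dot(E1, P);
  if (Det > -Eps && Det < Eps)
    return false; // ray is parallel to the triangle plane
  float Inv = 1.0f / Det;
  Vec3 S = sub(Origin, V0);
  float U = dot(S, P) * Inv;
  if (U < 0.0f || U > 1.0f)
    return false;
  Vec3 Q = cross(S, E1);
  float V = dot(Dir, Q) * Inv;
  if (V < 0.0f || U + V > 1.0f)
    return false;
  float T = dot(E2, Q) * Inv;
  if (T < 0.0f)
    return false; // intersection lies behind the origin
  Out = {T, U, V};
  return true;
}
```

With a hypothetical triangle (0,0,0), (1,0,0), (0,1,0) and a ray fired straight down from (0.5, 0.25, 2.0), this yields u = 0.5, v = 0.25, and t = 2.0, the kind of interior-point case the barycentrics test is described as using.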

All three are `# REQUIRES: raytracing-pipeline` with `# XFAIL: Clang` —
`clang-dxc` doesn't yet lower `[shader(…)]` entry points. With the Metal
RT bring-up rebased on top, all three pass natively on Apple Silicon
and Metal is dropped from the XFAIL list.

Locally verified end-to-end on the user's Linux box: all three pass on
Vulkan via the native offloader, and on D3D12 via Wine + vkd3d-proton +
the cross-compiled `offloader.exe`, against an NVIDIA RTX 3060. And on
macOS 15 / metal-irconverter 3.1.1 via the native offloader: all three
PASS.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
MarijnS95 added a commit to Traverse-Research/offload-test-suite that referenced this pull request Jun 3, 2026
Three tests stacked on llvm#1275 covering features that inline RT (RayQuery
in compute) physically can't express — they're only reachable through
a DispatchRays-driven RT pipeline.

  - `trace-ray-recursion.test` — closest-hit fires a secondary TraceRay
    from an above-the-triangle origin. First-level CH sees payload=0 →
    bumps to 0x1 → calls TraceRay. Second-level CH sees payload!=0 →
    writes 0x10. Unwinds: first-level ORs in 0x100. Final payload
    0x110 (272 decimal). `RayTracingPipelineConfig.MaxTraceRecursionDepth: 2`
    keeps both TraceRay calls within budget.
  - `ray-flag-skip-closest-hit.test` — two lanes fire identical rays at
    the same triangle. Lane 0 uses RAY_FLAG_NONE so CH runs and writes
    0xBEEF. Lane 1 uses RAY_FLAG_SKIP_CLOSEST_HIT_SHADER (PSO-only —
    inline RT has no equivalent) so CH is skipped and payload keeps its
    initial 0xAAAA. Output [0xBEEF, 0xAAAA].
  - `callable-shader.test` — two callable shaders writing distinct
    sentinels (0xAAAA / 0xBBBB). Each lane calls `CallShader(Idx, ...)`
    so the SBT callable region's per-record routing is exercised
    independently of the hit-group / miss routing already covered in
    llvm#1277. Callable shaders themselves don't exist in inline RT.

    This test stays `# XFAIL: Clang, Vulkan` because DXC's `-spirv`
    backend lists every callable's IncomingCallableDataKHR variable in
    every callable entry point's interface, violating
    VUID-StandaloneSpirv-IncomingCallableDataKHR-04706 and getting
    rejected by vkCreateShaderModule. The framework's Vulkan SBT /
    callable path is
    correct — running `spirv-opt --remove-unused-interface-variables`
    on the DXC output cleans the SPIR-V and the test passes natively.
    Track upstream.
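The payload flow described for `trace-ray-recursion.test` above can be modelled on the host to see why the expected value is 0x110. This is a sketch of the described behaviour, not the test's shader: it assumes both trace levels share one payload and that every ray hits, and `traceRay`, `closestHit`, and `runRaygen` are hypothetical stand-ins.

```cpp
#include <cassert>
#include <cstdint>

// Mirrors the MaxTraceRecursionDepth: 2 setting quoted in the description.
constexpr uint32_t MaxTraceRecursionDepth = 2;

void closestHit(uint32_t &Payload, uint32_t Depth);

// Host-side stand-in for TraceRay: every ray is assumed to hit the triangle,
// so tracing just runs the closest-hit model on the shared payload.
void traceRay(uint32_t &Payload, uint32_t Depth) {
  assert(Depth <= MaxTraceRecursionDepth && "trace exceeds declared depth");
  closestHit(Payload, Depth);
}

void closestHit(uint32_t &Payload, uint32_t Depth) {
  if (Payload == 0) {
    Payload = 0x1;                // first level: bump the payload
    traceRay(Payload, Depth + 1); // fire the secondary ray
    Payload |= 0x100;             // unwind: OR in the first-level marker
  } else {
    Payload = 0x10; // second level: overwrite with its own marker
  }
}

// Models the raygen shader: start from a zero payload, trace once at depth 1.
uint32_t runRaygen() {
  uint32_t Payload = 0;
  traceRay(Payload, 1);
  return Payload;
}
```

Because the second level overwrites rather than ORs, the 0x1 bit set by the first level is gone by the time it unwinds, which leaves 0x10 | 0x100 = 0x110.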

All three pass on Metal once the bring-up PR ahead of this commit sets
the raygen pipeline's `setMaxCallStackDepth(MaxTraceRecursionDepth)` so
nested TraceRay actually unwinds (with the default of 1, the second
TraceRay was silently dropped and the recursion test produced 0x1
instead of 0x110).

Locally verified on the user's Linux box:
  - Vulkan via the native offloader: recursion + skip-CH PASS;
    callable PASSes after spirv-opt cleanup (XFAILs from raw DXC SPIR-V
    as documented above).
  - D3D12 via Wine + vkd3d-proton + cross-compiled offloader.exe: all
    three PASS.
And on macOS 15 / metal-irconverter 3.1.1 via the native offloader:
  - All three PASS (recursion + skip-CH + callable).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
MarijnS95 added a commit to Traverse-Research/offload-test-suite that referenced this pull request Jun 3, 2026
Four small tests stacked on top of llvm#1275, each isolating one
shader-observable PSO raytracing surface. They follow the same shape as
the inline-RT batch already in llvm#1271 / llvm#1272 / llvm#1274 / llvm#1276 — one .test
file per behavior, single-purpose shader, exact buffer comparison.

  - `dispatch-rays-index.test` — 4x1x1 dispatch, raygen writes
    `DispatchRaysIndex().x` into `Output[index]`. Confirms the
    dispatch grid plumbs through to the per-lane system value with no
    BLAS / TLAS / hit groups in play (RT-pipeline-only, no AS binding).
  - `dispatch-rays-dimensions.test` — 2x3x1 dispatch, raygen packs the
    constant `DispatchRaysDimensions()` into one uint per lane.
    Confirms every lane sees the host-side `{W, H, D}` even when only
    one dimension > 1.
  - `miss-shader-index.test` — two miss shaders writing distinct
    sentinels (0xAA / 0xBB). 2-lane dispatch picks `MissShaderIndex` 0
    and 1 respectively; rays start far enough from the geometry that
    every ray misses. Verifies the SBT miss region's per-record
    routing.
  - `ray-contribution-to-hit-group-index.test` — two hit groups with
    distinct closest-hit shaders (0xA1 / 0xB2). 2-lane dispatch picks
    `RayContributionToHitGroupIndex` 0 and 1, every ray hits the same
    triangle. Verifies the SBT hit-group region's per-record routing.
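The per-record routing that the last two tests verify follows the shader-table indexing rule from the DXR specification. A minimal sketch of that address computation, with a hypothetical `Table` struct standing in for the start address and stride of an SBT region:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical view of one SBT region: GPU start address plus record stride.
struct Table {
  uint64_t StartAddress;
  uint64_t StrideInBytes;
};

// Miss records are selected directly by the MissShaderIndex passed to TraceRay.
inline uint64_t missRecordAddress(const Table &Miss, uint32_t MissShaderIndex) {
  assert(Miss.StrideInBytes % 32 == 0 && "stride must be record-aligned");
  return Miss.StartAddress + Miss.StrideInBytes * MissShaderIndex;
}

// Hit-group records combine the TraceRay arguments with the geometry index
// inside the BLAS and the per-instance contribution from the TLAS.
inline uint64_t hitGroupRecordAddress(const Table &HitGroup,
                                      uint32_t RayContribution,
                                      uint32_t GeometryMultiplier,
                                      uint32_t GeometryIndex,
                                      uint32_t InstanceContribution) {
  assert(HitGroup.StrideInBytes % 32 == 0 && "stride must be record-aligned");
  uint64_t Index = uint64_t(RayContribution) +
                   uint64_t(GeometryMultiplier) * GeometryIndex +
                   InstanceContribution;
  return HitGroup.StartAddress + HitGroup.StrideInBytes * Index;
}
```

With a single geometry and a zero instance contribution, as in a single-triangle scene, `RayContributionToHitGroupIndex` alone picks the record, which is the case `ray-contribution-to-hit-group-index.test` is described as exercising.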

The first two have no AS / Miss / HitGroup in their pipeline at all —
just a raygen + a UAV — which exercises the minimum viable RT pipeline
shape (one raygen group, zero-sized miss / hit / callable SBT regions).
The latter two reuse the single-triangle BLAS/TLAS from
`raygen-roundtrip.test`.

All four tests are `# REQUIRES: raytracing-pipeline` with `# XFAIL: Clang`
— Clang (`clang-dxc`) doesn't yet lower `[shader("…")]` entry points to
either DXIL libraries or SPIR-V. With the Metal RT bring-up rebased on
top, all four pass natively on Apple Silicon and Metal is dropped from
the XFAIL list.

Locally verified end-to-end on the user's Linux box:
  - Vulkan via the native offloader against an NVIDIA RTX 3060:
    all four tests PASS.
  - D3D12 via Wine + vkd3d-proton + the cross-compiled offloader.exe
    on the same GPU: all four tests PASS.
And on macOS 15 / metal-irconverter 3.1.1:
  - Metal via the native offloader: all four tests PASS.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
MarijnS95 added a commit to Traverse-Research/offload-test-suite that referenced this pull request Jun 8, 2026
MarijnS95 added a commit to Traverse-Research/offload-test-suite that referenced this pull request Jun 8, 2026
MarijnS95 added a commit to Traverse-Research/offload-test-suite that referenced this pull request Jun 8, 2026
MarijnS95 and others added 2 commits June 19, 2026 09:27
Second per-backend bring-up in the PSO raytracing series (llvm#1268).
Mirrors PR llvm#1273 for D3D12: builds an ID3D12StateObject from the YAML
schema, hands out shader identifiers via ID3D12StateObjectProperties,
lays out the SBT in an upload heap, and routes DispatchRays through
ID3D12GraphicsCommandList4 (same query path the AS build already uses).

DXRayTracingPipelineState derives from DXPipelineState with an
IsRayTracing flag on the base for classof — matching the
VulkanPipelineState pattern. It carries the ID3D12StateObject + a
cached ID3D12StateObjectProperties + a StringMap<const void *> that
resolves each shader EntryPoint or hit-group Name to its 32-byte shader
identifier blob. The identifiers are driver-owned and stay alive for
the Properties COM lifetime, so the PSO keeps Properties alive.

DXShaderBindingTable holds a single upload-heap buffer plus four
pre-built D3D12_DISPATCH_RAYS_DESC ranges (raygen, miss, hit-group,
callable) — `RANGE` for raygen since it's always one record, and
`RANGE_AND_STRIDE` for the others.
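How the four ranges might be assembled from one buffer can be sketched as follows. The structs here are local stand-ins that copy the field names of `D3D12_GPU_VIRTUAL_ADDRESS_RANGE`, `D3D12_GPU_VIRTUAL_ADDRESS_RANGE_AND_STRIDE`, and `D3D12_DISPATCH_RAYS_DESC` so the block builds without `d3d12.h`; `Region`, `toTable`, and `makeDispatchDesc` are hypothetical, and emitting an all-zero table for an empty region is an assumption.

```cpp
#include <cassert>
#include <cstdint>

// Stand-ins mirroring the field names of the D3D12 structs.
struct AddressRange {
  uint64_t StartAddress;
  uint64_t SizeInBytes;
};
struct AddressRangeAndStride {
  uint64_t StartAddress;
  uint64_t SizeInBytes;
  uint64_t StrideInBytes;
};
struct DispatchRaysDesc {
  AddressRange RayGenerationShaderRecord;
  AddressRangeAndStride MissShaderTable;
  AddressRangeAndStride HitGroupTable;
  AddressRangeAndStride CallableShaderTable;
  uint32_t Width;
  uint32_t Height;
  uint32_t Depth;
};

// Hypothetical description of one region inside the SBT buffer.
struct Region {
  uint64_t Offset;
  uint64_t Stride;
  uint64_t Size;
};

// Assumption: an empty region becomes an all-zero (null) table.
inline AddressRangeAndStride toTable(uint64_t BufferAddress, const Region &R) {
  if (R.Size == 0)
    return {0, 0, 0};
  return {BufferAddress + R.Offset, R.Size, R.Stride};
}

inline DispatchRaysDesc makeDispatchDesc(uint64_t BufferAddress,
                                         const Region &RayGen,
                                         const Region &Miss,
                                         const Region &HitGroup,
                                         const Region &Callable, uint32_t Width,
                                         uint32_t Height, uint32_t Depth) {
  assert(RayGen.Size != 0 && "a dispatch always needs one raygen record");
  DispatchRaysDesc Desc{};
  // Raygen is a single record, so it has a size but no stride.
  Desc.RayGenerationShaderRecord = {BufferAddress + RayGen.Offset,
                                    RayGen.Stride};
  Desc.MissShaderTable = toTable(BufferAddress, Miss);
  Desc.HitGroupTable = toTable(BufferAddress, HitGroup);
  Desc.CallableShaderTable = toTable(BufferAddress, Callable);
  Desc.Width = Width;
  Desc.Height = Height;
  Desc.Depth = Depth;
  return Desc;
}
```

Pre-building the ranges once means the dispatch path only has to fill in the three dimensions.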

createPipelineRT builds a CD3DX12_STATE_OBJECT_DESC with subobjects
for the DXIL library (one export per Shader entry), per-hit-group
subobjects with closest-hit / any-hit / intersection imports, the
pipeline shader config (max payload + max attribute bytes), pipeline
config (max recursion depth), and a global root signature subobject.
The root signature comes from the library's embedded RTS0 part when
present, falling back to the BindingsDesc path (matching the existing
compute / raster pipeline behaviour). Wide strings for the subobject
exports live in a SmallVector that outlives the SODesc, since the
helper classes store pointers into the strings rather than copying.
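The lifetime concern in the last sentence can be illustrated in isolation. In this sketch `ExportDesc` is a hypothetical stand-in for a helper that keeps the pointer it is given, and `std::vector` plays the role of the `SmallVector`; it is not the code in `createPipelineRT`.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a subobject helper that stores the pointer it is
// handed instead of copying the string.
struct ExportDesc {
  const wchar_t *Name;
};

// 1:1 widen; entry-point and hit-group names are assumed to be ASCII.
inline std::wstring widen(const std::string &S) {
  std::wstring W;
  W.reserve(S.size());
  for (char C : S) {
    assert(static_cast<unsigned char>(C) < 0x80 && "names are assumed ASCII");
    W.push_back(static_cast<wchar_t>(C));
  }
  return W;
}

// Owns the wide strings for as long as the export descriptors are in use.
class ExportList {
public:
  explicit ExportList(const std::vector<std::string> &EntryPoints) {
    // Reserve up front: a reallocation during push_back would move the
    // strings and leave earlier c_str() pointers dangling.
    Storage.reserve(EntryPoints.size());
    Exports.reserve(EntryPoints.size());
    for (const std::string &E : EntryPoints) {
      Storage.push_back(widen(E));
      Exports.push_back({Storage.back().c_str()});
    }
  }
  // Copying would duplicate the descriptors while they still point into the
  // original storage, so copies are disabled.
  ExportList(const ExportList &) = delete;
  ExportList &operator=(const ExportList &) = delete;

  const std::vector<ExportDesc> &exports() const { return Exports; }

private:
  std::vector<std::wstring> Storage;
  std::vector<ExportDesc> Exports;
};
```

The descriptors stay valid only while the `ExportList` is alive, so in this model it would have to be declared before, and destroyed after, whatever consumes them.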

createShaderBindingTable lays out each entry as
[identifier][LocalRootData][padding-to-stride] with per-region
stride = align(D3D12_SHADER_IDENTIFIER_SIZE_IN_BYTES + max-LocalRootData-in-region,
D3D12_RAYTRACING_SHADER_RECORD_BYTE_ALIGNMENT) and
per-region size = align(count * stride,
D3D12_RAYTRACING_SHADER_TABLE_BYTE_ALIGNMENT). The buffer lives in an
upload heap with D3D12_RESOURCE_STATE_GENERIC_READ — PR3 simplification;
a staging copy into a default heap is a follow-up.
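Writing one region in the [identifier][LocalRootData][padding-to-stride] shape could look roughly like this. It is a sketch against a plain byte vector rather than a mapped upload-heap resource; `Record` and `writeRegion` are hypothetical names, and the padding is assumed to come from a zero-initialized destination.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Value of D3D12_SHADER_IDENTIFIER_SIZE_IN_BYTES, restated for the sketch.
constexpr std::size_t ShaderIdentifierSize = 32;

// Hypothetical record description: a pointer to the 32-byte identifier blob
// plus the record's local root data (may be empty).
struct Record {
  const void *Identifier;
  std::vector<uint8_t> LocalRootData;
};

// Writes Records into Dst starting at Offset, one record every Stride bytes.
inline void writeRegion(std::vector<uint8_t> &Dst, std::size_t Offset,
                        std::size_t Stride,
                        const std::vector<Record> &Records) {
  assert(Offset + Records.size() * Stride <= Dst.size() && "region overflows");
  for (std::size_t I = 0; I < Records.size(); ++I) {
    const Record &R = Records[I];
    assert(ShaderIdentifierSize + R.LocalRootData.size() <= Stride);
    uint8_t *Slot = Dst.data() + Offset + I * Stride;
    std::memcpy(Slot, R.Identifier, ShaderIdentifierSize);
    if (!R.LocalRootData.empty())
      std::memcpy(Slot + ShaderIdentifierSize, R.LocalRootData.data(),
                  R.LocalRootData.size());
    // Bytes between the end of the local root data and the next record are
    // left untouched; they read as zero padding if Dst started out zeroed.
  }
}
```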

dispatchRays queries the underlying CommandListX for
ID3D12GraphicsCommandList4 (matching the AS-build path), binds the
global root signature via SetComputeRootSignature, calls
SetPipelineState1 with the state object, and issues DispatchRays with
a D3D12_DISPATCH_RAYS_DESC populated from the SBT's four ranges plus
the dispatch dimensions. The descriptor heap + descriptor-table bindings
are set up by the existing createComputeCommands helper before the
encoder is created.

createComputeCommands grows an isRayTracing branch at the dispatch
point so it calls dispatchRays instead of dispatch, reusing all of the
descriptor-heap and root-signature wiring. InvocationState carries a
ShaderBindingTable unique_ptr that's only populated for RT pipelines.

executeProgram's isRayTracing branch builds a
RayTracingPipelineCreateDesc from Pipeline.Shaders / HitGroups / RTConfig, calls
createPipelineRT then createShaderBindingTable, then re-enters
createComputeCommands which dispatches via the new RT path.

raygen-roundtrip.test's XFAIL becomes Clang, Metal — DirectX should
PASS via this implementation on Windows CI (and via Wine + vkd3d-proton
locally on Linux). The Clang token still catches the compile failure
on clang-dxc since [shader("raygeneration")] doesn't yet lower to
either DXIL libraries or SPIR-V on that path.

Locally verified by cross-compiling lib/API/DX/Device.cpp via
`clang++ --target=x86_64-pc-windows-msvc` against the xwin Windows SDK
headers and the project's bundled DirectX-Headers. Runtime verification
is left to Windows CI.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
RayTracing pipelines compile every entry point — raygen, miss,
closest-hit, any-hit, intersection, callable — into a single DXIL
library via `dxc -T lib_6_x` / `clang-dxc -T lib_6_x`. That's the
shape every real DXR app ships: D3D12's CreateStateObject requires a
DXIL-library subobject anyway, and the driver fuses entry points
across the whole library at link time, so writing one .hlsl file and
compiling it once is both idiomatic and the path the framework's
`%dxc_target_lib` substitution emits.

Compute and raster pipelines stay one-to-one (the existing position-
based mapping handles VS+PS, AS+MS+PS, etc.). RT pipelines today need
N positional args even though one library blob holds every entry —
which the foundational `raygen-roundtrip.test` runs straight into:
3 Shaders[] entries vs 1 input file fails the count check before any
GPU work happens.

Detect the RT-pipeline-with-one-input shape and copy the library blob
into every `Shaders[].Shader` slot via `MemoryBuffer::getMemBufferCopy`.
Each entry owns its own buffer copy (DXIL libraries are KBs, no real
memory pressure) keeping the existing `unique_ptr<MemoryBuffer>`
ownership model intact. Non-RT pipelines still go through the
positional path and still enforce the count check.
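The input-assignment rule described above can be sketched with standard-library stand-ins. `Blob` plays the role of `llvm::MemoryBuffer` (the copy stands in for `MemoryBuffer::getMemBufferCopy`), `ShaderSlot` the role of one `Shaders[]` entry, and `assignShaderInputs` is a hypothetical name; none of this is the code in the commit.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Stand-in for llvm::MemoryBuffer: just the bytes of one compiled input.
using Blob = std::string;

// Stand-in for one Shaders[] entry that owns its own buffer.
struct ShaderSlot {
  std::string EntryPoint;
  std::unique_ptr<Blob> Shader;
};

// Returns false when the positional count check fails.
inline bool assignShaderInputs(std::vector<ShaderSlot> &Slots,
                               std::vector<std::unique_ptr<Blob>> Inputs,
                               bool IsRayTracing) {
  if (IsRayTracing && Inputs.size() == 1) {
    assert(Inputs.front() != nullptr && "library blob must be loaded");
    // One library holds every entry point: give each slot its own copy so
    // the unique_ptr ownership model stays unchanged.
    for (ShaderSlot &S : Slots)
      S.Shader = std::make_unique<Blob>(*Inputs.front());
    return true;
  }
  // Everything else keeps the one-to-one positional mapping.
  if (Inputs.size() != Slots.size())
    return false;
  for (std::size_t I = 0; I < Slots.size(); ++I)
    Slots[I].Shader = std::move(Inputs[I]);
  return true;
}
```

Copying rather than sharing costs a little memory per entry but avoids introducing shared ownership only for the ray tracing path.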

Verified by re-running `raygen-roundtrip.test`'s pipeline.yaml + the
DXIL library via Wine + vkd3d-proton with a single .o argument — same
0xBEEF result the prior three-arg invocation produced.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@EmilioLaiso EmilioLaiso marked this pull request as ready for review June 19, 2026 07:41
Comment thread lib/API/DX/Device.cpp
Comment on lines +1597 to +1601
static std::wstring widen(llvm::StringRef S) {
  // Entry-point names and hit-group names are ASCII; a straight 1:1 widen
  // is sufficient.
  return std::wstring(S.begin(), S.end());
}

Contributor

This can be a free function.

Contributor

Perhaps we fix this in a follow-up PR where we also refactor out all the std::wstring calls to this helper.

@manon-traverse manon-traverse left a comment

Contributor

Just needs to be clang-tidied and pass CI; other than that, this looks fine to me.

@EmilioLaiso EmilioLaiso merged commit 1ea067c into llvm:main Jun 22, 2026
21 of 26 checks passed
EmilioLaiso added a commit that referenced this pull request Jun 24, 2026
Depends on #1275

## Summary

Last backend in the PSO RT bring-up stack. DXR-style ray tracing reaches
Metal through `metal_irconverter`: each RT entry point is lowered from
DXIL to a Metal IR function, raygen is emitted as a kernel
(`IRRayGenerationCompilationKernel`) so it can be dispatched directly,
and miss / closest-hit / any-hit / intersection / callable functions are
emitted as visible functions and pulled into a
`MTLVisibleFunctionTable`.

Fills in the three virtuals the foundation PR left stubbed on Metal:

- `MTLDevice::createPipelineRT` compiles every `Shaders[]` entry against
a single `IRRayTracingPipelineConfiguration` (max attribute / recursion
budget from the YAML `RTConfig`), builds one `MTL::Library` per entry,
hands the raygen function to the compute pipeline as the kernel, and
registers the rest as `LinkedFunctions`. The freshly-built pipeline then
mints a `MTLVisibleFunctionTable` and resolves each callable function's
handle into a slot index that the SBT builder reuses.
`setMaxCallStackDepth(MaxTraceRecursionDepth)` is set so nested
`TraceRay` actually unwinds (default of 1 silently drops the second
trace).
- `MTLDevice::createShaderBindingTable` lays the four SBT regions out
via the shared `computeSBTLayout` helper sized for `IRShaderIdentifier`
records, looks up each region entry's `ShaderName` in the pipeline's
name → `IRShaderIdentifier` map, and `memcpy`s the records into a
shared-storage `MTL::Buffer` the runtime dereferences at dispatch.
- `MTLComputeEncoder::dispatchRays` binds the raygen pipeline and runs
`dispatchThreads(Width, Height, Depth)` on the encoder. The caller
(`createRayTracingCommands` in `MTLDevice`) builds the per-dispatch
`IRDispatchRaysArgument` struct (SBT region addresses + sizes, GRS /
`ResDescHeap` GPU pointers, visible / intersection function table
`resourceID`s), parks it in a shared `MTL::Buffer` kept alive on the
command buffer's KeepAlive list, and binds it at
`kIRRayDispatchArgumentsBindPoint` so callees reached via `TraceRay()`
inherit the same dispatch state through that pointer.

Plumbs the existing `executeProgram` RT branch on Metal the same way the
VK / DX backends already do (validate `Shaders` / `SBT` / `RTConfig`,
build `RayTracingPipelineCreateDesc` from the YAML pipeline, create PSO,
build SBT, record commands), and adds the `raytracing-pipeline` lit
feature on Metal so `test/Feature/RT/raygen-roundtrip.test` drops
`Metal` from its `XFAIL` list and passes natively on Apple Silicon.

This bring-up only handles Triangle hit groups whose only member is a
`ClosestHit` shader — any-hit / intersection / procedural / local root
signatures land in follow-ups; `createPipelineRT` returns a clear
unsupported error for those shapes instead of silently producing wrong
output.

## Test plan

Local on an NVIDIA RTX 3060:
- [ ] Linux Vulkan (native `offloader`)
- [ ] Linux D3D12 (Wine + vkd3d-proton + cross-compiled `offloader.exe`)
- [ ] Windows Vulkan (native `offloader.exe`)
- [ ] Windows D3D12 (native `offloader.exe`)

CI (RT-capable runners):
- [ ] windows-nvidia D3D12 (`RaytracingTier 1.2`)
- [ ] windows-intel VK (`VK_KHR_ray_tracing_pipeline`)
- [x] macOS Metal (`supportsRaytracing`)

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: EmilioLaiso <emilio@traverseresearch.nl>
EmilioLaiso pushed a commit to Traverse-Research/offload-test-suite that referenced this pull request Jun 24, 2026
EmilioLaiso pushed a commit to Traverse-Research/offload-test-suite that referenced this pull request Jun 24, 2026
…ldRay

Three small PSO RT tests stacked on llvm#1275, each isolating one shader-
observable closest-hit system value from llvm#1268's 👍 list. Same shape as
the prior batch in llvm#1277 — one .test file per behavior, single-purpose
shader, exact buffer comparison.

  - `closest-hit-barycentrics.test` — 3-lane dispatch, each lane fires
    at a clearly-interior point of the single triangle so the
    closest-hit shader reports a known
    `BuiltInTriangleIntersectionAttributes::barycentrics` (u, v).
    Points are picked from the inside of the triangle to avoid the
    watertight-traversal edge-rule lottery you hit at edge midpoints /
    vertices (the first cut of this test used midpoint(v0, v1) and one
    lane silently missed on both backends).
  - `closest-hit-primitive-index.test` — three triangles tiled at x =
    -3, 0, +3 in a single BLAS. 3-lane dispatch fires straight down at
    each triangle's centroid; the closest-hit reports `PrimitiveIndex()`
    and must match the lane index 0..2.
  - `closest-hit-world-ray.test` — 2-lane dispatch with rays from
    different z heights (1.0 and 2.0). Closest-hit packs
    `WorldRayOrigin().z`, `WorldRayDirection().z`, and `RayTCurrent()`
    through the payload; raygen flattens the float3 into a 6-element
    Float32 buffer. Verifies the system values match the raygen-side
    `RayDesc` and that t is correctly computed by the traversal.
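
The expected (u, v) values for the barycentrics test can be derived on the host. The sketch below assumes the triangle lies in a plane parallel to z = 0 so that a 2D solve is enough, and it uses the DXR convention that the hit point equals v0 + u * (v1 - v0) + v * (v2 - v0). `Vec2`, `Bary`, `barycentrics`, and `strictlyInterior` are hypothetical helpers, not part of the test suite.

```cpp
#include <cassert>
#include <cmath>

struct Vec2 { double X, Y; };
struct Bary { double U, V; };

// Solve P - V0 = U * (V1 - V0) + V * (V2 - V0) with Cramer's rule.
// Assumes a non-degenerate triangle (non-zero determinant).
inline Bary barycentrics(Vec2 V0, Vec2 V1, Vec2 V2, Vec2 P) {
  const double E1x = V1.X - V0.X, E1y = V1.Y - V0.Y;
  const double E2x = V2.X - V0.X, E2y = V2.Y - V0.Y;
  const double Px = P.X - V0.X, Py = P.Y - V0.Y;
  const double Det = E1x * E2y - E2x * E1y;
  return {(Px * E2y - E2x * Py) / Det, (E1x * Py - Px * E1y) / Det};
}

// True when the point keeps at least Margin away from every edge in
// barycentric terms, i.e. all three weights exceed Margin.
inline bool strictlyInterior(Bary B, double Margin) {
  return B.U > Margin && B.V > Margin && (1.0 - B.U - B.V) > Margin;
}
```

An edge midpoint such as midpoint(v0, v1) has v = 0, so `strictlyInterior` rejects it for any positive margin; that is the kind of target the test avoids.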

All three are `# REQUIRES: raytracing-pipeline` with `# XFAIL: Clang` —
`clang-dxc` doesn't yet lower `[shader("…")]` entry points. With the Metal
RT bring-up rebased on top, all three pass natively on Apple Silicon
and Metal is dropped from the XFAIL list.

Locally verified end-to-end on the user's Linux box: all three pass on
Vulkan via the native offloader, and on D3D12 via Wine + vkd3d-proton +
the cross-compiled `offloader.exe`, against an NVIDIA RTX 3060. And on
macOS 15 / metal-irconverter 3.1.1 via the native offloader: all three
PASS.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
EmilioLaiso pushed a commit to Traverse-Research/offload-test-suite that referenced this pull request Jun 24, 2026
Three tests stacked on llvm#1275 covering features that inline RT (RayQuery
in compute) physically can't express — they're only reachable through
a DispatchRays-driven RT pipeline.

  - `trace-ray-recursion.test` — closest-hit fires a secondary TraceRay
    from an above-the-triangle origin. First-level CH sees payload=0 →
    bumps to 0x1 → calls TraceRay. Second-level CH sees payload!=0 →
    writes 0x10. Unwinds: first-level ORs in 0x100. Final payload
    0x110 (272 decimal).
    `RayTracingPipelineConfig.MaxTraceRecursionDepth: 2` so both
    TraceRay calls are within budget.
  - `ray-flag-skip-closest-hit.test` — two lanes fire identical rays at
    the same triangle. Lane 0 uses RAY_FLAG_NONE so CH runs and writes
    0xBEEF. Lane 1 uses RAY_FLAG_SKIP_CLOSEST_HIT_SHADER (PSO-only —
    inline RT has no equivalent) so CH is skipped and payload keeps its
    initial 0xAAAA. Output [0xBEEF, 0xAAAA].
  - `callable-shader.test` — two callable shaders writing distinct
    sentinels (0xAAAA / 0xBBBB). Each lane calls `CallShader(Idx, ...)`
    so the SBT callable region's per-record routing is exercised
    independently of the hit-group / miss routing already covered in
    llvm#1277. Callable shaders themselves don't exist in inline RT.

    This test stays `# XFAIL: Clang, Vulkan` because DXC's `-spirv`
    backend lists every callable's IncomingCallableDataKHR variable in
    every callable entry point's interface, violating
    VUID-StandaloneSpirv-IncomingCallableDataKHR-04706 and getting
    rejected by vkCreateShaderModule. The framework's Vulkan SBT /
    callable path is correct: running
    `spirv-opt --remove-unused-interface-variables`
    on the DXC output cleans the SPIR-V and the test passes natively.
    Track upstream.
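
The payload arithmetic in `trace-ray-recursion.test` can be checked with a small host-side model. This is only a sketch of the control flow described above: it assumes every ray hits the triangle and that the nested TraceRay shares the same payload, and `traceRay`, `closestHit`, and `runRecursionModel` are hypothetical names rather than anything in the test suite.

```cpp
#include <cassert>
#include <cstdint>

inline void closestHit(uint32_t &Payload);

// Model of TraceRay under the assumption that the ray always hits,
// so the closest-hit shader always runs on the shared payload.
inline void traceRay(uint32_t &Payload) { closestHit(Payload); }

inline void closestHit(uint32_t &Payload) {
  if (Payload == 0) {
    Payload = 0x1;     // first level: mark and recurse
    traceRay(Payload); // second-level closest-hit runs here
    Payload |= 0x100;  // after unwinding, OR in the first-level marker
  } else {
    Payload = 0x10;    // second level: overwrite with its own marker
  }
}

// Raygen starts with a zero payload and fires the primary ray.
inline uint32_t runRecursionModel() {
  uint32_t Payload = 0;
  traceRay(Payload);
  return Payload;
}
```

The model needs two nested TraceRay calls in flight, which matches the recursion depth of 2 configured for the test.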

All three pass on Metal once the bring-up PR ahead of this commit sets
the raygen pipeline's `setMaxCallStackDepth(MaxTraceRecursionDepth)` so
nested TraceRay actually unwinds (with the default of 1, the second
TraceRay was silently dropped and the recursion test produced 0x1
instead of 0x110).

Locally verified on the user's Linux box:
  - Vulkan via the native offloader: recursion + skip-CH PASS;
    callable PASSes after spirv-opt cleanup (XFAILs from raw DXC SPIR-V
    as documented above).
  - D3D12 via Wine + vkd3d-proton + cross-compiled offloader.exe: all
    three PASS.
And on macOS 15 / metal-irconverter 3.1.1 via the native offloader:
  - All three PASS (recursion + skip-CH + callable).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
EmilioLaiso pushed a commit to Traverse-Research/offload-test-suite that referenced this pull request Jun 25, 2026
EmilioLaiso pushed a commit that referenced this pull request Jun 26, 2026
Depends on #1281

## Summary

Four small PSO raytracing tests stacked on top of #1275, each isolating
one shader-observable surface from the 👍 list in #1268. Same shape as
the inline-RT batch already in flight in #1271 / #1272 / #1274 / #1276 —
one `.test` file per behavior, single-purpose shader, exact buffer
comparison.

- `dispatch-rays-index.test` — 4x1x1 dispatch, raygen writes
`DispatchRaysIndex().x` into `Output[index]`. Confirms the dispatch grid
plumbs through to the per-lane system value with no BLAS / TLAS / hit
groups in play (RT-pipeline-only, no AS binding).
- `dispatch-rays-dimensions.test` — 2x3x1 dispatch, raygen packs the
constant `DispatchRaysDimensions()` into one uint per lane. Confirms
every lane sees the host-side `{W, H, D}` even when only one dimension >
1.
- `miss-shader-index.test` — two miss shaders writing distinct sentinels
(0xAA / 0xBB). 2-lane dispatch picks `MissShaderIndex` 0 and 1
respectively; rays start far enough from the geometry that every ray
misses. Verifies the SBT miss region's per-record routing.
- `ray-contribution-to-hit-group-index.test` — two hit groups with
distinct closest-hit shaders (0xA1 / 0xB2). 2-lane dispatch picks
`RayContributionToHitGroupIndex` 0 and 1, every ray hits the same
triangle. Verifies the SBT hit-group region's per-record routing.
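
For the two dispatch-grid tests, the expected buffers follow from the grid shape alone. The sketch below is illustrative: the x-fastest flattening and the 8-bits-per-component packing are assumptions, since the description does not state how the shaders index `Output` or pack `DispatchRaysDimensions()`. `Uint3`, `laneCount`, `flattenIndex`, and `packDims` are hypothetical helpers.

```cpp
#include <cassert>
#include <cstdint>

struct Uint3 { uint32_t X, Y, Z; };

// Number of raygen invocations launched for a W x H x D dispatch.
inline uint32_t laneCount(Uint3 Dims) { return Dims.X * Dims.Y * Dims.Z; }

// Assumed flattening of a 3D dispatch index: x fastest, then y, then z.
inline uint32_t flattenIndex(Uint3 Index, Uint3 Dims) {
  return Index.X + Index.Y * Dims.X + Index.Z * Dims.X * Dims.Y;
}

// Hypothetical packing of {W, H, D} into one uint, 8 bits per component.
inline uint32_t packDims(Uint3 Dims) {
  return (Dims.X & 0xFFu) | ((Dims.Y & 0xFFu) << 8) | ((Dims.Z & 0xFFu) << 16);
}
```

Under these assumptions a 2x3x1 dispatch produces six lanes that all hold the same packed value, while a 4x1x1 dispatch writes 0, 1, 2, 3 at consecutive indices.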

The first two have no AS / Miss / HitGroup in their pipeline at all —
just a raygen + a UAV — which doubles as a regression check for the
minimum viable RT pipeline shape (one raygen group, zero-sized miss /
hit / callable SBT regions). The latter two reuse the single-triangle
BLAS / TLAS from `raygen-roundtrip.test`.

All four tests are `# REQUIRES: raytracing-pipeline` with `# XFAIL:
Clang` — `clang-dxc` doesn't yet lower `[shader("…")]` entry points to
either DXIL libraries or SPIR-V. With the Metal RT bring-up in #1281
rebased underneath this branch, all four pass natively on Apple Silicon
and `Metal` is dropped from the XFAIL list.

## Test plan

Local on an NVIDIA RTX 3060:
- [x] Linux Vulkan (native `offloader`)
- [x] Linux D3D12 (Wine + vkd3d-proton + cross-compiled `offloader.exe`)
- [ ] Windows Vulkan (native `offloader.exe`)
- [ ] Windows D3D12 (native `offloader.exe`)

CI (RT-capable runners):
- [ ] windows-nvidia D3D12 (`RaytracingTier 1.2`)
- [ ] windows-intel VK (`VK_KHR_ray_tracing_pipeline`)
- [x] macOS Metal (`supportsRaytracing`)

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
EmilioLaiso pushed a commit to Traverse-Research/offload-test-suite that referenced this pull request Jun 26, 2026
…ldRay

EmilioLaiso pushed a commit that referenced this pull request Jun 26, 2026
…ldRay (#1278)

Depends on #1281

## Summary

Three small PSO raytracing tests stacked on #1275, each isolating one
shader-observable closest-hit system value from #1268's 👍 list. Same
shape as the prior batch in #1277 — one `.test` file per behavior,
single-purpose shader, exact buffer comparison.

- `closest-hit-barycentrics.test` — 3-lane dispatch, each lane fires at
a clearly-interior point of the single triangle so the closest-hit
shader reports a known
`BuiltInTriangleIntersectionAttributes::barycentrics` (u, v). Points are
picked from the inside of the triangle to avoid the watertight-traversal
edge-rule lottery you hit at edge midpoints / vertices (the first cut of
this test used `midpoint(v0, v1)` and one lane silently missed on both
backends).
- `closest-hit-primitive-index.test` — three triangles tiled at x = -3,
0, +3 in a single BLAS. 3-lane dispatch fires straight down at each
triangle's centroid; the closest-hit reports `PrimitiveIndex()` and must
match the lane index 0..2.
- `closest-hit-world-ray.test` — 2-lane dispatch with rays from
different z heights (1.0 and 2.0). Closest-hit packs
`WorldRayOrigin().z`, `WorldRayDirection().z`, and `RayTCurrent()`
through the payload; raygen flattens the float3 into a 6-element Float32
buffer. Verifies the system values match the raygen-side `RayDesc` and
that t is correctly computed by the traversal.
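
The expected contents of the world-ray buffer can be worked out by hand. The sketch below assumes both rays point straight down, direction (0, 0, -1), at a triangle lying in the plane z = 0, and that each lane stores `[origin.z, direction.z, t]` in that order; none of these details are stated above, so treat them as illustrative. `hitT` and `expectedWorldRayBuffer` are hypothetical helpers.

```cpp
#include <array>
#include <cassert>

// Ray-plane intersection restricted to the z axis:
// OriginZ + T * DirZ == PlaneZ.
inline float hitT(float OriginZ, float DirZ, float PlaneZ) {
  return (PlaneZ - OriginZ) / DirZ;
}

// Two lanes, three floats each: origin.z, direction.z, t.
inline std::array<float, 6> expectedWorldRayBuffer() {
  const float DirZ = -1.0f;  // assumed: rays fire straight down
  const float PlaneZ = 0.0f; // assumed: triangle lies in z = 0
  const float Origins[2] = {1.0f, 2.0f};
  std::array<float, 6> Out{};
  for (int Lane = 0; Lane < 2; ++Lane) {
    Out[Lane * 3 + 0] = Origins[Lane];
    Out[Lane * 3 + 1] = DirZ;
    Out[Lane * 3 + 2] = hitT(Origins[Lane], DirZ, PlaneZ);
  }
  return Out;
}
```

With a unit-length direction, t equals the starting height, so the two lanes are expected to differ only in their origin and t slots.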

All three are `# REQUIRES: raytracing-pipeline` with `# XFAIL: Clang` —
`clang-dxc` doesn't yet lower `[shader("…")]` entry points. With the
Metal RT bring-up in #1281 rebased underneath this branch, all three
pass natively on Apple Silicon and `Metal` is dropped from the XFAIL
list.

## Test plan

Local on an NVIDIA RTX 3060:
- [x] Linux Vulkan (native `offloader`)
- [x] Linux D3D12 (Wine + vkd3d-proton + cross-compiled `offloader.exe`)
- [ ] Windows Vulkan (native `offloader.exe`)
- [ ] Windows D3D12 (native `offloader.exe`)

CI (RT-capable runners):
- [ ] windows-nvidia D3D12 (`RaytracingTier 1.2`)
- [ ] windows-intel VK (`VK_KHR_ray_tracing_pipeline`)
- [x] macOS Metal (`supportsRaytracing`)

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
EmilioLaiso pushed a commit to Traverse-Research/offload-test-suite that referenced this pull request Jun 26, 2026