Commit 8cb2da5

MarijnS95 and claude authored
Bind acceleration structures and enable the InlineRT tests (#1245)
Closes #1158 🥳

## Summary

Wire up acceleration-structure descriptor binding end-to-end across all three backends so shaders can actually consume the TLAS that `buildPipelineAccelerationStructures()` produced, completing the stack and promoting the three InlineRT tests from XFAIL to passing.

Per-resource AS handling lands in a new per-backend `createAS()` (paired with `createSRV()` / `createUAV()` / `createCBV()`): a pure single-create that queries TLAS sizes via `Dev.getTLASBuildSizes()` and allocates the handle via `Dev.createTLAS()`, returning the `unique_ptr` to the caller. It has no `InvocationState` or `Pipeline` access: the multi-create (`createBuffers()` / `createResources()`) records the handle in `InvocationState::TLASes` (a `StringMap` keyed by `TLASDesc::Name`) and wires a non-owning AS pointer into the per-resource bundle the binding loop reads.

The shared AS-build helper picks up that map and walks `P.AccelStructs.TLAS` to pair each YAML descriptor with its pre-allocated handle by name (TLASes without a map entry are skipped, i.e. declared but unbound). BLAS handles are still allocated by the helper itself since BLASes aren't user-bindable.

`executeProgram()` in each backend now runs as:

- `createBuffers` / `createResources` (`createAS()` allocates TLAS handles)
- open encoder → `buildPipelineAccelerationStructures()` → end

Per-backend changes:

- **Vulkan**: `createDescriptorPool()` counts AS descriptors in a separate scalar (the KHR enum value `1000150000` doesn't fit in the indexed array used for the core types) and emits one `VkDescriptorPoolSize` for them. `createDescriptorSets()` reads the resolved `VulkanAccelerationStructure` handle from `ResourceRef.AS` (populated by `createResources()`) and writes it through a `VkWriteDescriptorSetAccelerationStructureKHR` chained on the descriptor write's `pNext`. The dispatch's pre-barrier dst access now includes `VK_ACCESS_ACCELERATION_STRUCTURE_READ_BIT_KHR` so the prior AS-build's writes are visible to the shader's RayQuery reads. Device creation enables `VK_KHR_ray_query` using the same chain-pre-query + error-on-flag-mismatch pattern that #1232 set up for the AS / BDA extensions; without `VK_KHR_ray_query` enabled the shader's `OpRayQueryProceedKHR` instructions silently no-op and `Output` reads back zero. `copyResourceDataToDevice()` short-circuits AS bundles via a new `ResourceBundle::isAccelerationStructure()` predicate (no host buffer to barrier).
- **DX12**: writes a `D3D12_SRV_DIMENSION_RAYTRACING_ACCELERATION_STRUCTURE` SRV with the AS GPU virtual address as `Location` into the heap slot that `createBuffers()` reserved (`CreateShaderResourceView()` with a null resource; the AS data lives in the buffer pointed to by `Location`).
- **Metal**: the Metal shader converter doesn't bind the AS directly; the shader reads a buffer containing an `IRRaytracingAccelerationStructureGPUHeader` that holds the AS's `gpuResourceID` plus a pointer to an instance-contributions array. `createBuffers()` allocates and fills both buffers per AS-descriptor entry, then points the descriptor at the header buffer's GPU address. The TLAS itself is built with the `UserID` instance-descriptor variant so HLSL `CommittedInstanceID()` returns the YAML-specified per-instance ID instead of the array index.

The three InlineRT tests now actually exercise the AS end-to-end: `TraceRayInline()` issues a RayQuery against `Scene` and writes a hit-dependent value into `Output` (the instance ID for `multi-instance`, 1/0 otherwise). The catch-all `XFAIL: *` is dropped; `XFAIL: Clang` remains. The test shaders also gain explicit `[[vk::binding]]` annotations because dxc's default HLSL→SPIR-V binding mapping collides `Scene`'s `t0` with `Output`'s `u0` at binding 0, which VVL flags as a descriptor type mismatch.

## Test plan

Local on an NVIDIA RTX 3060:

- [x] Linux Vulkan (native `offloader`)
- [ ] Linux D3D12 (Wine + vkd3d-proton + cross-compiled `offloader.exe`)
- [ ] Windows Vulkan (native `offloader.exe`)
- [ ] Windows D3D12 (native `offloader.exe`)

CI (RT-capable runners):

- [ ] windows-nvidia D3D12 (`RaytracingTier 1.2`)
- [ ] windows-intel VK (`VK_KHR_ray_tracing_pipeline`)
- [ ] macOS Metal (`supportsRaytracing`)

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
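The Vulkan pool-sizing point in the summary can be illustrated with a small standalone sketch. It does not use the Vulkan headers: `DescType`, `PoolSize`, and `countPoolSizes` are hypothetical stand-ins, not the names used in this commit, and only the value `1000150000` is taken from the description. The idea is that core descriptor types are small integers that can index an array, while the extension value needs its own counter.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for descriptor type values. Core types are small
// integers; the acceleration-structure type is a large extension value that
// cannot index a small array.
enum DescType : uint32_t {
  Sampler = 0,
  SampledImage = 2,
  StorageImage = 3,
  UniformBuffer = 6,
  StorageBuffer = 7,
  AccelerationStructure = 1000150000,
};

// Assumed number of core descriptor types (values 0..10).
constexpr uint32_t NumCoreTypes = 11;

struct PoolSize {
  uint32_t Type;
  uint32_t Count;
};

// Counts descriptors per type: core types through an indexed array, the
// acceleration-structure type through a separate scalar.
std::vector<PoolSize> countPoolSizes(const std::vector<DescType> &Bindings) {
  std::array<uint32_t, NumCoreTypes> CoreCounts{};
  uint32_t ASCount = 0;
  for (DescType T : Bindings) {
    if (T == AccelerationStructure) {
      ++ASCount;
      continue;
    }
    assert(T < NumCoreTypes && "unexpected extension descriptor type");
    ++CoreCounts[T];
  }

  std::vector<PoolSize> Sizes;
  for (uint32_t I = 0; I < NumCoreTypes; ++I)
    if (CoreCounts[I] > 0)
      Sizes.push_back({I, CoreCounts[I]});
  // Emit at most one entry for all acceleration-structure descriptors.
  if (ASCount > 0)
    Sizes.push_back({AccelerationStructure, ASCount});
  return Sizes;
}
```

Calling `countPoolSizes` with two storage buffers and one acceleration structure yields two entries: one for the storage-buffer type with a count of 2, and one for the acceleration-structure type with a count of 1.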
1 parent f788760 commit 8cb2da5

8 files changed

Lines changed: 360 additions & 103 deletions

include/API/Device.h

Lines changed: 10 additions & 11 deletions
@@ -25,6 +25,7 @@
 #include "Support/Pipeline.h"

 #include "llvm/ADT/ArrayRef.h"
+#include "llvm/ADT/StringMap.h"
 #include "llvm/ADT/StringRef.h"
 #include "llvm/ADT/iterator_range.h"
 #include "llvm/Support/Error.h"
@@ -323,23 +324,21 @@ createBufferWithData(Device &Dev, std::string Name,
                      size_t SizeInBytes, ComputeEncoder *Encoder,
                      std::unique_ptr<offloadtest::Buffer> *OutUploadBuffer);

-// Builds all BLAS / TLAS objects defined in `P.AccelStructs` using the
-// supplied compute encoder. Uploads each BLAS's vertex/index data, queries
-// sizes via `Dev.getBLASBuildSizes` / `Dev.getTLASBuildSizes`, allocates
-// the handles via `Dev.createBLAS` / `Dev.createTLAS`, and records the GPU
-// builds via two `Enc.batchBuildAS` calls (BLAS batch then TLAS batch — so
-// the AS-build-write barrier between BLAS and TLAS is automatic).
-//
-// Built AS objects are pushed to `OutAS` (in declaration order: BLASes first,
-// then TLASes). Vertex/index buffers used as build inputs are pushed to
-// `OutInputBuffers`; both must outlive command-buffer submission.
+// TLAS handles come in pre-allocated because the caller's binding loop
+// stamps the AS pointer into descriptor bundles before this helper runs;
+// BLAS handles are allocated inline since BLASes aren't user-bindable.
+// BLAS and TLAS builds get separate `Enc.batchBuildAS()` calls so the
+// implicit BLAS-write → TLAS-read barrier sits between them. Outputs
+// (`OutBLAS`, `OutInputBuffers`) must outlive command-buffer submission.
 //
 // TODO: `Pipeline` belongs to the test framework, not the rendering backend
 // API. This helper lives here only because `executeProgram` is still on
 // `Device` — once that moves out, this helper should follow.
 llvm::Error buildPipelineAccelerationStructures(
     Device &Dev, ComputeEncoder &Enc, Pipeline &P,
-    llvm::SmallVectorImpl<std::unique_ptr<AccelerationStructure>> &OutAS,
+    llvm::SmallVectorImpl<std::unique_ptr<AccelerationStructure>> &OutBLAS,
+    const llvm::StringMap<std::unique_ptr<AccelerationStructure>>
+        &PreallocatedTLASes,
     llvm::SmallVectorImpl<std::unique_ptr<Buffer>> &OutInputBuffers);

 } // namespace offloadtest

lib/API/DX/Device.cpp

Lines changed: 72 additions & 21 deletions
@@ -1078,21 +1078,26 @@ class DXDevice : public offloadtest::Device {
     ComPtr<ID3D12Resource> Buffer;
     std::unique_ptr<offloadtest::Buffer> Readback;
     ComPtr<ID3D12Heap> Heap;
-    ResourceSet(ComPtr<ID3D12Resource> Upload, ComPtr<ID3D12Resource> Buffer,
-                std::unique_ptr<offloadtest::Buffer> Readback,
-                ComPtr<ID3D12Heap> Heap = nullptr)
+    // AS-only; mutually exclusive with the buffer/heap fields above.
+    DXAccelerationStructure *AS = nullptr;
+    explicit ResourceSet(ComPtr<ID3D12Resource> Upload,
+                         ComPtr<ID3D12Resource> Buffer,
+                         std::unique_ptr<offloadtest::Buffer> Readback,
+                         ComPtr<ID3D12Heap> Heap = nullptr)
         : Upload(Upload), Buffer(Buffer), Readback(std::move(Readback)),
           Heap(Heap) {}
+    explicit ResourceSet(DXAccelerationStructure *AS) : AS(AS) {}
     ResourceSet(const ResourceSet &) = delete;
     ResourceSet(ResourceSet &&A)
         : Upload(A.Upload), Buffer(A.Buffer), Readback(std::move(A.Readback)),
-          Heap(A.Heap) {}
+          Heap(A.Heap), AS(A.AS) {}
     ResourceSet &operator=(const ResourceSet &) = delete;
     ResourceSet &operator=(ResourceSet &&A) {
       Upload = A.Upload;
       Buffer = A.Buffer;
       Readback = std::move(A.Readback);
       Heap = A.Heap;
+      AS = A.AS;
       return *this;
     }
   };
@@ -1121,9 +1126,11 @@ class DXDevice : public offloadtest::Device {
     llvm::SmallVector<DescriptorTable> DescTables;
     llvm::SmallVector<ResourcePair> RootResources;

-    // Built acceleration structures, kept alive for the pipeline lifetime.
+    // Parallel-indexed to `P.AccelStructs.BLAS`.
     llvm::SmallVector<std::unique_ptr<offloadtest::AccelerationStructure>>
-        AccelStructs;
+        BLASes;
+    // Keyed by `TLASDesc::Name`.
+    llvm::StringMap<std::unique_ptr<offloadtest::AccelerationStructure>> TLASes;
     // Vertex/index buffers consumed during AS builds; must outlive submission.
     llvm::SmallVector<std::unique_ptr<offloadtest::Buffer>> ASInputBuffers;
   };
@@ -2007,21 +2014,40 @@ class DXDevice : public offloadtest::Device {
   // returns the next available HeapIdx
   uint32_t bindSRV(Resource &R, InvocationState &IS, uint32_t HeapIdx,
                    const ResourceBundle &ResBundle) {
-    const uint32_t EltSize = R.getElementSize();
-    const uint32_t NumElts = R.size() / EltSize;
-    const D3D12_SHADER_RESOURCE_VIEW_DESC SRVDesc = getSRVDescription(R);
     const uint32_t DescHandleIncSize = Device->GetDescriptorHandleIncrementSize(
         D3D12_DESCRIPTOR_HEAP_TYPE_CBV_SRV_UAV);
     const D3D12_CPU_DESCRIPTOR_HANDLE SRVHandleHeapStart =
         IS.DescHeap->GetCPUDescriptorHandleForHeapStart();

-    for (const ResourceSet &RS : ResBundle) {
-      llvm::outs() << "SRV: HeapIdx = " << HeapIdx << " EltSize = " << EltSize
-                   << " NumElts = " << NumElts << "\n";
-      D3D12_CPU_DESCRIPTOR_HANDLE SRVHandle = SRVHandleHeapStart;
-      SRVHandle.ptr += HeapIdx * DescHandleIncSize;
-      Device->CreateShaderResourceView(RS.Buffer.Get(), &SRVDesc, SRVHandle);
-      HeapIdx++;
+    if (R.isAccelerationStructure()) {
+      // AS SRVs are created with a null resource; the AS lives in the
+      // buffer referenced by Location.
+      D3D12_SHADER_RESOURCE_VIEW_DESC SRVDesc = {};
+      SRVDesc.Format = DXGI_FORMAT_UNKNOWN;
+      SRVDesc.ViewDimension =
+          D3D12_SRV_DIMENSION_RAYTRACING_ACCELERATION_STRUCTURE;
+      SRVDesc.Shader4ComponentMapping =
+          D3D12_DEFAULT_SHADER_4_COMPONENT_MAPPING;
+      for (const ResourceSet &RS : ResBundle) {
+        SRVDesc.RaytracingAccelerationStructure.Location =
+            RS.AS->getGPUVirtualAddress();
+        D3D12_CPU_DESCRIPTOR_HANDLE SRVHandle = SRVHandleHeapStart;
+        SRVHandle.ptr += HeapIdx * DescHandleIncSize;
+        Device->CreateShaderResourceView(nullptr, &SRVDesc, SRVHandle);
+        HeapIdx++;
+      }
+    } else {
+      const uint32_t EltSize = R.getElementSize();
+      const uint32_t NumElts = R.size() / EltSize;
+      const D3D12_SHADER_RESOURCE_VIEW_DESC SRVDesc = getSRVDescription(R);
+      for (const ResourceSet &RS : ResBundle) {
+        llvm::outs() << "SRV: HeapIdx = " << HeapIdx << " EltSize = " << EltSize
+                     << " NumElts = " << NumElts << "\n";
+        D3D12_CPU_DESCRIPTOR_HANDLE SRVHandle = SRVHandleHeapStart;
+        SRVHandle.ptr += HeapIdx * DescHandleIncSize;
+        Device->CreateShaderResourceView(RS.Buffer.Get(), &SRVDesc, SRVHandle);
+        HeapIdx++;
+      }
     }
     return HeapIdx;
   }
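Both branches of `bindSRV()` address a heap slot with the same `SRVHandle.ptr += HeapIdx * DescHandleIncSize` arithmetic. A minimal standalone sketch of that slot computation follows, assuming a hypothetical `CpuHandle` struct in place of `D3D12_CPU_DESCRIPTOR_HANDLE` and a hypothetical helper `slotHandle`; neither name is part of this commit, and no D3D12 header is used.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a CPU descriptor handle: a pointer-sized value
// that addresses one descriptor slot in a heap.
struct CpuHandle {
  size_t Ptr;
};

// Returns the handle for slot `HeapIdx`, given the heap start and the
// per-descriptor increment size reported by the device.
CpuHandle slotHandle(CpuHandle HeapStart, uint32_t HeapIdx, uint32_t IncSize) {
  CpuHandle H = HeapStart;
  // Widen before multiplying so the offset is computed at pointer width.
  H.Ptr += static_cast<size_t>(HeapIdx) * IncSize;
  return H;
}
```

Because the slot depends only on `HeapIdx`, an AS descriptor and a buffer descriptor can share the same heap as long as each binding advances `HeapIdx` by one, which is what both loops in the hunk do.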
@@ -2228,11 +2254,35 @@ class DXDevice : public offloadtest::Device {
     return HeapIdx;
   }

+  llvm::Expected<std::unique_ptr<AccelerationStructure>> createAS(Resource &R) {
+    assert(R.TLASPtr && "AS resource must be resolved to a TLAS");
+    assert(R.getArraySize() == 1 && "AS arrays not yet supported");
+    auto SizesOrErr =
+        getTLASBuildSizes(static_cast<uint32_t>(R.TLASPtr->Instances.size()));
+    if (!SizesOrErr)
+      return SizesOrErr.takeError();
+    return createTLAS(*SizesOrErr);
+  }
+
   llvm::Error createBuffers(Pipeline &P, InvocationState &IS) {
     auto CreateBuffer =
         [&IS,
          this](Resource &R,
                llvm::SmallVectorImpl<ResourcePair> &Resources) -> llvm::Error {
+      if (R.isAccelerationStructure()) {
+        auto ASOrErr = createAS(R);
+        if (!ASOrErr)
+          return ASOrErr.takeError();
+        ResourceBundle Bundle;
+        Bundle.emplace_back(
+            llvm::cast<DXAccelerationStructure>(ASOrErr->get()));
+        auto Inserted =
+            IS.TLASes.try_emplace(R.TLASPtr->Name, std::move(*ASOrErr));
+        assert(Inserted.second && "TLAS bound to multiple resources NYI");
+        (void)Inserted;
+        Resources.push_back(std::make_pair(&R, std::move(Bundle)));
+        return llvm::Error::success();
+      }
       switch (getDescriptorKind(R.Kind)) {
       case DescriptorKind::SRV: {
         auto ExRes = createSRV(R, IS);
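The AS branch above splits ownership from use: the name-keyed map owns the TLAS, while the bundle keeps a non-owning pointer. A rough standalone sketch of that pattern follows, using `std::map` in place of `llvm::StringMap`; `AccelStruct`, `State`, and `registerTLAS` are hypothetical names chosen for illustration, and the sketch returns `false` on a duplicate name where the commit asserts.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a backend acceleration-structure object.
struct AccelStruct {
  unsigned InstanceCount;
};

struct State {
  // Owning storage, keyed by TLAS name.
  std::map<std::string, std::unique_ptr<AccelStruct>> TLASes;
  // Non-owning pointers that a later binding step would read.
  std::vector<AccelStruct *> Bundles;
};

// Registers `AS` under `Name`. Returns false (and records nothing) if the
// name is already taken.
bool registerTLAS(State &S, const std::string &Name,
                  std::unique_ptr<AccelStruct> AS) {
  // Grab the raw pointer before ownership moves into the map.
  AccelStruct *Raw = AS.get();
  bool Inserted = S.TLASes.try_emplace(Name, std::move(AS)).second;
  if (!Inserted)
    return false;
  S.Bundles.push_back(Raw);
  return true;
}
```

The raw pointer stays valid for as long as the map entry exists, since the `unique_ptr` keeps the object at a fixed heap address even if the map rebalances.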
@@ -2723,20 +2773,21 @@ class DXDevice : public offloadtest::Device {
     State.CB->Dev = this;
     llvm::outs() << "Command buffer created.\n";

+    if (auto Err = createBuffers(P, State))
+      return Err;
+    llvm::outs() << "Buffers created.\n";
+
     if (!P.AccelStructs.BLAS.empty() || !P.AccelStructs.TLAS.empty()) {
       auto EncOrErr = State.CB->createComputeEncoder();
       if (!EncOrErr)
         return EncOrErr.takeError();
       if (auto Err = offloadtest::buildPipelineAccelerationStructures(
-              *this, **EncOrErr, P, State.AccelStructs, State.ASInputBuffers))
+              *this, **EncOrErr, P, State.BLASes, State.TLASes,
+              State.ASInputBuffers))
         return Err;
       (*EncOrErr)->endEncoding();
     }

-    if (auto Err = createBuffers(P, State))
-      return Err;
-    llvm::outs() << "Buffers created.\n";
-
     BindingsDesc BndDesc = {};
     for (auto &S : P.Sets) {
       DescriptorSetLayoutDesc Layout;

lib/API/Device.cpp

Lines changed: 17 additions & 21 deletions
@@ -97,7 +97,9 @@ offloadtest::createRenderTargetFromCPUBuffer(Device &Dev,

 llvm::Error offloadtest::buildPipelineAccelerationStructures(
     Device &Dev, ComputeEncoder &Enc, Pipeline &P,
-    llvm::SmallVectorImpl<std::unique_ptr<AccelerationStructure>> &OutAS,
+    llvm::SmallVectorImpl<std::unique_ptr<AccelerationStructure>> &OutBLAS,
+    const llvm::StringMap<std::unique_ptr<AccelerationStructure>>
+        &PreallocatedTLASes,
     llvm::SmallVectorImpl<std::unique_ptr<Buffer>> &OutInputBuffers) {
   if (P.AccelStructs.BLAS.empty() && P.AccelStructs.TLAS.empty())
     return llvm::Error::success();
@@ -113,7 +115,7 @@ llvm::Error offloadtest::buildPipelineAccelerationStructures(
   // them through pointers stored in ASBuildItem.
   llvm::SmallVector<BLASBuildRequest> BLASRequests;
   BLASRequests.reserve(P.AccelStructs.BLAS.size());
-  llvm::StringMap<size_t> BLASIndex;
+  llvm::StringMap<AccelerationStructure *> BLASesByName;

   for (const auto &BD : P.AccelStructs.BLAS) {
     llvm::SmallVector<TriangleGeometryDesc> Triangles;
@@ -161,8 +163,8 @@ llvm::Error offloadtest::buildPipelineAccelerationStructures(
     Req.AS = ASOrErr->get();
     Req.Geometry = std::move(Triangles);

-    BLASIndex[BD.Name] = OutAS.size();
-    OutAS.push_back(std::move(*ASOrErr));
+    BLASesByName[BD.Name] = ASOrErr->get();
+    OutBLAS.push_back(std::move(*ASOrErr));
     BLASRequests.push_back(std::move(Req));
   }

@@ -174,16 +176,20 @@ llvm::Error offloadtest::buildPipelineAccelerationStructures(
   if (auto Err = Enc.batchBuildAS(BLASBatch))
     return Err;

-  // TLAS pass — references BLASes built in the previous batch.
+  // Separate `batchBuildAS()` from the BLAS batch so the BLAS-write →
+  // TLAS-read barrier between them is implicit.
   llvm::SmallVector<TLASBuildRequest> TLASRequests;
-  TLASRequests.reserve(P.AccelStructs.TLAS.size());
-
-  for (const auto &TD : P.AccelStructs.TLAS) {
+  TLASRequests.reserve(PreallocatedTLASes.size());
+  for (const TLASDesc &TD : P.AccelStructs.TLAS) {
+    auto ASIt = PreallocatedTLASes.find(TD.Name);
+    if (ASIt == PreallocatedTLASes.end())
+      continue; // TLAS declared but not bound to any resource.
     TLASBuildRequest Req;
+    Req.AS = ASIt->second.get();
     Req.Instances.reserve(TD.Instances.size());
     for (const auto &I : TD.Instances) {
-      auto It = BLASIndex.find(I.BLAS);
-      if (It == BLASIndex.end())
+      auto It = BLASesByName.find(I.BLAS);
+      if (It == BLASesByName.end())
         return llvm::createStringError(std::errc::invalid_argument,
                                        "TLAS '%s' references unknown BLAS '%s'",
                                        TD.Name.c_str(), I.BLAS.c_str());
@@ -194,21 +200,11 @@ llvm::Error offloadtest::buildPipelineAccelerationStructures(
       memcpy(Inst.Transform, I.Transform, sizeof(I.Transform));
       Inst.InstanceID = I.InstanceID;
       Inst.InstanceMask = I.InstanceMask;
-      Inst.BLAS = OutAS[It->second].get();
+      Inst.BLAS = It->second;
       Req.Instances.push_back(Inst);
     }
     if (auto Err = validateTLASBuildRequest(Req))
       return Err;
-    auto SizesOrErr =
-        Dev.getTLASBuildSizes(static_cast<uint32_t>(Req.Instances.size()));
-    if (!SizesOrErr)
-      return SizesOrErr.takeError();
-    auto ASOrErr = Dev.createTLAS(*SizesOrErr);
-    if (!ASOrErr)
-      return ASOrErr.takeError();
-
-    Req.AS = ASOrErr->get();
-    OutAS.push_back(std::move(*ASOrErr));
     TLASRequests.push_back(std::move(Req));
   }
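The TLAS pass in this file pairs each declared TLAS with a pre-allocated handle by name and skips declarations that no resource binds. A simplified standalone sketch of that pairing follows, assuming `std::map` in place of `llvm::StringMap` and `std::optional` in place of `llvm::Error`; the struct and function names here (`BuildRequest`, `pairTLASes`, and so on) are hypothetical and reduced to the fields the pairing needs.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, reduced versions of the types involved.
struct AccelStruct {};
struct InstanceDesc {
  std::string BLAS; // name of the BLAS this instance references
};
struct TLASDesc {
  std::string Name;
  std::vector<InstanceDesc> Instances;
};
struct BuildRequest {
  AccelStruct *AS = nullptr;         // pre-allocated TLAS handle
  std::vector<AccelStruct *> BLASes; // resolved BLAS per instance
};

// Builds one request per TLAS that has a pre-allocated handle. Returns
// std::nullopt if a bound TLAS references an unknown BLAS name.
std::optional<std::vector<BuildRequest>> pairTLASes(
    const std::vector<TLASDesc> &Descs,
    const std::map<std::string, std::unique_ptr<AccelStruct>> &Preallocated,
    const std::map<std::string, AccelStruct *> &BLASesByName) {
  std::vector<BuildRequest> Requests;
  Requests.reserve(Preallocated.size());
  for (const TLASDesc &TD : Descs) {
    auto ASIt = Preallocated.find(TD.Name);
    if (ASIt == Preallocated.end())
      continue; // declared but not bound to any resource
    BuildRequest Req;
    Req.AS = ASIt->second.get();
    for (const InstanceDesc &I : TD.Instances) {
      auto It = BLASesByName.find(I.BLAS);
      if (It == BLASesByName.end())
        return std::nullopt;
      Req.BLASes.push_back(It->second);
    }
    Requests.push_back(std::move(Req));
  }
  return Requests;
}
```

One consequence of checking the map first, visible in the sketch, is that an unbound TLAS is skipped before its instances are resolved, so a dangling BLAS name inside an unbound TLAS does not produce an error.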
