Commit 2fd3384

MarijnS95 and claude committed
Bind acceleration structures and enable the InlineRT tests
Wire up acceleration-structure descriptor binding end-to-end across all three backends so shaders can actually consume the TLAS that buildPipelineAccelerationStructures() produced, completing the stack and promoting the three InlineRT tests from XFAIL to passing.

Per-resource AS handling lands in a new per-backend createAS() (paired with createSRV() / createUAV() / createCBV()): a pure single-create that queries TLAS sizes via Dev.getTLASBuildSizes() and allocates the handle via Dev.createTLAS(), returning the unique_ptr to the caller. No InvocationState or Pipeline access: the multi-create (createBuffers() / createResources()) records the handle in InvocationState::TLASes (a StringMap keyed by TLASDesc::Name) and wires a non-owning AS pointer into the per-resource bundle the binding loop reads.

The shared AS-build helper picks up that map and walks P.AccelStructs.TLAS to pair each YAML descriptor with its pre-allocated handle by name (TLASes without a map entry are skipped, i.e. declared but unbound). BLAS handles are still allocated by the helper itself since BLASes aren't user-bindable.

executeProgram() in each backend now runs as:

  createBuffers / createResources (createAS() allocates TLAS handles)
  open encoder → buildPipelineAccelerationStructures() → end

Vulkan: createDescriptorPool() counts AS descriptors in a separate scalar (the KHR enum value 1000150000 doesn't fit in the indexed array used for the core types) and emits one VkDescriptorPoolSize for them. createDescriptorSets() reads the resolved VulkanAccelerationStructure handle from ResourceRef.AS (populated by createResources()) and writes it through a VkWriteDescriptorSetAccelerationStructureKHR chained on the descriptor write's pNext. The dispatch's pre-barrier dst access now includes VK_ACCESS_ACCELERATION_STRUCTURE_READ_BIT_KHR so the prior AS-build's writes are made visible to the shader's RayQuery reads. Device creation also enables VK_KHR_ray_query when supported so the RayQuery shader instructions actually function. copyResourceDataToDevice() short-circuits AS bundles (no host buffer to barrier) via a new ResourceBundle::isAccelerationStructure() predicate.

DX12: writes a D3D12_SRV_DIMENSION_RAYTRACING_ACCELERATION_STRUCTURE SRV with the AS GPU virtual address as Location into the heap slot that createBuffers() reserved (CreateShaderResourceView() with a null resource; the AS data lives in the buffer pointed to by Location).

Metal: the Metal shader converter doesn't bind the AS directly; the shader reads a buffer containing an IRRaytracingAccelerationStructureGPUHeader that holds the AS's gpuResourceID plus a pointer to an instance-contributions array. createBuffers() allocates and fills both buffers per AS-descriptor entry, then points the descriptor at the header buffer's GPU address. The TLAS itself is built with the UserID instance-descriptor variant so HLSL CommittedInstanceID() returns the YAML-specified per-instance ID instead of the array index.

The three InlineRT tests now actually exercise the AS end-to-end: TraceRayInline() issues a RayQuery against `Scene` and writes a hit-dependent value into `Output` (the instance ID for multi-instance, 1/0 otherwise). The catch-all `XFAIL: *` is dropped; `XFAIL: Clang` remains. The test shaders gain explicit `[[vk::binding]]` annotations since their `t0`/`u0` registers would otherwise collide under the default dxc HLSL→SPIR-V mapping.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
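
The descriptor-pool counting described for the Vulkan backend can be illustrated with a minimal, self-contained sketch. It is not the backend's actual createDescriptorPool(): `PoolSize`, `countPoolSizes`, and the `k...` constants are hypothetical stand-ins so the example compiles without the Vulkan headers. Only the numeric values (core descriptor types 0..10, storage buffer = 7, acceleration structure = 1000150000) mirror the real VkDescriptorType enum.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

// Stand-ins for VkDescriptorType values; a real backend would include
// <vulkan/vulkan.h> instead of redefining them.
constexpr uint32_t kCoreDescriptorTypeCount = 11; // core types are 0..10
constexpr uint32_t kDescriptorTypeStorageBuffer = 7;
constexpr uint32_t kDescriptorTypeAccelerationStructureKHR = 1000150000u;

// Hypothetical mirror of VkDescriptorPoolSize.
struct PoolSize {
  uint32_t Type;
  uint32_t Count;
};

// Core types index directly into a small array. The AS type is tallied in a
// separate scalar because its enum value is far outside the array's range.
std::vector<PoolSize> countPoolSizes(const std::vector<uint32_t> &Types) {
  std::array<uint32_t, kCoreDescriptorTypeCount> CoreCounts{};
  uint32_t ASCount = 0;
  for (uint32_t T : Types) {
    if (T == kDescriptorTypeAccelerationStructureKHR)
      ++ASCount;
    else if (T < kCoreDescriptorTypeCount)
      ++CoreCounts[T];
    // Any other extension type is ignored in this sketch.
  }

  std::vector<PoolSize> Sizes;
  for (uint32_t T = 0; T < kCoreDescriptorTypeCount; ++T)
    if (CoreCounts[T] != 0)
      Sizes.push_back({T, CoreCounts[T]});
  // Emit a single pool-size entry covering all AS descriptors.
  if (ASCount != 0)
    Sizes.push_back({kDescriptorTypeAccelerationStructureKHR, ASCount});
  return Sizes;
}
```

Calling `countPoolSizes` with two storage buffers and one acceleration structure yields two entries: one for type 7 with count 2, and one for the AS type with count 1.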
1 parent 4c91450 commit 2fd3384

8 files changed

Lines changed: 360 additions & 103 deletions


include/API/Device.h

Lines changed: 10 additions & 11 deletions
@@ -25,6 +25,7 @@
 #include "Support/Pipeline.h"

 #include "llvm/ADT/ArrayRef.h"
+#include "llvm/ADT/StringMap.h"
 #include "llvm/ADT/StringRef.h"
 #include "llvm/ADT/iterator_range.h"
 #include "llvm/Support/Error.h"
@@ -323,23 +324,21 @@ createBufferWithData(Device &Dev, std::string Name,
                      size_t SizeInBytes, ComputeEncoder *Encoder,
                      std::unique_ptr<offloadtest::Buffer> *OutUploadBuffer);

-// Builds all BLAS / TLAS objects defined in `P.AccelStructs` using the
-// supplied compute encoder. Uploads each BLAS's vertex/index data, queries
-// sizes via `Dev.getBLASBuildSizes` / `Dev.getTLASBuildSizes`, allocates
-// the handles via `Dev.createBLAS` / `Dev.createTLAS`, and records the GPU
-// builds via two `Enc.batchBuildAS` calls (BLAS batch then TLAS batch — so
-// the AS-build-write barrier between BLAS and TLAS is automatic).
-//
-// Built AS objects are pushed to `OutAS` (in declaration order: BLASes first,
-// then TLASes). Vertex/index buffers used as build inputs are pushed to
-// `OutInputBuffers`; both must outlive command-buffer submission.
+// TLAS handles come in pre-allocated because the caller's binding loop
+// stamps the AS pointer into descriptor bundles before this helper runs;
+// BLAS handles are allocated inline since BLASes aren't user-bindable.
+// BLAS and TLAS builds get separate `Enc.batchBuildAS()` calls so the
+// implicit BLAS-write → TLAS-read barrier sits between them. Outputs
+// (`OutBLAS`, `OutInputBuffers`) must outlive command-buffer submission.
 //
 // TODO: `Pipeline` belongs to the test framework, not the rendering backend
 // API. This helper lives here only because `executeProgram` is still on
 // `Device` — once that moves out, this helper should follow.
 llvm::Error buildPipelineAccelerationStructures(
     Device &Dev, ComputeEncoder &Enc, Pipeline &P,
-    llvm::SmallVectorImpl<std::unique_ptr<AccelerationStructure>> &OutAS,
+    llvm::SmallVectorImpl<std::unique_ptr<AccelerationStructure>> &OutBLAS,
+    const llvm::StringMap<std::unique_ptr<AccelerationStructure>>
+        &PreallocatedTLASes,
     llvm::SmallVectorImpl<std::unique_ptr<Buffer>> &OutInputBuffers);

 } // namespace offloadtest

lib/API/DX/Device.cpp

Lines changed: 72 additions & 21 deletions
@@ -1078,21 +1078,26 @@ class DXDevice : public offloadtest::Device {
     ComPtr<ID3D12Resource> Buffer;
     std::unique_ptr<offloadtest::Buffer> Readback;
     ComPtr<ID3D12Heap> Heap;
-    ResourceSet(ComPtr<ID3D12Resource> Upload, ComPtr<ID3D12Resource> Buffer,
-                std::unique_ptr<offloadtest::Buffer> Readback,
-                ComPtr<ID3D12Heap> Heap = nullptr)
+    // AS-only; mutually exclusive with the buffer/heap fields above.
+    DXAccelerationStructure *AS = nullptr;
+    explicit ResourceSet(ComPtr<ID3D12Resource> Upload,
+                         ComPtr<ID3D12Resource> Buffer,
+                         std::unique_ptr<offloadtest::Buffer> Readback,
+                         ComPtr<ID3D12Heap> Heap = nullptr)
         : Upload(Upload), Buffer(Buffer), Readback(std::move(Readback)),
           Heap(Heap) {}
+    explicit ResourceSet(DXAccelerationStructure *AS) : AS(AS) {}
     ResourceSet(const ResourceSet &) = delete;
     ResourceSet(ResourceSet &&A)
         : Upload(A.Upload), Buffer(A.Buffer), Readback(std::move(A.Readback)),
-          Heap(A.Heap) {}
+          Heap(A.Heap), AS(A.AS) {}
     ResourceSet &operator=(const ResourceSet &) = delete;
     ResourceSet &operator=(ResourceSet &&A) {
       Upload = A.Upload;
       Buffer = A.Buffer;
       Readback = std::move(A.Readback);
       Heap = A.Heap;
+      AS = A.AS;
       return *this;
     }
   };
@@ -1121,9 +1126,11 @@ class DXDevice : public offloadtest::Device {
     llvm::SmallVector<DescriptorTable> DescTables;
     llvm::SmallVector<ResourcePair> RootResources;

-    // Built acceleration structures, kept alive for the pipeline lifetime.
+    // Parallel-indexed to `P.AccelStructs.BLAS`.
     llvm::SmallVector<std::unique_ptr<offloadtest::AccelerationStructure>>
-        AccelStructs;
+        BLASes;
+    // Keyed by `TLASDesc::Name`.
+    llvm::StringMap<std::unique_ptr<offloadtest::AccelerationStructure>> TLASes;
     // Vertex/index buffers consumed during AS builds; must outlive submission.
     llvm::SmallVector<std::unique_ptr<offloadtest::Buffer>> ASInputBuffers;
   };
@@ -2007,21 +2014,40 @@ class DXDevice : public offloadtest::Device {
   // returns the next available HeapIdx
   uint32_t bindSRV(Resource &R, InvocationState &IS, uint32_t HeapIdx,
                    const ResourceBundle &ResBundle) {
-    const uint32_t EltSize = R.getElementSize();
-    const uint32_t NumElts = R.size() / EltSize;
-    const D3D12_SHADER_RESOURCE_VIEW_DESC SRVDesc = getSRVDescription(R);
     const uint32_t DescHandleIncSize = Device->GetDescriptorHandleIncrementSize(
         D3D12_DESCRIPTOR_HEAP_TYPE_CBV_SRV_UAV);
     const D3D12_CPU_DESCRIPTOR_HANDLE SRVHandleHeapStart =
         IS.DescHeap->GetCPUDescriptorHandleForHeapStart();

-    for (const ResourceSet &RS : ResBundle) {
-      llvm::outs() << "SRV: HeapIdx = " << HeapIdx << " EltSize = " << EltSize
-                   << " NumElts = " << NumElts << "\n";
-      D3D12_CPU_DESCRIPTOR_HANDLE SRVHandle = SRVHandleHeapStart;
-      SRVHandle.ptr += HeapIdx * DescHandleIncSize;
-      Device->CreateShaderResourceView(RS.Buffer.Get(), &SRVDesc, SRVHandle);
-      HeapIdx++;
+    if (R.isAccelerationStructure()) {
+      // AS SRVs are created with a null resource; the AS lives in the
+      // buffer referenced by Location.
+      D3D12_SHADER_RESOURCE_VIEW_DESC SRVDesc = {};
+      SRVDesc.Format = DXGI_FORMAT_UNKNOWN;
+      SRVDesc.ViewDimension =
+          D3D12_SRV_DIMENSION_RAYTRACING_ACCELERATION_STRUCTURE;
+      SRVDesc.Shader4ComponentMapping =
+          D3D12_DEFAULT_SHADER_4_COMPONENT_MAPPING;
+      for (const ResourceSet &RS : ResBundle) {
+        SRVDesc.RaytracingAccelerationStructure.Location =
+            RS.AS->getGPUVirtualAddress();
+        D3D12_CPU_DESCRIPTOR_HANDLE SRVHandle = SRVHandleHeapStart;
+        SRVHandle.ptr += HeapIdx * DescHandleIncSize;
+        Device->CreateShaderResourceView(nullptr, &SRVDesc, SRVHandle);
+        HeapIdx++;
+      }
+    } else {
+      const uint32_t EltSize = R.getElementSize();
+      const uint32_t NumElts = R.size() / EltSize;
+      const D3D12_SHADER_RESOURCE_VIEW_DESC SRVDesc = getSRVDescription(R);
+      for (const ResourceSet &RS : ResBundle) {
+        llvm::outs() << "SRV: HeapIdx = " << HeapIdx << " EltSize = " << EltSize
+                     << " NumElts = " << NumElts << "\n";
+        D3D12_CPU_DESCRIPTOR_HANDLE SRVHandle = SRVHandleHeapStart;
+        SRVHandle.ptr += HeapIdx * DescHandleIncSize;
+        Device->CreateShaderResourceView(RS.Buffer.Get(), &SRVDesc, SRVHandle);
+        HeapIdx++;
+      }
     }
     return HeapIdx;
   }
@@ -2228,11 +2254,35 @@ class DXDevice : public offloadtest::Device {
     return HeapIdx;
   }

+  llvm::Expected<std::unique_ptr<AccelerationStructure>> createAS(Resource &R) {
+    assert(R.TLASPtr && "AS resource must be resolved to a TLAS");
+    assert(R.getArraySize() == 1 && "AS arrays not yet supported");
+    auto SizesOrErr =
+        getTLASBuildSizes(static_cast<uint32_t>(R.TLASPtr->Instances.size()));
+    if (!SizesOrErr)
+      return SizesOrErr.takeError();
+    return createTLAS(*SizesOrErr);
+  }
+
   llvm::Error createBuffers(Pipeline &P, InvocationState &IS) {
     auto CreateBuffer =
         [&IS,
          this](Resource &R,
                llvm::SmallVectorImpl<ResourcePair> &Resources) -> llvm::Error {
+      if (R.isAccelerationStructure()) {
+        auto ASOrErr = createAS(R);
+        if (!ASOrErr)
+          return ASOrErr.takeError();
+        ResourceBundle Bundle;
+        Bundle.emplace_back(
+            llvm::cast<DXAccelerationStructure>(ASOrErr->get()));
+        auto Inserted =
+            IS.TLASes.try_emplace(R.TLASPtr->Name, std::move(*ASOrErr));
+        assert(Inserted.second && "TLAS bound to multiple resources NYI");
+        (void)Inserted;
+        Resources.push_back(std::make_pair(&R, std::move(Bundle)));
+        return llvm::Error::success();
+      }
       switch (getDescriptorKind(R.Kind)) {
       case DescriptorKind::SRV: {
         auto ExRes = createSRV(R, IS);
@@ -2723,20 +2773,21 @@ class DXDevice : public offloadtest::Device {
     State.CB->Dev = this;
     llvm::outs() << "Command buffer created.\n";

+    if (auto Err = createBuffers(P, State))
+      return Err;
+    llvm::outs() << "Buffers created.\n";
+
     if (!P.AccelStructs.BLAS.empty() || !P.AccelStructs.TLAS.empty()) {
       auto EncOrErr = State.CB->createComputeEncoder();
       if (!EncOrErr)
         return EncOrErr.takeError();
       if (auto Err = offloadtest::buildPipelineAccelerationStructures(
-              *this, **EncOrErr, P, State.AccelStructs, State.ASInputBuffers))
+              *this, **EncOrErr, P, State.BLASes, State.TLASes,
+              State.ASInputBuffers))
         return Err;
       (*EncOrErr)->endEncoding();
     }

-    if (auto Err = createBuffers(P, State))
-      return Err;
-    llvm::outs() << "Buffers created.\n";
-
     BindingsDesc BndDesc = {};
     for (auto &S : P.Sets) {
       DescriptorSetLayoutDesc Layout;
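
The ownership split in createBuffers() (the map owns each TLAS handle, the binding bundle keeps a raw pointer) can be sketched in isolation. This is a hedged illustration rather than the DX backend's code: `AccelStruct`, `TLASMap`, and `recordTLAS` are hypothetical names, and `std::unordered_map` stands in for `llvm::StringMap`. Where the diff asserts on a duplicate name, the sketch returns `nullptr` so the behavior is observable.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for offloadtest::AccelerationStructure.
struct AccelStruct {
  unsigned InstanceCount;
};

// Owning map keyed by TLAS name (std::unordered_map replaces
// llvm::StringMap in this sketch).
using TLASMap = std::unordered_map<std::string, std::unique_ptr<AccelStruct>>;

// Moves AS into the map and returns a non-owning pointer for the binding
// bundle, or nullptr if a TLAS with that name was already recorded.
AccelStruct *recordTLAS(TLASMap &Map, const std::string &Name,
                        std::unique_ptr<AccelStruct> AS) {
  AccelStruct *Raw = AS.get(); // capture before ownership moves into the map
  auto Inserted = Map.try_emplace(Name, std::move(AS));
  // On a duplicate name try_emplace leaves AS untouched; it is then
  // destroyed when this function returns.
  return Inserted.second ? Raw : nullptr;
}
```

The raw pointer stays valid for as long as the map entry lives, because moving a `unique_ptr` transfers ownership without relocating the pointee.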

lib/API/Device.cpp

Lines changed: 17 additions & 21 deletions
@@ -97,7 +97,9 @@ offloadtest::createRenderTargetFromCPUBuffer(Device &Dev,

 llvm::Error offloadtest::buildPipelineAccelerationStructures(
     Device &Dev, ComputeEncoder &Enc, Pipeline &P,
-    llvm::SmallVectorImpl<std::unique_ptr<AccelerationStructure>> &OutAS,
+    llvm::SmallVectorImpl<std::unique_ptr<AccelerationStructure>> &OutBLAS,
+    const llvm::StringMap<std::unique_ptr<AccelerationStructure>>
+        &PreallocatedTLASes,
     llvm::SmallVectorImpl<std::unique_ptr<Buffer>> &OutInputBuffers) {
   if (P.AccelStructs.BLAS.empty() && P.AccelStructs.TLAS.empty())
     return llvm::Error::success();
@@ -113,7 +115,7 @@ llvm::Error offloadtest::buildPipelineAccelerationStructures(
   // them through pointers stored in ASBuildItem.
   llvm::SmallVector<BLASBuildRequest> BLASRequests;
   BLASRequests.reserve(P.AccelStructs.BLAS.size());
-  llvm::StringMap<size_t> BLASIndex;
+  llvm::StringMap<AccelerationStructure *> BLASesByName;

   for (const auto &BD : P.AccelStructs.BLAS) {
     llvm::SmallVector<TriangleGeometryDesc> Triangles;
@@ -161,8 +163,8 @@ llvm::Error offloadtest::buildPipelineAccelerationStructures(
     Req.AS = ASOrErr->get();
     Req.Geometry = std::move(Triangles);

-    BLASIndex[BD.Name] = OutAS.size();
-    OutAS.push_back(std::move(*ASOrErr));
+    BLASesByName[BD.Name] = ASOrErr->get();
+    OutBLAS.push_back(std::move(*ASOrErr));
     BLASRequests.push_back(std::move(Req));
   }

@@ -174,16 +176,20 @@ llvm::Error offloadtest::buildPipelineAccelerationStructures(
   if (auto Err = Enc.batchBuildAS(BLASBatch))
     return Err;

-  // TLAS pass — references BLASes built in the previous batch.
+  // Separate `batchBuildAS()` from the BLAS batch so the BLAS-write →
+  // TLAS-read barrier between them is implicit.
   llvm::SmallVector<TLASBuildRequest> TLASRequests;
-  TLASRequests.reserve(P.AccelStructs.TLAS.size());
-
-  for (const auto &TD : P.AccelStructs.TLAS) {
+  TLASRequests.reserve(PreallocatedTLASes.size());
+  for (const TLASDesc &TD : P.AccelStructs.TLAS) {
+    auto ASIt = PreallocatedTLASes.find(TD.Name);
+    if (ASIt == PreallocatedTLASes.end())
+      continue; // TLAS declared but not bound to any resource.
     TLASBuildRequest Req;
+    Req.AS = ASIt->second.get();
     Req.Instances.reserve(TD.Instances.size());
     for (const auto &I : TD.Instances) {
-      auto It = BLASIndex.find(I.BLAS);
-      if (It == BLASIndex.end())
+      auto It = BLASesByName.find(I.BLAS);
+      if (It == BLASesByName.end())
         return llvm::createStringError(std::errc::invalid_argument,
                                        "TLAS '%s' references unknown BLAS '%s'",
                                        TD.Name.c_str(), I.BLAS.c_str());
@@ -194,21 +200,11 @@ llvm::Error offloadtest::buildPipelineAccelerationStructures(
       memcpy(Inst.Transform, I.Transform, sizeof(I.Transform));
       Inst.InstanceID = I.InstanceID;
       Inst.InstanceMask = I.InstanceMask;
-      Inst.BLAS = OutAS[It->second].get();
+      Inst.BLAS = It->second;
       Req.Instances.push_back(Inst);
     }
     if (auto Err = validateTLASBuildRequest(Req))
       return Err;
-    auto SizesOrErr =
-        Dev.getTLASBuildSizes(static_cast<uint32_t>(Req.Instances.size()));
-    if (!SizesOrErr)
-      return SizesOrErr.takeError();
-    auto ASOrErr = Dev.createTLAS(*SizesOrErr);
-    if (!ASOrErr)
-      return ASOrErr.takeError();
-
-    Req.AS = ASOrErr->get();
-    OutAS.push_back(std::move(*ASOrErr));
     TLASRequests.push_back(std::move(Req));
   }
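
The TLAS pass in this file pairs each declared descriptor with a pre-allocated handle by name and skips descriptors that have no entry. A minimal standalone sketch of that lookup pattern follows, assuming simplified types: `Handle`, `Desc`, `Request`, and `pairByName` are hypothetical, and `std::map` stands in for `llvm::StringMap`.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <vector>

struct Handle { int Id; };         // hypothetical pre-allocated AS handle
struct Desc { std::string Name; }; // hypothetical TLAS descriptor (from YAML)
struct Request {                   // hypothetical build request
  const Desc *D;
  Handle *AS; // non-owning; the map keeps ownership
};

// Walk the declared descriptors in order and pair each one with its
// pre-allocated handle; descriptors without a handle are skipped.
std::vector<Request>
pairByName(const std::vector<Desc> &Descs,
           const std::map<std::string, std::unique_ptr<Handle>> &Prealloc) {
  std::vector<Request> Reqs;
  Reqs.reserve(Prealloc.size()); // at most one request per bound handle
  for (const Desc &D : Descs) {
    auto It = Prealloc.find(D.Name);
    if (It == Prealloc.end())
      continue; // declared but not bound to any resource
    Reqs.push_back({&D, It->second.get()});
  }
  return Reqs;
}
```

Iterating the descriptor list rather than the map keeps the requests in declaration order. Reserving by the map size rather than the descriptor count matches the number of requests that can actually be produced.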
