
Commit c2ae187

MarijnS95 and claude committed
Support arrays of TLAS bindings
Mirror the existing buffer/texture array pattern (`ArraySize` plus an ArraySize-driven YAML field) for top-level acceleration structures so shaders can declare `RaytracingAccelerationStructure Scenes[N]` and bind N distinct TLASes through a single resource entry.

Schema (`include/Support/Pipeline.h`, `lib/Support/Pipeline.cpp`):
- `TLASDesc` gains `ArraySize` (default 1) and reshapes `Instances` from `SmallVector<InstanceDesc>` to `SmallVector<SmallVector<InstanceDesc>>`: the outer vector is indexed by array element, and the inner vector lists instances for that element.
- `MappingTraits<TLASDesc>` dispatches on `ArraySize` the same way `setData()` does for CPUBuffer: flat `Instances: [...]` for the scalar case, list-of-lists for arrays, with an `ActualSize != ArraySize` validation error on mismatch.
- `Resource::getArraySize()` returns `TLASPtr->ArraySize` for AS resources; moved out-of-line so it can dereference the (still forward-declared at the Resource definition) `TLASDesc`.
- BLAS-name resolution in the pipeline post-process descends through the extra layer of nesting.

Backend plumbing (VK / DX / MTL):
- `createAS()` now takes `uint32_t InstanceCount` directly (no Resource / Pipeline / InvocationState access). It is a pure single-create that just sizes and allocates one TLAS.
- The multi-create (`createBuffers` / `createResources`) loops `TD.ArraySize` and pushes one bundle entry per element plus N handles into `InvocationState::TLASes`, which becomes `StringMap<SmallVector<unique_ptr<AccelerationStructure>>>` (one vector per `TLASDesc::Name`, sized to `ArraySize`).
- `buildPipelineAccelerationStructures()` walks `P.AccelStructs.TLAS` and, for each name with a pre-allocated vector, builds one `TLASBuildRequest` per element using `TD.Instances[Elt]` and `Handles[Elt]`.
- Vulkan descriptor write iterates the bundle's `ResourceRefs` to fill N entries of `pAccelerationStructures` and sets `descriptorCount = R.getArraySize()` so the descriptor set sees the full array.
- DX's `bindSRV` AS branch already loops `ResBundle`; with the new multi-entry bundle it now writes N RAYTRACING_ACCELERATION_STRUCTURE SRVs into consecutive heap slots automatically. Heap sizing already uses `getDescriptorCountWithFlattenedArrays()`.
- Metal's AS descriptor-binding loop now builds one `IRRaytracingAccelerationStructureGPUHeader` + instance-contributions buffer pair per array element and writes a descriptor entry per element via `IRDescriptorTableSetAccelerationStructure`. `MarkASResident` descends into the per-name vector.

Test: `test/Feature/InlineRT/tlas-array.test` declares `Scenes[2]`, each TLAS carries one triangle instance with a distinct `InstanceID` (10 and 20), and the shader writes each `CommittedInstanceID()` into `Output[i]`. Gated on `acceleration-structure`, `XFAIL: Clang`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
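The schema change described in the commit message can be illustrated with a minimal sketch. It assumes simplified stand-in types (`std::vector` in place of `llvm::SmallVector`, a trimmed `InstanceDesc`) and two hypothetical helpers, `makeScalarTLAS` and `validateTLAS`, which are not part of the commit; the real YAML mapping in `lib/Support/Pipeline.cpp` is not shown on this page and may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the real descriptors; std::vector replaces
// llvm::SmallVector so the sketch builds without LLVM.
struct InstanceDesc {
  std::string BLAS;
  uint32_t InstanceID = 0;
};

struct TLASDesc {
  std::string Name;
  uint32_t ArraySize = 1;
  // Outer index: descriptor-array element. Inner vector: its instances.
  std::vector<std::vector<InstanceDesc>> Instances;
};

// Scalar case: a flat instance list becomes the single outer element.
inline TLASDesc makeScalarTLAS(std::string Name,
                               std::vector<InstanceDesc> Flat) {
  TLASDesc TD;
  TD.Name = std::move(Name);
  TD.Instances.push_back(std::move(Flat));
  return TD;
}

// Returns an empty string when the shape is consistent, otherwise a message
// describing the ActualSize != ArraySize mismatch.
inline std::string validateTLAS(const TLASDesc &TD) {
  const std::size_t ActualSize = TD.Instances.size();
  if (ActualSize == TD.ArraySize)
    return std::string();
  return "TLAS '" + TD.Name + "': expected " + std::to_string(TD.ArraySize) +
         " instance lists, got " + std::to_string(ActualSize);
}
```

In this sketch a `Scenes[2]` declaration is valid only once two inner instance lists have been supplied, while the scalar form always ends up with exactly one.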
1 parent 2fd3384 commit c2ae187

8 files changed

Lines changed: 305 additions & 124 deletions


include/API/Device.h

Lines changed: 8 additions & 4 deletions
@@ -327,17 +327,21 @@ createBufferWithData(Device &Dev, std::string Name,
 // TLAS handles come in pre-allocated because the caller's binding loop
 // stamps the AS pointer into descriptor bundles before this helper runs;
 // BLAS handles are allocated inline since BLASes aren't user-bindable.
-// BLAS and TLAS builds get separate `Enc.batchBuildAS()` calls so the
-// implicit BLAS-write → TLAS-read barrier sits between them. Outputs
-// (`OutBLAS`, `OutInputBuffers`) must outlive command-buffer submission.
+// `PreallocatedTLASes` is keyed by `TLASDesc::Name`; each map value is a
+// vector of `TLASDesc::ArraySize` handles (one per descriptor-array
+// element). BLAS and TLAS builds get separate `Enc.batchBuildAS()` calls
+// so the implicit BLAS-write → TLAS-read barrier sits between them.
+// Outputs (`OutBLAS`, `OutInputBuffers`) must outlive command-buffer
+// submission.
 //
 // TODO: `Pipeline` belongs to the test framework, not the rendering backend
 // API. This helper lives here only because `executeProgram` is still on
 // `Device` — once that moves out, this helper should follow.
 llvm::Error buildPipelineAccelerationStructures(
     Device &Dev, ComputeEncoder &Enc, Pipeline &P,
     llvm::SmallVectorImpl<std::unique_ptr<AccelerationStructure>> &OutBLAS,
-    const llvm::StringMap<std::unique_ptr<AccelerationStructure>>
+    const llvm::StringMap<
+        llvm::SmallVector<std::unique_ptr<AccelerationStructure>>>
         &PreallocatedTLASes,
     llvm::SmallVectorImpl<std::unique_ptr<Buffer>> &OutInputBuffers);

include/Support/Pipeline.h

Lines changed: 14 additions & 6 deletions
@@ -338,11 +338,7 @@ struct Resource {
     return isByteAddressBuffer() ? 4 : BufferPtr->getElementSize();
   }
 
-  uint32_t getArraySize() const {
-    if (isSampler() || isAccelerationStructure())
-      return 1;
-    return BufferPtr->ArraySize;
-  }
+  uint32_t getArraySize() const; // out-of-line: needs complete TLASDesc.
 
   uint32_t size() const {
     assert(!isSampler() && !isAccelerationStructure() &&
@@ -519,14 +515,26 @@ struct InstanceDesc {
 
 struct TLASDesc {
   std::string Name;
-  llvm::SmallVector<InstanceDesc> Instances;
+  uint32_t ArraySize = 1;
+  // Outer vector has ArraySize entries (one per descriptor-array element);
+  // inner vector lists the instances for that element. Mirrors
+  // CPUBuffer::Data's ArraySize-driven layout.
+  llvm::SmallVector<llvm::SmallVector<InstanceDesc>, 1> Instances;
 };
 
 struct AccelerationStructureDescs {
   llvm::SmallVector<BLASDesc, 1> BLAS;
   llvm::SmallVector<TLASDesc, 1> TLAS;
 };
 
+inline uint32_t Resource::getArraySize() const {
+  if (isSampler())
+    return 1;
+  if (isAccelerationStructure())
+    return TLASPtr->ArraySize;
+  return BufferPtr->ArraySize;
+}
+
 struct Pipeline {
   ShaderPipelineKind Kind;
   llvm::SmallVector<Shader> Shaders;
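The reason `getArraySize()` moves out of the class body can be shown in isolation. The following is a minimal sketch with simplified, hypothetical `Resource`, `BufferDesc`, and `TLASDesc` types (the `IsSampler` flag and the reduced fields are assumptions for illustration): a member function may only dereference a pointer to a type that is complete at the point of the function's definition, so the definition is placed after `TLASDesc`.

```cpp
#include <cassert>
#include <cstdint>

struct TLASDesc; // Only forward-declared while Resource is being defined.

// Hypothetical, trimmed-down buffer descriptor.
struct BufferDesc {
  uint32_t ArraySize = 1;
};

struct Resource {
  bool IsSampler = false;
  const BufferDesc *BufferPtr = nullptr;
  const TLASDesc *TLASPtr = nullptr;

  bool isAccelerationStructure() const { return TLASPtr != nullptr; }

  // Declared only: the body reads TLASPtr->ArraySize, which would not
  // compile here because TLASDesc is still an incomplete type.
  uint32_t getArraySize() const;
};

// TLASDesc becomes complete here.
struct TLASDesc {
  uint32_t ArraySize = 1;
};

// Out-of-line definition, placed after TLASDesc is complete. `inline` keeps
// it safe to put in a header that several translation units include.
inline uint32_t Resource::getArraySize() const {
  if (IsSampler)
    return 1;
  if (isAccelerationStructure())
    return TLASPtr->ArraySize;
  return BufferPtr->ArraySize;
}
```

Moving the body rather than reordering the structs avoids touching the declaration order of the rest of the header.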

lib/API/DX/Device.cpp

Lines changed: 23 additions & 15 deletions
@@ -1129,8 +1129,11 @@ class DXDevice : public offloadtest::Device {
     // Parallel-indexed to `P.AccelStructs.BLAS`.
     llvm::SmallVector<std::unique_ptr<offloadtest::AccelerationStructure>>
         BLASes;
-    // Keyed by `TLASDesc::Name`.
-    llvm::StringMap<std::unique_ptr<offloadtest::AccelerationStructure>> TLASes;
+    // Keyed by `TLASDesc::Name`; each value holds `TLASDesc::ArraySize`
+    // handles (one per descriptor-array element).
+    llvm::StringMap<
+        llvm::SmallVector<std::unique_ptr<offloadtest::AccelerationStructure>>>
+        TLASes;
     // Vertex/index buffers consumed during AS builds; must outlive submission.
     llvm::SmallVector<std::unique_ptr<offloadtest::Buffer>> ASInputBuffers;
   };
@@ -2254,30 +2257,35 @@ class DXDevice : public offloadtest::Device {
     return HeapIdx;
   }
 
-  llvm::Expected<std::unique_ptr<AccelerationStructure>> createAS(Resource &R) {
-    assert(R.TLASPtr && "AS resource must be resolved to a TLAS");
-    assert(R.getArraySize() == 1 && "AS arrays not yet supported");
-    auto SizesOrErr =
-        getTLASBuildSizes(static_cast<uint32_t>(R.TLASPtr->Instances.size()));
+  llvm::Expected<std::unique_ptr<AccelerationStructure>>
+  createAS(uint32_t InstanceCount) {
+    auto SizesOrErr = getTLASBuildSizes(InstanceCount);
     if (!SizesOrErr)
       return SizesOrErr.takeError();
     return createTLAS(*SizesOrErr);
   }
 
   llvm::Error createBuffers(Pipeline &P, InvocationState &IS) {
     auto CreateBuffer =
-        [&IS,
+        [&P, &IS,
          this](Resource &R,
                llvm::SmallVectorImpl<ResourcePair> &Resources) -> llvm::Error {
       if (R.isAccelerationStructure()) {
-        auto ASOrErr = createAS(R);
-        if (!ASOrErr)
-          return ASOrErr.takeError();
+        assert(R.TLASPtr && "AS resource must be resolved to a TLAS");
+        const TLASDesc &TD = *R.TLASPtr;
         ResourceBundle Bundle;
-        Bundle.emplace_back(
-            llvm::cast<DXAccelerationStructure>(ASOrErr->get()));
-        auto Inserted =
-            IS.TLASes.try_emplace(R.TLASPtr->Name, std::move(*ASOrErr));
+        llvm::SmallVector<std::unique_ptr<AccelerationStructure>> Handles;
+        Handles.reserve(TD.ArraySize);
+        for (uint32_t Elt = 0; Elt < TD.ArraySize; ++Elt) {
+          auto ASOrErr =
+              createAS(static_cast<uint32_t>(TD.Instances[Elt].size()));
+          if (!ASOrErr)
+            return ASOrErr.takeError();
+          Bundle.emplace_back(
+              llvm::cast<DXAccelerationStructure>(ASOrErr->get()));
+          Handles.push_back(std::move(*ASOrErr));
+        }
+        auto Inserted = IS.TLASes.try_emplace(TD.Name, std::move(Handles));
         assert(Inserted.second && "TLAS bound to multiple resources NYI");
         (void)Inserted;
         Resources.push_back(std::make_pair(&R, std::move(Bundle)));
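The multi-create loop above splits each new TLAS between a non-owning bundle entry and an owning handle. A minimal sketch of that ownership split follows, assuming simplified stand-ins: `std::vector` and `std::unique_ptr` replace the LLVM containers, a null pointer replaces `llvm::Expected` error handling, and `CreatedTLAS` and `createTLASArray` are hypothetical names that do not appear in the commit.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a backend acceleration-structure object.
struct AccelerationStructure {
  uint32_t InstanceCount = 0;
};

struct CreatedTLAS {
  // Non-owning pointers, one per array element, used for descriptor binding.
  std::vector<AccelerationStructure *> Bundle;
  // Owning handles, one per array element, kept alive until submission ends.
  std::vector<std::unique_ptr<AccelerationStructure>> Handles;
};

// Single-create: sizes and allocates one TLAS from an instance count only.
// Assumption: returns nullptr on failure instead of an llvm::Expected.
inline std::unique_ptr<AccelerationStructure> createAS(uint32_t InstanceCount) {
  auto AS = std::make_unique<AccelerationStructure>();
  AS->InstanceCount = InstanceCount;
  return AS;
}

// Multi-create: one createAS call per array element. The raw pointer is
// recorded before ownership moves into Handles; moving a unique_ptr does not
// relocate the pointee, so the bundle entry stays valid.
inline bool createTLASArray(const std::vector<uint32_t> &InstanceCounts,
                            CreatedTLAS &Out) {
  Out.Handles.reserve(InstanceCounts.size());
  for (uint32_t Count : InstanceCounts) {
    auto AS = createAS(Count);
    if (!AS)
      return false;
    Out.Bundle.push_back(AS.get());
    Out.Handles.push_back(std::move(AS));
  }
  return true;
}
```

Keeping `createAS` free of resource and pipeline state is what lets the same loop shape be reused across backends in this sketch.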

lib/API/Device.cpp

Lines changed: 32 additions & 23 deletions
@@ -98,7 +98,8 @@ offloadtest::createRenderTargetFromCPUBuffer(Device &Dev,
 llvm::Error offloadtest::buildPipelineAccelerationStructures(
     Device &Dev, ComputeEncoder &Enc, Pipeline &P,
     llvm::SmallVectorImpl<std::unique_ptr<AccelerationStructure>> &OutBLAS,
-    const llvm::StringMap<std::unique_ptr<AccelerationStructure>>
+    const llvm::StringMap<
+        llvm::SmallVector<std::unique_ptr<AccelerationStructure>>>
         &PreallocatedTLASes,
     llvm::SmallVectorImpl<std::unique_ptr<Buffer>> &OutInputBuffers) {
   if (P.AccelStructs.BLAS.empty() && P.AccelStructs.TLAS.empty())
@@ -179,33 +180,41 @@ llvm::Error offloadtest::buildPipelineAccelerationStructures(
   // Separate `batchBuildAS()` from the BLAS batch so the BLAS-write →
   // TLAS-read barrier between them is implicit.
   llvm::SmallVector<TLASBuildRequest> TLASRequests;
-  TLASRequests.reserve(PreallocatedTLASes.size());
   for (const TLASDesc &TD : P.AccelStructs.TLAS) {
     auto ASIt = PreallocatedTLASes.find(TD.Name);
     if (ASIt == PreallocatedTLASes.end())
       continue; // TLAS declared but not bound to any resource.
-    TLASBuildRequest Req;
-    Req.AS = ASIt->second.get();
-    Req.Instances.reserve(TD.Instances.size());
-    for (const auto &I : TD.Instances) {
-      auto It = BLASesByName.find(I.BLAS);
-      if (It == BLASesByName.end())
-        return llvm::createStringError(std::errc::invalid_argument,
-                                       "TLAS '%s' references unknown BLAS '%s'",
-                                       TD.Name.c_str(), I.BLAS.c_str());
-
-      AccelerationStructureInstance Inst;
-      static_assert(sizeof(Inst.Transform) == sizeof(I.Transform),
-                    "Transform layout mismatch");
-      memcpy(Inst.Transform, I.Transform, sizeof(I.Transform));
-      Inst.InstanceID = I.InstanceID;
-      Inst.InstanceMask = I.InstanceMask;
-      Inst.BLAS = It->second;
-      Req.Instances.push_back(Inst);
+    const auto &Handles = ASIt->second;
+    assert(Handles.size() == TD.ArraySize &&
+           "PreallocatedTLASes entry size must equal TLASDesc::ArraySize");
+    assert(TD.Instances.size() == TD.ArraySize &&
+           "TLASDesc::Instances must have ArraySize entries (one per element)");
+    for (uint32_t Elt = 0; Elt < TD.ArraySize; ++Elt) {
+      TLASBuildRequest Req;
+      Req.AS = Handles[Elt].get();
+      const auto &EltInstances = TD.Instances[Elt];
+      Req.Instances.reserve(EltInstances.size());
+      for (const auto &I : EltInstances) {
+        auto It = BLASesByName.find(I.BLAS);
+        if (It == BLASesByName.end())
+          return llvm::createStringError(
+              std::errc::invalid_argument,
+              "TLAS '%s' element %u references unknown BLAS '%s'",
+              TD.Name.c_str(), Elt, I.BLAS.c_str());
+
+        AccelerationStructureInstance Inst;
+        static_assert(sizeof(Inst.Transform) == sizeof(I.Transform),
+                      "Transform layout mismatch");
+        memcpy(Inst.Transform, I.Transform, sizeof(I.Transform));
+        Inst.InstanceID = I.InstanceID;
+        Inst.InstanceMask = I.InstanceMask;
+        Inst.BLAS = It->second;
+        Req.Instances.push_back(Inst);
+      }
+      if (auto Err = validateTLASBuildRequest(Req))
+        return Err;
+      TLASRequests.push_back(std::move(Req));
     }
-    if (auto Err = validateTLASBuildRequest(Req))
-      return Err;
-    TLASRequests.push_back(std::move(Req));
   }
 
   llvm::SmallVector<ASBuildItem> TLASBatch;
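The per-element request loop in this hunk can be sketched on its own, under stated assumptions: `std::map` and `std::vector` stand in for `llvm::StringMap` and `llvm::SmallVector`, instances are reduced to bare instance IDs, BLAS lookup and request validation are omitted, and `collectRequests` and `HandleMap` are hypothetical names rather than anything from the commit.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the real types.
struct AccelerationStructure {};

struct TLASDesc {
  std::string Name;
  uint32_t ArraySize = 1;
  // One inner list of instance IDs per descriptor-array element.
  std::vector<std::vector<uint32_t>> Instances;
};

struct TLASBuildRequest {
  AccelerationStructure *AS = nullptr;
  std::vector<uint32_t> InstanceIDs;
};

// Keyed by TLAS name; each value holds ArraySize pre-allocated handles.
using HandleMap =
    std::map<std::string, std::vector<std::unique_ptr<AccelerationStructure>>>;

// Emits one build request per (TLAS name, array element) pair, skipping
// TLASes that were declared but never bound to a resource.
inline std::vector<TLASBuildRequest>
collectRequests(const std::vector<TLASDesc> &Descs,
                const HandleMap &Preallocated) {
  std::vector<TLASBuildRequest> Out;
  for (const TLASDesc &TD : Descs) {
    auto It = Preallocated.find(TD.Name);
    if (It == Preallocated.end())
      continue;
    const auto &Handles = It->second;
    assert(Handles.size() == TD.ArraySize);
    assert(TD.Instances.size() == TD.ArraySize);
    for (uint32_t Elt = 0; Elt < TD.ArraySize; ++Elt) {
      TLASBuildRequest Req;
      Req.AS = Handles[Elt].get();
      Req.InstanceIDs = TD.Instances[Elt];
      Out.push_back(std::move(Req));
    }
  }
  return Out;
}
```

Because the number of requests now depends on each TLAS's `ArraySize` rather than on the map size, the sketch does not reserve up front, which matches the removal of the `reserve` call in the hunk.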

lib/API/MTL/MTLDevice.cpp

Lines changed: 64 additions & 48 deletions
@@ -962,8 +962,11 @@ class MTLDevice : public offloadtest::Device {
     // Parallel-indexed to `P.AccelStructs.BLAS`.
     llvm::SmallVector<std::unique_ptr<offloadtest::AccelerationStructure>>
         BLASes;
-    // Keyed by `TLASDesc::Name`.
-    llvm::StringMap<std::unique_ptr<offloadtest::AccelerationStructure>> TLASes;
+    // Keyed by `TLASDesc::Name`; each value holds `TLASDesc::ArraySize`
+    // handles (one per descriptor-array element).
+    llvm::StringMap<
+        llvm::SmallVector<std::unique_ptr<offloadtest::AccelerationStructure>>>
+        TLASes;
     // Vertex/index buffers consumed during AS builds; must outlive submission.
     llvm::SmallVector<std::unique_ptr<offloadtest::Buffer>> ASInputBuffers;
     // Per-AS header + contributions buffers; resident at dispatch.
@@ -1302,30 +1305,35 @@ class MTLDevice : public offloadtest::Device {
     return HeapIdx;
   }
 
-  llvm::Expected<std::unique_ptr<AccelerationStructure>> createAS(Resource &R) {
-    assert(R.TLASPtr && "AS resource must be resolved to a TLAS");
-    assert(R.getArraySize() == 1 && "AS arrays not yet supported");
-    auto SizesOrErr =
-        getTLASBuildSizes(static_cast<uint32_t>(R.TLASPtr->Instances.size()));
+  llvm::Expected<std::unique_ptr<AccelerationStructure>>
+  createAS(uint32_t InstanceCount) {
+    auto SizesOrErr = getTLASBuildSizes(InstanceCount);
     if (!SizesOrErr)
       return SizesOrErr.takeError();
     return createTLAS(*SizesOrErr);
   }
 
   llvm::Error createBuffers(Pipeline &P, InvocationState &IS) {
     auto CreateBuffer =
-        [&IS,
+        [&P, &IS,
          this](Resource &R,
                llvm::SmallVectorImpl<ResourcePair> &Resources) -> llvm::Error {
       if (R.isAccelerationStructure()) {
-        auto ASOrErr = createAS(R);
-        if (!ASOrErr)
-          return ASOrErr.takeError();
+        assert(R.TLASPtr && "AS resource must be resolved to a TLAS");
+        const TLASDesc &TD = *R.TLASPtr;
         ResourceBundle Bundle;
-        Bundle.emplace_back(
-            llvm::cast<MetalAccelerationStructure>(ASOrErr->get()));
-        auto Inserted =
-            IS.TLASes.try_emplace(R.TLASPtr->Name, std::move(*ASOrErr));
+        llvm::SmallVector<std::unique_ptr<AccelerationStructure>> Handles;
+        Handles.reserve(TD.ArraySize);
+        for (uint32_t Elt = 0; Elt < TD.ArraySize; ++Elt) {
+          auto ASOrErr =
+              createAS(static_cast<uint32_t>(TD.Instances[Elt].size()));
+          if (!ASOrErr)
+            return ASOrErr.takeError();
+          Bundle.emplace_back(
+              llvm::cast<MetalAccelerationStructure>(ASOrErr->get()));
+          Handles.push_back(std::move(*ASOrErr));
+        }
+        auto Inserted = IS.TLASes.try_emplace(TD.Name, std::move(Handles));
         assert(Inserted.second && "TLAS bound to multiple resources NYI");
         (void)Inserted;
         Resources.emplace_back(&R, std::move(Bundle));
@@ -1373,43 +1381,50 @@ class MTLDevice : public offloadtest::Device {
     uint32_t HeapIndex = 0;
     for (auto &T : IS.DescTables) {
       for (auto &R : T.Resources) {
-        if (MetalAccelerationStructure *MTLAS = R.second[0].AS) {
+        if (R.first->isAccelerationStructure()) {
           // The Metal shader converter binds the AS indirectly through an
           // `IRRaytracingAccelerationStructureGPUHeader` buffer carrying the
           // AS's `gpuResourceID` and a pointer to an instance-contributions
           // array (one `uint32` per instance, equivalent to D3D12's
           // `InstanceContributionToHitGroupIndex`).
-          const uint32_t InstCount =
-              static_cast<uint32_t>(R.first->TLASPtr->Instances.size());
-          llvm::SmallVector<uint32_t> Contributions(InstCount, 0);
-          const BufferCreateDesc Desc{MemoryLocation::GpuToCpu,
-                                      BufferUsage::Storage};
-          auto ContribBufOrErr = createBufferWithData(
-              *IS.CB->Dev, "AS-Contributions", Desc, Contributions.data(),
-              InstCount * sizeof(uint32_t), nullptr, nullptr);
-          if (!ContribBufOrErr)
-            return ContribBufOrErr.takeError();
-          auto *MTLContrib = llvm::cast<MTLBuffer>(ContribBufOrErr->get());
-          auto HeaderBufOrErr = IS.CB->Dev->createBuffer(
-              "AS-Header", Desc,
-              sizeof(IRRaytracingAccelerationStructureGPUHeader));
-          if (!HeaderBufOrErr)
-            return HeaderBufOrErr.takeError();
-          auto *MTLHeader = llvm::cast<MTLBuffer>(HeaderBufOrErr->get());
-          IRRaytracingSetAccelerationStructure(
-              static_cast<uint8_t *>(MTLHeader->Buf->contents()),
-              MTLAS->AccelStruct->gpuResourceID(),
-              static_cast<uint8_t *>(MTLContrib->Buf->contents()),
-              MTLContrib->Buf->gpuAddress(), Contributions.data(), InstCount);
-
-          IRDescriptorTableSetAccelerationStructure(
-              IS.DescHeap->getEntryHandle(HeapIndex),
-              MTLHeader->Buf->gpuAddress());
-
-          // The shader dereferences the contributions buffer through the
-          // header, so both must be resident at dispatch.
-          IS.ASDescriptorBuffers.push_back(std::move(*HeaderBufOrErr));
-          IS.ASDescriptorBuffers.push_back(std::move(*ContribBufOrErr));
+          const TLASDesc &TD = *R.first->TLASPtr;
+          assert(R.second.size() == TD.ArraySize &&
+                 "AS bundle must hold one ResourceSet per array element");
+          for (uint32_t Elt = 0; Elt < TD.ArraySize; ++Elt) {
+            auto *MTLAS =
+                llvm::cast<MetalAccelerationStructure>(R.second[Elt].AS);
+            const uint32_t InstCount =
+                static_cast<uint32_t>(TD.Instances[Elt].size());
+            llvm::SmallVector<uint32_t> Contributions(InstCount, 0);
+            const BufferCreateDesc Desc{MemoryLocation::GpuToCpu,
+                                        BufferUsage::Storage};
+            auto ContribBufOrErr = createBufferWithData(
+                *IS.CB->Dev, "AS-Contributions", Desc, Contributions.data(),
+                InstCount * sizeof(uint32_t), nullptr, nullptr);
+            if (!ContribBufOrErr)
+              return ContribBufOrErr.takeError();
+            auto *MTLContrib = llvm::cast<MTLBuffer>(ContribBufOrErr->get());
+            auto HeaderBufOrErr = IS.CB->Dev->createBuffer(
+                "AS-Header", Desc,
+                sizeof(IRRaytracingAccelerationStructureGPUHeader));
+            if (!HeaderBufOrErr)
+              return HeaderBufOrErr.takeError();
+            auto *MTLHeader = llvm::cast<MTLBuffer>(HeaderBufOrErr->get());
+            IRRaytracingSetAccelerationStructure(
+                static_cast<uint8_t *>(MTLHeader->Buf->contents()),
+                MTLAS->AccelStruct->gpuResourceID(),
+                static_cast<uint8_t *>(MTLContrib->Buf->contents()),
+                MTLContrib->Buf->gpuAddress(), Contributions.data(), InstCount);
+
+            IRDescriptorTableSetAccelerationStructure(
+                IS.DescHeap->getEntryHandle(HeapIndex + Elt),
+                MTLHeader->Buf->gpuAddress());
+
+            // The shader dereferences the contributions buffer through the
+            // header, so both must be resident at dispatch.
+            IS.ASDescriptorBuffers.push_back(std::move(*HeaderBufOrErr));
+            IS.ASDescriptorBuffers.push_back(std::move(*ContribBufOrErr));
+          }
           HeapIndex += R.first->getArraySize();
           continue;
         }
@@ -1481,7 +1496,8 @@ class MTLDevice : public offloadtest::Device {
     for (auto &AS : IS.BLASes)
       MarkASResident(AS);
     for (auto &Entry : IS.TLASes)
-      MarkASResident(Entry.second);
+      for (auto &AS : Entry.second)
+        MarkASResident(AS);
     for (auto &B : IS.ASDescriptorBuffers)
       NativeEncoder->useResource(llvm::cast<MTLBuffer>(B.get())->Buf,
                                  MTL::ResourceUsageRead);
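The heap indexing in the Metal binding loop (`HeapIndex + Elt` per element, then `HeapIndex += getArraySize()` per resource) amounts to flattening arrays into consecutive descriptor slots. A minimal sketch of that arithmetic follows; `BoundResource`, `SlotAssignment`, and `assignHeapSlots` are hypothetical names used only for illustration, and no Metal or shader-converter API is involved.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical description of a resource in binding order.
struct BoundResource {
  std::string Name;
  uint32_t ArraySize;
};

// Which heap slot a given array element of a resource lands in.
struct SlotAssignment {
  std::string Name;
  uint32_t Element;
  uint32_t HeapIndex;
};

// Walks resources in order, giving element Elt of each resource the slot
// HeapIndex + Elt, then advancing HeapIndex by the resource's ArraySize so
// the next resource starts right after the array.
inline std::vector<SlotAssignment>
assignHeapSlots(const std::vector<BoundResource> &Resources) {
  std::vector<SlotAssignment> Out;
  uint32_t HeapIndex = 0;
  for (const BoundResource &R : Resources) {
    for (uint32_t Elt = 0; Elt < R.ArraySize; ++Elt)
      Out.push_back({R.Name, Elt, HeapIndex + Elt});
    HeapIndex += R.ArraySize;
  }
  return Out;
}
```

With a two-element `Scenes` array followed by a single `Output` buffer, this sketch places the two TLAS descriptors in slots 0 and 1 and the buffer in slot 2.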
