Support arrays of TLAS bindings #1305

Open
MarijnS95 wants to merge 1 commit into llvm:main from Traverse-Research:tlas-array-bindings

@MarijnS95

Collaborator

Depends on #1245

Summary

Mirror the existing buffer/texture array pattern (ArraySize plus an ArraySize-driven YAML field) for top-level acceleration structures so shaders can declare RaytracingAccelerationStructure Scenes[N] and bind N distinct TLASes through a single resource entry.

  • Schema (Support/Pipeline.{h,cpp}): TLASDesc gains ArraySize (default 1) and reshapes Instances from SmallVector<InstanceDesc> to SmallVector<SmallVector<InstanceDesc>>. MappingTraits<TLASDesc> dispatches on ArraySize exactly like setData() does for CPUBuffer: flat Instances: [...] for the scalar case, list-of-lists for arrays, with an ActualSize != ArraySize validation error on mismatch. Resource::getArraySize() returns TLASPtr->ArraySize for AS resources (moved out-of-line so it can dereference the still-forward-declared TLASDesc).
  • createAS(): takes uint32_t InstanceCount directly — pure single-create that sizes and allocates one TLAS. No Pipeline / Resource / InvocationState parameters. The multi-create (createBuffers / createResources) loops TD.ArraySize and pushes one bundle entry per element plus N handles into InvocationState::TLASes, which becomes StringMap<SmallVector<unique_ptr<AccelerationStructure>>> (one vector per TLASDesc::Name, sized to ArraySize).
  • buildPipelineAccelerationStructures() walks P.AccelStructs.TLAS and, for each name with a pre-allocated vector, builds one TLASBuildRequest per element using TD.Instances[Elt] and Handles[Elt].
  • Vulkan: descriptor write iterates the bundle's ResourceRefs to fill N entries of pAccelerationStructures and sets descriptorCount = R.getArraySize() so the descriptor set sees the full array. Pool sizing (ASDescriptorCount) and binding layout (ResourceBinding.DescriptorCount) already used R.getArraySize() for AS resources — they just start returning the right number once TLASDesc::ArraySize flows through.
  • DX12: bindSRV()'s AS branch already loops ResBundle; with the new multi-entry bundle it now writes N RAYTRACING_ACCELERATION_STRUCTURE SRVs into consecutive heap slots automatically. Heap sizing already uses getDescriptorCountWithFlattenedArrays().
  • Metal: the AS descriptor-binding loop now builds one IRRaytracingAccelerationStructureGPUHeader + instance-contributions buffer pair per array element and writes one descriptor entry per element via IRDescriptorTableSetAccelerationStructure(). MarkASResident descends into the per-name vector.
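
The schema bullet above can be sketched as follows. This is a minimal illustration, not the PR's code: `std::vector` stands in for `SmallVector`, the `BLAS` field and the helper names `setScalarInstances`, `setArrayInstances`, and `instancesFor` are hypothetical, and errors are returned as plain strings rather than going through the YAML layer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the PR's types; std::vector replaces SmallVector.
struct InstanceDesc {
  std::string BLAS;    // assumed field: name of the referenced BLAS
  uint32_t InstanceID; // value the shader reads back
};

struct TLASDesc {
  std::string Name;
  uint32_t ArraySize = 1;
  // Outer index: array element. Inner list: instances of that element.
  std::vector<std::vector<InstanceDesc>> Instances;
};

// Scalar form: a flat list becomes the single element of the outer vector.
// Returns an empty string on success, an error message otherwise.
std::string setScalarInstances(TLASDesc &TD, std::vector<InstanceDesc> Flat) {
  if (TD.ArraySize != 1)
    return "TLAS '" + TD.Name + "': flat Instances requires ArraySize == 1";
  TD.Instances.clear();
  TD.Instances.push_back(std::move(Flat));
  return std::string();
}

// Array form: a list of lists whose length must equal ArraySize.
std::string setArrayInstances(TLASDesc &TD,
                              std::vector<std::vector<InstanceDesc>> Nested) {
  const std::size_t ActualSize = Nested.size();
  if (ActualSize != TD.ArraySize)
    return "TLAS '" + TD.Name + "': expected " +
           std::to_string(TD.ArraySize) + " instance lists, got " +
           std::to_string(ActualSize);
  TD.Instances = std::move(Nested);
  return std::string();
}

// Per-element accessor, as a backend would use when building element Elt.
const std::vector<InstanceDesc> &instancesFor(const TLASDesc &TD,
                                              uint32_t Elt) {
  assert(Elt < TD.Instances.size() && "element index out of range");
  return TD.Instances[Elt];
}
```

Storing the scalar case as a one-element outer vector means backends can always index `Instances[Elt]` without a special case for `ArraySize == 1`.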

New test test/Feature/InlineRT/tlas-array.test declares Scenes[2]. Each TLAS carries one triangle instance with a distinct InstanceID (10 and 20), and the shader writes each CommittedInstanceID() into Output[i]. The test is gated on acceleration-structure and marked XFAIL: Clang.

Test plan

Local on an NVIDIA RTX 3060:

  • Linux Vulkan (native offloader)
  • Linux D3D12 (Wine + vkd3d-proton + cross-compiled offloader.exe)
  • Windows Vulkan (native offloader.exe)
  • Windows D3D12 (native offloader.exe)

CI (RT-capable runners):

  • windows-nvidia D3D12 (RaytracingTier 1.2)
  • windows-intel VK (VK_KHR_ray_tracing_pipeline)
  • macOS Metal (supportsRaytracing)

Mirror the existing buffer/texture array pattern (`ArraySize` plus an
ArraySize-driven YAML field) for top-level acceleration structures so
shaders can declare `RaytracingAccelerationStructure Scenes[N]` and bind
N distinct TLASes through a single resource entry.

Schema (`include/Support/Pipeline.h`, `lib/Support/Pipeline.cpp`):
- `TLASDesc` gains `ArraySize` (default 1) and reshapes `Instances` from
  `SmallVector<InstanceDesc>` to `SmallVector<SmallVector<InstanceDesc>>`
  — outer vector indexed by array element, inner vector lists instances
  for that element.
- `MappingTraits<TLASDesc>` dispatches on `ArraySize` the same way
  `setData()` does for CPUBuffer: flat `Instances: [...]` for the
  scalar case, list-of-lists for arrays, with an `ActualSize !=
  ArraySize` validation error on mismatch.
- `Resource::getArraySize()` returns `TLASPtr->ArraySize` for AS
  resources; moved out-of-line so it can dereference the (still
  forward-declared at the Resource definition) `TLASDesc`.
- BLAS-name resolution in the pipeline post-process descends through
  the extra layer of nesting.
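
For readers unfamiliar with the out-of-line move, here is a minimal single-file sketch of the pattern. The `CPUBuffer` and `Resource` layouts are simplified stand-ins, and the assertion about a resource being either a buffer or a TLAS is an assumption of this sketch rather than something the PR states.

```cpp
#include <cassert>
#include <cstdint>

struct TLASDesc; // forward declaration only: the type is incomplete here

struct CPUBuffer {
  uint32_t ArraySize = 1;
};

struct Resource {
  CPUBuffer *BufferPtr = nullptr;
  TLASDesc *TLASPtr = nullptr;
  // Only declared here. The body reads TLASPtr->ArraySize, which does not
  // compile while TLASDesc is still incomplete.
  uint32_t getArraySize() const;
};

struct TLASDesc {
  uint32_t ArraySize = 1;
};

// Defined after TLASDesc is complete (in the PR this lives in the .cpp file).
uint32_t Resource::getArraySize() const {
  // Assumption of this sketch: a resource is a buffer or a TLAS, never both.
  assert((BufferPtr == nullptr || TLASPtr == nullptr) &&
         "resource has both a buffer and a TLAS");
  if (TLASPtr)
    return TLASPtr->ArraySize;
  if (BufferPtr)
    return BufferPtr->ArraySize;
  return 1;
}
```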

Backend plumbing (VK / DX / MTL):
- `createAS()` now takes `uint32_t InstanceCount` directly (no Resource
  / Pipeline / InvocationState access) — pure single-create that just
  sizes and allocates one TLAS.
- The multi-create (`createBuffers` / `createResources`) loops
  `TD.ArraySize` and pushes one bundle entry per element plus N handles
  into `InvocationState::TLASes`, which becomes
  `StringMap<SmallVector<unique_ptr<AccelerationStructure>>>` (one
  vector per `TLASDesc::Name`, sized to `ArraySize`).
- `buildPipelineAccelerationStructures()` walks `P.AccelStructs.TLAS`
  and, for each name with a pre-allocated vector, builds one
  `TLASBuildRequest` per element using `TD.Instances[Elt]` and
  `Handles[Elt]`.
- Vulkan descriptor write iterates the bundle's `ResourceRefs` to fill
  N entries of `pAccelerationStructures` and sets `descriptorCount =
  R.getArraySize()` so the descriptor set sees the full array.
- DX's `bindSRV` AS branch already loops `ResBundle`; with the new
  multi-entry bundle it now writes N RAYTRACING_ACCELERATION_STRUCTURE
  SRVs into consecutive heap slots automatically. Heap sizing already
  uses `getDescriptorCountWithFlattenedArrays()`.
- Metal's AS descriptor-binding loop now builds one
  `IRRaytracingAccelerationStructureGPUHeader` + instance-contributions
  buffer pair per array element and writes a descriptor entry per
  element via `IRDescriptorTableSetAccelerationStructure`.
  `MarkASResident` descends into the per-name vector.
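
The split between single-create, multi-create, and per-element build requests might look roughly like the sketch below. It is backend-agnostic and heavily simplified: `std::map` and `std::vector` stand in for `StringMap` and `SmallVector`, instances are reduced to bare IDs, and `createTLASArray`, `collectBuildRequests`, and the fields of `AccelerationStructure` and `TLASBuildRequest` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>
#include <string>
#include <vector>

struct AccelerationStructure {
  explicit AccelerationStructure(uint32_t Capacity)
      : InstanceCapacity(Capacity) {}
  uint32_t InstanceCapacity;
};

struct TLASDesc {
  std::string Name;
  uint32_t ArraySize;
  std::vector<std::vector<uint32_t>> Instances; // instance IDs per element
};

using TLASMap =
    std::map<std::string, std::vector<std::unique_ptr<AccelerationStructure>>>;

// Single-create: sizes one TLAS from an instance count alone.
std::unique_ptr<AccelerationStructure> createAS(uint32_t InstanceCount) {
  return std::make_unique<AccelerationStructure>(InstanceCount);
}

// Multi-create: one handle per array element, stored under the TLAS name.
void createTLASArray(const TLASDesc &TD, TLASMap &TLASes) {
  assert(TD.Instances.size() == TD.ArraySize && "schema should enforce this");
  auto &Handles = TLASes[TD.Name];
  Handles.reserve(TD.ArraySize);
  for (uint32_t Elt = 0; Elt < TD.ArraySize; ++Elt)
    Handles.push_back(
        createAS(static_cast<uint32_t>(TD.Instances[Elt].size())));
}

struct TLASBuildRequest {
  AccelerationStructure *Target;
  const std::vector<uint32_t> *Instances;
};

// Build step: pair Handles[Elt] with Instances[Elt] for every allocated name.
std::vector<TLASBuildRequest>
collectBuildRequests(const std::vector<TLASDesc> &Descs,
                     const TLASMap &TLASes) {
  std::vector<TLASBuildRequest> Requests;
  for (const TLASDesc &TD : Descs) {
    auto It = TLASes.find(TD.Name);
    if (It == TLASes.end())
      continue; // no pre-allocated vector for this name
    const auto &Handles = It->second;
    for (uint32_t Elt = 0; Elt < Handles.size(); ++Elt)
      Requests.push_back({Handles[Elt].get(), &TD.Instances[Elt]});
  }
  return Requests;
}
```

Keeping `createAS()` free of pipeline state leaves the array loop in one place per backend, so the descriptor-write code only has to iterate the handles it is given.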

Test: `test/Feature/InlineRT/tlas-array.test` declares `Scenes[2]`,
each TLAS carries one triangle instance with a distinct `InstanceID`
(10 and 20), and the shader writes each `CommittedInstanceID()` into
`Output[i]`. Gated on `acceleration-structure`, `XFAIL: Clang`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@MarijnS95 MarijnS95 force-pushed the tlas-array-bindings branch from c2ae187 to 368c470 Compare June 11, 2026 16:57
@MarijnS95 MarijnS95 marked this pull request as ready for review June 11, 2026 16:57
return 1;
return BufferPtr->ArraySize;
}
uint32_t getArraySize() const; // out-of-line: needs complete TLASDesc.

Contributor

Explaining why this is out of line isn't terribly useful I think. If somebody for some reason really wanted it to be inline they'd figure it out in a hurry.

Suggested change
- uint32_t getArraySize() const; // out-of-line: needs complete TLASDesc.
+ uint32_t getArraySize() const;
