Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
44 changes: 44 additions & 0 deletions include/API/CommandBuffer.h
Original file line number Diff line number Diff line change
@@ -0,0 +1,44 @@
//===- CommandBuffer.h - Offload Command Buffer API -----------------------===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//
//
//
//===----------------------------------------------------------------------===//
Comment on lines +8 to +10
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This bit of header comment isn't really necessary if we don't have any description of the module inside of it.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I saw it in the other PRs as well and thought that was introduced by clang-tidy as a default. Will remove it or add a description in the next PR.


#ifndef OFFLOADTEST_API_COMMANDBUFFER_H
#define OFFLOADTEST_API_COMMANDBUFFER_H

#include "API/API.h"

#include <cassert>

namespace offloadtest {

class CommandBuffer {
GPUAPI API;

public:
explicit CommandBuffer(GPUAPI API) : API(API) {}
virtual ~CommandBuffer() = default;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Classes with vtables should have an anchor to avoid object size and relocation bloat. See https://llvm.org/docs/CodingStandards.html#provide-a-virtual-method-anchor-for-classes-in-headers

CommandBuffer(const CommandBuffer &) = delete;
CommandBuffer &operator=(const CommandBuffer &) = delete;

GPUAPI getAPI() const { return API; }

template <typename T> T &as() {
assert(API == T::BackendAPI && "CommandBuffer backend mismatch");
return static_cast<T &>(*this);
}
template <typename T> const T &as() const {
assert(API == T::BackendAPI && "CommandBuffer backend mismatch");
return static_cast<const T &>(*this);
}
Comment on lines +32 to +39
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably just set these classes up using LLVM-style RTTI rather than hand rolling it here: https://llvm.org/docs/HowToSetUpLLVMStyleRTTI.html

};

} // namespace offloadtest

#endif // OFFLOADTEST_API_COMMANDBUFFER_H
5 changes: 5 additions & 0 deletions include/API/Device.h
Original file line number Diff line number Diff line change
Expand Up @@ -16,8 +16,10 @@

#include "API/API.h"
#include "API/Capabilities.h"
#include "API/CommandBuffer.h"
#include "llvm/ADT/StringRef.h"
#include "llvm/ADT/iterator_range.h"
#include "llvm/Support/Error.h"

#include <memory>
#include <string>
Expand Down Expand Up @@ -99,6 +101,9 @@ class Device {
size_t SizeInBytes) = 0;
virtual void printExtra(llvm::raw_ostream &OS) {}

virtual llvm::Expected<std::unique_ptr<CommandBuffer>>
createCommandBuffer() = 0;
Comment on lines +104 to +105
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Too late to do anything about it, but note that the commit message claims:

Device::createCommandBuffer() returns Expected<unique_ptr> with a default "not supported" implementation.

However, this is a pure virtual function - there is no default implementation at all.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right - I fixed this last minute after realizing there were not actually any backends that leave this unimplemented, but forgot to update the commit message.


virtual ~Device() = 0;

llvm::StringRef getDescription() const { return Description; }
Expand Down
130 changes: 76 additions & 54 deletions lib/API/DX/Device.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -389,6 +389,36 @@ class DXQueue : public offloadtest::Queue {
}
};

class DXCommandBuffer : public offloadtest::CommandBuffer {
public:
static constexpr GPUAPI BackendAPI = GPUAPI::DirectX;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like a false positive, but clang is emitting a warning for these:

E:/code/HLSLTest/lib/API/DX/Device.cpp(395,27): error: unused variable 'BackendAPI' [-Werror,-Wunused-const-variable]
  395 |   static constexpr GPUAPI BackendAPI = GPUAPI::DirectX;
      |                           ^~~~~~~~~~

Apparently we don't have -Werror for the pre-checkin CI jobs here, because it also showed up in those runs: https://github.com/llvm/offload-test-suite/actions/runs/24345003263/job/71083354829#step:11:4368

In any case, switching to LLVM-style RTTI will avoid the issue.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct, this was mostly to have backend-specific downcasting readily available when needed, for special-case tests as well as backend logic like Queue receiving generic CommandBuffer structures.

I'll look at LLVM-style RTTI as well as checking if the CI jobs can be extended with -Werror going forward as the build was always warning-free until this PR 😅


ComPtr<ID3D12CommandAllocator> Allocator;
ComPtr<ID3D12GraphicsCommandList> CmdList;

static llvm::Expected<std::unique_ptr<DXCommandBuffer>>
create(ComPtr<ID3D12Device> Device) {
auto CB = std::unique_ptr<DXCommandBuffer>(new DXCommandBuffer());
if (auto Err = HR::toError(
Device->CreateCommandAllocator(D3D12_COMMAND_LIST_TYPE_DIRECT,
IID_PPV_ARGS(&CB->Allocator)),
"Failed to create command allocator."))
return Err;
if (auto Err = HR::toError(
Device->CreateCommandList(0, D3D12_COMMAND_LIST_TYPE_DIRECT,
CB->Allocator.Get(), nullptr,
IID_PPV_ARGS(&CB->CmdList)),
"Failed to create command list."))
return Err;
return CB;
}

~DXCommandBuffer() override = default;

private:
DXCommandBuffer() : CommandBuffer(GPUAPI::DirectX) {}
};

class DXDevice : public offloadtest::Device {
private:
ComPtr<IDXCoreAdapter> Adapter;
Expand Down Expand Up @@ -420,8 +450,7 @@ class DXDevice : public offloadtest::Device {
ComPtr<ID3D12RootSignature> RootSig;
ComPtr<ID3D12DescriptorHeap> DescHeap;
ComPtr<ID3D12PipelineState> PSO;
ComPtr<ID3D12CommandAllocator> Allocator;
ComPtr<ID3D12GraphicsCommandList> CmdList;
std::unique_ptr<DXCommandBuffer> CB;
std::unique_ptr<offloadtest::Fence> Fence;

// Resources for graphics pipelines.
Expand Down Expand Up @@ -683,19 +712,9 @@ class DXDevice : public offloadtest::Device {
return llvm::Error::success();
}

llvm::Error createCommandStructures(InvocationState &IS) {
if (auto Err = HR::toError(
Device->CreateCommandAllocator(D3D12_COMMAND_LIST_TYPE_DIRECT,
IID_PPV_ARGS(&IS.Allocator)),
"Failed to create command allocator."))
return Err;
if (auto Err = HR::toError(
Device->CreateCommandList(0, D3D12_COMMAND_LIST_TYPE_DIRECT,
IS.Allocator.Get(), nullptr,
IID_PPV_ARGS(&IS.CmdList)),
"Failed to create command list."))
return Err;
return llvm::Error::success();
llvm::Expected<std::unique_ptr<offloadtest::CommandBuffer>>
createCommandBuffer() override {
return DXCommandBuffer::create(Device);
}

void addResourceUploadCommands(Resource &R, InvocationState &IS,
Expand All @@ -712,10 +731,10 @@ class DXDevice : public offloadtest::Device {
const CD3DX12_TEXTURE_COPY_LOCATION DstLoc(Destination.Get(), 0);
const CD3DX12_TEXTURE_COPY_LOCATION SrcLoc(Source.Get(), Footprint);

IS.CmdList->CopyTextureRegion(&DstLoc, 0, 0, 0, &SrcLoc, nullptr);
IS.CB->CmdList->CopyTextureRegion(&DstLoc, 0, 0, 0, &SrcLoc, nullptr);
} else
IS.CmdList->CopyBufferRegion(Destination.Get(), 0, Source.Get(), 0,
R.size());
IS.CB->CmdList->CopyBufferRegion(Destination.Get(), 0, Source.Get(), 0,
R.size());
addUploadEndBarrier(IS, Destination, R.isReadWrite());
}

Expand Down Expand Up @@ -1182,7 +1201,7 @@ class DXDevice : public offloadtest::Device {
{D3D12_RESOURCE_TRANSITION_BARRIER{
R.Get(), D3D12_RESOURCE_BARRIER_ALL_SUBRESOURCES,
D3D12_RESOURCE_STATE_COMMON, D3D12_RESOURCE_STATE_COPY_DEST}}};
IS.CmdList->ResourceBarrier(1, &Barrier);
IS.CB->CmdList->ResourceBarrier(1, &Barrier);
}

void addUploadEndBarrier(InvocationState &IS, ComPtr<ID3D12Resource> R,
Expand All @@ -1195,21 +1214,21 @@ class DXDevice : public offloadtest::Device {
D3D12_RESOURCE_STATE_COPY_DEST,
IsUAV ? D3D12_RESOURCE_STATE_UNORDERED_ACCESS
: D3D12_RESOURCE_STATE_GENERIC_READ}}};
IS.CmdList->ResourceBarrier(1, &Barrier);
IS.CB->CmdList->ResourceBarrier(1, &Barrier);
}

void addReadbackBeginBarrier(InvocationState &IS, ComPtr<ID3D12Resource> R) {
const D3D12_RESOURCE_BARRIER Barrier = CD3DX12_RESOURCE_BARRIER::Transition(
R.Get(), D3D12_RESOURCE_STATE_UNORDERED_ACCESS,
D3D12_RESOURCE_STATE_COPY_SOURCE);
IS.CmdList->ResourceBarrier(1, &Barrier);
IS.CB->CmdList->ResourceBarrier(1, &Barrier);
}

void addReadbackEndBarrier(InvocationState &IS, ComPtr<ID3D12Resource> R) {
const D3D12_RESOURCE_BARRIER Barrier = CD3DX12_RESOURCE_BARRIER::Transition(
R.Get(), D3D12_RESOURCE_STATE_COPY_SOURCE,
D3D12_RESOURCE_STATE_UNORDERED_ACCESS);
IS.CmdList->ResourceBarrier(1, &Barrier);
IS.CB->CmdList->ResourceBarrier(1, &Barrier);
}

llvm::Error waitForSignal(InvocationState &IS) {
Expand All @@ -1231,11 +1250,11 @@ class DXDevice : public offloadtest::Device {
}

llvm::Error executeCommandList(InvocationState &IS) {
if (auto Err =
HR::toError(IS.CmdList->Close(), "Failed to close command list."))
if (auto Err = HR::toError(IS.CB->CmdList->Close(),
"Failed to close command list."))
return Err;

ID3D12CommandList *const CmdLists[] = {IS.CmdList.Get()};
ID3D12CommandList *const CmdLists[] = {IS.CB->CmdList.Get()};
GraphicsQueue.Queue->ExecuteCommandLists(1, CmdLists);

return waitForSignal(IS);
Expand All @@ -1245,11 +1264,11 @@ class DXDevice : public offloadtest::Device {
CD3DX12_GPU_DESCRIPTOR_HANDLE Handle;
if (IS.DescHeap) {
ID3D12DescriptorHeap *const Heaps[] = {IS.DescHeap.Get()};
IS.CmdList->SetDescriptorHeaps(1, Heaps);
IS.CB->CmdList->SetDescriptorHeaps(1, Heaps);
Handle = IS.DescHeap->GetGPUDescriptorHandleForHeapStart();
}
IS.CmdList->SetComputeRootSignature(IS.RootSig.Get());
IS.CmdList->SetPipelineState(IS.PSO.Get());
IS.CB->CmdList->SetComputeRootSignature(IS.RootSig.Get());
IS.CB->CmdList->SetPipelineState(IS.PSO.Get());

const uint32_t Inc = Device->GetDescriptorHandleIncrementSize(
D3D12_DESCRIPTOR_HEAP_TYPE_CBV_SRV_UAV);
Expand All @@ -1269,14 +1288,15 @@ class DXDevice : public offloadtest::Device {
"Root constant cannot refer to resource arrays.");
const uint32_t NumValues =
Constant.BufferPtr->size() / sizeof(uint32_t);
IS.CmdList->SetComputeRoot32BitConstants(
IS.CB->CmdList->SetComputeRoot32BitConstants(
RootParamIndex++, NumValues,
Constant.BufferPtr->Data.back().get(), ConstantOffset);
ConstantOffset += NumValues;
break;
}
case dx::RootParamKind::DescriptorTable:
IS.CmdList->SetComputeRootDescriptorTable(RootParamIndex++, Handle);
IS.CB->CmdList->SetComputeRootDescriptorTable(RootParamIndex++,
Handle);
Handle.Offset(P.Sets[DescriptorTableIndex++].Resources.size(), Inc);
break;
case dx::RootParamKind::RootDescriptor:
Expand All @@ -1287,17 +1307,17 @@ class DXDevice : public offloadtest::Device {
"Root descriptor cannot refer to resource arrays.");
switch (getDXKind(RootDescIt->first->Kind)) {
case SRV:
IS.CmdList->SetComputeRootShaderResourceView(
IS.CB->CmdList->SetComputeRootShaderResourceView(
RootParamIndex++,
RootDescIt->second.back().Buffer->GetGPUVirtualAddress());
break;
case UAV:
IS.CmdList->SetComputeRootUnorderedAccessView(
IS.CB->CmdList->SetComputeRootUnorderedAccessView(
RootParamIndex++,
RootDescIt->second.back().Buffer->GetGPUVirtualAddress());
break;
case CBV:
IS.CmdList->SetComputeRootConstantBufferView(
IS.CB->CmdList->SetComputeRootConstantBufferView(
RootParamIndex++,
RootDescIt->second.back().Buffer->GetGPUVirtualAddress());
break;
Expand All @@ -1313,15 +1333,15 @@ class DXDevice : public offloadtest::Device {
// descriptor set layout. This is to make it easier to write tests that
// don't need complicated root signatures.
for (uint32_t Idx = 0u; Idx < P.Sets.size(); ++Idx) {
IS.CmdList->SetComputeRootDescriptorTable(Idx, Handle);
IS.CB->CmdList->SetComputeRootDescriptorTable(Idx, Handle);
Handle.Offset(P.Sets[Idx].Resources.size(), Inc);
}
}

const llvm::ArrayRef<int> DispatchSize =
llvm::ArrayRef<int>(P.Shaders[0].DispatchSize);

IS.CmdList->Dispatch(DispatchSize[0], DispatchSize[1], DispatchSize[2]);
IS.CB->CmdList->Dispatch(DispatchSize[0], DispatchSize[1], DispatchSize[2]);

auto CopyBackResource = [&IS, this](ResourcePair &R) {
if (R.first->isTexture()) {
Expand All @@ -1338,7 +1358,7 @@ class DXDevice : public offloadtest::Device {
const CD3DX12_TEXTURE_COPY_LOCATION DstLoc(RS.Readback.Get(),
Footprint);
const CD3DX12_TEXTURE_COPY_LOCATION SrcLoc(RS.Buffer.Get(), 0);
IS.CmdList->CopyTextureRegion(&DstLoc, 0, 0, 0, &SrcLoc, nullptr);
IS.CB->CmdList->CopyTextureRegion(&DstLoc, 0, 0, 0, &SrcLoc, nullptr);
addReadbackEndBarrier(IS, RS.Buffer);
}
return;
Expand All @@ -1347,7 +1367,7 @@ class DXDevice : public offloadtest::Device {
if (RS.Readback == nullptr)
continue;
addReadbackBeginBarrier(IS, RS.Buffer);
IS.CmdList->CopyResource(RS.Readback.Get(), RS.Buffer.Get());
IS.CB->CmdList->CopyResource(RS.Readback.Get(), RS.Buffer.Get());
addReadbackEndBarrier(IS, RS.Buffer);
}
};
Expand Down Expand Up @@ -1527,8 +1547,8 @@ class DXDevice : public offloadtest::Device {
VBView.SizeInBytes = static_cast<UINT>(VBSize);
VBView.StrideInBytes = P.Bindings.getVertexStride();

IS.CmdList->IASetPrimitiveTopology(D3D_PRIMITIVE_TOPOLOGY_TRIANGLELIST);
IS.CmdList->IASetVertexBuffers(0, 1, &VBView);
IS.CB->CmdList->IASetPrimitiveTopology(D3D_PRIMITIVE_TOPOLOGY_TRIANGLELIST);
IS.CB->CmdList->IASetVertexBuffers(0, 1, &VBView);

return llvm::Error::success();
}
Expand Down Expand Up @@ -1606,16 +1626,16 @@ class DXDevice : public offloadtest::Device {
IS.RTVHeap->GetCPUDescriptorHandleForHeapStart();
Device->CreateRenderTargetView(IS.RT.Get(), nullptr, RTVHandle);

IS.CmdList->SetGraphicsRootSignature(IS.RootSig.Get());
IS.CB->CmdList->SetGraphicsRootSignature(IS.RootSig.Get());
if (IS.DescHeap) {
ID3D12DescriptorHeap *const Heaps[] = {IS.DescHeap.Get()};
IS.CmdList->SetDescriptorHeaps(1, Heaps);
IS.CmdList->SetGraphicsRootDescriptorTable(
IS.CB->CmdList->SetDescriptorHeaps(1, Heaps);
IS.CB->CmdList->SetGraphicsRootDescriptorTable(
0, IS.DescHeap->GetGPUDescriptorHandleForHeapStart());
}
IS.CmdList->SetPipelineState(IS.PSO.Get());
IS.CB->CmdList->SetPipelineState(IS.PSO.Get());

IS.CmdList->OMSetRenderTargets(1, &RTVHandle, false, nullptr);
IS.CB->CmdList->OMSetRenderTargets(1, &RTVHandle, false, nullptr);

D3D12_VIEWPORT VP = {};
VP.Width =
Expand All @@ -1626,19 +1646,19 @@ class DXDevice : public offloadtest::Device {
VP.MaxDepth = 1.0f;
VP.TopLeftX = 0.0f;
VP.TopLeftY = 0.0f;
IS.CmdList->RSSetViewports(1, &VP);
IS.CB->CmdList->RSSetViewports(1, &VP);
const D3D12_RECT Scissor = {0, 0, static_cast<LONG>(VP.Width),
static_cast<LONG>(VP.Height)};
IS.CmdList->RSSetScissorRects(1, &Scissor);
IS.CB->CmdList->RSSetScissorRects(1, &Scissor);

IS.CmdList->DrawInstanced(P.Bindings.getVertexCount(), 1, 0, 0);
IS.CB->CmdList->DrawInstanced(P.Bindings.getVertexCount(), 1, 0, 0);

// Transition the render target to copy source and copy to the readback
// buffer.
const D3D12_RESOURCE_BARRIER Barrier = CD3DX12_RESOURCE_BARRIER::Transition(
IS.RT.Get(), D3D12_RESOURCE_STATE_RENDER_TARGET,
D3D12_RESOURCE_STATE_COPY_SOURCE);
IS.CmdList->ResourceBarrier(1, &Barrier);
IS.CB->CmdList->ResourceBarrier(1, &Barrier);

const CPUBuffer &B = *P.Bindings.RTargetBufferPtr;
const D3D12_PLACED_SUBRESOURCE_FOOTPRINT Footprint{
Expand All @@ -1649,7 +1669,7 @@ class DXDevice : public offloadtest::Device {
const CD3DX12_TEXTURE_COPY_LOCATION DstLoc(IS.RTReadback.Get(), Footprint);
const CD3DX12_TEXTURE_COPY_LOCATION SrcLoc(IS.RT.Get(), 0);

IS.CmdList->CopyTextureRegion(&DstLoc, 0, 0, 0, &SrcLoc, nullptr);
IS.CB->CmdList->CopyTextureRegion(&DstLoc, 0, 0, 0, &SrcLoc, nullptr);

auto CopyBackResource = [&IS, this](ResourcePair &R) {
if (R.first->isTexture()) {
Expand All @@ -1666,7 +1686,7 @@ class DXDevice : public offloadtest::Device {
const CD3DX12_TEXTURE_COPY_LOCATION DstLoc(RS.Readback.Get(),
Footprint);
const CD3DX12_TEXTURE_COPY_LOCATION SrcLoc(RS.Buffer.Get(), 0);
IS.CmdList->CopyTextureRegion(&DstLoc, 0, 0, 0, &SrcLoc, nullptr);
IS.CB->CmdList->CopyTextureRegion(&DstLoc, 0, 0, 0, &SrcLoc, nullptr);
addReadbackEndBarrier(IS, RS.Buffer);
}
return;
Expand All @@ -1675,7 +1695,7 @@ class DXDevice : public offloadtest::Device {
if (RS.Readback == nullptr)
continue;
addReadbackBeginBarrier(IS, RS.Buffer);
IS.CmdList->CopyResource(RS.Readback.Get(), RS.Buffer.Get());
IS.CB->CmdList->CopyResource(RS.Readback.Get(), RS.Buffer.Get());
addReadbackEndBarrier(IS, RS.Buffer);
}
};
Expand Down Expand Up @@ -1726,9 +1746,11 @@ class DXDevice : public offloadtest::Device {
return Err;
llvm::outs() << "Descriptor heap created.\n";

if (auto Err = createCommandStructures(State))
return Err;
llvm::outs() << "Command structures created.\n";
auto CBOrErr = DXCommandBuffer::create(Device);
if (!CBOrErr)
return CBOrErr.takeError();
State.CB = std::move(*CBOrErr);
llvm::outs() << "Command buffer created.\n";

auto FenceOrErr = createFence("Fence");
if (!FenceOrErr)
Expand Down
Loading
Loading