An implementation of MLP (Multi-Layer Perceptron) inference using DirectX 12 Cooperative Vector. This library demonstrates GPU-accelerated neural network inference with cutting-edge shader features.
Key Features:
- 🚀 High Performance: GPU-accelerated inference using DirectX 12 Cooperative Vector
- 🎯 Flexible Architecture: Configurable layer dimensions, activation functions, and data types
- 🔧 Single-header HLSL: Easy integration into existing DirectX 12 projects
For an overview of Cooperative Vector, please read D3D12 Cooperative Vector.
```sh
# Clone with submodules
git clone --recursive https://github.com/GPUOpen-LibrariesAndSDKs/MiniDxNn.git
cd MiniDxNn

# Build library and examples (examples are ON by default)
cmake -B build
cmake --build build --config Release

# Build with tests enabled
cmake -B build -DMINIDXNN_BUILD_TESTS=ON
cmake --build build --config Release

# Run unit tests (from build/unittest as CWD)
cd build/unittest
ctest -C Release
```

- Operating System: Windows 10/11 with Developer Mode enabled
- GPU: Supports Shader Model 6.9 and Cooperative Vector in D3D12 (e.g., AMD Radeon™ RX 9000 Series GPUs or an equivalent NVIDIA GPU)
- Build Tools:
  - CMake 3.21.0 or higher
  - Visual Studio 2022 (or a compatible C++ compiler with C++20 support)
  - Windows SDK with DirectX 12
- Python (for training): Python 3.8+ with PyTorch (optional, for example training)
- Install a GPU driver with Cooperative Vector support
- Enable Windows Developer Mode
```sh
git clone --recursive https://github.com/GPUOpen-LibrariesAndSDKs/MiniDxNn.git
cd MiniDxNn
```

Use the `--recursive` flag to include the submodules (gfx, GoogleTest, CLI11).
By default, the core library and examples are built (`MINIDXNN_BUILD_EXAMPLES=ON`). Tests are off by default.
```sh
# Configure and build (library + examples)
cmake -B build
cmake --build build --config Release

# Build with tests enabled
cmake -B build -DMINIDXNN_BUILD_TESTS=ON
cmake --build build --config Release

# Run tests (from build/unittest as CWD)
cd build/unittest
ctest -C Release
```

The example executables are built into the build/example/Release/ directory (or build/example/Debug/ for Debug builds). Run the examples with build/example as the working directory. See example/README.md for details on running the examples.
Unit tests are run with build/unittest as the working directory:

```sh
cd build/unittest
ctest -C Release
```

Please follow the installation instructions to enable experimental feature support on your system.
- Use the DirectX 12 Agility SDK version 1.717.1-preview and DirectX Shader Compiler v1.8.2505.1 for the DX12 runtime
- Enable experimental shader models by calling `D3D12EnableExperimentalFeatures` before creating the D3D12 device
- Compile shaders with Shader Model 6.9
The core MLP inference functionality is provided as a header-only HLSL library. Include it in your compute shaders:
```hlsl
#include <hlsl/mlp.hlsl>

// Define your network architecture
static const uint NUM_HIDDEN_LAYERS = 2;
static const int INPUT_DIM = 2;
static const int HIDDEN_DIM = 64;
static const int OUTPUT_DIM = 3;

// Create layer data reference using the convenience alias (ByteAddressBuffer, with bias)
using MlpLayerData = mininn::InferenceLayerDataRef<
    NUM_HIDDEN_LAYERS,
    HIDDEN_DIM,
    half,                                // staging element type
    dx::linalg::DATA_TYPE_FLOAT16,
    dx::linalg::MATRIX_LAYOUT_ROW_MAJOR,
    dx::linalg::DATA_TYPE_FLOAT16,       // bias type
    dx::linalg::DATA_TYPE_FLOAT16,       // accumulator type
    mininn::ReluActivation,              // hidden layer activation
    mininn::SigmoidActivation            // output layer activation
>;

ByteAddressBuffer g_weights : register(t0);
ByteAddressBuffer g_biases : register(t1);

[numthreads(8, 8, 1)]
void main(uint3 threadId : SV_DispatchThreadID)
{
    MlpLayerData layerData;
    layerData.setWeightData(g_weights);
    layerData.setBiasData(g_biases);

    vector<float, INPUT_DIM> input = ...; // your input
    vector<float, OUTPUT_DIM> output;
    mininn::forward(output, input, layerData);

    // Use output...
}
```

See the MLP HLSL API Documentation for complete details.
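When validating GPU output, it helps to mirror the forward pass on the CPU. Below is a minimal NumPy sketch of the same 2 → 64 → 64 → 3 architecture (ReLU hidden layers, sigmoid output); the random weights and the simple list-of-matrices layout are stand-ins, not the library's actual buffer format:

```python
import numpy as np

def forward(x, weights, biases):
    """Reference MLP forward pass: ReLU on hidden layers, sigmoid on the output layer."""
    for i, (W, b) in enumerate(zip(weights, biases)):
        x = W @ x + b
        if i < len(weights) - 1:
            x = np.maximum(x, 0.0)          # ReLU (hidden layers)
        else:
            x = 1.0 / (1.0 + np.exp(-x))    # sigmoid (output layer)
    return x

# Same shape as the HLSL example: 2 -> 64 -> 64 -> 3, float16 like the shader
rng = np.random.default_rng(0)
dims = [2, 64, 64, 3]
weights = [rng.standard_normal((dims[i + 1], dims[i])).astype(np.float16) for i in range(3)]
biases = [np.zeros(dims[i + 1], dtype=np.float16) for i in range(3)]

out = forward(np.array([0.5, -0.25], dtype=np.float16), weights, biases)
print(out.shape)  # (3,)
```

Running the shader and this reference on the same exported weights gives a quick correctness check (allow for float16 rounding differences).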
```
MiniDxNn/
├── include/hlsl/                         # HLSL shader library (header-only)
│   └── mlp.hlsl                          # MLP forward pass with Cooperative Vector
├── docs/                                 # Documentation
│   └── mlp_hlsl.md                       # HLSL API documentation
├── example/                              # Example applications
│   ├── common/                           # Shared C++ utilities
│   ├── kernel/                           # HLSL compute shaders for examples
│   ├── 01_texture_inference/             # Texture synthesis example
│   └── README.md                         # Example documentation
├── scripts/                              # Training and utilities
│   └── pyreference/                      # Python reference implementations
│       └── texture_reconstruction_mlp.py # MLP training script
├── unittest/                             # Unit tests
│   ├── kernel/                           # HLSL compute shaders for tests
│   ├── test.cpp/hpp                      # Test infrastructure
│   └── unittest.cpp                      # GoogleTest main
├── third_party/                          # External dependencies
│   ├── gfx/                              # DirectX 12 framework (submodule)
│   ├── googletest/                       # Unit testing framework (submodule)
│   ├── CLI11/                            # Command line parser (submodule)
│   └── half-2.2.1.zip                    # Half-precision float library
├── cmake/                                # CMake build scripts
├── tools/
├── CMakeLists.txt                        # Root build configuration
├── LICENSE                               # MIT License
└── NOTICE.md                             # Third-party notices
```
- Multi-Layer Perceptron (MLP): 0 to N hidden layers with configurable dimensions
- Flexible Layer Sizes: Independent input, hidden, and output dimensions
- Identity: Linear pass-through (f(x) = x)
- Sigmoid: Numerically stable sigmoid (f(x) = 1/(1+e^(-x)))
- ReLU: Rectified Linear Unit (f(x) = max(0, x))
- Leaky ReLU: ReLU with a configurable negative slope (f(x) = max(αx, x), e.g. α = 0.01)
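For reference, the four activations can be sketched in NumPy; the sigmoid branches on sign so that exp is only ever evaluated on non-positive arguments, which is the standard trick for numerical stability. This is an illustrative sketch, not the library's HLSL implementation:

```python
import numpy as np

def identity(x):
    return x

def sigmoid(x):
    # Stable: exp() only sees non-positive arguments, so it cannot overflow
    out = np.empty_like(x, dtype=np.float64)
    pos = x >= 0
    out[pos] = 1.0 / (1.0 + np.exp(-x[pos]))
    ex = np.exp(x[~pos])
    out[~pos] = ex / (1.0 + ex)
    return out

def relu(x):
    return np.maximum(x, 0.0)

def leaky_relu(x, slope=0.01):
    return np.where(x >= 0, x, slope * x)

x = np.array([-1000.0, -1.0, 0.0, 1.0, 1000.0])
print(sigmoid(x))  # saturates cleanly to 0 and 1 at the extremes, no overflow
```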
- float16 (half): Currently the only data type supported for MLP computation
- Row-Major: Rows contiguous in memory
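Row-major means element (row, col) of an R×C matrix sits at flat offset row * C + col, which matches NumPy's default C order. A tiny sketch (the library's alignment and stride rules are not modeled here):

```python
import numpy as np

W = np.arange(12, dtype=np.float16).reshape(3, 4)  # 3x4 matrix, rows contiguous

flat = W.flatten(order="C")          # row-major flattening
row, col = 1, 2
offset = row * W.shape[1] + col      # 1 * 4 + 2 = 6
print(W[row, col], flat[offset])     # both 6.0
```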
See the example directory for complete working examples:
- 01_texture_inference: Texture synthesis using MLP
- Train a network with PyTorch, export to binary format
- GPU inference with Cooperative Vector
- Image generation and export
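The binary format actually consumed by the example is defined by scripts/pyreference/texture_reconstruction_mlp.py; the sketch below only illustrates the general idea of flattening row-major float16 weights and biases into raw bytes, using NumPy stand-ins for trained PyTorch tensors. The per-layer field order and lack of a header are assumptions:

```python
import os

import numpy as np

def export_mlp(path, weights, biases):
    """Write each layer's weight matrix (row-major) followed by its bias
    vector as raw float16 bytes. Illustrative only -- the real format is
    defined by the training script."""
    with open(path, "wb") as f:
        for W, b in zip(weights, biases):
            f.write(np.ascontiguousarray(W, dtype=np.float16).tobytes())
            f.write(np.ascontiguousarray(b, dtype=np.float16).tobytes())

# A 2 -> 64 -> 64 -> 3 network, as in the HLSL usage example
dims = [2, 64, 64, 3]
rng = np.random.default_rng(0)
weights = [rng.standard_normal((dims[i + 1], dims[i])) for i in range(3)]
biases = [np.zeros(dims[i + 1]) for i in range(3)]

export_mlp("mlp_weights.bin", weights, biases)
n_params = sum(W.size + b.size for W, b in zip(weights, biases))
print(os.path.getsize("mlp_weights.bin") == n_params * 2)  # True (2 bytes per float16)
```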
For detailed usage, see example/README.md.
The library is designed for high-performance inference:
- GPU Acceleration: Parallel execution across all threads
- Cooperative Vector: Leverages DirectX 12's advanced shader features for efficient SIMD operations
- Memory Efficiency: Optimized buffer layouts with configurable alignment
Benchmark results vary by GPU architecture, network size, and data types. See the examples for practical performance characteristics.
- MLP HLSL API Reference: Complete HLSL API documentation
- Example Guide: Example applications and usage patterns
- Cooperative Vector Spec: HLSL specification
- D3D12 Cooperative Vector Blog: Overview and getting started
This project is licensed under the MIT License - see the LICENSE file for details.
Copyright (c) 2026 Advanced Micro Devices, Inc. All rights reserved.
This project uses the following third-party libraries:
- Half-precision floating-point library - MIT License
- gfx - Direct3D12 framework - MIT License
- CLI11 - Command line parser for C++ - BSD-3-Clause License
- GoogleTest - Testing framework - BSD-3-Clause License
See NOTICE.md for complete third-party notices.
- Cooperative Vector Specification - HLSL Spec Proposal
- D3D12 Cooperative Vector Overview - DirectX Developer Blog
Developed by Advanced Micro Devices, Inc. as a demonstration of DirectX 12 Cooperative Vector capabilities for GPU-accelerated machine learning workloads.
