MiniDxNn - GPU‑accelerated MLP inference on DirectX 12 with Cooperative Vector

CMake build on Windows

An implementation of MLP (Multi-Layer Perceptron) inference using DirectX 12 Cooperative Vector. This library demonstrates GPU-accelerated neural network inference with cutting-edge shader features.

Key Features:

  • 🚀 High Performance: GPU-accelerated inference using DirectX 12 Cooperative Vector
  • 🎯 Flexible Architecture: Configurable layer dimensions, activation functions, and data types
  • 🔧 Single-header HLSL: Easy integration into existing DirectX 12 projects

For an overview of Cooperative Vector, please read D3D12 Cooperative Vector.

Quick Start

# Clone with submodules
git clone --recursive https://github.com/GPUOpen-LibrariesAndSDKs/MiniDxNn.git
cd MiniDxNn

# Build library and examples (examples are ON by default)
cmake -B build
cmake --build build --config Release

# Build with tests enabled
cmake -B build -DMINIDXNN_BUILD_TESTS=ON
cmake --build build --config Release

# Run unit tests (from build/unittest as CWD)
cd build/unittest
ctest -C Release

Requirements

  • Operating System: Windows 10/11 with Developer Mode enabled
  • GPU: Must support Shader Model 6.9 and D3D12 Cooperative Vector (e.g., AMD Radeon™ RX 9000 Series or equivalent NVIDIA GPUs)
  • Build Tools:
    • CMake 3.21.0 or higher
    • Visual Studio 2022 (or compatible C++ compiler with C++20 support)
    • Windows SDK with DirectX 12
  • Python (for training): Python 3.8+ with PyTorch (optional, for example training)

Installation

1. Enable Experimental Features

⚠️ Important: As of early 2026, Cooperative Vector requires experimental feature support:

System setup

  1. Install a driver with Cooperative Vector support
  2. Enable Windows Developer Mode

2. Clone the Repository

git clone --recursive https://github.com/GPUOpen-LibrariesAndSDKs/MiniDxNn.git
cd MiniDxNn

⚠️ Important: Use --recursive flag to include submodules (gfx, GoogleTest, CLI11).

Build and Run

Basic Build (Library + Examples)

By default, the core library and examples are built (MINIDXNN_BUILD_EXAMPLES=ON). Tests are off by default.

# Configure and build (library + examples)
cmake -B build
cmake --build build --config Release

# Build with tests enabled
cmake -B build -DMINIDXNN_BUILD_TESTS=ON
cmake --build build --config Release

# Run tests (from build/unittest as CWD)
cd build/unittest
ctest -C Release

The example executables will be built in the build/example/Release/ directory (or build/example/Debug/ for Debug builds). Run the examples with build/example as the working directory; see example/README.md for details on running them. Unit tests are likewise run with build/unittest as the working directory, as shown above.

Usage

Please follow the installation instructions to enable experimental feature support on your system.

DX12 setup

  1. Use the DirectX 12 Agility SDK version 1.717.1-preview and DirectX Shader Compiler v1.8.2505.1 for the DX12 runtime
  2. Enable Experimental Shader Model using D3D12EnableExperimentalFeatures before creating the D3D12 device
  3. Compile shaders with Shader Model 6.9
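
Step 2 above can be sketched in host code as follows. This is a minimal Windows-only illustration (not code from this repository) of enabling experimental shader models before device creation; error handling is reduced to early returns:

```
// Windows-only sketch: enable experimental shader models (required for
// Shader Model 6.9 / Cooperative Vector as of early 2026) BEFORE creating
// the D3D12 device. Assumes the Agility SDK headers are on the include path.
#include <d3d12.h>
#include <wrl/client.h>

Microsoft::WRL::ComPtr<ID3D12Device> CreateDeviceWithExperimentalSM()
{
    // D3D12ExperimentalShaderModels is a GUID declared in d3d12.h.
    UUID features[] = { D3D12ExperimentalShaderModels };
    // This call fails unless Windows Developer Mode is enabled.
    if (FAILED(D3D12EnableExperimentalFeatures(
            _countof(features), features, nullptr, nullptr)))
        return nullptr;

    Microsoft::WRL::ComPtr<ID3D12Device> device;
    if (FAILED(D3D12CreateDevice(nullptr, D3D_FEATURE_LEVEL_12_0,
                                 IID_PPV_ARGS(&device))))
        return nullptr;
    return device;
}
```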

HLSL Integration

The core MLP inference functionality is provided as a header-only HLSL library. Include it in your compute shaders:

#include <hlsl/mlp.hlsl>

// Define your network architecture
static const uint NUM_HIDDEN_LAYERS = 2;
static const int INPUT_DIM = 2;
static const int HIDDEN_DIM = 64;
static const int OUTPUT_DIM = 3;

// Create layer data reference using convenience alias (ByteAddressBuffer, with bias)
using MlpLayerData = mininn::InferenceLayerDataRef<
    NUM_HIDDEN_LAYERS,
    HIDDEN_DIM,
    half,  // staging element type
    dx::linalg::DATA_TYPE_FLOAT16,
    dx::linalg::MATRIX_LAYOUT_ROW_MAJOR,
    dx::linalg::DATA_TYPE_FLOAT16,  // bias type
    dx::linalg::DATA_TYPE_FLOAT16,  // accumulator type
    mininn::ReluActivation,          // hidden layer activation
    mininn::SigmoidActivation        // output layer activation
>;

ByteAddressBuffer g_weights : register(t0);
ByteAddressBuffer g_biases  : register(t1);

[numthreads(8, 8, 1)]
void main(uint3 threadId : SV_DispatchThreadID)
{
    MlpLayerData layerData;
    layerData.setWeightData(g_weights);
    layerData.setBiasData(g_biases);

    vector<float, INPUT_DIM> input = ...;  // your input
    vector<float, OUTPUT_DIM> output;

    mininn::forward(output, input, layerData);

    // Use output...
}

See the MLP HLSL API Documentation for complete details.

Project Structure

MiniDxNn/
├── include/hlsl/              # HLSL shader library (header-only)
│   └── mlp.hlsl               # MLP forward pass with Cooperative Vector
├── docs/                      # Documentation
│   └── mlp_hlsl.md            # HLSL API documentation
├── example/                   # Example applications
│   ├── common/                # Shared C++ utilities
│   ├── kernel/                # HLSL compute shaders for examples
│   ├── 01_texture_inference/  # Texture synthesis example
│   └── README.md              # Example documentation
├── scripts/                   # Training and utilities
│   └── pyreference/           # Python reference implementations
│       └── texture_reconstruction_mlp.py  # MLP training script
├── unittest/                  # Unit tests
│   ├── kernel/                # HLSL compute shaders for tests
│   ├── test.cpp/hpp           # Test infrastructure
│   └── unittest.cpp           # GoogleTest main
├── third_party/               # External dependencies
│   ├── gfx/                   # DirectX 12 framework (submodule)
│   ├── googletest/            # Unit testing framework (submodule)
│   ├── CLI11/                 # Command line parser (submodule)
│   └── half-2.2.1.zip         # Half-precision float library
├── cmake/                     # CMake build scripts
├── tools/
├── CMakeLists.txt             # Root build configuration
├── LICENSE                    # MIT License
└── NOTICE.md                  # Third-party notices

Features

Supported Architectures

  • Multi-Layer Perceptron (MLP): 0 to N hidden layers with configurable dimensions
  • Flexible Layer Sizes: Independent input, hidden, and output dimensions
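
As a point of reference, the forward pass this library runs on the GPU can be sketched in plain Python. This is a hypothetical CPU reference, not the repository's actual code; all names are illustrative:

```python
# Pure-Python sketch of an MLP forward pass: input -> N hidden layers -> output.
# The real implementation runs in HLSL via Cooperative Vector.

def relu(x):
    return [max(0.0, v) for v in x]

def linear(weights, bias, x):
    """weights: output_dim rows x input_dim cols, row-major."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def mlp_forward(layers, x, hidden_act=relu, output_act=lambda v: v):
    """layers: list of (weights, bias) tuples, first layer to last."""
    for weights, bias in layers[:-1]:
        x = hidden_act(linear(weights, bias, x))
    weights, bias = layers[-1]
    return output_act(linear(weights, bias, x))

# A tiny 2 -> 2 -> 1 network with identity hidden weights, for illustration.
layers = [
    ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]),  # hidden layer
    ([[1.0, 1.0]], [0.5]),                   # output layer
]
print(mlp_forward(layers, [2.0, 3.0]))  # -> [5.5]
```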

Activation Functions

  • Identity: Linear pass-through (f(x) = x)
  • Sigmoid: Numerically stable sigmoid (f(x) = 1/(1+e^(-x)))
  • ReLU: Rectified Linear Unit (f(x) = max(0, x))
  • Leaky ReLU: ReLU with a configurable negative slope (e.g., f(x) = max(0.01x, x) for a slope of 0.01)
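
A numerically stable sigmoid, as mentioned above, avoids overflow by branching on the sign of the input so that exp() only ever sees a non-positive argument. A small Python sketch of that technique (illustrative, not the HLSL source):

```python
import math

def sigmoid_stable(x):
    # For x >= 0, exp(-x) <= 1, so 1/(1+exp(-x)) cannot overflow.
    # For x < 0, rewrite as exp(x)/(1+exp(x)); exp() sees a non-positive argument.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def leaky_relu(x, slope=0.01):
    # max(slope*x, x) for the conventional slope < 1.
    return x if x > 0 else slope * x

print(sigmoid_stable(0.0))      # -> 0.5
print(sigmoid_stable(-1000.0))  # underflows gracefully instead of raising
print(leaky_relu(-2.0))         # -> -0.02
```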

Data Types

  • float16 (half): Currently the only precision supported for MLP computation
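
Because the MLP computes in half precision, weights trained in float32 must be converted to float16 before upload. Python's struct module handles the conversion via the "e" (half-float) format; this is a generic sketch, and the repository's actual export path may differ:

```python
import struct

def pack_f16(values):
    """Pack an iterable of Python floats into little-endian float16 bytes."""
    return b"".join(struct.pack("<e", v) for v in values)

def unpack_f16(data):
    """Inverse of pack_f16: bytes -> list of floats (2 bytes per element)."""
    count = len(data) // 2
    return list(struct.unpack(f"<{count}e", data))

blob = pack_f16([1.0, -0.5, 0.25])
print(len(blob))         # 2 bytes per element -> 6
print(unpack_f16(blob))  # these values are exactly representable in fp16
```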

Matrix Layouts

  • Row-Major: Rows contiguous in memory
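
Row-major means the element at (row, col) of an R x C matrix sits at linear index row * C + col in the flattened buffer. A quick sketch of flattening and indexing (illustrative only):

```python
def flatten_row_major(matrix):
    """Flatten a list-of-rows matrix so each row is contiguous."""
    return [v for row in matrix for v in row]

def index_row_major(row, col, num_cols):
    """Linear offset of (row, col) in a row-major layout."""
    return row * num_cols + col

m = [[1, 2, 3],
     [4, 5, 6]]
flat = flatten_row_major(m)
assert flat[index_row_major(1, 2, num_cols=3)] == m[1][2]  # -> 6
print(flat)  # [1, 2, 3, 4, 5, 6]
```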

Examples

See the example directory for complete working examples:

  1. 01_texture_inference: Texture synthesis using MLP
    • Train a network with PyTorch, export to binary format
    • GPU inference with Cooperative Vector
    • Image generation and export

For detailed usage, see example/README.md.

Performance

The library is designed for high-performance inference:

  • GPU Acceleration: Parallel execution across all threads
  • Cooperative Vector: Leverages DirectX 12's advanced shader features for efficient SIMD operations
  • Memory Efficiency: Optimized buffer layouts with configurable alignment

Benchmark results vary by GPU architecture, network size, and data types. See the examples for practical performance characteristics.

Documentation

  • docs/mlp_hlsl.md: MLP HLSL API documentation
  • example/README.md: Example documentation

License

This project is licensed under the MIT License - see the LICENSE file for details.

Copyright (c) 2026 Advanced Micro Devices, Inc. All rights reserved.

Third-Party Notices

This project uses the following third-party libraries: gfx, GoogleTest, and CLI11 (as submodules), plus the half half-precision float library. See NOTICE.md for complete third-party notices.

Acknowledgments

Developed by Advanced Micro Devices, Inc. as a demonstration of DirectX 12 Cooperative Vector capabilities for GPU-accelerated machine learning workloads.