CMakeLists.txt
README.md
geadd.cu
gemm.cu
gemr2d.cu
helpers.h
matmul_ag.cu
matmul_ar.cu
matmul_rs.cu
matrix_generator.hxx
syrk.cu
tp_matmul.cu
tradd.cu
trsm.cu

cuBLASMp Library API examples

Description

This folder demonstrates cuBLASMp library API usage.

Samples

Supported OSes

Linux

Supported CPU Architectures

x86_64
arm64-sbsa

Supported Compute Capabilities

Documentation

cuBLASMp documentation

Usage

Prerequisites

cuBLASMp is distributed through NVIDIA Developer Zone, PyPI (CUDA 12, CUDA 13), Conda and HPC SDK. cuBLASMp requires the CUDA Toolkit and NCCL to be installed on the system. The samples require a C++11-compatible compiler and MPI (the Build Steps use the MPI from HPC-X).

Build Steps

git clone https://github.com/NVIDIA/CUDALibrarySamples.git
cd CUDALibrarySamples/cuBLASMp
mkdir build
cd build
export HPCXROOT=<path/to/hpcx>
export CUBLASMP_HOME=<path/to/cublasmp>
export NCCL_HOME=<path/to/nccl>
source ${HPCXROOT}/hpcx-mt-init-ompi.sh
hpcx_load
cmake .. \
  -DCMAKE_BUILD_TYPE=Release \
  -DCMAKE_CUDA_ARCHITECTURES="75;80;90;100;120" \
  -DCUBLASMP_INCLUDE_DIRECTORIES=${CUBLASMP_HOME}/include \
  -DCUBLASMP_LIBRARIES=${CUBLASMP_HOME}/lib/libcublasmp.so \
  -DNCCL_INCLUDE_DIRECTORIES=${NCCL_HOME}/include \
  -DNCCL_LIBRARIES=${NCCL_HOME}/lib/libnccl.so
make -j
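
If the configure step fails to find cuBLASMp or NCCL, it can help to confirm that the paths passed to cmake exist. The following is a minimal sketch, not part of the samples: `missing_paths` is a hypothetical helper, and the layout under `CUBLASMP_HOME` and `NCCL_HOME` is assumed to match the cmake command above.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of the samples): print every
# argument that does not exist on disk, one "missing: <path>" line each.
missing_paths() {
    local path
    for path in "$@"; do
        [ -e "$path" ] || echo "missing: $path"
    done
}

# Paths assumed from the cmake command in the Build Steps.
# An unset variable expands to an empty prefix and is reported as missing
# unless that path happens to exist at the filesystem root.
missing_paths \
    "${CUBLASMP_HOME:-}/include" \
    "${CUBLASMP_HOME:-}/lib/libcublasmp.so" \
    "${NCCL_HOME:-}/include" \
    "${NCCL_HOME:-}/lib/libnccl.so"
```

No output means all four paths were found; some installations place libraries under `lib64` or another directory, in which case the cmake arguments need the same adjustment.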

Running

Run the examples with mpirun, choosing the number of processes to match the process grid values (for the samples that take -p and -q, a grid of -p 2 -q 1 uses 2 processes), e.g.

mpirun -n 2 ./tp_matmul

mpirun -n 2 ./matmul_ag -typeA fp16 -typeB fp16 -typeD fp16 -transA t -transB n

mpirun -n 2 ./matmul_rs -typeA fp16 -typeB fp16 -typeD fp16 -transA t -transB n

mpirun -n 2 ./matmul_ar -typeA fp16 -typeB fp16 -typeD fp16 -transA t -transB n

mpirun -n 2 ./gemm -p 2 -q 1

mpirun -n 2 ./trsm -p 2 -q 1

mpirun -n 2 ./syrk -p 2 -q 1

mpirun -n 2 ./geadd -p 2 -q 1

mpirun -n 2 ./tradd -p 2 -q 1

mpirun -n 2 ./gemr2d -p 2 -q 1
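
For the samples that take `-p` and `-q`, the launch line can be derived from the grid dimensions. The sketch below assumes the process count must equal p × q, which matches the examples above but is not stated explicitly in this README. `grid_cmd` is a hypothetical helper that only prints the command, so it can be inspected before running.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the samples): print the mpirun command
# for a grid-based sample, assuming process count = p * q.
grid_cmd() {
    local sample="$1" p="$2" q="$3"
    echo "mpirun -n $((p * q)) ./${sample} -p ${p} -q ${q}"
}

grid_cmd gemm 2 1    # prints: mpirun -n 2 ./gemm -p 2 -q 1
grid_cmd trsm 2 2    # prints: mpirun -n 4 ./trsm -p 2 -q 2
```

To launch instead of print, the output can be passed to the shell, for example `$(grid_cmd gemm 2 1)` from the build directory.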
