cuBLASMp Library API examples

Description

This folder demonstrates cuBLASMp library API usage.

Samples

Supported OSes

  • Linux

Supported CPU Architectures

  • x86_64
  • arm64-sbsa

Supported Compute Capabilities

Documentation

cuBLASMp documentation

Usage

Prerequisites

cuBLASMp is distributed through NVIDIA Developer Zone, PyPI (CUDA 12, CUDA 13), Conda and HPC SDK. cuBLASMp requires the CUDA Toolkit and NCCL to be installed on the system. The samples require a C++11-compatible compiler and MPI (the Build Steps below use the MPI shipped with HPC-X).

Build Steps

git clone https://github.com/NVIDIA/CUDALibrarySamples.git
cd CUDALibrarySamples/cuBLASMp
mkdir build
cd build
export HPCXROOT=<path/to/hpcx>
export CUBLASMP_HOME=<path/to/cublasmp>
export NCCL_HOME=<path/to/nccl>
source ${HPCXROOT}/hpcx-mt-init-ompi.sh
hpcx_load
cmake .. -DCMAKE_BUILD_TYPE=Release -DCMAKE_CUDA_ARCHITECTURES="75;80;90;100;120" -DCUBLASMP_INCLUDE_DIRECTORIES=${CUBLASMP_HOME}/include -DCUBLASMP_LIBRARIES=${CUBLASMP_HOME}/lib/libcublasmp.so -DNCCL_INCLUDE_DIRECTORIES=${NCCL_HOME}/include -DNCCL_LIBRARIES=${NCCL_HOME}/lib/libnccl.so
make -j
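
Before invoking cmake, it can help to confirm that the paths passed to it actually exist. The following is a minimal sketch, not part of the samples: the function name `check_paths` is hypothetical, and it only assumes the directory layout implied by the cmake command above (`include/` and `lib/libcublasmp.so` under `CUBLASMP_HOME`, `include/` and `lib/libnccl.so` under `NCCL_HOME`).

```shell
#!/bin/sh
# Hypothetical pre-flight check: report which of the paths used by the
# cmake command are missing.
# Usage: check_paths <cublasmp_home> <nccl_home>
check_paths() {
    status=0
    for f in "$1/include" "$1/lib/libcublasmp.so" \
             "$2/include" "$2/lib/libnccl.so"; do
        if [ ! -e "$f" ]; then
            echo "missing: $f"
            status=1
        fi
    done
    return $status
}

# Fall back to a placeholder path when a variable is unset, so the check
# reports the problem instead of silently testing "/include".
check_paths "${CUBLASMP_HOME:-/nonexistent}" "${NCCL_HOME:-/nonexistent}" \
    || echo "fix the paths before running cmake"
```

If your installation places the libraries elsewhere (for example under `lib64`), adjust both this check and the corresponding `-DCUBLASMP_LIBRARIES` / `-DNCCL_LIBRARIES` values.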

Running

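Run the examples with mpirun, choosing the number of processes to match the process grid. For the samples that take -p and -q, the invocations below use a process count equal to p × q. For example: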

mpirun -n 2 ./tp_matmul

mpirun -n 2 ./matmul_ag -typeA fp16 -typeB fp16 -typeD fp16 -transA t -transB n

mpirun -n 2 ./matmul_rs -typeA fp16 -typeB fp16 -typeD fp16 -transA t -transB n

mpirun -n 2 ./matmul_ar -typeA fp16 -typeB fp16 -typeD fp16 -transA t -transB n

mpirun -n 2 ./gemm -p 2 -q 1

mpirun -n 2 ./trsm -p 2 -q 1

mpirun -n 2 ./syrk -p 2 -q 1

mpirun -n 2 ./geadd -p 2 -q 1

mpirun -n 2 ./tradd -p 2 -q 1

mpirun -n 2 ./gemr2d -p 2 -q 1
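
The relationship between the grid flags and the process count can be sketched as a small shell helper. This is an illustration only: the function name `grid_cmd` is hypothetical, and it assumes the process count must equal p × q, which is consistent with the `-n 2 ... -p 2 -q 1` invocations above. It prints the command rather than running it, so the result can be inspected first.

```shell
#!/bin/sh
# Hypothetical helper: print the mpirun command for a sample that takes -p/-q.
# Assumes the number of MPI processes equals p * q.
# Usage: grid_cmd <sample> <p> <q>
grid_cmd() {
    sample=$1
    p=$2
    q=$3
    echo "mpirun -n $((p * q)) ./$sample -p $p -q $q"
}

grid_cmd gemm 2 1
# prints: mpirun -n 2 ./gemm -p 2 -q 1
```

To launch the printed command directly, it could be wrapped as `$(grid_cmd gemm 2 1)` or piped to `sh`.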