Add optimizations to collision kernel for AMD GPUs #14

Open: gsitaram wants to merge 1 commit into UCL-CCS:HIP-CUDA-ROCM_SL from gsitaram:amd_optimizations

@gsitaram

Change the block size and use a template parameter instead of _NUMVECTORS in the GPU_CollideStream_mMidFluidCollision_mWallCollision_sBB_WallShearStress kernel to improve the memory bandwidth the kernel achieves. The larger block size helps reduce TLB misses, and the template parameter makes the number of vectors a compile-time constant, which helps the compiler avoid spilling local arrays to slower scratch memory.

Please test and accept these changes to significantly improve HemeLB performance on AMD GPUs.
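
The template-parameter change described above can be sketched in plain host-side C++. This is only an illustration of the pattern, not the HemeLB kernel: `site_density`, `site_density_dispatch`, and the supported vector counts (15, 19, 27) are hypothetical names and values chosen for the example. The point it shows is that once the vector count is a template parameter, the per-site local array has a size known at compile time, which is the property the description says helps the compiler keep such arrays out of scratch memory on the GPU.

```cpp
#include <array>
#include <cstddef>
#include <stdexcept>

// Hypothetical stand-in for per-site work in a collision kernel.
// NUM_VECTORS is a template parameter, so the size of f_local and the
// trip count of both loops are compile-time constants.
template <std::size_t NUM_VECTORS>
double site_density(const double* f_site) {
    std::array<double, NUM_VECTORS> f_local{};

    // Copy the site's distribution values into the fixed-size local array.
    for (std::size_t i = 0; i < NUM_VECTORS; ++i) {
        f_local[i] = f_site[i];
    }

    // Sum the local values to obtain the density at this site.
    double rho = 0.0;
    for (std::size_t i = 0; i < NUM_VECTORS; ++i) {
        rho += f_local[i];
    }
    return rho;
}

// The vector count is usually only known at run time, so a small
// dispatcher selects the matching instantiation. The listed counts
// are assumptions made for this sketch.
double site_density_dispatch(std::size_t num_vectors, const double* f_site) {
    switch (num_vectors) {
        case 15: return site_density<15>(f_site);
        case 19: return site_density<19>(f_site);
        case 27: return site_density<27>(f_site);
        default: throw std::invalid_argument("unsupported number of vectors");
    }
}
```

In a GPU build the same idea would apply to the kernel itself, with the host code choosing the instantiation at launch time. Whether that removes scratch usage in practice depends on the compiler and should be confirmed from its resource usage report.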

…n GPU_CollideStream_mMidFluidCollision_mWallCollision_sBB_WallShearStress