Add optimizations to collision kernel for AMD GPUs by gsitaram · Pull Request #14 · UCL-CCS/HemePure-GPU

gsitaram · 2026-03-26T15:52:54Z

Change the block size and use a template parameter instead of _NUMVECTORS in the GPU_CollideStream_mMidFluidCollision_mWallCollision_sBB_WallShearStress kernel to improve the memory bandwidth the kernel achieves. The larger block size reduces TLB misses, and the template parameter makes the number of vectors a compile-time constant, which helps the compiler avoid spilling local arrays to slower scratch memory.
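For reviewers less familiar with the template-parameter technique, here is a minimal sketch of the idea. It is not the code in this PR. The names relax_site, relax_site_runtime, relax_kernel and MAX_VECTORS are hypothetical, the collision is a toy relaxation towards a uniform equilibrium, and the one-site-per-thread array-of-structures layout is an assumption made only to keep the example short. The point is the contrast: when the direction count is a runtime argument, the compiler can't fully unroll the loops and may keep the local array in scratch memory. When it is a template parameter, the array size and loop bounds are compile-time constants.

```cpp
// Sketch only: all names here are hypothetical and not taken from HemePure-GPU.
#include <cstddef>

#if defined(__CUDACC__) || defined(__HIPCC__)
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE
#endif

// Worst-case size needed by the runtime variant (19 directions, as in D3Q19).
constexpr int MAX_VECTORS = 19;

// Runtime variant: numVectors is an ordinary argument, so the loop bounds are
// unknown at compile time and f[] is indexed dynamically.
HOST_DEVICE inline double relax_site_runtime(const double* f_in, double* f_out,
                                             int numVectors, double omega) {
    double f[MAX_VECTORS];
    double density = 0.0;
    for (int i = 0; i < numVectors; ++i) {
        f[i] = f_in[i];
        density += f[i];
    }
    const double feq = density / numVectors;  // toy uniform equilibrium
    for (int i = 0; i < numVectors; ++i) {
        f_out[i] = f[i] + omega * (feq - f[i]);
    }
    return density;
}

// Template variant: NUMVECTORS is a compile-time constant, so the compiler can
// fully unroll both loops and keep f[] in registers where register pressure allows.
template <int NUMVECTORS>
HOST_DEVICE inline double relax_site(const double* f_in, double* f_out, double omega) {
    static_assert(NUMVECTORS > 0, "need at least one direction");
    double f[NUMVECTORS];
    double density = 0.0;
    for (int i = 0; i < NUMVECTORS; ++i) {
        f[i] = f_in[i];
        density += f[i];
    }
    const double feq = density / NUMVECTORS;  // toy uniform equilibrium
    for (int i = 0; i < NUMVECTORS; ++i) {
        f_out[i] = f[i] + omega * (feq - f[i]);
    }
    return density;
}

#if defined(__CUDACC__) || defined(__HIPCC__)
// Device entry point, compiled only under nvcc/hipcc. One thread per lattice site.
// The launch block size is a tuning parameter chosen by the caller.
template <int NUMVECTORS>
__global__ void relax_kernel(const double* f_in, double* f_out,
                             std::size_t n_sites, double omega) {
    const std::size_t site =
        blockIdx.x * static_cast<std::size_t>(blockDim.x) + threadIdx.x;
    if (site < n_sites) {
        relax_site<NUMVECTORS>(f_in + site * NUMVECTORS,
                               f_out + site * NUMVECTORS, omega);
    }
}
#endif
```

Under these assumptions, a caller would instantiate the kernel once per supported lattice (for example relax_kernel<19>) and pick the instantiation on the host. Whether the local array actually stays out of scratch memory depends on the compiler and on register pressure, so it is worth checking the reported scratch usage for the real kernel.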

Please test and accept these changes to significantly improve HemeLB performance on AMD GPUs.

Commit 821e6a6: Change block size and use template parameter instead of _NUMVECTORS in GPU_CollideStream_mMidFluidCollision_mWallCollision_sBB_WallShearStress

gsitaram wants to merge 1 commit into UCL-CCS:HIP-CUDA-ROCM_SL from gsitaram:amd_optimizations
