Cleanup CUDA implementation a bit #199

Closed
gonzalobg wants to merge 6 commits into UoB-HPC:develop from gonzalobg:cuda_cleanup

Conversation

@gonzalobg

Contributor
  • Refactor all kernels into a generic "parallel for" algorithm
    • Supports grid-stride and block-stride loops, configurable with model flag
    • Handles devices of different sizes via occupancy APIs
  • Refactor memory allocation APIs
  • Print more GPU details, in particular the theoretical peak bandwidth of the current device in GB/s, using the NVML library (which is part of the CUDA Toolkit and always available)
  • Fix 2 bugs:
    • Print the "order" used to run the benchmarks (e.g. classic vs isolated)
    • Fix a division-by-zero bug in the solution checking
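
For readers unfamiliar with the two loop shapes named in the first bullet, here is a minimal host-side C++ sketch of how a generic "parallel for" might map indices to threads. It is not the code from this PR: `Launch`, `grid_stride_for`, and `block_stride_for` are hypothetical names, the two outer loops merely stand in for CUDA's `blockIdx.x` and `threadIdx.x`, and "block-stride" is taken here to mean one contiguous chunk per block, which is only one plausible reading.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical launch shape; on a GPU these would be gridDim.x and blockDim.x.
struct Launch {
  std::size_t blocks;
  std::size_t threads;
};

// Grid-stride: each thread starts at its global index and advances by the
// total number of threads in the grid, so any grid size covers any n.
template <typename F>
void grid_stride_for(Launch l, std::size_t n, F f) {
  assert(l.blocks > 0 && l.threads > 0);
  const std::size_t stride = l.blocks * l.threads;
  for (std::size_t b = 0; b < l.blocks; ++b)        // stands in for blockIdx.x
    for (std::size_t t = 0; t < l.threads; ++t)     // stands in for threadIdx.x
      for (std::size_t i = b * l.threads + t; i < n; i += stride)
        f(i);
}

// Block-stride (assumed meaning): each block owns one contiguous chunk of the
// range, and its threads advance through that chunk by the block size.
template <typename F>
void block_stride_for(Launch l, std::size_t n, F f) {
  assert(l.blocks > 0 && l.threads > 0);
  const std::size_t chunk = (n + l.blocks - 1) / l.blocks;  // ceil(n / blocks)
  for (std::size_t b = 0; b < l.blocks; ++b) {
    const std::size_t begin = b * chunk;
    const std::size_t end = (begin + chunk < n) ? begin + chunk : n;
    for (std::size_t t = 0; t < l.threads; ++t)
      for (std::size_t i = begin + t; i < end; i += l.threads)
        f(i);
  }
}
```

With `Launch{3, 2}` and `n = 10`, both schedules call `f` exactly once per index, but in a different order: thread 0 of block 0 handles indices 0 and 6 under the grid-stride schedule, and indices 0 and 2 under the block-stride schedule. In a real CUDA version the launch shape would presumably come from the occupancy APIs mentioned above rather than being hard-coded.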

@gonzalobg

Contributor Author

This was passing before. It seems this PR and others are failing spuriously due to a cache issue. @tom91136 @tomdeakin

@gonzalobg

Contributor Author

Closing in favor of #202.

@gonzalobg closed this on Jun 5, 2024