Improved CUDA performance through pipelined reads by rietmann-nv · Pull Request #222 · UoB-HPC/BabelStream

rietmann-nv · 2026-06-11T14:30:52Z

This PR improves the performance of all CUDA stream examples by implementing them with pipelined reads, particularly on Blackwell compute GPUs such as GB200. For large enough arrays, we see well over 7 TB/s, which is much closer to the theoretical bandwidth available. I also increased the default array size to better saturate modern devices.
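For readers unfamiliar with the idea, here is a minimal host-side sketch of one common way to pipeline reads: issue a batch of independent loads first, and only then compute and store. It is written in plain C++ rather than as a CUDA kernel, and it is not the PR's implementation. The name `triad_pipelined` and the batch size `kLoadsInFlight` are hypothetical. On a GPU the batch would usually be strided across threads so that neighbouring threads touch neighbouring addresses.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical batch size: how many independent loads are issued
// before any of them is consumed.
constexpr std::size_t kLoadsInFlight = 4;

// Triad, a[i] = b[i] + scalar * c[i], arranged so that every load of a
// batch is issued before the first store of that batch.
template <class T>
void triad_pipelined(std::vector<T>& a, const std::vector<T>& b,
                     const std::vector<T>& c, T scalar)
{
  assert(a.size() == b.size() && b.size() == c.size());
  const std::size_t n = a.size();
  const std::size_t full = n - n % kLoadsInFlight;  // elements covered by whole batches

  for (std::size_t i = 0; i < full; i += kLoadsInFlight) {
    T rb[kLoadsInFlight], rc[kLoadsInFlight];
    // Stage 1: issue all loads of this batch; none depends on another.
    for (std::size_t k = 0; k < kLoadsInFlight; ++k) {
      rb[k] = b[i + k];
      rc[k] = c[i + k];
    }
    // Stage 2: compute and store from the staged values.
    for (std::size_t k = 0; k < kLoadsInFlight; ++k)
      a[i + k] = rb[k] + scalar * rc[k];
  }
  // Tail: leftover elements that do not fill a whole batch.
  for (std::size_t i = full; i < n; ++i)
    a[i] = b[i] + scalar * c[i];
}
```

Calling `triad_pipelined(a, b, c, scalar)` on three equally sized vectors gives the same result as the straightforward loop; only the order of loads and stores differs.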

In this PR, I also fixed the compilation of the thrust version, which didn't get updated when the Stream interface changed.

I also made a small fix for compilation on CUDA 13.2, where -DDEFAULT cannot be used because of a compatibility issue with CCCL, so I changed the flag to -DBABEL_DEFAULT.
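To illustrate why a bare `-DDEFAULT` can collide with library code, here is a small sketch. `fake_lib::policy` is a hypothetical stand-in, not the actual CCCL identifier involved, and `memory_mode()` is an illustrative helper that mirrors the `#else` selection described in this PR.

```cpp
#include <string>

// Simulates passing -DBABEL_DEFAULT on the compiler command line.
#define BABEL_DEFAULT 1

// Hypothetical library header that uses DEFAULT as an ordinary identifier.
// With -DDEFAULT (equivalent to `#define DEFAULT 1`) the line below would be
// preprocessed to `enum class policy { 1, OTHER };` and fail to compile.
namespace fake_lib {
enum class policy { DEFAULT, OTHER };
}

// Memory mode selection: only MANAGED and PAGEFAULT are ever tested.
// The default mode is simply the #else branch, so the name of the
// default flag is never read by the code.
inline std::string memory_mode()
{
#if defined(MANAGED)
  return "managed";
#elif defined(PAGEFAULT)
  return "pagefault";
#else
  return "default";
#endif
}
```

Because the default flag is never tested, renaming it only changes what the build system passes to the compiler; the selected branch stays the same.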

Very open to feedback, thanks!

"DEFAULT" define messes with a CCCL internal define. Switched to BABEL_DEFAULT. The actual "DEFAULT" isn't used in the code, it just represents the "#else" case.

bernhardmgruber · 2026-06-11T14:48:31Z

+template <class T>
+struct ThrustStream<T>::h_Impl{
+  thrust::host_vector<T> a, b, c;
+};


Suggestion: let's just add the new vectors to Impl:

Suggested change

-};
+template <class T>
+struct ThrustStream<T>::Impl{
+  vector<T> a, b, c;
+#if !(defined(PAGEFAULT) || defined(MANAGED))
+  // we need separate host allocations to hold the data for get_arrays()
+  thrust::host_vector<T> host_a, host_b, host_c;
+#endif
+};

bernhardmgruber · 2026-06-11T14:49:02Z

+    struct h_Impl;
    std::unique_ptr<Impl> impl; // avoid thrust vectors leaking into non-CUDA translation units
+    std::unique_ptr<h_Impl> h_impl; // If UVM is disabled, host arrays for verification purposes


Not needed if the host vectors are moved into Impl.
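A minimal sketch of the single-Impl layout suggested above, assuming nothing about the real class beyond what the diff shows. `StreamSketch` is a hypothetical stand-in for `ThrustStream`, `std::vector` replaces both the device/managed vector and `thrust::host_vector` so the sketch builds without CUDA, and the `get_arrays` signature and the initial values are illustrative only.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

template <class T>
class StreamSketch {
public:
  explicit StreamSketch(std::size_t n);
  ~StreamSketch();
  void get_arrays(std::vector<T>& a, std::vector<T>& b, std::vector<T>& c);

private:
  struct Impl;                 // defined below, keeps the vector types out of the interface
  std::unique_ptr<Impl> impl;  // one pointer only, no separate h_Impl
};

template <class T>
struct StreamSketch<T>::Impl {
  std::vector<T> a, b, c;  // stands in for the device (or managed) arrays
#if !(defined(PAGEFAULT) || defined(MANAGED))
  // host mirrors, present only in the default memory mode
  std::vector<T> host_a, host_b, host_c;
#endif
};

template <class T>
StreamSketch<T>::StreamSketch(std::size_t n) : impl(std::make_unique<Impl>())
{
  impl->a.assign(n, T(1));  // illustrative initial values
  impl->b.assign(n, T(2));
  impl->c.assign(n, T(3));
}

// Defined after Impl is complete so unique_ptr can destroy it.
template <class T>
StreamSketch<T>::~StreamSketch() = default;

template <class T>
void StreamSketch<T>::get_arrays(std::vector<T>& a, std::vector<T>& b, std::vector<T>& c)
{
#if !(defined(PAGEFAULT) || defined(MANAGED))
  // Default mode: stage through the host mirrors first
  // (with real thrust vectors this would presumably be a device-to-host copy).
  impl->host_a = impl->a;
  impl->host_b = impl->b;
  impl->host_c = impl->c;
  a = impl->host_a;
  b = impl->host_b;
  c = impl->host_c;
#else
  // Managed or page-fault mode: the arrays are already host-accessible.
  a = impl->a;
  b = impl->b;
  c = impl->c;
#endif
}
```

Keeping the host mirrors inside the same `Impl` means the class needs one forward declaration and one `unique_ptr`, and the preprocessor condition lives in a single place.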

bernhardmgruber · 2026-06-11T14:49:47Z

+# "DEFAULT" define causes a compile error in newer cuda CCCL, so we change to BABEL_DEFAULT
 register_flag_optional(MEM "Device memory mode:
-        DEFAULT   - allocate host and device memory pointers.
+        BABEL_DEFAULT   - allocate host and device memory pointers.
        MANAGED   - use CUDA Managed Memory.
        PAGEFAULT - shared memory, only host pointers allocated."
-        "DEFAULT")
+        "BABEL_DEFAULT")


Important: That's a bug in CCCL if we are sensitive to that macro. Please file an issue. Maybe @miscco can have a look at that.

bernhardmgruber · 2026-06-12

That's resolved now in upstream CCCL: NVIDIA/cccl#9406

rietmann-nv added 6 commits June 5, 2026 08:35

WIP Commit: Mostly working updated faster stream benchmarks

bfd23ea

Fixed dot product (it was using a smaller threads/block)

1d3ed8f

Update read_arrays to get_arrays for thrust code

de22e7f

Improved thrust impl to handle non-managed pointers

b44a30f

Fix uninitialized object

f0ff6f2

Fix compile issue with newer cuda versions and CCCL

0961041

bernhardmgruber reviewed Jun 11, 2026

rietmann-nv added 2 commits June 11, 2026 16:50

Put host_vectors into same Impl as device/managed vector

4fd37be

Slight tweak to host_vector impl

83b190d

rietmann-nv wants to merge 8 commits into UoB-HPC:develop from rietmann-nv:mr/cuda_pipeline

