Replace `detail::merge_sort::dispatch` by CUB's public API by bernhardmgruber · Pull Request #8473 · NVIDIA/cccl

bernhardmgruber · 2026-04-15T22:16:11Z

Merge before: Replace detail::merge::dispatch by CUB's public API #8381
No SASS changes in thrust.test.sort on SM120
No SASS changes in cub.bench.merge_sort.pairs.base on SM120

oleksandr-pavlyk · 2026-04-15T22:58:58Z

+  }
+};
+
+struct block_size_compare_t


This operator assumes that only one CTA is going to be launched. Perhaps an assertion should be added in the if branch to make that assumption explicit.

Right. I added an atomicMax to reduce blockDim.x in case multiple blocks are launched.

bernhardmgruber · 2026-04-16T10:13:07Z

+  unsigned int* ptr;

  __device__ int operator()(int a, int b)
  {
    if (threadIdx.x == 0)
    {
-      *ptr = blockDim.x;
+      // use an atomic operation to write the block dim in case multiple blocks are launched
+      atomicMax(ptr, blockDim.x);
    }
    return a + b;
  }


Remark: This is a drive-by fix for consistency.
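For readers less familiar with the pattern, here is a minimal host-side sketch of why a max-reduction is safer than a plain store when more than one block may run the operator. It is not the test code from this PR: `atomic_max` is a stand-in for CUDA's `atomicMax` built from a compare-exchange loop on `std::atomic`, and `record_block_dim` is a hypothetical helper in which each call plays the role of thread 0 of one block.

```cpp
#include <atomic>
#include <cassert>
#include <vector>

// Host-side stand-in for CUDA's atomicMax(unsigned int*, unsigned int):
// raises `target` to `value` if `value` is larger and returns the old value.
inline unsigned int atomic_max(std::atomic<unsigned int>& target, unsigned int value)
{
  unsigned int observed = target.load();
  // On failure, compare_exchange_weak reloads `observed`, so each retry
  // compares against the most recent value stored by any other writer.
  while (observed < value && !target.compare_exchange_weak(observed, value))
  {
  }
  return observed;
}

// Hypothetical helper: every entry of `block_dims` represents the write that
// thread 0 of one block would perform. With a plain store the last writer
// would win; with a max-reduction the result is independent of the order.
inline unsigned int record_block_dim(const std::vector<unsigned int>& block_dims)
{
  assert(!block_dims.empty());
  std::atomic<unsigned int> recorded{0};
  for (unsigned int dim : block_dims)
  {
    atomic_max(recorded, dim);
  }
  return recorded.load();
}
```

The helper runs the writes sequentially to stay dependency-free, but `atomic_max` itself may be called from concurrent threads, which is the property the device-side `atomicMax` provides across blocks.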

bernhardmgruber · 2026-04-16T10:36:08Z

+  if constexpr (cuda::std::execution::__queryable_with<env_t, get_expected_allocation_size_t>)
+  {
+    const size_t expected_bytes_allocated = fixed_env.query(get_expected_allocation_size_t{});
+    REQUIRE(expected_bytes_allocated == bytes_allocated);
+  }


Making the expected_allocation_size check in the launch wrapper optional lets us also use the launch wrappers to test whether, e.g., tunings were applied, at least for TEST_LAUNCH != 1.

@gonidelis I think you ran into the same issue, which is why many tests are guarded by #if TEST_LAUNCH == 0. Just FYI.
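The optional check can be sketched in plain C++17 as follows. This is an illustration under assumptions, not the launch wrapper from the PR: `is_queryable_with` is a hand-written detection trait standing in for `cuda::std::execution::__queryable_with`, and the two environment types and `allocation_matches` are hypothetical names.

```cpp
#include <cstddef>
#include <type_traits>
#include <utility>

// Query tag, named after the one in the diff (redeclared here for the sketch).
struct get_expected_allocation_size_t
{};

// Detection idiom: true when `Env` has a callable member `query(Query{})`.
template <class Env, class Query, class = void>
struct is_queryable_with : std::false_type
{};

template <class Env, class Query>
struct is_queryable_with<Env, Query, std::void_t<decltype(std::declval<const Env&>().query(Query{}))>>
    : std::true_type
{};

// Hypothetical environment that carries an expected allocation size.
struct env_with_expectation
{
  std::size_t expected;
  std::size_t query(get_expected_allocation_size_t) const
  {
    return expected;
  }
};

// Hypothetical environment without that query, e.g. one that only carries tunings.
struct env_without_expectation
{};

// Returns true when the check passes or when the environment does not ask for it.
template <class Env>
bool allocation_matches([[maybe_unused]] const Env& env, [[maybe_unused]] std::size_t bytes_allocated)
{
  if constexpr (is_queryable_with<Env, get_expected_allocation_size_t>::value)
  {
    return env.query(get_expected_allocation_size_t{}) == bytes_allocated;
  }
  else
  {
    return true; // no expectation supplied, so the check is skipped
  }
}
```

Because the branch is selected at compile time, an environment without the query never instantiates the call to `query`, so the same wrapper accepts both kinds of environment.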

Fixes: NVIDIA#8376

github-actions · 2026-04-20T12:49:15Z

🥳 CI Workflow Results

🟩 Finished in 1h 37m: Pass: 100%/323 | Total: 4d 22h | Max: 1h 09m | Hits: 92%/297570

bernhardmgruber requested review from a team as code owners April 15, 2026 22:16

bernhardmgruber requested a review from oleksandr-pavlyk April 15, 2026 22:16

github-project-automation Bot added this to CCCL Apr 15, 2026

bernhardmgruber requested review from elstehle and pauleonix April 15, 2026 22:16

github-project-automation Bot moved this to Todo in CCCL Apr 15, 2026

cccl-authenticator-app Bot moved this from Todo to In Review in CCCL Apr 15, 2026

oleksandr-pavlyk approved these changes Apr 15, 2026

bernhardmgruber force-pushed the use_merge_sort_public branch 3 times, most recently from 6447c37 to 4eb7bcc Compare April 16, 2026 10:08

bernhardmgruber commented Apr 16, 2026

bernhardmgruber mentioned this pull request Apr 16, 2026

Replace detail::adjacent_difference::dispatch by CUB's public API #8492

Merged

1 task

bernhardmgruber mentioned this pull request Apr 16, 2026

Replace detail::scan::dispatch by CUB's public API #8495

Merged

3 tasks

bernhardmgruber force-pushed the use_merge_sort_public branch from 323a1d7 to 0aa5860 Compare April 16, 2026 23:15

bernhardmgruber added 7 commits April 20, 2026 13:09

ac89a45 Drive-by

0bb8474 Replace detail::merge_sort::dispatch by CUB's public API (Fixes: NVIDIA#8376)

babda00 Refactor

a43def0 FIx

7baf2ff Cleanup and more launch types

7336b67 FIx SASS

0c99200 Fix warning on MSVC

bernhardmgruber force-pushed the use_merge_sort_public branch from 0aa5860 to 0c99200 Compare April 20, 2026 11:09

miscco approved these changes Apr 20, 2026

bernhardmgruber enabled auto-merge (squash) April 20, 2026 11:38

bernhardmgruber merged commit 68c6260 into NVIDIA:main Apr 20, 2026
343 checks passed

github-project-automation Bot moved this from In Review to Done in CCCL Apr 20, 2026

bernhardmgruber deleted the use_merge_sort_public branch April 20, 2026 13:16
