Replace detail::merge_sort::dispatch by CUB's public API #8473
bernhardmgruber merged 7 commits into NVIDIA:main
  }
};

struct block_size_compare_t
This operator assumes that only one CTA will be launched. Perhaps an assertion should be added in the if branch to make that assumption explicit.
Right. I added an atomicMax to reduce the blockDim in case multiple blocks are launched.
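To illustrate why a max-reduction is preferable to a plain store here, the following is a minimal host-side sketch in standard C++, not the device code from this PR. It emulates what CUDA's `atomicMax` does with a compare-exchange loop on a `std::atomic`. The name `atomic_store_max` is hypothetical and exists only for this illustration.

```cpp
#include <atomic>

// Hypothetical host-side stand-in for CUDA's atomicMax (assumption: illustration only).
// Raises `target` to `value` if `value` is larger; otherwise leaves it unchanged.
inline void atomic_store_max(std::atomic<unsigned int>& target, unsigned int value)
{
  unsigned int current = target.load();
  // On failure, compare_exchange_weak reloads `current` with the latest value,
  // so the loop re-checks whether an update is still needed.
  while (current < value && !target.compare_exchange_weak(current, value))
  {
  }
}
```

With a plain store, the last writer wins and concurrent writers race. With the max-reduction, the result does not depend on the order in which writers arrive, so the recorded block size is well defined even if several blocks execute the operator.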
6447c37 to 4eb7bcc
  unsigned int* ptr;

  __device__ int operator()(int a, int b)
  {
    if (threadIdx.x == 0)
    {
-     *ptr = blockDim.x;
+     // use an atomic operation to write the block dim in case multiple blocks are launched
+     atomicMax(ptr, blockDim.x);
    }
    return a + b;
  }
Remark: This is a drive-by fix for consistency.
  if constexpr (cuda::std::execution::__queryable_with<env_t, get_expected_allocation_size_t>)
  {
    const size_t expected_bytes_allocated = fixed_env.query(get_expected_allocation_size_t{});
    REQUIRE(expected_bytes_allocated == bytes_allocated);
  }
By making the expected_allocation_size check in the launch wrapper optional, we can also use the launch wrappers to test whether, e.g., tunings were applied, at least for TEST_LAUNCH != 1.
@gonidelis I think you ran into the same issue, which is why many tests are guarded by #if TEST_LAUNCH == 0. Just FYI.
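As a rough illustration of the optional check, here is a self-contained C++17 sketch of the general pattern: detect at compile time whether an environment answers a query, and only compare sizes when it does. It does not use the CCCL machinery from the diff. `is_queryable_with`, `env_with_expectation`, `env_without_expectation`, and `allocation_size_ok` are hypothetical names, and the local `get_expected_allocation_size_t` is only a stand-in tag.

```cpp
#include <cstddef>
#include <type_traits>
#include <utility>

// Stand-in query tag (assumption: mirrors the tag name from the diff, nothing more).
struct get_expected_allocation_size_t
{};

// Hypothetical detection trait: true if `env.query(Query{})` is a valid expression.
template <class Env, class Query, class = void>
struct is_queryable_with : std::false_type
{};

template <class Env, class Query>
struct is_queryable_with<Env, Query, std::void_t<decltype(std::declval<const Env&>().query(Query{}))>>
    : std::true_type
{};

// Hypothetical environment that carries an expected allocation size.
struct env_with_expectation
{
  std::size_t expected;
  std::size_t query(get_expected_allocation_size_t) const
  {
    return expected;
  }
};

// Hypothetical environment without that query, e.g. one that only carries a tuning.
struct env_without_expectation
{};

// Returns true when there is no expectation to check, or when the expectation matches.
template <class Env>
bool allocation_size_ok([[maybe_unused]] const Env& env, [[maybe_unused]] std::size_t bytes_allocated)
{
  if constexpr (is_queryable_with<Env, get_expected_allocation_size_t>::value)
  {
    return env.query(get_expected_allocation_size_t{}) == bytes_allocated;
  }
  else
  {
    return true; // the check is skipped for environments without the query
  }
}
```

Because the branch is discarded at compile time, environments without the query never need a `query` member, which is what allows the same wrapper to be reused for tests that only care about other properties.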
323a1d7 to 0aa5860
0aa5860 to 0c99200
🥳 CI Workflow Results
🟩 Finished in 1h 37m: Pass: 100%/323 | Total: 4d 22h | Max: 1h 09m | Hits: 92%/297570
Fixes: #8376
detail::merge::dispatch by CUB's public API #8381
thrust.test.sort on SM120
cub.bench.merge_sort.pairs.base on SM120