Revert PR #580 streaming workaround (CCCL #1422 resolved) #810

Open
PointKernel wants to merge 1 commit into NVIDIA:dev from PointKernel:revert-pr-580-streaming

Conversation

@PointKernel
Member

This PR reverts the streaming workaround introduced in #580, since CUB now supports large size types.

PointKernel requested a review from sleeepyjack as a code owner May 1, 2026 16:57
PointKernel force-pushed the revert-pr-580-streaming branch from a4b8172 to 7e00710 May 1, 2026 17:00
NVIDIA deleted a comment from copy-pr-bot (bot) May 1, 2026
PointKernel added the "type: improvement" (Improvement / enhancement to an existing function) label May 1, 2026
Comment on lines +844 to +862
    CUCO_CUDA_TRY(cub::DeviceSelect::If(nullptr,
                                        temp_storage_bytes,
                                        begin,
                                        output_begin,
                                        d_num_out,
                                        this->capacity(),
                                        is_filled,
                                        stream.get()));

    auto d_temp_storage = temp_allocator.allocate(temp_storage_bytes, stream);

    CUCO_CUDA_TRY(cub::DeviceSelect::If(d_temp_storage,
                                        temp_storage_bytes,
                                        begin,
                                        output_begin,
                                        d_num_out,
                                        this->capacity(),
                                        is_filled,
                                        stream.get()));
Collaborator

nit: Instead of the classical two-phase approach, we could use CUB's new single-phase API that takes an allocator: https://nvidia.github.io/cccl/unstable/cub/api_docs/device_wide.html#environment-api-single-phase
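
For readers unfamiliar with the "classical two-phase approach" mentioned above, here is a minimal standalone sketch of it, not the code from this PR. The first call to `cub::DeviceSelect::If` passes a null temporary-storage pointer, so CUB only reports how many bytes it needs; the second call does the actual selection. The `is_even` predicate, the `check` helper, and the `select_even` function are hypothetical names introduced for illustration (the PR uses `is_filled`, `CUCO_CUDA_TRY`, and a cuco allocator instead), and plain `cudaMalloc` stands in for the allocator.

```cuda
#include <cub/device/device_select.cuh>
#include <cuda_runtime.h>

#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical predicate standing in for the PR's `is_filled`.
struct is_even {
  __host__ __device__ bool operator()(int x) const { return x % 2 == 0; }
};

// Hypothetical minimal error check; the PR itself uses CUCO_CUDA_TRY.
inline void check(cudaError_t err)
{
  if (err != cudaSuccess) { throw std::runtime_error(cudaGetErrorString(err)); }
}

// Hypothetical helper: returns the even elements of `input`, in order.
// Device buffers are not released if `check` throws; acceptable for a sketch only.
std::vector<int> select_even(std::vector<int> const& input, cudaStream_t stream = 0)
{
  if (input.empty()) { return {}; }
  // `int` keeps the sketch compatible with older CUB releases; per the PR
  // description, recent CUB also supports larger size types here.
  int const n = static_cast<int>(input.size());

  int* d_in      = nullptr;
  int* d_out     = nullptr;
  int* d_num_out = nullptr;
  check(cudaMalloc(&d_in, n * sizeof(int)));
  check(cudaMalloc(&d_out, n * sizeof(int)));
  check(cudaMalloc(&d_num_out, sizeof(int)));
  check(cudaMemcpyAsync(d_in, input.data(), n * sizeof(int), cudaMemcpyHostToDevice, stream));

  // Phase 1: with a null storage pointer, CUB only writes the required
  // size into `temp_storage_bytes` and does no selection work.
  void* d_temp_storage           = nullptr;
  std::size_t temp_storage_bytes = 0;
  check(cub::DeviceSelect::If(
    d_temp_storage, temp_storage_bytes, d_in, d_out, d_num_out, n, is_even{}, stream));

  check(cudaMalloc(&d_temp_storage, temp_storage_bytes));

  // Phase 2: same arguments, now with real temporary storage.
  check(cub::DeviceSelect::If(
    d_temp_storage, temp_storage_bytes, d_in, d_out, d_num_out, n, is_even{}, stream));

  int num_out = 0;
  check(cudaMemcpyAsync(&num_out, d_num_out, sizeof(int), cudaMemcpyDeviceToHost, stream));
  check(cudaStreamSynchronize(stream));

  std::vector<int> result(num_out);
  check(cudaMemcpy(result.data(), d_out, num_out * sizeof(int), cudaMemcpyDeviceToHost));

  check(cudaFree(d_temp_storage));
  check(cudaFree(d_num_out));
  check(cudaFree(d_out));
  check(cudaFree(d_in));
  return result;
}
```

The duplicated argument list across the two calls is the boilerplate that a single-phase, allocator-taking API would remove; the exact form of that API should be taken from the linked CCCL documentation rather than from this sketch.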

Labels

type: improvement Improvement / enhancement to an existing function

2 participants