
Use PDL in DeviceRadixSort #8765

@gevtushenko

The `cub::DeviceRadixSort` algorithm consists of many kernels. On small problem sizes, the gaps between these kernels make up a significant portion of the elapsed time.
[Image: Nsight Systems profile]

Profiled with `nsys profile ./bin/cub.bench.radix_sort.keys.base --profile -a 'T{ct}=I32' -a 'OffsetT{ct}=I32' -a 'Elements{io}[pow2]=16'` on A6000 Ada.

We should try using Programmatic Dependent Launch (PDL) to speed up small problem sizes.
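Here is a minimal sketch of PDL with the CUDA runtime API. It assumes CUDA 12 and a GPU of compute capability 9.0 or newer; on older devices it falls back to ordinary stream ordering. The kernels `producer` and `consumer` and the helper `run_pdl_pair` are hypothetical stand-ins for two consecutive DeviceRadixSort kernels, not CUB's actual implementation. The first kernel calls `cudaTriggerProgrammaticLaunchCompletion()`. The second is launched through `cudaLaunchKernelEx` with `cudaLaunchAttributeProgrammaticStreamSerialization` and calls `cudaGridDependencySynchronize()` before it reads the first kernel's output.

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
static void check(cudaError_t err) {
  if (err != cudaSuccess) {
    std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
    std::exit(EXIT_FAILURE);
  }
}

// Hypothetical upstream kernel (stand-in for an earlier sort pass).
__global__ void producer(const int* in, int* tmp, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) tmp[i] = in[i] * 2;
#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= 900
  // Allow the dependent grid to start launching before this grid retires.
  cudaTriggerProgrammaticLaunchCompletion();
#endif
}

// Hypothetical downstream kernel that consumes the producer's output.
__global__ void consumer(const int* tmp, int* out, int n) {
  // Work that does not read tmp could go here and overlap the producer's tail.
  int i = blockIdx.x * blockDim.x + threadIdx.x;
#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= 900
  // Wait until the producer grid has completed and its writes are visible.
  cudaGridDependencySynchronize();
#endif
  if (i < n) out[i] = tmp[i] + 1;
}

// Runs producer -> consumer on one stream; returns out[i] = 2 * in[i] + 1.
// Assumes h_in is non-empty (a zero-sized grid is an invalid launch).
std::vector<int> run_pdl_pair(const std::vector<int>& h_in) {
  const int n = static_cast<int>(h_in.size());
  const size_t bytes = n * sizeof(int);
  int *in = nullptr, *tmp = nullptr, *out = nullptr;
  check(cudaMalloc(&in, bytes));
  check(cudaMalloc(&tmp, bytes));
  check(cudaMalloc(&out, bytes));
  check(cudaMemcpy(in, h_in.data(), bytes, cudaMemcpyHostToDevice));

  cudaStream_t stream;
  check(cudaStreamCreate(&stream));
  const int block = 256;
  const int grid = (n + block - 1) / block;
  producer<<<grid, block, 0, stream>>>(in, tmp, n);
  check(cudaGetLastError());

  // PDL requires compute capability 9.0+; on older GPUs omit the attribute.
  int dev = 0, major = 0;
  check(cudaGetDevice(&dev));
  check(cudaDeviceGetAttribute(&major, cudaDevAttrComputeCapabilityMajor, dev));

  cudaLaunchAttribute attr{};
  attr.id = cudaLaunchAttributeProgrammaticStreamSerialization;
  attr.val.programmaticStreamSerializationAllowed = 1;

  cudaLaunchConfig_t cfg{};
  cfg.gridDim = dim3(grid);
  cfg.blockDim = dim3(block);
  cfg.dynamicSmemBytes = 0;
  cfg.stream = stream;
  cfg.attrs = &attr;
  cfg.numAttrs = (major >= 9) ? 1 : 0;
  check(cudaLaunchKernelEx(&cfg, consumer, tmp, out, n));

  std::vector<int> h_out(n);
  check(cudaMemcpyAsync(h_out.data(), out, bytes, cudaMemcpyDeviceToHost, stream));
  check(cudaStreamSynchronize(stream));

  check(cudaStreamDestroy(stream));
  check(cudaFree(in));
  check(cudaFree(tmp));
  check(cudaFree(out));
  return h_out;
}
```

Without the attribute, stream ordering keeps `consumer` from launching until `producer` has fully finished. With it, `consumer`'s blocks may be scheduled early, and only the code after `cudaGridDependencySynchronize()` is ordered after the producer. Hiding that launch latency is what could shrink the inter-kernel gaps on small inputs.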

Project status: In Progress

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions