Commit f3c5102
Replace custom size kernel with cub::DeviceReduce::TransformReduce (#811)
This PR replaces the custom size kernel with `cub::DeviceReduce::TransformReduce`.
Performance improves significantly at large capacities, reaching a 2.83x
speedup at 1,000,000,000 slots.
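
For readers unfamiliar with the transform-reduce pattern, here is a minimal host-side sketch of the idea: map each slot to 1 if it is filled and 0 if it is empty, then sum the results. It is a CPU analogue using the standard library's `std::transform_reduce`, not the CUB device call used in this PR. The `Slot` layout, the `kEmptyKey` sentinel, and both function names are hypothetical and chosen only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <numeric>
#include <vector>

// Hypothetical sentinel marking an empty slot (assumption for this sketch).
constexpr std::int64_t kEmptyKey = -1;

// Hypothetical slot layout: one 64-bit key and one 64-bit payload.
struct Slot {
  std::int64_t key;
  std::int64_t value;
};

// Transform step: slot -> 1 if filled, 0 if empty.
// Reduce step: sum the 0/1 values to get the number of filled slots.
std::size_t count_filled_slots(const std::vector<Slot>& slots) {
  return std::transform_reduce(
      slots.begin(), slots.end(),
      std::size_t{0},   // initial value of the sum
      std::plus<>{},    // reduction operator
      [](const Slot& s) { return static_cast<std::size_t>(s.key != kEmptyKey); });
}

// Build a table at 0.5 occupancy: capacity = 2 * num_inputs,
// with every other slot holding a unique key.
std::vector<Slot> make_half_full_table(std::size_t num_inputs) {
  std::vector<Slot> slots(2 * num_inputs, Slot{kEmptyKey, 0});
  for (std::size_t i = 0; i < num_inputs; ++i) {
    slots[2 * i] = Slot{static_cast<std::int64_t>(i), 0};
  }
  assert(slots.size() == 2 * num_inputs);  // sanity check on the layout assumption
  return slots;
}
```

Calling `count_filled_slots(make_half_full_table(n))` returns `n`. Expressing the count as a single transform plus reduction, rather than a hand-written kernel, leaves the choice of reduction strategy to the library.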
Microbenchmark timing of `static_map::size()` only. NVIDIA GH200 480GB,
CUDA 13.1 / GCC 14.3. Map populated with `NumInputs` unique I64 keys at
0.5 occupancy (so capacity = 2 × NumInputs).
| Capacity (slots) | Before (custom kernel) | After (`cub::DeviceReduce::TransformReduce`) | Speedup |
|---:|---:|---:|---:|
| 2,000,000 | 47.03 us | 43.74 us | 1.08x |
| 20,000,000 | 232.11 us | 122.31 us | **1.90x** |
| 200,000,000 | 2,360 us | 869.70 us | **2.71x** |
| 1,000,000,000 | 11,927 us | 4,212 us | **2.83x** |

1 parent b748b9d
2 files changed
Lines changed: 46 additions & 47 deletions
First file: 0 additions & 37 deletions (diff body not captured)
`@@ -707,43 +707,6 @@` original lines 710-746 removed
Lines changed: 46 additions & 10 deletions
Diff body not captured; hunk ranges only:
`@@ -31,6 +31,7 @@` one line added at new line 34
`@@ -958,21 +959,56 @@` 10 lines removed (original 961-963, 965, 969-973, 975), 45 lines added (new 962-967, 971-1006, 1008-1009, 1011)