Commit 1f708c9
This increases achieved occupancy by almost a factor of four.
For the "dev_evaluate_gravity" kernel, achieved occupancy rises from 16.66% to 61.65%. The grid size was previously 16, which triggered the following recommendation from the profiler:
"
The grid for this launch is configured to execute only 16 blocks, which is less than the GPU's 64 multiprocessors. This can underutilize some multiprocessors."
The grid size is now 256 blocks, which exceeds the GPU's 64 multiprocessors.
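The arithmetic behind the grid-size change can be sketched as follows. This is a simplified illustration, not the profiler's model: the helper names `sm_coverage` and `blocks_per_sm` are hypothetical, and the sketch assumes blocks are spread evenly over the multiprocessors, ignoring per-block resource limits. Only the numbers (16 and 256 blocks, 64 multiprocessors, 16.66% and 61.65% occupancy) come from the message above.

```python
NUM_SMS = 64  # multiprocessor count quoted in the profiler message


def sm_coverage(grid_size: int, num_sms: int) -> float:
    """Fraction of multiprocessors that can receive at least one block."""
    return min(grid_size, num_sms) / num_sms


def blocks_per_sm(grid_size: int, num_sms: int) -> float:
    """Average number of blocks per multiprocessor (even spread assumed)."""
    return grid_size / num_sms


for grid_size in (16, 256):
    print(
        f"grid={grid_size:4d}  "
        f"coverage={sm_coverage(grid_size, NUM_SMS):.0%}  "
        f"blocks/SM={blocks_per_sm(grid_size, NUM_SMS):.2f}"
    )

# Ratio of the achieved occupancy figures reported in this commit.
occupancy_before = 16.66
occupancy_after = 61.65
occupancy_ratio = occupancy_after / occupancy_before
print(f"occupancy ratio: {occupancy_ratio:.2f}x")
```

With 16 blocks at most a quarter of the 64 multiprocessors can be busy, while 256 blocks gives each one about four blocks on average. The occupancy ratio works out to roughly 3.7, consistent with "almost a factor of four".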
The improvement in occupancy was measured with NVIDIA Nsight Compute (ncu):
"
ncu --kernel-name dev_evaluate_gravity -c 100 --print-summary per-kernel -o dev_evaluate_gravity.prof python scripts/Erwan_1207_imp.py
ncu -i dev_evaluate_gravity.prof.ncu-rep | less
"
1 parent b726e4f commit 1f708c9
1 file changed
Lines changed: 1 addition & 1 deletion
Line 6 replaced (1 deletion, 1 addition); lines 3-5 and 7-9 shown as unchanged context.