@@ -290,6 +289,11 @@ is about `315` μs, which still 160x faster.
The profiler output shows that a lot of the overhead comes from launching the kernel, while the execution itself is relatively fast.

While Julia's JAoT greatly enhances the power of prepared kernels, you may quickly run into a case where an operation can be performed on the GPU but is very slow. Sometimes it is simply faster to move the array to the CPU, perform the operation there, and move the result back to the GPU. Although this sounds like a pretty bad idea, it actually works very well, see below.