This task should be simple:
- Allow users to opt in to device graph launch, internally passing `cudaGraphInstantiateFlagDeviceLaunch` to `cudaGraphInstantiate()`
- Add code samples showing how this can be done
Most of the work would happen on the JIT compiler side (numba-cuda); cuda-core only needs to set up the infrastructure.
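For reference, here is a minimal sketch of the plain CUDA runtime flow that this opt-in would expose. It is written directly against the CUDA 12 runtime API, not cuda-core, and assumes a GPU and toolkit that support device graph launch. The kernel and helper names (`writeValue`, `launchFromDevice`, `device_launch_demo`) are hypothetical. Device-side `cudaGraphLaunch` needs relocatable device code, so the build line shown in the first comment is an assumption about a typical setup.

```cuda
// Assumed build line: nvcc -rdc=true device_launch.cu -lcudadevrt -o device_launch
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Abort with a message if a host-side CUDA runtime call fails.
#define CHECK(call)                                                    \
    do {                                                               \
        cudaError_t err_ = (call);                                     \
        if (err_ != cudaSuccess) {                                     \
            std::fprintf(stderr, "%s failed: %s\n", #call,             \
                         cudaGetErrorString(err_));                    \
            std::exit(EXIT_FAILURE);                                   \
        }                                                              \
    } while (0)

// Hypothetical work performed by the device-launchable graph.
__global__ void writeValue(int *out) { *out = 42; }

// Launches the device graph from the GPU, with no host round trip.
// The device-side return code is ignored in this sketch.
__global__ void launchFromDevice(cudaGraphExec_t exec) {
    cudaGraphLaunch(exec, cudaStreamGraphFireAndForget);
}

// Returns the value written by the device-launched graph.
int device_launch_demo() {
    cudaStream_t stream;
    CHECK(cudaStreamCreateWithFlags(&stream, cudaStreamNonBlocking));

    int *out = nullptr;
    CHECK(cudaMalloc(&out, sizeof(int)));
    CHECK(cudaMemset(out, 0, sizeof(int)));

    // 1. Capture the work that should be launchable from the device.
    cudaGraph_t deviceGraph;
    CHECK(cudaStreamBeginCapture(stream, cudaStreamCaptureModeGlobal));
    writeValue<<<1, 1, 0, stream>>>(out);
    CHECK(cudaStreamEndCapture(stream, &deviceGraph));

    // 2. Opt in at instantiation time, then upload before any device launch.
    cudaGraphExec_t deviceExec;
    CHECK(cudaGraphInstantiate(&deviceExec, deviceGraph,
                               cudaGraphInstantiateFlagDeviceLaunch));
    CHECK(cudaGraphUpload(deviceExec, stream));

    // 3. Device-side launches are issued from a kernel that itself runs as
    //    part of a graph, so wrap the launcher in an ordinary host graph.
    cudaGraph_t hostGraph;
    CHECK(cudaStreamBeginCapture(stream, cudaStreamCaptureModeGlobal));
    launchFromDevice<<<1, 1, 0, stream>>>(deviceExec);
    CHECK(cudaStreamEndCapture(stream, &hostGraph));
    cudaGraphExec_t hostExec;
    CHECK(cudaGraphInstantiate(&hostExec, hostGraph, 0));

    // The host graph completes only after its fire-and-forget child does.
    CHECK(cudaGraphLaunch(hostExec, stream));
    CHECK(cudaStreamSynchronize(stream));

    int result = 0;
    CHECK(cudaMemcpy(&result, out, sizeof(int), cudaMemcpyDeviceToHost));

    CHECK(cudaGraphExecDestroy(hostExec));
    CHECK(cudaGraphDestroy(hostGraph));
    CHECK(cudaGraphExecDestroy(deviceExec));
    CHECK(cudaGraphDestroy(deviceGraph));
    CHECK(cudaFree(out));
    CHECK(cudaStreamDestroy(stream));
    return result;
}

int main() {
    std::printf("value written by device-launched graph: %d\n",
                device_launch_demo());
    return 0;
}
```

In this sketch only step 2 involves the flag itself; the capture in step 1 and the launcher kernel in step 3 are the parts a JIT compiler such as numba-cuda would be expected to generate.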