Dual test for NumPy and CuPy in tests#165
Conversation
This looks like our usual problem with CuPy+MPI... the infinity norms use pylops-mpi/pylops_mpi/DistributedArray.py Line 698 in b5a4c54 recv_buf from the _allreduce_subcomm calls, but this leads to deadlock also for Numpy+MPI (which is probably the reason the code was written like this in first place)... the issue with using recv_buf is that the _allreduce_subcomm method uses self.sub_comm.Allreduce that we know does not play well with CuPy arrays for now as we are not doing any syncronization.
Found issue in PyLops - fixed here PyLops/pylops#689. For now (until the next PyLops release) we can easily fix the test by using |
|
Also I think for all tests we need to add some logic like to have different ranks use different GPUs otherwise they will all run on the same GPU (default=0) |
mrava87
left a comment
There was a problem hiding this comment.
@tharittk good job!
I think this is nearly ready to go. I would maybe add some targets in the Makefile like we have in pylops https://github.com/PyLops/pylops/blob/a94ea8eae3b9c06bf39637b2e29f6a45a0e7766f/Makefile#L54 and in the contributing part of the documentation (see again what we have in pylops and maybe also add something about the NCCL tests and examples which I just realized is missing)
|
@hongyx11 I think this is pretty much ready and a great addition to our test suite as we move forwards trying to change MPI methods from objects to buffers… do you think we can put this into a self-hosted runner like we did for Pylops… I think a single node with even just 2 GPUs would be enough as if will guarantee that we can do some checks on any change we make in the communication bits of our library 😀 |
|
it's doable, let me give it a try, we need to use srun |
Some bugs were found on