Thanks for your MXNet implementation of SPOS ^_^. However, I found that supernet training was too slow when I trained my own network. I profiled the training procedure and found the following problems.
First, imperative mode is much slower than hybrid mode. I then tried training on more GPUs but got no speedup. Instead, GPU utilization dropped sharply as the number of GPUs increased. My guess is that in imperative mode the computation on different GPUs runs serially rather than in parallel. Have you encountered these problems?
Also, is there anything that can be improved to speed up training? Could we use imperative mode when sampling a subnet, then switch to hybrid mode when training that subnet?
Waiting for your reply!