I noticed that both the covariance and the Frobenius norm in your implementation differ from their reference definitions.

You compute the Frobenius norm as below:

`# frobenius norm between source and target`
`loss = torch.mean(torch.mul((xc - xct), (xc - xct)))`

However, as stated at http://mathworld.wolfram.com/FrobeniusNorm.html, after squaring each element and summing the results, the square root of the sum should be taken, not the mean of the squared elements.

In the original paper (https://arxiv.org/abs/1607.01719), the covariances are computed as below:

<img width="817" alt="screen shot 2018-07-15 at 20 04 29" src="https://user-images.githubusercontent.com/22859687/42736673-5b97bf16-886a-11e8-90ce-fbc80894465d.png">

While in your implementation:

```
# source covariance
xm = torch.mean(source, 0, keepdim=True) - source
xc = xm.t() @ xm

# target covariance
xmt = torch.mean(target, 0, keepdim=True) - target
xct = xmt.t() @ xmt
```

Here the covariances are not divided by the number of samples minus one, as they are in the paper.
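For comparison, here is a minimal sketch of the two quantities as the paper appears to define them: a centred covariance divided by `n - 1`, and a loss equal to the squared Frobenius norm of the covariance difference scaled by `1 / (4 * d * d)`. The function names `covariance` and `coral_loss`, the batch shapes, and the random inputs are assumptions made for illustration and are not taken from the repository. The plain Frobenius norm (with the square root) is computed alongside so the two can be compared.

```python
import torch


def covariance(x: torch.Tensor) -> torch.Tensor:
    """Unbiased feature covariance of a batch x with shape (n, d)."""
    n = x.size(0)
    # Centre each feature column, then normalise by n - 1.
    xm = x - torch.mean(x, 0, keepdim=True)
    return xm.t() @ xm / (n - 1)


def coral_loss(source: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """Squared Frobenius distance between covariances, scaled by 1 / (4 d^2)."""
    d = source.size(1)
    diff = covariance(source) - covariance(target)
    # Sum (not mean) of squared elements is the squared Frobenius norm.
    return torch.sum(diff * diff) / (4 * d * d)


# Hypothetical batches: 32 source and 48 target samples with 8 features each.
torch.manual_seed(0)
source = torch.randn(32, 8)
target = torch.randn(48, 8) * 2.0

loss = coral_loss(source, target)

# Plain Frobenius norm of the difference (square root of the sum of squares).
frob = torch.sqrt(torch.sum((covariance(source) - covariance(target)) ** 2))
```

Note that `torch.mean` over a `d x d` matrix equals the sum divided by `d * d`, so under these assumptions the mean-based expression differs from the scaled squared norm only by a constant factor of 4, provided the covariances themselves are normalised the same way.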