Loss during pretraining

When I used the HyperGlobal-450K dataset to reproduce the pretraining process with the code from your paper, the loss dropped rapidly to around 0.005 after the first two epochs while training the spatial subnet. Is this normal?