I have observed that the pre-trained checkpoints for Pythia-2.8B are identical across the initial training range, specifically from Step 0 to Step 6300.
A direct tensor comparison between these checkpoints shows the weights to be byte-for-byte identical. Since training should change the weights after the very first update, this suggests an issue with how these specific checkpoints were uploaded.
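For anyone who wants to cross-check this without loading the models, here is a minimal sketch that hashes the weight files of each checkpoint and groups the ones with identical contents. It assumes the files have already been downloaded locally; the helper names and the example paths are hypothetical and not part of any library. Matching hashes confirm identical bytes, but differing hashes would not by themselves prove the tensors differ, since file headers or metadata can vary.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def group_identical_files(paths):
    """Group paths whose contents hash to the same digest.

    Returns only the groups with more than one member, each sorted by path.
    """
    groups = {}
    for path in paths:
        groups.setdefault(sha256_of_file(path), []).append(str(path))
    return [sorted(group) for group in groups.values() if len(group) > 1]


if __name__ == "__main__":
    import sys

    # Usage (hypothetical local paths, one weight file per checkpoint):
    #   python compare_ckpts.py step0/weights.bin step1000/weights.bin ...
    for group in group_identical_files(sys.argv[1:]):
        print("identical:", ", ".join(group))
```

Any group printed by the script contains files that are byte-for-byte the same, which is what I am seeing for the early checkpoints.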