Makes a number of updates in preparation for llama3 context-parallel (CP)
training. CP training is not working yet: the model still needs to be
updated to handle the `cu_seq_lens_q_padded` kwargs, and I would like to
add a single-GPU CP test that uses BSHD inputs to at least exercise this
code in CI.
This PR:
* Only materializes the dataloader on cp_rank=0, and returns None on
other ranks.
* Uses the scatter operation in the dataloader to synchronize
`StopIteration` exceptions across ranks.
* Adds tests for the CP dataloader on 1- and 2-GPU machines.
* Moves llama3 to use DLCM data as the sanity dataset and turns off some
genome collation options by default. This dataset is larger than the
dummy sequences currently used in training, which ensures we can fill a
few batches in CP testing. We may want to revert this once we're done
bringing up llama3, since it triggers the tokenizer download during
testing.
* Removes `lazy tokenization` from llama3; this won't work. See
https://nvidia.slack.com/archives/C074Z808N05/p1767818883160949
* Starts adding CP files for llama3.
Closes BIO-11
---------
Signed-off-by: Peter St. John <pstjohn@nvidia.com>