Commit 271825c
Fix masked token loss refactor from NeMo bump. (#855)
### Description
<!-- Provide a detailed description of the changes in this PR -->
- Fixes the error caused by NVIDIA-NeMo/NeMo#12459
refactoring the definition of `masked_token_loss` and
`masked_token_loss_context_parallel` into a single function with a
`cp_size` argument that no longer divides the loss by the number of
"valid" (i.e. non-masked) tokens. So it returns a CP-reduced loss sum.
- Specifically, this breaks one of our golden value tests in
`bionemo-llm`:
`sub-packages/bionemo-llm/tests/bionemo/llm/model/test_loss.py::test_loss_equivalency_bionemo_vs_pytorch`,
and this fixes it with no behavior change to the LLM model `forward()`,
i.e. we perform the normalization on valid tokens on our side now.
### Details
- Bump NeMo to a version greater than:
NVIDIA-NeMo/NeMo#12856 or matching this:
#798
- Update: Need to migrate to `inference_context` in NeMo:
https://github.com/NVIDIA/NeMo/tree/cye/hyena-gpt-infer-context
- Bump Megatron to support new imports in the NeMo bump. Found a commit
that bisects the new Megatron inference engine and the new NeMo imports
to prevent breakage of our inference tests.
- Use a backend version of RoPE for the Amplify Megatron vs. PyTorch/HF
parity test to avoid the CP process group requirement.
- `MaskedTokenLossReduction.forward()` return API changed.
- Added commentary for future devs to understand the code.
#### Appendix
- NeMo Fork Hotfix Patch: Safe import of a future module in Megatron to
avoid upgrading.
```
get_gpt_heterogeneous_layer_spec, HAVE_GPT_HETEROGENEOUS = safe_import("megatron.core.models.gpt.heterogeneous.heterogeneous_layer_specs")
```
### Type of changes
<!-- Mark the relevant option with an [x] -->
- [x] Bug fix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Refactor
- [ ] Documentation update
- [ ] Other (please describe):
### Usage / Testing
<!--- How does a user interact with the changed code -->
- Tested against the commit specified in this PR:
#798
```python
cd 3rdparty/NeMo
git checkout c998e273f9cd23e36d7348fa27d0c2692efd87c8
pytest -s sub-packages/bionemo-llm/tests/bionemo/llm/model/test_loss.py::test_loss_equivalency_bionemo_vs_pytorch
```
---------
Signed-off-by: Farhad Ramezanghorbani <farhadr@nvidia.com>
Signed-off-by: Cory Ye <cye@nvidia.com>
Signed-off-by: cspades <cory0ye@gmail.com>
Signed-off-by: Timur Rvachov <trvachov@nvidia.com>
Signed-off-by: Danny <dreidenbach@nvidia.com>
Signed-off-by: Cory Ye <44509866+cspades@users.noreply.github.com>
Signed-off-by: nvdreidenbach <97637601+nvdreidenbach@users.noreply.github.com>
Signed-off-by: Peter St. John <pstjohn@nvidia.com>
Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: Polina Binder <pbinder@nvidia.com>
Signed-off-by: polinabinder1 <pbinder@nvidia.com>
Signed-off-by: dorotat <dorotat@nvidia.com>
Signed-off-by: Truong Nguyen <tgnguyen@nvidia.com>
Signed-off-by: Jonathan Mitchell <jomitchell@nvidia.com>
Signed-off-by: Timur Rvachov <120140748+trvachov@users.noreply.github.com>
Signed-off-by: Steven <skothenhill@nvidia.com>
Co-authored-by: Farhad Ramezanghorbani <farhadr@nvidia.com>
Co-authored-by: Farhad Ramezanghorbani <farhadrgh@users.noreply.github.com>
Co-authored-by: Dorota Toczydlowska <115542912+dorotat-nv@users.noreply.github.com>
Co-authored-by: Timur Rvachov <120140748+trvachov@users.noreply.github.com>
Co-authored-by: nvdreidenbach <97637601+nvdreidenbach@users.noreply.github.com>
Co-authored-by: Steven Kothen-Hill <148821680+skothenhill-nv@users.noreply.github.com>
Co-authored-by: Peter St. John <pstjohn@nvidia.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: polinabinder1 <pbinder@nvidia.com>
Co-authored-by: Truong Nguyen <tgnguyen@nvidia.com>
Co-authored-by: jomitchellnv <148147880+jomitchellnv@users.noreply.github.com>
Co-authored-by: lvojtku <lvojtku@nvidia.com>1 parent dc69464 commit 271825c
13 files changed
Lines changed: 109 additions & 25 deletions
File tree
- 3rdparty
- docs/docs
- models
- user-guide/appendix
- sub-packages
- bionemo-amplify/tests/bionemo/amplify
- bionemo-esm2/tests/bionemo/esm2/scripts
- bionemo-evo2
- src/bionemo/evo2/run
- tests/bionemo/evo2/run
- bionemo-llm
- src/bionemo/llm
- model
- utils
- tests/bionemo/llm/model
Submodule Megatron-LM updated 969 files
Submodule NeMo updated from c998e27 to 42d2b55
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
195 | 195 | | |
196 | 196 | | |
197 | 197 | | |
198 | | - | |
| 198 | + | |
199 | 199 | | |
200 | 200 | | |
201 | 201 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
1 | | - | |
| 1 | + | |
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
207 | 207 | | |
208 | 208 | | |
209 | 209 | | |
210 | | - | |
| 210 | + | |
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
1 | 1 | | |
2 | 2 | | |
| 3 | + | |
| 4 | + | |
| 5 | + | |
| 6 | + | |
| 7 | + | |
| 8 | + | |
| 9 | + | |
| 10 | + | |
| 11 | + | |
| 12 | + | |
| 13 | + | |
| 14 | + | |
3 | 15 | | |
4 | 16 | | |
5 | 17 | | |
| |||
12 | 24 | | |
13 | 25 | | |
14 | 26 | | |
| 27 | + | |
| 28 | + | |
| 29 | + | |
15 | 30 | | |
16 | 31 | | |
17 | 32 | | |
| |||
23 | 38 | | |
24 | 39 | | |
25 | 40 | | |
| 41 | + | |
| 42 | + | |
| 43 | + | |
26 | 44 | | |
27 | 45 | | |
28 | 46 | | |
| |||
Lines changed: 15 additions & 3 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
14 | 14 | | |
15 | 15 | | |
16 | 16 | | |
17 | | - | |
| 17 | + | |
18 | 18 | | |
19 | 19 | | |
20 | 20 | | |
| |||
47 | 47 | | |
48 | 48 | | |
49 | 49 | | |
50 | | - | |
51 | | - | |
| 50 | + | |
| 51 | + | |
| 52 | + | |
| 53 | + | |
| 54 | + | |
| 55 | + | |
| 56 | + | |
| 57 | + | |
| 58 | + | |
| 59 | + | |
| 60 | + | |
| 61 | + | |
| 62 | + | |
| 63 | + | |
52 | 64 | | |
53 | 65 | | |
54 | 66 | | |
Lines changed: 1 addition & 1 deletion
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
327 | 327 | | |
328 | 328 | | |
329 | 329 | | |
330 | | - | |
| 330 | + | |
331 | 331 | | |
332 | 332 | | |
333 | 333 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
157 | 157 | | |
158 | 158 | | |
159 | 159 | | |
160 | | - | |
| 160 | + | |
| 161 | + | |
| 162 | + | |
161 | 163 | | |
162 | 164 | | |
163 | 165 | | |
| |||
Lines changed: 1 addition & 1 deletion
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
146 | 146 | | |
147 | 147 | | |
148 | 148 | | |
149 | | - | |
| 149 | + | |
150 | 150 | | |
151 | 151 | | |
152 | 152 | | |
| |||
0 commit comments