Update EVO2 tests according to Hyena arch changes#798
Conversation
fca02bc to
58706fe
Compare
|
/ok to test |
1 similar comment
|
/ok to test |
4c5ac7d to
c14f433
Compare
|
LGTM but will let John verify: |
|
/ok to test |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #798 +/- ##
==========================================
+ Coverage 84.37% 84.42% +0.05%
==========================================
Files 138 138
Lines 8690 8686 -4
==========================================
+ Hits 7332 7333 +1
+ Misses 1358 1353 -5
🚀 New features to boost your workflow:
|
jstjohn
left a comment
There was a problem hiding this comment.
Approved but see my comment in line about manual verification of tensor parallel correctness. Ideally the same could be done for CP=2, but I am not 100% that we have that working in the predict script.
|
Need to bump NeMo to get the changes in NVIDIA-NeMo/NeMo#12988 after its merged |
|
/ok to test b950c28 |
|
/ok to test 6e23633 |
|
/ok to test a395a8b |
|
/ok to test 5f7f9ed |
|
/ok to test f25057b |
Previously if pre-commit failed, it would cause `run-tests` to be skipped, which would then mean that `verify-tests-status` would give the PR the green light. That's obviously a problem, so we need to make sure all these tests are added to the verify-tests-status check. Ideally we could put some more logic in that if block to check if we're in a merge queue, whether tests were intentionally skipped, etc. --------- Signed-off-by: Peter St. John <pstjohn@nvidia.com> Signed-off-by: Farhad Ramezanghorbani <farhadr@nvidia.com>
### Description <!-- Provide a detailed description of the changes in this PR --> ### Type of changes <!-- Mark the relevant option with an [x] --> - [ ] Bug fix (non-breaking change which fixes an issue) - [ ] New feature (non-breaking change which adds functionality) - [ ] Refactor - [ ] Documentation update - [ ] Other (please describe): ### CI Pipeline Configuration Configure CI behavior by applying the relevant labels: - [SKIP_CI](https://github.com/NVIDIA/bionemo-framework/blob/main/docs/docs/user-guide/contributing/contributing.md#skip_ci) - Skip all continuous integration tests - [INCLUDE_NOTEBOOKS_TESTS](https://github.com/NVIDIA/bionemo-framework/blob/main/docs/docs/user-guide/contributing/contributing.md#include_notebooks_tests) - Execute notebook validation tests in pytest - [INCLUDE_SLOW_TESTS](https://github.com/NVIDIA/bionemo-framework/blob/main/docs/docs/user-guide/contributing/contributing.md#include_slow_tests) - Execute tests labelled as slow in pytest for extensive testing > [!NOTE] > By default, the notebooks validation tests are skipped unless explicitly enabled. #### Authorizing CI Runs We use [copy-pr-bot](https://docs.gha-runners.nvidia.com/apps/copy-pr-bot/#automation) to manage authorization of CI runs on NVIDIA's compute resources. * If a pull request is opened by a trusted user and contains only trusted changes, the pull request's code will automatically be copied to a pull-request/ prefixed branch in the source repository (e.g. pull-request/123) * If a pull request is opened by an untrusted user or contains untrusted changes, an NVIDIA org member must leave an `/ok to test` comment on the pull request to trigger CI. This will need to be done for each new commit. ### Usage <!--- How does a user interact with the changed code --> ```python TODO: Add code snippet ``` ### Pre-submit Checklist <!--- Ensure all items are completed before submitting --> - [ ] I have tested these changes locally - [ ] I have updated the documentation accordingly - [ ] I have added/updated tests as needed - [ ] All existing tests pass successfully Signed-off-by: Jonathan Mitchell <jomitchell@nvidia.com> Signed-off-by: Farhad Ramezanghorbani <farhadr@nvidia.com>
### Description <!-- Provide a detailed description of the changes in this PR --> ### Type of changes <!-- Mark the relevant option with an [x] --> - [ ] Bug fix (non-breaking change which fixes an issue) - [ ] New feature (non-breaking change which adds functionality) - [ ] Refactor - [ ] Documentation update - [ ] Other (please describe): ### CI Pipeline Configuration Configure CI behavior by applying the relevant labels: - [SKIP_CI](https://github.com/NVIDIA/bionemo-framework/blob/main/docs/docs/user-guide/contributing/contributing.md#skip_ci) - Skip all continuous integration tests - [INCLUDE_NOTEBOOKS_TESTS](https://github.com/NVIDIA/bionemo-framework/blob/main/docs/docs/user-guide/contributing/contributing.md#include_notebooks_tests) - Execute notebook validation tests in pytest - [INCLUDE_SLOW_TESTS](https://github.com/NVIDIA/bionemo-framework/blob/main/docs/docs/user-guide/contributing/contributing.md#include_slow_tests) - Execute tests labelled as slow in pytest for extensive testing > [!NOTE] > By default, the notebooks validation tests are skipped unless explicitly enabled. #### Authorizing CI Runs We use [copy-pr-bot](https://docs.gha-runners.nvidia.com/apps/copy-pr-bot/#automation) to manage authorization of CI runs on NVIDIA's compute resources. * If a pull request is opened by a trusted user and contains only trusted changes, the pull request's code will automatically be copied to a pull-request/ prefixed branch in the source repository (e.g. pull-request/123) * If a pull request is opened by an untrusted user or contains untrusted changes, an NVIDIA org member must leave an `/ok to test` comment on the pull request to trigger CI. This will need to be done for each new commit. ### Usage <!--- How does a user interact with the changed code --> ```python TODO: Add code snippet ``` ### Pre-submit Checklist <!--- Ensure all items are completed before submitting --> - [ ] I have tested these changes locally - [ ] I have updated the documentation accordingly - [ ] I have added/updated tests as needed - [ ] All existing tests pass successfully Signed-off-by: Jonathan Mitchell <jomitchell@nvidia.com> Signed-off-by: Farhad Ramezanghorbani <farhadr@nvidia.com>
PEFT checkpointing and inference for esm2. --------- Signed-off-by: Polina Binder <pbinder@nvidia.com> Signed-off-by: polinabinder1 <pbinder@nvidia.com> Signed-off-by: Farhad Ramezanghorbani <farhadr@nvidia.com>
### Description Profiling for LoRA additions to ESM2. --------- Signed-off-by: Polina Binder <pbinder@nvidia.com> Signed-off-by: Farhad Ramezanghorbani <farhadr@nvidia.com>
### Description Fixes a recent CRIT vulnerability in h11 ### Type of changes <!-- Mark the relevant option with an [x] --> - [x] Bug fix (non-breaking change which fixes an issue) - [ ] New feature (non-breaking change which adds functionality) - [ ] Refactor - [ ] Documentation update - [ ] Other (please describe): ### CI Pipeline Configuration Configure CI behavior by applying the relevant labels: - [SKIP_CI](https://github.com/NVIDIA/bionemo-framework/blob/main/docs/docs/user-guide/contributing/contributing.md#skip_ci) - Skip all continuous integration tests - [INCLUDE_NOTEBOOKS_TESTS](https://github.com/NVIDIA/bionemo-framework/blob/main/docs/docs/user-guide/contributing/contributing.md#include_notebooks_tests) - Execute notebook validation tests in pytest - [INCLUDE_SLOW_TESTS](https://github.com/NVIDIA/bionemo-framework/blob/main/docs/docs/user-guide/contributing/contributing.md#include_slow_tests) - Execute tests labelled as slow in pytest for extensive testing > [!NOTE] > By default, the notebooks validation tests are skipped unless explicitly enabled. #### Authorizing CI Runs We use [copy-pr-bot](https://docs.gha-runners.nvidia.com/apps/copy-pr-bot/#automation) to manage authorization of CI runs on NVIDIA's compute resources. * If a pull request is opened by a trusted user and contains only trusted changes, the pull request's code will automatically be copied to a pull-request/ prefixed branch in the source repository (e.g. pull-request/123) * If a pull request is opened by an untrusted user or contains untrusted changes, an NVIDIA org member must leave an `/ok to test` comment on the pull request to trigger CI. This will need to be done for each new commit. ### Usage <!--- How does a user interact with the changed code --> ```python TODO: Add code snippet ``` ### Pre-submit Checklist <!--- Ensure all items are completed before submitting --> - [ ] I have tested these changes locally - [ ] I have updated the documentation accordingly - [ ] I have added/updated tests as needed - [ ] All existing tests pass successfully Signed-off-by: Timur Rvachov <trvachov@nvidia.com> Signed-off-by: Farhad Ramezanghorbani <farhadr@nvidia.com>
### Description
### For the Geneformer documentation:
1. **Capitalization standardization**:
- Fixed capitalization of "BioNeMo", "Geneformer", "HuggingFace",
"ReLU", "BERT MLM"
- Corrected spelling of "Crohn's disease" (previously "Chron's disease")
- Fixed "children" (previously "chidlren")
2. **Formatting improvements**:
- Properly formatted model version bullet points with nesting
- Added proper headings for property categories
- Fixed displayed values (e.g., ".5M" → "0.5M")
- Standardized formatting of data collection/labeling methods sections
3. **Image captions**:
- Replaced low-quality image captions with descriptive, properly
formatted titles
- Made chart descriptions more professional and consistent
4. **Grammatical improvements**:
- Fixed article usage and punctuation
- Improved sentence structure and clarity
- Fixed section headings capitalization and consistency
5. **Fixed broken notes**:
- Corrected `!! note` to `!!! note` for proper rendering
### For the ESM-2 pretraining documentation:
1. **Grammar and clarity improvements**:
- Fixed article usage ("a ESM-2" → "an ESM-2")
- Fixed formatting of numeric values (e.g., "1." → "1.0")
- Fixed typos ("depreciation" → "deprecation")
- Fixed "trainiing" → "training"
2. **Consistency in terminology**:
- Standardized "BioNeMo" capitalization
- Ensured consistent treatment of "ESM-2" references
3. **Structure and formatting**:
- Improved spacing and paragraph breaks
- Fixed section formatting and readability
### For the training-models documentation:
1. **Capitalization and consistency**:
- Standardized capitalization of model sizes (8M, 650M, 3B)
- Fixed capitalization of "ESM2", "Geneformer", "Python", "YAML"
- Changed "WandB" to "Weights and Biases" consistently
2. **Formatting improvements**:
- Changed code blocks consistently to include language tags
- Added proper spacing and improved paragraph formatting
- Fixed punctuation in lists and note sections
3. **Grammar and clarity**:
- Added missing commas after introductory phrases
- Fixed formatting of lists for better readability
- Made bulleted explanations more consistent
### Type of changes
<!-- Mark the relevant option with an [x] -->
- [ ] Bug fix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Refactor
- [x] Documentation update
- [ ] Other (please describe):
### CI Pipeline Configuration
Configure CI behavior by applying the relevant labels:
-
[SKIP_CI](https://github.com/NVIDIA/bionemo-framework/blob/main/docs/docs/user-guide/contributing/contributing.md#skip_ci)
- Skip all continuous integration tests
-
[INCLUDE_NOTEBOOKS_TESTS](https://github.com/NVIDIA/bionemo-framework/blob/main/docs/docs/user-guide/contributing/contributing.md#include_notebooks_tests)
- Execute notebook validation tests in pytest
-
[INCLUDE_SLOW_TESTS](https://github.com/NVIDIA/bionemo-framework/blob/main/docs/docs/user-guide/contributing/contributing.md#include_slow_tests)
- Execute tests labelled as slow in pytest for extensive testing
> [!NOTE]
> By default, the notebooks validation tests are skipped unless
explicitly enabled.
#### Authorizing CI Runs
We use
[copy-pr-bot](https://docs.gha-runners.nvidia.com/apps/copy-pr-bot/#automation)
to manage authorization of CI
runs on NVIDIA's compute resources.
* If a pull request is opened by a trusted user and contains only
trusted changes, the pull request's code will
automatically be copied to a pull-request/ prefixed branch in the source
repository (e.g. pull-request/123)
* If a pull request is opened by an untrusted user or contains untrusted
changes, an NVIDIA org member must leave an
`/ok to test` comment on the pull request to trigger CI. This will need
to be done for each new commit.
### Usage
<!--- How does a user interact with the changed code -->
```python
TODO: Add code snippet
```
### Pre-submit Checklist
<!--- Ensure all items are completed before submitting -->
- [ ] I have tested these changes locally
- [ ] I have updated the documentation accordingly
- [ ] I have added/updated tests as needed
- [ ] All existing tests pass successfully
---------
Signed-off-by: Timur Rvachov <trvachov@nvidia.com>
Signed-off-by: Timur Rvachov <120140748+trvachov@users.noreply.github.com>
Co-authored-by: lvojtku <lvojtku@nvidia.com>
Signed-off-by: Farhad Ramezanghorbani <farhadr@nvidia.com>
### Description <!-- Provide a detailed description of the changes in this PR --> - Fixes the error caused by NVIDIA-NeMo/NeMo#12459 refactoring the definition of `masked_token_loss` and `masked_token_loss_context_parallel` into a single function with a `cp_size` argument that no longer divides the loss by the number of "valid" (i.e. non-masked) tokens. So it returns a CP-reduced loss sum. - Specifically, this breaks one of our golden value tests in `bionemo-llm`: `sub-packages/bionemo-llm/tests/bionemo/llm/model/test_loss.py::test_loss_equivalency_bionemo_vs_pytorch`, and this fixes it with no behavior change to the LLM model `forward()`, i.e. we perform the normalization on valid tokens on our side now. ### Details - Bump NeMo to a version greater than: NVIDIA-NeMo/NeMo#12856 or matching this: #798 - Update: Need to migrate to `inference_context` in NeMo: https://github.com/NVIDIA/NeMo/tree/cye/hyena-gpt-infer-context - Bump Megatron to support new imports in the NeMo bump. Found a commit that bisects the new Megatron inference engine and the new NeMo imports to prevent breakage of our inference tests. - Use a backend version of RoPE for the Amplify Megatron vs. PyTorch/HF parity test to avoid the CP process group requirement. - `MaskedTokenLossReduction.forward()` return API changed. - Added commentary for future devs to understand the code. #### Appendix - NeMo Fork Hotfix Patch: Safe import of a future module in Megatron to avoid upgrading. ``` get_gpt_heterogeneous_layer_spec, HAVE_GPT_HETEROGENEOUS = safe_import("megatron.core.models.gpt.heterogeneous.heterogeneous_layer_specs") ``` ### Type of changes <!-- Mark the relevant option with an [x] --> - [x] Bug fix (non-breaking change which fixes an issue) - [ ] New feature (non-breaking change which adds functionality) - [ ] Refactor - [ ] Documentation update - [ ] Other (please describe): ### Usage / Testing <!--- How does a user interact with the changed code --> - Tested against the commit specified in this PR: #798 ```python cd 3rdparty/NeMo git checkout c998e273f9cd23e36d7348fa27d0c2692efd87c8 pytest -s sub-packages/bionemo-llm/tests/bionemo/llm/model/test_loss.py::test_loss_equivalency_bionemo_vs_pytorch ``` --------- Signed-off-by: Farhad Ramezanghorbani <farhadr@nvidia.com> Signed-off-by: Cory Ye <cye@nvidia.com> Signed-off-by: cspades <cory0ye@gmail.com> Signed-off-by: Timur Rvachov <trvachov@nvidia.com> Signed-off-by: Danny <dreidenbach@nvidia.com> Signed-off-by: Cory Ye <44509866+cspades@users.noreply.github.com> Signed-off-by: nvdreidenbach <97637601+nvdreidenbach@users.noreply.github.com> Signed-off-by: Peter St. John <pstjohn@nvidia.com> Signed-off-by: dependabot[bot] <support@github.com> Signed-off-by: Polina Binder <pbinder@nvidia.com> Signed-off-by: polinabinder1 <pbinder@nvidia.com> Signed-off-by: dorotat <dorotat@nvidia.com> Signed-off-by: Truong Nguyen <tgnguyen@nvidia.com> Signed-off-by: Jonathan Mitchell <jomitchell@nvidia.com> Signed-off-by: Timur Rvachov <120140748+trvachov@users.noreply.github.com> Signed-off-by: Steven <skothenhill@nvidia.com> Co-authored-by: Farhad Ramezanghorbani <farhadr@nvidia.com> Co-authored-by: Farhad Ramezanghorbani <farhadrgh@users.noreply.github.com> Co-authored-by: Dorota Toczydlowska <115542912+dorotat-nv@users.noreply.github.com> Co-authored-by: Timur Rvachov <120140748+trvachov@users.noreply.github.com> Co-authored-by: nvdreidenbach <97637601+nvdreidenbach@users.noreply.github.com> Co-authored-by: Steven Kothen-Hill <148821680+skothenhill-nv@users.noreply.github.com> Co-authored-by: Peter St. John <pstjohn@nvidia.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: polinabinder1 <pbinder@nvidia.com> Co-authored-by: Truong Nguyen <tgnguyen@nvidia.com> Co-authored-by: jomitchellnv <148147880+jomitchellnv@users.noreply.github.com> Co-authored-by: lvojtku <lvojtku@nvidia.com> Signed-off-by: Farhad Ramezanghorbani <farhadr@nvidia.com>
271825c to
917c9c4
Compare
|
/ok to test 917c9c4 |
Signed-off-by: Farhad Ramezanghorbani <farhadr@nvidia.com>
|
/ok to test 550e852b3f148a0212bdd5a10a5909a8f947c32 |
@farhadrgh, there was an error processing your request: See the following link for more information: https://docs.gha-runners.nvidia.com/cpr/e/2/ |
|
/ok to test 1550e85 |
### Description NVIDIA-NeMo/NeMo#12856 introduces code reduction and perf improvements including standardizing input/output shapes for Hyena operators and consequentially reducing rearrangement overhead. This PR updates the EVO2 test to comply with those changes, ### Type of changes <!-- Mark the relevant option with an [x] --> - [ ] Bug fix (non-breaking change which fixes an issue) - [ ] New feature (non-breaking change which adds functionality) - [ ] Refactor - [ ] Documentation update - [ ] Other (please describe): ### CI Pipeline Configuration Configure CI behavior by applying the relevant labels: - [SKIP_CI](https://github.com/NVIDIA/bionemo-framework/blob/main/docs/docs/user-guide/contributing/contributing.md#skip_ci) - Skip all continuous integration tests - [INCLUDE_NOTEBOOKS_TESTS](https://github.com/NVIDIA/bionemo-framework/blob/main/docs/docs/user-guide/contributing/contributing.md#include_notebooks_tests) - Execute notebook validation tests in pytest - [INCLUDE_SLOW_TESTS](https://github.com/NVIDIA/bionemo-framework/blob/main/docs/docs/user-guide/contributing/contributing.md#include_slow_tests) - Execute tests labelled as slow in pytest for extensive testing > [!NOTE] > By default, the notebooks validation tests are skipped unless explicitly enabled. #### Authorizing CI Runs We use [copy-pr-bot](https://docs.gha-runners.nvidia.com/apps/copy-pr-bot/#automation) to manage authorization of CI runs on NVIDIA's compute resources. * If a pull request is opened by a trusted user and contains only trusted changes, the pull request's code will automatically be copied to a pull-request/ prefixed branch in the source repository (e.g. pull-request/123) * If a pull request is opened by an untrusted user or contains untrusted changes, an NVIDIA org member must leave an `/ok to test` comment on the pull request to trigger CI. This will need to be done for each new commit. ### Usage <!--- How does a user interact with the changed code --> ```python TODO: Add code snippet ``` ### Pre-submit Checklist <!--- Ensure all items are completed before submitting --> - [ ] I have tested these changes locally - [ ] I have updated the documentation accordingly - [ ] I have added/updated tests as needed - [ ] All existing tests pass successfully --------- Signed-off-by: Farhad Ramezanghorbani <farhadr@nvidia.com> Signed-off-by: Cory Ye <cye@nvidia.com> Signed-off-by: cspades <cory0ye@gmail.com> Signed-off-by: Timur Rvachov <trvachov@nvidia.com> Signed-off-by: Danny <dreidenbach@nvidia.com> Signed-off-by: Cory Ye <44509866+cspades@users.noreply.github.com> Signed-off-by: nvdreidenbach <97637601+nvdreidenbach@users.noreply.github.com> Signed-off-by: Peter St. John <pstjohn@nvidia.com> Signed-off-by: dependabot[bot] <support@github.com> Signed-off-by: Polina Binder <pbinder@nvidia.com> Signed-off-by: polinabinder1 <pbinder@nvidia.com> Signed-off-by: dorotat <dorotat@nvidia.com> Signed-off-by: Truong Nguyen <tgnguyen@nvidia.com> Signed-off-by: Jonathan Mitchell <jomitchell@nvidia.com> Signed-off-by: Timur Rvachov <120140748+trvachov@users.noreply.github.com> Signed-off-by: Steven <skothenhill@nvidia.com> Co-authored-by: Dorota Toczydlowska <115542912+dorotat-nv@users.noreply.github.com> Co-authored-by: Cory Ye <44509866+cspades@users.noreply.github.com> Co-authored-by: Timur Rvachov <120140748+trvachov@users.noreply.github.com> Co-authored-by: nvdreidenbach <97637601+nvdreidenbach@users.noreply.github.com> Co-authored-by: Steven Kothen-Hill <148821680+skothenhill-nv@users.noreply.github.com> Co-authored-by: Peter St. John <pstjohn@nvidia.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: polinabinder1 <pbinder@nvidia.com> Co-authored-by: Truong Nguyen <tgnguyen@nvidia.com> Co-authored-by: jomitchellnv <148147880+jomitchellnv@users.noreply.github.com> Co-authored-by: lvojtku <lvojtku@nvidia.com> Signed-off-by: dorotat <dorotat@nvidia.com>
…d inference. (#861) ### Description <!-- Provide a detailed description of the changes in this PR --> - #798 (#855) depends on a NeMo branch, which has been merged into NeMo `main`: NVIDIA-NeMo/NeMo#13436. Update to point to this trunk commit. ### Details - NeMo ToT reverted the `cp_size` argument for `masked_token_loss` (NVIDIA-NeMo/NeMo#13295), so we do the CP reduction on our side now... - Future Megatron bump will add the `* cp_size` multiplier to the loss, and break our inference unit tests due to `torch.inference_mode()` usage in Megatron. --------- Signed-off-by: Cory Ye <cye@nvidia.com>
### Description NVIDIA-NeMo/NeMo#12856 introduces code reduction and perf improvements including standardizing input/output shapes for Hyena operators and consequentially reducing rearrangement overhead. This PR updates the EVO2 test to comply with those changes, ### Type of changes <!-- Mark the relevant option with an [x] --> - [ ] Bug fix (non-breaking change which fixes an issue) - [ ] New feature (non-breaking change which adds functionality) - [ ] Refactor - [ ] Documentation update - [ ] Other (please describe): ### CI Pipeline Configuration Configure CI behavior by applying the relevant labels: - [SKIP_CI](https://github.com/NVIDIA/bionemo-framework/blob/main/docs/docs/user-guide/contributing/contributing.md#skip_ci) - Skip all continuous integration tests - [INCLUDE_NOTEBOOKS_TESTS](https://github.com/NVIDIA/bionemo-framework/blob/main/docs/docs/user-guide/contributing/contributing.md#include_notebooks_tests) - Execute notebook validation tests in pytest - [INCLUDE_SLOW_TESTS](https://github.com/NVIDIA/bionemo-framework/blob/main/docs/docs/user-guide/contributing/contributing.md#include_slow_tests) - Execute tests labelled as slow in pytest for extensive testing > [!NOTE] > By default, the notebooks validation tests are skipped unless explicitly enabled. #### Authorizing CI Runs We use [copy-pr-bot](https://docs.gha-runners.nvidia.com/apps/copy-pr-bot/#automation) to manage authorization of CI runs on NVIDIA's compute resources. * If a pull request is opened by a trusted user and contains only trusted changes, the pull request's code will automatically be copied to a pull-request/ prefixed branch in the source repository (e.g. pull-request/123) * If a pull request is opened by an untrusted user or contains untrusted changes, an NVIDIA org member must leave an `/ok to test` comment on the pull request to trigger CI. This will need to be done for each new commit. ### Usage <!--- How does a user interact with the changed code --> ```python TODO: Add code snippet ``` ### Pre-submit Checklist <!--- Ensure all items are completed before submitting --> - [ ] I have tested these changes locally - [ ] I have updated the documentation accordingly - [ ] I have added/updated tests as needed - [ ] All existing tests pass successfully --------- Signed-off-by: Farhad Ramezanghorbani <farhadr@nvidia.com> Signed-off-by: Cory Ye <cye@nvidia.com> Signed-off-by: cspades <cory0ye@gmail.com> Signed-off-by: Timur Rvachov <trvachov@nvidia.com> Signed-off-by: Danny <dreidenbach@nvidia.com> Signed-off-by: Cory Ye <44509866+cspades@users.noreply.github.com> Signed-off-by: nvdreidenbach <97637601+nvdreidenbach@users.noreply.github.com> Signed-off-by: Peter St. John <pstjohn@nvidia.com> Signed-off-by: dependabot[bot] <support@github.com> Signed-off-by: Polina Binder <pbinder@nvidia.com> Signed-off-by: polinabinder1 <pbinder@nvidia.com> Signed-off-by: dorotat <dorotat@nvidia.com> Signed-off-by: Truong Nguyen <tgnguyen@nvidia.com> Signed-off-by: Jonathan Mitchell <jomitchell@nvidia.com> Signed-off-by: Timur Rvachov <120140748+trvachov@users.noreply.github.com> Signed-off-by: Steven <skothenhill@nvidia.com> Co-authored-by: Dorota Toczydlowska <115542912+dorotat-nv@users.noreply.github.com> Co-authored-by: Cory Ye <44509866+cspades@users.noreply.github.com> Co-authored-by: Timur Rvachov <120140748+trvachov@users.noreply.github.com> Co-authored-by: nvdreidenbach <97637601+nvdreidenbach@users.noreply.github.com> Co-authored-by: Steven Kothen-Hill <148821680+skothenhill-nv@users.noreply.github.com> Co-authored-by: Peter St. John <pstjohn@nvidia.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: polinabinder1 <pbinder@nvidia.com> Co-authored-by: Truong Nguyen <tgnguyen@nvidia.com> Co-authored-by: jomitchellnv <148147880+jomitchellnv@users.noreply.github.com> Co-authored-by: lvojtku <lvojtku@nvidia.com> Signed-off-by: Ubuntu <camirr@nvidia.com>
…d inference. (#861) ### Description <!-- Provide a detailed description of the changes in this PR --> - #798 (#855) depends on a NeMo branch, which has been merged into NeMo `main`: NVIDIA-NeMo/NeMo#13436. Update to point to this trunk commit. ### Details - NeMo ToT reverted the `cp_size` argument for `masked_token_loss` (NVIDIA-NeMo/NeMo#13295), so we do the CP reduction on our side now... - Future Megatron bump will add the `* cp_size` multiplier to the loss, and break our inference unit tests due to `torch.inference_mode()` usage in Megatron. --------- Signed-off-by: Cory Ye <cye@nvidia.com> Signed-off-by: Ubuntu <camirr@nvidia.com>
Description
NVIDIA-NeMo/NeMo#12856 introduces code reduction and perf improvements including standardizing input/output shapes for Hyena operators and consequentially reducing rearrangement overhead. This PR updates the EVO2 test to comply with those changes,
Type of changes
CI Pipeline Configuration
Configure CI behavior by applying the relevant labels:
Note
By default, the notebooks validation tests are skipped unless explicitly enabled.
Authorizing CI Runs
We use copy-pr-bot to manage authorization of CI
runs on NVIDIA's compute resources.
automatically be copied to a pull-request/ prefixed branch in the source repository (e.g. pull-request/123)
/ok to testcomment on the pull request to trigger CI. This will need to be done for each new commit.Usage
Pre-submit Checklist