 <!-- Provide a detailed description of the changes in this PR -->

 ### Type of changes
+
 <!-- Mark the relevant option with an [x] -->

--[ ]Bug fix (non-breaking change which fixes an issue)
--[ ]New feature (non-breaking change which adds functionality)
--[ ]Refactor
--[ ]Documentation update
--[ ]Other (please describe):
+- [ ] Bug fix (non-breaking change which fixes an issue)
+- [ ] New feature (non-breaking change which adds functionality)
+- [ ] Refactor
+- [ ] Documentation update
+- [ ] Other (please describe):

 ### CI Pipeline Configuration
+
 Configure CI behavior by applying the relevant labels:

 - [SKIP_CI](https://github.com/NVIDIA/bionemo-framework/blob/main/docs/docs/user-guide/contributing/contributing.md#skip_ci) - Skip all continuous integration tests
 - [INCLUDE_NOTEBOOKS_TESTS](https://github.com/NVIDIA/bionemo-framework/blob/main/docs/docs/user-guide/contributing/contributing.md#include_notebooks_tests) - Execute notebook validation tests in pytest
 - [INCLUDE_SLOW_TESTS](https://github.com/NVIDIA/bionemo-framework/blob/main/docs/docs/user-guide/contributing/contributing.md#include_slow_tests) - Execute tests labelled as slow in pytest for extensive testing

-> [!NOTE]
+> \[!NOTE\]
 > By default, the notebooks validation tests are skipped unless explicitly enabled.

 #### Authorizing CI Runs

 We use [copy-pr-bot](https://docs.gha-runners.nvidia.com/apps/copy-pr-bot/#automation) to manage authorization of CI
 runs on NVIDIA's compute resources.

-* If a pull request is opened by a trusted user and contains only trusted changes, the pull request's code will
+- If a pull request is opened by a trusted user and contains only trusted changes, the pull request's code will
 automatically be copied to a pull-request/ prefixed branch in the source repository (e.g. pull-request/123)
-* If a pull request is opened by an untrusted user or contains untrusted changes, an NVIDIA org member must leave an
+- If a pull request is opened by an untrusted user or contains untrusted changes, an NVIDIA org member must leave an
 `/ok to test` comment on the pull request to trigger CI. This will need to be done for each new commit.

 ### Usage
+
 <!--- How does a user interact with the changed code -->
+
 ```python
-TODO: Add code snippet
+#TODO: Add code snippet
 ```

 ### Pre-submit Checklist
+
 <!--- Ensure all items are completed before submitting -->
 - **Official Documentation:** For user guides, API references, and troubleshooting, visit our [official documentation](https://docs.nvidia.com/bionemo-framework/latest/).
 Different branches of the repo can have different pinned versions of these third-party submodules. Ensure submodules are automatically updated after switching branches or pulling updates by configuring git with:

-
 ```bash
 git config submodule.recurse true
 ```
@@ -72,21 +70,19 @@ You will have to run the full `git submodule update --init --recursive` command

 #### Build the Docker Image Locally

-
 With a locally cloned repository and initialized submodules, build the BioNeMo container using:

 ```bash
 docker buildx build . -t my-container-tag
 ```

-
 #### VSCode Devcontainer for Interactive Debugging

 We distribute a [development container](https://devcontainers.github.io/) configuration for vscode
 (`.devcontainer/devcontainer.json`) that simplifies the process of local testing and development. Opening the
 bionemo-framework folder with VSCode should prompt you to re-open the folder inside the devcontainer environment.

-> [!NOTE]
+> \[!NOTE\]
 > The first time you launch the devcontainer, it may take a long time to build the image. Building the image locally
 > (using the command shown above) will ensure that most of the layers are present in the local docker cache.
-- We encourage you to use the following PGP key for secure email communication: [NVIDIA public PGP Key for communication](https://www.nvidia.com/en-us/security/pgp-key)
-- Please include the following information:
-- Product/Driver name and version/branch that contains the vulnerability
-- Type of vulnerability (code execution, denial of service, buffer overflow, etc.)
-- Instructions to reproduce the vulnerability
-- Proof-of-concept or exploit code
-- Potential impact of the vulnerability, including how an attacker could exploit the vulnerability
+- We encourage you to use the following PGP key for secure email communication: [NVIDIA public PGP Key for communication](https://www.nvidia.com/en-us/security/pgp-key)
+- Please include the following information:
+- Product/Driver name and version/branch that contains the vulnerability
+- Type of vulnerability (code execution, denial of service, buffer overflow, etc.)
+- Instructions to reproduce the vulnerability
+- Proof-of-concept or exploit code
+- Potential impact of the vulnerability, including how an attacker could exploit the vulnerability

 While NVIDIA currently does not have a bug bounty program, we do offer acknowledgement when an externally reported security issue is addressed under our coordinated vulnerability disclosure policy. Please visit our [Product Security Incident Response Team (PSIRT)](https://www.nvidia.com/en-us/security/psirt-policies/) policies page for more information.
 for sample in MultiEpochDatasetResampler(dataset, num_epochs=3, shuffle=True):
     ...
 ```

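The loop above walks a dataset for several epochs. As a rough illustration of that idea only (this is not BioNeMo's implementation), the sketch below yields one `(epoch, idx)` pair per sample per epoch, with a reproducible reshuffle each epoch. The `EpochIndex` tuple is a local stand-in, and `iter_epoch_indices` and its `seed` parameter are hypothetical names introduced for this example.

```python
import random
from typing import Iterator, NamedTuple


class EpochIndex(NamedTuple):
    """Local stand-in: the epoch a sample belongs to and its index in the dataset."""

    epoch: int
    idx: int


def iter_epoch_indices(
    num_samples: int, num_epochs: int, shuffle: bool = True, seed: int = 0
) -> Iterator[EpochIndex]:
    """Yield one EpochIndex per sample per epoch, reshuffling the order each epoch."""
    for epoch in range(num_epochs):
        order = list(range(num_samples))
        if shuffle:
            # Derive the permutation from (seed, epoch) so reruns give the same order.
            random.Random(seed * 1_000_003 + epoch).shuffle(order)
        for idx in order:
            yield EpochIndex(epoch=epoch, idx=idx)


indices = list(iter_epoch_indices(num_samples=4, num_epochs=3))
```

Because the order depends only on the seed and the epoch number, calling the generator again reproduces the same sequence, which is the property a resumable training job relies on.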
 ## Training Resumption
-To ensure identical behavior with and without job interruption, BioNeMo provides [MegatronDataModule][bionemo.llm.data.datamodule.MegatronDataModule] to save and load the state dict for training resumption, and provides [WrappedDataLoader][nemo.lightning.data.WrappedDataLoader] to add a `mode` attribute to [DataLoader][torch.utils.data.DataLoader].
+
+To ensure identical behavior with and without job interruption, BioNeMo provides \[MegatronDataModule\]\[bionemo.llm.data.datamodule.MegatronDataModule\] to save and load the state dict for training resumption, and provides \[WrappedDataLoader\]\[nemo.lightning.data.WrappedDataLoader\] to add a `mode` attribute to \[DataLoader\]\[torch.utils.data.DataLoader\].

 ```python
 class MyDataModule(MegatronDataModule):
@@ -83,23 +85,29 @@ class MyDataModule(MegatronDataModule):

 !!! note "MegatronDataModule"

-Users will see non-overlapping training curves if their datamodule does not inherit from `MegatronDataModule`, unless they handle similar logic themselves. In `MegatronDataModule`, `self.update_init_global_step()` must be called right before the dataloaders are returned to ensure that training resumes with the correct sample index instead of restarting from 0 every time. We recommend that users inherit from `MegatronDataModule`, following the pattern above.
+```
+Users will see non-overlapping training curves if their datamodule does not inherit from `MegatronDataModule`, unless they handle similar logic themselves. In `MegatronDataModule`, `self.update_init_global_step()` must be called right before the dataloaders are returned to ensure that training resumes with the correct sample index instead of restarting from 0 every time. We recommend that users inherit from `MegatronDataModule`, following the pattern above.
+```

 !!! note "WrappedDataLoader"

-The `WrappedDataLoader` class is a wrapper around the PyTorch DataLoader class that adds the `mode` attribute to the dataloader. The dataloader will resume from the last sample index only when mode is 'train'. `val_dataloader` and `test_dataloader` are unaffected.
+```
+The `WrappedDataLoader` class is a wrapper around the PyTorch DataLoader class that adds the `mode` attribute to the dataloader. The dataloader will resume from the last sample index only when mode is 'train'. `val_dataloader` and `test_dataloader` are unaffected.

-WARNING: 'train' is the default value of `mode` in `WrappedDataLoader`. If it is not set, users might find that their validation/test dataloaders change behavior by resuming from a non-zero sample index.
+WARNING: 'train' is the default value of `mode` in `WrappedDataLoader`. If it is not set, users might find that their validation/test dataloaders change behavior by resuming from a non-zero sample index.
+```

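The resumption behavior described in the two notes can be pictured with a toy loader. This is a minimal sketch under stated assumptions, not the BioNeMo or NeMo implementation: `ToyResumableLoader` and its `consumed_samples` argument are hypothetical names, and only the `mode` attribute and its `'train'` default come from the text above.

```python
from typing import Iterator, List


class ToyResumableLoader:
    """Hypothetical loader that skips already-consumed samples only in 'train' mode."""

    def __init__(self, samples: List[int], mode: str = "train", consumed_samples: int = 0) -> None:
        self.samples = samples
        self.mode = mode
        self.consumed_samples = consumed_samples

    def __iter__(self) -> Iterator[int]:
        # Validation and test loaders always restart from the first sample.
        start = self.consumed_samples if self.mode == "train" else 0
        return iter(self.samples[start:])


data = list(range(6))
uninterrupted = list(ToyResumableLoader(data))

# Simulate a job that stops after 4 samples and is then restarted.
first_run = uninterrupted[:4]
resumed = list(ToyResumableLoader(data, mode="train", consumed_samples=4))
validation = list(ToyResumableLoader(data, mode="validation", consumed_samples=4))
```

In this sketch the interrupted run followed by the resumed run sees the same samples as the uninterrupted run, while the validation loader ignores the resume offset. Leaving `mode` at its default on a validation loader would make it skip samples, which is the pitfall the warning above describes.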
 ## Testing Datasets for Megatron Compatibility

 BioNeMo also provides utility functions for test suites to validate that datasets conform to the megatron data model.
-The [assert_dataset_compatible_with_megatron][bionemo.testing.data_utils.assert_dataset_compatible_with_megatron]
+The \[assert_dataset_compatible_with_megatron\]\[bionemo.testing.data_utils.assert_dataset_compatible_with_megatron\]
 function calls the dataset with identical indices and ensures the outputs are identical, while also checking to see if
 `torch.manual_seed` was used.

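The first half of that check, fetching the same index twice and comparing the results, can be sketched with the standard library alone. This is an illustration under stated assumptions, not the BioNeMo utility: `check_index_determinism`, `seeded_item`, and `unseeded_item` are hypothetical names, and the `torch.manual_seed` check is not reproduced here.

```python
import random
from typing import Any, Callable, Sequence


def check_index_determinism(get_item: Callable[[int], Any], indices: Sequence[int]) -> bool:
    """Return True if fetching each index twice gives equal results."""
    return all(get_item(i) == get_item(i) for i in indices)


def seeded_item(idx: int) -> float:
    # Randomness derived only from the index: same index, same value.
    return random.Random(idx).random()


_shared_rng = random.Random(0)


def unseeded_item(idx: int) -> float:
    # Randomness drawn from shared, advancing state: repeated calls differ.
    return _shared_rng.random()


seeded_ok = check_index_determinism(seeded_item, range(5))
unseeded_ok = check_index_determinism(unseeded_item, range(5))
```

A dataset whose randomness depends only on the requested index passes the check, whereas one that draws from shared global state fails it.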
 !!! example "Example datasets in BioNeMo"

-The [ESMMaskedResidueDataset][bionemo.esm2.data.dataset.ESMMaskedResidueDataset] demonstrates one approach for
-leveraging [EpochIndex][bionemo.core.data.multi_epoch_dataset.EpochIndex] indices to perform epoch-level
-randomization within the confines of megatron's data model.
+```
+The [ESMMaskedResidueDataset][bionemo.esm2.data.dataset.ESMMaskedResidueDataset] demonstrates one approach for
+leveraging [EpochIndex][bionemo.core.data.multi_epoch_dataset.EpochIndex] indices to perform epoch-level
+randomization within the confines of megatron's data model.