### Description
<!-- Provide a detailed description of the changes in this PR -->
#### Usage
<!--- How does a user interact with the changed code -->
```python
TODO: Add code snippet
```
### Type of changes
<!-- Mark the relevant option with an [x] -->
- [ ] Bug fix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Refactor
- [ ] Documentation update
- [ ] Other (please describe):
### CI Pipeline Configuration
Configure CI behavior by applying the relevant labels. By default, only
basic unit tests are run.
- [ciflow:skip](https://github.com/NVIDIA/bionemo-framework/blob/main/docs/docs/main/contributing/contributing.md#ciflow:skip) - Skip all CI tests for this PR
- [ciflow:notebooks](https://github.com/NVIDIA/bionemo-framework/blob/main/docs/docs/main/contributing/contributing.md#ciflow:notebooks) - Run Jupyter notebooks execution tests for bionemo2
- [ciflow:slow](https://github.com/NVIDIA/bionemo-framework/blob/main/docs/docs/main/contributing/contributing.md#ciflow:slow) - Run slow single GPU integration tests marked as `@pytest.mark.slow` for bionemo2
- [ciflow:all](https://github.com/NVIDIA/bionemo-framework/blob/main/docs/docs/main/contributing/contributing.md#ciflow:all) - Run all tests (unit tests, slow tests, and notebooks) for bionemo2. This label can be used to enforce running tests for all bionemo2.
- [ciflow:all-recipes](https://github.com/NVIDIA/bionemo-framework/blob/main/docs/docs/main/contributing/contributing.md#ciflow:all-recipes) - Run tests for all recipes (under bionemo-recipes). This label can be used to enforce running tests for all recipes.

Unit tests marked as `@pytest.mark.multi_gpu` or `@pytest.mark.distributed` are not run in the PR pipeline.

For more details, see [CONTRIBUTING](CONTRIBUTING.md).
> [!NOTE]
> By default, only basic unit tests are run. Add appropriate labels to enable additional test coverage.
#### Authorizing CI Runs
We use [copy-pr-bot](https://docs.gha-runners.nvidia.com/apps/copy-pr-bot/#automation) to manage authorization of CI runs on NVIDIA's compute resources.

- If a pull request is opened by a trusted user and contains only trusted changes, the pull request's code will automatically be copied to a `pull-request/`-prefixed branch in the source repository (e.g. `pull-request/123`).
- If a pull request is opened by an untrusted user or contains untrusted changes, an NVIDIA org member must leave an `/ok to test` comment on the pull request to trigger CI. This will need to be done for each new commit.
### Pre-submit Checklist
<!--- Ensure all items are completed before submitting -->
- [x] I have tested these changes locally
- [x] I have updated the documentation accordingly
- [ ] I have added/updated tests as needed
- [ ] All existing tests pass successfully
---------
Signed-off-by: Linette Tang <lvojktu@nvidia.com>
Co-authored-by: Linette Tang <lvojktu@nvidia.com>
Co-authored-by: Jared Wilber <jwilber@nvidia.com>
bionemo-recipes/README.md: 32 additions & 26 deletions
@@ -1,15 +1,15 @@
 # BioNeMo Recipes

-BioNeMo Recipes provides an easy path for the biological foundation model training community to scale up transformer-based models efficiently. Rather than offering a batteries-included training framework, we provide **model checkpoints** with TransformerEngine (TE) layers and **training recipes** that demonstrate how to achieve maximum throughput with popular open-source frameworks and fully sharded data parallel (FSDP) scale-out.
+BioNeMo Recipes provides an easy path for the biological foundation model training community to scale up transformer-based models efficiently. Rather than offering a batteries-included training framework, BioNeMo Recipes provides **model checkpoints** with TransformerEngine (TE) layers and **training recipes** that demonstrate how to achieve maximum throughput with popular open-source frameworks and fully sharded data parallel (FSDP) scale-out.

 ## Overview

-The biological AI community is actively prototyping model architectures and needs tooling that prioritizes extensibility, interoperability, and ease-of-use alongside performance. BioNeMo Recipes addresses this by offering:
+The biological AI community actively prototypes model architectures and needs tooling that prioritizes extensibility, interoperability, and ease-of-use, alongside performance. BioNeMo Recipes addresses this by offering:

-- **Flexible scaling**: Scale from single-GPU prototyping to multi-node training without complex parallelism configurations
+- **Flexible scaling**: Scales from single-GPU prototyping to multi-node training without complex parallelism configurations
 - **Framework compatibility**: Works with popular frameworks like HuggingFace Accelerate, PyTorch Lightning, and vanilla PyTorch
 - **Performance optimization**: Leverages TransformerEngine and megatron-FSDP for state-of-the-art training efficiency
-- **Research-friendly**: Hackable, readable code that researchers can easily adapt for their experiments
+- **Research-friendly**: Contains hackable and readable code that researchers can easily adapt for their experiments

 ### Performance Benchmarks

@@ -21,6 +21,8 @@ The biological AI community is actively prototyping model architectures and need

 ### Use Cases

+The use cases of BioNeMo Recipes include:
+
 - **Foundation Model Developers**: AI researchers and ML engineers developing novel biological foundation models who need to scale up prototypes efficiently
 - **Foundation Model Customizers**: Domain scientists looking to fine-tune existing models with proprietary data for drug discovery and biological research

@@ -48,9 +50,9 @@ Abbreviations:
 - BF16: [brain-float 16](https://en.wikipedia.org/wiki/Bfloat16_floating-point_format), a common 16 bit float format for deep learning.
 - FP8<sup>[1]</sup>: [8-bit floating point](https://docs.nvidia.com/deeplearning/transformer-engine/user-guide/examples/fp8_primer.html), a compact format for weights allowing for faster training and inference.
 - MXFP8<sup>[2]</sup>: [Multi Scale 8-bit floating point](https://docs.nvidia.com/deeplearning/transformer-engine/user-guide/examples/fp8_primer.html), as compact as FP8 but with better numerical precision.
-- NVFP4<sup>[2]</sup>: [NVIDIA 4-bit floating point](https://docs.nvidia.com/deeplearning/transformer-engine/user-guide/examples/fp8_primer.html#Beyond-FP8---training-with-NVFP4), faster than FP8, retaining accuracy via multi-scale.
-- THD: **T**otal **H**eads **D**imension, also known as ["sequence packing"](https://docs.nvidia.com/nemo-framework/user-guide/24.07/nemotoolkit/features/optimizations/sequence_packing.html#sequence-packing-for-sft-peft). A way to construct a batch with sequences of different length so there are no pads, therefore no compute is wasted on computing attention for padding tokens. This is in contrast to **B**atch **S**equence **H**ead **D**imension (BSHD) format, which uses pads to create a rectangular batch.
-- CP: Context parallel, also known as sequence parallel. A way to distribute the memory required to process long sequences across multiple GPUs. For more information please see[context parallel](./recipes/context_parallel.md)
+- NVFP4<sup>[2]</sup>: [NVIDIA 4-bit floating point](https://docs.nvidia.com/deeplearning/transformer-engine/user-guide/examples/fp8_primer.html#Beyond-FP8---training-with-NVFP4), faster than FP8, retaining accuracy using multi-scale.
+- THD: **T**otal **H**eads **D**imension, also known as ["sequence packing"](https://docs.nvidia.com/nemo-framework/user-guide/24.07/nemotoolkit/features/optimizations/sequence_packing.html#sequence-packing-for-sft-peft). A way to construct a batch with sequences of different lengths so there are no pads, which results in no compute wasted on computing attention for padding tokens. This is in contrast to **B**atch **S**equence **H**ead **D**imension (BSHD) format, which uses pads to create a rectangular batch.
+- CP: Context parallel, also known as sequence parallel. A way to distribute the memory required to process long sequences across multiple GPUs. For more information, refer to [context parallel](./recipes/context_parallel.md)

 \[1\]: Requires [compute capability](https://developer.nvidia.com/cuda-gpus) 9.0 and above (Hopper+) <br/>
 \[2\]: Requires [compute capability](https://developer.nvidia.com/cuda-gpus) 10.0 and 10.3 (Blackwell), 12.0 support pending <br/>
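
The THD entry in this hunk contrasts packed batches with padded BSHD batches. A minimal sketch of the difference at the token-ID level follows. It is illustration only, not code from the recipes: the helper names `pad_bshd` and `pack_thd` are hypothetical, and `cu_seqlens` (cumulative sequence boundaries) is an assumed name for the offsets that tell an attention kernel where each packed sequence starts and ends.

```python
from itertools import accumulate


def pad_bshd(sequences, pad_id=0):
    """Pad every sequence to the longest length, giving a rectangular batch."""
    max_len = max(len(seq) for seq in sequences)
    return [list(seq) + [pad_id] * (max_len - len(seq)) for seq in sequences]


def pack_thd(sequences):
    """Concatenate all sequences into one token stream with no pads.

    Returns the flat token list plus cumulative boundaries, so sequence i
    occupies tokens[cu_seqlens[i]:cu_seqlens[i + 1]].
    """
    tokens = [tok for seq in sequences for tok in seq]
    cu_seqlens = [0, *accumulate(len(seq) for seq in sequences)]
    return tokens, cu_seqlens


# Three toy sequences of different lengths (values are arbitrary token IDs).
sequences = [[11, 12, 13, 14, 15], [21, 22], [31, 32, 33]]

padded = pad_bshd(sequences)
tokens, cu_seqlens = pack_thd(sequences)

padded_slots = sum(len(row) for row in padded)  # 3 rows x 5 columns
packed_slots = len(tokens)                      # only real tokens
print(f"BSHD slots: {padded_slots}, THD slots: {packed_slots}")
print(f"cu_seqlens: {cu_seqlens}")
```

In this toy batch, padding spends 15 slots on 10 real tokens, while packing keeps exactly 10; the gap grows with the spread of sequence lengths.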
@@ -63,7 +65,7 @@ This repository contains two types of components:

 Huggingface-compatible `PreTrainedModel` classes that use TransformerEngine layers internally. These are designed to be:

-- **Distributed via Hugging Face Hub**: Pre-converted checkpoints available at [huggingface.co/nvidia](https://huggingface.co/nvidia)
+- **Distributed through Hugging Face Hub**: Pre-converted checkpoints available at [huggingface.co/nvidia](https://huggingface.co/nvidia)
 - **Drop-in replacements**: Compatible with `AutoModel.from_pretrained()` without additional dependencies
 - **Performance optimized**: Leverage TransformerEngine features like FP8 training and context parallelism

@@ -82,7 +84,11 @@ Recipes are **not pip-installable packages** but serve as reference implementati

 ## Quick Start

-### Using Models
+This section describes how you can get started with BioNeMo Recipes.
bionemo-recipes/models/README.md: 5 additions & 5 deletions
@@ -1,14 +1,14 @@
 # Models Directory

-This directory contains HuggingFace-compatible model implementations that use TransformerEngine layers internally. These models are designed to be distributed via the Hugging Face Hub and serve as drop-in replacements for standard transformer models with enhanced performance.
+This directory contains HuggingFace-compatible model implementations that use TransformerEngine layers internally. These models are designed to be distributed through the Hugging Face Hub and serve as drop-in replacements for standard transformer models with enhanced performance.

 ## Overview

 Models in this directory are **not intended to be pip-installed directly**. Instead, they serve as:

-1. **Reference implementations** of biological foundation models using TransformerEngine
-2. **Conversion utilities** for transforming existing model checkpoints to TE-compatible format
-3. **Export tools** for preparing model releases on the Hugging Face Hub
+- **Reference implementations** of biological foundation models using TransformerEngine
+- **Conversion utilities** for transforming existing model checkpoints to TE-compatible format
+- **Export tools** for preparing model releases on the Hugging Face Hub

 Users will typically interact with these models by loading pre-converted checkpoints directly from the Hugging Face Hub using standard transformers APIs.

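
The last context line of this hunk says users load pre-converted checkpoints through standard transformers APIs. A hedged sketch of what that might look like is shown here. The checkpoint ID is a placeholder rather than a real repository name, the helper `load_te_checkpoint` is hypothetical, and passing `trust_remote_code=True` is an assumption about how custom model classes published on the Hub are usually loaded, not something this diff states.

```python
# Placeholder only: substitute a real checkpoint name from huggingface.co/nvidia.
MODEL_ID = "nvidia/<checkpoint-name>"


def load_te_checkpoint(model_id: str, trust_remote_code: bool = True):
    """Load a tokenizer and model from the Hugging Face Hub.

    `transformers` is imported lazily so that this module can be imported
    on machines where the library is not installed.
    """
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    # Assumption: the checkpoint ships its own modeling code on the Hub,
    # which transformers only executes when trust_remote_code is enabled.
    model = AutoModel.from_pretrained(model_id, trust_remote_code=trust_remote_code)
    return tokenizer, model


# Usage (requires network access and a real checkpoint ID):
# tokenizer, model = load_te_checkpoint(MODEL_ID)
```

Keeping the checkpoint ID as an argument makes it easy to point the same helper at an original checkpoint and at its converted counterpart.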
@@ -33,7 +33,7 @@ To add a new model to this directory, you must provide:
 #### 3. Checkpoint Export Script

 - **`export.py`**: Script that packages all necessary files for Hugging Face Hub upload
-- **Complete asset bundling**: Must include all required files (see [Export Requirements](#export-requirements))
+- **Complete asset bundling**: Must include all required files; refer to [Export Requirements](#export-requirements)
 - **Automated process**: Should be runnable with minimal manual intervention
-See the commands in [Inference Examples](#inference-examples) above to load and test both the original and converted
-models to ensure loss and logit values are similar. See also the golden value tests in
+To validate the converted models, refer to the commands in [Inference Examples](#inference-examples) above to load and test both the original and converted
+models to ensure loss and logit values are similar. Additionally, refer to the golden value tests in
 [test_modeling_esm_te.py](tests/test_modeling_esm_te.py) and [test_convert.py](tests/test_convert.py).

 ## Developer Guide
@@ -153,7 +152,7 @@ Now deploy the converted checkpoints to the HuggingFace Hub by running the follo
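
The validation step in the last full hunk asks that loss and logit values from the original and converted models be similar. One way to make "similar" concrete is an element-wise tolerance check, sketched here on plain Python lists. The helper names, the tolerances, and the sample logits are all hypothetical; the golden value tests referenced in the diff may use different thresholds and tensor-based comparisons.

```python
import math


def max_abs_diff(reference, candidate):
    """Largest element-wise absolute difference between two logit vectors."""
    if len(reference) != len(candidate):
        raise ValueError("logit vectors must have the same length")
    return max(abs(r - c) for r, c in zip(reference, candidate))


def values_match(reference, candidate, rel_tol=1e-2, abs_tol=5e-3):
    """True when every pair of values agrees within the given tolerances."""
    return len(reference) == len(candidate) and all(
        math.isclose(r, c, rel_tol=rel_tol, abs_tol=abs_tol)
        for r, c in zip(reference, candidate)
    )


# Hypothetical logits from an original model and its converted counterpart.
original_logits = [2.310, -0.742, 0.055]
converted_logits = [2.312, -0.741, 0.056]

print("max abs diff:", max_abs_diff(original_logits, converted_logits))
print("match:", values_match(original_logits, converted_logits))
```

Reporting the maximum difference alongside the pass/fail result helps when choosing tolerances, since reduced-precision formats such as BF16 or FP8 typically need looser bounds than FP32.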