
Commit f0d4bfd

jstjohn and moradza authored
Evo2 20b support and validation (#1536)
### Description

* Adds in and verifies the evo2 20b checkpoint.
* Pulls in bugfix for CP from @moradza's #1524

#### Usage

See example notebooks. Output from the BRCA notebook with the 20b checkpoint:

<img width="783" height="499" alt="image" src="https://github.com/user-attachments/assets/2c9f3bc0-8b15-48be-8a99-e9184c43df5a" />

### Type of changes

<!-- Mark the relevant option with an [x] -->

- [x] Bug fix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Refactor
- [x] Documentation update
- [ ] Other (please describe):

### CI Pipeline Configuration

Configure CI behavior by applying the relevant labels. By default, only basic unit tests are run.

- [ciflow:skip](https://github.com/NVIDIA/bionemo-framework/blob/main/docs/docs/main/contributing/contributing.md#ciflow:skip) - Skip all CI tests for this PR
- [ciflow:notebooks](https://github.com/NVIDIA/bionemo-framework/blob/main/docs/docs/main/contributing/contributing.md#ciflow:notebooks) - Run Jupyter notebooks execution tests
- [ciflow:slow](https://github.com/NVIDIA/bionemo-framework/blob/main/docs/docs/main/contributing/contributing.md#ciflow:slow) - Run slow single GPU integration tests marked as @pytest.mark.slow
- [ciflow:all](https://github.com/NVIDIA/bionemo-framework/blob/main/docs/docs/main/contributing/contributing.md#ciflow:all) - Run all tests (unit tests, slow tests, and notebooks). This label can be used to enforce running all framework tests.
- [ciflow:all-recipes](https://github.com/NVIDIA/bionemo-framework/blob/main/docs/docs/main/contributing/contributing.md#ciflow:all-recipes) - Run tests for all recipes (under bionemo-recipes). This label can be used to enforce running tests for all recipes.

Unit tests marked as `@pytest.mark.multi_gpu` or `@pytest.mark.distributed` are not run in the PR pipeline.

For more details, see [CONTRIBUTING](CONTRIBUTING.md)

> [!NOTE]
> By default, only basic unit tests are run. Add appropriate labels to enable additional test coverage.

#### Authorizing CI Runs

We use [copy-pr-bot](https://docs.gha-runners.nvidia.com/apps/copy-pr-bot/#automation) to manage authorization of CI runs on NVIDIA's compute resources.

- If a pull request is opened by a trusted user and contains only trusted changes, the pull request's code will automatically be copied to a pull-request/ prefixed branch in the source repository (e.g. pull-request/123)
- If a pull request is opened by an untrusted user or contains untrusted changes, an NVIDIA org member must leave an `/ok to test` comment on the pull request to trigger CI. This will need to be done for each new commit.

#### Triggering Code Rabbit AI Review

To trigger a code review from code rabbit, comment on a pull request with one of these commands:

- @coderabbitai review - Triggers a standard review
- @coderabbitai full review - Triggers a comprehensive review

See https://docs.coderabbit.ai/reference/review-commands for a full list of commands.

### Pre-submit Checklist

<!--- Ensure all items are completed before submitting -->

- [x] I have tested these changes locally
- [x] I have updated the documentation accordingly
- [ ] I have added/updated tests as needed
- [ ] All existing tests pass successfully

---------

Signed-off-by: amoradzadeh <amoradzadeh@nvidia.com>
Signed-off-by: John St. John <jstjohn@nvidia.com>
Signed-off-by: John St John <jstjohn@nvidia.com>
Co-authored-by: amoradzadeh <amoradzadeh@nvidia.com>
1 parent 1942554 commit f0d4bfd

11 files changed

Lines changed: 348 additions & 109 deletions


bionemo-recipes/recipes/evo2_megatron/README.md

Lines changed: 15 additions & 5 deletions
@@ -248,19 +248,22 @@ evo2_convert_savanna_to_mbridge \
248248
```
249249

250250
The `--savanna-ckpt-path` accepts either a local `.pt` file path or a HuggingFace
251-
repo ID (e.g., `arcinstitute/savanna_evo2_1b_base`). Available Savanna checkpoints:
251+
repo ID (e.g., `arcinstitute/savanna_evo2_1b_base`). Available Savanna checkpoints include:
252252

253253
| HuggingFace Repo | Model Size |
254254
| ------------------------------------ | --------------- |
255255
| `arcinstitute/savanna_evo2_1b_base` | `evo2_1b_base` |
256+
| `arcinstitute/savanna_evo2_7b_base` | `evo2_7b_base` |
256257
| `arcinstitute/savanna_evo2_7b` | `evo2_7b` |
258+
| `arcinstitute/savanna_evo2_20b` | `evo2_20b` |
257259
| `arcinstitute/savanna_evo2_40b_base` | `evo2_40b_base` |
260+
| `arcinstitute/savanna_evo2_40b` | `evo2_40b` |
258261

259262
Options:
260263

261264
- `--no-te` — disable Transformer Engine fused layernorm key mapping (use if the
262265
checkpoint was saved without TE).
263-
- `--mixed-precision-recipe` — precision recipe (default: `bf16_mixed`).
266+
- `--mixed-precision-recipe` — precision recipe (default: `bf16_mixed`). NOTE: for checkpoints that are sensitive to FP8 on Hopper, run prediction/inference with `--mixed-precision-recipe bf16-mixed` and also supply the `--vortex-style-fp8` option. Do not use the fp8 recipe for those models, because they are sensitive to the exact FP8 configuration they were trained with in Savanna. See the [table under the section on available NVIDIA checkpoints for download from NGC](#available-models-in-ngc-currently-nemo-format-so-first-convert-to-mbridge).
264267
- `--verbose` / `-v` — enable debug logging.
265268

266269
## Exporting to Vortex format
@@ -353,8 +356,8 @@ docker build -t evo2_megatron_recipe-$(git rev-parse --short HEAD) .
353356

354357
## Performance and accuracy comparisons
355358

356-
NOTE: this section is largely a work in progress. This reflects the most updated information, but may not reflect the
357-
current state of the code base at any given time.
359+
> **Note:** This section is largely a work in progress. This reflects the most updated information, but may not
360+
> reflect the current state of the code base at any given time.
358361
359362
### Training accuracy convergence
360363

@@ -397,14 +400,21 @@ have currently demonstrated small training runs at 2M context on only 512 H100 G
397400

398401
## Available models in NGC (Currently NeMo format so first convert to mbridge)
399402

403+
> **Note:** If you would like to use one of the checkpoints that requires FP8 and Hopper (e.g., that does not work
404+
> on Blackwell), you need to supply both `--mixed-precision-recipe bf16-mixed` to disable the default Megatron FP8
405+
> recipes, as well as `--vortex-style-fp8` which enables the custom FP8 recipe that supports these models. For the
406+
> robust NVIDIA fine-tuned variants of these models, you can run with FP8 using the available Megatron recipes. The
407+
> `evo2_7b` model size does not have these sensitivity issues so it can be executed with Megatron style FP8 or BF16.
408+
400409
| HF Model | BioNeMo Resource Name | Blackwell FP8 | Blackwell BF16 | Hopper FP8 | Hopper BF16 | Ampere | Notes |
401410
| ----------------------------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------- | ------------- | -------------- | ---------- | ----------- | ------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
402411
| [arcinstitute/savanna_evo2_1b_base](https://huggingface.co/arcinstitute/savanna_evo2_1b_base) | [evo2/1b-8k:1.0](https://registry.ngc.nvidia.com/orgs/nvidia/teams/clara/models/evo2-1b-8k-nemo2) |||||| Low accuracy on bf16 (eg ampere) GPUs |
403412
| | [evo2/1b-8k-bf16:1.0](https://registry.ngc.nvidia.com/orgs/nvidia/teams/clara/models/evo2-1b-8k-bf16-nemo2) |||||| Fine-tuned variant of the 1b-8k that supports bf16 as well as fp8, enabling ampere as well as hopper/blackwell. |
404413
| [arcinstitute/savanna_evo2_7b_base](https://huggingface.co/arcinstitute/savanna_evo2_7b_base) | [evo2/7b-8k:1.0](https://registry.ngc.nvidia.com/orgs/nvidia/teams/clara/models/evo2-7b-8k-nemo2) |||||| The original 7b models have good accuracy across the board at bf16 and fp8 across tested hardware. |
405414
| [arcinstitute/savanna_evo2_7b](https://huggingface.co/arcinstitute/savanna_evo2_7b) | [evo2/7b-1m:1.0](https://registry.ngc.nvidia.com/orgs/nvidia/teams/clara/models/evo2-7b-1m-nemo2) |||||| The original 7b models have good accuracy across the board at bf16 and fp8 across tested hardware. |
415+
| [arcinstitute/savanna_evo2_20b](https://huggingface.co/arcinstitute/savanna_evo2_20b) | | ? | ? |||| The 20b model appears to have the same FP8+Hopper support matrix as the 40b model, but we have not tested all configurations thoroughly yet. |
406416
| [arcinstitute/savanna_evo2_40b_base](https://huggingface.co/arcinstitute/savanna_evo2_40b_base) | | ? | ? | ? | ? | ? | Unknown, likely has the same support pattern as the 40b-1m row below since this is the same model at an earlier step of training. |
407-
| [arcinstitute/savanna_evo2_40b](https://huggingface.co/arcinstitute/savanna_evo2_40b) | |||||| The original 40b-1m context trained model only supports hpper fp8 |
417+
| [arcinstitute/savanna_evo2_40b](https://huggingface.co/arcinstitute/savanna_evo2_40b) | |||||| The original 40b-1m context trained model only supports Hopper FP8 |
408418
| | [evo2/40b-1m-fp8-bf16:1.0](https://registry.ngc.nvidia.com/orgs/nvidia/teams/clara/models/evo2-40b-1m-fp8-bf16-nemo2) |||||| A fine-tuned variant of [arcinstitute/savanna_evo2_40b](https://huggingface.co/arcinstitute/savanna_evo2_40b) with broad hardware support (fp8 or bf16 and ampere, hopper, and blackwell have all been tested). The original model only has good accuracy on hopper fp8. |
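
The FP8 guidance in the note above the table can be sketched as a small lookup. This is only an illustration: `precision_flags` and `FP8_SENSITIVE_CHECKPOINTS` are hypothetical names that do not exist in the recipe, and the set of sensitive checkpoints is an assumption drawn from the 1B/20B/40B description in this commit. The flag spellings are copied verbatim from the README text.

```python
# Hypothetical helper (not part of the recipe): choose extra CLI flags per checkpoint.
# Repo IDs and flag spellings are copied from the README text in this diff.

# Original Arc Institute checkpoints described as sensitive to the exact FP8
# configuration used in Savanna (1B, 20B and 40B). The 40b_base row is marked
# as untested in the table, so it is deliberately left out of this set.
FP8_SENSITIVE_CHECKPOINTS = frozenset(
    {
        "arcinstitute/savanna_evo2_1b_base",
        "arcinstitute/savanna_evo2_20b",
        "arcinstitute/savanna_evo2_40b",
    }
)


def precision_flags(checkpoint: str) -> list[str]:
    """Return the extra prediction/inference flags suggested by the README note."""
    if checkpoint in FP8_SENSITIVE_CHECKPOINTS:
        # Disable the default Megatron FP8 recipes and enable the custom one.
        return ["--mixed-precision-recipe", "bf16-mixed", "--vortex-style-fp8"]
    # Robust checkpoints (for example the 7b models) need no extra flags.
    return []
```

Under these assumptions, `precision_flags("arcinstitute/savanna_evo2_7b")` yields an empty list, which matches the statement that the `evo2_7b` model can run with Megatron style FP8 or BF16.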
409419

410420
On the CLI you can access the resources in this table (and others) with:

bionemo-recipes/recipes/evo2_megatron/examples/.gitignore

Lines changed: 1 addition & 0 deletions
@@ -7,6 +7,7 @@
77
*.yaml
88

99
# directories created during these notebook runs.
10+
*_mbridge
1011
evo2_20b_finetune/
1112
savanna_20b_download/
1213
nemo2_evo2_1b_8k/

bionemo-recipes/recipes/evo2_megatron/examples/fine-tuning-tutorial.ipynb

Lines changed: 97 additions & 0 deletions
@@ -217,6 +217,72 @@
217217
}
218218
]
219219
},
220+
{
221+
"cell_type": "markdown",
222+
"metadata": {},
223+
"source": [
224+
"### Obtaining Evo2 checkpoints\n",
225+
"\n",
226+
"There are two ways to obtain Evo2 checkpoints for fine-tuning with this recipe:\n",
227+
"\n",
228+
"#### Option 1: Download NVIDIA fine-tuned variants from NGC\n",
229+
"\n",
230+
"NVIDIA provides pre-converted NeMo2 checkpoints on NGC that can be downloaded with `download_bionemo_data` on the\n",
231+
"command line or `bionemo.core.data.load` in Python. These include variants that have been fine-tuned for broader\n",
232+
"hardware compatibility:\n",
233+
"\n",
234+
"| NGC Resource Name | `--model-size` | Notes |\n",
235+
"| --- | --- | --- |\n",
236+
"| `evo2/1b-8k-bf16:1.0` | `evo2_1b_base` | Fine-tuned for FP8 and BF16 on Ampere+ GPUs, accuracy parity with original |\n",
237+
"| `evo2/7b-1m:1.0` | `evo2_7b` | Original model, robust to FP8/BF16 across GPU architectures |\n",
238+
"| `evo2/40b-1m-fp8-bf16:1.0` | `evo2_40b` | Fine-tuned for FP8 and BF16 on Ampere+ GPUs (slight accuracy regression vs. original on Hopper FP8) |\n",
239+
"\n",
240+
"Run `download_bionemo_data --list-resources` for the full list of available checkpoints. After downloading, convert\n",
241+
"to MBridge format with `evo2_convert_nemo2_to_mbridge` as shown in the next section.\n",
242+
"\n",
243+
"#### Option 2: Convert Arc Institute's Savanna checkpoints from HuggingFace\n",
244+
"\n",
245+
"The original Evo2 models are published by the Arc Institute in Savanna format on HuggingFace. You can convert these\n",
246+
"directly to MBridge format using `evo2_convert_savanna_to_mbridge`, which accepts a HuggingFace repo ID as the\n",
247+
"`--savanna-ckpt-path` argument and will download and convert in one step:\n",
248+
"\n",
249+
"```bash\n",
250+
"evo2_convert_savanna_to_mbridge \\\n",
251+
" --savanna-ckpt-path arcinstitute/savanna_evo2_20b \\\n",
252+
" --mbridge-ckpt-dir evo2_20b_mbridge \\\n",
253+
" --model-size evo2_20b \\\n",
254+
" --tokenizer-path tokenizers/nucleotide_fast_tokenizer_512 \\\n",
255+
" --seq-length 8192\n",
256+
"```\n",
257+
"\n",
258+
"Available Savanna checkpoints on HuggingFace:\n",
259+
"\n",
260+
"| HuggingFace Repo | `--model-size` |\n",
261+
"| --- | --- |\n",
262+
"| `arcinstitute/savanna_evo2_1b_base` | `evo2_1b_base` |\n",
263+
"| `arcinstitute/savanna_evo2_7b` | `evo2_7b` |\n",
264+
"| `arcinstitute/savanna_evo2_20b` | `evo2_20b` |\n",
265+
"| `arcinstitute/savanna_evo2_40b` | `evo2_40b` |\n",
266+
"\n",
267+
"It is **strongly recommended** to pass a `--revision` (commit SHA) to ensure reproducibility and guard against\n",
268+
"potential checkpoint tampering on HuggingFace. If omitted, the latest commit is used.\n",
269+
"\n",
270+
"#### FP8 and hardware compatibility\n",
271+
"\n",
272+
"The original Arc Institute checkpoints for the **1B, 20B, and 40B** models are sensitive to FP8 precision on Hopper\n",
273+
"GPUs --- they produce degraded accuracy when run in BF16 or on non-Hopper hardware. The **7B model is the only one**\n",
274+
"that is robust across FP8/BF16 and across GPU architectures (Ampere, Hopper, Blackwell).\n",
275+
"\n",
276+
"NVIDIA has fine-tuned the 1B and 40B models to support both FP8 and BF16 on any Ampere or newer GPU (older\n",
277+
"architectures have not been tested). The fine-tuned **1B model achieves accuracy parity** with the original, while\n",
278+
"the fine-tuned **40B model has a slight accuracy regression** compared to the original on Hopper FP8. These\n",
279+
"fine-tuned variants are available as `evo2/1b-8k-bf16:1.0` and `evo2/40b-1m-fp8-bf16:1.0` on NGC.\n",
280+
"\n",
281+
"For detailed accuracy and performance comparisons across hardware and precision configurations, see the\n",
282+
"[Available models in NGC](../README.md#available-models-in-ngc-currently-nemo-format-so-first-convert-to-mbridge)\n",
283+
"table in the recipe README."
284+
]
285+
},
220286
{
221287
"cell_type": "markdown",
222288
"metadata": {},
@@ -1134,6 +1200,37 @@
11341200
"\n",
11351201
"```"
11361202
]
1203+
},
1204+
{
1205+
"cell_type": "markdown",
1206+
"metadata": {},
1207+
"source": [
1208+
"### Exporting to Vortex format\n",
1209+
"\n",
1210+
"After fine-tuning, you may want to export your MBridge checkpoint to Vortex format for use with the\n",
1211+
"Arc Institute's [evo2](https://github.com/ArcInstitute/evo2) inference package. The\n",
1212+
"`evo2_export_mbridge_to_vortex` command converts MBridge distributed-checkpoint weights into a single `.pt` file\n",
1213+
"in the Vortex format expected by ARC's inference code:\n",
1214+
"\n",
1215+
"```bash\n",
1216+
"evo2_export_mbridge_to_vortex \\\n",
1217+
" --mbridge-ckpt-dir pretraining_demo/evo2/checkpoints/iter_0000100 \\\n",
1218+
" --output-path evo2_1b_vortex.pt \\\n",
1219+
" --model-size evo2_1b_base\n",
1220+
"```\n",
1221+
"\n",
1222+
"The `--mbridge-ckpt-dir` should point to a specific iteration directory (e.g., `iter_0000100`) within your\n",
1223+
"checkpoint directory. The exporter handles MLP weight splitting, Hyena filter pole/residue computation, and\n",
1224+
"layer-norm key remapping.\n",
1225+
"\n",
1226+
"Options:\n",
1227+
"- `--model-size` — the model architecture key (e.g., `evo2_1b_base`, `evo2_7b`, `evo2_20b`, `evo2_40b`)\n",
1228+
"- `--no-te` — disable Transformer Engine fused layernorm key mapping (use if the checkpoint was saved without TE)\n",
1229+
"- `--verbose` / `-v` — enable debug logging\n",
1230+
"\n",
1231+
"Once exported, the resulting `.pt` file can be loaded with the\n",
1232+
"[arcinstitute/evo2](https://github.com/ArcInstitute/evo2) package for inference."
1233+
]
11371234
}
11381235
],
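
The Savanna conversion command added to the notebook in this commit can also be assembled programmatically, which may help when converting several checkpoints in a loop. The following is a minimal sketch: `build_convert_command` and `SAVANNA_MODEL_SIZES` are hypothetical names, the mapping is copied from the README table in this commit, and every flag used (`--savanna-ckpt-path`, `--mbridge-ckpt-dir`, `--model-size`, `--tokenizer-path`, `--seq-length`, `--revision`) is taken from the notebook text. The sketch only builds the argument list and does not run the converter.

```python
import shlex
from typing import Optional

# HuggingFace repo ID -> --model-size key, as listed in the README table.
SAVANNA_MODEL_SIZES = {
    "arcinstitute/savanna_evo2_1b_base": "evo2_1b_base",
    "arcinstitute/savanna_evo2_7b_base": "evo2_7b_base",
    "arcinstitute/savanna_evo2_7b": "evo2_7b",
    "arcinstitute/savanna_evo2_20b": "evo2_20b",
    "arcinstitute/savanna_evo2_40b_base": "evo2_40b_base",
    "arcinstitute/savanna_evo2_40b": "evo2_40b",
}


def build_convert_command(
    repo_id: str,
    mbridge_ckpt_dir: str,
    tokenizer_path: str = "tokenizers/nucleotide_fast_tokenizer_512",
    seq_length: int = 8192,
    revision: Optional[str] = None,
) -> list[str]:
    """Build (but do not execute) an evo2_convert_savanna_to_mbridge argv list."""
    # Raises KeyError for repo IDs that are not in the table above.
    model_size = SAVANNA_MODEL_SIZES[repo_id]
    argv = [
        "evo2_convert_savanna_to_mbridge",
        "--savanna-ckpt-path", repo_id,
        "--mbridge-ckpt-dir", mbridge_ckpt_dir,
        "--model-size", model_size,
        "--tokenizer-path", tokenizer_path,
        "--seq-length", str(seq_length),
    ]
    if revision is not None:
        # The notebook recommends pinning a commit SHA for reproducibility.
        argv += ["--revision", revision]
    return argv


# Render a shell-ready string for the 20b checkpoint used in the notebook example.
command_line = shlex.join(
    build_convert_command("arcinstitute/savanna_evo2_20b", "evo2_20b_mbridge")
)
```

The returned list can be passed to `subprocess.run(argv, check=True)` in an environment where the recipe's CLI is installed. Passing a list rather than a shell string avoids quoting problems in paths.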
11391236
"metadata": {
