NVIDIA
diff --git a/bionemo-recipes/recipes/evo2_megatron/README.md b/bionemo-recipes/recipes/evo2_megatron/README.md
Lines changed: 15 additions & 5 deletions
diff --git a/bionemo-recipes/recipes/evo2_megatron/examples/.gitignore b/bionemo-recipes/recipes/evo2_megatron/examples/.gitignore
Lines changed: 1 addition & 0 deletions
diff --git a/bionemo-recipes/recipes/evo2_megatron/examples/fine-tuning-tutorial.ipynb b/bionemo-recipes/recipes/evo2_megatron/examples/fine-tuning-tutorial.ipynb
Lines changed: 97 additions & 0 deletions
@@ -248,19 +248,22 @@ evo2_convert_savanna_to_mbridge \
 ```
 
 The `--savanna-ckpt-path` accepts either a local `.pt` file path or a HuggingFace
-repo ID (e.g., `arcinstitute/savanna_evo2_1b_base`). Available Savanna checkpoints:
+repo ID (e.g., `arcinstitute/savanna_evo2_1b_base`). Available Savanna checkpoints include:
 
 | HuggingFace Repo                     | Model Size      |
 | ------------------------------------ | --------------- |
 | `arcinstitute/savanna_evo2_1b_base`  | `evo2_1b_base`  |
+| `arcinstitute/savanna_evo2_7b_base`  | `evo2_7b_base`  |
 | `arcinstitute/savanna_evo2_7b`       | `evo2_7b`       |
+| `arcinstitute/savanna_evo2_20b`      | `evo2_20b`      |
 | `arcinstitute/savanna_evo2_40b_base` | `evo2_40b_base` |
+| `arcinstitute/savanna_evo2_40b`      | `evo2_40b`      |
 
 Options:
 
 - `--no-te` — disable Transformer Engine fused layernorm key mapping (use if the
   checkpoint was saved without TE).
-- `--mixed-precision-recipe` — precision recipe (default: `bf16_mixed`).
+- `--mixed-precision-recipe` — precision recipe (default: `bf16_mixed`). Note: for checkpoints that require FP8 on Hopper, run with `--mixed-precision-recipe bf16-mixed` and also supply the `--vortex-style-fp8` option for prediction/inference. Do not use the FP8 recipe for those models, because they are sensitive to the exact FP8 configuration they were trained with in Savanna. See the [table of NVIDIA checkpoints available for download from NGC](#available-models-in-ngc-currently-nemo-format-so-first-convert-to-mbridge).
 - `--verbose` / `-v` — enable debug logging.
 
 ## Exporting to Vortex format
@@ -353,8 +356,8 @@ docker build -t evo2_megatron_recipe-$(git rev-parse --short HEAD) .
 
 ## Performance and accuracy comparisons
 
-NOTE: this section is largely a work in progress. This reflects the most updated information, but may not reflect the
-current state of the code base at any given time.
+> **Note:** This section is largely a work in progress. It reflects the most up-to-date information available, but may
+> not match the current state of the code base at any given time.
 
 ### Training accuracy convergence
 
@@ -397,14 +400,21 @@ have currently demonstrated small training runs at 2M context on only 512 H100 G
 
 ## Available models in NGC (Currently NeMo format so first convert to mbridge)
 
+> **Note:** To use one of the checkpoints that requires FP8 on Hopper (i.e., one that does not work on Blackwell),
+> supply both `--mixed-precision-recipe bf16-mixed`, which disables the default Megatron FP8 recipes, and
+> `--vortex-style-fp8`, which enables the custom FP8 recipe that supports these models. For the robust NVIDIA
+> fine-tuned variants of these models, you can run with FP8 using the available Megatron recipes. The `evo2_7b`
+> model size does not have these sensitivity issues, so it can be run with Megatron-style FP8 or BF16.
+
 | HF Model                                                                                        | BioNeMo Resource Name                                                                                                 | Blackwell FP8 | Blackwell BF16 | Hopper FP8 | Hopper BF16 | Ampere | Notes                                                                                                                                                                                                                                                                    |
 | ----------------------------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------- | ------------- | -------------- | ---------- | ----------- | ------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
 | [arcinstitute/savanna_evo2_1b_base](https://huggingface.co/arcinstitute/savanna_evo2_1b_base)   | [evo2/1b-8k:1.0](https://registry.ngc.nvidia.com/orgs/nvidia/teams/clara/models/evo2-1b-8k-nemo2)                     | ✅            | ❌             | ✅         | ❌          | ❌     | Low accuracy on bf16 (eg ampere) GPUs                                                                                                                                                                                                                                    |
 |                                                                                                 | [evo2/1b-8k-bf16:1.0](https://registry.ngc.nvidia.com/orgs/nvidia/teams/clara/models/evo2-1b-8k-bf16-nemo2)           | ✅            | ✅             | ✅         | ✅          | ✅     | Fine-tuned variant of the 1b-8k that supports bf16 as well as fp8, enabling ampere as well as hopper/blackwell.                                                                                                                                                          |
 | [arcinstitute/savanna_evo2_7b_base](https://huggingface.co/arcinstitute/savanna_evo2_7b_base)   | [evo2/7b-8k:1.0](https://registry.ngc.nvidia.com/orgs/nvidia/teams/clara/models/evo2-7b-8k-nemo2)                     | ✅            | ✅             | ✅         | ✅          | ✅     | The original 7b models have good accuracy across the board at bf16 and fp8 across tested hardware.                                                                                                                                                                       |
 | [arcinstitute/savanna_evo2_7b](https://huggingface.co/arcinstitute/savanna_evo2_7b)             | [evo2/7b-1m:1.0](https://registry.ngc.nvidia.com/orgs/nvidia/teams/clara/models/evo2-7b-1m-nemo2)                     | ✅            | ✅             | ✅         | ✅          | ✅     | The original 7b models have good accuracy across the board at bf16 and fp8 across tested hardware.                                                                                                                                                                       |
+| [arcinstitute/savanna_evo2_20b](https://huggingface.co/arcinstitute/savanna_evo2_20b)           |                                                                                                                       | ?             | ?              | ✅         | ❌          | ❌     | The 20b model appears to have the same FP8+Hopper support matrix as the 40b model, but we have not tested all configurations thoroughly yet.                                                                                                                             |
 | [arcinstitute/savanna_evo2_40b_base](https://huggingface.co/arcinstitute/savanna_evo2_40b_base) |                                                                                                                       | ?             | ?              | ?          | ?           | ?      | Unknown, likely has the same support pattern as the 40b-1m row below since this is the same model at an earlier step of training.                                                                                                                                        |
-| [arcinstitute/savanna_evo2_40b](https://huggingface.co/arcinstitute/savanna_evo2_40b)           |                                                                                                                       | ❌            | ❌             | ✅         | ❌          | ❌     | The original 40b-1m context trained model only supports hpper fp8                                                                                                                                                                                                        |
+| [arcinstitute/savanna_evo2_40b](https://huggingface.co/arcinstitute/savanna_evo2_40b)           |                                                                                                                       | ❌            | ❌             | ✅         | ❌          | ❌     | The original 40b-1m context trained model only supports Hopper FP8                                                                                                                                                                                                       |
 |                                                                                                 | [evo2/40b-1m-fp8-bf16:1.0](https://registry.ngc.nvidia.com/orgs/nvidia/teams/clara/models/evo2-40b-1m-fp8-bf16-nemo2) | ✅            | ✅             | ✅         | ✅          | ✅     | A fine-tuned variant of [arcinstitute/savanna_evo2_40b](https://huggingface.co/arcinstitute/savanna_evo2_40b) with broad hardware support (fp8 or bf16 and ampere, hopper, and blackwell have all been tested). The original model only has good accuracy on hopper fp8. |
 
 On the CLI you can access the resources in this table (and others) with:
 
@@ -7,6 +7,7 @@
 *.yaml
 
 # directories created during these notebook runs.
+*_mbridge
 evo2_20b_finetune/
 savanna_20b_download/
 nemo2_evo2_1b_8k/
 
@@ -217,6 +217,72 @@
         }
       ]
     },
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "### Obtaining Evo2 checkpoints\n",
+        "\n",
+        "There are two ways to obtain Evo2 checkpoints for fine-tuning with this recipe:\n",
+        "\n",
+        "#### Option 1: Download NVIDIA fine-tuned variants from NGC\n",
+        "\n",
+        "NVIDIA provides pre-converted NeMo2 checkpoints on NGC that can be downloaded with `download_bionemo_data` on the\n",
+        "command line or `bionemo.core.data.load` in Python. These include variants that have been fine-tuned for broader\n",
+        "hardware compatibility:\n",
+        "\n",
+        "| NGC Resource Name | `--model-size` | Notes |\n",
+        "| --- | --- | --- |\n",
+        "| `evo2/1b-8k-bf16:1.0` | `evo2_1b_base` | Fine-tuned for FP8 and BF16 on Ampere+ GPUs, accuracy parity with original |\n",
+        "| `evo2/7b-1m:1.0` | `evo2_7b` | Original model, robust to FP8/BF16 across GPU architectures |\n",
+        "| `evo2/40b-1m-fp8-bf16:1.0` | `evo2_40b` | Fine-tuned for FP8 and BF16 on Ampere+ GPUs (slight accuracy regression vs. original on Hopper FP8) |\n",
+        "\n",
+        "Run `download_bionemo_data --list-resources` for the full list of available checkpoints. After downloading, convert\n",
+        "to MBridge format with `evo2_convert_nemo2_to_mbridge` as shown in the next section.\n",
+        "\n",
+        "#### Option 2: Convert Arc Institute's Savanna checkpoints from HuggingFace\n",
+        "\n",
+        "The original Evo2 models are published by the Arc Institute in Savanna format on HuggingFace. You can convert these\n",
+        "directly to MBridge format using `evo2_convert_savanna_to_mbridge`, which accepts a HuggingFace repo ID as the\n",
+        "`--savanna-ckpt-path` argument and will download and convert in one step:\n",
+        "\n",
+        "```bash\n",
+        "evo2_convert_savanna_to_mbridge \\\n",
+        "  --savanna-ckpt-path arcinstitute/savanna_evo2_20b \\\n",
+        "  --mbridge-ckpt-dir evo2_20b_mbridge \\\n",
+        "  --model-size evo2_20b \\\n",
+        "  --tokenizer-path tokenizers/nucleotide_fast_tokenizer_512 \\\n",
+        "  --seq-length 8192\n",
+        "```\n",
+        "\n",
+        "Available Savanna checkpoints on HuggingFace:\n",
+        "\n",
+        "| HuggingFace Repo | `--model-size` |\n",
+        "| --- | --- |\n",
+        "| `arcinstitute/savanna_evo2_1b_base` | `evo2_1b_base` |\n",
+        "| `arcinstitute/savanna_evo2_7b` | `evo2_7b` |\n",
+        "| `arcinstitute/savanna_evo2_20b` | `evo2_20b` |\n",
+        "| `arcinstitute/savanna_evo2_40b` | `evo2_40b` |\n",
+        "\n",
+        "It is **strongly recommended** to pass a `--revision` (commit SHA) to ensure reproducibility and guard against\n",
+        "potential checkpoint tampering on HuggingFace. If omitted, the latest commit is used.\n",
+        "\n",
+        "#### FP8 and hardware compatibility\n",
+        "\n",
+        "The original Arc Institute checkpoints for the **1B, 20B, and 40B** models require FP8 precision on Hopper\n",
+        "GPUs: they produce degraded accuracy when run in BF16 or on non-Hopper hardware. The **7B model is the only one**\n",
+        "that is robust across FP8/BF16 and across GPU architectures (Ampere, Hopper, Blackwell).\n",
+        "\n",
+        "NVIDIA has fine-tuned the 1B and 40B models to support both FP8 and BF16 on any Ampere or newer GPU (older\n",
+        "architectures have not been tested). The fine-tuned **1B model achieves accuracy parity** with the original, while\n",
+        "the fine-tuned **40B model has a slight accuracy regression** compared to the original on Hopper FP8. These\n",
+        "fine-tuned variants are available as `evo2/1b-8k-bf16:1.0` and `evo2/40b-1m-fp8-bf16:1.0` on NGC.\n",
+        "\n",
+        "For detailed accuracy and performance comparisons across hardware and precision configurations, see the\n",
+        "[Available models in NGC](../README.md#available-models-in-ngc-currently-nemo-format-so-first-convert-to-mbridge)\n",
+        "table in the recipe README."
+      ]
+    },
     {
       "cell_type": "markdown",
       "metadata": {},
@@ -1134,6 +1200,37 @@
         "\n",
         "```"
       ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "### Exporting to Vortex format\n",
+        "\n",
+        "After fine-tuning, you may want to export your MBridge checkpoint to Vortex format for use with\n",
+        "Arc Institute's [evo2](https://github.com/ArcInstitute/evo2) inference package. The\n",
+        "`evo2_export_mbridge_to_vortex` command converts MBridge distributed-checkpoint weights into a single `.pt` file\n",
+        "in the Vortex format expected by Arc Institute's inference code:\n",
+        "\n",
+        "```bash\n",
+        "evo2_export_mbridge_to_vortex \\\n",
+        "  --mbridge-ckpt-dir pretraining_demo/evo2/checkpoints/iter_0000100 \\\n",
+        "  --output-path evo2_1b_vortex.pt \\\n",
+        "  --model-size evo2_1b_base\n",
+        "```\n",
+        "\n",
+        "The `--mbridge-ckpt-dir` should point to a specific iteration directory (e.g., `iter_0000100`) within your\n",
+        "checkpoint directory. The exporter handles MLP weight splitting, Hyena filter pole/residue computation, and\n",
+        "layer-norm key remapping.\n",
+        "\n",
+        "Options:\n",
+        "- `--model-size` — the model architecture key (e.g., `evo2_1b_base`, `evo2_7b`, `evo2_20b`, `evo2_40b`)\n",
+        "- `--no-te` — disable Transformer Engine fused layernorm key mapping (use if the checkpoint was saved without TE)\n",
+        "- `--verbose` / `-v` — enable debug logging\n",
+        "\n",
+        "Once exported, the resulting `.pt` file can be loaded with the\n",
+        "[arcinstitute/evo2](https://github.com/ArcInstitute/evo2) package for inference."
+      ]
     }
   ],
   "metadata": {