
Commit e870d31

Add Evo2 LoRA fine-tuning notebook (NVIDIA-BioNeMo#1567)
### Description

Add a Jupyter notebook that demonstrates LoRA fine-tuning of Evo2.

### Type of changes

- [ ] Bug fix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Refactor
- [x] Documentation update
- [ ] Other (please describe):

### CI Pipeline Configuration

Configure CI behavior by applying the relevant labels. By default, only basic unit tests are run.

- [ciflow:skip](https://github.com/NVIDIA/bionemo-framework/blob/main/docs/docs/main/contributing/contributing.md#ciflow:skip) - Skip all CI tests for this PR
- [ciflow:notebooks](https://github.com/NVIDIA/bionemo-framework/blob/main/docs/docs/main/contributing/contributing.md#ciflow:notebooks) - Run Jupyter notebooks execution tests
- [ciflow:slow](https://github.com/NVIDIA/bionemo-framework/blob/main/docs/docs/main/contributing/contributing.md#ciflow:slow) - Run slow single GPU integration tests marked as @pytest.mark.slow
- [ciflow:all](https://github.com/NVIDIA/bionemo-framework/blob/main/docs/docs/main/contributing/contributing.md#ciflow:all) - Run all tests (unit tests, slow tests, and notebooks). This label can be used to enforce running all framework tests.
- [ciflow:all-recipes](https://github.com/NVIDIA/bionemo-framework/blob/main/docs/docs/main/contributing/contributing.md#ciflow:all-recipes) - Run tests for all recipes (under bionemo-recipes). This label can be used to enforce running tests for all recipes.

Unit tests marked as `@pytest.mark.multi_gpu` or `@pytest.mark.distributed` are not run in the PR pipeline.

For more details, see [CONTRIBUTING](CONTRIBUTING.md).

> [!NOTE]
> By default, only basic unit tests are run. Add appropriate labels to enable additional test coverage.

#### Authorizing CI Runs

We use [copy-pr-bot](https://docs.gha-runners.nvidia.com/apps/copy-pr-bot/#automation) to manage authorization of CI runs on NVIDIA's compute resources.

- If a pull request is opened by a trusted user and contains only trusted changes, the pull request's code will automatically be copied to a `pull-request/`-prefixed branch in the source repository (e.g. `pull-request/123`).
- If a pull request is opened by an untrusted user or contains untrusted changes, an NVIDIA org member must leave an `/ok to test` comment on the pull request to trigger CI. This must be repeated for each new commit.

#### Triggering CodeRabbit AI Review

To trigger a code review from CodeRabbit, comment on a pull request with one of these commands:

- @coderabbitai review - Triggers a standard review
- @coderabbitai full review - Triggers a comprehensive review

See https://docs.coderabbit.ai/reference/review-commands for a full list of commands.

### Pre-submit Checklist

- [ ] I have tested these changes locally
- [ ] I have updated the documentation accordingly
- [ ] I have added/updated tests as needed
- [ ] All existing tests pass successfully

Signed-off-by: Bruno Alvisio <balvisio@nvidia.com>
1 parent d7e9c69 commit e870d31

5 files changed

Lines changed: 1977 additions & 9 deletions


.github/workflows/unit-tests-mbridge-recipes.yaml

Lines changed: 80 additions & 5 deletions
@@ -25,6 +25,7 @@ jobs:
       any_changed: ${{ steps.changed-files.outputs.any_changed }}
       all_changed_files: ${{ steps.changed-files.outputs.all_changed_files }}
       dirs: ${{ steps.set-dirs.outputs.dirs }}
+      labels: ${{ steps.set-dirs.outputs.labels }}

     steps:
       - id: get-pr-info
- id: get-pr-info
@@ -118,6 +119,14 @@ jobs:
           ')
           echo "dirs=$DIRS_WITH_IMAGES" >> $GITHUB_OUTPUT

+          # Emit PR labels as a JSON array so downstream jobs can gate on ciflow:* labels.
+          if [[ "$PR_INFO" != "null" && "$PR_INFO" != "" ]]; then
+            LABELS=$(echo "$PR_INFO" | jq -c '[.labels[]?.name]' 2>/dev/null || echo "[]")
+          else
+            LABELS="[]"
+          fi
+          echo "labels=$LABELS" >> $GITHUB_OUTPUT
+
       - name: Show output
         run: |
           echo "=== Changed Files Analysis ==="
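The shell added in this hunk turns the PR metadata in `$PR_INFO` into a compact JSON array of label names, falling back to `[]` when `$PR_INFO` is empty or `null`, or when `jq` fails. A rough Python equivalent, useful for reasoning about that fallback, might look like the sketch below. The helper name `labels_json` is hypothetical, and some edge cases (for example, a label object without a `name` field, which `jq` would turn into `null`) are handled more conservatively here than by `jq`.

```python
import json
from typing import Optional


def labels_json(pr_info: Optional[str]) -> str:
    """Return PR label names as a compact JSON array string (hypothetical helper).

    Approximates the shell step:
        jq -c '[.labels[]?.name]' 2>/dev/null || echo "[]"
    guarded by the `"$PR_INFO" != "null" && "$PR_INFO" != ""` check.
    """
    if pr_info is None or pr_info in ("", "null"):
        return "[]"
    try:
        data = json.loads(pr_info)
        labels = data.get("labels") or []  # a missing or null "labels" yields []
        names = [label["name"] for label in labels]
    except (ValueError, AttributeError, TypeError, KeyError):
        # Malformed JSON or an unexpected shape falls back to "[]",
        # like the `|| echo "[]"` branch in the shell step.
        return "[]"
    # separators=(",", ":") mimics jq -c's compact output; ensure_ascii=False keeps UTF-8 as-is.
    return json.dumps(names, separators=(",", ":"), ensure_ascii=False)


print(labels_json('{"labels": [{"name": "ciflow:notebooks"}]}'))  # ["ciflow:notebooks"]
print(labels_json("null"))  # []
```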
@@ -195,17 +204,83 @@ jobs:
           fi
           pytest -v .

+  run-tests-notebooks:
+    needs: changed-dirs
+    runs-on: linux-amd64-gpu-l4-latest-1
+    # Mirrors the framework workflow's notebook-trigger pattern (label-only on PRs,
+    # auto on merge_group + nightly schedule). Currently scoped to evo2_megatron --
+    # the only megatron recipe with example notebooks.
+    if: |
+      contains(needs.changed-dirs.outputs.dirs, 'bionemo-recipes/recipes/evo2_megatron') &&
+      (
+        (github.event_name == 'schedule') ||
+        (github.event_name == 'merge_group') ||
+        contains(fromJSON(needs.changed-dirs.outputs.labels || '[]'), 'ciflow:all-recipes') ||
+        contains(fromJSON(needs.changed-dirs.outputs.labels || '[]'), 'ciflow:notebooks')
+      )
+    name: "mbridge-notebook-tests (evo2_megatron)"
+    container:
+      image: svcbionemo023/bionemo-framework:pytorch26.04-py3-squashed
+      options: --shm-size=16G
+    env:
+      CI: true
+      HF_TOKEN: ${{ secrets.HF_TOKEN }}
+      HF_HOME: /cache/huggingface
+      BIONEMO_DATA_SOURCE: ngc
+
+    steps:
+      - name: Show GPU info
+        run: nvidia-smi
+      - name: Setup proxy cache
+        uses: nv-gha-runners/setup-proxy-cache@main
+
+      - name: Checkout repository
+        uses: actions/checkout@v4
+        with:
+          sparse-checkout: |
+            bionemo-recipes/recipes/evo2_megatron
+            sub-packages/bionemo-recipeutils
+            sub-packages/bionemo-core
+          sparse-checkout-cone-mode: false
+
+      - name: Cache Hugging Face models
+        uses: actions/cache@v4
+        with:
+          path: /cache/huggingface
+          key: ${{ runner.os }}-huggingface-evo2_megatron-notebooks-${{ github.sha }}
+          restore-keys: |
+            ${{ runner.os }}-huggingface-evo2_megatron-notebooks-
+            ${{ runner.os }}-huggingface-evo2_megatron-
+            ${{ runner.os }}-huggingface-
+
+      - name: Install dependencies
+        working-directory: bionemo-recipes/recipes/evo2_megatron
+        run: |
+          bash .ci_build.sh
+          source .ci_test_env.sh
+          pip install nbval
+
+      - name: Run notebook tests
+        working-directory: bionemo-recipes/recipes/evo2_megatron
+        run: |
+          source .ci_test_env.sh
+          FAST_CI_MODE=1 pytest -v -s --nbval-lax -x -p no:python \
+            examples/lora-fine-tuning-tutorial.ipynb
+
   verify-mbridge-recipe-tests:
-    needs: unit-tests
+    needs:
+      - changed-dirs
+      - unit-tests
+      - run-tests-notebooks
     runs-on: ubuntu-latest
     if: always()
     steps:
-      - name: Check unit-tests matrix status
+      - name: Check test job statuses
         run: |
-          if [[ "${{ needs.unit-tests.result }}" == "failure" || "${{ needs.unit-tests.result }}" == "cancelled" ]]; then
-            echo "Some mbridge unit-tests matrix jobs have failed or been cancelled!"
+          if [[ "${{ contains(needs.*.result, 'failure') || contains(needs.*.result, 'cancelled') }}" == "true" ]]; then
+            echo "Some mbridge test jobs have failed or been cancelled!"
             exit 1
           else
-            echo "All mbridge unit-tests matrix jobs have completed successfully or were skipped!"
+            echo "All mbridge test jobs have completed successfully or were skipped!"
             exit 0
           fi
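The new `run-tests-notebooks` job runs only when the evo2_megatron recipe directory appears in the `changed-dirs` output and the run is either a `schedule` or `merge_group` event or carries the `ciflow:all-recipes` or `ciflow:notebooks` label. The updated `verify-mbridge-recipe-tests` job then fails if any job it depends on failed or was cancelled, while treating skipped jobs as passing. The Python sketch below restates both decisions so they can be checked against example inputs. It is an illustration only: the function names are hypothetical, the JSON-array form of the `dirs` output is an assumption, it does not model `if: always()`, and it relies on GitHub documenting `contains()` as case-insensitive.

```python
import json
from typing import Iterable, Optional

# Values copied from the run-tests-notebooks `if:` condition in the workflow diff.
RECIPE_DIR = "bionemo-recipes/recipes/evo2_megatron"
AUTO_EVENTS = {"schedule", "merge_group"}
TRIGGER_LABELS = {"ciflow:all-recipes", "ciflow:notebooks"}


def should_run_notebook_tests(event_name: str, dirs: str, labels_output: Optional[str]) -> bool:
    """Approximate the job-level `if:` of run-tests-notebooks (hypothetical helper)."""
    # contains(dirs, '...') on a string output is a case-insensitive substring test.
    if RECIPE_DIR.lower() not in dirs.lower():
        return False
    # schedule and merge_group runs do not need a label.
    if event_name in AUTO_EVENTS:
        return True
    # fromJSON(labels || '[]'): an empty output falls back to an empty array.
    labels = json.loads(labels_output or "[]")
    # contains(array, item) is element membership, also case-insensitive.
    return any(str(label).lower() in TRIGGER_LABELS for label in labels)


def verify_exit_code(needs_results: Iterable[str]) -> int:
    """Approximate the verify step: exit 1 if any needed job failed or was cancelled."""
    results = set(needs_results)
    # 'success' and 'skipped' both pass, matching the "or were skipped" message.
    return 1 if results & {"failure", "cancelled"} else 0


dirs = '["bionemo-recipes/recipes/evo2_megatron"]'  # assumed shape of the dirs output
print(should_run_notebook_tests("pull_request", dirs, "[]"))                    # False
print(should_run_notebook_tests("pull_request", dirs, '["ciflow:notebooks"]'))  # True
print(verify_exit_code(["success", "skipped", "success"]))                      # 0
```

Listing `changed-dirs` and `run-tests-notebooks` under `needs` matters for the second function: `needs.*.result` only covers jobs named there, so a failed notebook run would otherwise go unnoticed by the verify job.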

bionemo-recipes/recipes/evo2_megatron/README.md

Lines changed: 9 additions & 4 deletions
@@ -276,6 +276,10 @@ freezes the entire base model and attaches low-rank adapter matrices to the
 modules you specify, with an optional escape hatch to keep selected modules
 fully trainable.

+> **End-to-end example:** see [`examples/lora-fine-tuning-tutorial.ipynb`](examples/lora-fine-tuning-tutorial.ipynb)
+> for a runnable walkthrough that fine-tunes the 1B checkpoint for splice-site
+> classification, including a head-only baseline for comparison.
+
 ### Basic usage

 Add `--lora-finetune` to any `train_evo2` command alongside a checkpoint:
@@ -509,10 +513,11 @@ checkpoint.

 The `examples/` directory contains Jupyter notebooks demonstrating common workflows:

-| Notebook                     | Description                                            |
-| ---------------------------- | ------------------------------------------------------ |
-| `zeroshot_brca1.ipynb`       | Zero-shot BRCA1 variant effect prediction with Evo2 1B |
-| `fine-tuning-tutorial.ipynb` | Fine-tune the 1B checkpoint on human chromosomes       |
+| Notebook                          | Description                                                                                                            |
+| --------------------------------- | ---------------------------------------------------------------------------------------------------------------------- |
+| `zeroshot_brca1.ipynb`            | Zero-shot BRCA1 variant effect prediction with Evo2 1B                                                                 |
+| `fine-tuning-tutorial.ipynb`      | Fine-tune the 1B checkpoint on human chromosomes                                                                       |
+| `lora-fine-tuning-tutorial.ipynb` | LoRA fine-tune the 1B checkpoint for splice-site classification, with a head-only baseline for trainable-param savings |

 ## Docker build

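The README text in this hunk describes LoRA as freezing the base model and attaching low-rank adapter matrices to selected modules. As a generic illustration, not the recipe's implementation, the sketch below shows the standard LoRA update for one linear layer, `y = W x + (alpha / r) * B (A x)`, and why it trains far fewer parameters than updating `W` directly: a `d_out x d_in` weight has `d_out * d_in` entries, while the adapters `A` (`r x d_in`) and `B` (`d_out x r`) add only `r * (d_in + d_out)`. The layer sizes, rank, and `alpha` used below are hypothetical; the modules, rank, and scaling that `--lora-finetune` actually uses are configured in the recipe and are not shown in this diff.

```python
from dataclasses import dataclass
from typing import List, Sequence

Matrix = List[List[float]]


@dataclass(frozen=True)
class LinearShape:
    """Shape of a hypothetical linear layer whose weight W is d_out x d_in."""
    d_in: int
    d_out: int


def full_weight_params(layers: Sequence[LinearShape]) -> int:
    # Entries of W that full fine-tuning would update (biases ignored).
    return sum(layer.d_in * layer.d_out for layer in layers)


def lora_adapter_params(layers: Sequence[LinearShape], rank: int) -> int:
    # A is rank x d_in and B is d_out x rank, so each layer adds rank * (d_in + d_out).
    return sum(rank * (layer.d_in + layer.d_out) for layer in layers)


def matvec(m: Matrix, x: List[float]) -> List[float]:
    return [sum(w * v for w, v in zip(row, x)) for row in m]


def lora_forward(w: Matrix, a: Matrix, b: Matrix, x: List[float], alpha: float, rank: int) -> List[float]:
    """y = W x + (alpha / rank) * B (A x); W stays frozen, only A and B would train."""
    base = matvec(w, x)              # frozen base-model path
    delta = matvec(b, matvec(a, x))  # low-rank adapter path
    scale = alpha / rank
    return [y + scale * d for y, d in zip(base, delta)]


layers = [LinearShape(d_in=1024, d_out=1024)]  # hypothetical sizes
print(full_weight_params(layers))      # 1048576
print(lora_adapter_params(layers, 8))  # 16384
```

For this hypothetical 1024 x 1024 projection at rank 8, the adapters hold 16,384 trainable values versus 1,048,576 in the frozen weight, about 1.6%. In the usual LoRA initialization `B` starts at zero, so the adapted layer initially reproduces the frozen layer; whether the recipe follows that convention is not shown here.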
Lines changed: 29 additions & 0 deletions
@@ -0,0 +1,29 @@
+# SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: LicenseRef-Apache2
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+# nbval hardcodes a 5s "no iopub message" timeout that fires when a cell
+# subprocess (torchrun) is silent during CUDA setup. Bump it for these notebooks.
+try:
+    import nbval.plugin
+except ImportError:
+    pass
+else:
+    _original_init = nbval.plugin.IPyNbCell.__init__
+
+    def _patched_init(self, *args, **kwargs):
+        _original_init(self, *args, **kwargs)
+        self.output_timeout = 300
+
+    nbval.plugin.IPyNbCell.__init__ = _patched_init
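This file works by wrapping `IPyNbCell.__init__` so that every notebook cell item nbval creates gets `output_timeout = 300` after the original constructor has run; if nbval is not installed, the `ImportError` branch leaves pytest untouched. The sketch below demonstrates the same wrap-and-reassign pattern on a hypothetical stand-in class, `FakeCell`, so it runs without nbval. The 5-second default inside `FakeCell` is taken from the comment in this file, not read from nbval, and `patch_output_timeout` is a hypothetical helper.

```python
import functools


class FakeCell:
    """Hypothetical stand-in for nbval.plugin.IPyNbCell; nbval is not imported."""

    def __init__(self, name):
        self.name = name
        # Assumed default, mirroring the 5s timeout described in the conftest comment.
        self.output_timeout = 5


def patch_output_timeout(cls, seconds):
    """Wrap cls.__init__ so each new instance gets output_timeout = seconds.

    Returns the original __init__ so the patch can be undone if needed.
    """
    original_init = cls.__init__

    @functools.wraps(original_init)
    def patched_init(self, *args, **kwargs):
        original_init(self, *args, **kwargs)  # run the real constructor first
        self.output_timeout = seconds         # then override the value it set

    cls.__init__ = patched_init
    return original_init


original_init = patch_output_timeout(FakeCell, 300)
cell = FakeCell("cell-0")
print(cell.output_timeout)  # 300
```

Because the override runs after the original constructor, it takes effect whatever default the constructor assigns. `functools.wraps` is an addition in this sketch (the conftest does not use it) that preserves the wrapped method's name and docstring.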
