bionemo-recipes/recipes/evo2_megatron/README.md
@@ -266,6 +266,135 @@ Options:
- `--mixed-precision-recipe`: precision recipe (default: `bf16_mixed`). NOTE: for checkpoints that are sensitive to FP8 and Hopper, run prediction/inference with `--mixed-precision-recipe bf16_mixed` and also supply the `--vortex-style-fp8` option. Do not use the FP8 recipe for those models, because they are sensitive to the exact FP8 configuration they were trained with in savanna. See the [table under the section on available NVIDIA checkpoints for download from NGC](#available-models-in-ngc-currently-nemo-format-so-first-convert-to-mbridge).
- `--verbose` / `-v`: enable debug logging.

## LoRA Fine-tuning

`Evo2LoRA` is a LoRA variant built on top of the Megatron Bridge PEFT stack. It
freezes the entire base model and attaches low-rank adapter matrices to the
modules you specify, with an optional escape hatch to keep selected modules
fully trainable.
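
For readers new to LoRA, the sketch below illustrates the general idea of a frozen weight plus a trainable low-rank update on a single linear layer. It follows the standard LoRA formulation and is not taken from `Evo2LoRA` or Megatron Bridge; the class name `LoRALinearSketch` and its parameters are hypothetical, chosen only for illustration.

```python
"""Generic LoRA forward pass on one linear layer (illustrative, not Evo2LoRA's code)."""
import random


def matvec(matrix, vector):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class LoRALinearSketch:
    """Frozen weight W (out x in) plus trainable factors B (out x r) and A (r x in)."""

    def __init__(self, weight, rank, alpha, seed=0):
        rng = random.Random(seed)
        out_features, in_features = len(weight), len(weight[0])
        self.weight = weight  # frozen: never updated during fine-tuning
        # A starts small and random, B starts at zero, so the adapter
        # contributes nothing before the first optimizer step.
        self.lora_a = [
            [rng.gauss(0.0, 0.01) for _ in range(in_features)] for _ in range(rank)
        ]
        self.lora_b = [[0.0] * rank for _ in range(out_features)]
        self.scale = alpha / rank

    def forward(self, x):
        """Compute W x + (alpha / r) * B (A x)."""
        base = matvec(self.weight, x)
        update = matvec(self.lora_b, matvec(self.lora_a, x))
        return [b + self.scale * u for b, u in zip(base, update)]

    def trainable_parameter_count(self):
        """Only A and B are trained; W is excluded."""
        rank = len(self.lora_a)
        return rank * len(self.lora_a[0]) + len(self.lora_b) * rank


layer = LoRALinearSketch(weight=[[1.0, 2.0, 3.0], [0.0, 1.0, 0.0]], rank=1, alpha=2.0)
initial_output = layer.forward([1.0, 1.0, 1.0])  # matches the frozen layer while B is zero
```

In a real model the weight matrices are far larger than in this toy layer, so the rank `r` adapter holds only a small fraction of the parameters of the frozen weight it modifies.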

### Basic usage

Add `--lora-finetune` to any `train_evo2` command alongside a checkpoint:
| `False` | `word_embeddings` only | Embedding weight is fully trainable. Output projection is frozen unless also listed. |
| `False` | `output_layer` only | Output projection weight is fully trainable. Embedding is frozen unless also listed. |
| `False` | both | Both weights are fully trainable. |
| `True` | `word_embeddings` only | **Error.** Listing only one side of a tied pair breaks the weight-tying invariant. Both must be listed together. |
| `True` | `output_layer` only | **Error.** Listing only one side of a tied pair breaks the weight-tying invariant. Both must be listed together. |
| `True` | both | Accepted. The shared weight (owned by `word_embeddings`) is unfrozen, so both the embedding lookup and the output projection train via the same tensor. **Note:** because `output_layer` allocates no weight of its own, gradient flow through the output projection path back to the shared tensor is a TODO item and may not be fully wired in all pipeline-parallel configurations. |
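
The rows above amount to a small validation rule, which might be sketched as follows. This is a minimal illustration of the table's logic only: the function name `resolve_trainable_vocab_modules` is hypothetical and is not the recipe's actual API, and the "neither module listed" case is assumed to mean both weights stay frozen.

```python
VOCAB_MODULES = {"word_embeddings", "output_layer"}


def resolve_trainable_vocab_modules(share_embeddings_and_output_weights, listed_modules):
    """Return the vocabulary modules to unfreeze, or raise for an invalid tied combination.

    Hypothetical helper mirroring the table; not part of the recipe.
    """
    listed = VOCAB_MODULES & set(listed_modules)
    if share_embeddings_and_output_weights and len(listed) == 1:
        (only,) = listed
        (missing,) = VOCAB_MODULES - listed
        raise ValueError(
            f"Weights are tied: listing only '{only}' breaks the weight-tying "
            f"invariant. List '{missing}' as well."
        )
    # Untied: each listed module is unfrozen independently.
    # Tied with both listed: the single shared tensor is unfrozen.
    # Nothing listed (assumed): both vocabulary weights stay frozen.
    return listed


untied = resolve_trainable_vocab_modules(False, ["output_layer"])
tied = resolve_trainable_vocab_modules(True, ["word_embeddings", "output_layer"])
```

Checking the tied case up front surfaces the mistake at configuration time rather than as a silently diverging pair of weights during training.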

#### Recommendations

- **Default (vocabulary weights frozen, LoRA on inner layers):** omit both
  embedding/output modules from both flags. The default `--lora-target-modules`
  does not touch either layer.
- **Apply LoRA to the output projection (untied models only):** list
  `output_layer` in `--lora-target-modules` and set
  `share_embeddings_and_output_weights=False` in the model config.
- **Fully fine-tune the vocabulary weight alongside LoRA on inner layers:**
  list **both** `word_embeddings` and `output_layer` in