bionemo-recipes/recipes/evo2_megatron/README.md: 131 additions & 0 deletions
@@ -266,6 +266,137 @@ Options:
- `--mixed-precision-recipe`: precision recipe (default: `bf16_mixed`). NOTE: for checkpoints that are sensitive to FP8 and Hopper, run with `--mixed-precision-recipe bf16-mixed` and also supply the `--vortex-style-fp8` option for prediction/inference. Do not use the fp8 recipe for those models, because they are sensitive to the exact FP8 configuration they were trained with in savanna; see the [table under the section on available NVIDIA checkpoints for download from NGC](#available-models-in-ngc-currently-nemo-format-so-first-convert-to-mbridge).
- `--verbose` / `-v`: enable debug logging.

## LoRA Fine-tuning

`Evo2LoRA` is a LoRA variant built on top of the Megatron Bridge PEFT stack. It
freezes the entire base model and attaches low-rank adapter matrices to the
modules you specify, with an optional escape hatch to keep selected modules
fully trainable.

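
To make the idea of a frozen base weight plus a low-rank adapter concrete, here is a minimal pure-Python sketch. It is not the `Evo2LoRA` implementation: the class name `LoRALinear`, the `rank` and `alpha` parameters, and the initialization values are assumptions chosen for illustration only.

```python
# Minimal LoRA sketch in pure Python (illustrative; not the Evo2LoRA code).
# LoRALinear, rank, and alpha are hypothetical names used only in this example.


def matvec(matrix, vector):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class LoRALinear:
    """Frozen base weight W plus a trainable low-rank update B @ A."""

    def __init__(self, weight, rank, alpha=1.0):
        out_dim, in_dim = len(weight), len(weight[0])
        self.weight = weight  # frozen: never updated during fine-tuning
        # A starts small and B starts at zero, so the adapter is a no-op at init.
        self.lora_a = [[0.01] * in_dim for _ in range(rank)]  # rank x in_dim
        self.lora_b = [[0.0] * rank for _ in range(out_dim)]  # out_dim x rank
        self.scale = alpha / rank

    def trainable_parameters(self):
        # Only the adapter matrices would be handed to an optimizer.
        return {"lora_a": self.lora_a, "lora_b": self.lora_b}

    def forward(self, x):
        base = matvec(self.weight, x)
        update = matvec(self.lora_b, matvec(self.lora_a, x))
        return [b + self.scale * u for b, u in zip(base, update)]


layer = LoRALinear(weight=[[1.0, 2.0], [3.0, 4.0]], rank=1, alpha=2.0)
print(layer.forward([1.0, 1.0]))  # equals the base output while B is all zeros
```

Keeping a module "fully trainable" instead would correspond to updating `weight` directly rather than only the two adapter matrices.
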
### Basic usage
Add `--lora-finetune` to any `train_evo2` command alongside a checkpoint:

| `share_embeddings_and_output_weights` | `--lora-target-modules` includes | Behavior |
| --- | --- | --- |
| `False` | `word_embeddings` only | LoRA adapter on the embedding lookup. Output projection weight is independent and frozen by default. |
| `False` | `output_layer` only | LoRA adapter on the output projection. Embedding weight is independent and frozen by default. |
| `False` | both | Independent LoRA adapters on both layers. Both base weights are frozen. |
| `True` | `word_embeddings` only | **Error.** Applying LoRA to only one side of a tied pair breaks the weight-tying invariant. Both must be listed together. |
| `True` | `output_layer` only | **Error.** Applying LoRA to only one side of a tied pair breaks the weight-tying invariant. Both must be listed together. |
| `True` | both | **Not yet implemented.** Symmetric LoRA on a tied weight pair requires a transpose-view adapter mechanism (see note below). This combination is accepted as a design goal and raises a `NotImplementedError` until it is implemented. |

> **Symmetric LoRA on tied weights (future work).** When both `word_embeddings`
> and `output_layer` are targeted with weight tying enabled, the correct
> approach is to apply a single LoRA decomposition to the shared weight and
> expose it symmetrically to both the embedding lookup and the output
> projection, analogous to HuggingFace PEFT's `ensure_weight_tying` mechanism,
> which shares the adapter parameters via transposed views. This is not yet
> implemented.

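
The LoRA-target rules for the embedding/output pair can be summarized as a small validation function. This is a sketch of the documented behavior, not the recipe's actual code: the function name `check_lora_targets`, its signature, and the module name `some_inner_module` are hypothetical, and treating "neither module listed" as valid under weight tying is an assumption based on the default behavior described in this README.

```python
# Illustrative sketch of the documented rules; check_lora_targets is a
# hypothetical helper, not a function exported by the recipe.
TIED_PAIR = frozenset({"word_embeddings", "output_layer"})


def check_lora_targets(share_embeddings_and_output_weights, target_modules):
    """Return which of the embedding/output modules receive a LoRA adapter.

    Raises ValueError when only one side of a tied pair is targeted and
    NotImplementedError when both sides of a tied pair are targeted.
    """
    listed = TIED_PAIR & set(target_modules)
    if not share_embeddings_and_output_weights:
        # Untied weights: each layer gets an independent adapter.
        return listed
    if not listed:
        # Assumption: targeting neither tied module is always allowed.
        return listed
    if listed != TIED_PAIR:
        raise ValueError(
            "Applying LoRA to only one side of a tied pair breaks the "
            "weight-tying invariant; list both modules together."
        )
    raise NotImplementedError(
        "Symmetric LoRA on a tied weight pair is not yet implemented."
    )


# 'some_inner_module' is a placeholder for any non-vocabulary target.
print(check_lora_targets(False, ["word_embeddings", "some_inner_module"]))
```
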
#### `--lora-skip-freeze-modules` and weight tying

| `share_embeddings_and_output_weights` | `--lora-skip-freeze-modules` includes | Behavior |
| --- | --- | --- |
| `False` | `word_embeddings` only | Embedding weight is fully trainable. Output projection is frozen unless also listed. |
| `False` | `output_layer` only | Output projection weight is fully trainable. Embedding is frozen unless also listed. |
| `False` | both | Both weights are fully trainable. |
| `True` | `word_embeddings` only | **Error.** Listing only one side of a tied pair breaks the weight-tying invariant. Both must be listed together. |
| `True` | `output_layer` only | **Error.** Listing only one side of a tied pair breaks the weight-tying invariant. Both must be listed together. |
| `True` | both | Accepted. The shared weight (owned by `word_embeddings`) is unfrozen, so both the embedding lookup and the output projection train via the same tensor. **Note:** because `output_layer` allocates no weight of its own, gradient flow through the output projection path back to the shared tensor is a TODO item and may not be fully wired in all pipeline-parallel configurations. |

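
The skip-freeze rules differ from the LoRA-target rules in one respect: listing both modules under weight tying is accepted and unfreezes a single shared tensor. The sketch below encodes that behavior. The function name `trainable_vocab_weights` and the parameter labels such as `word_embeddings.weight` are hypothetical and chosen only for illustration; they are not taken from the recipe's code.

```python
# Illustrative sketch; trainable_vocab_weights and the returned labels are
# hypothetical names, not identifiers from the recipe.
VOCAB_MODULES = ("word_embeddings", "output_layer")


def trainable_vocab_weights(share_embeddings_and_output_weights, skip_freeze_modules):
    """Return labels of the vocabulary weights left fully trainable."""
    listed = [name for name in VOCAB_MODULES if name in skip_freeze_modules]
    if not share_embeddings_and_output_weights:
        # Untied: each listed module unfreezes its own weight.
        return {f"{name}.weight" for name in listed}
    if len(listed) == 1:
        raise ValueError(
            "Listing only one side of a tied pair breaks the weight-tying "
            "invariant; list both modules together."
        )
    if listed:
        # Tied and both listed: one shared tensor, owned by word_embeddings.
        return {"word_embeddings.weight"}
    return set()


print(trainable_vocab_weights(True, ["word_embeddings", "output_layer"]))
```

Returning a single label in the tied case mirrors the statement above that `output_layer` allocates no weight of its own.
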
#### Recommendations

- **Default (vocabulary weights frozen, LoRA on inner layers):** omit both
  embedding/output modules from both flags. The default `--lora-target-modules`
  does not touch either layer.
- **Fully fine-tune the shared vocabulary weight alongside LoRA on inner
  layers:** list **both** `word_embeddings` and `output_layer` in