
Commit 0642c7a

add functional test and fix doc
Signed-off-by: ruit <ruit@nvidia.com>
1 parent c6e5495 commit 0642c7a

3 files changed

Lines changed: 134 additions & 35 deletions


docs/design-docs/checkpointing.md

Lines changed: 28 additions & 17 deletions
@@ -37,11 +37,20 @@ uv run --extra mcore examples/converters/convert_megatron_to_hf.py \
     --hf-ckpt-path=<path_to_save_hf_ckpt>
 ```

-## Merging Megatron LoRA Adapter Checkpoints to Hugging Face Format
+## Converting Megatron LoRA Adapter Checkpoints to Hugging Face Format

-When training with [LoRA (Low-Rank Adaptation)](../guides/sft.md#lora-configuration) on the Megatron backend, the resulting checkpoint contains only the adapter weights alongside the base model configuration. To produce a standalone Hugging Face checkpoint suitable for inference or evaluation, use the LoRA merger script. It loads the base model, applies the LoRA adapter weights on top, and saves the merged result in Hugging Face format.
+When training with [LoRA (Low-Rank Adaptation)](../guides/sft.md#lora-configuration) on the Megatron backend, the resulting checkpoint contains only the adapter weights alongside the base model configuration. The `convert_lora_to_hf.py` script supports two export modes:

-This script requires Megatron-Core, so make sure to launch with the `mcore` extra:
+- **Merged**: fold the LoRA adapter into the base model and export a single standalone Hugging Face checkpoint.
+- **Adapter-only**: export only the LoRA adapter weights in [Hugging Face PEFT](https://huggingface.co/docs/peft) format, keeping the base model separate.
+
+This script requires Megatron-Core, so make sure to launch with the `mcore` extra.
+
+### Option A — Merged checkpoint
+
+Loads the base model, applies the LoRA adapter weights on top, and saves the merged result in Hugging Face format. The output can be used directly with `AutoModelForCausalLM.from_pretrained` or passed to the [evaluation pipeline](../guides/eval.md).
+
+**Example:**

 ```sh
 uv run --extra mcore python examples/converters/convert_lora_to_hf.py \
@@ -51,24 +60,26 @@ uv run --extra mcore python examples/converters/convert_lora_to_hf.py \
     --hf-ckpt-path <output_path_for_merged_hf_model>
 ```

-### Arguments
+### Option B — Adapter-only (PEFT format)

-| Argument | Description |
-|---|---|
-| `--base-ckpt` | Path to the base model's Megatron checkpoint directory (the `iter_XXXXXXX` folder). |
-| `--adapter-ckpt` | Path to the LoRA adapter's Megatron checkpoint directory (must contain a `run_config.yaml` with a `peft` section). |
-| `--hf-model-name` | HuggingFace model identifier used to resolve the model architecture and tokenizer (e.g. `Qwen/Qwen2.5-7B`). |
-| `--hf-ckpt-path` | Output directory for the merged HuggingFace checkpoint. Must not already exist. |
+Exports only the LoRA adapter weights in Hugging Face PEFT format without merging them into the base model. This is useful when you want to serve the base model and adapter separately (e.g. with vLLM's LoRA support).

-### Example
+**Example:**

 ```sh
-# Merge a LoRA adapter trained on Qwen2.5-7B back into a full HF checkpoint
 uv run --extra mcore python examples/converters/convert_lora_to_hf.py \
-    --base-ckpt ~/.cache/huggingface/nemo_rl/Qwen/Qwen2.5-7B/iter_0000000 \
-    --adapter-ckpt results/sft_lora/step_100/policy/weights/iter_0000000 \
-    --hf-model-name Qwen/Qwen2.5-7B \
-    --hf-ckpt-path results/sft_lora/merged_hf
+    --adapter-only \
+    --adapter-ckpt <path_to_lora_adapter_checkpoint>/iter_0000000 \
+    --hf-model-name <huggingface_model_name> \
+    --hf-ckpt-path <output_path_for_hf_adapter>
 ```

-The merged checkpoint can then be used directly with `AutoModelForCausalLM.from_pretrained` or passed to the [evaluation pipeline](../guides/eval.md).
+### Arguments
+
+| Argument | Description |
+|---|---|
+| `--base-ckpt` | Path to the base model's Megatron checkpoint directory (the `iter_XXXXXXX` folder). Required unless `--adapter-only` is set. |
+| `--adapter-ckpt` | Path to the LoRA adapter's Megatron checkpoint directory (must contain a `run_config.yaml` with a `peft` section). |
+| `--hf-model-name` | Hugging Face model identifier used to resolve the model architecture and tokenizer (e.g. `Qwen/Qwen2.5-7B`). |
+| `--hf-ckpt-path` | Output directory for the exported Hugging Face checkpoint or adapter. Must not already exist. |
+| `--adapter-only` | Export only the LoRA adapter in Hugging Face PEFT format without merging it into the base model. |
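
A minimal downstream-loading sketch for the two modes, assuming `transformers` and `peft` are installed and that the merged export includes tokenizer files; the bracketed paths are the placeholders from the examples above:

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Option A: the merged export is a self-contained Hugging Face checkpoint.
merged = AutoModelForCausalLM.from_pretrained("<output_path_for_merged_hf_model>")
tokenizer = AutoTokenizer.from_pretrained("<output_path_for_merged_hf_model>")

# Option B: the adapter-only export is loaded on top of the original base model.
base = AutoModelForCausalLM.from_pretrained("<huggingface_model_name>")
adapted = PeftModel.from_pretrained(base, "<output_path_for_hf_adapter>")
```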

docs/guides/sft.md

Lines changed: 6 additions & 1 deletion
@@ -339,7 +339,12 @@ For more details on LoRA, see [LoRA: Low-Rank Adaptation of Large Language Model

 ### Exporting a LoRA Checkpoint to Hugging Face Format

-After training with LoRA on the Megatron backend, use the LoRA merger script to fold the adapter weights into the base model and produce a standalone Hugging Face checkpoint for inference or evaluation. See the [Checkpointing documentation](../design-docs/checkpointing.md#merging-megatron-lora-adapter-checkpoints-to-hugging-face-format) for full usage details.
+After training with LoRA on the Megatron backend, the `convert_lora_to_hf.py` script supports two export modes:
+
+- **Merged**: fold the adapter into the base model and export a single standalone Hugging Face checkpoint for inference or evaluation.
+- **Adapter-only**: export only the adapter weights in Hugging Face PEFT format, keeping the base model separate (e.g. for use with vLLM's LoRA support, as sketched below).
+
+See the [Checkpointing documentation](../design-docs/checkpointing.md#converting-megatron-lora-adapter-checkpoints-to-hugging-face-format) for full usage details and examples.

 ## Optimizations

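
Since the adapter-only mode targets vLLM's LoRA support, serving might look roughly as follows; this is a sketch under assumed defaults, using `Qwen/Qwen2.5-7B` only as an illustrative base model and the adapter directory produced by `--adapter-only`:

```python
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

# Load the base model with LoRA adapter support enabled.
llm = LLM(model="Qwen/Qwen2.5-7B", enable_lora=True)

# Attach the exported PEFT adapter per request: (name, unique int id, path).
outputs = llm.generate(
    ["Summarize LoRA in one sentence."],
    SamplingParams(max_tokens=64),
    lora_request=LoRARequest("sft_lora", 1, "<output_path_for_hf_adapter>"),
)
print(outputs[0].outputs[0].text)
```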

tests/functional/test_converter_roundtrip.py

Lines changed: 100 additions & 17 deletions
@@ -54,6 +54,7 @@
 _convert_lora_mod = importlib.util.module_from_spec(_spec)
 _spec.loader.exec_module(_convert_lora_mod)
 merge_lora_to_hf = _convert_lora_mod.merge_lora_to_hf
+export_lora_adapter_to_hf = _convert_lora_mod.export_lora_adapter_to_hf


 def create_test_config() -> Dict[str, Any]:
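
For context, the `_spec`/`_convert_lora_mod` pattern above is the standard-library recipe for importing a standalone script as a module. The lines preceding this hunk are not shown, but presumably look like this sketch (the file path is the converter script the test exercises):

```python
import importlib.util

# Load examples/converters/convert_lora_to_hf.py as an importable module.
_spec = importlib.util.spec_from_file_location(
    "convert_lora_to_hf", "examples/converters/convert_lora_to_hf.py"
)
_convert_lora_mod = importlib.util.module_from_spec(_spec)
_spec.loader.exec_module(_convert_lora_mod)
```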
@@ -374,7 +375,6 @@ def create_megatron_lora_checkpoint(
     model_cfg.fp8_param = False

     peft = LoRA(**peft_cfg)
-    model_cfg.peft = peft
     if hasattr(model_cfg, "finalize"):
         model_cfg.finalize()
     with megatron_cpu_init_context(model_cfg):
@@ -387,22 +387,41 @@ def create_megatron_lora_checkpoint(
     for m in megatron_model:
         m.requires_grad_(False)

-    # Apply a small deterministic perturbation to LoRA weights so the
-    # merge produces something different from the base.
+    # Save the base model first to create the checkpoint directory structure
+    # and write run_config.yaml (which contains the "model" key needed by
+    # load_model_config). Adapter weights are saved separately below.
+    adapter_dir = os.path.join(temp_dir, "lora_adapter_checkpoint")
+    save_megatron_model(megatron_model, adapter_dir)
+    iter_dir = os.path.join(adapter_dir, "iter_0000000")
+
+    # Apply LoRA wrappers (same pattern as merge_lora_to_hf) and perturb
+    # adapter weights so that the merge produces something different from base.
+    megatron_model = peft(megatron_model, training=False)
+    gc.collect()
+
     torch.manual_seed(42)
     for m in megatron_model:
         for name, param in m.named_parameters():
             if "lora_" in name or "adapter" in name:
                 param.data.normal_(0, 0.01)

-    adapter_dir = os.path.join(temp_dir, "lora_adapter_checkpoint")
-    save_megatron_model(megatron_model, adapter_dir)
+    # Save only the adapter weights using dist_checkpointing, which is the
+    # format that merge_lora_to_hf expects to load from adapter_ckpt.
+    from megatron.bridge.training.checkpointing import (
+        _generate_model_state_dict,
+        apply_peft_adapter_filter_to_state_dict,
+    )
+    from megatron.core import dist_checkpointing

-    # save_megatron_model already writes a run_config.yaml with the
-    # "model" key. Merge the peft section into it so that both
+    adapter_sharded_sd = _generate_model_state_dict(megatron_model, {})
+    adapter_sharded_sd = apply_peft_adapter_filter_to_state_dict(
+        adapter_sharded_sd, peft
+    )
+    dist_checkpointing.save(adapter_sharded_sd, iter_dir)
+
+    # Merge the peft section into run_config.yaml so that both
     # load_model_config (needs "model") and the LoRA converter
     # (needs "peft") can find what they expect.
-    iter_dir = os.path.join(adapter_dir, "iter_0000000")
     run_config_path = os.path.join(iter_dir, "run_config.yaml")
     with open(run_config_path) as f:
         run_config = yaml.safe_load(f)
@@ -503,6 +522,17 @@ def main():
         hf_ckpt_path=lora_merged_hf_path,
     )

+    # Step 7d: Export LoRA adapter only in HuggingFace PEFT format
+    print("\n" + "=" * 60)
+    print("STEP 7d: Exporting LoRA adapter only (PEFT format)")
+    print("=" * 60)
+    lora_adapter_hf_path = os.path.join(temp_dir, "lora_adapter_hf")
+    export_lora_adapter_to_hf(
+        adapter_ckpt=lora_adapter_path,
+        hf_model_name=model_name,
+        hf_ckpt_path=lora_adapter_hf_path,
+    )
+
     # Step 8: Load converted models and compare
     print("\n" + "=" * 60)
     print("STEP 8: Loading converted models and comparing")
@@ -570,11 +600,11 @@ def main():
     )
     lora_merged_state_dict = get_model_state_dict(lora_merged_model)

-    lora_keys = set(lora_merged_state_dict.keys())
-    assert lora_keys == set(original_state_dict.keys()), (
+    lora_merged_keys = set(lora_merged_state_dict.keys())
+    assert lora_merged_keys == set(original_state_dict.keys()), (
         f"LoRA merged model key mismatch.\n"
-        f"  Extra: {lora_keys - set(original_state_dict.keys())}\n"
-        f"  Missing: {set(original_state_dict.keys()) - lora_keys}"
+        f"  Extra: {lora_merged_keys - set(original_state_dict.keys())}\n"
+        f"  Missing: {set(original_state_dict.keys()) - lora_merged_keys}"
     )
     print("✓ LoRA merged model has the expected key structure")

@@ -583,9 +613,9 @@ def main():
     any_different = False
     for key in original_state_dict:
         v_orig = original_state_dict[key]
-        v_lora = lora_merged_state_dict[key]
+        v_lora_merged = lora_merged_state_dict[key]
         if isinstance(v_orig, torch.Tensor) and not torch.allclose(
-            v_orig, v_lora, rtol=1e-5, atol=1e-5
+            v_orig, v_lora_merged, rtol=1e-5, atol=1e-5
         ):
             any_different = True
             break
@@ -600,8 +630,59 @@ def main():
     with torch.no_grad():
         lora_output = lora_merged_model(test_input_lora)
     print("✓ LoRA merged model can perform forward pass")
+    # del lora_merged_model
+    gc.collect()

-    del lora_merged_model
+    # Adapter-only (PEFT) export assertions
+    print("Verifying adapter-only PEFT export...")
+    adapter_config_path = os.path.join(lora_adapter_hf_path, "adapter_config.json")
+    assert os.path.exists(adapter_config_path), (
+        f"adapter_config.json not found in {lora_adapter_hf_path}"
+    )
+    weight_candidates = ["adapter_model.safetensors", "adapter_model.bin"]
+    weight_file_found = any(
+        os.path.exists(os.path.join(lora_adapter_hf_path, f))
+        for f in weight_candidates
+    )
+    assert weight_file_found, (
+        f"No adapter weight file found in {lora_adapter_hf_path}. "
+        f"Expected one of: {weight_candidates}"
+    )
+    print(
+        "✓ PEFT adapter directory has expected files (adapter_config.json + weights)"
+    )
+
+    # Forward pass using the already-merged model from Step 7c.
+    test_input_peft = torch.randint(0, 1000, (1, 10))
+    with torch.no_grad():
+        lora_merged_model(test_input_peft)
+    print("✓ LoRA merged model can perform a forward pass")
+
+    # Verify the adapter-only export produces the same merged weights as Step 7c
+    # by calling merge_lora_to_hf again with the same Megatron adapter. This
+    # avoids tied-weight complications from PeftModel.merge_and_unload().
+    adapter_only_merged_hf_path = os.path.join(temp_dir, "adapter_only_merged_hf")
+    merge_lora_to_hf(
+        base_ckpt=megatron_checkpoint_path,
+        adapter_ckpt=lora_adapter_path,
+        hf_model_name=model_name,
+        hf_ckpt_path=adapter_only_merged_hf_path,
+    )
+    adapter_only_merged_model = AutoModelForCausalLM.from_pretrained(
+        adapter_only_merged_hf_path,
+        torch_dtype=torch.bfloat16,
+        trust_remote_code=True,
+    )
+    adapter_only_merged_state_dict = get_model_state_dict(adapter_only_merged_model)
+    assert_state_dicts_equal(
+        adapter_only_merged_state_dict,
+        lora_merged_state_dict,
+        "adapter-only export + merge_lora_to_hf (Step 7d)",
+        "lora merged (Step 7c)",
+    )
+    print("✓ adapter-only merge via merge_lora_to_hf matches Step 7c")
+
+    del adapter_only_merged_model, lora_merged_model
     gc.collect()

     # Verify that both converted models have the expected structure
@@ -632,11 +713,13 @@ def main():
         megatron_output = megatron_converted_model(test_input)

     print(
-        "✓ Dtensor V1 and Dtensor V2 DCP, Megatron, and LoRA-merged models can perform forward passes"
+        "✓ Dtensor V1 and Dtensor V2 DCP, Megatron, and LoRA models can perform forward passes"
     )

     print("\n" + "=" * 80)
-    print("✓ ALL TESTS PASSED (DCP v1, DCP v2, Megatron, LoRA merge)!")
+    print(
+        "✓ ALL TESTS PASSED (DCP v1, DCP v2, Megatron, LoRA merge, LoRA adapter-only PEFT)!"
+    )
     print("=" * 80)

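
The Step 7d equivalence check above deliberately re-runs `merge_lora_to_hf` rather than merging the exported PEFT adapter in-process. For orientation, the avoided route would look roughly like this sketch, reusing the test's `model_name` and `lora_adapter_hf_path` variables; as the test comment notes, PEFT's `merge_and_unload()` can introduce tied-weight complications:

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM

# Alternative (avoided by the test): fold the exported adapter into the base
# model in-process instead of re-running merge_lora_to_hf.
base = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)
peft_model = PeftModel.from_pretrained(base, lora_adapter_hf_path)
merged = peft_model.merge_and_unload()  # folds LoRA deltas into the base weights
```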
