48 changes: 31 additions & 17 deletions docs/design-docs/checkpointing.md
@@ -37,11 +37,20 @@ uv run --extra mcore examples/converters/convert_megatron_to_hf.py \
--hf-ckpt-path=<path_to_save_hf_ckpt>
```

## Merging Megatron LoRA Adapter Checkpoints to Hugging Face Format
## Converting Megatron LoRA Adapter Checkpoints to Hugging Face Format

When training with [LoRA (Low-Rank Adaptation)](../guides/sft.md#lora-configuration) on the Megatron backend, the resulting checkpoint contains only the adapter weights alongside the base model configuration. To produce a standalone Hugging Face checkpoint suitable for inference or evaluation, use the LoRA merger script. It loads the base model, applies the LoRA adapter weights on top, and saves the merged result in Hugging Face format.
When training with [LoRA (Low-Rank Adaptation)](../guides/sft.md#lora-configuration) on the Megatron backend, the resulting checkpoint contains only the adapter weights alongside the base model configuration. The `convert_lora_to_hf.py` script supports two export modes:

This script requires Megatron-Core, so make sure to launch with the `mcore` extra:
- **Merged**: fold the LoRA adapter into the base model and export a single standalone HuggingFace checkpoint.
- **Adapter-only**: export only the LoRA adapter weights in [HuggingFace PEFT](https://huggingface.co/docs/peft) format, keeping the base model separate.

This script requires Megatron-Core, so make sure to launch with the `mcore` extra.

### Option A — Merged checkpoint

Loads the base model, applies the LoRA adapter weights on top, and saves the merged result in HuggingFace format. The output can be used directly with `AutoModelForCausalLM.from_pretrained` or passed to the [evaluation pipeline](../guides/eval.md).

**Example:**

```sh
uv run --extra mcore python examples/converters/convert_lora_to_hf.py \
@@ -51,24 +60,29 @@ uv run --extra mcore python examples/converters/convert_lora_to_hf.py \
--hf-ckpt-path <output_path_for_merged_hf_model>
```
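
The merged output loads like any other HuggingFace checkpoint. A minimal sanity-check sketch, assuming the placeholder output path from the command above:

```python
# Minimal sketch: load the merged checkpoint with the standard HF API.
# The path is the --hf-ckpt-path placeholder from the command above.
from transformers import AutoModelForCausalLM, AutoTokenizer

merged_path = "<output_path_for_merged_hf_model>"
tokenizer = AutoTokenizer.from_pretrained(merged_path)
model = AutoModelForCausalLM.from_pretrained(merged_path)

inputs = tokenizer("Hello, world!", return_tensors="pt")
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=16)[0]))
```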

### Arguments
### Option B — Adapter-only (PEFT format)

| Argument | Description |
|---|---|
| `--base-ckpt` | Path to the base model's Megatron checkpoint directory (the `iter_XXXXXXX` folder). |
| `--adapter-ckpt` | Path to the LoRA adapter's Megatron checkpoint directory (must contain a `run_config.yaml` with a `peft` section). |
| `--hf-model-name` | HuggingFace model identifier used to resolve the model architecture and tokenizer (e.g. `Qwen/Qwen2.5-7B`). |
| `--hf-ckpt-path` | Output directory for the merged HuggingFace checkpoint. Must not already exist. |
Exports only the LoRA adapter weights in HuggingFace PEFT format without merging into the base model. This is useful when you want to serve the base model and adapter separately (e.g. with vLLM's LoRA support).

Although the output is adapter-only, the converter still needs `--base-ckpt` to reconstruct the Megatron model, apply the LoRA modules, and load the adapter weights before exporting them to PEFT format.

### Example
**Example:**

```sh
# Merge a LoRA adapter trained on Qwen2.5-7B back into a full HF checkpoint
uv run --extra mcore python examples/converters/convert_lora_to_hf.py \
--base-ckpt ~/.cache/huggingface/nemo_rl/Qwen/Qwen2.5-7B/iter_0000000 \
--adapter-ckpt results/sft_lora/step_100/policy/weights/iter_0000000 \
--hf-model-name Qwen/Qwen2.5-7B \
--hf-ckpt-path results/sft_lora/merged_hf
--base-ckpt <path_to_base_megatron_checkpoint>/iter_0000000 \
--adapter-only \
--adapter-ckpt <path_to_lora_adapter_checkpoint>/iter_0000000 \
--hf-model-name <huggingface_model_name> \
--hf-ckpt-path <output_path_for_hf_adapter>
```
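
The exported directory follows the PEFT layout, so it can be attached to the base model at load time. A minimal sketch using the `peft` library, with the placeholder names from the command above:

```python
# Minimal sketch: attach the exported adapter to the base model via PEFT.
# Model name and adapter path are the placeholders from the command above.
from peft import PeftModel
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("<huggingface_model_name>")
model = PeftModel.from_pretrained(base, "<output_path_for_hf_adapter>")

# If a standalone model is needed after all, fold the adapter in memory:
merged = model.merge_and_unload()
```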

The merged checkpoint can then be used directly with `AutoModelForCausalLM.from_pretrained` or passed to the [evaluation pipeline](../guides/eval.md).
### Arguments

| Argument | Description |
|---|---|
| `--base-ckpt` | Path to the base model's Megatron checkpoint directory (the `iter_XXXXXXX` folder). Required for both merged and adapter-only export. |
| `--adapter-ckpt` | Path to the LoRA adapter's Megatron checkpoint directory (must contain a `run_config.yaml` with a `peft` section). |
| `--hf-model-name` | HuggingFace model identifier used to resolve the model architecture and tokenizer (e.g. `Qwen/Qwen2.5-7B`). |
| `--hf-ckpt-path` | Output directory for the exported HuggingFace checkpoint or adapter. Must not already exist. |
| `--adapter-only` | Export only the LoRA adapter in HuggingFace PEFT format without merging into the base model. |
7 changes: 6 additions & 1 deletion docs/guides/sft.md
@@ -339,7 +339,12 @@ For more details on LoRA, see [LoRA: Low-Rank Adaptation of Large Language Model

### Exporting a LoRA Checkpoint to Hugging Face Format

After training with LoRA on the Megatron backend, use the LoRA merger script to fold the adapter weights into the base model and produce a standalone Hugging Face checkpoint for inference or evaluation. See the [Checkpointing documentation](../design-docs/checkpointing.md#merging-megatron-lora-adapter-checkpoints-to-hugging-face-format) for full usage details.
After training with LoRA on the Megatron backend, export the checkpoint with the `convert_lora_to_hf.py` script, which supports two export modes:

- **Merged**: fold the adapter into the base model and export a single standalone HuggingFace checkpoint for inference or evaluation.
- **Adapter-only**: export only the adapter weights in HuggingFace PEFT format, keeping the base model separate (e.g. for use with vLLM's LoRA support).

See the [Checkpointing documentation](../design-docs/checkpointing.md#converting-megatron-lora-adapter-checkpoints-to-hugging-face-format) for full usage details and examples.
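
As noted above, the adapter-only export keeps the base model separate, which fits vLLM's LoRA support. A minimal sketch, assuming vLLM's standard LoRA API; the model name and adapter path are placeholders:

```python
# Minimal sketch: run the base model with the exported adapter in vLLM.
# Model name and adapter path are placeholders.
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

llm = LLM(model="<huggingface_model_name>", enable_lora=True)
outputs = llm.generate(
    ["Hello, world!"],
    SamplingParams(max_tokens=16),
    lora_request=LoRARequest("sft_lora", 1, "<output_path_for_hf_adapter>"),
)
print(outputs[0].outputs[0].text)
```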

## Optimizations

2 changes: 1 addition & 1 deletion docs/nsys-profiling.md
@@ -6,7 +6,7 @@ NeMo RL supports Nsight profiling for Ray workers through environment variable p

## Prerequisites

* Install NVIDIA Nsight Systems (`nsys`) on the compute nodes where workers will run. For Ubuntu installation instructions, see the [NVIDIA Nsight Systems Installation Guide](https://docs.nvidia.com/nsight-systems/InstallationGuide/index.html#package-manager-installation)).
* Install NVIDIA Nsight Systems (`nsys`) on the compute nodes where workers will run. For Ubuntu installation instructions, see the [NVIDIA Nsight Systems Installation Guide](https://docs.nvidia.com/nsight-systems/InstallationGuide/index.html#package-manager-installation).

**Note: If you're using NeMo RL containers, `nsys` is already installed.**

209 changes: 165 additions & 44 deletions examples/converters/convert_lora_to_hf.py
@@ -1,10 +1,36 @@
"""Merge a Megatron LoRA adapter checkpoint with its base model and export to HuggingFace format.
# Copyright (c) 2026, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

This is helpful when one wants to train the model using Megatron with LoRA adapter and then convert it to HuggingFace format
for inference and evaluation.

"""Export a Megatron LoRA adapter checkpoint to HuggingFace format.

This script supports two workflows:

1. Merge the base model and LoRA adapter, then export a standard HuggingFace model.
2. Export only the LoRA adapter to a HuggingFace PEFT-compatible directory without merging.

Usage (requires mcore extra):

# Export adapter only (recommended when you want PEFT format)
uv run --extra mcore python examples/converters/convert_lora_to_hf.py \
--base-ckpt ~/.cache/huggingface/nemo_rl/zai-org/GLM-5/iter_0000000 \
--adapter-only \
--adapter-ckpt results/dpo_glm5/step_5/policy/weights/iter_0000000 \
--hf-model-name zai-org/GLM-5 \
--hf-ckpt-path ./hf_lora_adapter

# Merge base model + adapter and export a full HF checkpoint
uv run --extra mcore python examples/converters/convert_lora_to_hf.py \
--base-ckpt ~/.cache/huggingface/nemo_rl/zai-org/GLM-5/iter_0000000 \
--adapter-ckpt results/dpo_glm5/step_5/policy/weights/iter_0000000 \
@@ -18,6 +44,7 @@
import logging
import os
import sys
from contextlib import contextmanager

import yaml

@@ -29,13 +56,13 @@

def parse_args():
parser = argparse.ArgumentParser(
description="Merge Megatron LoRA adapter with base model and export to HF"
description="Export Megatron LoRA checkpoint to HuggingFace format"
)
parser.add_argument(
"--base-ckpt",
type=str,
required=True,
help="Path to base model Megatron checkpoint (iter_XXXXXXX directory)",
help="Path to base model Megatron checkpoint (iter_XXXXXXX directory). Required for both merged and adapter-only export.",
)
parser.add_argument(
"--adapter-ckpt",
@@ -53,46 +80,37 @@ def parse_args():
"--hf-ckpt-path",
type=str,
required=True,
help="Output path for merged HF checkpoint",
help="Output path for the exported HF checkpoint or adapter directory",
)
parser.add_argument(
"--adapter-only",
action="store_true",
help="Export only the LoRA adapter in HuggingFace PEFT format without merging into the base model.",
)
return parser.parse_args()
args = parser.parse_args()
return args


def merge_lora_to_hf(
@contextmanager
def _build_megatron_model_with_lora(
base_ckpt: str,
adapter_ckpt: str,
hf_model_name: str,
hf_ckpt_path: str,
) -> str:
"""Merge a Megatron LoRA adapter with its base model and export to HuggingFace format.

Args:
base_ckpt: Path to the base model Megatron checkpoint (iter_XXXXXXX directory).
adapter_ckpt: Path to the LoRA adapter Megatron checkpoint (iter_XXXXXXX directory).
Must contain a ``run_config.yaml`` with a ``peft`` section.
hf_model_name: HuggingFace model identifier (e.g. ``zai-org/GLM-5``).
hf_ckpt_path: Output directory for the merged HuggingFace checkpoint.

Returns:
The *hf_ckpt_path* that was written to.

Raises:
FileExistsError: If *hf_ckpt_path* already exists.
ValueError: If the adapter's ``run_config.yaml`` has no ``peft`` section.
"""
if os.path.exists(hf_ckpt_path):
raise FileExistsError(f"Output path already exists: {hf_ckpt_path}")

):
"""Build a single-rank Megatron model with LoRA weights loaded for export flows."""
from megatron.bridge import AutoBridge
from megatron.bridge.peft.lora import LoRA
from megatron.bridge.training.checkpointing import (
_generate_model_state_dict,
_load_model_weights_from_checkpoint,
apply_peft_adapter_filter_to_state_dict,
)
from megatron.bridge.training.model_load_save import (
load_model_config,
megatron_cpu_init_context,
temporary_distributed_context,
)
from megatron.core import dist_checkpointing

bridge = AutoBridge.from_hf_pretrained(hf_model_name, trust_remote_code=True)

@@ -128,6 +146,7 @@ def merge_lora_to_hf(
model_cfg.hierarchical_context_parallel_sizes = None
model_cfg.fp8 = None
model_cfg.fp8_param = False
model_cfg.gradient_accumulation_fusion = False

peft = LoRA(
target_modules=peft_section.get("target_modules", []),
@@ -140,9 +159,10 @@
lora_B_init_method=peft_section.get("lora_B_init_method", "zero"),
a2a_experimental=peft_section.get("a2a_experimental", False),
)
model_cfg.peft = peft

logger.info("Building model with LoRA wrappers on CPU...")
logger.info(
"Building base model on CPU (LoRA wrappers applied after base weights are loaded)..."
)
if hasattr(model_cfg, "finalize"):
model_cfg.finalize()
with megatron_cpu_init_context(model_cfg):
@@ -159,10 +179,67 @@
_load_model_weights_from_checkpoint(base_ckpt, megatron_model, strict=False)
gc.collect()

logger.info("Applying LoRA wrappers to model...")
megatron_model = peft(megatron_model, training=False)
gc.collect()

logger.info(f"Loading LoRA adapter from {adapter_ckpt}...")
_load_model_weights_from_checkpoint(adapter_ckpt, megatron_model, strict=False)
adapter_sharded_state_dict = _generate_model_state_dict(megatron_model, {})
adapter_sharded_state_dict = apply_peft_adapter_filter_to_state_dict(
adapter_sharded_state_dict, peft
)
loaded_adapter_state_dict = dist_checkpointing.load(
adapter_sharded_state_dict, adapter_ckpt
)
model_key = (
"model"
if "model" in loaded_adapter_state_dict
else next(k for k in loaded_adapter_state_dict if k.startswith("model"))
)
for m in megatron_model:
m.load_state_dict(loaded_adapter_state_dict[model_key], strict=False)
gc.collect()

try:
yield bridge, megatron_model, peft
finally:
del megatron_model
gc.collect()
logger.info("Freed model memory.")
sys.stderr.flush()
sys.stdout.flush()


def merge_lora_to_hf(
base_ckpt: str,
adapter_ckpt: str,
hf_model_name: str,
hf_ckpt_path: str,
) -> str:
"""Merge a Megatron LoRA adapter with its base model and export to HuggingFace format.

Args:
base_ckpt: Path to the base model Megatron checkpoint (iter_XXXXXXX directory).
adapter_ckpt: Path to the LoRA adapter Megatron checkpoint (iter_XXXXXXX directory).
Must contain a ``run_config.yaml`` with a ``peft`` section.
hf_model_name: HuggingFace model identifier (e.g. ``zai-org/GLM-5``).
hf_ckpt_path: Output directory for the merged HuggingFace checkpoint.

Returns:
The *hf_ckpt_path* that was written to.

Raises:
FileExistsError: If *hf_ckpt_path* already exists.
ValueError: If the adapter's ``run_config.yaml`` has no ``peft`` section.
"""
if os.path.exists(hf_ckpt_path):
raise FileExistsError(f"Output path already exists: {hf_ckpt_path}")

with _build_megatron_model_with_lora(
base_ckpt=base_ckpt,
adapter_ckpt=adapter_ckpt,
hf_model_name=hf_model_name,
) as (bridge, megatron_model, _):
logger.info("Saving merged model in HuggingFace format...")
bridge.save_hf_pretrained(
megatron_model,
@@ -171,24 +248,68 @@ def merge_lora_to_hf(
merge_adapter_weights=True,
)

del megatron_model
gc.collect()
logger.info("Freed model memory.")
sys.stderr.flush()
sys.stdout.flush()

logger.info(f"Done! Merged HF model saved to: {hf_ckpt_path}")
return hf_ckpt_path


def export_lora_adapter_to_hf(
base_ckpt: str,
adapter_ckpt: str,
hf_model_name: str,
hf_ckpt_path: str,
) -> str:
"""Export a Megatron LoRA adapter to HuggingFace PEFT format.

Args:
base_ckpt: Path to the base model Megatron checkpoint (iter_XXXXXXX directory).
adapter_ckpt: Path to the LoRA adapter Megatron checkpoint (iter_XXXXXXX directory).
Must contain a ``run_config.yaml`` with a ``peft`` section.
hf_model_name: HuggingFace model identifier (e.g. ``zai-org/GLM-5``).
hf_ckpt_path: Output directory for the HuggingFace PEFT adapter checkpoint.

Returns:
The *hf_ckpt_path* that was written to.

Raises:
FileExistsError: If *hf_ckpt_path* already exists.
ValueError: If the adapter's ``run_config.yaml`` has no ``peft`` section.
"""
if os.path.exists(hf_ckpt_path):
raise FileExistsError(f"Output path already exists: {hf_ckpt_path}")

with _build_megatron_model_with_lora(
base_ckpt=base_ckpt,
adapter_ckpt=adapter_ckpt,
hf_model_name=hf_model_name,
) as (bridge, megatron_model, peft):
logger.info("Saving LoRA adapter in HuggingFace PEFT format...")
bridge.save_hf_adapter(
megatron_model,
hf_ckpt_path,
peft_config=peft,
base_model_name_or_path=hf_model_name,
)

logger.info(f"Done! HF adapter saved to: {hf_ckpt_path}")
return hf_ckpt_path


def main():
args = parse_args()
merge_lora_to_hf(
base_ckpt=args.base_ckpt,
adapter_ckpt=args.adapter_ckpt,
hf_model_name=args.hf_model_name,
hf_ckpt_path=args.hf_ckpt_path,
)
if args.adapter_only:
export_lora_adapter_to_hf(
base_ckpt=args.base_ckpt,
adapter_ckpt=args.adapter_ckpt,
hf_model_name=args.hf_model_name,
hf_ckpt_path=args.hf_ckpt_path,
)
else:
merge_lora_to_hf(
base_ckpt=args.base_ckpt,
adapter_ckpt=args.adapter_ckpt,
hf_model_name=args.hf_model_name,
hf_ckpt_path=args.hf_ckpt_path,
)


if __name__ == "__main__":