Commit f1835bb

feat: support convert_lora_to_hf without merge (#2234)
Signed-off-by: ruit <ruit@nvidia.com>
1 parent 92b1ff0 commit f1835bb

5 files changed

Lines changed: 298 additions & 80 deletions

docs/design-docs/checkpointing.md

Lines changed: 31 additions & 17 deletions
@@ -37,11 +37,20 @@ uv run --extra mcore examples/converters/convert_megatron_to_hf.py \
     --hf-ckpt-path=<path_to_save_hf_ckpt>
 ```
 
-## Merging Megatron LoRA Adapter Checkpoints to Hugging Face Format
+## Converting Megatron LoRA Adapter Checkpoints to Hugging Face Format
 
-When training with [LoRA (Low-Rank Adaptation)](../guides/sft.md#lora-configuration) on the Megatron backend, the resulting checkpoint contains only the adapter weights alongside the base model configuration. To produce a standalone Hugging Face checkpoint suitable for inference or evaluation, use the LoRA merger script. It loads the base model, applies the LoRA adapter weights on top, and saves the merged result in Hugging Face format.
+When training with [LoRA (Low-Rank Adaptation)](../guides/sft.md#lora-configuration) on the Megatron backend, the resulting checkpoint contains only the adapter weights alongside the base model configuration. The `convert_lora_to_hf.py` script supports two export modes:
 
-This script requires Megatron-Core, so make sure to launch with the `mcore` extra:
+- **Merged**: fold the LoRA adapter into the base model and export a single standalone HuggingFace checkpoint.
+- **Adapter-only**: export only the LoRA adapter weights in [HuggingFace PEFT](https://huggingface.co/docs/peft) format, keeping the base model separate.
+
+This script requires Megatron-Core, so make sure to launch with the `mcore` extra.
+
+### Option A — Merged checkpoint
+
+Loads the base model, applies the LoRA adapter weights on top, and saves the merged result in HuggingFace format. The output can be used directly with `AutoModelForCausalLM.from_pretrained` or passed to the [evaluation pipeline](../guides/eval.md).
+
+**Example:**
 
 ```sh
 uv run --extra mcore python examples/converters/convert_lora_to_hf.py \
@@ -51,24 +60,29 @@ uv run --extra mcore python examples/converters/convert_lora_to_hf.py \
     --hf-ckpt-path <output_path_for_merged_hf_model>
 ```
 
-### Arguments
+### Option B — Adapter-only (PEFT format)
 
-| Argument | Description |
-|---|---|
-| `--base-ckpt` | Path to the base model's Megatron checkpoint directory (the `iter_XXXXXXX` folder). |
-| `--adapter-ckpt` | Path to the LoRA adapter's Megatron checkpoint directory (must contain a `run_config.yaml` with a `peft` section). |
-| `--hf-model-name` | HuggingFace model identifier used to resolve the model architecture and tokenizer (e.g. `Qwen/Qwen2.5-7B`). |
-| `--hf-ckpt-path` | Output directory for the merged HuggingFace checkpoint. Must not already exist. |
+Exports only the LoRA adapter weights in HuggingFace PEFT format without merging into the base model. This is useful when you want to serve the base model and adapter separately (e.g. with vLLM's LoRA support).
+
+Although the output is adapter-only, the converter still needs `--base-ckpt` to reconstruct the Megatron model, apply the LoRA modules, and load the adapter weights before exporting them to PEFT format.
 
-### Example
+**Example:**
 
 ```sh
-# Merge a LoRA adapter trained on Qwen2.5-7B back into a full HF checkpoint
 uv run --extra mcore python examples/converters/convert_lora_to_hf.py \
-    --base-ckpt ~/.cache/huggingface/nemo_rl/Qwen/Qwen2.5-7B/iter_0000000 \
-    --adapter-ckpt results/sft_lora/step_100/policy/weights/iter_0000000 \
-    --hf-model-name Qwen/Qwen2.5-7B \
-    --hf-ckpt-path results/sft_lora/merged_hf
+    --base-ckpt <path_to_base_megatron_checkpoint>/iter_0000000 \
+    --adapter-only \
+    --adapter-ckpt <path_to_lora_adapter_checkpoint>/iter_0000000 \
+    --hf-model-name <huggingface_model_name> \
+    --hf-ckpt-path <output_path_for_hf_adapter>
 ```
 
-The merged checkpoint can then be used directly with `AutoModelForCausalLM.from_pretrained` or passed to the [evaluation pipeline](../guides/eval.md).
+### Arguments
+
+| Argument | Description |
+|---|---|
+| `--base-ckpt` | Path to the base model's Megatron checkpoint directory (the `iter_XXXXXXX` folder). Required for both merged and adapter-only export. |
+| `--adapter-ckpt` | Path to the LoRA adapter's Megatron checkpoint directory (must contain a `run_config.yaml` with a `peft` section). |
+| `--hf-model-name` | HuggingFace model identifier used to resolve the model architecture and tokenizer (e.g. `Qwen/Qwen2.5-7B`). |
+| `--hf-ckpt-path` | Output directory for the exported HuggingFace checkpoint or adapter. Must not already exist. |
+| `--adapter-only` | Export only the LoRA adapter in HuggingFace PEFT format without merging into the base model. |
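
Reviewer note on the two export modes documented in this file: the sketch below is not code from this commit. It is a minimal pure-Python illustration of why a merged checkpoint and a base-plus-adapter pair describe the same model. It assumes the standard LoRA formulation, where the merged weight is `W + (alpha / r) * B @ A`; the toy matrices and the helper names (`matmul`, `add`, `scale`) are made up for the example.

```python
# Toy illustration of "merged" vs "adapter-only" LoRA exports.
# Assumption: standard LoRA scaling, W_merged = W + (alpha / r) * B @ A.


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def add(a, b):
    """Element-wise sum of two equally shaped matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def scale(a, s):
    """Multiply every entry of a matrix by the scalar s."""
    return [[s * x for x in row] for row in a]


W = [[1.0, 2.0], [3.0, 4.0]]  # frozen base weight, shape (out, in)
A = [[0.5, -1.0]]             # LoRA A matrix, shape (r, in)
B = [[2.0], [0.0]]            # LoRA B matrix, shape (out, r)
alpha, r = 2.0, 1
x = [[1.0], [1.0]]            # input column vector

scaling = alpha / r

# Merged export: fold the adapter into the base weight once, up front.
W_merged = add(W, scale(matmul(B, A), scaling))
y_merged = matmul(W_merged, x)

# Adapter-only export: keep W untouched and add the low-rank path at run time.
y_adapter = add(matmul(W, x), scale(matmul(B, matmul(A, x)), scaling))

print(y_merged, y_adapter)
```

Both paths give the same output for this input. The practical difference is packaging: the merged export ships one full set of weights, while the adapter-only export ships just `A`, `B`, and their configuration, so one base model can be served with several adapters.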

docs/guides/sft.md

Lines changed: 6 additions & 1 deletion
@@ -339,7 +339,12 @@ For more details on LoRA, see [LoRA: Low-Rank Adaptation of Large Language Model
 
 ### Exporting a LoRA Checkpoint to Hugging Face Format
 
-After training with LoRA on the Megatron backend, use the LoRA merger script to fold the adapter weights into the base model and produce a standalone Hugging Face checkpoint for inference or evaluation. See the [Checkpointing documentation](../design-docs/checkpointing.md#merging-megatron-lora-adapter-checkpoints-to-hugging-face-format) for full usage details.
+After training with LoRA on the Megatron backend, the `convert_lora_to_hf.py` script supports two export modes:
+
+- **Merged**: fold the adapter into the base model and export a single standalone HuggingFace checkpoint for inference or evaluation.
+- **Adapter-only**: export only the adapter weights in HuggingFace PEFT format, keeping the base model separate (e.g. for use with vLLM's LoRA support).
+
+See the [Checkpointing documentation](../design-docs/checkpointing.md#converting-megatron-lora-adapter-checkpoints-to-hugging-face-format) for full usage details and examples.
 
 ## Optimizations
 
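
Reviewer note on the adapter-only mode: a quick sanity check on an exported adapter directory might look like the sketch below. It is not part of this commit. It assumes the export follows the usual PEFT layout, with an `adapter_config.json` carrying `base_model_name_or_path`, `r`, `lora_alpha`, and `target_modules`; whether the converter writes exactly these fields is not confirmed by the diff. The function name `summarize_peft_adapter` is hypothetical.

```python
import json
from pathlib import Path


def summarize_peft_adapter(adapter_dir):
    """Return the key LoRA settings stored in a PEFT-style adapter directory.

    Assumes the PEFT convention of an ``adapter_config.json`` file; fields
    that are absent come back as None instead of raising.
    """
    config_path = Path(adapter_dir) / "adapter_config.json"
    if not config_path.is_file():
        raise FileNotFoundError(f"No adapter_config.json found in {adapter_dir}")

    config = json.loads(config_path.read_text())
    return {
        "base_model": config.get("base_model_name_or_path"),
        "rank": config.get("r"),
        "alpha": config.get("lora_alpha"),
        "target_modules": config.get("target_modules"),
    }
```

Calling `summarize_peft_adapter("<output_path_for_hf_adapter>")` after an adapter-only export would show at a glance whether the recorded base model and rank match the training configuration.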

docs/nsys-profiling.md

Lines changed: 1 addition & 1 deletion
@@ -6,7 +6,7 @@ NeMo RL supports Nsight profiling for Ray workers through environment variable p
 
 ## Prerequisites
 
-* Install NVIDIA Nsight Systems (`nsys`) on the compute nodes where workers will run. For Ubuntu installation instructions, see the [NVIDIA Nsight Systems Installation Guide](https://docs.nvidia.com/nsight-systems/InstallationGuide/index.html#package-manager-installation)).
+* Install NVIDIA Nsight Systems (`nsys`) on the compute nodes where workers will run. For Ubuntu installation instructions, see the [NVIDIA Nsight Systems Installation Guide](https://docs.nvidia.com/nsight-systems/InstallationGuide/index.html#package-manager-installation).
 
 **Note: If you're using NeMo RL containers, `nsys` is already installed.**
 

examples/converters/convert_lora_to_hf.py

Lines changed: 165 additions & 44 deletions
@@ -1,10 +1,36 @@
-"""Merge a Megatron LoRA adapter checkpoint with its base model and export to HuggingFace format.
+# Copyright (c) 2026, NVIDIA CORPORATION. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
 
-This is helpful when one wants to train the model using Megatron with LoRA adapter and then convert it to HuggingFace format
-for inference and evaluation.
+
+"""Export a Megatron LoRA adapter checkpoint to HuggingFace format.
+
+This script supports two workflows:
+
+1. Merge the base model and LoRA adapter, then export a standard HuggingFace model.
+2. Export only the LoRA adapter to a HuggingFace PEFT-compatible directory without merging.
 
 Usage (requires mcore extra):
 
+    # Export adapter only (recommended when you want PEFT format)
+    uv run --extra mcore python examples/converters/convert_lora_to_hf.py \
+        --base-ckpt ~/.cache/huggingface/nemo_rl/zai-org/GLM-5/iter_0000000 \
+        --adapter-only \
+        --adapter-ckpt results/dpo_glm5/step_5/policy/weights/iter_0000000 \
+        --hf-model-name zai-org/GLM-5 \
+        --hf-ckpt-path ./hf_lora_adapter
+
+    # Merge base model + adapter and export a full HF checkpoint
     uv run --extra mcore python examples/converters/convert_lora_to_hf.py \
         --base-ckpt ~/.cache/huggingface/nemo_rl/zai-org/GLM-5/iter_0000000 \
         --adapter-ckpt results/dpo_glm5/step_5/policy/weights/iter_0000000 \
@@ -18,6 +44,7 @@
 import logging
 import os
 import sys
+from contextlib import contextmanager
 
 import yaml
 
@@ -29,13 +56,13 @@
 
 def parse_args():
     parser = argparse.ArgumentParser(
-        description="Merge Megatron LoRA adapter with base model and export to HF"
+        description="Export Megatron LoRA checkpoint to HuggingFace format"
     )
     parser.add_argument(
         "--base-ckpt",
         type=str,
         required=True,
-        help="Path to base model Megatron checkpoint (iter_XXXXXXX directory)",
+        help="Path to base model Megatron checkpoint (iter_XXXXXXX directory). Required for both merged and adapter-only export.",
     )
     parser.add_argument(
         "--adapter-ckpt",
@@ -53,46 +80,37 @@ def parse_args():
         "--hf-ckpt-path",
         type=str,
         required=True,
-        help="Output path for merged HF checkpoint",
+        help="Output path for the exported HF checkpoint or adapter directory",
+    )
+    parser.add_argument(
+        "--adapter-only",
+        action="store_true",
+        help="Export only the LoRA adapter in HuggingFace PEFT format without merging into the base model.",
     )
-    return parser.parse_args()
+    args = parser.parse_args()
+    return args
 
 
-def merge_lora_to_hf(
+@contextmanager
+def _build_megatron_model_with_lora(
     base_ckpt: str,
     adapter_ckpt: str,
     hf_model_name: str,
-    hf_ckpt_path: str,
-) -> str:
-    """Merge a Megatron LoRA adapter with its base model and export to HuggingFace format.
-
-    Args:
-        base_ckpt: Path to the base model Megatron checkpoint (iter_XXXXXXX directory).
-        adapter_ckpt: Path to the LoRA adapter Megatron checkpoint (iter_XXXXXXX directory).
-            Must contain a ``run_config.yaml`` with a ``peft`` section.
-        hf_model_name: HuggingFace model identifier (e.g. ``zai-org/GLM-5``).
-        hf_ckpt_path: Output directory for the merged HuggingFace checkpoint.
-
-    Returns:
-        The *hf_ckpt_path* that was written to.
-
-    Raises:
-        FileExistsError: If *hf_ckpt_path* already exists.
-        ValueError: If the adapter's ``run_config.yaml`` has no ``peft`` section.
-    """
-    if os.path.exists(hf_ckpt_path):
-        raise FileExistsError(f"Output path already exists: {hf_ckpt_path}")
-
+):
+    """Build a single-rank Megatron model with LoRA weights loaded for export flows."""
     from megatron.bridge import AutoBridge
     from megatron.bridge.peft.lora import LoRA
     from megatron.bridge.training.checkpointing import (
+        _generate_model_state_dict,
         _load_model_weights_from_checkpoint,
+        apply_peft_adapter_filter_to_state_dict,
     )
     from megatron.bridge.training.model_load_save import (
         load_model_config,
         megatron_cpu_init_context,
         temporary_distributed_context,
     )
+    from megatron.core import dist_checkpointing
 
     bridge = AutoBridge.from_hf_pretrained(hf_model_name, trust_remote_code=True)
 
@@ -128,6 +146,7 @@ def merge_lora_to_hf(
         model_cfg.hierarchical_context_parallel_sizes = None
         model_cfg.fp8 = None
         model_cfg.fp8_param = False
+        model_cfg.gradient_accumulation_fusion = False
 
         peft = LoRA(
             target_modules=peft_section.get("target_modules", []),
@@ -140,9 +159,10 @@ def merge_lora_to_hf(
             lora_B_init_method=peft_section.get("lora_B_init_method", "zero"),
             a2a_experimental=peft_section.get("a2a_experimental", False),
         )
-        model_cfg.peft = peft
 
-        logger.info("Building model with LoRA wrappers on CPU...")
+        logger.info(
+            "Building base model on CPU (LoRA wrappers applied after base weights are loaded)..."
+        )
         if hasattr(model_cfg, "finalize"):
             model_cfg.finalize()
         with megatron_cpu_init_context(model_cfg):
@@ -159,10 +179,67 @@ def merge_lora_to_hf(
         _load_model_weights_from_checkpoint(base_ckpt, megatron_model, strict=False)
         gc.collect()
 
+        logger.info("Applying LoRA wrappers to model...")
+        megatron_model = peft(megatron_model, training=False)
+        gc.collect()
+
         logger.info(f"Loading LoRA adapter from {adapter_ckpt}...")
-        _load_model_weights_from_checkpoint(adapter_ckpt, megatron_model, strict=False)
+        adapter_sharded_state_dict = _generate_model_state_dict(megatron_model, {})
+        adapter_sharded_state_dict = apply_peft_adapter_filter_to_state_dict(
+            adapter_sharded_state_dict, peft
+        )
+        loaded_adapter_state_dict = dist_checkpointing.load(
+            adapter_sharded_state_dict, adapter_ckpt
+        )
+        model_key = (
+            "model"
+            if "model" in loaded_adapter_state_dict
+            else next(k for k in loaded_adapter_state_dict if k.startswith("model"))
+        )
+        for m in megatron_model:
+            m.load_state_dict(loaded_adapter_state_dict[model_key], strict=False)
         gc.collect()
 
+        try:
+            yield bridge, megatron_model, peft
+        finally:
+            del megatron_model
+            gc.collect()
+            logger.info("Freed model memory.")
+            sys.stderr.flush()
+            sys.stdout.flush()
+
+
+def merge_lora_to_hf(
+    base_ckpt: str,
+    adapter_ckpt: str,
+    hf_model_name: str,
+    hf_ckpt_path: str,
+) -> str:
+    """Merge a Megatron LoRA adapter with its base model and export to HuggingFace format.
+
+    Args:
+        base_ckpt: Path to the base model Megatron checkpoint (iter_XXXXXXX directory).
+        adapter_ckpt: Path to the LoRA adapter Megatron checkpoint (iter_XXXXXXX directory).
+            Must contain a ``run_config.yaml`` with a ``peft`` section.
+        hf_model_name: HuggingFace model identifier (e.g. ``zai-org/GLM-5``).
+        hf_ckpt_path: Output directory for the merged HuggingFace checkpoint.
+
+    Returns:
+        The *hf_ckpt_path* that was written to.
+
+    Raises:
+        FileExistsError: If *hf_ckpt_path* already exists.
+        ValueError: If the adapter's ``run_config.yaml`` has no ``peft`` section.
+    """
+    if os.path.exists(hf_ckpt_path):
+        raise FileExistsError(f"Output path already exists: {hf_ckpt_path}")
+
+    with _build_megatron_model_with_lora(
+        base_ckpt=base_ckpt,
+        adapter_ckpt=adapter_ckpt,
+        hf_model_name=hf_model_name,
+    ) as (bridge, megatron_model, _):
         logger.info("Saving merged model in HuggingFace format...")
         bridge.save_hf_pretrained(
             megatron_model,
@@ -171,24 +248,68 @@ def merge_lora_to_hf(
             merge_adapter_weights=True,
         )
 
-        del megatron_model
-        gc.collect()
-        logger.info("Freed model memory.")
-        sys.stderr.flush()
-        sys.stdout.flush()
-
     logger.info(f"Done! Merged HF model saved to: {hf_ckpt_path}")
     return hf_ckpt_path
 
 
+def export_lora_adapter_to_hf(
+    base_ckpt: str,
+    adapter_ckpt: str,
+    hf_model_name: str,
+    hf_ckpt_path: str,
+) -> str:
+    """Export a Megatron LoRA adapter to HuggingFace PEFT format.
+
+    Args:
+        base_ckpt: Path to the base model Megatron checkpoint (iter_XXXXXXX directory).
+        adapter_ckpt: Path to the LoRA adapter Megatron checkpoint (iter_XXXXXXX directory).
+            Must contain a ``run_config.yaml`` with a ``peft`` section.
+        hf_model_name: HuggingFace model identifier (e.g. ``zai-org/GLM-5``).
+        hf_ckpt_path: Output directory for the HuggingFace PEFT adapter checkpoint.
+
+    Returns:
+        The *hf_ckpt_path* that was written to.
+
+    Raises:
+        FileExistsError: If *hf_ckpt_path* already exists.
+        ValueError: If the adapter's ``run_config.yaml`` has no ``peft`` section.
+    """
+    if os.path.exists(hf_ckpt_path):
+        raise FileExistsError(f"Output path already exists: {hf_ckpt_path}")
+
+    with _build_megatron_model_with_lora(
+        base_ckpt=base_ckpt,
+        adapter_ckpt=adapter_ckpt,
+        hf_model_name=hf_model_name,
+    ) as (bridge, megatron_model, peft):
+        logger.info("Saving LoRA adapter in HuggingFace PEFT format...")
+        bridge.save_hf_adapter(
+            megatron_model,
+            hf_ckpt_path,
+            peft_config=peft,
+            base_model_name_or_path=hf_model_name,
+        )
+
+    logger.info(f"Done! HF adapter saved to: {hf_ckpt_path}")
+    return hf_ckpt_path
+
+
 def main():
     args = parse_args()
-    merge_lora_to_hf(
-        base_ckpt=args.base_ckpt,
-        adapter_ckpt=args.adapter_ckpt,
-        hf_model_name=args.hf_model_name,
-        hf_ckpt_path=args.hf_ckpt_path,
-    )
+    if args.adapter_only:
+        export_lora_adapter_to_hf(
+            base_ckpt=args.base_ckpt,
+            adapter_ckpt=args.adapter_ckpt,
+            hf_model_name=args.hf_model_name,
+            hf_ckpt_path=args.hf_ckpt_path,
+        )
+    else:
+        merge_lora_to_hf(
+            base_ckpt=args.base_ckpt,
+            adapter_ckpt=args.adapter_ckpt,
+            hf_model_name=args.hf_model_name,
+            hf_ckpt_path=args.hf_ckpt_path,
+        )
 
 
 if __name__ == "__main__":
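
Reviewer note on the refactor in this file: the model-building code moves into a shared context manager, both export functions become thin wrappers around it, and a `store_true` flag picks between them. The stripped-down sketch below shows that shape using only the standard library. It is not the commit's code: `build_model`, `export_merged`, `export_adapter`, and the `events` list are hypothetical stand-ins, and a plain dict takes the place of the Megatron model.

```python
import argparse
from contextlib import contextmanager

events = []  # stands in for log output so the order of steps is visible


@contextmanager
def build_model(base_ckpt, adapter_ckpt):
    """Hypothetical stand-in for the shared model builder."""
    model = {"base": base_ckpt, "adapter": adapter_ckpt}
    events.append("built")
    try:
        yield model
    finally:
        # Cleanup runs whether the export body succeeded or raised.
        model.clear()
        events.append("freed")


def export_merged(args):
    with build_model(args.base_ckpt, args.adapter_ckpt) as model:
        events.append(f"merged {model['base']}+{model['adapter']} -> {args.hf_ckpt_path}")


def export_adapter(args):
    with build_model(args.base_ckpt, args.adapter_ckpt) as model:
        events.append(f"adapter {model['adapter']} -> {args.hf_ckpt_path}")


def main(argv):
    parser = argparse.ArgumentParser()
    parser.add_argument("--base-ckpt", required=True)
    parser.add_argument("--adapter-ckpt", required=True)
    parser.add_argument("--hf-ckpt-path", required=True)
    # Absent flag means merged export, which keeps old invocations working.
    parser.add_argument("--adapter-only", action="store_true")
    args = parser.parse_args(argv)

    export = export_adapter if args.adapter_only else export_merged
    export(args)


main(["--base-ckpt", "base", "--adapter-ckpt", "lora",
      "--hf-ckpt-path", "out", "--adapter-only"])
print(events)
```

Putting the cleanup in `finally` inside the context manager means neither export function has to repeat it, and memory is released even if the save step raises.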
