Commit fe3c4fc

feat: support override HF model name in convert_megatron_to_hf (#2202)
Signed-off-by: Dhineshkumar Ramasubbu <dhineshkumar.ramasubbu@gmail.com>
1 parent 2332d20 commit fe3c4fc

2 files changed

Lines changed: 20 additions & 3 deletions

docs/design-docs/checkpointing.md

Lines changed: 9 additions & 2 deletions
@@ -22,10 +22,17 @@ rsync -ahP $CKPT_DIR/policy/tokenizer ${CKPT_DIR}-hf/
 
 ## Converting Megatron Checkpoints to Hugging Face Format
 
-For models that were originally trained using the Megatron-LM backend, a separate converter is available to convert Megatron checkpoints to Hugging Face format. This script requires Megatron-Core, so make sure to launch the conversion with the `mcore` extra. For example,
+For models that were originally trained using the Megatron-LM backend, a separate converter is available to convert Megatron checkpoints to Hugging Face format. This script requires Megatron-Core, so make sure to launch the conversion with the `mcore` extra.
 
+Use `--hf-model-name` argument to override the model name mentioned in `config.yaml`. This is useful for models like GPT-OSS whose base checkpoint precision(mxfp4) is different from supported export precision(bfloat16) in Megatron-Bridge, [Ref](https://github.com/NVIDIA-NeMo/Megatron-Bridge/tree/main/examples/models/gpt_oss).
+
+For example,
 ```sh
 CKPT_DIR=results/sft/step_10
 
-uv run --extra mcore examples/converters/convert_megatron_to_hf.py --config=$CKPT_DIR/config.yaml --megatron-ckpt-path=$CKPT_DIR/policy/weights/iter_0000000/ --hf-ckpt-path=<path_to_save_hf_ckpt>
+uv run --extra mcore examples/converters/convert_megatron_to_hf.py \
+--config=$CKPT_DIR/config.yaml \
+--hf-model-name <repo>/model_name \
+--megatron-ckpt-path=$CKPT_DIR/policy/weights/iter_0000000/ \
+--hf-ckpt-path=<path_to_save_hf_ckpt>
 ```

examples/converters/convert_megatron_to_hf.py

Lines changed: 11 additions & 1 deletion
@@ -21,6 +21,7 @@
 """ NOTE: this script requires mcore. Make sure to launch with the mcore extra:
 uv run --extra mcore python examples/converters/convert_megatron_to_hf.py \
 --config <path_to_ckpt>/config.yaml \
+--hf-model-name <repo>/model_name \
 --megatron-ckpt-path <path_to_ckpt>/policy/weights/iter_xxxxx \
 --hf-ckpt-path <path_to_save_hf_ckpt>
 """
@@ -37,6 +38,12 @@ def parse_args():
         default=None,
         help="Path to config.yaml file in the checkpoint directory",
     )
+    parser.add_argument(
+        "--hf-model-name",
+        type=str,
+        default=None,
+        help="HuggingFace model name override",
+    )
     parser.add_argument(
         "--megatron-ckpt-path",
         type=str,
@@ -59,7 +66,10 @@ def main():
     with open(args.config, "r") as f:
         config = yaml.safe_load(f)
 
-    model_name = config["policy"]["model_name"]
+    # Use hf_model_name override, if available.
+    model_name = (
+        args.hf_model_name if args.hf_model_name else config["policy"]["model_name"]
+    )
     tokenizer_name = config["policy"]["tokenizer"]["name"]
     hf_overrides = config["policy"].get("hf_overrides", {}) or {}
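
The fallback in `main()` can be exercised in isolation. The following is a minimal sketch, not the script itself: `resolve_model_name`, the trimmed-down `parse_args`, and the two model names are hypothetical stand-ins introduced for illustration. Only the `--hf-model-name` flag, its `None` default, and the `config["policy"]["model_name"]` lookup come from the change above.

```python
import argparse


def parse_args(argv=None):
    # Reduced parser: only the flag added in this commit is reproduced here.
    parser = argparse.ArgumentParser()
    parser.add_argument("--hf-model-name", type=str, default=None)
    return parser.parse_args(argv)


def resolve_model_name(cli_override, config):
    # Hypothetical helper mirroring the conditional in main():
    # any falsy override (None or an empty string) falls back to the config.
    return cli_override if cli_override else config["policy"]["model_name"]


# Hypothetical config contents, standing in for the parsed config.yaml.
config = {"policy": {"model_name": "org/config-model"}}

with_flag = parse_args(["--hf-model-name", "org/override-model"])
without_flag = parse_args([])

print(resolve_model_name(with_flag.hf_model_name, config))
print(resolve_model_name(without_flag.hf_model_name, config))
```

Because the check is a truthiness test rather than `is not None`, passing an empty string on the command line behaves the same as omitting the flag.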
