# Add Megatron-Bridge recipe-free distillation example script (#861)
Merged

Changes from all commits (6 commits by kevalmorabia97):

- `a4ad1b8` Add Megatron-Bridge recipe-free distillation example script
- `48c74bd` Update docs
- `ce4d081` minor
- `c18315b` Fix resuming
- `50b6b7e` Update readme
- `59bc44c` minor doc update
This directory contains examples of using Model Optimizer with [NeMo Megatron-Bridge](https://github.com/NVIDIA-NeMo/Megatron-Bridge).

<div align="center">

| **Section** | **Description** | **Link** |
| :------------: | :------------: | :------------: |
| Pre-Requisites | Development environment setup | \[[Link](#pre-requisites)\] |
| Pruning | Examples of pruning a model using the Minitron algorithm | \[[Link](#pruning)\] |
| Distillation | Examples of distilling a pruned or quantized model | \[[Link](#distillation)\] |
| Quantization | Examples of quantizing a model | \[[Link](#quantization)\] |
| Resources | Extra links to relevant resources | \[[Link](#resources)\] |

</div>

## Pre-Requisites

Running these examples requires several additional dependencies (e.g., Megatron-Bridge, Megatron-Core), so we strongly recommend using the NeMo container (e.g., `nvcr.io/nvidia/nemo:26.02`), which has all of them pre-installed.

To get the latest ModelOpt features and example scripts, mount your Model-Optimizer repo into the container:

```bash
export MODELOPT_DIR=${PWD}/Model-Optimizer  # or set to your local Model-Optimizer repository path
if [ ! -d "${MODELOPT_DIR}" ]; then
    git clone https://github.com/NVIDIA/Model-Optimizer.git ${MODELOPT_DIR}
fi

export DOCKER_IMAGE=nvcr.io/nvidia/nemo:26.02
docker run \
    --gpus all \
    --shm-size=16GB \
    --net=host \
    --ulimit memlock=-1 \
    --rm -it \
    -v ${MODELOPT_DIR}:/opt/Model-Optimizer \
    -v ${MODELOPT_DIR}/modelopt:/opt/venv/lib/python3.12/site-packages/modelopt \  # keep the installed modelopt library in sync with the mounted examples
    -w /opt/Model-Optimizer/examples/megatron_bridge \
    ${DOCKER_IMAGE} bash
```

Once inside the container, log in with your Hugging Face token to download gated datasets and models.
Note that the default dataset for pruning and quantization, [`nemotron-post-training-dataset-v2`](https://huggingface.co/datasets/nvidia/Nemotron-Post-Training-Dataset-v2), is gated.

```bash
huggingface-cli login --token <your token>
```

## Pruning

Example usage to prune Qwen3-8B to 6B on 2 GPUs (Pipeline Parallelism = 2), where the top-10 candidates are evaluated for MMLU score (on 5% sampled data) to select the best model:

```bash
torchrun --nproc_per_node 2 prune_minitron.py \
    --pp_size 2 \
    --hf_model_name_or_path Qwen/Qwen3-8B \
    --prune_target_params 6e9 \
    --hparams_to_skip num_attention_heads \
```

Example usage for manually pruning to a specific architecture, using 1024 samples from [`nemotron-post-training-dataset-v2`](https://huggingface.co/datasets/nvidia/Nemotron-Post-Training-Dataset-v2) for calibration:

```bash
torchrun --nproc_per_node 2 prune_minitron.py \
    --pp_size 2 \
    --hf_model_name_or_path Qwen/Qwen3-8B \
    --prune_export_config '{"hidden_size": 3584, "ffn_hidden_size": 9216}' \
    --output_hf_path /tmp/Qwen3-8B-Pruned-6B-manual
```

To see the full usage for advanced configurations, run:

```bash
torchrun --nproc_per_node 1 prune_minitron.py --help
```

> [!TIP]

## Distillation

This section shows how to distill a student model from a teacher model in the Megatron-Bridge framework.

This can be used standalone, or after pruning (see [Pruning](#pruning)) or quantization (see [Quantization](#quantization)) to recover the model's accuracy by distilling from the original model (the teacher).

The [distill.py](distill.py) script loads student and teacher models from HuggingFace checkpoints and saves the distilled model to `<output_dir>/checkpoints` in Megatron distributed checkpoint format.
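The distillation objective itself is not spelled out here. As a rough illustration of what "distilling from the teacher" means, below is a minimal plain-Python sketch of a Hinton-style forward-KL loss on temperature-softened logits; the actual loss implemented in `distill.py` may differ.

```python
import math

def softmax(logits, temperature=1.0):
    # Softened probabilities; higher temperature flattens the distribution
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kd_loss(teacher_logits, student_logits, temperature=2.0):
    # Forward KL(teacher || student) on temperature-softened distributions,
    # scaled by T^2 as in the classic knowledge-distillation formulation
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl

# Identical logits give zero loss; diverging logits give a positive loss
assert abs(kd_loss([2.0, 1.0, 0.1], [2.0, 1.0, 0.1])) < 1e-12
assert kd_loss([2.0, 1.0, 0.1], [0.1, 1.0, 2.0]) > 0.0
```

Minimizing this loss over training data pulls the student's output distribution toward the teacher's.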

### Data Preparation

The distillation script expects pre-tokenized data in Megatron's binary format (`.bin` / `.idx` files).
You can tokenize your JSONL dataset using the following function:

```python
from modelopt.torch.utils.plugins import megatron_preprocess_data

megatron_preprocess_data(
    input_path="/path/to/your/data.jsonl",
    output_dir="/path/to/tokenized/data",
    tokenizer_name_or_path="Qwen/Qwen3-0.6B",
    json_keys=["text"],  # change to your JSON key if needed
    workers=32,
    log_interval=100000,
    max_sequence_length=256000,  # avoids rare OOM errors when texts are very long
)
```

If you have multiple JSONL files, you can tokenize them one by one and pass all the resulting paths to the `--data_paths` argument.
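The expected input is one JSON object per line, each carrying the key passed via `json_keys`. A minimal stdlib sketch of writing and validating such a file (the file name and sample texts are illustrative):

```python
import json
from pathlib import Path

# Hypothetical sample records; only the "text" key matters here,
# since it is what json_keys=["text"] selects during tokenization.
records = [
    {"text": "The quick brown fox jumps over the lazy dog."},
    {"text": "Distillation transfers knowledge from a teacher to a student."},
]

path = Path("sample_data.jsonl")
with path.open("w") as f:
    for rec in records:
        f.write(json.dumps(rec) + "\n")  # one JSON object per line

# Sanity-check: every line parses and contains the expected key
lines = path.read_text().splitlines()
assert all("text" in json.loads(line) for line in lines)
print(len(lines))  # → 2
```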

### Distillation with Real Data

Example usage to distill a 4B student (HF) from an 8B teacher (HF) on 8 GPUs (TP=8, PP=1):

```bash
torchrun --nnodes 1 --nproc_per_node 8 distill.py \
    --tp_size 8 \
    --teacher_hf_path Qwen/Qwen3-8B \
    --student_hf_path Qwen/Qwen3-4B \
    --data_paths 1.0 /path/to/tokenized/data \
    --data_path_to_cache /path/to/cache/dataset_indices_qwen3 \
    --seq_length 8192 \
    --mbs 1 \
    --gbs 768 \
    --train_iters 15000 \
    --lr 1e-4 \
    --min_lr 1e-5 \
    --lr_warmup_iters 50 \
    --eval_interval 100 \
    --eval_iters 32 \
    --log_interval 10 \
    --output_dir /output/qwen3_8b_to_4b_distill
```

TensorBoard logging is enabled by default, with logs saved to the `<output_dir>/tensorboard` directory.
To use Weights & Biases for logging, set the `WANDB_API_KEY` environment variable and pass the `--wandb_project` argument.
Optionally, you can also pass the `--wandb_entity` and `--wandb_exp_name` arguments to group runs under a project and experiment name.

To see all available arguments:

```bash
torchrun --nproc_per_node 1 distill.py --help
```

### Quick Test with Mock Data

Example usage with mock data for quick testing (no pre-tokenized data needed):

```bash
torchrun --nproc_per_node 8 distill.py \
    --tp_size 8 \
    --teacher_hf_path Qwen/Qwen3-0.6B \
    --student_hf_path Qwen/Qwen3-0.6B \
    --use_mock_data \
    --seq_length 512 \
    --mbs 1 \
    --gbs 8 \
    --train_iters 100 \
    --eval_interval 10 \
    --eval_iters 4 \
    --output_dir /tmp/test_distill
```

### Slurm Usage

To run the distillation script on a Slurm cluster for multi-node training, use `python` instead of `torchrun` and set the number of nodes with an `#SBATCH --nodes=<num_nodes>` directive in your Slurm script.
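For concreteness, here is a minimal sbatch sketch under those assumptions. The container flags assume a Pyxis-enabled cluster; the node count, GPU count, time limit, mounts, and distillation arguments are placeholders to adapt to your setup.

```shell
#!/bin/bash
#SBATCH --nodes=2                 # number of nodes for multi-node training
#SBATCH --ntasks-per-node=1
#SBATCH --gpus-per-node=8
#SBATCH --time=04:00:00           # adjust to your cluster's limits
#SBATCH --job-name=distill_qwen3

# Launch inside the NeMo container; python (not torchrun) is used,
# following the Slurm usage note above.
srun --container-image=nvcr.io/nvidia/nemo:26.02 \
     --container-mounts=${MODELOPT_DIR}:/opt/Model-Optimizer \
     python /opt/Model-Optimizer/examples/megatron_bridge/distill.py \
         --tp_size 8 \
         --teacher_hf_path Qwen/Qwen3-8B \
         --student_hf_path Qwen/Qwen3-4B \
         --use_mock_data \
         --output_dir /output/qwen3_distill_slurm
```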

### Convert Megatron checkpoint to Hugging Face format

To convert the Megatron checkpoint from the last (or any intermediate) iteration to Hugging Face format, you need the pruned model config (the `--output_hf_path` from the `prune_minitron.py` script) and the distilled Megatron checkpoint directory (`<distill_output_dir>/checkpoints/iter_<iter_number>`). Then run:

```bash
uv run python /opt/Megatron-Bridge/examples/conversion/convert_checkpoints.py export \
    --hf-model <path_to_pruned_hf_ckpt> \
    --megatron-path <distill_output_dir>/checkpoints/iter_<iter_number> \
    --hf-path <path_to_save_distilled_hf_ckpt>
```

> [!NOTE]
> `uv` comes pre-installed in the NeMo container, so no extra setup is needed.

For more details, refer to the checkpoint conversion scripts in the [Megatron-Bridge README](https://github.com/NVIDIA-NeMo/Megatron-Bridge/tree/main/examples/conversion).

## Quantization