Commit 8857025
Add Megatron-Bridge recipe-free distillation example script (#861)
## What does this PR do?
**Type of change:** New example script
- [x] M-Bridge recipe-free distillation script so it is easier to run
and can support pruned models
- [x] Fix resuming distillation run
## Usage
```bash
torchrun --nproc_per_node 8 distill.py \
--teacher_hf_path Qwen/Qwen3-8B \
--student_hf_path Qwen3-8B-NAS-Pruned-6B \
--tp_size 8 \
--data_paths <climbmix 25% tokenized (~90B tokens)> \
--data_path_to_cache /path/to/cache/climbmix_dataset_indices_qwen3 \
--seq_length 4096 \
--mbs 8 \
--gbs 768 \
--train_iters 28500 \
--lr 1e-4 \
--min_lr 1e-5 \
--lr_warmup_iters 100 \
--eval_interval 500 \
--eval_iters 32 \
--log_interval 10 \
--output_dir qwen3_8b_6b_mbridge_distill
```
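The token counts quoted in the Testing section (about 50B tokens at 16k steps, about 90B at 28.5k steps) follow from `--gbs`, `--seq_length`, and `--train_iters`. The sketch below reproduces that arithmetic. It assumes every sample is packed to the full sequence length, and the gradient-accumulation figure assumes the usual Megatron-style relation between global batch, micro batch, and data-parallel size; the helper names are hypothetical and not part of `distill.py`.

```python
# Hypothetical helpers for sanity-checking the training budget; not part of distill.py.

def tokens_seen(train_iters: int, gbs: int, seq_length: int) -> int:
    """Tokens consumed after train_iters steps, assuming fully packed sequences."""
    return train_iters * gbs * seq_length


def grad_accum_steps(gbs: int, mbs: int, world_size: int, tp_size: int) -> int:
    """Micro-batches per optimizer step, assuming dp_size = world_size // tp_size."""
    dp_size = world_size // tp_size
    assert gbs % (mbs * dp_size) == 0, "gbs must be divisible by mbs * dp_size"
    return gbs // (mbs * dp_size)


GBS, MBS, SEQ_LENGTH = 768, 8, 4096

for iters in (16_000, 28_500):
    total = tokens_seen(iters, GBS, SEQ_LENGTH)
    print(f"{iters} iters -> {total / 1e9:.1f}B tokens")
# 16000 iters -> 50.3B tokens
# 28500 iters -> 89.7B tokens

# Single node as in the usage command: 8 GPUs, tp_size 8, so dp_size is 1.
print(grad_accum_steps(GBS, MBS, world_size=8, tp_size=8))  # 96
```

On a single 8-GPU node with `--tp_size 8` this implies 96 micro-batches per step; adding nodes increases the data-parallel size and reduces that number proportionally.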
## Testing
- [x] Re-ran Qwen3 8B -> 6B experiments and compared with Nemo2 results
from the blog
Best subnet from NAS: `{'num_layers': 30, 'hidden_size': 3584,
'ffn_hidden_size': 11776} -> 5.99B params, 0.5718 score`
| Model | MMLU | GSM8K (flexible, strict) | MBPP (coding) |
| ------- | ------ | ------- | ------- |
| Qwen3-8B | 74.9 | 87.5, 84.6 | 65.4 |
| Qwen3-8B-Pruned-6B | 57.6 | 11.6, 10.0 | 4.8 |
| Qwen3-8B-Pruned-6B (Distilled for 16k steps i.e. 50B tokens ~3k GPU hours) | 71.6 | 78.0, 64.7 | 43.4 |
| Qwen3-8B-Pruned-6B (Distilled for 28.5k steps i.e. 90B tokens ~5.2k GPU hours) | 71.9 | 78.1, 64.8 | 44.2 |
| Qwen3-4B | 70.0 | 81.1, 84.7 | 62.8 |
Previous Nemo2 experiments on depth-pruned Qwen3 8B -> 6B (24 layers)
reached MMLU ~72.0, so the results are roughly on par. No hyperparameter
tuning was done for the current M-Bridge distillation run.
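As a rough cross-check of the 5.99B figure reported for the best NAS subnet, the sketch below counts parameters for a Qwen3-style dense decoder. Only `num_layers`, `hidden_size`, and `ffn_hidden_size` come from this PR. The remaining values (32 query heads, 8 KV heads, head dimension 128, vocabulary 151,936, untied embeddings, no linear biases, per-head q/k norms) are assumptions taken from the public Qwen3-8B configuration, and the sketch further assumes the search left the attention heads untouched. The function name is hypothetical.

```python
# Hypothetical parameter counter for a Qwen3-style dense decoder.
# Defaults other than the three pruned dimensions are assumptions (see lead-in).

def decoder_params(num_layers: int, hidden_size: int, ffn_hidden_size: int,
                   num_heads: int = 32, num_kv_heads: int = 8, head_dim: int = 128,
                   vocab_size: int = 151_936, tied_embeddings: bool = False) -> int:
    q_out = num_heads * head_dim        # width of the query projection
    kv_out = num_kv_heads * head_dim    # width of each of the key/value projections
    # q, k, v projections plus the output projection (no biases assumed)
    attn = hidden_size * (q_out + 2 * kv_out) + q_out * hidden_size
    # gated MLP: gate, up, and down projections
    mlp = 3 * hidden_size * ffn_hidden_size
    # two RMSNorms over hidden_size plus q/k norms over head_dim
    norms = 2 * hidden_size + 2 * head_dim
    per_layer = attn + mlp + norms
    # input embedding and (if untied) a separate output projection
    embeddings = vocab_size * hidden_size * (1 if tied_embeddings else 2)
    final_norm = hidden_size
    return num_layers * per_layer + embeddings + final_norm


pruned = decoder_params(num_layers=30, hidden_size=3584, ffn_hidden_size=11776)
teacher = decoder_params(num_layers=36, hidden_size=4096, ffn_hidden_size=12288)
print(f"pruned:  {pruned / 1e9:.2f}B")   # pruned:  5.99B
print(f"teacher: {teacher / 1e9:.2f}B")  # teacher: 8.19B
```

Under these assumptions the count lands on 5.99B for the pruned subnet, which is consistent with the number reported above.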
- [ ] (Separate PR) GitHub CI/CD test for example script with NeMo 26.02
container
## Before your PR is "*Ready for review*"
- **Make sure you read and follow [Contributor
guidelines](https://github.com/NVIDIA/Model-Optimizer/blob/main/CONTRIBUTING.md)**
and your commits are signed.
- **Is this change backward compatible?**: Yes
- **Did you write any new necessary tests?**: N/A
- **Did you add or update any necessary documentation?**: Yes
- **Did you update
[Changelog](https://github.com/NVIDIA/Model-Optimizer/blob/main/CHANGELOG.rst)?**:
Yes
## Summary by CodeRabbit
## Release Notes
* **New Features**
* Added complete distillation workflow and example for Megatron-Bridge
optimization.
* **Documentation**
* Enhanced setup guide with Docker workflows, data preparation steps,
and detailed distillation instructions.
* Improved usage documentation and help references.
* **Improvements**
* Better data preprocessing output with human-readable formatting for
metrics.
---------
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>

1 parent 6ec5135 commit 8857025
File tree: 5 files changed (+408, -22 lines)
- examples/megatron_bridge
- modelopt/torch/utils/plugins