Commit d88dfcb
consolidate mbridge distillation: merge distill_hf.py into distill.py (#1220)
## Summary
- Unified `examples/puzzletron/mbridge_distillation/distill_hf.py`
(AnyModel-specific) into `examples/megatron_bridge/distill.py` (general)
- The single script now handles both standard HF and Puzzletron AnyModel
checkpoints.
- Added `--hf_export_path` / `--student_hf_model` args for inline HF
export after distillation.
- Merged AnyModel integration test into
`tests/examples/megatron_bridge/test_distill.py`
- Test models use `vocab_size=128` (instead of the default 102) so the
vocabulary size is divisible by the TP size, including TP=8.
- Moved MMLU distillation results into `megatron_bridge/README.md`
- The Puzzletron README now redirects to the consolidated docs.
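
The `vocab_size` change can be checked with simple arithmetic. The sketch below assumes the constraint is plain even divisibility of the vocabulary size by the tensor-parallel (TP) size, which is how the note above reads; the helper name and the list of TP sizes are illustrative and not taken from the test suite.

```python
def divisible_tp_sizes(vocab_size, tp_sizes=(1, 2, 4, 8)):
    """Return the TP sizes that evenly divide vocab_size.

    Hypothetical helper for illustration only; the candidate TP sizes
    (1, 2, 4, 8) are an assumption, not read from the repository.
    """
    return [tp for tp in tp_sizes if vocab_size % tp == 0]


print(divisible_tp_sizes(102))  # [1, 2]
print(divisible_tp_sizes(128))  # [1, 2, 4, 8]
```

Under that assumption, 102 leaves a remainder at TP=4 and TP=8, while 128 is a power of two and covers every size in the list.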
Limitation discovered during consolidation:
HF export via `--hf_export_path` does not currently appear to work for
Puzzletron AnyModel (heterogeneous) checkpoints. Megatron-Bridge's
`export_ckpt` cannot reload heterogeneous model configs from saved
checkpoints (`heterogeneous_layers_config_encoded_json` is `None` during
`__post_init__` in `heterogeneous_config.py`). This affects both inline
`--hf_export_path` and the separate `convert_checkpoints.py export`
script.
The original `distill_hf.py` README documented this as supported, but I
think it might have been broken there too (on the side of
Megatron-Bridge). The consolidated README now documents this as a known
limitation. HF export for standard models works fine via both methods.
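
To illustrate the kind of failure described above, here is a toy sketch. It is not Megatron-Bridge code: the class, its `per_layer` field, and the error message are hypothetical, and only the field name `heterogeneous_layers_config_encoded_json` comes from the description. It assumes the config derives its per-layer settings from that encoded JSON inside `__post_init__`, so a reload that does not restore the field cannot rebuild the layers.

```python
import json
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ToyHeterogeneousConfig:
    """Toy stand-in for a heterogeneous model config (hypothetical)."""

    heterogeneous_layers_config_encoded_json: Optional[str] = None
    per_layer: list = field(default_factory=list, init=False)

    def __post_init__(self):
        # Per-layer settings are derived from the encoded JSON, so the
        # config cannot be built when the field is missing.
        if self.heterogeneous_layers_config_encoded_json is None:
            raise ValueError("encoded per-layer config is None")
        decoded = json.loads(self.heterogeneous_layers_config_encoded_json)
        self.per_layer = decoded["layers"]


# Building from a populated field works.
encoded = json.dumps({"layers": [{"ffn_hidden_size": 256}, {"ffn_hidden_size": 128}]})
cfg = ToyHeterogeneousConfig(encoded)
print(len(cfg.per_layer))  # 2

# A reload that drops the field fails inside __post_init__.
try:
    ToyHeterogeneousConfig()
    reload_failed = False
except ValueError:
    reload_failed = True
print(reload_failed)  # True
```

Whether the real failure takes exactly this form is not confirmed here; the sketch only shows why a `None` value at `__post_init__` time would block any export path that has to reconstruct the model config from a saved checkpoint.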
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
## Release Notes
* **New Features**
* Added support for Puzzletron AnyModel checkpoints in distillation
pipeline.
* Introduced inline HuggingFace export capability during distillation
process.
* **Documentation**
* Updated distillation guide with clearer conversion workflows and
optional HuggingFace export instructions.
* Added distillation benchmarks and performance recommendations.
* **Bug Fixes & Improvements**
* Streamlined test infrastructure and workflow configuration.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
---------
Signed-off-by: jrausch <jrausch@nvidia.com>
Signed-off-by: root <root@pool0-00848.cm.cluster>
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
Co-authored-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
1 parent 6395b1e commit d88dfcb
File tree
10 files changed (+234, -612 lines)
- .github/workflows
- examples
  - megatron_bridge
    - results
  - puzzletron
    - mbridge_distillation
- tests
  - _test_utils/torch/puzzletron
  - examples
    - megatron_bridge
    - puzzletron/mbridge_distillation