Commit d244ca7

A tutorial on mbridge distillation for puzzletron/any_model
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
1 parent 562f46b commit d244ca7

2 files changed

Lines changed: 95 additions & 14 deletions

examples/puzzletron/README.md

Lines changed: 2 additions & 14 deletions
```diff
@@ -275,21 +275,9 @@ vllm bench throughput --model path/to/model --input-len 2000 --output-len 100 --
 
 ## Knowledge Distillation
 
-To recover degradation in the quality of the compressed model, we can use knowledge distillation. This allows transferring the capabilities of the original model to the pruned one. For this, we will use [NeMo framework](https://github.com/NVIDIA-NeMo/NeMo) with the [nemo:25.07](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo?version=25.07) container.
+To recover degradation in the quality of the compressed model, we can use knowledge distillation. This allows transferring the capabilities of the original model to the pruned one.
 
-First, convert the HF model to NeMo format:
-
-```bash
-python -m nemo_export/convert_hf_to_nemo --input-ckpt-path path/to/HF-model --output-ckpt-path path/to/save/model-nemo
-```
-
-Now you can utilize all the training features available in NeMo, including distillation. Please refer to the [NeMo distillation documentation](https://docs.nvidia.com/nemo-framework/user-guide/latest/model-optimization/distillation/distillation.html).
-
-[Optional] Once distillation is complete, you can convert the distilled model back to the HuggingFace format.
-
-```bash
-python -m nemo_export/convert_nemo_to_hf --input-ckpt-path path/to/nemo-model --output-ckpt-path path/to/save/model-HF
-```
+See [mbridge_distillation/README.md](./mbridge_distillation/README.md) for instructions on using Megatron-Bridge for knowledge distillation.
 
 ## Advanced Usage
 
```
examples/puzzletron/mbridge_distillation/README.md (new file)

Lines changed: 93 additions & 0 deletions
# Knowledge Distillation with Megatron-Bridge

This guide shows how to perform knowledge distillation on Puzzletron-compressed AnyModel checkpoints using Megatron-Bridge.

## Overview

1. Set up the environment with Megatron-Bridge.
2. Convert the AnyModel checkpoints (student and teacher) to Megatron-Bridge format.
3. Run knowledge distillation training (the objective is sketched after this list).
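Conceptually, the final step trains the pruned student to imitate the original teacher. As a generic sketch (standard knowledge distillation; the exact objective used by the distillation script may differ), the loss combines a temperature-softened teacher-matching term with the usual language-modeling loss:

$$
\mathcal{L} = \alpha \, T^{2} \, \mathrm{KL}\!\left(p_{\text{teacher}}^{(T)} \,\big\|\, p_{\text{student}}^{(T)}\right) + (1-\alpha)\,\mathcal{L}_{\text{CE}},
$$

where $p^{(T)}$ denotes token distributions softened with temperature $T$, $\mathcal{L}_{\text{CE}}$ is the standard next-token cross-entropy, and $\alpha$ balances the two terms.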
## Setup

> **Temporary setup:** This manual Megatron-Bridge setup is only required until the NeMo docker container ships with Megatron-Bridge by default. Once the container is updated, this step will no longer be necessary.

**Note:** Set `$WORKSPACE` to your project root directory before running these commands:

```bash
export WORKSPACE=/path/to/your/project
```
1. **Clone [Megatron-Bridge](https://github.com/NVIDIA-NeMo/Megatron-Bridge)** into your workspace:

   ```bash
   cd $WORKSPACE
   git clone https://github.com/NVIDIA-NeMo/Megatron-Bridge.git
   ```

2. **Initialize the Megatron-Bridge submodules:**

   ```bash
   cd $WORKSPACE/Megatron-Bridge
   git submodule init
   git submodule update
   ```
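   Equivalently (standard git behavior, nothing specific to Megatron-Bridge), the clone and submodule steps can be combined into one command:

   ```bash
   cd $WORKSPACE
   git clone --recursive https://github.com/NVIDIA-NeMo/Megatron-Bridge.git
   ```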
3. **Start the Docker container with mounts**, using the [NeMo 25.11 container](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo?version=25.11):

   ```bash
   docker run --gpus all -it --rm \
     -v $WORKSPACE:/workspace \
     -v $WORKSPACE/Megatron-Bridge/3rdparty/Megatron-LM:/opt/megatron-lm \
     nvcr.io/nvidia/nemo:25.11 \
     /bin/bash
   ```

   **Note:** The `/opt/megatron-lm` mount is required because Megatron-Bridge depends on the Megatron-LM submodule.
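   A quick way to confirm both mounts from inside the container:

   ```bash
   ls /workspace/Megatron-Bridge /opt/megatron-lm
   ```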
4. **Set up the environment inside the container:**

   ```bash
   export PYTHONPATH="/workspace/Megatron-Bridge/src:/workspace/Model-Optimizer:${PYTHONPATH}"
   ```
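   To verify that the packages resolve, a quick sanity check (this assumes the usual layouts: the `megatron.bridge` package under `src/` in Megatron-Bridge and the `modelopt` package at the root of Model-Optimizer):

   ```bash
   python -c "import megatron.bridge, modelopt; print('environment OK')"
   ```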
## Step 1: Convert Checkpoints to Megatron-Bridge Format

Convert both the student and the teacher checkpoint:

```bash
# Convert the student checkpoint
torchrun --nproc_per_node=1 examples/puzzletron/mbridge_distillation/import_anymodel_to_mbridge.py \
    --input-ckpt-path /path/to/student/anymodel/checkpoint \
    --output-ckpt-path /path/to/student/mbridge/checkpoint

# Convert the teacher checkpoint
torchrun --nproc_per_node=1 examples/puzzletron/mbridge_distillation/import_anymodel_to_mbridge.py \
    --input-ckpt-path /path/to/teacher/anymodel/checkpoint \
    --output-ckpt-path /path/to/teacher/mbridge/checkpoint
```
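The converted checkpoints use the Megatron on-disk layout, which stores weights under an iteration subdirectory; this is why the paths passed in Step 2 end in `iter_0000000`. An example listing (illustrative only; the exact files may vary):

```bash
ls /path/to/student/mbridge/checkpoint
# e.g.: iter_0000000  latest_checkpointed_iteration.txt
```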
## Step 2: Run Knowledge Distillation

Run distillation on a tokenized dataset:

```bash
torchrun --nproc_per_node=8 examples/puzzletron/mbridge_distillation/distill_anymodel.py \
    --student-mbridge-ckpt /path/to/student/mbridge/checkpoint/iter_0000000 \
    --teacher-mbridge-ckpt /path/to/teacher/mbridge/checkpoint/iter_0000000 \
    --data-path /path/to/tokenized/dataset \
    --output-dir ./distilled_output \
    dataset.sequence_length=8192 \
    model.tensor_model_parallel_size=8 \
    model.teacher.tensor_model_parallel_size=8 \
    train.global_batch_size=4 \
    train.micro_batch_size=1 \
    train.train_iters=5000 \
    logger.log_interval=1
```
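The trailing `key=value` arguments are configuration overrides applied on top of the script's defaults. Assuming standard Megatron batch semantics, the 8 GPUs at `model.tensor_model_parallel_size=8` form a single tensor-parallel group, so the data-parallel size is 1 and `train.global_batch_size=4` with `train.micro_batch_size=1` implies 4 gradient-accumulation microbatches per iteration.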
The distilled checkpoint will be saved to `--output-dir`.
