Commit a0e4541

Merge branch 'main' into ernie-upstream

2 parents: bddbb54 + 0195522

97 files changed: 8293 additions & 961 deletions

.main.commit

Lines changed: 1 addition & 1 deletion

@@ -1 +1 @@
-d30c3ae5469fe3f6a64d4fd2e63b6e7f7844ea81
+59fc89485f18c47038c0cb9aed65a35850030d34

README.md

Lines changed: 3 additions & 0 deletions

@@ -12,6 +12,8 @@
 ## 📣 News

+- [04/10/2026] [**Qwen3-ASR**](https://github.com/NVIDIA-NeMo/Megatron-Bridge/tree/main/examples/models/audio_lm/qwen3_asr) is now supported! Checkpoint conversion and inference for [Qwen3's ASR model](https://github.com/QwenLM/Qwen3-ASR) are available on **main**.
+
 - [04/09/2026] [**Bailing MoE V2**](https://github.com/NVIDIA-NeMo/Megatron-Bridge/tree/main/examples/models/bailing) is now supported! Checkpoint conversion and inference for the Bailing MoE V2 model are available on **main**. Thank you to [@ccclyu](https://github.com/ccclyu) for the community contribution!

 - [04/07/2026] Megatron Bridge's PEFT support was featured at a [PyTorch Conference Europe 2026 talk](https://pytorchconferenceeu2026.sched.com/event/2Juce/optimizing-reinforcement-learning-at-trillion-parameter-scale-songlin-jiang-aalto-university-mind-lab).

@@ -201,6 +203,7 @@ Megatron Bridge provides out-of-the-box bridges and training recipes for a wide
 - [Qwen2 Audio](https://github.com/NVIDIA-NeMo/Megatron-Bridge/tree/main/src/megatron/bridge/models/qwen_audio)
 - [Qwen2.5-Omni](https://github.com/NVIDIA-NeMo/Megatron-Bridge/tree/main/src/megatron/bridge/models/qwen_omni)
+- [Qwen3-ASR](https://github.com/NVIDIA-NeMo/Megatron-Bridge/tree/main/src/megatron/bridge/models/qwen3_asr)

 #### Launching Recipes

examples/conversion/hf_megatron_roundtrip_multi_gpu.py

Lines changed: 1 addition & 6 deletions

@@ -53,7 +53,6 @@
 from megatron.bridge import AutoBridge
 from megatron.bridge.models.decorators import torchrun_main
 from megatron.bridge.models.hf_pretrained.utils import is_safe_repo
-from megatron.bridge.utils.common_utils import fix_gpt_oss_export_transpose, get_hf_model_type


 HF_MODEL_ID = "meta-llama/Llama-3.2-1B"

@@ -189,11 +188,7 @@ def main(
     all_match = True
     fp8_skip_count = 0
     fp8_skip_samples: list[str] = []
-    # TODO: Remove fix_gpt_oss_export_transpose once GPT-OSS bridge export is fixed.
-    weight_iter = bridge.export_hf_weights(megatron_model, show_progress=False)
-    if get_hf_model_type(bridge) == "gpt_oss":
-        weight_iter = fix_gpt_oss_export_transpose(weight_iter)
-    for name, param in weight_iter:
+    for name, param in bridge.export_hf_weights(megatron_model, show_progress=False):
         if is_rank_0:
             original_param = bridge.hf_pretrained.state[name]
             compare_param = param
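The simplified loop above compares each exported `(name, param)` pair against the original HuggingFace state dict. A minimal, self-contained sketch of that round-trip check, using plain Python lists in place of tensors (`original_state` and the local `export_hf_weights` generator are illustrative stand-ins, not the real bridge API):

```python
import math

# Illustrative stand-in for the HF state dict the script compares against.
original_state = {
    "embed.weight": [0.1, 0.2, 0.3],
    "lm_head.weight": [0.4, 0.5],
}

def export_hf_weights(state):
    # Stand-in for bridge.export_hf_weights: yield (name, weights) pairs.
    for name, values in state.items():
        yield name, list(values)

all_match = True
for name, param in export_hf_weights(original_state):
    original_param = original_state[name]
    # Element-wise comparison with a small tolerance, like the real check.
    if not all(math.isclose(a, b, rel_tol=1e-6) for a, b in zip(param, original_param)):
        all_match = False
        print(f"Mismatch in {name}")

print("all_match:", all_match)
```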

examples/conversion/hf_to_megatron_generate_audio_lm.py

Lines changed: 4 additions & 4 deletions

@@ -20,24 +20,24 @@
 Example:
     # Audio-Language generation with audio from URL:
-    uv run python examples/conversion/hf_to_megatron_generate_alm.py \
+    uv run python examples/conversion/hf_to_megatron_generate_audio_lm.py \
         --hf_model_path="Qwen/Qwen2-Audio-7B-Instruct" \
         --audio_url="https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen2-Audio/audio/glass-breaking-151256.mp3" \
         --prompt="What's that sound?"

     # Audio-Language generation with local audio file:
-    uv run python examples/conversion/hf_to_megatron_generate_alm.py \
+    uv run python examples/conversion/hf_to_megatron_generate_audio_lm.py \
         --hf_model_path="Qwen/Qwen2-Audio-7B-Instruct" \
         --audio_path="/path/to/audio.wav" \
         --prompt="Describe what you hear in this audio."

     # Text-only generation (no audio):
-    uv run python examples/conversion/hf_to_megatron_generate_alm.py \
+    uv run python examples/conversion/hf_to_megatron_generate_audio_lm.py \
         --hf_model_path="Qwen/Qwen2-Audio-7B-Instruct" \
         --prompt="Hello, how are you?"

     # Load from Megatron checkpoint:
-    uv run python examples/conversion/hf_to_megatron_generate_alm.py \
+    uv run python examples/conversion/hf_to_megatron_generate_audio_lm.py \
         --hf_model_path="Qwen/Qwen2-Audio-7B-Instruct" \
         --megatron_model_path="/path/to/megatron/checkpoint" \
         --audio_url="https://example.com/audio.mp3" \
examples/models/audio_lm/qwen2_audio/README.md (new file)

Lines changed: 37 additions & 0 deletions

# Qwen2-Audio - Audio Language Model

This directory contains example scripts for Qwen2-Audio audio-language models.

## Inference

### Run Inference from HuggingFace Checkpoint

```bash
uv run python -m torch.distributed.run --nproc_per_node=2 \
    examples/conversion/hf_to_megatron_generate_audio_lm.py \
    --hf_model_path Qwen/Qwen2-Audio-7B-Instruct \
    --audio_url "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen2-Audio/audio/1272-128104-0000.flac" \
    --prompt "Describe what you hear in this audio." \
    --tp 2 \
    --max_new_tokens 50
```

Note:
- You can also use local audio files: `--audio_path /path/to/audio.wav`

See the [inference.sh](inference.sh) script for the full runnable commands.

**Expected output:**
```
======== GENERATED TEXT OUTPUT ========
Audio: https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen2-Audio/audio/1272-128104-0000.flac
Prompt: Describe what you hear in this audio.
Generated: system
You are a helpful assistant.
user
Audio 1:
Describe what you hear in this audio.
assistant
I heard a man speaking in English with the phrase 'Mister Quiller is the apostle of the middle classes and we are glad to welcome his gospel.'
=======================================
```
examples/models/audio_lm/qwen2_audio/inference.sh (new file)

Lines changed: 45 additions & 0 deletions

#!/usr/bin/env bash
# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# Usage:
#   bash examples/models/audio_lm/qwen2_audio/inference.sh

set -e

export HF_MODEL="Qwen/Qwen2-Audio-7B-Instruct"

AUDIO_URL="https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen2-Audio/audio/1272-128104-0000.flac"

echo "============================================"
echo "Qwen2-Audio Megatron Bridge Inference Test"
echo "============================================"

echo ""
echo "Direct inference from HuggingFace..."
echo "Audio: ${AUDIO_URL}"
echo ""

uv run python -m torch.distributed.run --nproc_per_node=2 \
    examples/conversion/hf_to_megatron_generate_audio_lm.py \
    --hf_model_path "${HF_MODEL}" \
    --audio_url "${AUDIO_URL}" \
    --prompt "Describe what you hear in this audio." \
    --tp 2 \
    --max_new_tokens 50

echo ""
echo "============================================"
echo "Inference complete!"
echo "============================================"
Lines changed: 115 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,115 @@
1+
#!/usr/bin/env bash
2+
# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
3+
#
4+
# Licensed under the Apache License, Version 2.0 (the "License");
5+
# you may not use this file except in compliance with the License.
6+
# You may obtain a copy of the License at
7+
#
8+
# http://www.apache.org/licenses/LICENSE-2.0
9+
#
10+
# Unless required by applicable law or agreed to in writing, software
11+
# distributed under the License is distributed on an "AS IS" BASIS,
12+
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
13+
# See the License for the specific language governing permissions and
14+
# limitations under the License.
15+
16+
# ==============================================================================
17+
# Qwen2-Audio 7B SFT (Supervised Fine-Tuning) Script
18+
#
19+
# Usage:
20+
# bash sft.sh
21+
#
22+
# Environment variables:
23+
# WORKSPACE — root dir for models/results (default: /workspace)
24+
# NPROC — number of GPUs per node (default: 8)
25+
# HF_MODEL — HuggingFace model path (default: Qwen/Qwen2-Audio-7B)
26+
# ==============================================================================
27+
# WORKSPACE — root dir for models/results (default: /workspace/Megatron-Bridge/examples/models/audio_lm/qwen2_audio)
28+
LOG_FILE=./qwen2_audio_7b_asr.log
29+
exec > >(tee "${LOG_FILE}") 2>&1
30+
31+
export TORCHDYNAMO_DISABLE=1
32+
33+
set -euo pipefail
34+
35+
# Workspace directory for checkpoints and results
36+
WORKSPACE=${WORKSPACE:-/workspace/Megatron-Bridge/examples/models/audio_lm/qwen2_audio}
37+
NPROC=${NPROC:-8}
38+
HF_MODEL=${HF_MODEL:-Qwen/Qwen2-Audio-7B}
39+
40+
# Before training, make sure to set WANDB_API_KEY or disable wandb logging
41+
# export WANDB_API_KEY=<your_wandb_api_key>
42+
# export WANDB_MODE=disabled
43+
44+
# Common configurations
45+
MODEL_NAME=qwen2_audio_7b
46+
MEGATRON_CKPT_DIR=${WORKSPACE}/megatron_ckpts/${MODEL_NAME}
47+
48+
# Convert HF checkpoint to Megatron format if not already done
49+
if [ ! -d "${MEGATRON_CKPT_DIR}/iter_0000000" ]; then
50+
echo "Converting HF model to Megatron format..."
51+
uv run --no-sync python examples/conversion/convert_checkpoints.py import \
52+
--hf-model ${HF_MODEL} \
53+
--megatron-path ${MEGATRON_CKPT_DIR}
54+
fi
55+
PRETRAINED_CHECKPOINT=${PRETRAINED_CHECKPOINT:-${MEGATRON_CKPT_DIR}}
56+
WANDB_PROJECT=megatron-bridge-${MODEL_NAME}
57+
58+
# Training hyperparameters
59+
SEQ_LENGTH=16384
60+
TRAIN_ITERS=11250
61+
GLOBAL_BATCH_SIZE=32
62+
MICRO_BATCH_SIZE=4
63+
EVAL_INTERVAL=1000
64+
EVAL_ITERS=10
65+
LR=2e-5
66+
MIN_LR=2e-6
67+
LR_WARMUP_ITERS=5
68+
SAVE_INTERVAL=1000
69+
LOG_INTERVAL=1
70+
71+
# TP/PP combinations: "TP,PP"
72+
PARALLELISM_CONFIGS=("1,1")
73+
74+
for par_config in "${PARALLELISM_CONFIGS[@]}"; do
75+
IFS=',' read -r TP PP <<< "$par_config"
76+
echo "============================================================"
77+
echo " run_recipe.py | TP=${TP}, PP=${PP}"
78+
echo "============================================================"
79+
uv run --no-sync python -m torch.distributed.run --nproc_per_node=${NPROC} scripts/training/run_recipe.py \
80+
--recipe qwen2_audio_7b_finetune_config \
81+
--step_func audio_lm_step \
82+
--hf_path ${HF_MODEL} \
83+
checkpoint.pretrained_checkpoint=$PRETRAINED_CHECKPOINT \
84+
checkpoint.save=${WORKSPACE}/exp/${MODEL_NAME}_sft_tp${TP}_pp${PP} \
85+
checkpoint.save_interval=$SAVE_INTERVAL \
86+
checkpoint.save_optim=False \
87+
model.seq_length=$SEQ_LENGTH \
88+
model.tensor_model_parallel_size=$TP \
89+
model.pipeline_model_parallel_size=$PP \
90+
model.freeze_language_model=false \
91+
model.freeze_audio_model=false \
92+
model.freeze_audio_projection=false \
93+
train.train_iters=$TRAIN_ITERS \
94+
train.global_batch_size=$GLOBAL_BATCH_SIZE \
95+
train.micro_batch_size=$MICRO_BATCH_SIZE \
96+
validation.eval_interval=$EVAL_INTERVAL \
97+
validation.eval_iters=$EVAL_ITERS \
98+
optimizer.lr=$LR \
99+
optimizer.min_lr=$MIN_LR \
100+
scheduler.lr_warmup_iters=$LR_WARMUP_ITERS \
101+
logger.log_interval=$LOG_INTERVAL \
102+
logger.wandb_project=$WANDB_PROJECT \
103+
logger.wandb_exp_name=${MODEL_NAME}_asr_tp${TP}_pp${PP} \
104+
dataset.maker_name=make_default_audio_dataset \
105+
"dataset.maker_kwargs.path_or_dataset=yuekai/aishell" \
106+
"dataset.maker_kwargs.subset=train" \
107+
"dataset.maker_kwargs.split=test" \
108+
"+dataset.maker_kwargs.prompt='Detect the language and recognize the speech: <|zh|>'" \
109+
"dataset.val_maker_kwargs.subset=dev" \
110+
"dataset.val_maker_kwargs.split=test" \
111+
dataset.skip_test=true \
112+
dataset.pack_sequences_in_batch=true \
113+
rng.seed=42 \
114+
ddp.grad_reduce_in_fp32=false
115+
done
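The training loop above splits each "TP,PP" entry with an `IFS`-scoped `read` over a here-string. A minimal standalone sketch of that pattern (the second entry `"2,4"` is added here purely for illustration):

```shell
#!/usr/bin/env bash
set -euo pipefail

# Same pattern as sft.sh: each entry encodes "tensor_parallel,pipeline_parallel".
PARALLELISM_CONFIGS=("1,1" "2,4")

for par_config in "${PARALLELISM_CONFIGS[@]}"; do
    # IFS=',' applies only to this read; it splits the string at the comma.
    IFS=',' read -r TP PP <<< "$par_config"
    echo "TP=${TP} PP=${PP}"
done
```

Because `IFS` is set only for the `read` command, the global word-splitting behavior of the script is untouched.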
examples/models/audio_lm/qwen3_asr/README.md (new file)

Lines changed: 88 additions & 0 deletions

# Qwen3-ASR - Audio Speech Recognition Model

This directory contains example scripts for Qwen3-ASR audio speech recognition models.

## Workspace Configuration

All scripts use a `WORKSPACE` environment variable to define the base directory for checkpoints and results. By default, this is set to `/workspace`. You can override it:

```bash
export WORKSPACE=/your/custom/path
```

## Checkpoint Conversion

### Import HF → Megatron

To import the HF model to your desired Megatron path:

```bash
uv run python examples/conversion/convert_checkpoints.py import \
    --hf-model Qwen/Qwen3-ASR-1.7B \
    --megatron-path ${WORKSPACE}/models/Qwen3-ASR-1.7B
```

### Export Megatron → HF

```bash
uv run python examples/conversion/convert_checkpoints.py export \
    --hf-model Qwen/Qwen3-ASR-1.7B \
    --megatron-path ${WORKSPACE}/models/Qwen3-ASR-1.7B/iter_0000000 \
    --hf-path ${WORKSPACE}/models/Qwen3-ASR-1.7B-hf-export
```

### Round-trip Validation

```bash
uv run python -m torch.distributed.run --nproc_per_node=2 \
    examples/conversion/hf_megatron_roundtrip_multi_gpu.py \
    --hf-model-id Qwen/Qwen3-ASR-1.7B \
    --megatron-load-path ${WORKSPACE}/models/Qwen3-ASR-1.7B/iter_0000000 \
    --trust-remote-code \
    --tp 2 --pp 1
```

## Inference

### Run Inference on a HuggingFace Checkpoint

```bash
uv run python -m torch.distributed.run --nproc_per_node=2 \
    examples/conversion/hf_to_megatron_generate_audio_lm.py \
    --hf_model_path Qwen/Qwen3-ASR-1.7B \
    --audio_url "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen2-Audio/audio/1272-128104-0000.flac" \
    --prompt "" \
    --tp 2 \
    --max_new_tokens 50
```

### Run Inference on a Converted Megatron Checkpoint

```bash
uv run python -m torch.distributed.run --nproc_per_node=2 \
    examples/conversion/hf_to_megatron_generate_audio_lm.py \
    --hf_model_path Qwen/Qwen3-ASR-1.7B \
    --megatron_model_path ${WORKSPACE}/models/Qwen3-ASR-1.7B/iter_0000000 \
    --audio_url "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen2-Audio/audio/1272-128104-0000.flac" \
    --prompt "" \
    --tp 2 \
    --max_new_tokens 50
```

Note:
- `--megatron_model_path` is optional. If it is not specified, the script converts the model in memory and then runs the forward pass.
- You can also use local audio files: `--audio_path /path/to/audio.wav`

See the [inference.sh](inference.sh) script for the full runnable commands.

**Expected output:**
```
======== GENERATED TEXT OUTPUT ========
Audio: https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen2-Audio/audio/1272-128104-0000.flac
Prompt:
Generated: system

user

assistant
language English<asr_text>Mr. Quilter is the apostle of the middle classes, and we are glad to welcome his gospel.
=======================================
```
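The export, round-trip, and Megatron-checkpoint inference commands all reference the `iter_0000000` directory that `convert_checkpoints.py import` writes. A small guard in the style of the Qwen2-Audio sft.sh script skips re-conversion when that directory already exists (the `/tmp` workspace and `mkdir` line are illustrative, simulating a completed import):

```shell
#!/usr/bin/env bash
set -euo pipefail

# Illustrative workspace; the README uses ${WORKSPACE}/models/Qwen3-ASR-1.7B.
WORKSPACE=${WORKSPACE:-/tmp/qwen3_asr_demo}
MEGATRON_CKPT_DIR="${WORKSPACE}/models/Qwen3-ASR-1.7B"

mkdir -p "${MEGATRON_CKPT_DIR}/iter_0000000"   # simulate a completed import

if [ ! -d "${MEGATRON_CKPT_DIR}/iter_0000000" ]; then
    echo "Converting HF model to Megatron format..."
    # uv run python examples/conversion/convert_checkpoints.py import \
    #     --hf-model Qwen/Qwen3-ASR-1.7B \
    #     --megatron-path "${MEGATRON_CKPT_DIR}"
else
    echo "Checkpoint already converted, skipping import."
fi
```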
