#!/usr/bin/env bash
# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# ==============================================================================
# Qwen2-Audio 7B SFT (Supervised Fine-Tuning) Script
#
# Usage:
#   bash sft.sh
#
# Environment variables:
#   WORKSPACE — root dir for checkpoints/results
#               (default: /workspace/Megatron-Bridge/examples/models/audio_lm/qwen2_audio)
#   NPROC     — number of GPUs per node (default: 8)
#   HF_MODEL  — HuggingFace model path (default: Qwen/Qwen2-Audio-7B)
# ==============================================================================
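# Example invocation (illustrative values, not defaults):
#   WORKSPACE=/data/qwen2_audio NPROC=4 bash sft.sh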

set -euo pipefail

# Mirror all stdout/stderr to a log file while still printing to the console.
LOG_FILE=./qwen2_audio_7b_asr.log
exec > >(tee "${LOG_FILE}") 2>&1

# Disable TorchDynamo graph capture; run the model eagerly.
export TORCHDYNAMO_DISABLE=1

# Workspace directory for checkpoints and results
WORKSPACE=${WORKSPACE:-/workspace/Megatron-Bridge/examples/models/audio_lm/qwen2_audio}
NPROC=${NPROC:-8}
HF_MODEL=${HF_MODEL:-Qwen/Qwen2-Audio-7B}

# Before training, make sure to set WANDB_API_KEY or disable wandb logging:
# export WANDB_API_KEY=<your_wandb_api_key>
# export WANDB_MODE=disabled

# Common configurations
MODEL_NAME=qwen2_audio_7b
MEGATRON_CKPT_DIR=${WORKSPACE}/megatron_ckpts/${MODEL_NAME}

# Convert the HF checkpoint to Megatron format if not already done. The
# importer writes its first checkpoint under iter_0000000, so that directory
# doubles as a "conversion already finished" marker.
if [ ! -d "${MEGATRON_CKPT_DIR}/iter_0000000" ]; then
    echo "Converting HF model to Megatron format..."
    uv run --no-sync python examples/conversion/convert_checkpoints.py import \
        --hf-model "${HF_MODEL}" \
        --megatron-path "${MEGATRON_CKPT_DIR}"
fi

# Fine-tune from the converted checkpoint unless the caller overrides it.
PRETRAINED_CHECKPOINT=${PRETRAINED_CHECKPOINT:-${MEGATRON_CKPT_DIR}}
WANDB_PROJECT=megatron-bridge-${MODEL_NAME}

# Training hyperparameters
SEQ_LENGTH=16384
TRAIN_ITERS=11250
GLOBAL_BATCH_SIZE=32
MICRO_BATCH_SIZE=4
EVAL_INTERVAL=1000
EVAL_ITERS=10
LR=2e-5
MIN_LR=2e-6
LR_WARMUP_ITERS=5
SAVE_INTERVAL=1000
LOG_INTERVAL=1
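
# Note: with the defaults (NPROC=8, TP=1, PP=1) the data-parallel size is 8,
# so, assuming Megatron's usual GBS = MBS x DP x grad-accum relation, each step
# runs GLOBAL_BATCH_SIZE / (MICRO_BATCH_SIZE * 8) = 1 gradient-accumulation pass.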

# TP/PP combinations: "TP,PP"
PARALLELISM_CONFIGS=("1,1")
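# To sweep several layouts in one run, list more pairs, e.g.
#   PARALLELISM_CONFIGS=("1,1" "2,1" "4,2")   # TP*PP must divide NPROC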

for par_config in "${PARALLELISM_CONFIGS[@]}"; do
    IFS=',' read -r TP PP <<< "$par_config"
    echo "============================================================"
    echo " run_recipe.py | TP=${TP}, PP=${PP}"
    echo "============================================================"
    uv run --no-sync python -m torch.distributed.run --nproc_per_node=${NPROC} scripts/training/run_recipe.py \
        --recipe qwen2_audio_7b_finetune_config \
        --step_func audio_lm_step \
        --hf_path "${HF_MODEL}" \
        checkpoint.pretrained_checkpoint="${PRETRAINED_CHECKPOINT}" \
        checkpoint.save="${WORKSPACE}/exp/${MODEL_NAME}_sft_tp${TP}_pp${PP}" \
        checkpoint.save_interval=${SAVE_INTERVAL} \
        checkpoint.save_optim=False \
        model.seq_length=${SEQ_LENGTH} \
        model.tensor_model_parallel_size=${TP} \
        model.pipeline_model_parallel_size=${PP} \
        model.freeze_language_model=false \
        model.freeze_audio_model=false \
        model.freeze_audio_projection=false \
        train.train_iters=${TRAIN_ITERS} \
        train.global_batch_size=${GLOBAL_BATCH_SIZE} \
        train.micro_batch_size=${MICRO_BATCH_SIZE} \
        validation.eval_interval=${EVAL_INTERVAL} \
        validation.eval_iters=${EVAL_ITERS} \
        optimizer.lr=${LR} \
        optimizer.min_lr=${MIN_LR} \
        scheduler.lr_warmup_iters=${LR_WARMUP_ITERS} \
        logger.log_interval=${LOG_INTERVAL} \
        logger.wandb_project=${WANDB_PROJECT} \
        logger.wandb_exp_name=${MODEL_NAME}_asr_tp${TP}_pp${PP} \
        dataset.maker_name=make_default_audio_dataset \
        "dataset.maker_kwargs.path_or_dataset=yuekai/aishell" \
        "dataset.maker_kwargs.subset=train" \
        "dataset.maker_kwargs.split=test" \
        "+dataset.maker_kwargs.prompt='Detect the language and recognize the speech: <|zh|>'" \
        "dataset.val_maker_kwargs.subset=dev" \
        "dataset.val_maker_kwargs.split=test" \
        dataset.skip_test=true \
        dataset.pack_sequences_in_batch=true \
        rng.seed=42 \
        ddp.grad_reduce_in_fp32=false
done
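
# Note: the dotlist overrides above appear to follow Hydra-style conventions
# (an assumption based on the "+key=value" form): a leading "+" adds a key not
# present in the base recipe config, and the quoted arguments protect values
# containing spaces, such as the prompt string, from shell word-splitting.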