
feat: add ASR CTC and seq2seq support #1671

Draft
rylativity wants to merge 2 commits into main from asr-support

Conversation

@rylativity

What does this PR do?

[EXPERIMENTAL FEATURE] Adds ASR support for Whisper (5 variants) and Parakeet CTC (2 variants), with distributed training and PEFT, and lays the groundwork for incorporating additional ASR models into NeMo Automodel.

Changelog

New Model Support:

  • Add NeMoAutoModelForSpeechSeq2Seq for encoder-decoder ASR models (Whisper family: tiny/base/small/medium/large-v3)
  • Add NeMoAutoModelForCTC for CTC-based ASR models (Parakeet CTC: 0.6B/1.1B)
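The two model families differ mainly in how frame-level predictions become text: CTC heads emit per-frame token logits that are collapsed alignment-free, while seq2seq models (Whisper) decode autoregressively with cross-attention over encoder features. As a minimal, self-contained illustration (not code from this PR), CTC greedy decoding merges repeated tokens and drops blanks:

```python
def ctc_greedy_decode(frame_ids, blank_id=0):
    """Collapse per-frame CTC predictions: merge repeats, drop blanks."""
    out, prev = [], None
    for t in frame_ids:
        if t != prev and t != blank_id:
            out.append(t)
        prev = t
    return out

# Repeated tokens collapse; blank frames (0) separate genuine repeats.
print(ctc_greedy_decode([0, 3, 3, 0, 5, 5, 5, 0]))  # [3, 5]
```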

New Components:

  • Add ASR dataset component with LibriSpeech, Common Voice, and custom dataset loaders (nemo_automodel/components/datasets/asr/)
  • Add processor-specific collate functions with automatic mel-spectrogram extraction and tokenization (nemo_automodel/components/datasets/asr/collate_fns.py)
  • Implement collate function registry for automatic processor selection
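The registry idea can be sketched as below; the decorator, function names, and batch keys are illustrative, not the actual API in collate_fns.py:

```python
# Hypothetical collate-function registry keyed by processor class name,
# so the right batching logic is selected automatically per model.
COLLATE_REGISTRY = {}

def register_collate(processor_name):
    def wrap(fn):
        COLLATE_REGISTRY[processor_name] = fn
        return fn
    return wrap

@register_collate("WhisperProcessor")
def whisper_collate(batch):
    # The real function would extract log-mel spectrogram features and
    # tokenize transcripts; this sketch only shows the batch shape.
    return {"input_features": [ex["audio"] for ex in batch],
            "labels": [ex["text"] for ex in batch]}

def get_collate_fn(processor):
    # Look up the collate fn by the processor's concrete class name.
    return COLLATE_REGISTRY[type(processor).__name__]
```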

New Recipe:

  • Add ASR fine-tuning recipe with support for both CTC and Seq2Seq loss computation (nemo_automodel/recipes/asr/finetune.py)
  • Implement validation loop with loss tracking and metrics logging
  • Add pipeline parallelism support via AutoPipeline
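A rough sketch of the two loss paths and the validation-loss tracking (the output keys and function names are assumptions, not the recipe's API):

```python
def compute_loss(model_kind, outputs):
    # CTC models expose an alignment-free CTC loss; seq2seq models
    # expose cross-entropy over shifted decoder labels. Key names here
    # are illustrative.
    key = {"ctc": "ctc_loss", "seq2seq": "decoder_ce_loss"}[model_kind]
    return outputs[key]

def validation_loss(batch_losses):
    # Plain running mean over per-batch losses, as a validation loop
    # with loss tracking might accumulate it.
    total, count = 0.0, 0
    for loss in batch_losses:
        total += loss
        count += 1
    return total / count if count else 0.0
```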

Example Configurations:

  • Add 8 YAML configs for Whisper and Parakeet models with full and PEFT fine-tuning examples
  • Add finetune.py entry point script for ASR examples (examples/asr_finetune/finetune.py)
  • Include distributed training configurations with device mesh setup
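An illustrative shape for such a config (field names and values are assumptions, not copied from the PR's YAML files):

```yaml
# Hypothetical ASR fine-tuning config sketch.
model:
  pretrained_model_name_or_path: openai/whisper-small
dataset:
  name: librispeech
  split: train.clean.100
peft:
  enabled: true   # set false for full fine-tuning
training:
  lr: 1.0e-5
  max_steps: 1000
```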

Testing:

  • Add 4 functional tests covering Whisper and Parakeet fine-tuning (full and PEFT) (tests/functional_tests/asr_finetune/)
  • Add comprehensive unit tests for dataset loaders and collate functions (tests/unit_tests/datasets/asr/)
  • Include pytest test class with parameterized model/PEFT configurations
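The parameterized model/PEFT matrix might look like this sketch (model IDs and the smoke-check body are illustrative, not the PR's actual tests):

```python
import pytest

# Cross product of models and PEFT on/off, covered by one test function.
CASES = [
    ("openai/whisper-small", False),
    ("openai/whisper-small", True),
    ("nvidia/parakeet-ctc-0.6b", False),
    ("nvidia/parakeet-ctc-0.6b", True),
]

@pytest.mark.parametrize("model_id,use_peft", CASES)
def test_finetune_smoke(model_id, use_peft):
    # A real functional test would run a short fine-tuning job; this
    # placeholder only validates the case matrix is well-formed.
    assert isinstance(model_id, str) and isinstance(use_peft, bool)
```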

Documentation:

  • Add comprehensive README for ASR fine-tuning with quick start examples, PEFT guide, and troubleshooting (examples/asr_finetune/README.md)
  • Update root README with ASR examples and usage
  • Add inline documentation for ASR model classes and dataset utilities

Dependencies:

  • Add librosa and torchcodec as ASR extras in pyproject.toml
  • Update Docker build with ASR-specific dependencies
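In pyproject.toml terms, an optional-dependency extra for this typically looks like the fragment below (the extra name `asr` is an assumption):

```toml
[project.optional-dependencies]
asr = ["librosa", "torchcodec"]
```

Users would then install it with `pip install "nemo-automodel[asr]"` (package name assumed).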

Other:

  • Update model exports in nemo_automodel/__init__.py and _transformers/__init__.py
  • Ensure component independence (no cross-component imports, verified by lint-imports)
  • Add copyright year 2026 across new files

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Linting/formatting passed
  • Commits DCO signed
  • Confirmed documentation builds successfully

(Note: Previously opened in PR #1263)

- Add NeMoAutoModelForSpeechSeq2Seq and NeMoAutoModelForCTC auto classes
- Add ASR dataset loaders for LibriSpeech, Common Voice, and custom datasets
- Add Whisper and Parakeet collate functions with mel spectrogram processing
- Add example configs and finetune recipe for Whisper and Parakeet models
- Add functional and unit tests for ASR models and datasets
- Add ASR PEFT config examples
- Update docs and model coverage overview

Signed-off-by: Ryan Stewart <rystewart@nvidia.com>
@copy-pr-bot

copy-pr-bot bot commented Apr 3, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

Signed-off-by: NeMo Bot <nemo-bot@nvidia.com>
@akoumpa
Contributor

akoumpa commented Apr 3, 2026

/ok to test 1dad056

