Commit 919c945
SALM with NeMo Automodel integration for Nemotron Nano V3 LLM backbone (#15447)
* WIP: bringing Yifan's changes to main
Signed-off-by: Piotr Żelasko <pzelasko@nvidia.com>
* Add workaround for exp_manager issue
Signed-off-by: Piotr Żelasko <pzelasko@nvidia.com>
* Support reading indexed JSONL datasets with ShareGPT format
Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* Support reading indexed tarred datasets with ShareGPT format
Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* Refactor for compactness
Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* Fixes for real-life data
Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* Fixes for real-life data
Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* Fixes for real-life data
Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* Fixes for missing wids-meta.json
Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* Fixes for tarfile edge cases
Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* Fixes for real-world tar files
Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* move salm llm init to configure_model
Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* fix: delayed perception init
Signed-off-by: Piotr Żelasko <pzelasko@nvidia.com>
* Add AutomodelParallelStrategy for Automodel LLM support
Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* Replace HF Automodel with NeMo Automodel for SALM's LLM backbone
Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* Update salm default config with new options
Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* Init fixes
Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* Fix dtype initialization
Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* Fix mesh selection for speech encoder
Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* Fix for mismatched device_mesh axis names in gradient clipping - use automodel's utility
Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* Fix for using embed_tokens in FSDP context before running forward on full LLM
Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* Definitive fix for using embed_tokens outside of llm with fsdp
Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* this version actually works with Automodel
Signed-off-by: Piotr Żelasko <pzelasko@nvidia.com>
* fix from_pretrained with transformers v5
Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* fix from_pretrained with transformers v5
Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* fix generate/eval
Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* fix to_hf
Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* Fixes for AutoTokenizer decoding in v5
Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* Flag to run configure_model() at the end of __init__ for safetensors converted models
Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* preliminary: support distributed models in to_hf.py
Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* fix passing automodel kwargs
Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* fix
Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* Enable inference with model parallelism
Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* Fix for lightning save_hyperparameters() call
Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* Fix for loading into DTensor
Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* Accelerate loading DTensor
Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* Accelerate loading DTensor
Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* Accelerate loading DTensor
Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* Fix for pe buffers not in ckpt (essentially strict=False)
Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* Add Nemotron Nano v3 prompt formatter with <think> reasoning support
Implements NemotronNanoV3PromptFormatter (NAME="nemotron-nano-v3") using
ChatML-style <|im_start|>/<|im_end|> template with encode_dialog override
that handles: auto-insert empty system turn, history thinking truncation,
<think></think> prepend for non-thinking assistant turns, and dynamic
inference prefix (thinking on/off). Includes Lhotse Cut integration via
registered_prompt_format_fn. Verified against HF apply_chat_template for
nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16 (both string and token match).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
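The template described above can be sketched as follows. This is a hypothetical illustration of the ChatML-style encoding with `<think>` handling, not the actual `NemotronNanoV3PromptFormatter` implementation; the real special tokens, truncation rules, and turn structure may differ.

```python
def encode_dialog(turns, add_generation_prompt=False, thinking=False):
    """Sketch of the ChatML-style template with <think> handling.

    `turns` is a list of {"role", "content"} dicts. Illustrative only:
    the actual formatter's tokens and rules may differ.
    """
    # Auto-insert an empty system turn if the dialog lacks one.
    if not turns or turns[0]["role"] != "system":
        turns = [{"role": "system", "content": ""}] + list(turns)
    parts = []
    for turn in turns:
        content = turn["content"]
        if turn["role"] == "assistant" and "<think>" not in content:
            # Prepend an empty reasoning block for non-thinking assistant turns.
            content = "<think></think>" + content
        parts.append(f"<|im_start|>{turn['role']}\n{content}<|im_end|>\n")
    if add_generation_prompt:
        # Dynamic inference prefix: open a reasoning block when thinking
        # is on, otherwise close it immediately.
        prefix = "<think>" if thinking else "<think></think>"
        parts.append(f"<|im_start|>assistant\n{prefix}")
    return "".join(parts)
```

In the actual change, the output of this kind of encoding was verified against HF `apply_chat_template` for both the rendered string and the token IDs.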
* fix
Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* Automodel LoRA support
Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* Fixes for model parallel
Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* LoRA fix
Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* small ckpt conversion/inference fix
Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* Separate SALM and SALMAutomodel into independent classes
Restore salm.py to its original HF Transformers + PEFT LoRA implementation
from main, and extract the NeMo Automodel-based implementation into a new
SALMAutomodel class in salm_automodel.py. This keeps both backends available
and independent, with scripts auto-detecting the model class from config.json.
- salm.py: restored from main (eager init, HF PEFT, move_embedding)
- salm_automodel.py: new file with SALMAutomodel (deferred init, automodel LoRA)
- salm_train.py: selects model class via model.use_nemo_automodel config key
- salm_eval.py/salm_generate.py: auto-detect model class from config.json
- salm_automodel.yaml: new config for SALMAutomodel training
- Tests split into test_salm.py (CPU) and test_salm_automodel.py (CUDA)
- New functional test SPEECHLM_Automodel_Training_SALM.sh
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Piotr Żelasko <petezor@gmail.com>
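The auto-detection mentioned above can be sketched like this: read a marker key from the checkpoint's `config.json` and pick the backend class accordingly. This is an illustrative sketch, not the actual `salm_eval.py`/`salm_generate.py` code; the real key name and return values may differ.

```python
import json
from pathlib import Path


def detect_salm_class(ckpt_dir):
    """Pick the SALM backend based on the checkpoint's config.json.

    Hypothetical sketch: assumes a boolean `use_nemo_automodel` key,
    mirroring the `model.use_nemo_automodel` config key used in training.
    """
    cfg = json.loads(Path(ckpt_dir, "config.json").read_text())
    if cfg.get("use_nemo_automodel", False):
        return "SALMAutomodel"  # NeMo Automodel backend (deferred init, automodel LoRA)
    return "SALM"  # original HF Transformers + PEFT LoRA backend
```

Keeping the detection in one place lets the eval and generate scripts stay agnostic of which backend produced the checkpoint.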
* fix linters
Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* Add SALMAutomodel docs and speechlm2 pip extra
Add documentation for SALMAutomodel (NeMo Automodel variant of SALM)
across all speechlm2 doc pages: intro, models, configs, and
training_and_scaling. Create pip install nemo-toolkit[speechlm2] extra
that composes speechlm2-only (nemo_automodel git dep) + asr + tts.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Add SALMAutomodel tutorial notebook and fix EP/FSDP2 docs
Add tutorials/speechlm2/SpeechLM_With_NeMo_Automodel.ipynb covering
the full pipeline: data download, training, checkpoint conversion, and
evaluation with Nemotron Nano V3 MoE backbone on 2 GPUs.
Fix docs to clarify that Expert Parallelism reuses the FSDP2
data-parallel axis — dense layers are sharded via FSDP2 while MoE
layers use EP on the same GPUs, not a separate dimension.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Fix uv torch index conflict for speechlm2 extra
The docs CI runs `uv sync --all-extras --all-groups` which resolves
the speechlm2 extra pulling nemo_automodel from git. uv treats git
source deps as workspace members and applies their [tool.uv.sources],
causing a conflict: Automodel maps torch to per-platform indexes
while NeMo defaulted to PyPI for all platforms.
Add matching [tool.uv.sources] for torch to pyproject.toml and
regenerate uv.lock with nemo_automodel included.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Remove direction arg
Signed-off-by: Piotr Żelasko <pzelasko@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: pzelasko <pzelasko@users.noreply.github.com>
* fix linter
Signed-off-by: Piotr Żelasko <pzelasko@nvidia.com>
* fix tests
Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* fixes
Signed-off-by: Piotr Żelasko <pzelasko@nvidia.com>
* fixes for trust_remote_code
Signed-off-by: Piotr Żelasko <pzelasko@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: pzelasko <pzelasko@users.noreply.github.com>
* Add explicit enable_thinking support to SALM eval paths
* Apply isort and black reformatting
Signed-off-by: pzelasko <pzelasko@users.noreply.github.com>
* fix inference with ep_size=1 for automodel models
Signed-off-by: Piotr Żelasko <pzelasko@nvidia.com>
* Fixes
Signed-off-by: Piotr Żelasko <pzelasko@nvidia.com>
* Fixes for inference and tutorial
Signed-off-by: Piotr Żelasko <pzelasko@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: pzelasko <pzelasko@users.noreply.github.com>
* Remove deprecated activation_checkpointing parameter everywhere
Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* Fix CI
Signed-off-by: Piotr Żelasko <pzelasko@nvidia.com>
* Fix to_hf.py crash when run without torchrun
Guard dist.init_process_group on RANK env var presence so the script
works with plain `python` (single-file checkpoints) as well as
`torchrun` (distributed checkpoints).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
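The guard described above can be sketched as follows: `torchrun` exports `RANK`/`WORLD_SIZE` environment variables while plain `python` does not, so their presence decides whether to initialize the process group. Illustrative sketch only; the actual `to_hf.py` code and chosen backend may differ.

```python
import os


def maybe_init_distributed():
    """Initialize torch.distributed only when launched via torchrun.

    torchrun sets the RANK env var; a plain `python` invocation does not,
    so we skip init_process_group for single-file checkpoints and only
    initialize for distributed ones. Returns True if distributed is set up.
    """
    if "RANK" not in os.environ:
        return False  # single-process run: no process group needed
    import torch.distributed as dist

    if not dist.is_initialized():
        # Backend choice is an assumption here; the real script may use nccl.
        dist.init_process_group(backend="gloo")
    return True
```

The lazy `torch.distributed` import keeps the single-process path free of any distributed setup cost.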
* Apply suggestions from code review
Co-authored-by: claude[bot] <209825114+claude[bot]@users.noreply.github.com>
Signed-off-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>
* Add flashoptim support and bf16-automodel half precision setup
Signed-off-by: Piotr Żelasko <pzelasko@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: pzelasko <pzelasko@users.noreply.github.com>
* patch flashoptim handling of unevenly sharded state dicts
Signed-off-by: Piotr Żelasko <pzelasko@nvidia.com>
* Reproducibility fix
Signed-off-by: Piotr Żelasko <pzelasko@nvidia.com>
* Refactor AutomodelPrecision to FlashPrecision to enable re-use by other collections in subsequent PRs
Signed-off-by: Piotr Żelasko <pzelasko@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: pzelasko <pzelasko@users.noreply.github.com>
* Address code review
Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* disable linter
Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* Fix test
Signed-off-by: Piotr Żelasko <pzelasko@nvidia.com>
* fix for torch.compile config
Signed-off-by: Piotr Żelasko <pzelasko@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: pzelasko <pzelasko@users.noreply.github.com>
* fix tests
Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* Dataloader DP rank patch for Automodel's device_mesh
Signed-off-by: Piotr Żelasko <pzelasko@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: pzelasko <pzelasko@users.noreply.github.com>
* fix sloppy fix
Signed-off-by: Piotr Żelasko <pzelasko@nvidia.com>
* fix CI HF tokenizer download issue
Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* Add tests for correct DP rank resolution in the dataloader
Signed-off-by: Piotr Żelasko <pzelasko@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: pzelasko <pzelasko@users.noreply.github.com>
* xfail tests with corrupted tokenizer in CI
Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* Update test pytorch version safeguard
Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* Fix new peft version requiring newer torchao than available in CI container
Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* Fixes
Signed-off-by: Piotr Żelasko <pzelasko@nvidia.com>
* Bump Automodel pin for transformers compat
Signed-off-by: Piotr Żelasko <pzelasko@nvidia.com>
---------
Signed-off-by: Piotr Żelasko <pzelasko@nvidia.com>
Signed-off-by: Piotr Żelasko <petezor@gmail.com>
Signed-off-by: pzelasko <pzelasko@users.noreply.github.com>
Signed-off-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: pzelasko <pzelasko@users.noreply.github.com>
Co-authored-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>
Co-authored-by: claude[bot] <209825114+claude[bot]@users.noreply.github.com>
Parent: 73a5e7d
61 files changed: 10,327 additions and 2,296 deletions
File tree
- docs/source
  - features
    - speechlm2
- examples/speechlm2
  - conf
- nemo
  - collections
    - common
      - data
        - lhotse
      - prompts
      - tokenizers/huggingface
    - speechlm2
      - data
      - models
      - parts
  - core
    - classes
      - mixins
    - optim
  - utils
    - callbacks
- requirements
- tests
  - collections
    - common
      - prompt_formatters
    - speechlm2
  - core
  - functional_tests
  - utils
- tutorials/speechlm2